Salesforce
6 min read

How Data 360 Vector Search Delivers Near Real-Time Intelligence on 90% of Enterprise Data

Summary

The article explores how Salesforce's Data 360 implements vector search to turn unstructured data into actionable intelligence. It highlights GPU acceleration, which speeds up the processing of media files and significantly reduces transcription costs, and Kafka parameter tuning, which improves throughput. The architecture supports diverse document types through a unified schema, so it can be used across various industries. AI tools such as Claude and Cursor were also used to speed up development, reducing the time required for UI development and test automation.
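
To make the idea of vector search concrete, here is a minimal sketch of top-k retrieval by cosine similarity. It is a brute-force illustration only and not the Data 360 implementation; the function names (`cosine_similarity`, `top_k`), the in-memory `index` dictionary, and the toy vectors are all assumptions introduced for this example.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Score every stored embedding against the query and keep the k best."""
    scored = (
        (doc_id, cosine_similarity(query, vector))
        for doc_id, vector in index.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical toy index: document id -> embedding
index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
results = top_k([1.0, 0.0], index, k=2)
```

A smaller `k` returns fewer candidates per query, which generally lowers the work done downstream at the cost of possibly missing relevant documents. Production systems typically replace the full scan with an approximate nearest-neighbor index.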

Key Learnings

  • The integration of GPU acceleration into the Spark environment drastically reduces processing time and costs for unstructured data.
  • Optimizing Kafka throughput through top-k parameters and retry mechanisms is crucial for maintaining near real-time data processing.
  • A unified document schema allows for flexible handling of diverse data types without complicating the codebase.
  • AI tools can significantly enhance development velocity by automating tasks such as UI design and test generation.
  • Proactive action on unstructured data is essential for organizations to leverage their data effectively, moving beyond passive storage.
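
The unified document schema mentioned above could look something like the following sketch. The summary does not describe the actual schema, so the class name `UnifiedDocument`, its fields, and the `chunk_document` helper are hypothetical, chosen only to show how one record type can carry text from different sources through the same downstream code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UnifiedDocument:
    """Hypothetical common record for any ingested source."""
    doc_id: str
    source_type: str  # e.g. "pdf", "audio", "email" (illustrative values)
    text: str         # extracted or transcribed text
    metadata: Dict[str, str] = field(default_factory=dict)


def chunk_document(doc: UnifiedDocument, max_chars: int) -> List[str]:
    """Split a document's text into fixed-size chunks ready for embedding."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    return [
        doc.text[start:start + max_chars]
        for start in range(0, len(doc.text), max_chars)
    ]


# The same chunking code handles a transcript and a PDF extract alike.
transcript = UnifiedDocument("call-1", "audio", "abcdefgh", {"language": "en"})
report = UnifiedDocument("report-7", "pdf", "xyz")
transcript_chunks = chunk_document(transcript, max_chars=3)
report_chunks = chunk_document(report, max_chars=3)
```

Because source-specific details live in `source_type` and `metadata`, adding a new document type in this sketch means writing a new extractor that produces `UnifiedDocument` records, while chunking and embedding code stays unchanged.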

Who Should Read This

Senior Data Engineers implementing vector search solutions for large-scale unstructured data processing

Test Your Knowledge

  • What are the trade-offs between using CPU vs GPU for processing unstructured data in terms of cost and performance?
  • How does the top-k parameter optimization impact the overall system's throughput and data relevance?
  • What design decisions were made to ensure the architecture could handle diverse document types without fragmentation?
  • In what scenarios might the implemented retry mechanisms fail, and how can those failures be mitigated?
  • Why is it critical for organizations to implement near real-time processing of unstructured data instead of relying on traditional query-based approaches?
