Engineering VP Josh Clemm on how we use knowledge graphs, MCP, and DSPy in Dash
Summary
In this article, Josh Clemm discusses the technical architecture behind Dropbox Dash, focusing on the integration of knowledge graphs, retrieval methods, and the use of large language models (LLMs). The context engine is designed to aggregate and understand content from various third-party applications, enabling efficient search and retrieval. Clemm elaborates on the challenges of implementing multi-modal understanding and the trade-offs between indexed retrieval and federated retrieval. The article also highlights the role of DSPy in optimizing prompts for LLMs, aiming to enhance the relevance of retrieved information while managing token usage effectively.
Key Learnings
1. The architecture of Dropbox Dash leverages knowledge graphs to model relationships between various data sources, enhancing contextual understanding.
2. Indexed retrieval offers advantages in speed and access to company-wide connectors, but requires significant custom development and management of content freshness.
3. DSPy serves as a prompt optimization tool that improves the performance of LLMs in judging relevance, showcasing emergent behaviors that enhance retrieval accuracy.
4. The challenges of multi-modal content understanding necessitate advanced models capable of handling diverse data types, from text to images and videos.
5. Trade-offs between federated and indexed retrieval highlight the importance of pre-processing and the implications for system architecture and performance.
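To make the indexed-retrieval side of that trade-off concrete, here is a minimal sketch of a lexical index scored with BM25, written in plain Python. It is not Dash's implementation: the `BM25Index` class, the `tokenize` helper, the sample documents, and the `k1`/`b` defaults are all assumptions chosen for illustration. The point it shows is that term statistics are computed once, ahead of query time, which is where both the speed and the content-freshness burden of an index come from.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class BM25Index:
    """Toy pre-built lexical index (illustrative only, not Dash's design)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.doc_ids = list(docs)
        # Pre-processing step: term frequencies and lengths are computed
        # at index time, so queries never touch the source applications.
        self.term_freqs = {d: Counter(tokenize(t)) for d, t in docs.items()}
        self.doc_len = {d: sum(tf.values()) for d, tf in self.term_freqs.items()}
        self.avg_len = sum(self.doc_len.values()) / len(self.doc_len)
        self.doc_freq = Counter()
        for tf in self.term_freqs.values():
            self.doc_freq.update(tf.keys())

    def idf(self, term):
        # Non-negative IDF variant commonly used with BM25.
        n = len(self.doc_ids)
        df = self.doc_freq.get(term, 0)
        return math.log(1 + (n - df + 0.5) / (df + 0.5))

    def score(self, query, doc_id):
        tf = self.term_freqs[doc_id]
        # Length normalization: longer-than-average documents are penalized.
        norm = self.k1 * (1 - self.b + self.b * self.doc_len[doc_id] / self.avg_len)
        return sum(
            self.idf(t) * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in tokenize(query)
            if t in tf
        )

    def search(self, query, top_k=3):
        scored = [(self.score(query, d), d) for d in self.doc_ids]
        ranked = sorted((s for s in scored if s[0] > 0), reverse=True)
        return [d for _, d in ranked[:top_k]]


# Made-up sample content standing in for documents pulled from connectors.
sample_docs = {
    "doc1": "quarterly planning notes for the search team",
    "doc2": "lunch menu for friday",
    "doc3": "search relevance planning and ranking review",
}
index = BM25Index(sample_docs)
results = index.search("search planning")
```

A federated approach would instead forward the query to each source application at request time and merge the responses, so no index has to be built or kept fresh, at the cost of query-time latency and less control over ranking.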
Who Should Read This
Senior AI Engineers designing large-scale retrieval systems using knowledge graphs and LLMs
Test Your Knowledge
What are the key advantages and disadvantages of indexed retrieval compared to federated retrieval in the context of Dropbox Dash?
How does the integration of knowledge graphs improve the contextual understanding of information within Dash?
What specific challenges does multi-modal content understanding present, and how are they addressed in the architecture of Dash?
In what ways does DSPy enhance the performance of LLMs, and what are the implications for prompt management at scale?
How does the choice of using a lexical index like BM25 impact the retrieval process and overall system performance?