Engineering articles from Apple
AI summaries and key learnings from Apple engineering teams.
GenCtrl -- A Formal Controllability Toolkit for Generative Models
The article introduces GenCtrl, a formal controllability toolkit designed for generative models, addressing the critical need for fine-grained control in generative processes. It establishes a...
Flow Matching with Semidiscrete Couplings
The article presents a novel approach to flow matching using semidiscrete couplings, addressing limitations in traditional optimal transport methods. It highlights the inefficiencies of the OT flow...
Multi-Frequency Fusion for Robust Video Face Forgery Detection
The article presents a novel approach to video face forgery detection through a method termed Multi-Frequency Fusion. This technique utilizes a lightweight fusion of two handcrafted cues,...
On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment
This paper addresses the critical issue of AI alignment in the context of large language models (LLMs), emphasizing the computational intractability of filtering mechanisms designed to prevent the...
EMBridge: Enhancing Gesture Generalization from EMG Signals through Cross-Modal Representation Learning
The article presents EMBridge, a novel framework designed to enhance gesture generalization from electromyography (EMG) signals by leveraging cross-modal representation learning. By aligning EMG data...
Learning to Reason for Hallucination Span Detection
The paper presents a novel approach to hallucination span detection in large language models (LLMs) by incorporating explicit reasoning into the detection process. Traditional methods often treat...
The Way We Notice, That's What Really Matters: Instantiating UI Components with Distinguishing Variations
The article presents a novel approach to instantiating UI components by introducing the concept of distinguishing variations, which are both mimetic and distinct. It highlights the challenges...
Scaling Search Relevance: Augmenting App Store Ranking with LLM-Generated Judgments
The article presents a study on enhancing search relevance in app store rankings by integrating LLM-generated judgments. It identifies the challenge of limited expert-provided textual relevance...
A.R.I.S.: Automated Recycling Identification System for E-Waste Classification Using Deep Learning
A.R.I.S. (Automated Recycling Identification System) is a deep learning approach to e-waste classification aimed at improving material recovery from electronic waste. By...
Constructive Circuit Amplification: Improving Math Reasoning in LLMs via Targeted Sub-Network Updates
The article presents 'Constructive Circuit Amplification,' a method designed to improve mathematical reasoning in large language models (LLMs) by making targeted updates to specific sub-networks,...
AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding
The AMUSE framework introduces a novel benchmark for evaluating multi-speaker understanding in audio-visual contexts, addressing the limitations of current multimodal large language models (MLLMs)...
Beyond a Single Extractor: Re-thinking HTML-to-Text Extraction for LLM Pretraining
The paper explores the limitations of using a single extractor for HTML-to-text conversion in the context of training large language models (LLMs). It highlights that relying on a fixed extractor can...
depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning Researchers
The article introduces depyf, a tool designed to demystify the PyTorch compiler, which operates at the Python bytecode level. This tool allows machine learning researchers to decompile bytecode...
The Potential of CoT for Reasoning: A Closer Look at Trace Dynamics
The article explores the potential of chain-of-thought (CoT) prompting as a technique for eliciting reasoning-like responses from large language models (LLMs). It presents an in-depth analysis of CoT...
Closing the Gap Between Text and Speech Understanding in LLMs
The article presents an analysis of the performance gap between text-based and speech-adapted large language models (LLMs) in understanding language. It identifies two primary factors contributing to...
Learning to Evict from Key-Value Cache
The article presents a novel approach to Key-Value (KV) cache management in Large Language Models (LLMs) by framing eviction as a reinforcement learning (RL) problem. The authors introduce the KV...
Apple Workshop on Reasoning and Planning 2025
The article details Apple's Workshop on Reasoning and Planning 2025, which focused on advancing AI systems' reasoning capabilities. It highlights the importance of reasoning and planning in enabling...
Unifying Ranking and Generation in Query Auto-Completion via Retrieval-Augmented Generation and Multi-Objective Alignment
The article discusses a novel approach to Query Auto-Completion (QAC) that integrates Retrieval-Augmented Generation (RAG) with multi-objective Direct Preference Optimization (DPO). This unified...
Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents
The article presents Ferret-UI Lite, a compact GUI agent designed for on-device operation across various platforms, including mobile, web, and desktop. It highlights the challenges of developing...
Models That Prove Their Own Correctness
The paper introduces Self-Proving models, which prove the correctness of their outputs on specific inputs to a verification algorithm. By employing Interactive Proofs, these...