Engineering posts about Fine-tuning
Curated summaries and key learnings for engineers working with fine-tuning.
SpecMD: A Comprehensive Study on Speculative Expert Prefetching
The article presents SpecMD, a standardized framework designed for benchmarking caching strategies in Mixture-of-Experts (MoE) models. It highlights the importance of an expert caching mechanism to...
Adaptive Thinking: Large Language Models Know When to Think in Latent Space
The article presents research on adaptive thinking in large language models (LLMs), particularly focusing on how these models can optimize their reasoning processes during inference. It introduces...
A Practical Guide to LLM Fine Tuning
This article serves as a practical guide for ML engineers and AI practitioners focused on fine-tuning large language models (LLMs) for specific tasks. It outlines the entire lifecycle of LLM...
Can Large Language Models Understand Context?
The article explores the ability of Large Language Models (LLMs) to understand context, a critical aspect of natural language processing. It introduces a benchmark specifically designed to evaluate...
MaxText Expands Post-Training Capabilities: Introducing SFT and RL on Single-Host TPUs
The article introduces new post-training capabilities in MaxText, specifically Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) optimized for single-host TPU configurations. It highlights...
Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts
The article discusses a novel approach to training data pruning aimed at improving the memorization of factual knowledge in large language models (LLMs). It formalizes the concept of fact...
LaCy: What Small Language Models Can and Should Learn is Not Just a Question of Loss
The article presents LaCy, a new pretraining methodology for Small Language Models (SLMs) that addresses the limitations of knowledge representation due to parameter size. It emphasizes the...
The Hidden Cost of Complex AI Platforms: Why Developer Experience Matters
The article explores the often-overlooked costs of complex AI platforms, with an emphasis on developer experience. It highlights how fragmented workflows and unclear...
Personalized Group Relative Policy Optimization for Heterogenous Preference Alignment
The article presents Personalized Group Relative Policy Optimization (P-GRPO), a framework designed to enhance the alignment of large language models (LLMs) with heterogeneous human preferences....
Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training
This paper investigates the scaling properties of downstream metrics in the training of Large Language Models (LLMs). It challenges the traditional reliance on proxy metrics, proposing a direct...
Thinking into the Future: Latent Lookahead Training for Transformers
The article presents a novel training strategy called latent lookahead for autoregressive language models, aimed at enhancing their predictive capabilities. Traditional next-token prediction limits...
Building a Knowledge Assistant over Code
This article explores the development of a knowledge assistant for code retrieval, specifically addressing the challenges of chunking source code for effective retrieval-augmented generation (RAG)....
Optimal Splitting of Language Models from Mixtures to Specialized Domains
The article presents a novel method for optimizing the training of language models by splitting them into specialized domains. It highlights the two-stage training paradigm, where models are first...
Goldilocks RL: Tuning Task Difficulty to Escape Sparse Rewards for Reasoning
The article presents 'Goldilocks RL', a novel approach to reinforcement learning that addresses the challenge of sparse rewards in training models for reasoning tasks. It introduces a teacher-driven...
AMES: Approximate Multi-modal Enterprise Search via Late Interaction Retrieval
The article presents AMES (Approximate Multimodal Enterprise Search), a unified architecture for late interaction retrieval that integrates text, image, and video modalities into a shared...
mAceReason-Math: A Dataset of High-Quality Multilingual Math Problems Ready For RLVR
The article presents mAceReason-Math, a dataset designed to improve reinforcement learning with verifiable rewards (RLVR) by providing high-quality multilingual math problems. The dataset addresses...
GenCtrl -- A Formal Controllability Toolkit for Generative Models
The article introduces GenCtrl, a formal controllability toolkit designed for generative models, addressing the critical need for fine-grained control in generative processes. It establishes a...
Scaling Search Relevance: Augmenting App Store Ranking with LLM-Generated Judgments
The article presents a study on enhancing search relevance in app store rankings by integrating LLM-generated judgments. It identifies the challenge of limited expert-provided textual relevance...
Using LLMs to amplify human labeling and improve Dash search relevance
The article outlines how Dropbox Dash uses a retrieval-augmented generation (RAG) approach to enhance search relevance by integrating large language models (LLMs) with human labeling. It explains...
Constructive Circuit Amplification: Improving Math Reasoning in LLMs via Targeted Sub-Network Updates
The article presents 'Constructive Circuit Amplification,' a method designed to improve mathematical reasoning in large language models (LLMs) by making targeted updates to specific sub-networks,...