Engineering posts about fine-tuning

Curated summaries and key learnings for engineers working on fine-tuning.

SpecMD: A Comprehensive Study on Speculative Expert Prefetching

The article presents SpecMD, a standardized framework for benchmarking caching strategies in Mixture-of-Experts (MoE) models. It highlights the importance of an expert caching mechanism to...

Apple

Adaptive Thinking: Large Language Models Know When to Think in Latent Space

The article presents research on adaptive thinking in large language models (LLMs), focusing on how these models can optimize their reasoning during inference. It introduces...

Databricks

22m

A Practical Guide to LLM Fine Tuning

This article is a practical guide for ML engineers and AI practitioners who fine-tune large language models (LLMs) for specific tasks. It outlines the entire lifecycle of LLM...

Apple

Can Large Language Models Understand Context?

The article explores the ability of Large Language Models (LLMs) to understand context, a critical aspect of natural language processing. It introduces a benchmark specifically designed to evaluate...

Google

MaxText Expands Post-Training Capabilities: Introducing SFT and RL on Single-Host TPUs

The article introduces new post-training capabilities in MaxText, specifically Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) optimized for single-host TPU configurations. It highlights...

Apple

Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts

The article discusses a novel approach to training data pruning aimed at improving the memorization of factual knowledge in large language models (LLMs). It formalizes the concept of fact...

Apple

LaCy: What Small Language Models Can and Should Learn is Not Just a Question of Loss

The article presents LaCy, a new pretraining methodology for Small Language Models (SLMs) that addresses the limitations of knowledge representation due to parameter size. It emphasizes the...

DigitalOcean

16m

The Hidden Cost of Complex AI Platforms: Why Developer Experience Matters

The article explores the often-overlooked costs of complex AI platforms, with an emphasis on developer experience. It highlights how fragmented workflows and unclear...

Apple

Personalized Group Relative Policy Optimization for Heterogenous Preference Alignment

The article presents Personalized Group Relative Policy Optimization (P-GRPO), a framework designed to enhance the alignment of large language models (LLMs) with heterogeneous human preferences....

Apple

Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training

This paper investigates the scaling properties of downstream metrics in the training of Large Language Models (LLMs). It challenges the traditional reliance on proxy metrics, proposing a direct...

Apple

Thinking into the Future: Latent Lookahead Training for Transformers

The article presents a novel training strategy called latent lookahead for autoregressive language models, aimed at enhancing their predictive capabilities. Traditional next-token prediction limits...

Databricks

11m

Building a Knowledge Assistant over Code

This article explores the development of a knowledge assistant for code retrieval, specifically addressing the challenges of chunking source code for effective retrieval-augmented generation (RAG)....

Apple

Optimal Splitting of Language Models from Mixtures to Specialized Domains

The article presents a novel method for optimizing the training of language models by splitting them into specialized domains. It highlights the two-stage training paradigm, where models are first...

Apple

Goldilocks RL: Tuning Task Difficulty to Escape Sparse Rewards for Reasoning

The article presents 'Goldilocks RL', a novel approach to reinforcement learning that addresses the challenge of sparse rewards in training models for reasoning tasks. It introduces a teacher-driven...

Apple

AMES: Approximate Multi-modal Enterprise Search via Late Interaction Retrieval

The article presents AMES (Approximate Multi-modal Enterprise Search), a unified architecture for late interaction retrieval that integrates text, image, and video modalities into a shared...

Apple

mAceReason-Math: A Dataset of High-Quality Multilingual Math Problems Ready For RLVR

The article presents mAceReason-Math, a dataset designed to improve reinforcement learning with verifiable rewards (RLVR) by providing high-quality multilingual math problems. The dataset addresses...

Apple

GenCtrl -- A Formal Controllability Toolkit for Generative Models

The article introduces GenCtrl, a formal controllability toolkit designed for generative models, addressing the critical need for fine-grained control in generative processes. It establishes a...

Apple

Scaling Search Relevance: Augmenting App Store Ranking with LLM-Generated Judgments

The article presents a study on enhancing search relevance in app store rankings by integrating LLM-generated judgments. It identifies the challenge of limited expert-provided textual relevance...

Dropbox

11m

Using LLMs to amplify human labeling and improve Dash search relevance

The article outlines how Dropbox Dash uses a retrieval-augmented generation (RAG) approach to improve search relevance by combining large language models (LLMs) with human labeling. It explains...

Apple

Constructive Circuit Amplification: Improving Math Reasoning in LLMs via Targeted Sub-Network Updates

The article presents 'Constructive Circuit Amplification,' a method designed to improve mathematical reasoning in large language models (LLMs) by making targeted updates to specific sub-networks,...