MediaTek NPU and LiteRT: Powering the next generation of on-device AI
Summary
The article discusses the advancements in on-device AI powered by MediaTek's Neural Processing Units (NPUs) and the introduction of the LiteRT NeuroPilot Accelerator. It highlights the challenges developers face in deploying AI models on NPUs due to the diversity of SoC variants and the lack of tailored infrastructure. The LiteRT NeuroPilot Accelerator aims to simplify this process by providing a unified API, supporting both Ahead-of-Time (AOT) and on-device compilation workflows. Key features include rich generative AI capabilities, efficient cross-platform development, and seamless integration with existing ML pipelines, enabling high-performance applications across various devices.
Key Learnings
- The LiteRT NeuroPilot Accelerator streamlines the deployment of AI models on MediaTek NPUs, addressing the complexities of hardware fragmentation.
- Developers can choose between AOT and on-device compilation strategies, optimizing for either initialization speed or flexibility in model distribution.
- The integration of a new C++ API enhances the efficiency of building ML pipelines, particularly for real-time applications involving camera and video processing.
- The collaboration with MediaTek enables the use of state-of-the-art generative AI models like the Gemma family, significantly improving on-device capabilities.
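The choice between AOT and on-device compilation can be sketched as a simple selection rule. This is only an illustration of the trade-off described above, not the LiteRT or NeuroPilot API: `Strategy`, `ModelPackage`, `choose_strategy`, and the SoC names are all hypothetical. It assumes AOT is the option that favors initialization speed (a precompiled artifact exists for a known SoC) and on-device compilation is the option that favors flexible distribution (one model file, compiled on whatever device loads it).

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    AOT = "aot"              # compiled ahead of time for a known SoC
    ON_DEVICE = "on_device"  # compiled on the device when the model is loaded


@dataclass(frozen=True)
class ModelPackage:
    # Hypothetical description of what ships with the app (not a LiteRT type).
    name: str
    precompiled_socs: frozenset  # SoC names that have a bundled AOT artifact


def choose_strategy(package: ModelPackage, device_soc: str) -> Strategy:
    """Use the AOT artifact when one matches the device's SoC;
    otherwise fall back to compiling on the device."""
    if device_soc in package.precompiled_socs:
        return Strategy.AOT
    return Strategy.ON_DEVICE


# Hypothetical package with AOT artifacts for two made-up SoC names.
package = ModelPackage("example_model", frozenset({"soc_a", "soc_b"}))
strategy_known = choose_strategy(package, "soc_a")
strategy_unknown = choose_strategy(package, "soc_z")
```

Under these assumptions, a device with a matching artifact skips compilation at startup, while any other device still runs the model at the cost of a compile step on first load.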
Who Should Read This
Senior Embedded Systems Engineers implementing on-device AI solutions for diverse hardware platforms
Test Your Knowledge
What are the trade-offs between using AOT and on-device compilation for deploying AI models on NPUs?
How does the LiteRT NeuroPilot Accelerator improve the developer experience compared to previous solutions?
What specific optimizations are required for running generative AI models efficiently on MediaTek NPUs?
In what scenarios might a developer prefer to use the new C++ API over the previous C API for building ML applications?
What challenges do developers face when managing the diversity of SoC variants in the context of on-device AI deployment?