Microsoft’s open-source, hardware-aware optimization tool for ONNX models is an essential part of its AI application development tool chain.

Microsoft’s AI push goes beyond the cloud, as the company is clearly getting ready for desktop hardware with built-in AI accelerators. You only have to look at Microsoft’s collaboration with Qualcomm, which produced the SQ series of Arm processors, all of which come with AI accelerators that deliver new computer vision features on Windows.

AI accelerators aren’t new. Essentially they’re an extension of the familiar GPU, only now they’re designed to accelerate neural networks. That explains the name Microsoft has adopted for them: NPUs, or neural processing units.

NPUs fill an important need. End users want to be able to run AI workloads locally, without relying on cloud compute, keeping their data inside their own hardware, often for security and regulatory reasons. While NPU-enabled hardware is still rare, there are signs from major silicon vendors that these accelerators will be a key feature of upcoming processor generations.

Supporting AI applications across hardware architectures

While technologies like ONNX (Open Neural Network Exchange) help make trained models portable, with ONNX runtimes for Windows and ONNX support in most Windows development platforms, including .NET, there’s still a significant roadblock to wider support for local AI applications: different tool chains for different hardware implementations. If you want to write machine learning applications that run inference on the SQ-series Arm NPUs, you need to sign up for Qualcomm’s developer program to get access to the SDKs and libraries you need. They’re not part of the standard .NET distribution or the Windows C++ SDK, nor are they available on GitHub.

That makes it hard to write general-purpose AI applications. It also limits features like Microsoft’s real-time camera image processing to Windows on Arm devices with an NPU, even if you have an Intel ML accelerator card or a high-end Nvidia GPU. Code needs to be hardware-specific, which makes it hard to distribute through mechanisms like the Microsoft Store, or even via enterprise application management tooling like Microsoft Intune.

Optimizing ONNX models with Olive

Build 2023 saw Microsoft start to cross the hardware divide, detailing what it describes as a “hybrid loop” based on both ONNX and a new Python tool called Olive, which is intended to give you the same level of access to AI tooling as Microsoft’s own Windows team. Using Olive, you can compress, optimize, and compile models to run on local devices (aka the edge) or in the cloud, allowing on-prem operation when necessary and bursting to Azure when data governance considerations and bandwidth allow.

So, what exactly is Olive? It’s a way of simplifying the packaging process to optimize inferencing for specific hardware, allowing you to build code that can switch inferencing engines as needed. While you still build different inferencing packages for different hardware combinations, your code can load the appropriate package at run time. Or, in the case of Windows on Arm, your code can be compiled with a Qualcomm NPU package that’s built at the same time as your x86 equivalents.

Like much of Microsoft’s recent developer tooling, Olive is open source and available on GitHub. Once Olive is installed in your development environment, you can use it to automate the process of tuning and optimizing models for target hardware.
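As a rough sketch of what that looks like in practice, the Python snippet below installs Olive, writes a minimal configuration, and kicks off an optimization run. The package name, pass names (OnnxConversion, OnnxDynamicQuantization), and configuration keys follow the patterns in Olive’s published samples, but treat them as assumptions to check against the current documentation; the model path and shapes are placeholders.

# Install Olive and ONNX Runtime first (shell, not Python):
#   pip install olive-ai onnxruntime

# A minimal sketch of driving Olive from Python. The configuration keys and
# pass names follow Olive's samples; verify them against the current docs.
import json

config = {
    "input_model": {
        "type": "PyTorchModel",
        "config": {
            "model_path": "models/my_model.pt",   # placeholder path
            "io_config": {
                "input_names": ["input"],
                "output_names": ["output"],
                "input_shapes": [[1, 3, 224, 224]],  # placeholder shape
            },
        },
    },
    "passes": {
        "conversion": {"type": "OnnxConversion"},
        "quantization": {"type": "OnnxDynamicQuantization"},
    },
    "engine": {"output_dir": "olive_output"},
}

with open("olive_config.json", "w") as f:
    json.dump(config, f, indent=2)

# Run the workflow from Python; Olive also exposes a CLI along the lines of
#   python -m olive.workflows.run --config olive_config.json
from olive.workflows import run as olive_run

olive_run("olive_config.json")

The output directory then holds the converted, quantized ONNX model, ready to be loaded by an ONNX runtime or packaged into your application.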
Olive provides a range of tuning options targeting different model types. If you’re using a transformer, for example, Olive can apply appropriate optimizations, as well as help balance the constraints on your model to manage both latency and accuracy.

Optimization in Olive is a multi-pass process, starting with either a PyTorch model or an ONNX export from any other training platform. You define your requirements for the model and for each pass, with each pass performing a specific optimization. You can run passes (optimizations) using Azure VMs, your local development hardware, or a container that can be run anywhere you have sufficient compute resources. Olive runs a search across the possible tunings, looking for the best implementation of your model before packaging it for testing in your application.

Making Olive part of your AI development process

Because much of Olive’s operation is automated, it should be relatively easy to weave it into existing tool chains and build processes. Olive is triggered by a simple CLI, working against parameters set by a configuration file, so it could be included in your CI/CD workflow either as a GitHub Action or as part of an Azure Pipeline. As the output is prepackaged models and runtimes, as well as sample code, you could use Olive to generate build artifacts that could then be included in a deployment package, in a container for distributed applications, or in an installer for desktop apps.

Getting started with Olive is simple enough. A Python package, Olive is installed using pip, with some dependencies for specific target environments. You need to write an Olive JSON configuration file before running an optimization. This isn’t a task for beginners, although there are sample configurations in the Olive documentation to help you get started. Start by choosing the model type and its inputs and outputs, then define your desired performance and accuracy. Finally, your configuration determines how Olive will optimize your model, for example by converting a PyTorch model to ONNX and applying dynamic quantization.

The results can be impressive, with the team demonstrating significant reductions in both latency and model size. That makes Olive a useful tool for local inferencing, as it ensures you can make the most of restricted environments with limited compute capabilities and limited storage, for example when deploying safety-critical computer vision applications on edge hardware.

Preparing for the next generation of AI silicon

There’s a significant level of future-proofing in Olive. The tool is built around an optimization plugin model that allows silicon vendors to define their own sets of optimizations and deliver them to Olive users. Both Intel and AMD have already delivered tooling that works with their own hardware and software, which should make it easier to improve model performance while reducing the compute needed to perform the necessary optimizations. This approach will allow Olive to quickly pick up support for new AI hardware, both integrated chipsets and external accelerators.

Olive is coupled with a new Windows ONNX runtime that allows you to switch between local inferencing and a cloud endpoint, based on logic in your code. Sensitive operations can be forced to run locally, while less restrictive operations can run wherever is most economical.
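The details of that hybrid runtime are still taking shape, but the device-side half of the pattern is familiar ONNX Runtime code. The sketch below shows how an application might pick a local execution provider when an accelerator is present and otherwise fall back to a hosted endpoint. The provider names are real ONNX Runtime execution providers, though which are available depends on the onnxruntime build you install; the cloud fallback is purely illustrative and not a specific Microsoft API.

# A minimal sketch of switching between local and cloud inferencing with
# ONNX Runtime, assuming an Olive-optimized model on disk.
import onnxruntime as ort

ACCELERATED_PROVIDERS = [
    "QNNExecutionProvider",    # Qualcomm NPUs on Windows on Arm
    "DmlExecutionProvider",    # DirectML (GPUs and other Windows accelerators)
    "CUDAExecutionProvider",   # Nvidia GPUs
]

def local_session(model_path):
    """Return an InferenceSession on a local accelerator, or None if none is present."""
    available = ort.get_available_providers()
    providers = [p for p in ACCELERATED_PROVIDERS if p in available]
    if not providers:
        return None
    return ort.InferenceSession(model_path, providers=providers + ["CPUExecutionProvider"])

def infer(model_path, inputs, sensitive):
    session = local_session(model_path)
    if session is not None:
        return session.run(None, inputs)
    if sensitive:
        # Data governance: this workload must not leave the device.
        raise RuntimeError("No local accelerator available for a local-only workload")
    # Hypothetical cloud fallback: POST the inputs to an Azure-hosted endpoint
    # serving the same ONNX model; the endpoint and its API are not shown here.
    raise NotImplementedError("Cloud endpoint call goes here")

# Example call (assumes a 1x3x224x224 float32 image tensor named image_tensor):
# outputs = infer("olive_output/model.onnx", {"input": image_tensor}, sensitive=True)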
One more useful feature in Olive is the ability to connect it directly to an Azure Machine Learning account, so you can go from your own custom models to ONNX packages in a single workflow (see the configuration sketch at the end of this article). If you’re planning on using hybrid or cloud-only inferencing, Olive will optimize your models for running in Azure.

Optimizing ONNX-format models for specific hardware has many benefits, and having a tool like Olive that supports multiple target environments should help deliver applications with the performance users expect and need on the hardware they use. But that’s only part of the story. For developers charged with building optimized machine learning applications for multiple hardware platforms, Olive provides a way to get over the first few hurdles.
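As promised above, here is a minimal sketch of what that Azure Machine Learning connection looks like in an Olive configuration. The azureml_client block and the AzureML system type follow the patterns in Olive’s published samples, but treat the exact key names as assumptions to verify against the current documentation; the workspace details and compute name are placeholders.

# A hedged sketch of pointing Olive at an Azure Machine Learning workspace.
# Key names follow Olive's samples ("azureml_client", an "AzureML" system);
# check them against the current Olive docs before relying on them.
azureml_additions = {
    "azureml_client": {
        "subscription_id": "<subscription-id>",   # placeholder
        "resource_group": "<resource-group>",     # placeholder
        "workspace_name": "<workspace-name>",     # placeholder
    },
    "systems": {
        "aml_system": {
            "type": "AzureML",
            # Name of an existing Azure Machine Learning compute cluster
            "config": {"aml_compute": "cpu-cluster"},
        }
    },
}

# Merged into the olive_config.json built earlier, these sections let Olive
# pull models from the workspace and run its optimization passes on Azure
# Machine Learning compute instead of local hardware.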