Simon Bisson
Contributor

Use DirectML to train PyTorch machine learning models on a PC

analysis
Mar 02, 2022

Microsoft’s new tool makes it possible to use your own GPU to work with popular machine learning platforms.


Machine learning is an increasingly important tool for developers, providing a way to build applications that can handle a wide range of prediction-based tasks. In the past you might have had to build a complex rules engine, using numeric techniques to deliver the required statistical models. Now you can work with an ML platform to build, train, and test models for your applications.

We call ML outputs predictions, but they can be anything. If you’re using computer vision, they can be identified objects. If you’re using a language model, they’re intents or translations. But whatever the output, it’s a statistically weighted response with a confidence level that you can use to validate the result.
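To make the idea of a confidence level concrete, here’s a minimal sketch assuming a classifier that emits one raw score (a logit) per label, as many do. A softmax turns those scores into probabilities, and the top probability serves as the confidence. The predict helper and its 0.6 threshold are illustrative choices, not part of any particular ML platform.

```python
import math


def softmax(scores):
    """Convert raw model scores (logits) into probabilities that sum to 1."""
    m = max(scores)  # subtract the largest score to avoid overflow in exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores, labels, threshold=0.6):
    """Return (label, confidence), with label None if confidence < threshold."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)  # index of the top class
    confidence = probs[best]
    if confidence < threshold:
        return None, confidence  # too uncertain to act on
    return labels[best], confidence


label, confidence = predict([2.0, 0.5, 0.1], ["cat", "dog", "bird"])
print(label, round(confidence, 3))  # cat 0.728
```

Returning None below the threshold lets calling code decide what to do with low-confidence results, for example asking a user to confirm.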

There are two parts to working with machine learning. If you have a prebuilt model, you can run it from a cloud platform such as Azure ML, using a REST API to work with its predictions, or you can export it in the widely supported ONNX (Open Neural Network Exchange) format and run it on a PC using tools like WinML. That’s the easy part; the hard part is training and testing a model. That process needs a lot of validated and labeled data, as well as a significant amount of compute, either CPU or GPGPU (general-purpose computing on GPUs).
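Calling a cloud-hosted model over REST usually comes down to an authenticated HTTP POST with a JSON body. The sketch below uses only Python’s standard library and assumes bearer-token authentication with an endpoint key. The URI, key, and {"data": ...} payload shape are hypothetical placeholders, because the exact input format depends on how your model’s scoring script is written.

```python
import json
import urllib.request

# Hypothetical placeholders: use your own endpoint's scoring URI and key.
SCORING_URI = "https://my-endpoint.example.com/score"
API_KEY = "replace-with-your-key"


def build_scoring_request(rows, uri=SCORING_URI, key=API_KEY):
    """Build (without sending) a JSON POST request asking a hosted model to score rows."""
    # The {"data": ...} shape is an assumption; it must match your scoring script.
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        uri,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {key}",
        },
        method="POST",
    )


request = build_scoring_request([[5.1, 3.5, 1.4, 0.2]])
# Sending it would be urllib.request.urlopen(request).read(), which returns the
# endpoint's response body (typically JSON-encoded predictions).
```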

The logical place to train a new model is on a cloud-hosted platform, such as Azure’s Machine Learning studio. This can get expensive, requiring large virtual machines to host your models and a lot of storage for your training and test data. If you’re just learning how to build models or are creating a simple prototype with a relatively small set of training data, you’re more likely to want to use a PC.

Build ML models on your PC

A modern developer workstation has more than enough power for basic ML workloads. The machine I’m typing this on fits the bill: It has an 11th-generation Intel processor, 32GB of RAM, and a discrete Nvidia graphics card that supports GPGPU, either via Nvidia’s drivers or through Microsoft’s DirectML APIs.

How do we set up a machine learning framework on a device like this? It can be complex, with incompatible drivers and setups that are targeted at Linux servers. Microsoft has been working to simplify this, using DirectML to provide a link between the popular PyTorch ML environment and Windows’ GPU APIs. You’re not limited to Windows; you can also use WSL (Windows Subsystem for Linux) with the appropriate graphics drivers.

Using DirectML simplifies working with PyTorch, as it’s part of Microsoft’s DirectX graphics APIs. If your graphics card supports DirectML, you can use its GPU to deliver the parallel processing tasks at the heart of training a machine learning model, keeping the load on your development PC’s CPU to a minimum. Microsoft has been working with Windows GPU vendors, including Nvidia and AMD, to support training one of the more common PyTorch model types: convolutional neural networks.

A second preview release of PyTorch-DirectML integration rolled out recently, adding support for more Python versions and for systems with multiple GPUs, letting you choose which GPU to use. Integration is delivered by a new virtual device called DML. This meshes the DirectML APIs with PyTorch’s primitives, mapping calls in PyTorch to the native DirectML tools.

With PyTorch-DirectML, when you run an operation on a PyTorch tensor that lives on the DML device, the call is passed to the DirectML kernel. This calls into the DirectML back end, which constructs the GPGPU operators, allocates GPU memory, and sets up a queue to manage execution before passing training data and the operators to the GPU. It’s an approach that supports both Windows and WSL.
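The dispatch steps described above can be illustrated with a toy, pure-Python model of a queued device. Everything here (ToyDevice, ToyOperator, the two kernels) is hypothetical and bears no relation to DirectML’s actual API; it only shows the general pattern of building an operator, allocating memory for its output, and queuing work for later execution.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, List

Buffer = List[float]


@dataclass
class ToyOperator:
    """Stand-in for a compiled GPU operator (illustration only)."""
    name: str
    fn: Callable[[Buffer, Buffer], Buffer]


# Elementwise kernels our toy "GPU" knows how to build.
KERNELS = {
    "add": lambda x, y: [i + j for i, j in zip(x, y)],
    "mul": lambda x, y: [i * j for i, j in zip(x, y)],
}


class ToyDevice:
    """Mimics the flow: build operator, allocate output, queue, execute."""

    def __init__(self):
        self.memory = {}      # buffer id -> contents (our pretend GPU memory)
        self.queue = deque()  # pending (operator, input a, input b, output) items
        self._next_id = 0

    def allocate(self, data):
        """Copy data into a new device buffer and return its id."""
        buf = self._next_id
        self._next_id += 1
        self.memory[buf] = list(data)
        return buf

    def dispatch(self, op_name, a, b):
        """Construct the operator, allocate its output, and queue it; nothing runs yet."""
        op = ToyOperator(op_name, KERNELS[op_name])
        out = self.allocate([0.0] * len(self.memory[a]))
        self.queue.append((op, a, b, out))
        return out

    def read(self, buf):
        """Drain the queue in order, then copy the buffer back to the host."""
        while self.queue:
            op, a, b, out = self.queue.popleft()
            self.memory[out] = op.fn(self.memory[a], self.memory[b])
        return list(self.memory[buf])


device = ToyDevice()
x = device.allocate([1, 2, 3])
y = device.allocate([4, 5, 6])
z = device.dispatch("add", x, y)  # queued, not yet computed
w = device.dispatch("mul", z, y)  # queued behind the add
pending = len(device.queue)       # 2 items waiting
print(device.read(w))             # [20, 35, 54]
```

In this toy, the queue drains only when results are read back; a real GPU works through its queue asynchronously. Either way, a Python call can return before the work it requested has finished, which matters when timing code.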

Getting started with PyTorch-DirectML

Trying it out is easy enough. The PyTorch-DirectML package is available either from GitHub as part of the DirectML project or from popular Python repositories like PyPI. You can use familiar tools such as pip to add it to your Python environment, and only a single change to your PyTorch code is needed to run it through the DML virtual device.

Where things get interesting is with its WSL support, which lets you build code targeted at cloud-hosted Linux systems on your desktop. You’ll need a Windows 11 system to use the DirectML integration with WSLg (Windows Subsystem for Linux GUI), which adds tools to access the Windows graphics platform from the WSL environment. With WSL2 and WSLg installed, you next need to set up a virtual Python environment to host PyTorch.

Microsoft’s documentation is based around Miniconda Python from the Anaconda team. It’s a stripped-down version of Anaconda that ships with the conda package manager, which is used by many Python numerical tools and by frameworks like PyTorch. Once it’s installed, use the conda create and activate commands to set up a Python environment.
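After running conda create and conda activate, it’s easy to end up installing packages into the wrong interpreter. A quick standard-library check like the one below, run from the same shell, shows which environment is active. This is a convenience sketch, not part of Microsoft’s setup instructions, and it relies on conda setting the CONDA_DEFAULT_ENV variable when an environment is activated.

```python
import os
import sys


def describe_environment():
    """Report which interpreter and conda environment this code is running in."""
    return {
        # `conda activate` sets CONDA_DEFAULT_ENV to the active environment's name.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Conda-managed environments keep package metadata in a conda-meta folder.
        "is_conda_prefix": os.path.isdir(os.path.join(sys.prefix, "conda-meta")),
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If conda_env shows the name you passed to conda create and the executable path sits inside that environment, pip will install packages where PyTorch can find them.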

With that in place, install a set of required libraries before using pip to install the pytorch-directml package. This contains the DML virtual device and the supported PyTorch 1.8 release. Once it’s installed in your Python virtual environment, you can start working with PyTorch tensors on the DML virtual device. The key to using DirectML is the to("dml") method, which moves a tensor onto your GPU. For example, to create a simple tensor ready for use: tensor1 = torch.tensor([1]).to("dml")
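Building on that one-line example, here’s a minimal sketch that assumes the pytorch-directml package is installed and uses the "dml" device when it’s available. The pick_device helper is hypothetical, not part of the package: it simply tries the move and falls back to the CPU when the running PyTorch build doesn’t recognize "dml", so the same script also runs on a stock PyTorch install.

```python
import torch


def pick_device():
    """Return "dml" if this PyTorch build has the DirectML device, else "cpu"."""
    try:
        # pytorch-directml adds the "dml" device; stock PyTorch rejects the name.
        torch.tensor([1]).to("dml")
        return "dml"
    except (RuntimeError, ValueError):
        return "cpu"


device = pick_device()

# The single change described above: send tensors to the chosen device.
tensor1 = torch.tensor([1]).to(device)
tensor2 = torch.tensor([2]).to(device)
total = tensor1 + tensor2  # runs on the GPU when device == "dml"

# Copy the result back to the CPU before reading it in Python.
result = total.to("cpu")
print(device, result.item())
```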

The future of the platform

Microsoft has a GitHub repository of samples ready for use with DirectML, including the popular ResNet-50 image classification algorithm. Using a well-known algorithm like this makes it easier to benchmark a developer PC as a platform for building and testing your own machine learning models. By using Miniconda as the foundation of a Python development environment, you get quick access to the tools you need to build and explore your algorithms, for example, working with Jupyter notebooks to share code with colleagues.
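For that kind of benchmarking, a small timing harness is enough to compare runs on the CPU and the DML device. This is a generic sketch, not part of Microsoft’s samples; the warm-up and run counts are arbitrary defaults, and the median is used because it’s less sensitive than the mean to the occasional slow run.

```python
import statistics
import time


def benchmark(fn, warmup=3, runs=10):
    """Call fn() repeatedly and return the median wall-clock seconds per call.

    Warm-up calls absorb one-off costs such as operator setup and memory
    allocation. When fn runs on a GPU, have it copy its output back to the
    CPU (for example with .to("cpu")) so queued work finishes inside the timing.
    """
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Stand-in workload; swap in one training or inference step of your own model.
seconds = benchmark(lambda: sum(i * i for i in range(100_000)))
print(f"median: {seconds * 1000:.2f} ms")
```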

Not every PyTorch operator is supported in the current preview release. There’s a list of the supported operators on GitHub, along with a road map that shows what will be supported in the next milestone release. The remaining 22 operators are marked only as possible future additions, so if you’re bringing existing PyTorch code to DirectML, you should check whether it depends on any of them.
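Checking for those dependencies by hand gets tedious in a large codebase. As a rough first pass, Python’s standard ast module can list every explicit torch.* call in a source file and flag any that appear on a list you supply. The operator names below are placeholders, not entries from Microsoft’s list; replace them with the names you need to avoid.

```python
import ast

# Placeholder names for illustration only: replace this set with the operators
# that Microsoft's GitHub list marks as unsupported.
UNSUPPORTED_OPS = {"torch.example_op", "torch.linalg.example_decomposition"}


def dotted_name(node):
    """Rebuild a name like 'torch.linalg.svd' from an attribute chain, else None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None  # e.g. a call on a subscript or on another call's result


def find_unsupported_calls(source, unsupported=UNSUPPORTED_OPS):
    """Return sorted (line number, name) pairs for calls to listed operators."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in unsupported:
                hits.append((node.lineno, name))
    return sorted(hits)


sample = """
import torch
x = torch.randn(8)
y = torch.example_op(x)
"""
print(find_unsupported_calls(sample))  # [(4, 'torch.example_op')]
```

It only spots calls written out as torch.name(...). Aliased imports, tensor methods, and operators invoked inside nn modules won’t be caught, so treat a clean result as a first pass rather than proof.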

The cloud is a powerful tool, but it’s important to remember that there’s plenty of power on our desktops. Tools such as PyTorch-DirectML take advantage of those often-ignored capabilities, allowing us to work wherever we want and giving access to those who can’t afford to use the cloud, for education as well as for product development. With access to common algorithms, it’s definitely a good way to start building and customizing machine learning models.


Author of InfoWorld's Enterprise Microsoft blog, Simon Bisson prefers to think of “career” as a verb rather than a noun, having worked in academic and telecoms research, as well as having been the CTO of a startup, running the technical side of UK Online (the first national ISP with content as well as connections), before moving into consultancy and technology strategy. He’s built plenty of large-scale web applications, designed architectures for multi-terabyte online image stores, implemented B2B information hubs, and come up with next generation mobile network architectures and knowledge management solutions. In between doing all that, he’s been a freelance journalist since the early days of the web and writes about everything from enterprise architecture down to gadgets.
