Learn how this popular Python library accelerates math at scale, especially when paired with tools like Cython and Numba.

Credit: Carlos Castilla / Getty Images

Python is convenient and flexible, yet notably slower than other languages for raw computational speed. The Python ecosystem has compensated with tools that make crunching numbers at scale in Python both fast and convenient.

NumPy is one of the most common Python tools developers and data scientists use for computing at scale. It provides libraries and techniques for working with arrays and matrices, all backed by code written in high-speed languages like C, C++, and Fortran. And because NumPy’s core operations run in that compiled code rather than in the Python interpreter, they aren’t constrained by the interpreter’s per-object overhead.

Using NumPy for array and matrix math in Python

Many mathematical operations, especially in machine learning or data science, involve working with matrices, or other multidimensional collections of numbers. The naive way to do that in Python is to store the numbers in a structure, typically a Python list, then loop over the structure and perform an operation on every element of it. That’s both slow and inefficient, since each element must be translated back and forth from a Python object to a machine-native number.

NumPy provides a specialized array type that is optimized to work with machine-native numerical types such as integers or floats. Arrays can have any number of dimensions, but each array uses a uniform data type, or dtype, to represent its underlying data.

Here’s a simple example:

    import numpy as np
    np.array([0, 1, 2, 3, 4, 5, 6])

This creates a one-dimensional NumPy array from the provided list. We didn’t specify a dtype for this array, so NumPy infers it from the supplied data: a 32- or 64-bit signed integer, depending on the platform.
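To make the contrast concrete, here is a minimal sketch that doubles the same seven numbers both ways and inspects the array NumPy builds. The variable names (values, doubled_list, arr, doubled_arr) are illustrative, not from the article, and the exact integer width printed will vary by platform.

```python
import numpy as np

values = [0, 1, 2, 3, 4, 5, 6]

# Naive approach: a Python-level loop that handles one Python object at a time.
doubled_list = [v * 2 for v in values]

# NumPy approach: one array with one uniform dtype, and no Python-level loop.
arr = np.array(values)
doubled_arr = arr * 2

# The dtype was inferred from the data; it is a signed integer type whose
# width depends on the platform.
print(arr.dtype)
print(arr.ndim, arr.shape)
print(doubled_arr)
```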
If we wanted to be explicit about the dtype, we could do this:

    np.array([0, 1, 2, 3, 4, 5, 6], dtype=np.uint32)

np.uint32 is, as the name implies, the dtype for an unsigned 32-bit integer.

It is possible to use generic Python objects as the dtype for a NumPy array, but if you do this, you’ll get no better performance with NumPy than you would with Python generally. NumPy works best with machine-native numerical types (ints, floats, and fixed-width complex numbers) rather than Python-native types (arbitrary-precision integers, the Decimal type).

How NumPy speeds array math in Python

A big part of NumPy’s speed comes from using machine-native datatypes instead of Python’s object types. The other big reason NumPy is fast is that it provides ways to work with arrays without having to address each element individually.

NumPy arrays have many of the behaviors of conventional Python objects, so it’s tempting to use common Python idioms for working with them. If we wanted to create a NumPy array with the numbers 0 through 999, we could in theory do this:

    x = np.array([_ for _ in range(1000)])

This works, but its performance is limited by the time it takes for Python to create a list, and for NumPy to convert that list into an array.

By contrast, we can do the same thing far more efficiently inside NumPy itself:

    x = np.arange(1000)

You can use many other NumPy built-ins to create new arrays without looping: creating arrays of zeroes (or any other initial value), or using an existing dataset, buffer, or other source.

Another key way NumPy speeds things up is by letting you work on array elements at scale without addressing them individually.

As noted above, NumPy arrays behave a lot like other Python objects, for the sake of convenience. For instance, they can be indexed like lists; arr[0] accesses the first element of a NumPy array. This lets you set or read individual elements in an array.
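The creation routes and element access described above might be sketched as follows. The variable names and the four-byte buffer are made up for illustration; np.zeros, np.full, and np.frombuffer are only three of the available constructors.

```python
import numpy as np

# Slower route: Python builds a 1,000-element list, then NumPy copies it.
from_list = np.array([n for n in range(1000)])

# Faster route: NumPy fills the array itself.
from_arange = np.arange(1000)

# Other loop-free constructors.
zeros = np.zeros(5)                # five zeroes (floating point by default)
sevens = np.full(5, 7)             # five copies of a chosen initial value
raw = bytes([1, 2, 3, 4])          # stand-in for an existing buffer
from_buffer = np.frombuffer(raw, dtype=np.uint8)

# Element access works as it does for a list.
first = from_arange[0]             # read one element
from_arange[0] = 42                # set one element
```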
However, if you want to modify all the elements of an array, you’re best off using NumPy’s vectorized operations and “broadcasting”: ways to execute operations across a whole array, or a slice, without looping in Python. Again, this is so all the performance-sensitive work can be done in NumPy itself.

Here’s an example:

    x1 = np.array([np.arange(0, 10), np.arange(10, 20)])

This creates a two-dimensional NumPy array, each row of which consists of a range of numbers. (We can create arrays of any number of dimensions by simply using nested lists in the constructor.)

    [[ 0  1  2  3  4  5  6  7  8  9]
     [10 11 12 13 14 15 16 17 18 19]]

If we wanted to transpose the axes of this array in pure Python, we’d need to write a loop of some kind. NumPy allows us to do this kind of operation with a single command:

    x2 = np.transpose(x1)

The output:

    [[ 0 10]
     [ 1 11]
     [ 2 12]
     [ 3 13]
     [ 4 14]
     [ 5 15]
     [ 6 16]
     [ 7 17]
     [ 8 18]
     [ 9 19]]

Operations like these are the key to using NumPy well. NumPy offers a broad catalog of built-in routines for manipulating array data. Built-ins for linear algebra, discrete Fourier transforms, and pseudorandom number generators save you the trouble of having to roll those things yourself, too. In most cases, you can accomplish what you need with one or more built-ins, without writing Python loops.

NumPy universal functions (ufuncs)

Another set of NumPy features that let you do advanced computation without Python loops is universal functions, or ufuncs for short. ufuncs take in one or more arrays, perform some operation on each element (or each pair of corresponding elements), and either send the results to another array or do the operation in-place.

An example:

    x1 = np.arange(1, 9, 3)
    x2 = np.arange(2, 18, 6)
    x3 = np.add(x1, x2)

Here, np.add takes each element of x1 and adds it to the corresponding element of x2, with the results stored in a newly created array, x3. This yields [ 3 12 21]. All the actual computation is done in NumPy itself.
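The two output styles mentioned above, a new array or an in-place update, can be sketched with the out argument that ufuncs accept. The names a, b, total, same, and out_buf are illustrative; a and b hold the same values as x1 and x2 in the example.

```python
import numpy as np

a = np.arange(1, 9, 3)     # [1 4 7]
b = np.arange(2, 18, 6)    # [2 8 14]

# Default behavior: the result goes into a newly created array.
total = np.add(a, b)       # [3 12 21]

# The + operator on arrays calls the same ufunc.
same = a + b

# Send the result to an existing array instead of allocating a new one.
out_buf = np.empty_like(a)
np.add(a, b, out=out_buf)

# In-place: write the result back over the first input.
np.add(a, b, out=a)        # a is now [3 12 21]
```

Reusing an output array this way can avoid repeated allocations when the same operation runs many times on arrays of the same shape.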
ufuncs also have attribute methods that let you apply them more flexibly, and reduce the need for manual loops or Python-side logic. For instance, if we wanted to take x1 and use np.add to total up the array, we could call np.add.accumulate(x1) instead of looping over each element. This returns the running totals (for x1 = [1 4 7], that is [ 1  5 12]), the last of which is the sum of the whole array.

Likewise, let’s say we wanted to perform a reduction: that is, apply np.add along one axis of a multi-dimensional array, with the result being a new array with one less dimension. We could loop and create a new array, but that would be slow. Or, we could use np.add.reduce to achieve the same thing with no loop:

    x1 = np.array([[0, 1, 2], [3, 4, 5]])
    # [[0 1 2]
    #  [3 4 5]]
    x2 = np.add.reduce(x1)
    # [3 5 7]

We can also perform conditional reductions, using a where argument:

    x2 = np.add.reduce(x1, where=np.greater(x1, 1))

This sums along the first axis as before, but includes only the elements of x1 that are greater than 1; elements that fail the test are skipped. Here the result is [3 4 7]. Again, this spares us from having to manually iterate over the array in Python. NumPy provides mechanisms like this for filtering and sorting data by some criterion, so we don’t have to write loops, or at the very least, the loops we do write are kept to a minimum.

NumPy and Cython: Using NumPy with C

The Cython library lets you write Python code and convert it to C for speed, using C types for variables. Those variables can include NumPy arrays, so any Cython code you write can work directly with NumPy arrays. Using Cython with NumPy confers some powerful features:

Accelerating manual loops: Sometimes you have no choice but to loop over a NumPy array. Writing the loop operation in a Cython module provides a way to perform the looping in C, rather than Python, and thus enables dramatic speedups. Note that this is only possible if the types of all the variables in question are either NumPy arrays or machine-native C types.
Using NumPy arrays with C libraries: A common use case for Cython is to write convenient Python wrappers for C libraries. Cython code can act as a bridge between an existing C library and NumPy arrays.

Cython allows two ways to work with NumPy arrays. One is via a typed memoryview, a Cython construct for fast and bounds-safe access to a NumPy array. Another is to obtain a raw pointer to the underlying data and work with it directly, but this comes at the cost of being potentially unsafe and requiring that you know the object’s memory layout ahead of time.

NumPy and Numba: JIT-accelerating Python code for NumPy

Another way to use Python in a performant way with NumPy arrays is to use Numba, a JIT compiler for Python. Numba compiles Python functions into machine-native code at run time, with specializations for things like NumPy. Loops in Python over NumPy arrays can be optimized automatically this way. But Numba’s optimizations are only automatic up to a point, and may not yield significant performance improvements for all programs.