Improvements to the next (and future) versions of Python are set to speed it up, slim it down, and pave the way toward even better things.

Because Python is a dynamic language, making it faster has been a challenge. But over the last couple of years, developers in the core Python team have focused on various ways to do it. At PyCon 2023, held in Salt Lake City, Utah, several talks highlighted Python’s future as a faster and more efficient language. Python 3.12 will showcase many of those improvements. Some are new in that version; others are already in Python but have been further refined.

Mark Shannon, a longtime core Python contributor now at Microsoft, summarized many of the initiatives to speed up and streamline Python. Most of the work he described in his presentation centered on reducing Python’s memory use, making the interpreter faster, and optimizing the compiler to yield more efficient code.

Other projects, still under wraps but already showing promise, offer ways to expand Python’s concurrency model. These would let Python make better use of multiple cores with fewer of the tradeoffs imposed by threads, async, or multiprocessing.

The per-interpreter GIL and subinterpreters

What keeps Python from being truly fast? One of the most common answers is “lack of a better way to execute code across multiple cores.” Python does have multithreading, but in CPython only one thread at a time can execute Python bytecode, so CPU-bound threads take turns rather than running in parallel. And Python’s support for multiprocessing is top-heavy: you have to spin up a separate copy of the Python runtime for each core and distribute your work between them.

One long-dreamed way to solve this problem is to remove Python’s GIL, or Global Interpreter Lock. The GIL synchronizes operations between threads to ensure objects are accessed by only one thread at a time. In theory, removing the GIL would allow true multithreading.
In practice (and it’s been tried many times), removing it slows down single-threaded code, so it’s not a net win.

Core Python developer Eric Snow, in his talk, unveiled a possible future solution for all this: subinterpreters, and a per-interpreter GIL. In short: the GIL wouldn’t be removed, just sidestepped.

Subinterpreters are a mechanism by which the Python runtime can run multiple interpreters together inside a single process, as opposed to isolating each interpreter in its own process (the current multiprocessing mechanism). Each subinterpreter gets its own GIL, but because they live in the same process, subinterpreters can share state more readily than separate processes can.

While subinterpreters have been available in the Python runtime for some time now, they haven’t had an interface for the end user. Also, the messy state of Python’s internals hasn’t allowed subinterpreters to be used effectively.

With Python 3.12, Snow and his cohort cleaned up Python’s internals enough to make subinterpreters useful, and they are working to add a minimal module called interpreters to the Python standard library. It gives programmers a rudimentary way to launch subinterpreters and execute code on them.

Snow’s own initial experiments with subinterpreters significantly outperformed threading and multiprocessing. One example, a simple web service that performed some CPU-bound work, maxed out at 100 requests per second with threads and 600 with multiprocessing. With subinterpreters, it reached 11,500 requests per second, with little to no drop-off when scaled up from one client.

The interpreters module has very limited functionality right now, and it lacks robust mechanisms for sharing state between subinterpreters. But Snow believes a good deal more functionality will appear by Python 3.13, and in the interim developers are encouraged to experiment.
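The GIL’s effect on CPU-bound threads is easy to observe for yourself. The following minimal sketch is not taken from Snow’s talk or benchmark; `count_primes` and `compare` are hypothetical helpers written for illustration. It runs the same CPU-bound jobs sequentially and on a `concurrent.futures.ThreadPoolExecutor` and reports wall-clock times. On a standard CPython build with the GIL, you should expect the threaded run to take roughly as long as the sequential one, or longer, because the threads take turns executing bytecode.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit: int) -> int:
    """Deliberately CPU-bound work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def compare(jobs: list[int], workers: int = 4) -> dict[str, float]:
    """Run the same jobs sequentially and on a thread pool; return wall-clock seconds."""
    start = time.perf_counter()
    sequential = [count_primes(j) for j in jobs]
    seq_time = time.perf_counter() - start

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        threaded = list(pool.map(count_primes, jobs))
    thr_time = time.perf_counter() - start

    # Sanity check: both approaches compute the same answers; only timing differs.
    assert sequential == threaded
    return {"sequential": seq_time, "threaded": thr_time}


if __name__ == "__main__":
    timings = compare([50_000] * 4)
    for name, secs in timings.items():
        print(f"{name:>10}: {secs:.2f} s")
```

Exact timings depend on the machine and Python version; the comparison matters, not the absolute numbers. Swapping in `concurrent.futures.ProcessPoolExecutor` gives the multiprocessing variant, at the cost of starting a separate Python process per worker, which is the top-heavy tradeoff described above.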

A faster Python interpreter

Another major set of performance improvements Shannon mentioned, Python’s new adaptive specializing interpreter, was discussed in detail in a separate session by core Python developer Brandt Bucher.

Python 3.11 introduced new bytecodes to the interpreter, called adaptive instructions. These instructions can be replaced automatically at runtime with versions specialized for a given Python type, a process called quickening. This saves the interpreter the step of having to look up what types the objects are, speeding up the whole process enormously. For instance, if a given addition operation regularly takes in two integers, that instruction can be replaced with one that assumes the operands are both integers.

Not all code specializes well, though. For instance, arithmetic between ints and floats is allowed in Python, but operations that mix the two types, such as int + float or float + int, don’t specialize well. Bucher provides a tool called specialist, available on PyPI, to determine whether code will specialize well or badly, and to suggest where it can be improved.

Python 3.12 has more adaptive specialization opcodes, such as accessors for dynamic attributes, which are slow operations. Version 3.12 also simplifies the overall process of specializing, with fewer steps involved.

The big Python object slim-down

Python objects have historically used a lot of memory. A Python 3 object header, even without the data for the object, occupied 208 bytes.

Over the last several versions of Python, though, various efforts took place to streamline the way Python objects were designed, finding ways to share memory or represent things more compactly. Shannon outlined how, as of Python 3.12, the object header is now a mere 96 bytes, slightly less than half of what it was before.

These changes don’t just allow more Python objects to be kept in memory; they also improve cache locality for Python objects.
While that alone may not speed things up as significantly as other efforts do, it’s still a boon.

Future-proofing Python’s internals

The default Python implementation, CPython, has three decades of development behind it. That also means three decades of cruft, legacy APIs, and design decisions that can be hard to transcend, all of which make it hard to improve Python in key ways.

Core Python developer Victor Stinner, in a presentation about how Python features are deprecated over time, touched on some of the ways Python’s internals are being cleaned up and future-proofed.

One key issue is the proliferation of C APIs in CPython. As of Python 3.8, there are a few different sets of APIs, each with different maintenance requirements. Over the last five years, Stinner worked to make many public APIs private, so programmers don’t need to deal as directly with sensitive CPython internals. The long-term goal is to make components that use the C APIs, like Python extension modules, less dependent on things that might change with each version.

A third-party project named HPy aims to ease the maintenance burden on the developer. HPy is a substitute C API for Python: stabler across versions, yielding faster code at runtime, and abstracted from CPython’s often messy internals. The downside is that it’s an opt-in project, not a requirement, but various key projects like NumPy are experimenting with using it, and some (like the HPy port of ultrajson) are enjoying big performance gains as a result.

The biggest win from cleaning up the C API is that it opens the door to many more kinds of improvements that previously weren’t possible. Like all the other improvements described here, they’re about paving the way toward future Python versions that run faster and more efficiently than ever.
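To make the quickening idea from Bucher’s session concrete, here is a deliberately simplified pure-Python model of a single adaptive `+` instruction. It is a sketch under stated assumptions, not how CPython implements specialization: the `AdaptiveAdd` class, its `threshold` warm-up counter, and its `"generic"`/`"int"` states are hypothetical. The model starts in a generic form, switches to an int-specialized form after a run of int + int operands, and deoptimizes when its type guard sees something else. That last step is also why code that keeps mixing ints and floats never settles into a specialized form.

```python
from typing import Any


class AdaptiveAdd:
    """Toy model of an adaptive '+' instruction. Not CPython's real implementation."""

    def __init__(self, threshold: int = 8) -> None:
        self.threshold = threshold  # hypothetical warm-up count before quickening
        self.int_hits = 0           # consecutive int + int executions observed
        self.state = "generic"      # "generic" or "int" (specialized)
        self.deopts = 0             # number of times the int guard failed

    def execute(self, a: Any, b: Any) -> Any:
        if self.state == "int":
            # Specialized form: a cheap type guard. In CPython the specialized
            # instruction then skips generic dispatch; this model still does a + b.
            if type(a) is int and type(b) is int:
                return a + b
            # Guard failed: deoptimize back to the generic form.
            self.state = "generic"
            self.int_hits = 0
            self.deopts += 1
        return self._generic(a, b)

    def _generic(self, a: Any, b: Any) -> Any:
        # Generic form: watch operand types and quicken once int + int is stable.
        if type(a) is int and type(b) is int:
            self.int_hits += 1
            if self.int_hits >= self.threshold:
                self.state = "int"
        else:
            self.int_hits = 0  # mixed or non-int operands reset the warm-up
        return a + b


if __name__ == "__main__":
    op = AdaptiveAdd(threshold=3)
    for i in range(3):
        op.execute(i, 1)
    print(op.state)  # int: three int + int calls in a row quickened the op

    op.execute(1, 2.5)  # int + float fails the guard
    print(op.state, op.deopts)  # generic 1
```

To see the real mechanism, CPython 3.11 and later accept `adaptive=True` in `dis.dis()`, which displays the specialized instructions a function has been quickened into after it has run a few times.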