PyPy is a drop-in replacement for the stock Python interpreter, and it runs many times faster on some Python programs.

Python has earned a reputation for being powerful, flexible, and easy to work with. These virtues have led to its use in a huge and growing variety of applications, workflows, and fields. But the design of the language, with its interpreted nature and runtime dynamism, means that Python has always been an order of magnitude slower than machine-native languages like C or C++.

Over the years, developers have come up with a variety of workarounds for Python’s speed limitations. For instance, you could write performance-intensive tasks in C and wrap the C code with Python; many machine learning libraries do exactly this. Or you could use Cython, a project that lets you annotate Python code with static type declarations so that it can be compiled to C.

But workarounds are never ideal. Wouldn’t it be great if we could just take an existing Python program as is and run it dramatically faster? That’s exactly what you can do with PyPy.

PyPy vs. CPython

PyPy is a drop-in replacement for the stock Python interpreter, CPython. Whereas CPython compiles Python to intermediate bytecode that is then interpreted by a virtual machine, PyPy uses just-in-time (JIT) compilation to translate Python code into machine-native assembly language.

Depending on the task being performed, the performance gains can be dramatic. On (geometric) average, PyPy runs about 4.7 times faster than CPython 3.7, with some tasks accelerated 50 times or more. While JIT optimizations of certain kinds are being added to new versions of the CPython interpreter, they aren’t of the same scope and power as what PyPy does right now. (This doesn’t rule out the chance they might be in the future, but for now, they aren’t.)

The best part is that little to no effort is required on the part of the developer to unlock the gains PyPy provides.
Simply swap out CPython for PyPy, and for the most part you’re done. There are a few exceptions, discussed below, but PyPy’s stated goal is to run existing, unmodified Python code and provide it with an automatic speed boost.

PyPy currently supports both Python 2 and Python 3, by way of different incarnations of the project. In other words, you need to download a different version of PyPy depending on the version of Python you will be running. The Python 2 branch of PyPy has been around much longer, but the Python 3 version has caught up recently. The Python 3 version currently supports Python up to 3.9, with Python 3.10 supported experimentally.

In addition to supporting all of the core Python language, PyPy works with the vast majority of tools in the Python ecosystem, such as pip for packaging or virtualenv for virtual environments. Most Python packages, even those with C modules, should work as-is. There are limitations, however, which we’ll discuss shortly.

How PyPy works

PyPy uses optimization techniques found in other just-in-time compilers for dynamic languages. It analyzes running Python programs to determine the type information of objects as they’re created and used, then uses that type information as a guide to speed things up. For instance, if a Python function works with only one or two different object types, PyPy generates machine code to handle those specific cases.

PyPy’s optimizations are handled automatically at runtime, so you generally don’t need to tweak its performance. An advanced user might experiment with PyPy’s command-line options to generate faster code for special cases, but this is rarely necessary.

PyPy also departs from the way CPython handles some internal functions, but tries to preserve compatible behaviors. For instance, PyPy handles garbage collection differently from CPython.
Not all objects are immediately collected once they go out of scope, so a Python program running under PyPy may show a larger memory footprint than when running under CPython. But you can still use Python’s high-level garbage collection controls exposed through the gc module, such as gc.enable(), gc.disable(), and gc.collect().

If you want information about PyPy’s JIT behavior at runtime, PyPy includes a module, pypyjit, that exposes many JIT hooks to your Python application. If you have a function or module that seems to be performing poorly with the JIT, pypyjit allows you to obtain detailed statistics about it.

Another PyPy-specific module, __pypy__, exposes other features specific to PyPy, which can be useful for writing applications that leverage those features. Because of Python’s runtime dynamism, it is possible to construct Python applications that use these features when PyPy is present and ignore them when it is not.

PyPy’s limitations

Magical as PyPy might seem, it isn’t magic. PyPy is not a completely universal replacement for the stock CPython runtime. Some of its limitations reduce or obviate its effectiveness for certain kinds of programs. Let’s consider the most important ones.

PyPy works best with pure Python apps

PyPy has always performed best with “pure” Python applications, that is, applications written in Python and nothing else. Python packages that interface with C libraries, such as NumPy, have historically not fared as well because of the way PyPy emulates CPython’s native binary interfaces.

PyPy’s developers have whittled away at this issue and made PyPy more compatible with the majority of Python packages that depend on C extensions. NumPy, for instance, works very well with PyPy now. But if you want maximum compatibility with C extensions, use CPython.

PyPy works best with longer-running programs

A side effect of how PyPy optimizes Python programs is that longer-running programs benefit most from its optimizations.
The longer the program runs, the more runtime type information PyPy can gather, and the more optimizations it can make. Short, run-once Python scripts gain little, because they finish before PyPy has gathered enough information to optimize them. The applications that do benefit typically have loops that run for long periods of time, or run continuously in the background, as web frameworks do.

PyPy doesn’t do ahead-of-time compilation

PyPy compiles Python code, but it isn’t a compiler for Python code. Because of the way PyPy performs its optimizations and the inherent dynamism of Python, there’s no way to emit the resulting JITted code as a standalone binary and re-use it. Each program has to be compiled for each run, as explained in the documentation.

If you want to compile Python into faster code that can run as a standalone application, use Cython, Numba, or the currently experimental Nuitka project.
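To make the points about drop-in use and longer-running programs concrete, here is a minimal sketch, assuming a Python 3 flavor of either interpreter. It can be run unchanged under CPython and PyPy: it reports which interpreter is in use, imports the PyPy-only pypyjit module only when it is present, and times the same pure-Python loop over several rounds. The helper names (running_on_pypy, sum_of_squares, time_rounds) and the loop size are illustrative assumptions, not part of PyPy or any benchmark suite.

```python
import gc
import platform
import time

# Optional PyPy-only module: importable under PyPy, absent under CPython.
try:
    import pypyjit  # noqa: F401  (only checked for presence in this sketch)
except ImportError:
    pypyjit = None


def running_on_pypy():
    """Return True when the current interpreter reports itself as PyPy."""
    return platform.python_implementation() == "PyPy"


def sum_of_squares(n):
    """A pure-Python hot loop (hypothetical workload chosen for illustration)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_rounds(func, arg, rounds=5):
    """Call func(arg) `rounds` times and return the wall-clock time of each call."""
    timings = []
    for _ in range(rounds):
        # Run a collection first so pending garbage is less likely to skew the round.
        gc.collect()
        start = time.perf_counter()
        func(arg)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    print("Interpreter:", platform.python_implementation())
    print("pypyjit hooks available:", pypyjit is not None)
    for round_number, seconds in enumerate(
        time_rounds(sum_of_squares, 1_000_000), start=1
    ):
        print(f"round {round_number}: {seconds:.4f} s")
```

Running the same file with each interpreter in turn (for example as python3 and pypy3, assuming they are installed under those names) gives a rough side-by-side comparison. The timings depend heavily on the machine and the workload, so treat this as a way to observe the behavior rather than as a benchmark.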