Compilers, Runtimes, and Users – Oh my!

I’ve been meaning to respond to this since last week, but was totally caught up with SciPy and a product launch. Better late than never.

This is a response to Alex Gaynor’s response to my previous post, “Does the Compiler Know Best?”

I plan to respond to some of the more specific points from Alex in the comment stream of his blog post itself. However, I am writing a new post here to reiterate the broader picture of the forest which I think is being missed for the trees. Specifically: I wrote the blog post in response to the disturbing rise in comments of the form “Why shouldn’t PyPy be the official implementation of Python?” I think part of Alex’s (and other PyPy devs’) confusion or consternation at the post is due to my failure to more clearly identify the intended audience. The post was meant as a response to those voices in the peanut gallery of the blogo/twitter-sphere, and was not meant as a harangue against the PyPy community.

However, I do stand firmly by my core point: a huge part of Python’s success is due to how well its runtime plays with existing codebases and libraries, and it would be utterly foolish to abandon that trump card to join in the “who has the fastest VM” race.

This is wrong. Complete and utterly. PyPy is in fact not a change to Python at all, PyPy faithfully implements the Python language as described by the Python language reference, and as verified by the test suite.

There is certainly no argument that PyPy is a faithful implementation of the Python official language standard. It passes the test suites! However, do the test suites reflect the use cases out in the wild? Or do those not matter? On the basis of the many years of consulting I have done in applying Python to solving real problems at companies with dozens, hundreds, or even thousands of Python programmers, my perspective is that CPython-as-a-runtime is almost as important as Python-as-a-language. They have co-evolved and symbiotically contributed to each other’s success.

It is not my place, nor is it my intent, to tell people what they can and cannot implement. My one goal is to voice a dissenting opinion when I feel the ecosystem is presented with proposals which I think are either short-sighted or damaging. My previous post was in response to the clamor from those who have only been exposed to Python-the-language, and who only have visibility to those surface layer use cases.

Second, he writes, “What is the core precept of PyPy? It’s that “the compiler knows best”.” This too, is wrong. First, PyPy’s central thesis is, “any task repeatedly performed manually will be done incorrectly”, this is why we have things like automatic insertion of the garbage collector, in preference to CPython’s “reference counting everywhere”, and automatically generating the just in time compiler from the interpreter…

Interestingly enough, CPython’s reference-counting based garbage collection scheme is sometimes cited as one of the things that makes it integrate more nicely with external code. (It might not exactly be easy to track all the INCREFs and DECREFs, but the end result is more stable and easier to deal with.) And there is no problem with auto-generating the JIT compiler, except that (as far as I know) there is not a well-defined API into it, so that external code can interoperate with the compiler and the code it’s running.
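To make the reference-counting point concrete, here is a minimal sketch (my own illustration, not something taken from Alex's post). The `Resource` class and `use_resource` function are hypothetical names. The sketch relies on a CPython implementation detail: an object is finalized as soon as its last reference goes away, which is the kind of predictable lifetime that external code tends to lean on. Other implementations may defer finalization until a later collection.

```python
class Resource:
    """Hypothetical stand-in for something wrapping an external handle."""

    released = []  # records the order in which instances were finalized

    def __init__(self, name):
        self.name = name

    def __del__(self):
        # In CPython this runs as soon as the reference count reaches zero.
        Resource.released.append(self.name)


def use_resource():
    r = Resource("buffer")
    return r.name  # the only reference to the Resource dies when we return


used = use_resource()

# On CPython the finalizer has already run at this point; this is an
# implementation detail of reference counting, not a language guarantee.
print(used, Resource.released)
```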

It would appear that there is a one-way funnel from a user’s Python source and the RPython interpreter, through the PyPy machinery, to a JITting compiler at the other end. This is fine if the center and the bulk of all the action is in the Python source. However, for a huge number of Python users, this is simply not the case. For those users, having an opaque “Voila! Here’s some fast machine code!” compiler pipeline is not nearly as useful as a pipeline whose individual components they can control.

And that is the primary difference between a compiler and an interpreter. The interpreter has well-defined state that can be inspected and modified as it processes a program. A compiler has a single-minded goal of producing optimized code for a target language. Of course, the PyPy project has a compiler and an interpreter, but the generated runtime is not nearly as easy to integrate and embed as CPython. I will say it again: CPython-the-runtime is almost as important as Python-the-language in contributing to the success of the Python ecosystem.
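As a rough illustration of what "state that can be inspected" means in practice, the sketch below peeks at a running function's frame from the outside. The function names `inspect_caller` and `compute` are hypothetical, and `sys._getframe` is documented as an implementation detail rather than a guaranteed part of the language, so treat this as a sketch of the idea and not a portable API.

```python
import sys


def inspect_caller():
    """Return the name and local variables of whoever called us."""
    frame = sys._getframe(1)  # 0 is this function, 1 is the caller
    # Copy the locals so the snapshot outlives the frame.
    return frame.f_code.co_name, dict(frame.f_locals)


def compute():
    x = 21
    y = x * 2
    # The interpreter's state for this call is visible while it is running.
    return inspect_caller()


name, local_vars = compute()
print(name, local_vars)
```

Debuggers, profilers, and embedding hosts build on this same kind of access to live interpreter state.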

Second, the PyPy developers would never argue that the compiler knows best, … having an intelligent compiler does not prohibit giving the user more control, in fact it’s a necessity! There are no pure-python hints that you can give to CPython to improve performance, but these can easily be added with PyPy’s JIT.

Quick quiz: How long has Python had static typing?
Answer: Since 1995, when Jim Hugunin & others wrote Numeric.

Numeric/Numarray/Numpy have been the longest-lived and most popular static typing system for Python, even though they are generally only used by people who wanted to statically type millions or billions of memory locations all at once. CPython made it easy to extend the interpreter so that variables and objects in the runtime were like icebergs, with a huge amount of sub-surface mass. The perspective of CPython-the-runtime has been one of “extensible & embeddable”.
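A small sketch of the "iceberg" idea follows. To keep it free of third-party dependencies, it uses the standard library's `array` module as a stand-in for NumPy (that substitution is mine, not the post's): one Python object sits on top of a typed, contiguous block of memory, and the element type is fixed for every slot at once. The variable names are hypothetical.

```python
from array import array

# One Python object, backed by a contiguous buffer of C doubles.
samples = array("d", [1.0, 2.0, 3.0])

# Values are coerced to the declared element type on assignment.
samples[1] = 5
coerced_type = type(samples[1])

# Values that do not fit the declared type are rejected outright.
try:
    samples[0] = "not a number"
    rejected = False
except TypeError:
    rejected = True

# The sub-surface mass: external code can see the raw typed buffer.
view = memoryview(samples)
layout = (view.format, view.itemsize, view.nbytes)

print(coerced_type, rejected, layout)
```

NumPy arrays follow the same pattern on a much larger scale, with a `dtype` describing the element type of the underlying buffer.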

Does it matter that these extensions were “impure”? Did that hurt or help the Python ecosystem?

At the end of the day, it comes down to what users want. In the last 5 years, since Python has successfully regained mindshare in the web development space that was lost to RoR, there have been a large number of relative newcomers to the ecosystem whose needs are much more focused on latency and I/O throughput, and for whom extension modules are an afterthought. For these users, there is a single-minded focus on raw VM performance; the closest they will ever get to an extension module is maybe a DB driver or some XML/text processing library or an async library. It’s understandable that such people do not understand what all the fuss is about, and why they might vocally push for PyPy to replace CPython as the reference implementation.

My experience with Python has been with a very different crowd of users. I don’t think they are any fewer in number; however, they usually work at large companies and banks, or in military and government, and generally do not tweet or blog about their technology. I have sat in numerous corporate IT discussions where Python has been stacked up against Java, .NET, and the like, and in all of them I can assure you that the extensibility of the CPython interpreter (and, by extension, the availability of libraries like NumPy) has been a major point in our favor. In these environments, Python does not exist in a vacuum, nor is it even at the top of the food chain. Its adoption is usually due to how well it plays with others, both as an extensible and as an embeddable VM.

The reason I felt I needed to write a “dissenting” blog post is that I know that for every blog or Reddit/HN comment, there are dozens or hundreds of other people who are watching the discussions and debate, and are not able (or willing) to comment, but who have internal corporate discussions about technology direction. If Python-the-community were to deprecate CPython-the-runtime, those internal discussions would head south very quickly.

My post was directed specifically at people who make comments about what Python should or should not do, without having good insight into what its user base (both individual and corporate) looks like. The subtext of my post is that “there are voices in the Python ecosystem who understand traditional business computing needs”, and the audience of that subtext is all the lurkers and watchers on the sidelines who are constantly evaluating or defending their choice of language and technologies.


2 thoughts on “Compilers, Runtimes, and Users – Oh my!”

  1. I think your clarification of Python-the-language vs CPython-the-runtime being the heart of one of the biggest community divisions is a good one.

    This is a large part of why being a core dev can be both fun and infuriating. We get to see (or hear about) most of the use cases from widely diverging audiences with wildly differing priorities. However, the world views sometimes end up so different, that it can be hard for different groups to communicate directly and python-dev ends up playing translator and/or referee almost by default.

    And then we get to make design decisions that try to at least vaguely balance all those competing interests with our best guesses about the future interests of the people that aren’t using Python yet, but will hopefully start doing so in the future, as well as our assessment of what we think is maintainable long term with primarily volunteer resources :)

  2. Peter says:

    Yes, it is an unenviable position you guys are in.

    However, you all must be doing something right because there is a pile of goodness in 3.3. Although the Scipy community has been slowly moving to support Python 3 more broadly, for our work at Continuum, we are planning on support for Python 3 from the very early stages of our Python Data Analysis stack. (Since our goal is to introduce Python to many new users, we want to be able to point them at the new hotness, with no hemming and hawing about that whole 2-vs-3 thing…)
