Does the compiler know best?

Ted Dziuba recently blogged about Python3’s Marketing Problem. I chimed in on the comment thread, but there was a deeper point that I felt was missed in the discussions about the GIL and PyPy and performance. Lately I’ve seen more and more people expressing sentiments along the lines of:

I’m of the same mind, but think that instead of offering a GIL fix, the goodie should have been switching over to PyPy. That would have sold even more people on it than GIL removal, I think.

I know it is an unpopular opinion, but somebody’s got to say it: PyPy is an even more drastic change to the Python language than Python3. It’s not even a silver bullet for performance. I believe that its core principles are, in fact, antithetical to the very things that have brought Python its current success. This is not to say that it’s not an interesting project. But I really, really feel that there needs to be a visible counter to the meme that “PyPy is the future of Python performance”.

What is the core precept of PyPy? It’s that “the compiler knows best”. Whether it’s JIT hotspot optimization, or using STM to manage concurrency, the application writer, in principle, should not have to be bothered with mundane details like how the computer actually executes instructions, or which instructions it’s executing, or how memory is accessed. The compiler knows best.

Conversely, one of the core strengths of Python has been that it talks to everybody, because its inner workings are so simple. Not only is it used heavily by folks of all stripes to integrate legacy libraries, but it’s also very popular as an embedded scripting system in a great number of applications. It is starting to dominate on the backend and the front-end in the computer graphics industry, and hedge funds are starting to converge on it as the best language to layer on top of their low-level finance libraries.
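To make the interoperability point concrete, here is a minimal sketch of calling a C function from Python with no glue code, using the standard library's ctypes module. It assumes CPython, and for portability it calls into the interpreter's own C API (Py_GetVersion) rather than a third-party library; the variable names are mine and purely illustrative.

```python
import ctypes
import sys

# ctypes.pythonapi exposes the C API of the running CPython interpreter,
# so this works without locating any external shared library.
get_version = ctypes.pythonapi.Py_GetVersion
get_version.argtypes = []                # const char *Py_GetVersion(void)
get_version.restype = ctypes.c_char_p    # returned to Python as bytes

c_version = get_version().decode("ascii")

# The C-level string and the Python-level sys.version describe the same build.
print(c_version)
print(sys.version)
```

The same pattern (load a library, declare argument and return types, call) is roughly how one would wrap a legacy C library, though real bindings usually need more care around ownership and error handling.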

If you doubt that transparency is a major feature, you simply have to look at the amount of hand-wringing that JVM folks do about “being hit by the GC” to understand that there, but for the grace of Guido, go we. If we have to give up ease of embedding and interoperability, and visibility into what the running system is doing, for a little improvement in performance, then the cost is too steep.
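As a small illustration of the kind of visibility I mean, the sketch below uses the standard library's dis module to list the bytecode instructions CPython compiled for a function. The add and opnames helpers are hypothetical names of my own, and the exact instruction names vary between CPython versions, so treat the printed list as version-dependent.

```python
import dis


def add(a, b):
    """A trivial function whose compiled bytecode we want to inspect."""
    return a + b


def opnames(func):
    """Return the names of the bytecode instructions compiled for func."""
    return [ins.opname for ins in dis.get_instructions(func)]


names = opnames(add)
print(names)  # the exact opcodes depend on the CPython version
```

The point is not the specific opcodes but that the interpreter's execution model is simple enough to inspect and reason about directly from within the language.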

It’s understandable that those who see Python as merely a runtime for some web app request handlers will have a singular fixation with “automagically” getting more performance (JIT) and concurrency (STM) from their runtime. I never thought I’d say this, but… for those things, just fucking use Node.js. Build a Python-to-JS cross compiler and use a runtime that was designed to be concurrent, sandboxed, and lightweight, and that has the full force of Google, Mozilla, Apple, and MSFT behind optimizing its performance across all hardware types. (It would not surprise me one bit if V8+NaCl finally became what the CLR/DLR could have been.) Armin and the PyPy team are incredibly talented, and I think Nick is probably right when he says that nobody has more insight and experience with optimizing Python execution than Armin.

But even Armin has essentially conceded that optimizing Python really requires optimization at a lower level, which is why PyPy is a meta-tracing JIT. However, PyPy has made the irreversible architectural decision that that level should be merely an opaque implementation detail; the compiler knows best.

An alternative view is that language runtimes should be layered, but always transparent.

Given the recent massive increase of commercial investment in LLVM, and the existence of tools in that ecosystem like DragonEgg, syntax really ceases to be a lock-in feature of a language. (Yes, I know that sounds counter-intuitive.) Instead, what matters more is a runtime’s ability to play nicely with others, and of course its stable of libraries which idiomatically use that runtime. Python could be that runtime. Its standard library could become the equivalent of a dynamic language libc.

Python gained popularity in its first decade because it was a non-write-only Perl, and it worked well with C. It exploded in popularity in its second decade because it was more portable than Java, and because the AMD-Intel competition led to spectacular improvements in CPU performance, so that an interpreted language was fast enough for most things. For Python to emerge from its third decade as the dynamic language of choice, its core developers and the wider developer community/family will have to make wise, pragmatic choices about what the core strengths of Python are, and what things are best left to others.

Viewed in this light, stressing Unicode over mere performance is a very justifiable decision that will yield far-reaching, long-term returns for the language. (FWIW, this is also why I keep trolling Guido about better DSL support in Python; “playing nicely with others” in a post-LLVM world means syntax interop, as well.)

The good news is that the Python core developers have been consistently great at making pragmatic choices. One new challenge is that the blogosphere/twittersphere has a logic unto itself, and can lead to very distracting, low signal-to-noise ratio firestorms over nothing. (fib(), anyone?) Will Python survive the noise- and gossip-mill of the modern software bazaar? Only time will tell…