PyPy is an absolutely amazing piece of technology, and the core developers are brilliant people - however, their main use case is not numeric Python.
Their framework is (theoretically) able to trace any Python operation, no matter how dynamic, and speed it up. This means that if you want to speed up pure Python code, PyPy is really the only game in town.
The downside is that when scientists use Python, it is typically as a beautiful API on top of highly optimized code written in Fortran or C. If you want to write numerically heavy code in Python, you are much, much better off using numba. numba is much less ambitious than PyPy - it handles a small subset of Python (basically, NumPy) - but it is very, very good at speeding up pure NumPy code. In my experience, 100x speedups (over pure NumPy code) are not that uncommon.
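To make that concrete, here is a toy sketch (my own example, not from any real benchmark) of the kind of loop-heavy kernel numba compiles well - the sort of code that is awkward or memory-hungry to express in vectorized NumPy:

    # Toy example: pairwise Euclidean distances, written as plain loops.
    # numba's @njit compiles this to native code; pure NumPy would need
    # broadcasting tricks that allocate large temporaries.
    import numpy as np
    from numba import njit

    @njit
    def pairwise_dist(X):
        n, d = X.shape
        out = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                s = 0.0
                for k in range(d):
                    diff = X[i, k] - X[j, k]
                    s += diff * diff
                out[i, j] = s ** 0.5
        return out

    X = np.random.rand(500, 3)
    D = pairwise_dist(X)  # first call compiles; later calls run at native speed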
As you stated, targeting numerical code (array access and primitive operations) versus general code (with modern features such as virtual dispatch, varargs, generics, or even metaprogramming) is a completely different problem in terms of both difficulty and solutions.
It is indeed fairly easy to create a highly performant JIT for numerical operations on N-dimensional arrays (vectors, matrices, etc.); and depending on your field, that might represent 99% of the execution time.
For example, I wrote a really simple JIT compiler for Matlab which sometimes performs better than raw C (it's backed by LLVM's vectorizer to generate the correct SIMD assembly). Link to the master's thesis if you are interested: https://www.dropbox.com/s/caz7d4d08xhbwcu/thesis.pdf?dl=0
I'm not really going to argue with you - if you can make numba perform, great! We're aiming at providing a middle ground: situations where the code is heavy on numerics, but complex enough that numba either can't handle it at all or doesn't perform very well. That covers some use cases and, from what you're saying, very likely not yours, but that's ok. We don't have to cater to everyone, and there is a place both for tools like PyPy (also in numerics) and for tools like numba :-)
PyPy is really impressive - I love the idea of getting all these optimisations (SIMD, STM) for free... but as you say, numerical work means NumPy, SciPy, and pandas, which don't work with PyPy. Even if the NumPyPy project were able to fully match the NumPy API, you would still have a lot of large projects like pandas that depend on the C API, and it would be stupid to copy everything. Perhaps something can be worked out between PyPy, cffi, and Cython.
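For context on why cffi keeps coming up here: it binds to C through a declared ABI rather than through the CPython C API, so the same binding runs on both CPython and PyPy. A minimal sketch of standard cffi ABI-mode usage (nothing specific to this thread; library-name resolution in dlopen is platform-dependent):

    # Call libm's hypot() without touching the CPython C API.
    from cffi import FFI

    ffi = FFI()
    ffi.cdef("double hypot(double x, double y);")  # declare the C signature we need
    libm = ffi.dlopen("m")                         # load libm ("m" works on Linux)

    print(libm.hypot(3.0, 4.0))                    # -> 5.0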
In the short term, Numba is much more practical for numerics. In the longer term, Pyston looks promising - it's actually similar to Numba in that it also uses LLVM, and I imagine there could be synergy between the two...
NumPy is essentially a protocol: it dictates the memory layout of multidimensional arrays, combined with really fast Fortran code that knows that layout. What needs to be copied is that memory-layout protocol, so that we get n:m sharing instead of n^2 duplication.
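In fact, a form of that protocol already exists: NumPy's __array_interface__ describes an array's shape, dtype, strides, and data pointer, so any object exposing it can be wrapped without copying the memory. A minimal sketch with a hypothetical wrapper class (my example, not from the thread):

    import numpy as np

    class SharedBuffer:
        # Minimal object exposing the array-interface protocol over
        # someone else's memory (here, borrowed from `a` below).
        def __init__(self, arr):
            self._owner = arr  # keep the real owner of the memory alive
            self.__array_interface__ = arr.__array_interface__

    a = np.arange(12, dtype=np.float64).reshape(3, 4)
    view = np.asarray(SharedBuffer(a))  # wraps the same buffer, no copy
    view[0, 0] = 99.0
    assert a[0, 0] == 99.0              # the write is visible through `a`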
My point was that even if you copy the NumPy protocol, you still have huge projects that depend on C extensions that you wouldn't want to port.
Pandas is one; others include scikit-learn, scikit-image, Astropy, bioinformatics libraries, stats libraries, etc., all of which make heavy use of C/Cython and depend to varying degrees on the Python C API. Porting NumPy barely scratches the surface of scientific Python.
Biopython runs on PyPy. It also runs under Jython. While it has some C use, it is not "heavy" C use.
Going off on a tangent, and though I realize it's a lost battle, I wish people would stop saying that NumPy is the foundation of scientific programming in Python. As Biopython shows, it isn't required for at least some of bioinformatics.
My own research[1] deals with chemical graphs, and NumPy/SciPy/etc. are nearly irrelevant to that research.
[1] For example, given a set of 100 structures, what is the largest substructure (based on the number of bonds) which is in at least 90 of the structures?
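To illustrate why this kind of work is graph-shaped rather than array-shaped, here is a deliberately simplified sketch (hypothetical data model, not the author's code). It only counts which individual bonds occur in at least 90% of structures - the real problem of finding the largest common connected substructure needs a proper MCS search - but even this first step is all sets and dicts, with no NumPy in sight:

    from collections import Counter

    # Each structure: a set of bonds; each bond: a frozenset of two atom labels.
    structures = [
        {frozenset(("C1", "C2")), frozenset(("C2", "O1"))},
        {frozenset(("C1", "C2")), frozenset(("C2", "N1"))},
        # ... more structures
    ]

    bond_counts = Counter(bond for s in structures for bond in s)
    common = {bond for bond, n in bond_counts.items()
              if n >= 0.9 * len(structures)}
    print(common)  # bonds shared by at least 90% of the structures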
The great thing is that Python JIT research is happening on multiple fronts, and maybe someday, in the distant future, CPython won't matter any longer (except for legacy deployments).
The founder of Continuum (Travis Oliphant) wrote blog posts about his technical vision for a Python JIT: http://technicaldiscovery.blogspot.it/2012/08/numba-and-llvm... and http://technicaldiscovery.blogspot.it/2012/07/more-pypy-disc... Basically, the Continuum team made a big bet that a very efficient JIT targeting only numerical Python would be more useful than a generic JIT that can theoretically handle all of Python. For my use case (scientific coding), numba is far superior.