Python Scientific Library

dereckli · on Aug 2, 2009

Hello. Reading this new get back to my mind a debate I have with a friend of mine, maybe you can help us ; )

My friend works in the scientific research domain, in chemistry. In his lab he makes simulation of water particles and often he launches simulations that can take even a week to complete. He is using frontan.

I am using ruby since a few years and, as usual, I love its simplicity,etc,etc.

What I've been always wondering is if would be feasible to use a High Level Language (as Ruby, Python, etc) in a case like his.

I understand that fortran is language for the scientific community with loads of libraries but what matter me is the performance.

Saying Fortran and (maybe) Python with this library have the same functionalities, would my friend simulations take longer using Python rather than Fortran? how much longer? do you think it worths the jump to a nicer (from the programmer p.o.v.) programming language?

What do you think?

thanks ; )

jacquesm · on Aug 2, 2009

The reason why fortran is used for simulations is twofold, first there is a large and very well tested set of software tools written in fortran for exactly that purpose, these codebases have been through hell and back with respect to having their output inspected. They are very close to being bug free. A re-write of such a package would cost considerable time and would probably not yield better results, and rewriting them in an interpreted language would yield those results considerably slower. Programming staff time is expensive, so is computer time.

Second, the hardware that these simulations run on ('supercomputers') is optimized for vector processing, which is an expensive way of saying that you are operating on whole arrays of data at a time.

Typical operations are multiplying two vectors of data with each other, multiply-and-sum operations and so on. Because simulations of real world scenarios almost always contain large amounts of these operations 'supercomputers' are one way to get good performance, and most numerical libraries in fortran are available for them, and sometimes even optimized for that particular hardware.

But even on commodity hardware (clusters of pc's) the general availability of tested code is the cause of inertia in the scientific world, these people are trying to solve a problem, they're not 'in love' with any language, the tool that suits them best (libraries available, co-workers versed in it, citations all over the place) is what they'll use.

If you want to get very high performance numerical work done on a pc I'd advise you to look into CUDA. Fortran bindings for CUDA exist so you can use all those libraries at absolutely amazing speeds for very little money.

profquail · on Aug 2, 2009

FORTRAN bindings for CUDA simply allow you to call compiled CUDA libraries from FORTRAN, but they don't let you take libraries written in FORTRAN and compile them to run on CUDA. However, there is a company that nVidia has just partnered with that is writing a FORTRAN compiler for CUDA, which would let you run those libraries on the GPU (though I can't say whether they will work without some modifications).

jacquesm · on Aug 2, 2009

I must have misinterpreted some piece of text somewhere then, I remembered it that way, apologies.

Create · on Aug 2, 2009

Actually, initial CUDA trials can be quite disappointing, unless heavy effort is spent in understanding the architecture, so to be able to exploit it. Then you are back at square one: (almost?) a rewrite. GSI has nice examples about their studies on different architectures. Then again, the real danger when going with non mass-consumer items, is that the product lifecycle ends before you finish your project. A weird but teaching example, even in case of mass produced technology: a high-performance switch maker used a chip that was built for a mainstream game console. As the console switched generations, the part availability dried up...

I just hope that OpenCL takes off and becomes really open (read free as in free drivers and code). Once familiarized with the quirks of a new architecture (read non x86-alike) then it would require less investment to draw computing power (btw. Nvidia is a power-hog, significantly reducing its appeal in large data centres).

profquail · on Aug 2, 2009

I've worked with CUDA a pretty good amount, so I feel obliged to point out that most of initial effort put into porting an algorithm to CUDA is not spent learning the architecture, but parallelizing the algorithm (which is something you would have to deal with even if just porting your code to multi-core CPU). Once you have figured out how to parallelize the algorithm, then you need to worry about the architecture, but only if you need to eek out every last bit of performance. In most viable cases, you should be able to get a pretty decent speedup without tearing your hair out.

In any case, OpenCL is based on CUDA (the driver API), and they are practically the same (compare the reference manuals and you'll see what I mean).

Also, GPU computing is much, much more efficient than CPU computing (in terms of FLOPS/Watt). A high-end GPU (say, a Tesla C1060) can probably pull between 200-300 watts (~2x what a high-end Xeon uses), but can do over 1TFLOP with sufficiently optimized code.

bravura · on Aug 2, 2009

Which company?

profquail · on Aug 2, 2009

It's PGI (The Portland Group):

http://www.nvidia.com/object/fortran.html

chancho · on Aug 2, 2009

I think you're absolutely right about fortran's legacy advantage (that, and it's really not a terrible language for numerical work) but the distinction between 'supercomputer' and cluster of commodity PC's is all but history. Virtually all current supercomputers are clusters made up of the exact same chips you can get in desktop machines. The vector processing optimizations you mention are merely SSE or AltiVec or a GPU, technologies we all have access to. The only thing that makes a supercomputer super is the number of processors and the interconnect.

jacquesm · on Aug 2, 2009

Yes, the practical upshot of this is that not every problem is suited for cluster computers. I think the term for that class of problems is 'embarrassingly parallel', little or no interdependency between processors during the computations.

kragen · on Aug 2, 2009

Yes, there is an efficiency cost to using high-level languages with array functions (Python with Numpy, Perl with PDL, Octave, R, their proprietary equivalents, and kx's q) but the cost is sometimes small. See http://www.scipy.org/PerformancePython for a pretty clear outline of the available options in the Python case (sadly, it doesn't present results for Fortran to compare with, just C++.)

Estragon · on Aug 2, 2009

scipy wraps a lot of fortran functions under the hood. You could cut the knot by wrapping your friends fortran functions and driving them with python. I do this a lot with C. It's a very effective way to develop scientific applications.

draegtun · on Aug 2, 2009

Its certainly possible. For eg. take a look at what BioPerl has done for bioinformatics & genomics.

http://www.bioperl.org/wiki/Main_Page

http://en.wikipedia.org/wiki/BioPerl

leif · on Aug 2, 2009

No new release or CFP. Why is this here again?