
> having half your product in Ruby and half in C is awful to have to deal with.

Why? In many cases, only a tiny fraction of your code needs to be fast. In my experience, it is very reasonable to write that part in some fast language while keeping the bigger (and often more complex) part in a higher-level language such as Ruby.



Everyone always agrees with this and, in my experience, has never really tried it. It works better in theory than in practice (or, at a minimum, it works only on a small subset of the types of problems that a naive view might otherwise lead you to believe). The problem is that the data isn't organized in a computation-efficient way, and all the C code in the world isn't going to make chasing pointers to pointers to pointers to ints fast.

A ton of work has to go into computation-friendly data organization (see all of numpy, scipy, etc) before you can ever actually optimize effectively.

EDIT: since all of my responses are of the form: "oh no I did it once with great results", I'll retract the part where I said "have never really tried it". I'm not arguing it's never worked. I'm arguing that the set of likely performance problems and the set of easy-to-optimize-with-C problems don't overlap as much as people would like to believe.


I have tried it. My MSc dissertation required a lot of computation (weeks of runtime after I'd rewritten critical portions in C++), and I'd do it that way again.

For many types of problems, putting the data into an efficient format before passing it to your C/C++ code is trivial in terms of both difficulty and time spent.
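For instance (a minimal sketch, not from my dissertation code): Ruby's Array#pack lays data out as a contiguous native array, which is often all the "efficient format" the C side needs:

```ruby
# Pack Ruby floats into a contiguous buffer of C doubles -- the kind of
# flat, native layout a C/C++ routine can iterate over directly,
# instead of chasing Ruby object pointers.
values = [1.5, 2.5, 3.5]
buf = values.pack('d*')            # 3 * 8 = 24 bytes of native doubles

# ...hand `buf` across the language boundary here...

# And it round-trips back into Ruby just as easily:
puts buf.unpack('d*').inspect      # => [1.5, 2.5, 3.5]
```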

I prototyped everything in Ruby, profiled, and then rewrote a handful of the critical parts in C and C++.


I've done it too, and I think most of the responses to the GP are missing the point. Can it be done? Sure. But it's a pain in the ass. It's not nearly as trivial as people suggest. It's basically only worthwhile when you know you have the resources to build an infrastructure of re-usable fast modules that get coupled with a scripting language interface.

Even when you give up on writing true "hybrid" code (i.e. "just write modules for the slow parts in C++", which is a gigantic pain), and instead try to write driver code that calls standalone compiled applications, you find yourself having to write a lot of duplicate code to parse/validate I/O on both sides of the language boundary. It sucks, and there's no way to make it not suck.

I'd go so far as to say that for problems where processing speed is a known requirement from the start, you should just give up on interpreted languages. Fast interpreted languages have been right around the corner for as long as I've been writing code, but they sure seem to be taking their time.


I would venture to say the triviality of writing hybrid code is a function of how often you have done something similar, and how well you understand the underlying principles. This is really the case with any software problem.

Ruby provides some very useful features, but a lot of those come at rather high costs. Those costs are exacerbated if you do not understand how the internal components of the language are laid out, and the purpose behind this layout.

Ruby tries to be a lot of things to a lot of people, so when people learn how to use it for one task they automatically assume that their skills will carry over. This sort of approach might work reasonably well with a straightforward compiled language, but it simply can't be that easy for an interpreted language like Ruby, with its reams of context-sensitive behaviour.

For example, consider the "pointers to pointers to pointers" complaint. Nothing is stopping you from having a low level C struct for all your performance sensitive data. Granted, you would have to write a ruby wrapper for this data for when you need to export it back out to ruby, but wrapper generation could be automated.

Sure, you could just say, "You know what. I'm just going to use C" but what if your project could really use some of those ruby features outside of the performance sensitive parts of the code? It's always a tradeoff.


I didn't find it hard at all, nor a pain in the ass. I wrote my code. Profiled it. Found the few, small bits that needed to be fast and treated the Ruby code as an executable spec for the C, and rewrote it very quickly.

There is SWIG support for Ruby, as well as the Ruby FFI implementation and RubyInline, offering various degrees of simplicity, but none of them are hard. The Ruby extension interface is more work, but even that is fairly simple to work with. I've taken C libraries and wrapped them in Ruby in minutes using SWIG for example.
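To give a flavour of how little ceremony is involved (a hedged sketch using Fiddle, the FFI that ships in Ruby's stdlib, rather than the ffi gem or SWIG mentioned above), calling straight into libc takes a few lines:

```ruby
require 'fiddle'

# Look up strlen in the already-loaded C library and describe its
# signature: size_t strlen(const char *).
strlen = Fiddle::Function.new(
  Fiddle::Handle::DEFAULT['strlen'],
  [Fiddle::TYPE_VOIDP],
  Fiddle::TYPE_SIZE_T
)

puts strlen.call('hello')  # => 5
```

Wrapping your own shared library is the same shape, with `Fiddle::Handle.new('libfoo.so')` in place of the default handle.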

These days I often choose to write in Ruby first, even for applications whose finished version I intend to write entirely in C. Prototyping in Ruby and rewriting in C once I have a working version is faster for me than iterating on a C version.


"I wrote my code. Profiled it. Found the few, small bits that needed to be fast and treated the Ruby code as an executable spec for the C, and rewrote it very quickly."

The fact that the speed-critical parts of your problem could be reduced to a "few, small bits" indicates that we're talking about entirely different things.

I was in a problem domain where writing the time-critical pieces in a different language meant ensuring that huge amounts of working-set data were available to that optimized code. Efficient access to that data meant storage in native format, which meant that supporting two languages would require a nasty I/O layer for the interpreted language. Not worth it.

And that's just for the problems where it was possible to consider using Perl/Python at all. For some problems, the memory overhead of Perl/Python data structures ruled out use on all but the smallest problems. Again, not worth it.

It's nice that the speed-critical parts of your problem were reducible to a few lines of code, but that's not often the case. For most real problems, I/O is a huge concern, and can't be dismissed just by pre-processing the input.


Also, it's a weird wonderland in which you know exactly which parts need to be fast, reimplement those, and then never need to touch those bits again. In my experience, that's not the case - you're going to want to tune and alter your algorithm, and if its API needs to go through several languages, it's going to be a pain. Every time you decide you need a slightly different input format or data store or set of hyperparameters, you'll likely be changing many more files than you would in a one-language solution, and in a way that's hard to unit-test and hard to statically type.

It's certainly doable, and definitely the right way to go if you want number crunching in a high-level language, but it's not a free lunch. It's certainly not a quick one-time conversion and then you can forget about it.


Writing C at all is kind of a pain in the ass, now that it comes up...


We don't call 'scripting languages' 'glue code' for nothing.

If more people thought of FFI/RPC/etc. as just another pipe-and-filter framework, they'd be less reluctant to write properly-defined, performance critical, pieces in a more performant environment.

Then again, your use of a profiler and FFI/pipe/whatever is a secret weapon that seems to be one of the things that separate 1x from 10x engineers. :-)


I agree with it and have implemented it.

After profiling a backfill script which was looking like it was going to take about two weeks to run, I found an area where certain bit-twiddling was being done which was extremely slow in Ruby. I rewrote this section in C, and was able to make this section 1000x faster than the Ruby equivalent (this was MRI 1.8.5 several years ago). The total runtime for the script was brought down to about 8 hours (from about 165 TPS to around 7,000 TPS).
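To make the shape of the problem concrete (a hypothetical stand-in, not the original backfill code), the hot spot is a loop of this kind -- per-bit arithmetic that MRI executes as thousands of interpreted ops and object operations, but that compiles down to a handful of instructions in C:

```ruby
require 'benchmark'

# Pure-Ruby popcount: the sort of bit-twiddling inner loop that shows
# up at the top of a profile and is a natural candidate for a C rewrite.
def popcount(x)
  count = 0
  while x > 0
    count += x & 1
    x >>= 1
  end
  count
end

data = Array.new(50_000) { rand(2**32) }
seconds = Benchmark.realtime { data.each { |x| popcount(x) } }
puts "popcount over #{data.size} ints took #{(seconds * 1000).round} ms"
```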

Fortunately, I've found that Ruby makes it exceptionally easy to write C extensions, and I often think of reusing C libraries in cases where I'm worried about performance.


I didn't mean to imply it wasn't ever possible. Just that the subset of things that it works great for is smaller than people wish, or believed. That "I'll write in Ruby and optimize in C when I need to" is a myth on the order of the "sufficiently smart compiler". There are things for which it just plain doesn't work. Things that are quite common.

Yes, you can make image transformations for your webapp fast by piping in a C-extension, because the C code can control the i/o and data organization as well as the algorithms. The integration surface is tiny. You can't, however, make the language's memory usage and garbage collection faster. At least not as easily. You won't be "dropping into C" to fix your garbage collection problems, at least not in the way people dream of when they think of "performance problems" at the outset.


If your memory usage and gc speed are of critical concern, then you should really know that ruby is just not going to provide the tools you need to handle that. That's like using a hammer when you need a screwdriver. Even the most modifiable hammer is still meant for pounding, not screwing.

Whenever I see "I'll write in Ruby and optimize in C when I need to" I assume the context of "my program is well suited to be written in Ruby." For instance, I certainly wouldn't write embedded code in ruby. However, I would venture to say that these situations are the exception rather than the rule.

These days most software has access to fairly reasonable computational resources. If you're writing microcode for some industrial control system, or calculating stock price variations with microsecond resolution, then certainly stick to C or ASM or what have you. However, in an age when even a fridge will have a few hundred MB of memory, and when a lot of code is meant to be just "good enough", I think even Ruby will suffice.


In a way you actually do fix the garbage collection problems by dropping down to C. One of the main performance costs in tight loops in Ruby is usually object allocation and GC. If you drop down to C you can handle the memory manually and avoid most of the allocations.
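You can see the allocation churn from Ruby itself (a sketch; exact counts vary by interpreter version):

```ruby
# Count how many objects a tight loop churns through. Each string built
# in the block is a fresh heap object for the GC to sweep later; the
# equivalent loop in a C extension would allocate nothing at all.
before = GC.stat(:total_allocated_objects)
rows = Array.new(10_000) { |i| "row-#{i}" }
after = GC.stat(:total_allocated_objects)
puts "loop allocated ~#{after - before} objects for #{rows.size} rows"
```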


Now, imagine if you hadn't had to profile at all, or even drop to C, because you'd written algorithmically efficient code, and could rely on the runtime executing it well?

Think of all the saved time ...


The context switches are what can take a while, IMO.


A huge swath of the game industry is based on just this kind of architecture.

The engine is built in C++ and the logic is coded in some higher level scripting language. Seems to work fine for them.


I have tried it and it was both trivial to do and gave large performance gains. The ruby implementation does not add that many levels of indirection.


I'll interject one more 'done it' anecdote. The document processing system I've been working on for several years had several libraries in C. Very few issues compared to the general logical bugs. Integrating the Judy arrays library was a huge performance benefit in a number of areas.


Depends on how you go about it. I personally haven't encountered any cases where it would make sense to write a few functions in C and link them into a Python/Ruby program. But I have encountered a number of cases where what works well is to write the main program, the part that does the heavy crunching, in C++, and various ancillary scripts for preprocessing, postprocessing etc. in Python.


> it is very reasonable to code that part in some fast language while delegating the bigger (and often more complex) part in a higher level language such as Ruby.

Agreed. Contrary to what lbrandy said, I've tried it, and it works very well. Writing Ruby extensions in C, or wrapping C libraries for Ruby, is really easy. Wrapping a PNG library for Ruby allowed us to draw about a trillion pixels' worth of images based on the human genome across a modest cluster of ~100 cores in a few days [1,2,3]. For reference, a terapixel image is about ~4TB of uncompressed data and ~800GB compressed, with Microsoft putting up comparable numbers for a project they ran on a fancier HPC cluster in 2010 [4]. We were heavily compressing it into PNG tiles on the fly so ours came down to ~60GB.

Regarding pointers to pointers to ints--if you keep all the data within the heap of the Ruby extension, as we did with the wrapped PNG library, then it can be stored in whatever simple data structure you want.

[1] http://chromozoom.org/?db=hg19

[2] https://github.com/rothlab/chromozoom

[3] http://bioinformatics.oxfordjournals.org/content/29/3/384.fu...

[4] http://research.microsoft.com/en-us/projects/terapixel/


That's a neat project!

When generating huge numbers of png images, it's worth it to tweak the compression settings. Run png optimizers like optipng/pngcrush on your output, and use the settings it finds.

LodePNG is easy to use, but doesn't let you tweak as many options as libpng.

The tiles I've inspected are 15-30% larger than necessary-- besides using sub-optimal compression heuristics, they include unnecessary metadata (creation/modification time, aspect ratio).


Yeah, I did consider optipng/pngcrush. In the end it was a tradeoff between running time and space. I could afford the space but not the time. When I added in pngcrush, because it brute forces a whole bunch of compression routines on each tile, running times went up dramatically and I preferred having the job complete in days as opposed to weeks. It might be something to consider if we generate more tiles on a faster cluster.
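For anyone weighing the same tradeoff: since PNG's IDAT stream is zlib data, the level-vs-speed curve can be sketched with zlib directly (a stand-in for libpng/pngcrush settings, not what chromozoom actually ran):

```ruby
require 'zlib'

# Compare fastest vs. best zlib compression on repetitive tile-like data.
data  = "tile row of mostly similar pixels " * 2000
fast  = Zlib::Deflate.deflate(data, Zlib::BEST_SPEED)        # level 1
small = Zlib::Deflate.deflate(data, Zlib::BEST_COMPRESSION)  # level 9

puts "input:   #{data.bytesize} bytes"
puts "level 1: #{fast.bytesize} bytes"
puts "level 9: #{small.bytesize} bytes"
```

The extra CPU for level 9 buys relatively little on already-repetitive tiles, which is the days-vs-weeks tradeoff in a nutshell.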


Option A:

  -Write your application in Go
Option B:

  -Write application in Ruby
  -Profile Ruby application to identify code to be rewritten in C
  -Learn C if you don't already know it
  -Rewrite various parts of your application in C
  -Run Valgrind to search for memleaks
  -Ensure that your target systems all have required libs installed
Option A seems much simpler...


Sure it does, right up until you also need libraries that Ruby has but Go doesn't. Also you're trading away a few language features to get Go, so between the two you're tossing a lot of programmer time out the door to avoid writing in C. Maybe that's still a worthwhile trade, but it's not really a simple one.


>Sure it does, right up until you also need libraries that Ruby has but Go doesn't.

Now replace go with ruby and ruby with perl. Go has libraries for 99% of what people are doing with ruby. If you are in that 1%, then you need to evaluate whether or not it is worth writing the library you need or if you should use another language.


A quick check didn't turn up a package manager for Go similar to RubyGems. Does it have one? Manually installing dependencies is a colossal waste of time.


It is just go. Do a "go build foo" or "go install foo" and it will handle any dependencies for foo on its own. Do a "go get bar" and it will download and install bar and anything it depends on.


Where are these packages published? Can I install things from this page[1] in that way?

At least from 1000 feet away, it seems like Ruby has this problem solved better than Go does. Maybe it's just that the nature of the solution is more apparent.

[1] https://code.google.com/p/go-wiki/wiki/Projects


>Where are these packages published?

Anywhere you want to publish them.

>Can I install things from this page[1] in that way?

Sure. Just do a "go install github.com/foo/bar". Bitbucket, github, google code, and launchpad are all in the defaults, but you can configure your import path to add whatever locations you like, and then "go list" will show packages available there, and the other go commands will be able to build/install/etc those. The same syntax is used for import statements, so your code can depend on any code any where:

     import "mycoolwebsite.com/~bob/myproject.git/foo"
>Maybe it's just that the nature of the solution is more apparent.

I think they make it pretty clear: http://golang.org/cmd/go/ gives you pretty much everything about packages and so does "go help". Go pretty much does everything out of the box with the included toolset, from installing dependencies to formatting code to refactoring.


Go doesn't have 1/100000 the amount of libraries and support that Ruby or C does.

Option A only seems simpler because you ignore the time/complexity in coding the task at hand.


... and the Java programmer laughs in the corner (Let's not talk about generics though).


... of the asylum he's been committed to after having to deal with one too many AbstractFactoryFactory classes.


That's Design Patterns code. I'm sure that if I start programming seriously in Java, I can bring all the readability of C with me!


> Go doesn't have 1/100000 the amount of libraries and support that Ruby or C does.

That's what they used to say about Java...


Yes. Nearly 20 years ago.

But don't worry the JVM can save you. http://code.google.com/p/jgo/


Why not 1/10e100? Or even less? Go's library support is quite nice already and it's growing really fast.


Because I am being realistic. Ruby, C, Java etc have had decades to build a comprehensive set of developer libraries. Go hasn't.

It's a "cute" platform that I am sure will beat Ruby in a few benchmarks. But I don't see any reason why anyone would seriously consider it for a real world app.


Ruby has a sprawling ecosystem of libraries largely drawn into being by Rails (itself a sprawl) and by design decisions in Ruby's core and stdlib that complicate things and need further code to work around. Threads and fibers and events, oh my. Each workaround has its own ecosystem. You could even consider things like rack and unicorn workarounds for the weak implementations in stdlib.

I don't think Ruby's libraries honestly provide a whole lot better coverage than Go. The stdlib is very well considered. It's fast, it scales, it's thread safe in the appropriate places. It's not a toy implementation. What it doesn't provide, it provides hooks for. Third party libs cover all the major bases. And unlike Ruby, writing interfacing code in Go is a snip. I don't see libraries as a downside.

(With one exception. Why oh why did they not include a bigdecimal? Argh, a ratio type is not an adequate replacement. Try rounding a ratio some time. Blarg.)


Well for one Go beats Ruby in every benchmark that I am aware of, and generally by quite some distance.

Also realistically 60-70% of all libraries created for other languages are bitrotted to junk or pointlessness by this point.

If you can't see why an app that fits within the well-worked problem space of Go (and even with the current libraries, plenty of things are wholly doable in Go) and will require much less server resources might be a good fit, then you shouldn't be making those decisions, because that is the thought pattern of a zealot.

Look Ruby is well supported, established, and proven, but it is also slow. For a lot of things that is fine, but if you are dealing with something that requires a lot of serverside work on low margins it will kill you dead. Horses for courses.


The point is if you need the performance, C is the more natural choice for a lot of it at this point. I looked through one of the Go tutorials a while back, and it was WTF after WTF. If it works for you, fine, but Go has a long way to go to be interesting for a lot of us.


C will give you 2 times the performance with 1/2 the memory usage of Go for 10 times the development effort, if you are skilled at optimizing C.


Never mind that it's trivial to call C from Go: http://blog.golang.org/2011/03/c-go-cgo.html

P.S. Lots of people including me write lots of real world applications in Go...


Because in reality, people don't make decisions based on how many duplicate, unmaintained, broken libraries exist for language X. They make them based on "does language X have the libraries I actually need". Oh, json, DB_of_choice, and web framework exist? Gee, I guess 90%+ of people are covered right there. Virtually every language, even as seldom used as say, ocaml, have the libraries that 90% of people actually use. Libraries aren't an issue for the majority of tasks people are doing.


That's a ridiculous set of alternatives.


The bottom line is: the time necessary to optimize the Ruby to run below 8 minutes was more than he could afford, and apparently Go, once written (which apparently gave the OP some issues), needn't be optimized.

That said, Go wasn't necessarily the better choice, because of the subtlety that made the Go program overflow silently. Writing the solution in Scala or even in JS would have probably given him less of an issue.


It's not.


Seconded.

I recently wrote an image-searching program (find a set of images as sub-images in another set of images) and I wrote most of it in Racket (a Scheme dialect). I wrote the inner loop that calculates the correlation matrix in C and made an FFI call.

I'm not familiar with Ruby's FFI, but I found making FFI calls dead simple in Racket.


FFI is also dead simple in Ruby.

https://github.com/ffi/ffi

It's pretty much `gem install ffi` and off you go. I've used a Ruby gem [1] that talks to a hashtable storage library via FFI and found it perfectly stable.

[1] https://github.com/jmettraux/rufus-tokyo


Indeed. And there is more than one way to do it. The most obvious thing is to write a Ruby C extension. But you could also write a totally separate worker process in the language of your choice and glue it to your main Ruby app with a message queue. In most real-world cases, either solution will get you the performance you need to avoid 10xing your servers.
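The worker-process variant can be this small (a sketch; Unix `sort -n` stands in for your own compiled worker binary):

```ruby
require 'open3'

# Pipe a job to a standalone native program and read its answer back,
# treating the compiled binary as a filter rather than linking it in.
input = [3, 1, 2].map { |n| "#{n}\n" }.join
out, status = Open3.capture2('sort', '-n', stdin_data: input)

sorted = out.split.map(&:to_i)
puts sorted.inspect  # => [1, 2, 3]
```

Swapping the pipe for a message queue buys you the same decoupling plus the ability to run the worker on another machine.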


A really simple solution in that case is to run JRuby and then write computationally intensive code in a JVM compatible lang like Scala or Clojure. This route seems popular since you can easily give JRuby access to your Scala classes.


Right, but if you're already writing in Scala, why ever write some parts of the code in Ruby? Scala gets you the expressivity of Ruby with the additional benefit of static type safety.


Well a pretty typical scenario is you are writing a ruby app and run into an area that needs additional horsepower (say a gaming API or something). Being on JRuby would give you more options at that point.


>Why

Because it is a huge pain in the ass, and very rarely does any good. As soon as you are writing C, you are dealing with all the headaches that go with it for both development and deployment. You've lost a big part of the benefit of using scripting language X. And this part:

>In many cases, only a tiny fraction of your code needs to be fast

Isn't true. People repeat it a lot, but I've seen no evidence to support it. Most people are using scripting languages for web apps which really are just generally slow all over. They would need to write 80% of it in C to get good performance. Seems like using a language that is as productive as scripting languages, but is also fast, makes a lot more sense.



