Isn't this already what a SIMD processor does? Or is it supposed to be like a SI...

reitzensteinm · on July 2, 2012

SIMD instructions are used to explicitly execute vector operations in parallel. The patent is about rearranging a stream of instructions across a loop at runtime to turn code that was not written with SIMD in mind into parallel code.

The across the loop part is the vaguely new part of it. To simplify it greatly, this can already be made parallel at runtime:

  float a = 5.0;
  float b = 10.0;
  
  //The next two lines are executed in one clock tick
  a*=2;
  b*=2;

This patent deals with a method to turn this into parallel code:

  float[] vals = { 1.0, 2.0, 3.0 }
  
  for ( int i = 0; i < vals.length; i++ ) {
    
    vals[ i ] *= 2;

  }

Obviously it has to detect any data dependencies at runtime, which isn't trivial. This can't be rewritten, for instance, because i depends on i - 1:

  float[] vals = { 1.0, 2.0, 3.0 }
  
  for ( int i = 1; i < vals.length; i++ ) {
    
    vals[ i ] *= vals[ i - 1 ];
  
  }

Now, this is cool. I'd love to be working on stuff like this. But it's an extremely obvious step, once you've cleared the low hanging fruit. It's just that the extra complexity hasn't been worth it until now, because easier and simpler things to do have given enough speed up. That someone would do this was inevitable, which is why I say the patent is ridiculous.

Imagine history turning out completely differently. IBM is the largest CPU design firm in the world, after Shockley Semiconductor and HP who fab chips for the latest Commodore machine. Apple and Dell never existed. Every single person working on CPU design today for whatever reason ended up in another field.

We would still be seeing this technique first being used at the 1 billion transistor mark for CPUs. It's just a consequence of the nature of computation.

caf · on July 2, 2012

In practice the loop would tend to be unrolled at compile-time anyway, so the CPU would see instructions that look like the first case.

NormM · on July 2, 2012

The biggest problem in our patent system is that obvious ideas get patents. That said, the current criterion that patent examiners use to decide if an idea is obvious is based on prior art. If an idea is clearly useful but no one is using it (or has publicly disclosed it), then it's considered non-obvious.

viraptor · on July 2, 2012

Isn't that already done with auto-vectorisation? http://gcc.gnu.org/projects/tree-ssa/vectorization.html It should take chunks of "vals" and automatically process X elements (depending on how many fit in the registers) at the same time via SIMD. It was available since ~2008 too.

jonhendry · on July 2, 2012

Apparently not.

At the gcc page you linked:

Example output using -ftree-vectorizer-verbose=2:

vect-1.c:82: note: not vectorized, possible dependence between data-refs a[i_124] and a[i_83]

Apple's patented method would appear to allow vectorization in this case, whereas gcc fails.

DeepDuh · on July 2, 2012

That's also what I understood. I think it is meant to be a software patent though. It reads very similar or equal to what OpenACC, CUDA and OpenCL do.