Friday, October 20, 2006

Moore's law for development

It's been a long time coming, but developers will no longer get to ride Moore's law 'for free'.

For anyone who doesn't know about Moore's law (named after Gordon Moore, one of Intel's founders), it states that computers will double in performance* per dollar spent roughly every 18 months. That compounds to roughly a hundred-fold improvement every ten years, or around a million times faster every thirty years.
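
(A quick sanity check on the compounding: ten years is 120 months, or 120 / 18 ≈ 6.7 doublings, and 2^6.7 ≈ 100. Three such decades gives 100 × 100 × 100 = 1,000,000.)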

There's an interesting image on Wikipedia which shows that this 'law' has held since 1900, when electromechanical devices somehow managed to achieve an amazing 1 calculation per second for every 10 million dollars spent. Nowadays we have Pentium 4 chips that carry out over 25 billion operations every second for a few hundred dollars.

Developers have had it pretty easy during the last 100 years or so - chips get faster, and so does their software (although they seem to keep finding ways of slowing it down with new 'features'!). But not any more.

Intel and AMD are struggling to make a single-core chip run at higher speeds. There seems to be a limit to how fast you can clock a chip, and that limit is set by the heat generated. So they are looking for other ways to deliver performance.

The biggest trend now is multi-core chips. Most new chips from Intel and AMD have two cores - the equivalent of having two processors instead of just one. This trend looks set to continue, with Intel having just released a four-core chip, and the Cell chip (which will be the PS3's CPU) having nine cores in total. Intel are talking about chips with 80 cores or more in future.

The problem at the moment is that most developers write programs that only run on a single core, so when you have a program churning away on your 100-core desktop in several years' time, it will only have access to 1% of the computer's computing resource.

This has got to change, and developers need to start thinking about multi-threaded applications that will take advantage of whatever the computer has to offer. This isn't as easy as flicking a switch - sometimes a multi-threaded application needs to be built from the ground up, as the logic can be quite different.

For a (trivial) example, a program to add a thousand numbers together looks something like this:

total = 0
for loop = 1 to 1000
{
    total = total + numbers[loop]
}
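
In a real language - Java here, with an illustrative array called numbers - the single-threaded version is just as direct:

public class SerialSum {
    public static void main(String[] args) {
        int[] numbers = new int[1000];
        for (int i = 0; i < numbers.length; i++) numbers[i] = i + 1;

        int total = 0;
        for (int i = 0; i < numbers.length; i++) {
            total += numbers[i];   // one core does all the work
        }
        System.out.println(total); // prints 500500
    }
}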

With a multi-threaded application, the structure is more like this:

work out how many threads you need (perhaps by querying the abilities of the computer)
split the job into small chunks (say 10 sums at a time)
while there is work to do
{
    start a batch of threads, each adding 10 numbers at a time. Each thread runs logic similar to the loop above.
}

The first time you go round this loop you end up with 100 partial 'results'. You then go round again to add those together to get 10 results, and one more time to get the final answer.
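
Here's a minimal sketch of that idea in Java 5, using the ExecutorService from java.util.concurrent (the class and variable names are just for illustration). It farms the chunks out to a thread pool, then combines the partial sums in one serial pass rather than the repeated rounds described above:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class ParallelSum {
    public static void main(String[] args) throws Exception {
        final int[] numbers = new int[1000];
        for (int i = 0; i < numbers.length; i++) numbers[i] = i + 1;

        // Work out how many threads we need by querying the machine.
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        // Split the job into small chunks of 10 sums each.
        int chunk = 10;
        List<Future<Long>> partials = new ArrayList<Future<Long>>();
        for (int start = 0; start < numbers.length; start += chunk) {
            final int from = start;
            final int to = Math.min(start + chunk, numbers.length);
            partials.add(pool.submit(new Callable<Long>() {
                public Long call() {
                    long sum = 0;
                    for (int i = from; i < to; i++) sum += numbers[i];
                    return sum; // one partial result per chunk
                }
            }));
        }

        // Combine the 100 partial results (a serial pass, for brevity).
        long total = 0;
        for (Future<Long> f : partials) total += f.get();
        pool.shutdown();

        System.out.println(total); // prints 500500, same as the serial version
    }
}

Even in this toy example you can see the extra machinery: a pool, chunking, futures, and a combining step.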


As you can see, the second example is an order of magnitude more complex, and that's not even taking into account areas such as thread locking and concurrent access to resources, which make it far more complex still. I'd urge anyone writing real programs for a living to at least start playing with multi-threaded apps, before you get left behind.
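
To give a flavour of the locking problem: if every thread added its chunk straight into one shared total, a plain long would silently lose updates, because 'total += n' is a separate read, add and write. A minimal sketch of one fix, using Java 5's AtomicLong (again, the names are illustrative):

import java.util.concurrent.atomic.AtomicLong;

public class SharedTotal {
    // AtomicLong makes each update a single indivisible operation;
    // with a plain long, two threads could read the same old value
    // and one of the additions would be lost.
    private static final AtomicLong total = new AtomicLong(0);

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < 250; i++) {
                        total.addAndGet(1); // safe concurrent update
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        System.out.println(total.get()); // always 1000, never less
    }
}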

* Moore's law actually refers to transistor count, but most people use it as a shorthand for performance.

1 comment:

James Frost said...

At the moment you are right, but only because programmers are lazy. If one core is running the audio, and another the plug-ins, then what happens when there are no plug-ins to process, or when the plug-ins become the bottleneck? These tasks should be allocated dynamically based on the workload in each area.
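
(One way to picture that dynamic allocation - a sketch, not the commenter's design - is a shared queue that idle workers pull from, so whichever task type dominates, every core stays busy:

import java.util.concurrent.*;

public class DynamicWorkers {
    public static void main(String[] args) throws InterruptedException {
        final BlockingQueue<Runnable> tasks = new LinkedBlockingQueue<Runnable>();

        // One worker per core; each pulls whatever task is next,
        // so no core is pinned to 'audio' or 'plug-ins'.
        int cores = Runtime.getRuntime().availableProcessors();
        for (int c = 0; c < cores; c++) {
            Thread worker = new Thread(new Runnable() {
                public void run() {
                    try {
                        while (true) tasks.take().run();
                    } catch (InterruptedException stop) { /* shut down */ }
                }
            });
            worker.setDaemon(true); // let the JVM exit when main finishes
            worker.start();
        }

        // Queue a mixed workload; idle cores pick it up as it arrives.
        tasks.put(new Runnable() { public void run() { System.out.println("audio block"); } });
        tasks.put(new Runnable() { public void run() { System.out.println("plug-in"); } });
        Thread.sleep(100); // crude: give the daemon workers a moment to drain the queue
    }
}
)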

Of course that assumes that each of the tasks can be run in parallel. Some interdependent tasks may be more difficult.

When you consider some of the possibilities raised by using the GPU instead of the CPU, we are already in the territory of having a huge number of cores.

http://www.nforcershq.com/article1853.html has an example for audio effects processing, where they are getting 8x the performance of a single-core desktop chip. That is only achieved by effectively working on many tasks in parallel. ATI's latest cards have 48 shader units, which are in effect specialised cores.

Anandtech has an article where they talk about the speed improvements for the Folding@home application as another example: http://www.anandtech.com/video/showdoc.aspx?i=2849 . The claimed performance improvement is 20-40 times a desktop chip.

I think that with clever algorithms almost any application can be split across multiple cores, and it is in this area that developers need to start working.