Hearing Steam's time played figures reported as lifespan-eaten values, I'm wondering how much time Chris has actually spent in these games. We all know Steam's time played records are rather iffy: idle time when you walked away from the game to make some tea or the like will inflate them, while Steam has also been rather good at failing to record sessions thanks to momentary connectivity losses to the Steam servers and other bugs. And how long ago did Steam start tracking time played? Was Zuma Deluxe out on the platform for quite a while (2006, if Steam's store page is accurate) before they started tracking that data, and when did Chris start playing?
Is this GPGPU (compute) stuff? GPUs obviously accelerate rendering (they are the hardware that DX or OGL hands data to for all the polygon rendering work), but they are now also starting to be used for the data pass before the triangle-drawing stage, going beyond the original idea of shaders as small code fragments run during the render stage. The way GPU acceleration typically works is that the inner loop is pushed to the GPU and operated on in a massively parallel way.
Your CPU has good performance on general code thanks to branch prediction, out-of-order execution, and several other tweaks to the classic processor design. (These are required because of the deep pipeline: knowing the result of an if statement may take quite some time, so you have to get good at guessing which branch to start executing down, and if you guessed wrong you have to flush and start over, generating heat for no real work done.) The GPU is a far dumber beast, but it has so many processing units that it can run the millions of code fragments needed to render each frame in reasonable time. For this reason the use of GPU compute has traditionally been limited to looking at a loop (which gives obvious parallelisation if each iteration isn't dependent on the previous one, say munching a large array to repeatedly calculate a new position or something), making a kernel out of the inner loop to throw at the GPU, avoiding branches, and keeping the fragments as small as possible. That makes sense, as it matches the work a GPU has traditionally been tuned to perform well.
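To make the "turn the inner loop into a kernel" idea concrete, here's a minimal CUDA sketch of the classic SAXPY operation. The names and sizes are just illustrative, but the pattern is the standard one: each GPU thread takes one iteration of what was a serial loop, with a bounds check because the launched grid may be slightly larger than the data.

```cuda
// Serial CPU version of the inner loop:
//   for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
//
// GPU version: one thread per loop iteration.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    // Compute this thread's global index across the whole grid.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                  // guard: the grid may overshoot n
        y[i] = a * x[i] + y[i];
}

// Host-side launch, rounding up so every element is covered:
//   saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);
```

Note how the kernel body is tiny and branch-free apart from the bounds guard, exactly the shape of work the GPU's many simple processing units are tuned for.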
Why I think you're talking about this specific element is that you bring up ATi (AMD). A year ago nVidia announced Kepler and the introduction of Dynamic Parallelism in CUDA 5.0, which allows the GPU to spawn off new instances of work without going back to the CPU to manage the deployment. Kernels can create child kernels without the CPU being involved, and some of nVidia's numbers show that certain tasks were constrained by the CPU needing to manage it all on previous hardware/software. As far as I'm aware, AMD has not responded with similar tech for reducing the CPU load on compute tasks for their architecture. This sounds like what you're describing (please correct me if I've gotten the wrong end of the stick, as I didn't listen to the podcast you reference), and searching for nVidia Dynamic Parallelism should provide some decent background reading if you're looking for more detailed information.
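A rough sketch of what Dynamic Parallelism looks like in code (assuming CUDA 5.0+ and a compute capability 3.5+ Kepler part, compiled with `nvcc -arch=sm_35 -rdc=true`). The kernel names and the doubling workload are made up for illustration; the point is that the parent kernel launches child grids directly on the device, sized by data it only discovers while running, with no round trip to the CPU:

```cuda
// Hypothetical child kernel: does some per-element work.
__global__ void childKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;   // placeholder work
}

// Parent kernel: one thread per task. Each thread reads its task's
// size from device memory and launches a child grid to match,
// without the CPU ever seeing the intermediate result.
__global__ void parentKernel(float *data, const int *counts, int numTasks)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t < numTasks) {
        int n = counts[t];  // work size discovered on the GPU
        childKernel<<<(n + 255) / 256, 256>>>(data, n);
    }
}
```

Before Kepler, the parent kernel would have had to finish, hand `counts` back to the CPU, and have the CPU issue the follow-up launches, which is exactly the management overhead nVidia claims to remove here.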