I personally doubt this will happen within the lifetime of silicon chip technology. Maybe with nanotech, biological or quantum computing - but probably not even then.
This document was written early in 2002, when a 2GHz Pentium IV pretty much defined the state of the art in CPUs and the nVidia GeForce 4 Ti 4600 was the fastest graphics card in the consumer realm.
Whilst memory bandwidth inside the main PC is increasing, it's doing so very slowly - and all the tricks used to get that speedup are equally applicable to the graphics hardware. Advances such as Double Data Rate (DDR) RAM benefit graphics hardware just as much as they benefit the main computer - hence whilst they speed up the CPU, they speed up the graphics chips too.
The graphics card can use heavily pipelined operations to guarantee that the memory bandwidth is 100% utilised - it can use an arbitrarily large amount of parallelism to match the throughput of the chip to that of the memory - and it can easily make use of multiple RAM chips to increase speed. Whilst 64 bit CPUs are gradually appearing, we see things like the PlayStation 2's graphics chip which has a 2,560 bit wide memory bus!
The main CPU doesn't make such good use of RAM bandwidth because its memory access patterns are not regular or particularly predictable. To add to its problems, the main CPU is reading instructions from the RAM as well as data.
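To make that concrete, here's a minimal C sketch (the function names and data structures are just illustrative) contrasting the regular, streaming access pattern of a rasteriser with the pointer-chasing access pattern that's common in general CPU code - the first keeps a pipelined memory system busy on every cycle, the second leaves it waiting on each load before it even knows the next address:

    #include <stddef.h>

    /* Regular, streaming access - the kind of thing a rasteriser does
       when it fills a scanline.  The addresses are perfectly
       predictable, so a pipelined memory system can stay 100% busy.   */
    void fill_scanline(unsigned int *pixels, size_t count, unsigned int colour)
    {
        for (size_t i = 0; i < count; i++)
            pixels[i] = colour;
    }

    /* Irregular, pointer-chasing access - typical of general CPU work.
       Each load depends on the previous one, so most of the available
       bandwidth is simply wasted while the CPU waits.                  */
    struct node { struct node *next; int payload; };

    int sum_list(const struct node *n)
    {
        int total = 0;
        while (n != NULL) {
            total += n->payload;   /* the address of the next load is   */
            n = n->next;           /* unknown until this one completes  */
        }
        return total;
    }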
The main CPU has a ton of legacy crap to deal with because it's expected to run unmodified machine code programs that were written 20 years ago. A 'clean slate' CPU design is possible, but within a couple of generations, we'll be back to the status quo.
Every generation of graphics chip can have a totally redesigned internal architecture that exactly fits the profile of today's RAM and silicon speeds, and which is burdened with only those instructions needed for the particular algorithms it knows it'll spend its life executing.
But everything that is speeding up the main CPU is also speeding up the graphics processor - faster silicon, faster busses and faster RAM all help the graphics just as much as they help the CPU.
However, increasing the number of transistors you can have on a chip doesn't help the CPU out very much. CPU instruction sets are not getting more complex in proportion to the increase in silicon area - and the ability to make use of more complex instructions is already limited by the brain power of compiler writers. Most of the speedup in modern CPUs is coming from physically shorter distances for signals to travel and faster clocks - all of the extra gates typically end up increasing the size of the on-chip cache, which has marginal benefits for graphics algorithms.
In contrast to that, a graphics chip designer can just double the number of pixel processors or something and get an almost linear increase in performance with chip area - with relatively little design effort and no software changes.
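As a rough, hypothetical illustration in C, the key property is that every pixel can be computed independently, so the same frame can be carved up across however many pixel processors the chip happens to have - double the units and each one does half the work:

    /* Sketch only: each of 'num_units' identical pixel processors runs
       this with its own 'unit' index.  Because no pixel depends on any
       other, doubling num_units roughly halves the time per frame.     */
    void fill_frame_slice(unsigned int *framebuffer,
                          int width, int height, unsigned int colour,
                          int unit, int num_units)
    {
        /* unit 'unit' handles scanlines unit, unit+num_units, ...      */
        for (int y = unit; y < height; y += num_units)
            for (int x = 0; x < width; x++)
                framebuffer[y * width + x] = colour;
    }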
The graphics cards also gained features: over that same period they added windowing, hardware T&L, antialiasing, multitexture, programmability and much else besides.
Meanwhile the CPUs have added just a modest amount of MMX/3DNow type functionality...almost none of which is actually *used* because our compilers don't know how to generate those new instructions when compiling generalised C/C++ code. There have been internal feature additions - things like branch prediction and speculative execution - however, those are there to compensate for the lack of RAM speed; they don't offer any actual features that the end user can see.
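For example (a sketch, using SSE rather than MMX or 3DNow!, and with illustrative function names), a compiler of that era would turn the plain loop below into ordinary scalar code; to get at the SIMD unit at all, the programmer had to spell the instructions out by hand with intrinsics:

    #include <xmmintrin.h>   /* SSE intrinsics                          */

    /* Plain C: the compiler emits ordinary scalar floating point code
       and the SIMD unit never gets used.                               */
    void scale_plain(float *v, int n, float s)
    {
        for (int i = 0; i < n; i++)
            v[i] *= s;
    }

    /* Hand-written SIMD: four floats per instruction, but only because
       the programmer wrote it explicitly (n assumed to be a multiple
       of 4 to keep the sketch short).                                  */
    void scale_sse(float *v, int n, float s)
    {
        __m128 scale = _mm_set1_ps(s);
        for (int i = 0; i < n; i += 4)
            _mm_storeu_ps(&v[i], _mm_mul_ps(_mm_loadu_ps(&v[i]), scale));
    }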