By John Orr
There's no doubt that the Cell processor, developed by Sony, IBM and Toshiba, is the platinum standard in video game computing.
The Cell, which is the brain of the PlayStation 3, has famously challenged game designers with programming its unique configuration of eight Synergistic Processing Units (SPUs) and one Power Processing Unit (PPU) -- but those who have met that challenge have produced games with unrivaled graphics and astounding sound.
Sony continues to build on the PS3 platform, which features the new standard for high-definition video, the Blu-ray Disc, with endless innovations in online community building, downloadable content including games and high-definition movies, and, of course, every possible variation in video gaming.
As Sony Chairman and CEO Howard Stringer recently told investors, "I am confident that the PS3 is the networked home entertainment server of the future - but it is available today."
In the meantime, the Cell processor has taken on an importance of its own, partly due to an amazing decision Sony reached during development of the PS3, which was to make it an open configuration, allowing other operating systems to be installed and used on it.
Clever people in science and industry -- who had followed the rumors of the creation of the Cell processor and had an idea what it would be able to do -- rejoiced, because they saw a way, via the PS3 and its Cell processor, to achieve supercomputing at bargain prices.
Among them was Dr. Gaurav Khanna, an assistant professor at the University of Massachusetts Dartmouth who specializes in computational astrophysics. Khanna had already been thinking about using graphics processors in supercomputing, and was thrilled to hear what was coming in the Cell processor.
"When I first heard about the Cell, as a sort of hybrid processor," recalls Khanna in a recent interview with Sony, "using elements from GPUs (graphics processing units) and CPUs (central processing units), that got me excited, because that's really where I thought the right way to do things was. ...
"Of course I think the Cell was almost exclusively talked about ... as the PlayStation 3, and I don't know anything about gaming. I had no interest in gaming at that time, and only mildly do today, in spite of having 16 PS3s."
Those 16 PS3s -- connected through a Netgear switch, running a Linux operating system, and working with a Mac Pro as server -- comprise a supercomputer that Khanna has used to study black-hole theory.
Khanna started developing code for use on PS3 in the summer of 2007, months before he even had the shiny black boxes in hand. By September 2007 he had eight PS3s and started running code, then added another eight PS3s in January 2008.
Expanding on the Cell
IBM and Toshiba, co-developers with Sony of the Cell Broadband Engine®, have continued to develop the processor for other uses.
-- In late June of 2008 it was announced that 12,240 modified Cell processors and 6,562 dual-core AMD Opteron® chips were used to create the Roadrunner supercomputer, which will be used at Los Alamos National Laboratory to study nuclear weapons. According to IBM, "Roadrunner, named after the New Mexico state bird, cost about $100 million, and was a three-phase project to deliver the world's first 'hybrid' supercomputer -- one powerful enough to operate at one petaflop (one thousand trillion calculations per second). That's twice as fast as the current No. 1 rated IBM Blue Gene system at Lawrence Livermore National Lab -- itself nearly three times faster than the leading contenders on the current TOP 500 list of worldwide supercomputers."
Some elements of the Roadrunner can be traced back to popular video games. David Turek, vice president of IBM's supercomputing programs, told The Associated Press that in some ways the Roadrunner is "a very souped-up Sony PlayStation 3. We took the basic chip design (of a PlayStation) and advanced its capability."
-- In early June of 2008, Toshiba announced it would release, in July, the first laptops to make use of the SpursEngine, a multimedia co-processor derived from the Cell.
Toshiba's Qosmio G50 and F40 machines -- to be sold initially only in Japan -- feature a chip that contains four of the Synergistic Processing Elements from the Cell Broadband Engine processor. The operating system will run on an Intel Core 2 Duo chip, and the SpursEngine will be called on to handle processor-intensive tasks, such as processing of high-definition video.
-- Bloomberg reported that Repsol YPF SA, Spain's largest oil company, is using supercomputers outfitted with IBM's PowerXCell 8i chips to analyze undersea rock formations in the search for untapped reserves. The chips are an updated version of the PlayStation 3 Cell processor and find oil in deep water as much as six times faster than previous computation methods.
-- Terra Soft Solutions, which specializes in "High Performance Linux Computing," puts together PS3 clusters for customers. One package includes eight PS3s packaged with a dual-core head node with two gigabytes of RAM, a Netgear switch, cables, mouse, keyboard and display, with Yellow Dog Linux and cluster-control software installed (and a few other odds and ends) for $17,650. The package with 32 PS3s goes for $42,250.
In addition, there are many other projects making special use of Sony's powerful PlayStation 3.
-- Dr. Frank Mueller may be able to take credit for putting together the very first PS3 cluster to be used for academic computing. His eight-console cluster was "born" on January 3, 2007, and by March he was quoted at Physorg.com as saying, "Here at NC State we will use it for educational purposes and for research. We are working with scientists to determine the needs and how our cluster can be used to their benefit, and our computer science faculty is already using the cluster to teach classes in operating systems, with parallel systems, compilers and gaming likely to follow."
In a recent email, Mueller described how his PS3 cluster is being used:
- All levels of programming for parallelism (within a processor [@ instruction
level and via vectorization], using threads, multiple cores [SPEs] and
multiple nodes [PS3s])
- DMA programming
- Operating systems and security topics
- Concepts of accelerating hardware
- Assessing different accelerators' capabilities (PS3, Nvidia GPGPU, FPGAs
and ClearSpeed cards)
- Devising new programming models for massive data parallelism
- Providing fault tolerance for high-performance and stream computing
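One topic from Mueller's list -- parallelism via vectorization -- can be illustrated with a toy sketch. This is our own Python/NumPy example, not code from his course: the same computation is written once element by element and once as a single array-wide operation, the style that SIMD hardware such as the Cell's SPEs rewards.

```python
import numpy as np

def scale_and_sum_loop(xs, factor):
    """Scalar version: one multiply and one add per element."""
    total = 0.0
    for x in xs:
        total += x * factor
    return total

def scale_and_sum_vector(xs, factor):
    """Vectorized version: one array-wide multiply, then one reduction."""
    return float(np.sum(np.asarray(xs, dtype=np.float64) * factor))

data = np.arange(1_000, dtype=np.float64)
print(scale_and_sum_loop(data, 2.0) == scale_and_sum_vector(data, 2.0))  # True
```

On an SPE (or in NumPy), the vectorized form is the fast path; the loop form is what students typically start from.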
-- One of the most amazing and successful scientific enterprises to make use of Sony PlayStation 3s is Stanford University's Folding@Home distributed computing project.
Folding@Home uses more than 300,000 active CPUs scattered around the world to crunch numbers in studying protein folding.
"Proteins," says the Stanford web site, "are biology's workhorses -- its 'nanomachines.' Before proteins can carry out these important functions, they assemble themselves, or 'fold.' The process of protein folding, while critical and fundamental to virtually all of biology, in many ways remains a mystery.
"Moreover, when proteins do not fold correctly ... there can be serious consequences, including many well-known diseases, such as Alzheimer's, Mad Cow (BSE), CJD, ALS, Huntington's, Parkinson's disease, and many cancers and cancer-related syndromes."
Studying protein folding requires significant computing power, and Stanford has linked together hundreds of thousands of computers to share the work -- doing major science and winning a lot of awards along the way.
Among the hundreds of thousands of computers running Windows, Mac OS, and Linux are 46,751 PS3s.
What is impressive is that, as of July 2, 2008, the 46,751 PS3s were out-computing the 210,173 Windows machines -- 1,318 teraflops from the PS3s, compared with just 200 from the Windows clients.
Folding@Home tracks these statistics on its "Client statistics by OS" page.
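Taking the figures above at face value, a quick back-of-the-envelope division (ours, not Stanford's) shows how lopsided the per-client contribution is:

```python
# Per-client throughput implied by the Folding@Home figures quoted above
# (July 2, 2008). The statistics are the article's; the division is ours.
windows_clients, windows_teraflops = 210_173, 200
ps3_clients, ps3_teraflops = 46_751, 1_318

windows_gflops_each = windows_teraflops * 1_000 / windows_clients  # ~0.95 gigaflops
ps3_gflops_each = ps3_teraflops * 1_000 / ps3_clients              # ~28 gigaflops

print(round(ps3_gflops_each / windows_gflops_each))  # 30 -- each PS3 does roughly 30x the work
```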
How fast is Khanna's PS3 cluster?
"That varies, depending on the length and accuracy of the solution required," Khanna explains. "But, let's say that a sample run took one hour on the PS3 grid. Then -- a supercomputer has a lot more processors/nodes. Therefore, to compare -- if I wanted the same run to finish in an hour on a supercomputer (IBM Blue Gene), I would need to use roughly 400 nodes on the supercomputer!
"In other words, my 16 PS3 cluster is roughly equivalent to a 400-node supercomputer.
"On a high-end quad-core Intel Xeon processor-based workstation, the same run would take just over a day."
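Khanna's comparison reduces to simple arithmetic (our sketch of the numbers he quotes, not his code):

```python
# The equivalence Khanna states: a run that takes one hour on his 16-PS3
# grid would need roughly 400 Blue Gene nodes to finish in the same hour.
ps3_count = 16
equivalent_nodes = 400

nodes_per_ps3 = equivalent_nodes / ps3_count
print(nodes_per_ps3)  # 25.0 -- each PS3 stands in for about 25 supercomputer nodes
```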
Khanna has already submitted two papers for publication using data he ran through his PS3 cluster, is spending the summer helping two computational physicists move their codes over to Cell processing, and has more projects waiting for the fall.
And, Khanna has also helped some folks in industry try their codes on his PS3 cluster -- people who want to maybe build their own very large PS3 clusters, because it turns out that PS3 clusters do make for excellent computation at a bargain price.
Khanna used to have to dig up as much as $5,000 in grant money to buy enough time on a supercomputer to run one of his black-hole codes. Eight PS3s can be had for $3,200, the Netgear switch was about $100, and the Mac Pro was already on hand.
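The economics work out cleanly, using only the figures quoted above:

```python
# Cost comparison from the article's own numbers: $5,000 of grant money
# per supercomputer run versus a one-time cluster purchase.
cost_per_supercomputer_run = 5_000
eight_ps3s = 3_200
netgear_switch = 100   # the Mac Pro was already on hand

cluster_cost = eight_ps3s + netgear_switch
print(cluster_cost)  # 3300 -- less than the grant money needed for a single run
```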
"Sony helped me set up the first eight," Khanna says. "In fact, Sony provided four of the first eight that I set up, but once the first eight was a success and I got a lot of publicity on that ... my own university, my own dean, Dr. Robert E. Peck, actually helped me to buy some of the rest, and there were a few other donations that came from other places."
So, now he has his own supercomputer to play with anytime he needs it, instead of having to scramble for $5,000 for each research project.
Khanna had to do a lot of talking to get his first PS3s.
"Basically, nobody was listening to me," Khanna relates. "There was no chance I could go to a federal agency like NSF (National Science Foundation) and ask them to give me money for gaming hardware, right? ...
"Certainly I was prepared, basically, to be turned down everywhere, and the only place which -- I should say it took a lot of coaxing, too ... I had to coax Sony quite a bit. But eventually, after four months of coaxing, it did work out, and I'm very glad about that."
The projects Khanna has completed so far -- which have been submitted to Physical Review (American Physical Society) and Classical & Quantum Gravity (Institute of Physics in the United Kingdom) -- are "done purely on the basis of mathematical modeling, using Einstein's Relativity Theory. ...
"One of them," Khanna explains, "has to do with trying to figure out what happens if you have, let's say, a small star, or it could even be a small black hole, and it falls into a supermassive black hole -- a very, very large black hole ...
"Every galaxy has this huge black hole at the center ... It has the mass of a million solar masses or so, and they routinely eat smaller stars and debris and stuff like that, and as a result they burp out a lot of radiation.
"Modeling that is pretty complicated, from a mathematical point of view -- from a computing point of view. And that, essentially, is the focus of my PS3-based code. I was able to use the Cell processor very effectively to be able to figure out what would result from such an event."
Khanna's work is "all in context of a NASA mission, a NASA and European Space Agency mission, which is going up in about ten years. It's called the LISA mission. It stands for Laser Interferometer Space Antenna.
"This particular space antenna is going to be going up there, in orbit, in ten years from now or maybe less, and it's actually going to be able to make very precise measurements of precisely these types of phenomena.
"So the research that's being done on theory right now, that myself and other people are doing, is really in preparation for what the LISA mission will see. So the data is not present right now, but we hope that we would have that in about a decade or less, and then we would be able to make the kind of comparisons and so forth" between theory and actual observation.
Khanna's other completed project is also related to black-hole astrophysics. He explains: "If you disturb a black hole in a certain way -- technically we say, if you perturb a black hole in a certain way -- then it responds in a certain way, in the sense that it gives out a certain frequency of radiation, and if you wait long enough, it sort of stops oscillating, it stops vibrating, it just basically resorts to what is called tail behaviors, so you basically just have the black hole giving out a constant signal, you know, like a 'gooooo.' And it basically settles down to nothing.
"And what the nature of that tail behavior is, is sort of unknown right now, it's not very clear, so I am kind of studying that as well, again using a similar code, very, very similar to the other one."
Khanna's codes are not small potatoes.
"Envision you have this little star that is sort of spiraling in slowly," says Khanna, "getting closer and closer to the black hole; ultimately it's going to just get gulped in by the black hole. So the star that's actually going around and around, spiraling in -- referred to as 'inspiral' -- this star is emitting radiation. So as it is slowly spiraling in, because of the fact that it is spiraling in, it's emitting energy ...
"And to calculate how much radiation, and what kind of property the radiation would have ... what kind of frequency, what kind of strength, how bright is this radiation, how quickly does it oscillate, those type of questions ... there's ... a formula that comes from Einstein's Relativity Theory which tells you how to do that.
"But that formula is a very complex formula. I show it in my talks ... it takes about 20 slides in a PowerPoint presentation ... that's how long the formula is, just to be able to calculate how much light comes out, how much radiation comes off the star as it's spiraling in.
"And, so, that's precisely where the Cell processor comes in, because ... it can calculate this particular long formula, very, very fast, because of the fast SPUs that are available on it.
"So, the way my code works is that this part of the calculation, the one which calculates the formula and figures out what kind of radiation is going to come off of the star, is done purely by the SPUs -- that's the hard part, that's the heavy part of the code, and the remaining stuff, like, for example, figuring out where the star's going to go next, figuring out what the black hole is doing, what data should we be saving, input-output, and stuff of that sort, that's all done on the PPU, because that's not that heavy."
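The division of labor Khanna describes -- heavy formula evaluation on the SPUs, orchestration and bookkeeping on the PPU -- can be sketched as an analogy in Python. To be clear, this is not his code: the worker pool stands in for the SPUs, the main function for the PPU, and `radiation_term` is a made-up placeholder, not his actual 20-slide expression.

```python
from concurrent.futures import ThreadPoolExecutor
import math

def radiation_term(orbit_radius, phase):
    """Stand-in for the heavy SPU-side formula (hypothetical placeholder)."""
    return math.sin(phase) * math.exp(-1.0 / orbit_radius)

def run_inspiral(steps=8):
    # "PPU" role: figure out where the star goes next and prepare the work.
    work = [(2.0 + 0.1 * i, 0.5 * i) for i in range(steps)]
    # "SPU" role: farm the expensive evaluations out to a worker pool.
    with ThreadPoolExecutor(max_workers=4) as spus:
        emitted = list(spus.map(lambda args: radiation_term(*args), work))
    # Back on the "PPU": bookkeeping, aggregation, input-output.
    return sum(emitted)

print(run_inspiral())
```

On the real hardware each SPU runs truly in parallel; the thread pool here only mimics the structure of the split, not its speed.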
Khanna's guess is that none of Sony's game engineers would be surprised by the complexity of his codes.
"I've seen some of these games," Khanna says, "and they look absolutely stellar. So I suspect that the complexity level is probably similar. ...
"I just suspect it's probably a different kind of calculation they're after ... like, for example, in my case, there's a lot of calculations which involve trig functions, like cosines and sine and exponentials and all those basic trigonometry, all kinds of functions like that, that show up a lot in scientific calculations.
"I suspect that maybe in gaming that's not the type of calculations one does. But I don't really know. But I suspect that the size of the formula is probably not a surprise to some of the engineers there, but the details might be quite different."
Professor Khanna is quite grateful to Sony.
"I have to commend people at Sony R&D for designing this absolutely awesome piece of hardware and, you know, opening it up!" says Khanna. "Whoever made that decision, my hat's off to that individual, who had really, really been thinking about the right things. So I just hope they keep it that way.
"I hope other companies follow similar logic. ...
"The thing about science these days, science is suffering tremendously, with all kinds of funding shortages and things like that. The NSF budget has been capped, literally, for almost five-eight years now, so, you know, a lot of us are looking for avenues to save money, to be more effective with the limited funding we have and things like that, and these types of things can really, really help. ...
"I wish more companies followed a similar model and tried to benefit, tried to help science. Science really does need help right now."
Early in his work with the PS3s, Khanna had some help from a couple of individuals at Sony Foster City, Klaus Hofrichter and Noam Rimon, but he is out of touch with them now and would like to find other contacts at Sony for help with at least two issues.
He really wishes, for instance, that Sony would allow for more RAM. "As far as I can understand," he says, "the RAM is soldered on to the motherboard, you can't upgrade it. ... that is really small, for a lot of applications. ... In my case I was able to work around it because I have 16 of them, and I distribute my calculation -- not only do I distribute the calculation, I also distribute the memory. So, in effect I can grab, you know, 16 times 256 megabytes, whatever that is, and get around it that way. But for some work, for a large part of scientific calculation, that's not necessarily possible -- one needs to have a lot of RAM on the same workstation."
Another issue -- one perhaps more easily solvable in the short term -- has to do with how that RAM is distributed.
"I think I know the answer, but I just want to confirm this ... the RAM is actually 512 megabytes, but it is split into these two pieces, some for the graphics processor, some for the Cell. So, is it possible for a programmer to use all of it for the Cell?
"That may just require some software trick, for example. And I would like to know if that is possible, because it would give two times more, which is -- that would be a substantial improvement."
And he would be happy to see more tools to help people program for the Cell.
"Programming the Cell is no easy task," Khanna says. "I mean, it's not like a multicore processor, in which basically the same processor is copied several times over, so it's much easier to program. This one is a different architecture, the PPU is different, the SPUs are different, so you have to program them differently and all that kind of stuff. So there have to be more tools for people to make it easier for them to program, especially in science.
"You know, we're scientists, we're not really highly paid programmers, and we're not really that skilled at programming anyway. We do science -- and so the easier they could make it -- by creating tools to help us utilize the resources the Cell has -- the easier it would be for us to move our codes over."
Khanna has gotten some significant help from Peter Hofstee at IBM. Hofstee was part of the Sony-Toshiba-IBM team that developed the Cell processor, and helped Khanna get some IBM tools -- compilers, in particular, that are Cell-specific.
"IBM has actually developed a lot of tools to precisely do what I was trying to say, which is to make the task of programming easier," says Khanna. "It's not quite there yet, there's still a lot of things that are missing, but they have several projects going on -- maybe the most famous one is called the Octopiler, in which, basically, a lot of stuff is done automatically by the compiler, and you don't have to program the SPUs by yourself, things like that. It's a very ambitious project."
Hofstee enlisted Khanna to run some code on a beta version of Roadrunner -- the brand-new world's fastest supercomputer, which is built around 12,240 chips derived from the Cell processor, as well as 6,562 dual-core AMD Opteron chips.
According to IBM, Roadrunner operates at speeds exceeding one petaflop -- one thousand trillion calculations per second -- or one million billion calculations per second; or one quadrillion calculations per second.
"If it were possible for cars to improve their gas mileage over the past decade at the same rate that supercomputers have improved their cost and efficiency," says IBM, "we'd be getting 200,000 miles to the gallon today."
Khanna: "The main difference between the enhanced Cell which the Roadrunner uses and the Cell which the PS3 uses is the fact that the enhanced Cell has enhanced, double-precision floating-point calculation, which is typically used in science -- science uses double precision, gaming uses single precision.

"Basically the difference is in the number of digits you want to keep track of. Usually in gaming, since you don't have to have numbers very, very accurately computed, they only are accurate up to seven significant figures, or seven digits or so. But in science ... normally what you want is double precision, which means accuracy usually to about 15 significant figures.

"So the difference between the Cell which the PS3 uses and the Cell which the Roadrunner uses is the fact that the PS3 doesn't have good performance for double precision -- it has very good single-precision performance -- but the Roadrunner Cell has very good double and single, both."
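The precision gap Khanna describes is easy to demonstrate in NumPy (our illustration, not his): a perturbation in the ninth significant digit survives in double precision but vanishes in single precision.

```python
import numpy as np

x = 1.0 + 1e-8   # a bump in the ninth significant digit

print(np.float32(x) == np.float32(1.0))  # True  -- lost at single precision (~7 digits)
print(np.float64(x) == np.float64(1.0))  # False -- kept at double precision (~15-16 digits)
```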
Hofstee made contact with Khanna after he heard about Khanna's PS3 cluster.
"He wanted me to see how my codes would do on the new Cell," says Khanna, "on the Cell which uses double precision, because most of my calculations are double precision, although I've messed with singles once in a while as well -- and I got a factor of three or four more performance, simply by just re-running the code which I had been running on my PS3s. It was just for free, like a factor of three or four more, it was just incredible!"