Tuesday, October 17, 2006

Joy takes a jab at computer industry


“We have a strange situation where we are delivering a new kind of computing. It’s based on $50 million worth of systems that fill whole rooms, but there are no vendors that offer these systems. Companies like Google have to build it by themselves.”

That was one of the observations of Bill Joy, former chief scientist of Sun Microsystems, speaking at the Emerging Ventures conference in San Jose Tuesday.

Joy now focuses on semiconductors and green tech as a venture capitalist at Kleiner Perkins Caufield & Byers. At the conference, he was asked to share his reflections on computing today, and he took issue with the high-performance computing sector.

“We are scotch taping together suboptimal machines, and it’s not acceptable,” Joy said. “Unix is not designed to be an OS when there are 100,000 copies. CPUs and north bridge chips are not designed for that either.

“It suggests the industry needs a fix, but I don’t know what it is. There is an opportunity for some great company there,” Joy said.

The fact that a handful of companies such as Google have built their own large parallel systems is a huge advantage, but “it’s something most people can’t do,” he added.


The cerebral Joy probably wasn't aware that his former employer had announced Project Blackbox just the day before. The data-center-in-a-box effort is aimed at helping hyper-growth Web companies gear up fast. Such drastic steps appear aimed more at garnering press than at truly solving the problem Joy highlighted.

Separately, I bumped into Sun architect Marc Tremblay during a morning break. He said Sun’s Rock processor is on target to tape out by the end of the year. Sun has some new multi-core tools it will roll out in tandem with Rock, he added. Stay tuned! –rbm

2 comments:

Anonymous said...

Blackbox looks more like marketing than substance. 250 servers plus storage (capacity isn't that interesting a metric) is chump change compared to the 200K servers that Google builds. Even for many SMBs and enterprises, the computational power shown might be viewed as mouse nuts. Project out a few years and the computational equivalent will be packaged into a single blade enclosure. It is critical to keep in mind that many customers buy racks of blades or servers at a time on planned cycles, so the turnkey approach has a number of limitations.
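For what it's worth, here is a back-of-envelope version of that projection in Python. The blades-per-enclosure count and the performance-doubling cadence are my own assumptions, and the answer is quite sensitive to both.

```python
# Back-of-envelope projection: years until one blade enclosure matches
# the compute of a 250-server container, assuming per-node performance
# doubles roughly every two years (a loose Moore's-law-style assumption).

CONTAINER_SERVERS = 250      # servers in the turnkey container
BLADES_PER_ENCLOSURE = 14    # a typical 2006-era chassis (assumption)
DOUBLING_PERIOD_YEARS = 2.0  # assumed performance-doubling cadence

years = 0.0
per_node_speedup = 1.0       # each blade starts equal to one of today's servers
while BLADES_PER_ENCLOSURE * per_node_speedup < CONTAINER_SERVERS:
    years += DOUBLING_PERIOD_YEARS
    per_node_speedup *= 2

print(f"~{years:.0f} years until {BLADES_PER_ENCLOSURE} blades "
      f"match {CONTAINER_SERVERS} of today's servers")
# -> ~10 years with these assumptions; a faster cadence shortens it
```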

Joy's comments are interesting, but it appears he has not considered other growth areas within the industry focused on reducing the number of underutilized resources, which in turn translates into lower power consumption, lower capital expenses, and lower recurring people, software, and management costs. Joy is dead on that the industry cannot continue as is, since the building blocks cannot scale.

What is happening now is that all of the major solution providers, combined with the wannabes who think a platform equals a solution, are focused on providing a new generation of services largely built around 20- to 30-year-old concepts found in mainframes:

- Add virtualization to co-locate applications and increase utilization

- Add application-specific accelerator technology to offload general-purpose processors onto lower-cost, higher-performing FPGAs or ASICs that deliver 40-50x performance gains.

- Build solutions using a range of platform offerings rather than a one-size-fits-all approach, so that applications can be placed optimally, reducing unnecessary software licensing and processor / power waste.

- Add management software to provide resource capacity planning and controls to optimize costs and reduce IT overhead (the sketch after this list shows the basic consolidation arithmetic).
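To make the first and last points concrete, here is the basic consolidation arithmetic in Python. The fleet size, utilization figures, and per-server wattage are all made-up assumptions, not measurements.

```python
# Minimal consolidation sketch: how many virtualized hosts are needed to
# absorb a fleet of underutilized servers, and the power that frees up.

import math

FLEET_SIZE = 1000            # existing physical servers (assumption)
AVG_UTILIZATION = 0.10       # typical low utilization today (assumption)
TARGET_UTILIZATION = 0.60    # headroom kept for load spikes (assumption)
WATTS_PER_SERVER = 400       # rough draw per 1U server (assumption)

# Total demand expressed in "fully busy server" units.
demand = FLEET_SIZE * AVG_UTILIZATION
hosts_needed = math.ceil(demand / TARGET_UTILIZATION)

saved_watts = (FLEET_SIZE - hosts_needed) * WATTS_PER_SERVER
print(f"{FLEET_SIZE} servers consolidate onto ~{hosts_needed} hosts, "
      f"saving ~{saved_watts / 1000:.0f} kW")
# -> 1000 servers consolidate onto ~167 hosts, saving ~333 kW
```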

Nothing surprising here. What will be surprising is that when this comes together over the next few years, the Sun Blackbox will likely be viewed as too large for many customers' environments - at least the ones Sun is targeting in its marketing pitch.

People should also keep in mind that Google does what it does today because its needs could not be cost-effectively met with the technology available at the time it started executing its strategy. Its future could look different as solution providers bring new capabilities to market that deliver much of what Google builds custom today.

Mark said...

Bill Joy says: "We are scotch taping together suboptimal machines, and it's not acceptable. Unix is not designed to be an OS when there are 100,000 copies. CPUs and north bridge chips are not designed for that either."

Isn't "scotch taping together suboptimal machines" exactly what Google does when it " ... build[s] [$50 million worth of systems that fill whole rooms] by themselves"?

I remember seeing a show on Google several years ago. They used caseless desktop PCs (to ease maintenance), and the hard drives and power supplies were strapped to the motherboards with velcro. The show championed people wandering around swapping failed disks and power supplies with ease.

I remember being stunned at the low space density of the solution, and wondering how poorly cooling would work with open cases. And I realized the operating system for this computer was not some Google derivative of FreeBSD, but dozens of humans wandering around swapping components. And while the open cases and velcro straps clearly reduced the mean time to repair a failed disk, it seems today a diskless approach [iSCSI or NAS boot] makes far more sense for thousand-node clusters.
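To make the diskless idea concrete, here is a sketch in Python that stamps out per-node boot entries mapping each node to its own iSCSI root. The portal address, IQN prefix, and the root= syntax are all illustrative placeholders; real PXE/initrd setups vary.

```python
# Diskless-boot sketch: every node PXE-boots the same kernel and mounts
# its root filesystem over iSCSI, so compute nodes carry no disks.
# Names, addresses, and the root= syntax below are hypothetical.

ISCSI_PORTAL = "10.0.0.2"                      # storage head (assumption)
IQN_BASE = "iqn.2006-10.example.com:cluster"   # hypothetical IQN prefix

# MAC address -> node name, as a cluster inventory might record it.
nodes = {
    "00:0c:29:aa:00:01": "node001",
    "00:0c:29:aa:00:02": "node002",
}

for mac, name in nodes.items():
    # One boot entry per node: identical kernel, per-node iSCSI root.
    print(f"# boot entry for {name} ({mac})")
    print(f"kernel vmlinuz root=iscsi:{ISCSI_PORTAL}::::{IQN_BASE}.{name}")
    print()
```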

Here is how I would create a compute utility: I would look at dense, diskless, PSU-less, fanless nodes within a powered, cooled rack (I'm sure APC has a rack like this). That would get rid of all moving parts in the compute node. Yes, I know I am describing a "blade" computer, but I am thinking of something using a standard motherboard in a standard 1RU case. Boot from iSCSI or NAS, do all file access via NAS, and run all I/O over on-board Gigabit Ethernet. Put an Ethernet switch at the top of the rack, so the only connections out of the rack are power and Ethernet. Think of it as Sun's Blackbox at the rack level.

But by using standardized components, the processors, DIMMs, and motherboards could be revved as they changed, unlike with blade vendor lock-in. As the components rev, each new rack is a little faster and a little better than the previous one.
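Here is a toy Python model of that "rack as the unit of growth" idea. The node counts, per-node throughput, and wattage are invented numbers, just to show how each component rev yields a slightly better rack with the same two external connections.

```python
# Toy model of the commodity rack: standard 1U diskless nodes plus a
# top-of-rack switch; power and Ethernet are the only external hookups.
# All figures below are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class RackSpec:
    rev: str
    nodes: int              # 1U nodes per rack, after switch/PDU space
    gflops_per_node: float  # assumed per-node throughput
    watts_per_node: float   # assumed per-node draw (no fans or PSUs inside)

    def totals(self) -> str:
        gflops = self.nodes * self.gflops_per_node
        kw = self.nodes * self.watts_per_node / 1000
        return f"rev {self.rev}: {gflops:,.0f} GFLOPS, {kw:.1f} kW, 1 uplink"

# As components rev, each new rack is a little faster than the last.
fleet = [
    RackSpec("A", nodes=36, gflops_per_node=10.0, watts_per_node=250),
    RackSpec("B", nodes=36, gflops_per_node=14.0, watts_per_node=250),
]

for rack in fleet:
    print(rack.totals())
```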

 