Thursday, June 21, 2007

The HP way to 10GE

I've been getting bombarded lately with announcements from one 10 Gbit Ethernet chip or board supplier after another, each claiming some unique edge in this emerging market. To get some perspective on this issue from a major server maker, I talked with John Gromala, a product marketing manager for Hewlett-Packard's x86 servers.

The HP ProLiant group is exclusively using startup NetXen's 10GE chips now. I asked Gromala why NetXen, what HP's criteria are, and what the outlook is given that a dozen large and small companies will have production 10GE chips rolling out shortly.

"You have to find a balance between the cost, latency and overall throughput. We don’t have a specific spec or number each of these vendors has to meet," Gromala said.

Specifically, HP likes the fact that NetXen supports TCP offload (aka TOE), and does it in a stable way on both Windows and Linux. The two companies have had a long development relationship, he added.

Gromala would not say anything about HP's future plans except that it is evaluating 10G chips from multiple companies. He did say HP has decided not to use the so-called I/O Acceleration Technology that Intel has been building into some of its Xeon platforms as a substitute for TOE.

"IOAT is an example of an approach that has been used before and really just moves the processing from one part of the CPU complex to another. It doesn’t completely move the processing off the CPU and on to the [networking] card," Gromala said. "We don't want [the job of processing] the TCP stack using up all the CPU's horsepower," he added.

HP has been a big backer of iWARP, which defines for Ethernet the remote direct memory access (RDMA) capability already in InfiniBand. HP is working with NetXen to get RDMA into products, but so far the work is still somewhere in the development pipeline.

"What Netxen offers today is good enough for now but eventually we will bring a full RDMA function [to the products]," he said.

The next big goal for 10GE is to get to affordable options for copper media, Gromala said. HP is tracking the progress of 10GBase-T as well as backplane Ethernet and other options, he added.

My take: Intel needs to revise its 10GE story if giants like HP aren't buying into it. With backing from HP, TOE and RDMA will become requirements.

Not incidentally, Sun's Neptune is looking better all the time as a way of carving out a unique approach that is apparently giving it an edge over Intel and its top OEMs like HP. Hey, maybe a big system house still can differentiate its products with some smart silicon development. Imagine that!

6 comments:

Anonymous said...

good post...Gromala had some honest takes.

Anonymous said...

This is very interesting. It looks like HP likes TOE, but IBM is not having any of that and is going with the Intel approach. I think there is a bigger story here if you dig deeper into what IBM is doing.

History said...

HP for a long time has been in the RDMA offload camp. It's their I/O dogma. Using NetXen network adapters across the board does allow a common approach to both Intel and AMD offerings.

Dedicated offload is unlikely to prove efficient in terms of silicon area and power cost. Network traffic generally is bursty, so dedicated I/O resources sit as unharnessed silicon during I/O idle periods. In contrast, processor silicon can be directed toward whatever task is presented to the system, whether that is handling I/O bursts or compute-heavy work. The trend toward increased CPU core count, and correspondingly increased processing power, suggests the cost of I/O processing to the processor socket should decrease, and that diminishing cost should in turn reduce the value of I/O processing offload. Intel's I/OAT and Sun/Marvell's Neptune fit that projection. Dedicated offload seems best justified when I/O has high duty-cycle activity and also presents a high processing burden to the system.
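To put rough numbers on that, here is a back-of-envelope sketch. It assumes the old rule of thumb of about 1 Hz of CPU per bit/s of TCP throughput and 3 GHz cores; both figures are illustrative assumptions, not measurements.

```c
/* Back-of-envelope: what fraction of a multi-core host does line-rate
 * 10GbE TCP processing consume, assuming ~1 Hz of CPU per bit/s and
 * 3 GHz cores. All figures are rule-of-thumb assumptions.
 */
#include <stdio.h>

int main(void)
{
    double link_bps   = 10e9; /* 10 Gbit/s line rate          */
    double hz_per_bps = 1.0;  /* assumed TCP processing cost  */
    double core_hz    = 3e9;  /* one 3 GHz core               */

    double cores_needed = link_bps * hz_per_bps / core_hz; /* ~3.3 */

    for (int total = 2; total <= 16; total *= 2)
        printf("%2d-core host: %3.0f%% of its cycles go to 10GbE\n",
               total, 100.0 * cores_needed / total);
    return 0;
}
```

Each doubling of core count halves the fractional cost of host-based I/O processing, which is exactly the projection above.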

The 10GbE I/O attach rate is still very low. 10GbE I/O will not reach a high attach rate until it approaches commodity pricing, as did all previous generations of Ethernet. 10GbE vendors should concentrate on reducing cost rather than trying to keep 10GbE controllers in a high-end, high-cost market. Historically, the Ethernet market has demanded low-cost I/O. Even if a bifurcated market develops, commodity I/O prices will set a benchmark for the price of high-end I/O and therefore drag down high-end pricing. Offload does not seem to be the path to volumes and ROI.

Anonymous said...

Oh good, another religious debate. Both on-load and off-load approaches have their advantages and disadvantages. Each camp claims they are doing it for the benefit of customers, the ability to differentiate, the ability to reduce total power consumption, the ability to drive higher performance, the ability to create supportable products, etc. etc.

Those that preach on-load take one slice of the above while those focused on off-load take another slice. The on-load camp, as noted by the prior comment, tends to succeed when the I/O rates are low and there is spare CPU. However, there isn't spare memory bandwidth, there isn't spare power, and in reality, as virtualization takes off, there isn't spare CPU.

The off-load camp takes the perspective that it can deliver optimal performance for the most extreme workloads, and can do so with the lowest power (processors are pigs compared to I/O devices: tens of watts per core vs. 7-20 W per I/O device). The problem for the off-load camp is the support angle. If there is a bug and it is implemented in a state machine, then support can be a challenge. Further, integration with a volume OS can be complicated. Microsoft took the approach of designing it into its new OS from the start, so it has a chance to work well. Linux remains opposed to TOE, seemingly more based on emotion than on substance - the on-load camp at work perhaps, or the natural aversion of software guys to anything that can be off-loaded.
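A quick sketch of that power arithmetic, using the figures above plus the same 1 Hz-per-bit/s rule of thumb; the 20 W per core and 10 W per device are assumed midpoints, not measurements.

```c
/* Rough power comparison for line-rate 10GbE: on-load CPU cycles vs.
 * a dedicated off-load controller. All numbers are assumptions taken
 * from the ranges cited above.
 */
#include <stdio.h>

int main(void)
{
    double cores = 10e9 / 3e9;  /* ~3.3 cores of 3 GHz for 10 Gbit/s */
    double w_per_core = 20.0;   /* assumed, "tens of watts per core" */
    double w_offload  = 10.0;   /* midpoint of the 7-20 W range      */

    printf("on-load : ~%.0f W of CPU burned on TCP/IP\n",
           cores * w_per_core);
    printf("off-load: ~%.0f W for the dedicated controller\n",
           w_offload);
    return 0;
}
```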

The irony is that protocol off-load is over 30 years old and varies from stateless to stateful implementations. There is a bit of hypocrisy from those who hate it in that they use it all of the time: storage controllers, InfiniBand, Fibre Channel, etc. are all protocol off-load solutions. Ethernet today is relatively stateless off-load. So where again is the issue? They claim TCP is hard, but the hard part isn't on the main data path, and that is where some have gotten a clue.
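To make the stateless/stateful distinction concrete, here is a small Linux sketch that enables two stateless offloads, TX checksumming and TCP segmentation offload, through the legacy ethtool ioctl. The NIC does per-packet work but holds no connection state, which is what separates this from a TOE. The eth0 name is a placeholder, and changing settings requires root.

```c
/* Toggle two stateless offloads (TX checksum and TCP segmentation
 * offload) on a Linux NIC via the legacy ethtool ioctl interface.
 * "eth0" is a placeholder; root is needed to change settings.
 */
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

static int ethtool_op(int fd, const char *ifname, __u32 cmd, __u32 *data)
{
    struct ethtool_value eval = { .cmd = cmd, .data = data ? *data : 0 };
    struct ifreq ifr;

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    ifr.ifr_data = (char *)&eval;

    if (ioctl(fd, SIOCETHTOOL, &ifr) < 0)
        return -1;
    if (data)
        *data = eval.data;
    return 0;
}

int main(void)
{
    const char *ifname = "eth0";  /* placeholder interface name */
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    __u32 on = 1, tso = 0;

    /* Stateless offloads: the NIC checksums and segments packets but
     * keeps no per-connection state, unlike a full TOE. */
    ethtool_op(fd, ifname, ETHTOOL_STXCSUM, &on); /* TX checksum     */
    ethtool_op(fd, ifname, ETHTOOL_STSO, &on);    /* TCP seg offload */

    if (ethtool_op(fd, ifname, ETHTOOL_GTSO, &tso) == 0)
        printf("%s: TSO is %s\n", ifname, tso ? "on" : "off");
    return 0;
}
```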

In any case, it is a good debate. Just wish people moved away from the marketing hype and hypocrisy and focused on the technical issues and business challenges instead.

Anonymous said...

10 GbE volume has not risen due to a combination of cost and a lack of platform, OS, virtualization, etc. providers who could actually drive 10 GbE without consuming all of the processing power and memory bandwidth. The fact is customers are not going to buy 10 GbE unless they can show it saves them money. Failure to create cost-effective hardware and software is what constrains adoption.

Now a couple of things have or will change in the coming 12-18 months:

- Multiple 10 GbE providers have products, including Broadcom, which finally pulled together an interesting product.

- Open source efforts like OFA make RDMA over a TOE available for Linux and Windows today.

- Low-latency switch chip providers are coming out. Fulcrum, Fujitsu, etc. are starting to ship interesting products.

- I/O sharing and virtualization specifications are nearing completion. This is complemented by the chipset and processor vendors' virtualization enablement.

- Multi-core processors are starting to appear with not-too-terrible memory bandwidth, though latency continues to move glacially.

Given all of the above, the infrastructure to drive 10 GbE is starting to appear, and that may finally tempt customers to move adoption plans up and increase volume, solving the whole cost equation. Combine that with advancements in process technology as well as backplane, copper and optical cable cost structures, and 2008 might be a watershed for 10 GbE adoption.

History said...

10GbE offload of TCP processing and RDMA actually is quite difficult technically and an implementation typically takes a lot of dedicated resources.

TCP was developed for execution on a processor and does not translate well to dedicated hardware, mostly because of slow-path processing. Consequently, a stateful 10GbE offload engine generally uses multiple embedded processors within the offload controller to execute or at least aid TCP processing. Embedded multiprocessor design for this class of device is very challenging, and memory infrastructure and context management are very hard. Most stateful 10GbE offload engines use external memories to store state. The stateful offload controller is not a small die and it usually sits in a large package, so the per-copy expense is significant. At this time, device cost still pales in comparison to the PHY, but SFP+ should expose the cost of stateful 10GbE offload controllers, with their associated external memories, relative to stateless (or less stateful) 10GbE controllers. These factors are why some think it is better to keep 10GbE controllers simple and let the increasingly powerful processor do the processing. The Sun Niagara and Rock processors are going that way. Indications are that Intel also will provide more cores and more threads moving toward 2010.

Moving off the network adapter, I suggest 10GbE LOM (LAN on motherboard) lends itself to a single-chip solution to keep the cost of board real estate reasonable. That does not necessarily eliminate stateful 10GbE offload as a solution: the stateful 10GbE offload controller could cache active session state kept in system memory.

The stateful offload model seems largely driven by the way expensive applications are licensed, generally per socket or by some formula counting processor cores. Offloading I/O processing from the processor complex allows it to perform more transactions per unit time and thus get better value for the cost of the license. That licensing model is being affected by the movement to multi-core processors. It will be interesting to see how that licensing model adapts and what effect that has on the prospects for stateful 10GbE offload engines.
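The licensing argument reduces to simple arithmetic. Here is a sketch with invented numbers: a $40,000 per-socket license and on-load TCP/IP processing eating 30 percent of the socket's cycles.

```c
/* Worked example of the licensing argument above, with invented
 * numbers: a per-socket license and a workload where on-load TCP/IP
 * processing consumes some fraction of the socket's cycles.
 */
#include <stdio.h>

int main(void)
{
    double license_per_socket = 40000.0;  /* assumed $/socket/year   */
    double tps_full_socket    = 10000.0;  /* transactions/s if every */
                                          /* cycle goes to the app   */
    double io_share_onload    = 0.30;     /* cycles lost to TCP/IP   */

    double tps_onload  = tps_full_socket * (1.0 - io_share_onload);
    double tps_offload = tps_full_socket; /* TOE frees those cycles  */

    printf("on-load : $%.2f of license per transaction/s\n",
           license_per_socket / tps_onload);
    printf("off-load: $%.2f of license per transaction/s\n",
           license_per_socket / tps_offload);
    return 0;
}
```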

In case you're wondering, I don't have a vested interest in 10GbE adapters and I no longer work in that industry segment.

 
Labels: interconnects