Monday, September 03, 2007

A benchmark for 10G Ethernet

Thanks for bearing with my disappearance into the French Riviera for a week. I am back from diving into the deep blue Mediterranean Sea and ready to dive into the deep goo of interconnects.

Thanks to Serag GadelRab for supplying a copy of his very detailed and thorough technical review of techniques used in 10G Ethernet network interface cards in the May-June edition of IEEE Micro. It is indeed a deep dive into the issues. If you haven't read it yet, go find it.

Serag calls for a benchmark to gauge the effectiveness of various stateless and TCP offload approaches across different systems and apps. He does a good job making the case for that need by describing the incredible complexity and diversity of past and future techniques for handling the problem of matching these fast networks to relatively slow servers. Serag's article reinforces my own feeling of being somewhat in the dark about what will be the winning approaches here. So I second his call for a benchmark.

I'd like to hear from readers on this score. Do we need a benchmark for measuring the various 10G Ethernet approaches, and if so what would it be and who should create it?
As usual, you can post a comment here or email me at

Incidentally, Serag concludes that stateless NICs working with multicore processors will dominate mainstream 10G Ethernet. TOE chips with RDMA will hold a niche in high-performance computing and storage, putting competitive pressure on InfiniBand on its current home court. That's not the view I got from HP recently.

If Serag is right, that explains why Mellanox has embraced a hybrid InfiniBand/Ethernet strategy, and why, despite its current business success, validated by its recent IPO, Mellanox has not attracted any direct competition in InfiniBand silicon.

Serag makes two other insightful observations about next-gen 10GE chips in his article. They will need to support two new features to handle the rise of traffic from multiple guest sessions in virtualized software environments. First, vendors will try to eke out unique competitive approaches in how they support multiple logical data paths, he said. Second, these chips will have to learn how to handle switching on the NIC inside the server.


RapidIO Executive Director said...

I agree a benchmark for high-speed interconnects would be great; it did wonders for the accelerated-graphics business back in 1992. At ATi we worked with publications (Ziff) and the industry to create benchmarks that sorted out the claims and helped everyone: users, creators, and media.

So the question is: who makes up the group to create a high-speed interconnect benchmark?
Well, Serag is one smart cookie; I worked with him on RapidIO at Tundra. Perhaps the RapidIO TA, the Ethernet Alliance, the IEEE (if they promise not to slow us down), the InfiniBand TA, and some smart media/guru type (or Rick). :-)
I have some time to bounce this idea around, so send me an email or use this forum.
Tom Cox, Executive Director, RapidIO

Anonymous said...

Don't have access to the article but a couple of points based on what you wrote:

(a) An IOV Ethernet device must provide switch semantics for frames routed between multiple guests on the same server. Multiple public presentations in various industry forums over the past 2.5 years by the PCI-SIG IOV representatives have explained why this is required for a credible solution, and the various hypervisor providers have provided similar education. This should not be a surprise to anyone, and I know (under CDA) a set of IHVs implementing such functionality today.
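The switch semantics in question amount to MAC learning and forwarding between guest ports, so guest-to-guest frames never touch the wire. A minimal sketch, with illustrative names throughout (not any vendor's actual interface):

```python
# Minimal sketch of the layer-2 switch semantics an IOV-capable NIC must
# provide for guest-to-guest traffic (illustrative, not a real driver).

class VirtualSwitch:
    """Learning switch: records which guest port each source MAC arrived on,
    forwards by destination MAC, and floods frames with unknown destinations
    to all other ports."""

    def __init__(self, ports):
        self.ports = ports       # port id -> list collecting delivered frames
        self.mac_table = {}      # learned MAC -> port id

    def ingress(self, in_port, src_mac, dst_mac, payload):
        self.mac_table[src_mac] = in_port          # learn where src lives
        out = self.mac_table.get(dst_mac)
        if out is not None and out != in_port:     # known dst: unicast
            self.ports[out].append((src_mac, dst_mac, payload))
        elif out is None:                          # unknown dst: flood
            for pid, queue in self.ports.items():
                if pid != in_port:
                    queue.append((src_mac, dst_mac, payload))

# Two guests on the same server: frames between them never hit the wire.
ports = {"guest_a": [], "guest_b": [], "uplink": []}
sw = VirtualSwitch(ports)
sw.ingress("guest_a", "aa:aa", "bb:bb", b"hello")   # dst unknown: flooded
sw.ingress("guest_b", "bb:bb", "aa:aa", b"reply")   # dst learned: unicast
```

Real SR-IOV hardware does this per virtual function in silicon, but the forwarding decision is the same.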

(b) Mainstream 10 GbE is an interesting assertion, since it isn't mainstream today, and the technologies required to move it to mainstream have, in many cases, only just been completed from a standards perspective. The growing consensus is that 2008 will be the year 10 GbE really starts up the S-curve adoption rate: more IHV offerings, lower-power copper interfaces, active cables now scaling to 40 Gbps, etc. all point to fewer barriers to adoption.

(c) Many-core + stateless 10 GbE isn't hard to imagine being high volume, especially since one of the top Ethernet providers (Intel) is projecting this same message and the new Sun Niagara 2 has integrated 10 GbE (is a core dedicated to packet processing any different from an embedded core in an I/O device at the end of the day? The answer is yes). The whole stateful vs. stateless debate borders on the religious: different benefits and different cost models, with both providing value at the end of the day. It isn't rocket science to comprehend what will be the highest volume, since the highest volumes are often driven by the simplest workloads, which often dominate the data center for many customers. There is no need to support TOE or RDMA for many of these workloads, since their I/O rate and demands on the system are not that severe (some argue, however, that stateful support for virtualization can change this significantly).
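The many-core + stateless model rests on the NIC spreading receive work across cores without holding any TCP state, typically by hashing each packet's flow tuple to pick a receive queue (receive-side scaling). A minimal sketch in Python, with an illustrative hash standing in for the Toeplitz hash real NICs use:

```python
# Sketch of the "stateless NIC + many-core" receive path: the NIC keeps no
# per-connection state, it just hashes each packet's 4-tuple to pick a
# receive queue/core, so one flow always lands on one core and per-flow
# packet ordering is preserved.

import hashlib

NUM_CORES = 4

def rss_queue(src_ip, dst_ip, src_port, dst_port, num_queues=NUM_CORES):
    """Map a flow's 4-tuple to a receive-queue index.
    Real NICs use a Toeplitz hash; md5 here is purely illustrative."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % num_queues

# Every packet of one flow hits the same core...
q1 = rss_queue("10.0.0.1", "10.0.0.2", 5000, 80)
q2 = rss_queue("10.0.0.1", "10.0.0.2", 5000, 80)

# ...while a population of flows spreads across the available cores.
queues = {rss_queue("10.0.0.1", "10.0.0.2", p, 80) for p in range(5000, 5100)}
```

The stateless part is the point: the NIC computes a hash and DMAs the packet; all TCP processing stays in the host stack, one core per flow.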

BTW, when 10 GbE finally hits the client space, it will be stateless and many-core driven, just like current GbE in clients. Hardly a revelation as to where the volume will be, since ultimately the goal is to drive server technology such as Ethernet into clients in order to maximize volumes and reduce costs. The question has always been when, not if, this will happen. Even a couple of years ago at a Hot Interconnects panel, several panelists agreed on this very point, though not all said it as succinctly as one panelist: just an echo of what many of us have contended and pushed for a number of years now.

(d) IOV solutions all follow a similar architecture which, again, has been explained in excruciating detail over the years. I'm not sure what he is asserting that is different, but separate logical flows aren't a new concept; they have been implemented in multiple technologies for a number of years (one of the very first 10 GbE NIC implementations contained this concept, which I had helped the IHV develop as a joint company effort). As for differentiation, the OS/hypervisor will play a major role in the end; differentiation will require software to access and manage it. There is still a great deal to develop here, and it all eventually has to be tied back into higher-level services, which will make trade-offs on system resources that will then distill into trade-offs in I/O device resource management.

(e) Micro-benchmarks have their place, but ultimately the effectiveness of any technology needs to be evaluated at the application layer, or at least at the presentation/session layer for the given protocol (e.g. Sockets, NFS, etc.). Focusing only on layer 2 provides little insight into whether a given technique has real value, since it is the impact on the system, and ultimately on the application, that shows the real value-vs.-cost trade-off. Numerous benchmarks already exist that are used to evaluate Ethernet optimizations as well as partial or full stateful/stateless offload techniques; Netperf, for example, is a widely used open-source benchmark that has evolved to simulate a wide range of volume application network usage. Perhaps the author is not aware of all that already exists and of how IHVs and OS/network-stack providers use these tools to evaluate technology trade-offs (the discussions in open-source efforts such as OFA illustrate some of the software debates on functional trade-offs). Without more specifics on what is being proposed, it is not clear there is any merit in creating yet another benchmark. BTW, no one makes a trade-off decision using only one benchmark; I've seen people show their wares using results from six to eight benchmarks to highlight various capabilities and their effectiveness.
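For concreteness, a netperf-style TCP_STREAM measurement is essentially this: push bytes through the full sockets stack and time it at the application layer, so any offload benefit (or cost) shows up where the application actually lives. A minimal loopback sketch; the chunk and transfer sizes are arbitrary:

```python
# Minimal netperf-style TCP_STREAM sketch: measure bulk throughput through
# the full sockets stack rather than at layer 2. Sizes are arbitrary.

import socket
import threading
import time

CHUNK = 64 * 1024
TOTAL = 16 * 1024 * 1024   # 16 MiB is plenty for an illustration

def sink(server_sock, received):
    """Accept one connection and count every byte until the peer closes."""
    conn, _ = server_sock.accept()
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            received[0] += len(data)

def stream_benchmark():
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))         # kernel picks a free port
    srv.listen(1)
    received = [0]
    t = threading.Thread(target=sink, args=(srv, received))
    t.start()

    buf = b"x" * CHUNK
    start = time.perf_counter()
    with socket.create_connection(srv.getsockname()) as c:
        for _ in range(TOTAL // CHUNK):
            c.sendall(buf)
    t.join()
    srv.close()
    elapsed = time.perf_counter() - start
    return received[0], received[0] * 8 / elapsed / 1e9   # bytes, Gbit/s

if __name__ == "__main__":
    nbytes, gbps = stream_benchmark()
    print(f"{nbytes} bytes at {gbps:.2f} Gbit/s")
```

Run against a remote host instead of loopback and the number reflects the whole path: NIC, driver, stack, and scheduler, which is exactly the point of measuring above layer 2.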

(f) As for Mellanox, well, they are a very smart company. Creating a multi-protocol chip isn't rocket science, and it parallels work done in other implementations already on the market. Mellanox appears to be looking to control more silicon, and since one of the main usage models for IB is to gateway to other fabrics as cost-effectively as possible, it makes sense to develop a simple layer-2 framing chip to move between fabric types. Some have argued this can also be used to play in what will be a growing 10 GbE market. There is some merit to that, but if you listen to Mellanox, their focus is on making IB as easy to integrate into the data center as possible. They maintain that with IB they can stay ahead of the Ethernet performance curve for many years to come. Perhaps true, but there are limitations: inherent differences between protocol efficiencies, signaling rates, etc. I think the author is incorrect if his assertion is that IB will be just a niche technology. Perhaps he has not read the tea leaves about how HPC workloads are becoming mainstream for many IT shops, and how the use of protocol offload and RDMA will increase over time. It may remain a niche in terms of overall volume, but that niche will still command a good portion of the money for at least the next few years (let's not forget IBM is just starting to ramp up its IB offerings with the POWER6, and they really have not shown all there is to their eight-year investment stream).

No offense to the author, but it is often amusing to see people make insightful observations that just restate what others have been communicating for a number of years.

Anonymous said...

Unlike the author of the second posting, I have read the article. It is clearly intended to be tutorial in nature. It offers a comprehensive and necessarily (because it is an article) concise survey of the issues affecting 10GbE performance and the numerous techniques employed to improve performance. The article is up to date and a very good read for anyone with an interest in the field.

Anonymous said...

The GadelRab article is a good overview of the various technologies and choices in modern 10 Gigabit server networking solutions. Whatever you want to call it (onloading vs. offloading, stateless vs. stateful, native OS vs. smart NIC), this debate seems to have the interest of server OEMs and IHVs but gets little attention in IT circles.

Industry application-level benchmarks that measure the impact of network I/O would be welcome, as they can help quantify the performance benefits of the various technologies and determine whether they are worth the cost. Of course, benchmarks won’t take into account real-world requirements around reliability, usability, support, etc., which may make a bigger difference in which 10G architecture ultimately “wins”. It will be interesting to watch this play out, given the amount invested so far by established players like Broadcom, Intel, and Sun and the dozen or so 10GbE start-ups…

Anonymous said...

Well, there are multiple industry-level benchmarks that already measure the impact of network technologies and approaches. I'm not sure what people believe is lacking, but any evaluation needs to be done at the application level, not at layer 2, if one wants to comprehend what is really beneficial to the customer. It might be better for people to examine what already exists and improve that, rather than create what will likely become more of a marketing micro-benchmark than something practical to guide investment and development decisions.