<?xml version="1.0" encoding="UTF-8"?> <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" ><channel><title>Solace Systems &#187; Intel</title> <atom:link href="http://solacesystems.com/tag/intel/feed/" rel="self" type="application/rss+xml" /><link>http://solacesystems.com</link> <description>Messaging Middleware and Content Networking Appliances</description> <lastBuildDate>Thu, 02 Feb 2012 19:11:51 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <item><title>Further thoughts on FPGA co-processing and performance</title><link>http://solacesystems.com/blog/technology/hardware/fpga-co-processing-and-performance/</link> <comments>http://solacesystems.com/blog/technology/hardware/fpga-co-processing-and-performance/#comments</comments> <pubDate>Fri, 29 Aug 2008 21:36:30 +0000</pubDate> <dc:creator>Larry Neumann</dc:creator> <category><![CDATA[Hardware]]></category> <category><![CDATA[co-processing]]></category> <category><![CDATA[context switching]]></category> <category><![CDATA[FPGA]]></category> <category><![CDATA[Intel]]></category> <category><![CDATA[low-latency]]></category><guid isPermaLink="false">http://temp.solacesystems.com/uncategorized/269</guid> <description><![CDATA[I’d like to go one layer down on a point that was introduced in the last post and why you can’t lump all non-software approaches into one hardware bucket. 
Software suffers in terms of performance in two fundamental ways: heavy CPU loading when tasks are complex, and kernel to application space context switching when iteration [...]]]></description> <content:encoded><![CDATA[<p>I’d like to go one layer down on a point that was introduced in the last post: why you can’t lump all non-software approaches into one hardware bucket. Software suffers in terms of performance in two fundamental ways: heavy CPU loading when tasks are complex, and kernel to application space context switching when iteration counts are high (i.e., millions of anything per second). Let’s go through how hardware helps with each.</p><p><strong>Issue 1: Heavy CPU Loading/CPU Offload</strong></p><p>CPU-intensive operations are difficult for general-purpose software to execute on general-purpose CPUs. Examples might be Monte Carlo simulations, complex algorithms or complex transformation of large data records. Think of it this way: let’s say that the CPU cost of running simulations in software is as follows:</p><p><img class="alignnone size-full wp-image-705" title="software-cpu12" src="http://solacecdn.s3.amazonaws.com/new/wp-content/uploads/2008/09/software-cpu11.gif" alt="" width="459" height="167" /></p><p><span id="more-269"></span></p><p>You can see that relative to the task of preparing and presenting data, the simulation is a CPU hog. If that task is handed off to an FPGA, two things happen:</p><p><img class="alignnone size-full wp-image-706" title="hardware-offload-cpu1" src="http://solacecdn.s3.amazonaws.com/new/wp-content/uploads/2008/08/hardware-offload-cpu1.gif" alt="" width="452" height="161" /></p><ul><li>Special-purpose hardware might complete the simulation 20-100 times faster than software (in terms of elapsed time). 
So you can allow more iterations to complete in the same time.</li><li>Equally important, you have unburdened the general-purpose processor of 1000 CPU work units or, in this example, over 99% of its work per iteration. Obviously there is additional operating system CPU work in the formula, but the key is that the hardware offload very significantly lightens the load for the general-purpose CPU.</li></ul><p>So ignoring the challenges of how the FPGA simulation code gets written, this CPU offload use case is the most likely scenario for FPGA co-processing as suggested by Intel and AMD in their architectures. I suppose you could execute the prepare data and present data steps in hardware as well, but 99% of the performance savings come from focusing on the high CPU cost of the simulation.</p><p><strong>Issue 2: Eliminate Context Switching</strong></p><p>The second problem that FPGAs and similar technology can address comes about when you swap out ALL of the software for hardware. Take a look at the software pseudochart below for a simple view of how pub/sub messaging works:</p><p><a href="http://solacecdn.s3.amazonaws.com/new/wp-content/uploads/2008/09/software-messaging11.gif"><img class="alignnone size-full wp-image-708" title="software-messaging11" src="http://solacecdn.s3.amazonaws.com/new/wp-content/uploads/2008/09/software-messaging11.gif" alt="" width="500" height="165" /></a></p><p>Each one of the steps is very light on CPU cost and it all works great if the number of messages is low, but when you try to execute lots of messages a second, a rate-limiting issue comes up. The cost of ‘context switching’ – the time it takes for the operating system to pass control from the network stack to an operating system function to the application space and back again – becomes very high. 
Each individual context switch is fast, but a single publish to multiple subscribers can cause hundreds or thousands of them, which leads to the mathematical principle:</p><p class="MsoNormal" style="text-indent: 0.5in;">fast * many iterations = slow</p><p>This is fundamental to software. No co-processing architecture can address it. If you were to try to apply CPU offloading from the prior example to this scenario, you might pick route lookups as the task you implement in hardware. But in the context of all the work performed, making that one step 100 times faster might only improve overall performance by 10% or less. All the other steps are the same, and context switching becomes the performance killer at high volume.</p><p>This is why an all-hardware messaging approach is so fundamentally different from software messaging.</p><p><a href="http://solacecdn.s3.amazonaws.com/new/wp-content/uploads/2008/09/hardware-messaging11.gif"><img class="alignnone size-full wp-image-709" title="hardware-messaging11" src="http://solacecdn.s3.amazonaws.com/new/wp-content/uploads/2008/09/hardware-messaging11.gif" alt="" width="362" height="189" /></a></p><p>With FPGAs doing the application processing and network processors performing networking, security, compression, etc., you have eliminated the operating system and coupled the application and network stack all in hardware. By doing so, you eliminate context switching. FPGAs and network processors can also more naturally parallelize operations that need to be iterative in software, providing further major performance gains. Of course, removing context switches and parallelizing also substantially lower end-to-end latency.</p><p>These two concepts are most of the reason software often runs out of steam or gets very erratic in terms of behavior and latency at just a few hundred thousand messages/second, while hardware can scale all the way up to wire saturation, above 10 million messages per second per hardware blade. 
An all-hardware solution rewrites the rulebook for what is possible.</p><p>So in the context of the prior blog post, this hopefully clarifies why FPGA co-processing can help with CPU offloading but does little to resolve throughput issues relating to context switching.</p> ]]></content:encoded> <wfw:commentRss>http://solacesystems.com/blog/technology/hardware/fpga-co-processing-and-performance/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Intel promoting on-board FPGAs to address low-latency financial market</title><link>http://solacesystems.com/blog/technology/hardware/intel-promoting-on-board-fpgas-to-address-low-latency-financial-market/</link> <comments>http://solacesystems.com/blog/technology/hardware/intel-promoting-on-board-fpgas-to-address-low-latency-financial-market/#comments</comments> <pubDate>Thu, 28 Aug 2008 01:28:40 +0000</pubDate> <dc:creator>Larry Neumann</dc:creator> <category><![CDATA[Hardware]]></category> <category><![CDATA[algorithmic]]></category> <category><![CDATA[AMD]]></category> <category><![CDATA[co-processing]]></category> <category><![CDATA[FPGA]]></category> <category><![CDATA[Intel]]></category> <category><![CDATA[low-latency]]></category><guid isPermaLink="false">http://temp.solacesystems.com/?p=249</guid> <description><![CDATA[Rik Turner of CBR recently wrote an interesting story about Intel trying to break into the low-latency financial services space by courting FPGA chip manufacturers and solution providers leveraging FPGAs to partner with Intel as they launch their Nehalem technology, faster front-side bus and FPGA co-processing capabilities. 
Of course the idea of hosting FPGA co-processing [...]]]></description> <content:encoded><![CDATA[<p>Rik Turner of <a href="http://www.cbronline.com/">CBR</a> recently wrote an interesting story about Intel trying to break into the low-latency financial services space by courting <a href="http://en.wikipedia.org/wiki/Field-programmable_gate_array">FPGA</a> chip manufacturers and solution providers leveraging FPGAs to partner with Intel as they launch their <a href="http://www.intel.com/technology/architecture-silicon/next-gen/">Nehalem</a> technology, <a href="http://www.intel.com/technology/platforms/quickassist/index.htm">faster front-side bus</a> and FPGA co-processing capabilities.</p><p>Of course the idea of hosting FPGA co-processing is not new; AMD has been offering a version of this approach for over two years. Intel is clearly playing catch-up here. It’s also not surprising that Intel would be concerned about any key market moving processing to specialized hardware that can outperform software on Intel processors by 10, 20 or even 50 times. Especially if one box of non-Intel special-purpose hardware can replace the work of 10 to 30 Intel boxes running software.</p><p>This is a classic case of if you can&#8217;t beat ‘em, join ‘em, which has been a successful strategy for Intel in the past. The question is, who exactly will they be joining?</p><p><span id="more-249"></span></p><p>For FPGA manufacturers, this is a mixed blessing. The positive is that it can allow more FPGA processors to be sold. The negative is that it levels the playing field and pushes the technology toward low-margin commodity status faster than custom platforms would. It will be harder for FPGA manufacturers to differentiate their products in a plug-and-play world where they are subservient to Intel&#8217;s processors.</p><p>Companies that rely on FPGA technology to deliver differentiated products (like <a href="http://www.solacesystems.com/">Solace</a>) could benefit in some cases. 
While custom-designed boards are likely to always be faster, the lower costs from using commodity parts could allow hardware companies to extend their lines to more tiers of offerings at more attractive hardware price points, while reusing the same FPGA code. Perhaps software on general-purpose processors delivers performance X, FPGA co-processing on an Intel or AMD motherboard might offer performance improvements of 3-5X, and a custom hardware solution may offer performance improvements of 10-50X. That could be intriguing to firms looking to take high-end products downmarket at lower price points. But the co-processor architecture will not displace the very high-end requirements. Co-processing will favor a blended software/hardware solution, which introduces some amount of context switching, which in non-CPU-intensive applications (feed handlers, messaging, etc.) is the primary source of bottlenecks today.</p><p>Who will NOT be coming along for the ride are most software ISVs and enterprise customers. Designing hardware solutions is not for the faint of heart; it is a significant long-term commitment that is expensive and time-consuming. There may be one FPGA code designer available for every 1,000 or maybe even 10,000 software developers. This scarcity makes FPGA development risky to staff within IT and even within most software vendors. That leaves only the most performance-sensitive, highly motivated suppliers to commit to the FPGA path. By definition, those are the suppliers that need all the performance juice they can get, not a half-way, lower-cost solution.</p><p>There are niche applications that are hardcore CPU crunchers that may jump on board. <a href="http://en.wikipedia.org/wiki/Monte_Carlo_method">Simulation</a> and quant engines are candidates. Specialized <a href="http://en.wikipedia.org/wiki/Algorithmic_trading">algorithmic trading</a> applications, where the algorithms are sufficiently long-lived to be worth coding in FPGAs, may be another. 
But the target in financial services is not broad. It’s a couple of niche cases within the already narrow low-latency financial services market.</p><p>This awkward mismatch of motivations between Intel, FPGA providers and low-latency specialist solution vendors leaves Intel with ambitions that will be challenging to convert to meaningful market share. It&#8217;s telling that AMD’s been at this for a while and has not made noteworthy progress thus far.</p><p>Please comment if you have opinions on how this may play out for Intel (or AMD).</p> ]]></content:encoded> <wfw:commentRss>http://solacesystems.com/blog/technology/hardware/intel-promoting-on-board-fpgas-to-address-low-latency-financial-market/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> </channel> </rss>
