<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>RenderingPipeline</title>
	<atom:link href="http://renderingpipeline.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://renderingpipeline.com</link>
	<description>from geometry to pixels</description>
	<lastBuildDate>Sun, 21 Apr 2013 16:58:50 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>A look at the Bayer Pattern</title>
		<link>http://renderingpipeline.com/2013/04/a-look-at-the-bayer-pattern/</link>
		<comments>http://renderingpipeline.com/2013/04/a-look-at-the-bayer-pattern/#comments</comments>
		<pubDate>Sun, 21 Apr 2013 16:58:50 +0000</pubDate>
		<dc:creator>Robert</dc:creator>
				<category><![CDATA[Image Processing]]></category>
		<category><![CDATA[Bayer Pattern]]></category>
		<category><![CDATA[RAW]]></category>

		<guid isPermaLink="false">http://renderingpipeline.com/?p=789</guid>
		<description><![CDATA[Have you ever wondered what a &#8216;raw&#8217; image file from a camera looks like? I have, so I tried to visualise one. Contrary to popular belief, you can actually do that quite easily&#8230; Most digital cameras can not only store the images as JPEGs, but also &#8216;raw&#8217;, in a format that normally has to be [...]]]></description>
				<content:encoded><![CDATA[<p>Have you ever wondered what a &#8216;raw&#8217; image file from a camera looks like? I have, so I tried to visualise one. Contrary to popular belief, you can actually do that quite easily&#8230;</p>
<p>Most digital cameras can store images not only as JPEGs but also as &#8216;raw&#8217; files, in a format that normally has to be developed before it can be viewed. In fact, every camera performs this development step internally before saving a JPEG, even cameras that can&#8217;t store the raw data themselves, like smartphones. Almost all digital cameras, from phones up to DSLRs, have a single sensor that can only detect the brightness of the incoming light. To get a coloured image, a per-pixel colour filter sits in front of the sensor, so that each pixel captures only red, green or blue light. The arrangement of these filters can vary, but most resemble the so-called Bayer pattern, invented by Bryce Bayer in 1976.</p>
<p><strong>Visualising the Bayer pattern</strong></p>
<p>The three colours are arranged in a 2&#215;2 pattern in which green appears twice as often as the other two colours, because the human eye is most sensitive to green. The arrangement is depicted below:</p>
<div id="attachment_791" class="wp-caption alignnone" style="width: 370px"><a href="http://renderingpipeline.com/wp-content/uploads/2013/04/bayer-pattern.gif"><img class="size-full wp-image-791" alt="RGGB Bayer Pattern" src="http://renderingpipeline.com/wp-content/uploads/2013/04/bayer-pattern.gif" width="360" height="300" /></a><p class="wp-caption-text">RGGB Bayer Pattern</p></div>
<p>Note that the pattern doesn&#8217;t have to start with red, but it did in all cameras whose raw data I could check. Instead of combining each RGGB block into one pixel, the missing colours are reconstructed at every sensor element to get a higher resolution out of the sensor, so the sensor above would result in 12 by 10 RGB pixels. So, yes: your 10-megapixel camera only has 5 million green, 2.5 million red and 2.5 million blue sensor elements, and two thirds of the colour values in the resulting image are interpolated and were never actually captured!</p>
<p>So what does the data that comes out of the sensor look like? I took the photo below (as raw) and used <a href="http://www.libraw.org" target="_blank">libraw</a> to decode it. Luckily, libraw comes with a sample called &#8216;unprocessed_raw&#8217; which can dump the sensor data and which served as my starting point.</p>
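<p>To make the layout and the numbers concrete, here is a minimal C sketch (not the code used for this post) that returns the filter colour of a sensor element and counts the elements per colour. It assumes an RGGB pattern that starts with red in the top-left corner; the names bayer_color and bayer_counts are made up for this illustration.</p>

```c
#include <stdint.h>

/* Filter colours of a Bayer colour filter array. */
enum bayer_channel { BAYER_RED = 0, BAYER_GREEN = 1, BAYER_BLUE = 2 };

/* Filter colour in front of sensor element (x, y), x and y >= 0,
 * assuming an RGGB layout that starts with red in the top-left corner:
 *   row 0: R G R G ...
 *   row 1: G B G B ...                                              */
static enum bayer_channel bayer_color(int x, int y)
{
    int odd_x = x & 1, odd_y = y & 1;
    if (!odd_x && !odd_y) return BAYER_RED;   /* even column, even row */
    if (odd_x && odd_y)   return BAYER_BLUE;  /* odd column, odd row   */
    return BAYER_GREEN;                       /* the two mixed cases   */
}

/* Count the sensor elements of each colour on a width x height sensor. */
static void bayer_counts(int width, int height, uint64_t counts[3])
{
    counts[BAYER_RED] = counts[BAYER_GREEN] = counts[BAYER_BLUE] = 0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            counts[bayer_color(x, y)]++;
}
```

<p>For the 12 by 10 sensor in the figure this yields 30 red, 60 green and 30 blue elements; for a 4000 by 2500 sensor (10 megapixels) it yields 2.5 million red, 5 million green and 2.5 million blue elements.</p>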
<div id="attachment_792" class="wp-caption alignnone" style="width: 370px"><a href="http://renderingpipeline.com/wp-content/uploads/2013/04/testimage.jpg"><img class="size-full wp-image-792" alt="test image " src="http://renderingpipeline.com/wp-content/uploads/2013/04/testimage.jpg" width="360" height="238" /></a><p class="wp-caption-text">Converted in Lightroom, no adjustments were made.</p></div>
<p>What comes out of libraw&#8217;s &#8216;unprocessed_raw&#8217; sample just looks black. This is because the sensor data gets saved as a 16-bit grey image, but the camera might not have a 16-bit analog-to-digital converter, so the values only occupy the lower part of the range. The pixel values therefore had to be shifted up, in this case by 2 places, as the camera was only set to 12 bit. Below you can see a small part of the image developed in Adobe Lightroom 4 and the same part of the original sensor data. On the right, I coloured the sensor data with the colours they represent (the colour of the filter in front of each pixel).</p>
<div id="attachment_794" class="wp-caption alignnone" style="width: 664px"><a href="http://renderingpipeline.com/wp-content/uploads/2013/04/bayer_eye.jpg"><img class="size-full wp-image-794" alt="Resized 200%" src="http://renderingpipeline.com/wp-content/uploads/2013/04/bayer_eye.jpg" width="654" height="164" /></a><p class="wp-caption-text">Resized 200%. You might have to click on the image to see the pattern better in full resolution.</p></div>
<p>The image is quite dark and green is very dominant. To get the true colours out of the raw we would have to rescale the colour channels and find a proper white balance.</p>
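<p>Rescaling the channels can be sketched as one gain per channel. For this post I only matched Lightroom by eye; deriving the gains from a neutral grey patch, as below, is just one common approach and an assumption of this sketch, as are the names gains_from_grey and white_balance.</p>

```c
#include <stdint.h>

/* Per-channel gains that map a grey reference patch (average raw R, G, B)
 * to neutral grey, normalised so that green keeps a gain of 1.          */
static void gains_from_grey(const uint16_t grey[3], float gains[3])
{
    for (int c = 0; c < 3; ++c)
        gains[c] = grey[c] ? (float)grey[1] / (float)grey[c] : 1.0f;
}

/* Simple white balance: scale each channel by its (non-negative) gain,
 * round to the nearest integer and clamp to the 16-bit range.          */
static void white_balance(uint16_t rgb[3], const float gains[3])
{
    for (int c = 0; c < 3; ++c) {
        float v = (float)rgb[c] * gains[c];
        rgb[c] = v >= 65535.0f ? (uint16_t)65535 : (uint16_t)(v + 0.5f);
    }
}
```

<p>A grey patch with raw values (500, 1000, 250) gives the gains (2, 1, 4), and applying them turns that patch into (1000, 1000, 1000).</p>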
<p><strong>Getting the missing colours back</strong></p>
<p>But first, let&#8217;s see how the missing two thirds of the data can be restored. The simplest way is to take the value from the nearest sample of the required colour: e.g. for a pixel that already has a red value, take the blue from the pixel diagonally below and to the right, and the green from the pixel directly below. This nearest neighbour filtering leads to the coloured result below (left). Now that the image is very roughly restored, the colours can be adjusted. The raw file stores additional information for that (e.g. the white balance set in the camera), but here I just tried to roughly match the output from Lightroom (middle).</p>
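<p>A minimal C sketch of this nearest neighbour reconstruction could look like the following. It assumes the RGGB layout from above, an even image width and height, and raw values stored row by row; rggb_channel and demosaic_nearest are hypothetical names, and this is not the code behind the figures.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Filter colour at (x, y) for an RGGB pattern: 0 = red, 1 = green, 2 = blue. */
static int rggb_channel(int x, int y)
{
    if (!(x & 1) && !(y & 1)) return 0;
    if ((x & 1) && (y & 1))   return 2;
    return 1;
}

/* Nearest neighbour demosaicing: every pixel takes its missing colours from
 * the samples of its own 2x2 RGGB block. For the red pixel this means green
 * from directly below and blue from diagonally below-right.
 * raw: width*height samples; rgb: width*height*3 output values.          */
static void demosaic_nearest(const uint16_t *raw, int width, int height,
                             uint16_t *rgb)
{
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int bx = x & ~1, by = y & ~1;  /* top-left (red) element of the block */
            uint16_t *out = rgb + 3 * ((size_t)y * width + x);
            out[0] = raw[(size_t)by * width + bx];            /* red  */
            out[2] = raw[(size_t)(by + 1) * width + bx + 1];  /* blue */
            /* Green pixels keep their own sample; red and blue pixels use
             * the green element below the block's red element.          */
            out[1] = rggb_channel(x, y) == 1
                         ? raw[(size_t)y * width + x]
                         : raw[(size_t)(by + 1) * width + bx];
        }
    }
}
```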
<div id="attachment_795" class="wp-caption alignnone" style="width: 664px"><a href="http://renderingpipeline.com/wp-content/uploads/2013/04/bayer_nearest_bilinear.jpg"><img class="size-full wp-image-795" alt="Left to right: Nearest neighbour interpolation without and with colour correction, bilinear filtering with colour correction." src="http://renderingpipeline.com/wp-content/uploads/2013/04/bayer_nearest_bilinear.jpg" width="654" height="164" /></a><p class="wp-caption-text">Left to right: Nearest neighbour interpolation without and with colour correction, bilinear filtering with colour correction (resized 200%).</p></div>
<p>A slightly better way to reconstruct the missing colours is to average the surrounding samples instead of just picking one, i.e. simple bilinear filtering. This is shown on the right (also colour adjusted). As you can see, this already improves the highlight on the eye dramatically.</p>
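<p>Bilinear demosaicing can be sketched in the same style: a pixel keeps the sample it captured, and each missing colour becomes the average of the samples of that colour in its 3x3 neighbourhood. For an RGGB pattern this averages the four direct neighbours for green, the two horizontal or vertical neighbours for red or blue at a green pixel, and the four diagonal neighbours for red at a blue pixel (and vice versa). Again, this is a hypothetical sketch assuming an image of at least 2x2 pixels, not the implementation behind the figure.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Filter colour at (x, y) for an RGGB pattern: 0 = red, 1 = green, 2 = blue. */
static int rggb_channel(int x, int y)
{
    if (!(x & 1) && !(y & 1)) return 0;
    if ((x & 1) && (y & 1))   return 2;
    return 1;
}

/* Bilinear demosaicing: keep the captured sample, average each missing
 * colour over the same-coloured samples in the 3x3 neighbourhood
 * (neighbours outside the image are skipped). Requires width, height >= 2. */
static void demosaic_bilinear(const uint16_t *raw, int width, int height,
                              uint16_t *rgb)
{
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            uint32_t sum[3] = {0, 0, 0};
            unsigned n[3] = {0, 0, 0};
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    int sx = x + dx, sy = y + dy;
                    if (sx < 0 || sy < 0 || sx >= width || sy >= height)
                        continue;
                    int c = rggb_channel(sx, sy);
                    sum[c] += raw[(size_t)sy * width + sx];
                    n[c]++;
                }
            }
            uint16_t *out = rgb + 3 * ((size_t)y * width + x);
            int own = rggb_channel(x, y);
            for (int c = 0; c < 3; ++c)  /* rounded integer average */
                out[c] = (c == own) ? raw[(size_t)y * width + x]
                                    : (uint16_t)((sum[c] + n[c] / 2) / n[c]);
        }
    }
}
```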
<p>If the same colour adjustments (in fact just a curve in Photoshop) are applied to the coloured Bayer pattern we can visualise it even better:</p>
<div id="attachment_797" class="wp-caption alignnone" style="width: 430px"><a href="http://renderingpipeline.com/wp-content/uploads/2013/04/bayer_pattern_rose.jpg"><img class="size-full wp-image-797" alt="Left: from Lightroom, right the coloured Bayer pattern. Resized 300%." src="http://renderingpipeline.com/wp-content/uploads/2013/04/bayer_pattern_rose.jpg" width="420" height="144" /></a><p class="wp-caption-text">Left: from Lightroom, right the coloured Bayer pattern. Resized 300%.</p></div>
<p>We can clearly see the dominance of the red coloured pixels on the rose, the green ones on the leaves and the mixture of all colours on the wooden table.</p>
<p><strong>Results from the pros</strong></p>
<p>Even the bilinear reconstruction is far from ideal. There are lots of algorithms that handle Bayer demosaicing better (also spelled demosaicking, if you want to google the details), and as I didn&#8217;t want to implement all of them, I switched to existing software: the already mentioned Lightroom and <a href="http://rawtherapee.com" target="_blank">RAWTherapee</a>, which has the nice feature of letting you choose the algorithm for Bayer demosaicing.</p>
<div id="attachment_799" class="wp-caption alignnone" style="width: 770px"><a href="http://renderingpipeline.com/wp-content/uploads/2013/04/bayer_reconstruction_compare.png"><img class=" wp-image-799" alt="bayer_reconstruction_compare" src="http://renderingpipeline.com/wp-content/uploads/2013/04/bayer_reconstruction_compare.png" width="760" height="236" /></a><p class="wp-caption-text">Left to right: Nearest neighbour, bilinear, RAWTherapee &#8216;fast&#8217; &amp; &#8216;amaze&#8217;, Lightroom. Resized 200%, click the image to see it in full resolution.</p></div>
<p>Here we can faintly see the fine structure of the rose in the images reconstructed by Lightroom and by RAWTherapee with different algorithms. My own quick hack did not reconstruct that structure but only created a bit of noise. As you can now see, the rose is actually an artificial flower, so the structure is not an artefact. No additional sharpening was used in any image. Below are the same parts of the image with enhanced contrast and sharpening applied:</p>
<div id="attachment_801" class="wp-caption alignnone" style="width: 770px"><a href="http://renderingpipeline.com/wp-content/uploads/2013/04/bayer_reconstruction_compare_sharp.png"><img class="size-full wp-image-801" alt="Increased contrast and sharpened." src="http://renderingpipeline.com/wp-content/uploads/2013/04/bayer_reconstruction_compare_sharp.png" width="760" height="236" /></a><p class="wp-caption-text">Increased contrast and sharpened, click on the image to see it in full resolution.</p></div>
<p>The visibility of the fabric structure is highly dependent on the algorithm used. All images so far were captured with a camera that has an optical low-pass filter, which is intended to reduce the aliasing that demosaicing the Bayer pattern can introduce. However, the filter also reduces the sharpness of the image. The images below were taken with a camera that has 50% more sensor pixels and no optical low-pass filter. But again, the quality differences were mostly a result of the chosen demosaicing technique.</p>
<div id="attachment_802" class="wp-caption alignnone" style="width: 890px"><a href="http://renderingpipeline.com/wp-content/uploads/2013/04/bayer_reconstruction_compare_nofilter.png"><img class="size-full wp-image-802" alt="Same scene, different sensor: 50% higher resolution and no low-pass filter." src="http://renderingpipeline.com/wp-content/uploads/2013/04/bayer_reconstruction_compare_nofilter.png" width="880" height="378" /></a><p class="wp-caption-text">Same scene, different sensor: 50% higher resolution and no low-pass filter.</p></div>
<p>The only moiré-like patterns I found were in the images from the camera with the low-pass filter (though with a low-quality demosaicing algorithm). So I assume that better algorithms can avoid moiré at least as well as optical filters can. &#8216;<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.141.634" target="_blank">Image Demosaicing: A Systematic Survey</a>&#8217; by Li et al. also shows the great influence of the demosaicing technique on moiré.</p>
<div id="attachment_803" class="wp-caption alignnone" style="width: 990px"><a href="http://renderingpipeline.com/wp-content/uploads/2013/04/bayer_reconstruction_compare_2.png"><img class="size-full wp-image-803" alt="Nearest neighbour, bilinear, fast and amaze from RAWTherapee, and Lightroom at 400%" src="http://renderingpipeline.com/wp-content/uploads/2013/04/bayer_reconstruction_compare_2.png" width="980" height="352" /></a><p class="wp-caption-text">Nearest neighbour, bilinear, &#8216;fast&#8217; and &#8216;amaze&#8217; from RAWTherapee and Lightroom, 400%. Click on the image to see it in full resolution; otherwise some artefacts might not be visible!</p></div>
<p>This last example shows that fine structures can be challenging for demosaicing: the algorithms used by RAWTherapee seem to add some sharpening that goes wrong here, as can be seen from the few dark and bright pixels. The bilinear reconstruction can hardly preserve the structure of the fabric.</p>
<p><strong>Recap</strong></p>
<p>As it turns out, it&#8217;s actually not too hard to access the raw sensor data of a DSLR and visualise it. A basic reconstruction of the full colour data is hacked together quite quickly, but it is nowhere near a match for the existing implementations (no surprise here). If you own a DSLR, I would recommend shooting in RAW mode, as there are wide differences even in the reconstruction quality of the Bayer pattern (in addition to the higher bit depth and the avoidance of JPEG artefacts!). Who knows what is implemented inside the camera, and what awesome new algorithms will be invented in the future that could improve images you have already shot even more?</p>
<p><strong>Other ways to capture an image</strong></p>
<p>While most cameras use the Bayer pattern, this isn&#8217;t the only way to capture an image of course:</p>
<ul>
<li><span style="line-height: 16px;">You can arrange the colour filter array in a different order.</span></li>
<li>You can add other colour filters than just red, green and blue, e.g. add yellow, white or cyan.</li>
<li>You can capture red, green and blue in one sensor (Foveon).</li>
<li>You can use three separate sensors &#8211; if you can manage to align them precisely, a mechanical problem that gets more and more challenging the higher the resolution gets.</li>
<li>You can even capture the colours one after another by placing removable colour filters in front of a sensors &#8211; sounds impractical? This is used by NASA for some spacecrafts and landers&#8230;</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://renderingpipeline.com/2013/04/a-look-at-the-bayer-pattern/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MacOS X OpenGL driver status (10.8.3)</title>
		<link>http://renderingpipeline.com/2013/03/macos-x-opengl-driver-status-10-8-3/</link>
		<comments>http://renderingpipeline.com/2013/03/macos-x-opengl-driver-status-10-8-3/#comments</comments>
		<pubDate>Sun, 24 Mar 2013 19:02:45 +0000</pubDate>
		<dc:creator>Robert</dc:creator>
				<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[Apple]]></category>
		<category><![CDATA[driver status]]></category>
		<category><![CDATA[MacOS X]]></category>

		<guid isPermaLink="false">http://renderingpipeline.com/?p=731</guid>
		<description><![CDATA[Apple&#8217;s latest software patch for MacOS X 10.8 (Mountain Lion, 10.8.3) fixed some issues related to OpenGL. To recap, my currently known bugs and issues are these: Wrong values from uniform structs (bug id #10518401) - Fixed Wrong names for uniforms in uniform blocks (bug id #11335828) - Still exists (this bug was reported in April 2012) [...]]]></description>
				<content:encoded><![CDATA[<p>Apple&#8217;s latest software patch for MacOS X 10.8 (Mountain Lion, 10.8.3) fixed some issues related to OpenGL.</p>
<p>To recap, my currently <a title="MacOS X OpenGL driver bugs" href="http://renderingpipeline.com/2012/07/macos-x-opengl-driver-bugs/">known bugs</a> and issues are these:</p>
<ul>
<li>Wrong values from uniform structs (bug id #10518401) &#8211; <strong>Fixed</strong></li>
<li>Wrong names for uniforms in uniform blocks (bug id #11335828) &#8211; <strong>Still exists</strong> (this bug was reported in April 2012)</li>
<li>Rendering to an FBO without fragment data location 0 is not possible (bug id #11826727) &#8211; <strong>Still exists</strong> (this bug was reported in July 2012)</li>
<li>Uniform block layout qualifier ‘packed’ results in GLSL compile errors (bug id #11884110) &#8211; <strong>Fixed</strong></li>
<li>Default layouts of uniform blocks can’t be changed at global scope (bug id #11884641) &#8211; <strong>Fixed</strong></li>
<li>GLSL noperspective varying interpolation doesn&#8217;t work on Intel HD4000 (bug id <a title="Noperspective varying interpolation bug on MacOS X" href="http://renderingpipeline.com/2012/11/noperspective-varying-interpolation-bug-on-macos-x/">#12728578</a>) &#8211; <strong>Fixed</strong></li>
<li>If a scissor test is active with a region larger than the glViewport, and the latter is smaller than the framebuffer, rasterization is not clipped at the viewport (at least on HD4000). Bugs #12949605 &amp; #13287825 &#8211; <strong>Still exists</strong></li>
<li>After a conditional render that was not performed (because no samples passed), subsequent glClear calls are not executed until the next draw call is issued (at least on HD4000). Bug <a title="OpenGL conditional rendering bug on MacOS X" href="http://renderingpipeline.com/2013/01/opengl-conditional-rendering-bug-on-macos-x/">#12949224</a> &#8211; <strong>Still exists</strong></li>
<li>Highest <a title="MacOS X OpenGL driver status" href="http://renderingpipeline.com/2012/07/macos-x-opengl-driver-status/">OpenGL version available is 3.2</a> (not a bug, but filed as bug reports, at least as bug ids #8225498 and #8233475) &#8211; <strong>No changes here</strong></li>
</ul>
<p>In addition, MacOS X 10.8.3 adds a few <a href="http://www.geeks3d.com/20130321/osx-mountain-lion-10-8-3-update-available-new-opengl-extensions/" target="_blank">new extensions</a>, for example explicit attribute locations, a feature that had already been exposed earlier via <a title="First hints of OpenGL 4 on MacOS X?" href="http://renderingpipeline.com/2012/11/first-hints-of-opengl-4-on-macos-x/">a hack</a>.</p>
<p>So there is still much to do for Apple&#8217;s driver team&#8230;</p>
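<p>For the scissor/viewport bug listed above, one possible workaround (an untested guess, not verified on the affected drivers) is to never let the scissor rectangle extend beyond the viewport: intersect both rectangles and pass the result to glScissor. The helper below only does the rectangle math; rect_t and rect_intersect are hypothetical names.</p>

```c
/* Axis-aligned rectangle in window coordinates, in the same form as the
 * arguments of glViewport and glScissor (lower-left corner plus size).  */
typedef struct { int x, y, width, height; } rect_t;

/* Intersection of two rectangles; width/height are 0 if they don't overlap. */
static rect_t rect_intersect(rect_t a, rect_t b)
{
    int x0 = a.x > b.x ? a.x : b.x;
    int y0 = a.y > b.y ? a.y : b.y;
    int ax1 = a.x + a.width,  bx1 = b.x + b.width;
    int ay1 = a.y + a.height, by1 = b.y + b.height;
    int x1 = ax1 < bx1 ? ax1 : bx1;
    int y1 = ay1 < by1 ? ay1 : by1;
    rect_t r = { x0, y0, x1 > x0 ? x1 - x0 : 0, y1 > y0 ? y1 - y0 : 0 };
    return r;
}
```

<p>With GL_SCISSOR_TEST enabled, usage would be along the lines of rect_t s = rect_intersect(scissor, viewport); glScissor(s.x, s.y, s.width, s.height);</p>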
]]></content:encoded>
			<wfw:commentRss>http://renderingpipeline.com/2013/03/macos-x-opengl-driver-status-10-8-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>FPS vs msec/frame</title>
		<link>http://renderingpipeline.com/2013/02/fps-vs-msecframe/</link>
		<comments>http://renderingpipeline.com/2013/02/fps-vs-msecframe/#comments</comments>
		<pubDate>Sun, 17 Feb 2013 21:54:57 +0000</pubDate>
		<dc:creator>Robert</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[FPS]]></category>
		<category><![CDATA[graphics theory]]></category>

		<guid isPermaLink="false">http://renderingpipeline.com/?p=758</guid>
		<description><![CDATA[When talking about the performance of graphics applications or algorithms, you usually hear frames per second (FPS for short) quoted as the unit of measurement. However, this is wrong most of the time, and here&#8217;s why: Why we use FPS There are two ways to measure how fast an application can render the virtual world: by giving [...]]]></description>
				<content:encoded><![CDATA[<p>When talking about the performance of graphics applications or algorithms, you usually hear frames per second (FPS for short) quoted as <strong>the</strong> unit of measurement. However, this is wrong most of the time, and here&#8217;s why:</p>
<p><strong>Why we use FPS</strong></p>
<p>There are two ways to measure how fast an application can render the virtual world: by giving the time it takes to render a frame, or by counting the number of frames that get rendered per second. It may seem strange that the latter is more common, but it makes sense once you know that the brain interprets a sequence of images as smooth motion if they are shown fast enough. As cinema shows us 24 images per second and TV 25 (PAL) or 29.97 (NTSC), any game that renders more frames per second is fast enough, right? Well, this isn&#8217;t exactly right, and I will come to that later, but it demonstrates why FPS is a nice way to present the rendering speed of a whole system (e.g. a game) to the user: they only have to compare the given number with a fixed value they have learned to be the magic threshold where a sequence of images begins to feel like fluid motion.</p>
<p>But instead of memorizing 30 or 60 FPS as the magic threshold, you could just as well memorize 33.3 or 16.7 milliseconds as the magic rendering time. As I will show next, talking about milliseconds instead of FPS has some advantages.</p>
<p><strong>Talking about subsystems</strong></p>
<p>FPS doesn&#8217;t work anymore when we start talking about parts of the system instead of the whole renderer. Let me give you an example: let&#8217;s say you read a paper about a cool new SSAO algorithm which claims to achieve 200 FPS with a simple scene. Our game targets a full 60 FPS, so it sounds like there is plenty of performance left. Now let&#8217;s look at the rendering times: the hypothetical SSAO demo took 5 msec to render each frame, and we have to render a full game in 16.7 msec. That&#8217;s 30% of our budget for one post-processing effect! But maybe it&#8217;s not that bad: those 5 msec cover both the post-processing effect and a simple scene whose overhead we don&#8217;t know! If this information had been given in FPS, it would say something like: &#8216;Without post-processing: 2000 FPS, with post-processing: 200 FPS&#8217;. Then you would think &#8216;One order of magnitude slower? No way this would work in my engine!&#8217;. But converted to milliseconds, this reveals that the overhead of the simple scene was 0.5 msec and the effect costs 4.5 msec. This way it&#8217;s much clearer whether the effect fits into your budget or not. If the demonstration used a complex scene, the numbers could look like this: without SSAO 39.2 FPS, with SSAO 33.3 FPS. &#8216;Just 17% overhead, great! My engine is running at 72 FPS, my goal is 60 FPS, so I have enough performance to spare!&#8217;. Converting to msec reveals that the test application used up 25.5 msec and the post-process (again) 4.5 msec. Your app renders one frame in 13.9 msec, and adding 4.5 gives you 18.4 &#8211; too bad, you are over budget (~54 FPS).</p>
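<p>The conversions behind these numbers are easy to write down. The following C sketch (function names are made up for illustration) converts between FPS and msec/frame and derives the cost of an effect from two FPS measurements, assuming that the effect&#8217;s cost simply adds to the frame time.</p>

```c
/* Convert between frames per second and milliseconds per frame. */
static double fps_to_ms(double fps) { return 1000.0 / fps; }
static double ms_to_fps(double ms)  { return 1000.0 / ms; }

/* Cost of an effect in msec, derived from two FPS measurements of the same
 * demo: one without and one with the effect enabled.                     */
static double effect_cost_ms(double fps_without, double fps_with)
{
    return fps_to_ms(fps_with) - fps_to_ms(fps_without);
}

/* Non-zero if adding extra_ms to the current frame time still meets the
 * frame time budget implied by target_fps.                             */
static int fits_budget(double frame_ms, double extra_ms, double target_fps)
{
    return frame_ms + extra_ms <= fps_to_ms(target_fps);
}
```

<p>With the numbers from above, effect_cost_ms(2000, 200) gives 4.5 msec, effect_cost_ms(39.2, 33.3) gives about 4.5 msec, and fits_budget(fps_to_ms(72), 4.5, 60) is false, because 13.9 + 4.5 msec exceeds the 16.7 msec budget.</p>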
<p>It gets worse when we talk about adding multiple effects: Effect A renders at 200FPS, effect B at 100FPS and effect C at 500FPS so all three combined will render at&#8230;? You have to convert to msec/frame to calculate that anyway, so why aren&#8217;t we talking about msec per effect to begin with?</p>
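<p>Answering that question means going through msec anyway. A small C sketch (combined_fps is a hypothetical name), assuming the effects run one after another and their costs simply add up, with no shared overhead and no overlap:</p>

```c
/* Combined frame rate of several effects that were each measured on their
 * own in FPS: convert each to msec, add the times, convert back.          */
static double combined_fps(const double *fps, int count)
{
    double total_ms = 0.0;
    for (int i = 0; i < count; ++i)
        total_ms += 1000.0 / fps[i];
    return 1000.0 / total_ms;
}
```

<p>For effects A, B and C this adds 5 + 10 + 2 = 17 msec, i.e. roughly 59 FPS.</p>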
<p>The bottom line is this: It&#8217;s much more intuitive to handle timings in what they are &#8211; the time to calculate something.</p>
<p><strong>Varying rendering speed</strong></p>
<p>Let&#8217;s say your engine is too slow; you play around with various settings, and this is what you find out: with all effects active your game runs at just 50 FPS, and without the shadow-map creation it runs at 66.7 FPS, so you decide to recreate the shadow map only every third frame (you&#8217;re ok with the resulting artifacts as long as the user gets smooth 60 FPS). Even though you have 60 FPS now, it doesn&#8217;t feel smooth at all! Two frames take 15 msec to render and the third one 20. I once worked with a rendering system that didn&#8217;t feel as responsive as we expected from a frame rate constantly (slightly) above 60 FPS. Further investigation revealed that some calculations were only triggered every few frames (one of the shadow maps every second, other systems at other intervals). I plotted the timings per frame and they varied a lot: the workload was quite poorly balanced across frames.</p>
<p>How would you even write down such varying rendering times in FPS? &#8216;We have an average of 60FPS but N% of the frames have 300FPS while M% have 20-30FPS&#8230;&#8217;?</p>
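<p>Reporting frame times instead makes such spikes easy to spot. A minimal C sketch (the names are hypothetical) that summarises a non-empty list of per-frame times by their average and their worst case:</p>

```c
/* Summary of a sequence of frame times: the average alone hides spikes. */
typedef struct {
    double avg_ms;   /* mean frame time              */
    double max_ms;   /* slowest frame                */
    double avg_fps;  /* frame rate implied by avg_ms */
    double min_fps;  /* frame rate implied by max_ms */
} frame_stats;

/* Requires count > 0 and positive frame times. */
static frame_stats frame_time_stats(const double *ms, int count)
{
    frame_stats s = { 0.0, 0.0, 0.0, 0.0 };
    for (int i = 0; i < count; ++i) {
        s.avg_ms += ms[i];
        if (ms[i] > s.max_ms) s.max_ms = ms[i];
    }
    s.avg_ms /= count;
    s.avg_fps = 1000.0 / s.avg_ms;
    s.min_fps = 1000.0 / s.max_ms;
    return s;
}
```

<p>For the shadow-map example, the frame times 15, 15, 20, 15, 15, 20 msec average 16.7 msec (60 FPS), while the worst frame takes 20 msec, which corresponds to 50 FPS.</p>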
<p><strong>Latency</strong></p>
<p>Latency is a big topic: how long does it take your system from the moment you press a button until the resulting action is displayed on the screen? There are a lot of factors, like input latency, rendering time, the latency of the driver (from buffering commands for whole frames), additional buffering in your TFT (TVs are even worse, as they can perform some post-processing, e.g. 25 to 100Hz upscaling, which is why they often have a &#8216;gaming mode&#8217; that switches this off), etc. All of these timings can be given in milliseconds, and so should the rendering time! In the context of reducing the whole system latency, rendering times below 16 msec (and thus above 60 FPS) can make sense even if the display is only capable of showing 60 images per second.</p>
<p>If your game has a high latency from input to the screen or your rendering time varies a lot, your average FPS count will tell me nothing about how smooth the experience is.</p>
<p><strong>tl;dr</strong></p>
<p>Don&#8217;t measure your graphics performance in FPS but in milliseconds per frame/effect.</p>
]]></content:encoded>
			<wfw:commentRss>http://renderingpipeline.com/2013/02/fps-vs-msecframe/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Adding analog inputs to your graphics app</title>
		<link>http://renderingpipeline.com/2013/01/adding-analog-inputs-to-your-graphics-app/</link>
		<comments>http://renderingpipeline.com/2013/01/adding-analog-inputs-to-your-graphics-app/#comments</comments>
		<pubDate>Sun, 27 Jan 2013 14:41:51 +0000</pubDate>
		<dc:creator>Robert</dc:creator>
				<category><![CDATA[Code Bits]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Midi]]></category>
		<category><![CDATA[Prototype]]></category>

		<guid isPermaLink="false">http://renderingpipeline.com/?p=526</guid>
		<description><![CDATA[When prototyping graphics applications it doesn&#8217;t take long until you want to control some parameters during runtime. Using the keyboard is often the first candidate as getting keystrokes is very easy for most toolkits used in graphics like GLFW or GLUT. But it doesn&#8217;t take long until the number of keys is so large that [...]]]></description>
				<content:encoded><![CDATA[<p>When prototyping graphics applications it doesn&#8217;t take long until you want to control some parameters during runtime. Using the keyboard is often the first candidate as getting keystrokes is very easy for most toolkits used in graphics like <a href="http://www.glfw.org" target="_blank">GLFW</a> or GLUT.</p>
<p>But it doesn&#8217;t take long until the number of keys is so large that it becomes confusing. There is also no direct feedback of the currently set values (and lots of printf calls don&#8217;t scale well either). So the next step is adding a simple UI. Some use Qt for this, as it&#8217;s a nice GUI toolkit and works cross-platform (I did so for many projects). Qt is great when your prototype turns into a full application (unless you are building a game), but it may require too much glue code for simple prototypes.</p>
<p>More limited, but much simpler to include in existing projects and very quick to set up for simple tasks, is <a href="http://www.antisphere.com/Wiki/tools:anttweakbar" target="_blank">AntTweakBar</a>. After writing a couple of lines of glue code, exposing one more variable for testing is a matter of one line of code!</p>
<p>In case you want some haptic feedback, or adding a UI is not an option (maybe you don&#8217;t want to occlude anything in your game and need the keyboard and mouse for the game controls), adding a MIDI device might be the way to go. <a href="http://www.music.mcgill.ca/~gary/rtmidi/">rtMidi</a> is a small cross-platform library that helps you get the MIDI commands from your device. A simple MIDI controller with nine analog sliders and knobs will set you back about 50€. Each control gives you a 7-bit value for its current position.</p>
<p>One downside of this setup is that the device only sends the values of its controls when they change, which means that the initial state is not reported when your app starts. So you might want to reset all physical sliders, or move each one slightly after your application has started, so that the current control values get sent to your app.</p>
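<p>Most slider and knob boxes report their positions as MIDI Control Change messages: a status byte from 0xB0 to 0xBF (the low four bits are the channel), followed by the controller number and the 7-bit value. Assuming your controller does the same, the bytes of each message received through rtMidi could be decoded with a plain C helper like the one below; parse_midi_cc and midi_to_range are hypothetical names and not part of rtMidi.</p>

```c
#include <stdbool.h>
#include <stddef.h>

/* Decode a 3-byte MIDI Control Change message: status byte 0xB0..0xBF
 * (channel in the low nibble), controller number, value (both 0..127).
 * Returns false for anything that is not a Control Change message.     */
static bool parse_midi_cc(const unsigned char *msg, size_t len,
                          int *channel, int *controller, int *value)
{
    if (len < 3 || (msg[0] & 0xF0) != 0xB0)
        return false;
    *channel    = msg[0] & 0x0F;
    *controller = msg[1] & 0x7F;
    *value      = msg[2] & 0x7F;
    return true;
}

/* Map a 7-bit control value (0..127) linearly onto [lo, hi]. */
static float midi_to_range(int value, float lo, float hi)
{
    return lo + (hi - lo) * (float)value / 127.0f;
}
```

<p>For example, the bytes 0xB0 0x07 0x40 decode to channel 0, controller 7, value 64, which midi_to_range(64, 0.0f, 1.0f) maps to about 0.5.</p>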
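<p>Because the initial positions are unknown, it can help to keep a default value per control and remember whether the device has reported anything yet, e.g. to remind the user to move that slider once. A hypothetical C sketch of such per-control state:</p>

```c
#include <stdbool.h>

/* State of one physical control. Until the device has sent a position, the
 * app uses a default value and can show that the control is not synced.  */
typedef struct {
    float value;   /* value used by the application              */
    bool  synced;  /* true once the device has reported a position */
} control_state;

static void control_init(control_state *c, float default_value)
{
    c->value = default_value;
    c->synced = false;
}

/* Apply a 7-bit MIDI value (0..127), mapped linearly onto [lo, hi]. */
static void control_update(control_state *c, int midi_value, float lo, float hi)
{
    c->value = lo + (hi - lo) * (float)midi_value / 127.0f;
    c->synced = true;
}
```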
<p>The video below shows how a small MIDI board is used to adjust the lighting parameters of our beloved Stanford bunny:</p>
<p><iframe width="650" height="366" src="http://www.youtube.com/embed/j5-yKfpmbEk?feature=oembed" frameborder="0" allowfullscreen></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://renderingpipeline.com/2013/01/adding-analog-inputs-to-your-graphics-app/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>OpenGL conditional rendering bug on MacOS X</title>
		<link>http://renderingpipeline.com/2013/01/opengl-conditional-rendering-bug-on-macos-x/</link>
		<comments>http://renderingpipeline.com/2013/01/opengl-conditional-rendering-bug-on-macos-x/#comments</comments>
		<pubDate>Thu, 03 Jan 2013 10:12:57 +0000</pubDate>
		<dc:creator>Robert</dc:creator>
				<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[Apple]]></category>
		<category><![CDATA[Bug]]></category>
		<category><![CDATA[driver status]]></category>
		<category><![CDATA[Intel]]></category>
		<category><![CDATA[MacOS X]]></category>

		<guid isPermaLink="false">http://renderingpipeline.com/?p=740</guid>
		<description><![CDATA[At least on Intel HD4000 GPUs there seems to be a bug regarding conditional rendering in OpenGL on 10.8 Mountain Lion: After the conditional rendering the condition state (ignore draw calls or execute them) is &#8216;stuck&#8217; until the next draw call gets issued &#8211; this can ignore glClear calls which shouldn&#8217;t get ignored. Here&#8217;s an [...]]]></description>
				<content:encoded><![CDATA[<p>At least on Intel HD4000 GPUs there seems to be a bug regarding conditional rendering in OpenGL on 10.8 Mountain Lion: After the conditional rendering the condition state (ignore draw calls or execute them) is &#8216;stuck&#8217; until the next draw call gets issued &#8211; this can ignore glClear calls which shouldn&#8217;t get ignored.</p>
<p>Here&#8217;s an example:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="c" style="font-family:monospace;">GLuint query<span style="color: #339933;">;</span>
glGenQueries<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> <span style="color: #339933;">&amp;</span>query<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
glBeginQuery<span style="color: #009900;">&#40;</span>GL_SAMPLES_PASSED<span style="color: #339933;">,</span> query<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #666666; font-style: italic;">// draw bounding box</span>
glEndQuery<span style="color: #009900;">&#40;</span>GL_SAMPLES_PASSED<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
glBeginConditionalRender<span style="color: #009900;">&#40;</span> query<span style="color: #339933;">,</span> GL_QUERY_WAIT <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #666666; font-style: italic;">// draw actual object: if no sample of the draw call above was</span>
<span style="color: #666666; font-style: italic;">// drawn, all glClear and draw commands will get ignored</span>
glEndConditionalRender<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #666666; font-style: italic;">// now everything should get drawn again!</span>
glClear<span style="color: #009900;">&#40;</span> GL_COLOR_BUFFER_BIT <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// might also get ignored -&gt; BUG</span>
glDrawArrays<span style="color: #009900;">&#40;</span>...<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
glClear<span style="color: #009900;">&#40;</span> ... <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// now it works again</span></pre></td></tr></table></div>

<p>To be clear, glClear and glClearBuffer commands <strong>between</strong> glBeginConditionalRender and glEndConditionalRender should be ignored if the query counted no samples, but <strong>after</strong> glEndConditionalRender everything should be back to normal. At least on the stated configuration (10.8, Intel HD4000), it isn&#8217;t: glClear calls keep getting ignored until the next draw call is issued (that draw call itself works as intended, by the way). On 10.7 with a GeForce 9600M it works as intended, so I&#8217;m not sure whether this is a bug in the Intel drivers or in 10.8.</p>
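<p>The difference between the expected and the observed behaviour can be summarised as a tiny state machine. The following C sketch is a hypothetical model for illustration only: the type and function names are made up and no OpenGL calls are made. It tracks whether a glClear issued at a given point would execute according to the spec and, with <code>buggy_driver</code> set, according to the behaviour described above.</p>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the conditional-rendering state; not real GL code.
 * 'active' is true between glBeginConditionalRender and glEndConditionalRender,
 * 'query_passed' is true if the occlusion query counted at least one sample. */
typedef struct {
    bool active;
    bool query_passed;
    bool stuck_discard; /* models the behaviour seen on 10.8 / HD4000 */
} CondRenderState;

static void begin_conditional(CondRenderState *s, unsigned samples_passed) {
    assert(!s->active && "conditional render blocks cannot be nested");
    s->active = true;
    s->query_passed = samples_passed > 0;
}

/* buggy_driver = true models the bug: the discard state survives the end call. */
static void end_conditional(CondRenderState *s, bool buggy_driver) {
    assert(s->active);
    s->stuck_discard = buggy_driver && !s->query_passed;
    s->active = false;
}

/* Would a glClear issued now be executed? */
static bool clear_executes(const CondRenderState *s) {
    if (s->active)
        return s->query_passed;  /* inside the block: discarded if no samples */
    return !s->stuck_discard;    /* after the block: always true per spec,
                                    false while the bug keeps the state stuck */
}

/* A draw call after the block executes and (as observed) resets the stuck state. */
static bool draw_executes(CondRenderState *s) {
    if (s->active)
        return s->query_passed;
    s->stuck_discard = false;
    return true;
}
```

<p>With <code>buggy_driver</code> set to false, a clear after glEndConditionalRender always executes; with it set to true, the clear is dropped until the next draw call, matching the listing above.</p>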
<p>The bug is listed under bug id #12949224 and joins the list of <a title="MacOS X OpenGL driver bugs" href="http://renderingpipeline.com/2012/07/macos-x-opengl-driver-bugs/">other open issues</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://renderingpipeline.com/2013/01/opengl-conditional-rendering-bug-on-macos-x/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>GLSL Syntax Highlighting for TextWrangler &amp; BBEdit</title>
		<link>http://renderingpipeline.com/2012/12/glsl-syntax-highlighting-for-textwrangler-bbedit/</link>
		<comments>http://renderingpipeline.com/2012/12/glsl-syntax-highlighting-for-textwrangler-bbedit/#comments</comments>
		<pubDate>Sat, 01 Dec 2012 12:42:35 +0000</pubDate>
		<dc:creator>Robert</dc:creator>
				<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[GLSL]]></category>
		<category><![CDATA[Shader]]></category>
		<category><![CDATA[Syntax Highlighting]]></category>
		<category><![CDATA[TextWrangler]]></category>

		<guid isPermaLink="false">http://renderingpipeline.com/?p=722</guid>
		<description><![CDATA[I hacked together a very simple codeless language module for TextWrangler (also compatible with BBEdit) for GLSL shaders. It is much simpler than the Kate/QTCreator highlighting and only highlights GLSL version 430 (core profile) but it&#8217;s better than nothing (In case you are writing shaders for OpenGL ES 3.0, or OpenGL 3.0 &#8211; 4.2 it [...]]]></description>
				<content:encoded><![CDATA[<p>I hacked together a very simple codeless language module for TextWrangler (also compatible with BBEdit) for GLSL shaders. It is much simpler than the <a title="GLSL Syntax Highlighting for QTCreator (and Kate)" href="http://renderingpipeline.com/2012/11/glsl-syntax-highlighting-for-qtcreator-and-kate/">Kate/QTCreator highlighting</a> and only highlights GLSL version 430 (core profile), but it&#8217;s better than nothing. If you are writing shaders for OpenGL ES 3.0 or OpenGL 3.0 &#8211; 4.2, this just means that reserved keywords are highlighted as well; if your shaders target WebGL, OpenGL ES 2.0 or OpenGL 2.x, some (now deprecated) keywords won&#8217;t be highlighted.</p>
<div id="attachment_723" class="wp-caption alignnone" style="width: 745px"><a href="http://renderingpipeline.com/wp-content/uploads/2012/12/textwrangler_glsl_highlighting.png"><img class="size-full wp-image-723" title="textwrangler_glsl_highlighting" src="http://renderingpipeline.com/wp-content/uploads/2012/12/textwrangler_glsl_highlighting.png" alt="OpenGL GLSL Syntax Highlighting for TextWrangler" width="735" height="682" /></a><p class="wp-caption-text">OpenGL GLSL Syntax Highlighting for TextWrangler</p></div>
<p>To install it, download the <a title="GLSL syntax highlighting for TextWrangler" href="http://files.renderingpipeline.com/glslsyntaxhighlighting/glsl430.plist.zip">glsl430.plist</a> (unzip it) and drop it into your <strong>~/Library/Application Support/TextWrangler/Language Modules</strong> folder. After restarting TextWrangler, all .glsl .fsh .vsh .gsh .tcsh .tesh .csh .frag .vert and .geo files should get the correct highlighting. In case you prefer other extensions, you can add or remove extensions from the beginning of the plist (just edit it in a text editor) or change the settings in TextWrangler under <strong>Preferences-&gt;Languages-&gt;Custom Extension Mappings</strong>.</p>
]]></content:encoded>
			<wfw:commentRss>http://renderingpipeline.com/2012/12/glsl-syntax-highlighting-for-textwrangler-bbedit/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>GLSL Syntax Highlighting for QTCreator (and Kate)</title>
		<link>http://renderingpipeline.com/2012/11/glsl-syntax-highlighting-for-qtcreator-and-kate/</link>
		<comments>http://renderingpipeline.com/2012/11/glsl-syntax-highlighting-for-qtcreator-and-kate/#comments</comments>
		<pubDate>Sun, 25 Nov 2012 19:49:53 +0000</pubDate>
		<dc:creator>Robert</dc:creator>
				<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[GLSL]]></category>
		<category><![CDATA[OpenGL ES]]></category>
		<category><![CDATA[QTCreator]]></category>
		<category><![CDATA[Shader]]></category>
		<category><![CDATA[Syntax Highlighting]]></category>
		<category><![CDATA[WebGL]]></category>

		<guid isPermaLink="false">http://renderingpipeline.com/?p=714</guid>
		<description><![CDATA[The syntax highlighter in QTCreator for GLSL shaders is currently far from ideal: It&#8217;s basically stuck at OpenGL 2.1. If you write shaders for OpenGL 3 (and later) or OpenGL ES 3, you will see a lot of red marks indicating errors in your code simply because QTCreator doesn&#8217;t know the latest keywords. On the [...]]]></description>
				<content:encoded><![CDATA[<p>The syntax highlighter in QTCreator for GLSL shaders is currently far from ideal: It&#8217;s basically stuck at OpenGL 2.1. If you write shaders for OpenGL 3 (and later) or OpenGL ES 3, you will see a lot of red marks indicating errors in your code simply because QTCreator doesn&#8217;t know the latest keywords. On the other hand, nothing indicates deprecated keywords and it can&#8217;t distinguish between the different GLSL variants.</p>
<p>So I got tired of waiting for better support and wrote my own solution: the GLSL syntax checker in QTCreator is built-in, so it has to be deactivated first; then a syntax definition XML file in the format also used by Kate (and probably other KDE apps) can be used to define proper, modern GLSL:</p>
<ol>
<li>Go to <strong>Preferences-&gt;Environment-&gt;MIME-Types </strong>in QTCreator and look for all types associated with GLSL (<strong>application/x-glsl</strong>). The handler can&#8217;t be changed, but the file extensions can: just remove them. Don&#8217;t forget the MIME types <strong>text/x-glsl-es-frag</strong>,<strong> text/x-glsl-es-geometry</strong>,<strong> text/x-glsl-es-vert</strong>,<strong> text/x-glsl-frag</strong> and<strong> text/x-glsl-vert</strong>!</li>
<li>Add a new language syntax definition: under <strong>Preferences-&gt;Text Editor-&gt;Generic Highlighter</strong> you will see the location where QTCreator expects the XML file (~/.config/Nokia/qtcreator/generic-highlighter on OS X and Linux). Place this <a title="GLSL Syntax Highlighting for QTCreator" href="http://files.renderingpipeline.com/glslsyntaxhighlighting/glsl.xml" target="_blank">glsl.xml</a> file there.</li>
<li>Restart QTCreator.</li>
</ol>
<p>The definition has the following features:</p>
<ul>
<li>Support for desktop OpenGL up to 4.3 (with all keywords, built-in functions, constants and variables), as well as OpenGL ES 2.0 and OpenGL ES 3.0 (and WebGL, as it uses ES 2 shaders).</li>
<li>Detects the shader version per file based on the #version definition!</li>
<li>Unsupported keywords, built-in variables and functions are marked as such based on the detected GLSL version (see gif below).</li>
<li>Marks the &#8216;special parameters&#8217; inside of layout() declarations based on the GLSL version (those are not keywords according to the specs so they are not marked outside of a layout() declaration).</li>
</ul>
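<p>The per-file version detection listed above is done by the rules in the XML file. Outside of a highlighter, the same idea could look like the following C sketch; <code>parse_glsl_version</code> is a hypothetical helper, not part of QTCreator or Kate. It is a simplification: it takes the first &#8220;#version&#8221; it finds, even inside a comment, and accepts any number of spaces or tabs before the profile token.</p>

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: finds the first "#version" directive in GLSL source and
 * returns its number (0 if none is found). An optional profile token on the
 * same line ("es", "core", "compatibility") is copied into 'profile'. */
static int parse_glsl_version(const char *src, char *profile, size_t profile_len) {
    assert(src != NULL && profile != NULL && profile_len > 0);
    profile[0] = '\0';

    const char *p = strstr(src, "#version");
    if (p == NULL)
        return 0;
    p += strlen("#version");

    int version = 0, consumed = 0;
    if (sscanf(p, " %d%n", &version, &consumed) != 1)
        return 0;
    p += consumed;

    /* skip spaces and tabs only: the profile must be on the same line */
    while (*p == ' ' || *p == '\t')
        p++;

    size_t n = 0;
    while (n + 1 < profile_len && isalpha((unsigned char)p[n])) {
        profile[n] = p[n];
        n++;
    }
    profile[n] = '\0';
    return version;
}
```

<p>For a source starting with <code>#version 300 es</code> it returns 300 and writes &#8220;es&#8221; into <code>profile</code>; for <code>#version 120</code> the profile stays empty.</p>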
<p>See this comparison of the same (non functional) dummy code to get an impression:</p>
<div id="attachment_716" class="wp-caption alignnone" style="width: 666px"><a href="http://renderingpipeline.com/wp-content/uploads/2012/11/glsl_syntax_highlighting_qtcreator.gif"><img class="size-full wp-image-716" title="glsl_syntax_highlighting_qtcreator" alt="GLSL Syntax Highlighting for modern OpenGL" src="http://renderingpipeline.com/wp-content/uploads/2012/11/glsl_syntax_highlighting_qtcreator.gif" width="656" height="779" /></a><p class="wp-caption-text">Comparing the built-in syntax highlighter with this alternative. Note how keywords get crossed out if they are not supported by the #version declaration.</p></div>
<p>This highlighting does not perform a full syntax check and assumes core profile code for GLSL 1.30 and later. The detected file extensions are: *.glsl; *.vsh; *.vert; *.tcsh; *.tcs; *.tesh; *.tes; *.gsh; *.geo; *.geom; *.fsh; *.frag; *.csh; *.cs, but this list can easily be changed in line 4 of the XML. The highlighting style can be adjusted starting at line 1884 (always restart QTCreator after changing the XML).</p>
<p>The current limitations are:</p>
<ul>
<li>The shader type itself (Compute Shader, Geometry Shader etc.) does not get detected &#8211; it would be possible to create six different XML files with slightly different rules to only mark keywords useful/supported per shader stage (in case the file ending is sufficient to make this distinction).</li>
<li>Version 130 and later assume the core profile and mark compatibility built-ins as unsupported.</li>
<li>In <strong>#version 300 es</strong> for OpenGL ES 3, only one space is allowed between the &#8216;300&#8217; and the &#8216;es&#8217;.</li>
</ul>
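<p>The first limitation could be addressed per file extension, as long as the extension identifies the stage. As a hedged illustration (the enum and function are hypothetical, not part of the highlighter), this C sketch maps the extensions from the list above to a shader stage; *.glsl stays unknown because that name says nothing about the stage.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    STAGE_UNKNOWN, STAGE_VERTEX, STAGE_TESS_CONTROL, STAGE_TESS_EVAL,
    STAGE_GEOMETRY, STAGE_FRAGMENT, STAGE_COMPUTE
} ShaderStage;

/* Hypothetical mapping from file extension to shader stage. */
static ShaderStage stage_from_filename(const char *filename) {
    static const struct { const char *ext; ShaderStage stage; } table[] = {
        {"vsh", STAGE_VERTEX},        {"vert", STAGE_VERTEX},
        {"tcsh", STAGE_TESS_CONTROL}, {"tcs", STAGE_TESS_CONTROL},
        {"tesh", STAGE_TESS_EVAL},    {"tes", STAGE_TESS_EVAL},
        {"gsh", STAGE_GEOMETRY},      {"geo", STAGE_GEOMETRY},
        {"geom", STAGE_GEOMETRY},
        {"fsh", STAGE_FRAGMENT},      {"frag", STAGE_FRAGMENT},
        {"csh", STAGE_COMPUTE},       {"cs", STAGE_COMPUTE},
    };
    assert(filename != NULL);
    const char *dot = strrchr(filename, '.');
    if (dot == NULL)
        return STAGE_UNKNOWN;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; ++i)
        if (strcmp(dot + 1, table[i].ext) == 0)
            return table[i].stage;
    return STAGE_UNKNOWN; /* e.g. *.glsl: the stage can't be told from the name */
}
```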
<p>A description of the XML format can be found <a href="http://kate-editor.org/2005/03/24/writing-a-syntax-highlighting-file/" target="_blank">here</a>.</p>
<p>Download: <a href="http://files.renderingpipeline.com/glslsyntaxhighlighting/glsl.xml" target="_blank">glsl.xml</a> version 4.30 (the version number mimics the latest supported OpenGL version).</p>
<p><strong>Update 12/1/12:</strong> In case you prefer TextWrangler/BBEdit, there&#8217;s a (simpler) <a title="GLSL Syntax Highlighting for TextWrangler &amp; BBEdit" href="http://renderingpipeline.com/2012/12/glsl-syntax-highlighting-for-textwrangler-bbedit/">solution for that as well</a>.</p>
<p><strong>Update 12/17/12:</strong> A few GLSL 4.20 built-in functions were incorrectly defined as 4.30 functions (e.g. imageStore).</p>
]]></content:encoded>
			<wfw:commentRss>http://renderingpipeline.com/2012/11/glsl-syntax-highlighting-for-qtcreator-and-kate/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Noperspective varying interpolation bug on MacOS X</title>
		<link>http://renderingpipeline.com/2012/11/noperspective-varying-interpolation-bug-on-macos-x/</link>
		<comments>http://renderingpipeline.com/2012/11/noperspective-varying-interpolation-bug-on-macos-x/#comments</comments>
		<pubDate>Tue, 20 Nov 2012 07:25:36 +0000</pubDate>
		<dc:creator>Robert</dc:creator>
				<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[Apple]]></category>
		<category><![CDATA[Bug]]></category>
		<category><![CDATA[Intel]]></category>

		<guid isPermaLink="false">http://renderingpipeline.com/?p=709</guid>
		<description><![CDATA[There seems to be a bug with noperspective varyings in OpenGL 3.2 contexts on MacOS X on Intel HD4000 hardware: The varying does not get interpolated and all fragments of the triangle get the same value (similar to flat interpolation). Note that in the left screen shot the image looks similar to perspective correct interpolated texture [...]]]></description>
				<content:encoded><![CDATA[<p>There seems to be a bug with <em>noperspective</em> varyings in OpenGL 3.2 contexts on MacOS X on Intel HD4000 hardware: The varying does not get interpolated and all fragments of the triangle get the same value (similar to <em>flat</em> interpolation).</p>
<div id="attachment_710" class="wp-caption alignnone" style="width: 759px"><a href="http://renderingpipeline.com/wp-content/uploads/2012/11/noperspective_bug.jpg"><img class="size-full wp-image-710" title="noperspective_bug" src="http://renderingpipeline.com/wp-content/uploads/2012/11/noperspective_bug.jpg" alt="" width="749" height="600" /></a><p class="wp-caption-text">noperspective varying interpolation bug</p></div>
<p>Note that in the left screenshot the image looks similar to one with perspective-correct interpolated texture coordinates because the triangles are quite small in screen space, but they are in fact interpolated without perspective correction.</p>
<p>As @marcofatto noted, the bug does not occur on his NVidia GPU on 10.8, so it seems that the Intel drivers are causing the problem.</p>
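<p>To make the difference between the three interpolation modes concrete, here is a simplified one-dimensional C sketch along a single edge: <code>t</code> is the screen-space position between two vertices, and <code>w0</code>/<code>w1</code> are their clip-space w values. The function names are made up for illustration. <em>noperspective</em> interpolates linearly in screen space, the default <em>smooth</em> mode interpolates a/w and 1/w and then divides, and <em>flat</em> uses the provoking vertex&#8217;s value everywhere, which is what the buggy output resembles.</p>

```c
#include <assert.h>

/* noperspective: linear interpolation in screen space */
static float interp_noperspective(float a0, float a1, float t) {
    return a0 * (1.0f - t) + a1 * t;
}

/* smooth (the default): interpolate a/w and 1/w linearly, then divide */
static float interp_smooth(float a0, float w0, float a1, float w1, float t) {
    assert(w0 > 0.0f && w1 > 0.0f);
    float num = (a0 / w0) * (1.0f - t) + (a1 / w1) * t;
    float den = (1.0f / w0) * (1.0f - t) + (1.0f / w1) * t;
    return num / den;
}

/* flat: every fragment gets the provoking vertex's value
 * (by default the last vertex of the primitive in OpenGL) */
static float interp_flat(float a_provoking) {
    return a_provoking;
}
```

<p>With w0 = 1 and w1 = 3, halfway along the edge the noperspective value between 0 and 1 is 0.5, while the perspective-correct value is 0.25; with equal w values both modes agree.</p>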
<p><a href="http://renderingpipeline.com/wp-content/uploads/2012/11/tweet_marcofatto_noperspective_bug.jpg"><img class="alignnone size-full wp-image-711" title="tweet_marcofatto_noperspective_bug" src="http://renderingpipeline.com/wp-content/uploads/2012/11/tweet_marcofatto_noperspective_bug.jpg" alt="" width="456" height="160" /></a></p>
<p>The bug has been filed as #12728578 and joins my <a title="MacOS X OpenGL driver bugs" href="http://renderingpipeline.com/2012/07/macos-x-opengl-driver-bugs/">list of unfixed OpenGL bugs</a> on MacOS 10.8.</p>
]]></content:encoded>
			<wfw:commentRss>http://renderingpipeline.com/2012/11/noperspective-varying-interpolation-bug-on-macos-x/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Augmented Reality on a Segway</title>
		<link>http://renderingpipeline.com/2012/11/augmented-reality-on-a-segway/</link>
		<comments>http://renderingpipeline.com/2012/11/augmented-reality-on-a-segway/#comments</comments>
		<pubDate>Thu, 15 Nov 2012 13:48:54 +0000</pubDate>
		<dc:creator>Robert</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[AR]]></category>
		<category><![CDATA[Augmented Reality]]></category>
		<category><![CDATA[Computer Vision]]></category>
		<category><![CDATA[Paper]]></category>
		<category><![CDATA[Segway]]></category>

		<guid isPermaLink="false">http://renderingpipeline.com/?p=678</guid>
		<description><![CDATA[In September I was in San Francisco for a couple of days to attend the MobileHCI conference and the MobiVis workshop, but not just for fun and looking at cool new stuff, I also presented a work about mobile augmented reality. My colleagues Michael Königs, Prof. Dr. Leif Kobbelt and myself tried out if you [...]]]></description>
				<content:encoded><![CDATA[<p>In September I was in San Francisco for a couple of days to attend the <a href="http://www.mobilehci2012.org" target="_blank">MobileHCI</a> conference and the MobiVis workshop. Besides having fun and looking at <a title="Tilt Displays – A true 3D display" href="http://renderingpipeline.com/2012/09/tilt-displays-a-true-3d-display/">cool new stuff</a>, I also presented our work on mobile augmented reality. My colleagues Michael Königs, Prof. Dr. Leif Kobbelt and I tested whether you could build an augmented reality application (a game in this case) that uses only image-based methods for localisation. This means that instead of using the (inaccurate) GPS of your smartphone, you send an image to a remote server that figures out where you are and what you are looking at. This has the added bonus of working exactly the same indoors and outdoors.</p>
<div id="attachment_679" class="wp-caption alignnone" style="width: 460px"><a href="http://renderingpipeline.com/wp-content/uploads/2012/11/argame_on_segway.jpg"><img class="size-full wp-image-679" title="argame_on_segway" src="http://renderingpipeline.com/wp-content/uploads/2012/11/argame_on_segway.jpg" alt="" width="450" height="348" /></a><p class="wp-caption-text">AR gaming on a segway.</p></div>
<p>During the game you had to interact with historical figures and solve quests in a Monkey Island kind of style, but instead of clicking where you want to go, you had to actually go there, as the setting was the real city centre of Aachen. As the game was intended for tourists to discover the city, we added a Segway for faster travelling &#8211; and additional fun ;-)</p>
<p>Of course we built authoring tools, did a user study and evaluated the image-based localisation method from a technical and usability standpoint (you can find the details in the <a href="http://mobivis.labs-exit.de/AcceptedPapers/mobivis2012_Menzel_et_al.pdf" target="_blank">paper</a>). The results in short: if you want to build immersive AR apps, you should look into image-based localisation and computer vision; GPS and compass alone will never be good enough. The users loved the point&amp;click setting in the real world (and driving a Segway), and even our inexperienced, non-tech-savvy users had no problems with the localisation and gaming metaphors.</p>
<div id="attachment_680" class="wp-caption alignnone" style="width: 460px"><a href="http://renderingpipeline.com/wp-content/uploads/2012/11/argame_editor_places.jpg"><img class="size-full wp-image-680" title="argame_editor_places" src="http://renderingpipeline.com/wp-content/uploads/2012/11/argame_editor_places.jpg" alt="" width="450" height="370" /></a><p class="wp-caption-text">Our authoring tool (here with a dummy San Francisco setting; the described game was set in Aachen, Germany).</p></div>
<p>The server that determines your position wasn&#8217;t developed in this project but is based on the work of our co-workers. On YouTube you can find videos of an older tech demo:</p>
<p><iframe width="480" height="360" src="http://www.youtube.com/embed/lSMcbYSsT4Y?fs=1&#038;feature=oembed" frameborder="0" allowfullscreen></iframe></p>
<p>You can find the paper titled &#8220;<a title="A Framework for Vision-based Mobile AR Applications" href="http://mobivis.labs-exit.de/AcceptedPapers/mobivis2012_Menzel_et_al.pdf" target="_blank"><em>A Framework for Vision-based Mobile AR Applications</em></a>&#8220; on the <a href="http://mobivis.labs-exit.de/AcceptedPapers.html" target="_blank">MobiVis website</a> for further information.</p>
]]></content:encoded>
			<wfw:commentRss>http://renderingpipeline.com/2012/11/augmented-reality-on-a-segway/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Understanding the parallelism of GPUs</title>
		<link>http://renderingpipeline.com/2012/11/understanding-the-parallelism-of-gpus/</link>
		<comments>http://renderingpipeline.com/2012/11/understanding-the-parallelism-of-gpus/#comments</comments>
		<pubDate>Fri, 09 Nov 2012 22:03:34 +0000</pubDate>
		<dc:creator>Robert</dc:creator>
				<category><![CDATA[Graphics Hardware]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[Hardware]]></category>
		<category><![CDATA[Shader]]></category>
		<category><![CDATA[SIMD]]></category>

		<guid isPermaLink="false">http://renderingpipeline.com/?p=637</guid>
		<description><![CDATA[A lot of tasks in (3D) graphics are independent from each other, so the idea to parallelize those tasks is not new. But while parallel processors are nowadays very common in every desktop PC and even on newer smartphones, the way GPUs parallelize their work is quite different. Understanding the ways a GPU works can [...]]]></description>
				<content:encoded><![CDATA[<p>A lot of tasks in (3D) graphics are independent of each other, so the idea of parallelizing them is not new. But while parallel processors are nowadays very common in every desktop PC and even in newer smartphones, the way GPUs parallelize their work is quite different. Understanding how a GPU works helps to understand performance bottlenecks and is key to designing algorithms that fit the GPU architecture.</p>
<p>I will mostly focus on how the programmable parts of the GPUs are designed and less on the remaining fixed-function parts. So let&#8217;s take a look at what is needed to build a GPU for current graphics APIs:</p>
<p>We need a processor that can run arbitrary code: our shaders or compute kernels. We will focus on floating-point performance, as these operations are needed often in graphics. Hardware support for more exotic functions such as tan, sin, pow, sqrt etc. is also needed. We will end up with some logic to decode the instruction to execute, some registers, a floating-point ALU (integer operations are also needed) and a cache. If we copied the design of a multi-core CPU, we would copy&amp;paste the whole thing until the die space is filled up. OK, CPUs are not that simple, but let&#8217;s stick with this simplification for a moment.</p>
<p>In reality, modern CPU cores are quite complex: they speed up program execution by reordering independent operations on the fly (<a href="http://en.wikipedia.org/wiki/Out-of-order_execution" target="_blank">out-of-order execution</a>), they try to guess the outcome of a branch before it is evaluated to keep their deep pipelines filled (<a href="http://en.wikipedia.org/wiki/Branch_prediction" target="_blank">branch prediction</a>), and they implement parallel concepts within a single core: <a href="http://en.wikipedia.org/wiki/SIMD" target="_blank">SIMD</a> and <a href="http://en.wikipedia.org/wiki/Simultaneous_multithreading" target="_blank">simultaneous multithreading</a>. GPU cores are simpler when it comes to execution logic: they execute in order and try less to be clever. While this means that CPUs handle a single thread better, the GPU design saves a lot of die space, which can be used to add more cores, from which parallel applications benefit more. But GPUs also implement the last two tricks (SIMD and simultaneous multithreading), in even more extreme ways.</p>
<h2>Why do it once if you can do it twice?</h2>
<p>Let&#8217;s look at them individually: SIMD stands for single instruction, multiple data. One instruction is performed on a vector of operands at a time, which is why we talk about vector processors. This idea is not new; there were vector supercomputers in the 1970s (<a href="http://en.wikipedia.org/wiki/CDC_Star-100" target="_blank">CDC Star-100</a> or <a href="http://en.wikipedia.org/wiki/Texas_Instruments_ASC" target="_blank">Texas Instruments ASC</a>). The trick is that an instruction only has to be decoded once, and multiple ALUs can then perform the work on multiple data elements at once. This design was used for graphics in 1990 with the Intel i750: this chip could store two 8-bit values in one 16-bit register and perform the same operation on both in parallel.</p>
<p>Around the same time, in 1989, Intel also introduced a CPU with a new instruction set, the Intel i860 with a <a href="http://en.wikipedia.org/wiki/VLIW" target="_blank">VLIW</a> architecture, which could also run SIMD-like instructions. This chip actually ended up on a &#8216;graphics card&#8217;, the <a href="http://www1.cs.columbia.edu/~ravir/6160/papers/p109-akeley.pdf" target="_blank">RealityEngine</a>, in 1993, doing the job of what we would now call a vertex shader.</p>
<p>Even though these chips weren&#8217;t very successful, the technique was added to the Pentium MMX in 1996, which added vector integer operations on 64-bit wide registers (each could hold e.g. eight 8-bit values or four 16-bit values). AMD introduced 3DNow! in 1998, adding floating-point operations in a similar way. Since graphics cards back then were mostly rasterizers and the geometric transformations were still performed by the driver on the CPU, these vector float operations had the potential to speed up 3D graphics. Intel didn&#8217;t adopt 3DNow! but instead extended its own set of vector operations with floating-point operations, gave it wider (128-bit) registers and named it SSE in 1999, first implemented in the Pentium III. The latest iteration, AVX, has 256-bit wide registers which can operate on 8 floats or 4 doubles in parallel; it was introduced with Sandy Bridge and is also available on Ivy Bridge.</p>
<p>When we have a loop doing the same thing with a lot of data points and each loop iteration is independent of the others, SIMD can help. Think about blending two images together: each new pixel is the weighted sum of the two corresponding pixels from the input images. The operations for all pixels are exactly the same, only the data is different. If, however, there is a branch inside this loop, things get tricky. If all simultaneously evaluated data points take the same branch, everything is fine. But if even one point branches differently, we have to evaluate both paths sequentially. This can be done by evaluating both branches and masking out the data points that took the other branch. Of course, the wider our SIMD registers, the higher the performance (given that the CPU can get the data from memory fast enough), but also the higher the risk that a branch will break the parallelism.</p>
<p>Transforming vertices and shading pixels are tasks that do (mostly) the same thing on different data points, so they can be accelerated by SIMD. When you write a shader or a compute kernel, you write the inner part of a loop (over all elements, fragments, vertices…). The compiler can then merge N shader invocations into one stream of SIMD instructions. On a modern Intel CPU with AVX this would mean shading 8 vertices in one go; on a current NVidia GPU it means shading one &#8216;warp&#8217; of 32 vertices (AMD&#8217;s equivalent is the 64-wide &#8216;wavefront&#8217;). Yup, the SIMD width is 32 floats or 1024 bits. This means that shading 32 fragments is as fast as shading just one! But it also means that shading just one fragment is as slow as shading 32…</p>
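<p>The idea of two 8-bit values in one 16-bit register can be imitated in plain C with a technique often called SWAR (SIMD within a register). This is only a software emulation for illustration, not how the i750 implemented it: one 16-bit addition handles both 8-bit lanes, and the masks keep a carry from the low lane from spilling into the high lane. The function names are hypothetical.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Packs two 8-bit lanes into one 16-bit word: hi in bits 15..8, lo in bits 7..0. */
static uint16_t pack_2x8(unsigned hi, unsigned lo) {
    assert(hi <= 0xFF && lo <= 0xFF);
    return (uint16_t)((hi << 8) | lo);
}

/* Adds both 8-bit lanes at once; each lane wraps around modulo 256. */
static uint16_t add_2x8(uint16_t a, uint16_t b) {
    const uint16_t top = 0x8080;  /* top bit of each lane */
    const uint16_t rest = 0x7F7F; /* lower 7 bits of each lane */
    /* add the lower 7 bits of each lane: the sums fit into 8 bits,
     * so no carry can cross into the neighbouring lane */
    uint16_t sum = (uint16_t)((a & rest) + (b & rest));
    /* each lane's top bit is a7 XOR b7 XOR carry-in; the carry-out is dropped */
    return (uint16_t)(sum ^ ((a ^ b) & top));
}
```

<p>For example, adding the lanes (1, 255) and (1, 1) yields (2, 0): the low lane wraps around without disturbing the high lane.</p>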
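<p>Here is a scalar C sketch of how a SIMD unit handles such a branch for one group of lanes. The width of 8 matches 8 floats in a 256-bit register; the kernel <code>out = x &gt; 0 ? 2x : -x</code> and all names are made up for illustration. Each path that at least one lane takes is executed for the whole group, and a per-lane mask decides which lanes keep the result.</p>

```c
#include <assert.h>
#include <stddef.h>

#define SIMD_WIDTH 8 /* e.g. 8 floats in a 256-bit register */

/* Emulates one SIMD group running: out[i] = (x[i] > 0) ? x[i] * 2 : -x[i];
 * Returns how many branch paths had to be executed (1 or 2). */
static int simd_branch_group(const float *x, float *out) {
    assert(x != NULL && out != NULL);
    int mask[SIMD_WIDTH];
    int taken = 0;
    for (int i = 0; i < SIMD_WIDTH; ++i) {
        mask[i] = x[i] > 0.0f;
        taken += mask[i];
    }

    int paths = 0;
    if (taken > 0) {               /* some lane needs the 'then' path */
        for (int i = 0; i < SIMD_WIDTH; ++i) {
            float r = x[i] * 2.0f; /* computed in every lane... */
            if (mask[i])
                out[i] = r;        /* ...but kept only where the mask is set */
        }
        paths++;
    }
    if (taken < SIMD_WIDTH) {      /* some lane needs the 'else' path */
        for (int i = 0; i < SIMD_WIDTH; ++i) {
            float r = -x[i];
            if (!mask[i])
                out[i] = r;
        }
        paths++;
    }
    return paths;
}
```

<p>When all eight inputs are positive, only one path runs; a single negative input forces both paths, halving the throughput for that group.</p>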
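<p>A rough way to quantify &#8216;shading one fragment costs as much as shading 32&#8217;: count the SIMD batches (warps) needed for n work items and the fraction of lanes that do useful work. This C sketch ignores everything else the hardware schedules; the function names are hypothetical.</p>

```c
#include <assert.h>

/* Number of SIMD batches (e.g. warps) needed for n work items. */
static unsigned batches_needed(unsigned n, unsigned simd_width) {
    assert(simd_width > 0);
    return (n + simd_width - 1) / simd_width; /* round up */
}

/* Fraction of the SIMD lanes that do useful work (1.0 = fully used). */
static double lane_utilization(unsigned n, unsigned simd_width) {
    unsigned batches = batches_needed(n, simd_width);
    if (batches == 0)
        return 0.0;
    return (double)n / ((double)batches * simd_width);
}
```

<p>One fragment still occupies a full 32-wide batch (utilization 1/32), and 33 fragments already need two batches.</p>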
<h2>If you have to wait, do something else!</h2>
<p>Another technique that we also know from CPUs is <a href="http://en.wikipedia.org/wiki/Simultaneous_multithreading" target="_blank">simultaneous multithreading</a>, or &#8216;Hyper-Threading&#8217; as Intel&#8217;s marketing calls it. The basic idea is this: CPUs are fast, really fast, but memory is slow and can&#8217;t keep up. This is why data is kept in on-die caches and even quicker registers! But every once in a while the CPU can&#8217;t calculate because it has to wait for data. At other times it can calculate but the memory bus is idle. So why not switch from a thread that waits for data to another one that is ready? Well, for one, the decision would need slow OS intervention, and the switch would need to swap in the data of the other thread … from memory. To make this idea work, the decision has to be made by the CPU alone and the data has to stay on the CPU the whole time. For this, the registers have to be (at least) duplicated and some additional logic has to be added. Note that the ALUs are not duplicated, so only one thread is actually calculating at any given time, but the ALUs can be kept busy even if one thread stalls while waiting for memory.</p>
<p>Some CPUs can hold more than two threads per core, such as the <a href="http://en.wikipedia.org/wiki/SPARC_T4" target="_blank">SPARC T4</a> with 8 threads per core. On a GPU this can be even more extreme: instead of a fixed set of registers per thread, GPUs can have large register files, and the number of threads they can hold is limited by the total number of registers needed by all threads. This number can reach the hundreds.</p>
<p>But there is a difference between simultaneous multithreading on CPUs and on (at least some) GPUs: the threads on a CPU are completely independent of each other, e.g. they can belong to different applications. On some GPUs they have to belong to the same shader or kernel. While this looks like a major drawback, it&#8217;s actually not that bad: think of a mesh with 16k vertices; one SIMD thread evaluates (to stick with the earlier number) 32 vertex shader &#8220;threads&#8221; simultaneously, so 512 such SIMD threads have to be started in total &#8211; plenty of opportunity for simultaneous multithreading. This limitation can also be exploited in a couple of ways: constants can be placed in the register file once and don&#8217;t have to be copied for each thread. Uniforms can be cached for all threads, as they are read-only anyway. Memory and texture access patterns will most likely be similar and thus make the best use of the shared (texture) cache. The number of threads that can run in parallel is easy to determine, as each thread uses the same number of registers. The threads can even share one instruction cache.</p>
<h2>We have the ingredients, let&#8217;s make a GPU!</h2>
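<p>How many threads are needed to keep the ALUs busy? Under a deliberately simplified model (an assumption, not a description of any real chip), every thread alternates <code>compute</code> cycles of ALU work with <code>stall</code> cycles of waiting for memory, and switching between threads is free. Then N threads keep the ALU fully busy once N * compute &#8805; compute + stall, as in this C sketch with hypothetical names.</p>

```c
#include <assert.h>

/* Simplified model: each thread computes for 'compute' cycles, then waits
 * 'stall' cycles for memory; switching threads costs nothing. */
static unsigned threads_to_hide_latency(unsigned compute, unsigned stall) {
    assert(compute > 0);
    /* smallest N with N * compute >= compute + stall */
    return (compute + stall + compute - 1) / compute;
}

/* Fraction of cycles in which the ALU is busy when 'threads' threads take turns. */
static double alu_utilization(unsigned threads, unsigned compute, unsigned stall) {
    assert(compute > 0);
    double u = (double)threads * compute / (double)(compute + stall);
    return u > 1.0 ? 1.0 : u;
}
```

<p>With 10 cycles of work per 90-cycle stall, a single thread keeps the ALU busy only 10% of the time, while 10 threads hide the stall completely, which hints at why GPUs keep far more than two threads per core.</p>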
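<p>As a back-of-the-envelope sketch of that limit (ignoring the per-core thread caps and register allocation granularity that real GPUs also have), the number of resident threads is the register file size divided by the registers each thread needs. The 65536-register example input is the Kepler figure quoted further below; the function names are hypothetical.</p>

```c
#include <assert.h>

/* Threads of one shader that fit into the register file at the same time. */
static unsigned resident_threads(unsigned register_file_size, unsigned regs_per_thread) {
    assert(regs_per_thread > 0);
    return register_file_size / regs_per_thread;
}

/* The same, counted in SIMD batches (warps); partial batches do not count. */
static unsigned resident_batches(unsigned register_file_size, unsigned regs_per_thread,
                                 unsigned simd_width) {
    assert(simd_width > 0);
    return resident_threads(register_file_size, regs_per_thread) / simd_width;
}
```

<p>A shader using 32 registers per thread leaves room for 2048 threads (64 warps of 32); doubling the register usage halves that number.</p>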
<p>Let&#8217;s put everything together. At the lowest level we have a very wide SIMD processor; if it can operate on N floats in parallel, NVidia, AMD and co. would count that as N &#8216;cores&#8217;. For example, an NVidia <del>Fermi</del> Kepler chip (e.g. in a GeForce 680) has processors that can issue six 32-float wide SIMD instructions at once in a Streaming Multiprocessor (SMX), which counts as 192 &#8216;cores&#8217; at the marketing department. These processors can often run fewer of the more complex operations, such as trigonometric functions, in parallel than simple floating-point operations; those have to be done more sequentially in a &#8216;special function ALU&#8217; (called &#8216;special function unit&#8217;, SFU, at NVidia). This saves transistors on rarely used operations, which can instead be spent on more ALUs and wider SIMD for the more common operations.</p>
<p>Each processor has a (texture) cache and a register file, and runs multiple threads in parallel using simultaneous multithreading. Fixed-function blocks can also be added here, e.g. texture units or even the fixed-function stages of the geometry pipeline (vertex fetch, tessellator, viewport transform etc.), as is done on recent NVidia hardware.</p>
<p>Several of these processors are then placed on the GPU die, together with the remaining parts of the fixed-function pipeline, more cache and the memory controllers. Of course the whole thing also gets a command processor that distributes the workload coming from the host to the various processors. This makes it easy to build different versions of the chip for the low-end to high-end market: just choose a different number of SIMD processors (and/or change the processors themselves, e.g. their SIMD width).</p>
<h2>Crunch the numbers on the number crunchers:</h2>
<p>We already started to use the NVidia Kepler architecture as an example, so let&#8217;s complete it: what I called a SIMD processor is called an SMX, or streaming multiprocessor, here. It has 192 floating-point ALUs (&#8216;CUDA cores&#8217;) and 32 special function units. Threads are run in batches of 32, called a &#8216;warp&#8217;. The register file holds 64K 32-bit values (e.g. floats or ints) for all threads. It also has a texture cache, a uniform cache, an instruction cache and 64 KB that can be used as L1 cache or shared memory (in case it&#8217;s used for compute), as well as 16 texture units to fetch more data. All fixed-function stages up to the viewport transformation are handled here as well. To get to rasterization, we have to look one level up: two SMX and one shared &#8216;Raster Engine&#8217; form one &#8216;Graphics Processing Cluster&#8217;, and up to four of those are placed on one die together with 512 KB of L2 cache, the memory controllers and a &#8216;GigaThread Engine&#8217; that keeps the SMX busy. To sum it up, we get 8 SMX, each with six 32-wide SIMD units, for a total of 1536 &#8216;cores&#8217;. Each SMX can run a different shader or kernel, so no more than 8 different programs are running at any time, but each of them with thousands of threads doing the same work on different data points.</p>
<p>At AMD, the latest architecture is called &#8216;Graphics Core Next&#8217; (GCN), and its processors are called Compute Units (CUs): each has just four SIMD units, each only 16 floats wide. Simultaneous multithreading works a bit differently here: a CU is limited to 10 threads (wavefronts) per SIMD unit and thus 40 threads per CU (vs. &#8216;whatever fits in the register file&#8217; on NVidia hardware), but these don&#8217;t have to belong to the same shader/kernel, making this design more flexible (and in fact a bit more CPU-like). Each CU has 64 KB of local memory and 16 KB of L1 cache. The fixed-function parts for geometry and rasterization sit &#8216;globally&#8217; on the die together with an L2 cache. Where our GeForce example has only 8 SMX processors, AMD puts in 32 CUs. So here the total is 32 CUs with four 16-wide SIMD units each, for a total of 2048 &#8216;cores&#8217;.</p>
<p>Just for fun, let&#8217;s compare this to a (4-core) Core i7 Ivy Bridge: what&#8217;s called a &#8216;core&#8217; on a CPU is what I called a &#8216;processor&#8217; in the GPU examples above, as the individual SIMD ALUs are called &#8216;cores&#8217; there. So if an i7 were a GPU, we would count: 4 processors with two threads each, and each processor has an 8-float-wide SIMD unit. This gives a total of 32 &#8216;cores&#8217; and a maximum of 64 &#8216;threads&#8217; in parallel. Still not much, even taking the higher clock rate into account.</p>
<p>If that&#8217;s not enough performance for your graphics needs, you can always put two GPU dies on one graphics card and/or install multiple cards to work together&#8230;</p>
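<p>The arithmetic of the last three paragraphs in one place: &#8216;cores&#8217; as counted by the marketing departments are processors &#215; SIMD units per processor &#215; SIMD width. The figures in this C sketch are the ones quoted above; the struct and function names are made up.</p>

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *name;
    unsigned processors; /* SMX, GCN compute unit or CPU core */
    unsigned simd_units; /* SIMD units per processor */
    unsigned simd_width; /* floats per SIMD unit */
} ChipConfig;

/* 'Cores' in the marketing sense: individual SIMD lanes. */
static unsigned marketing_cores(const ChipConfig *c) {
    assert(c != NULL);
    return c->processors * c->simd_units * c->simd_width;
}

/* Lanes times hardware threads per processor, as counted for the i7 above. */
static unsigned lanes_times_threads(const ChipConfig *c, unsigned threads_per_processor) {
    return marketing_cores(c) * threads_per_processor;
}
```

<p>With {8, 6, 32} for the GeForce 680, {32, 4, 16} for the GCN chip and {4, 1, 8} for the i7, this gives 1536, 2048 and 32 &#8216;cores&#8217;, and the i7&#8217;s two threads per core give the 64 &#8216;threads&#8217;.</p>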
]]></content:encoded>
			<wfw:commentRss>http://renderingpipeline.com/2012/11/understanding-the-parallelism-of-gpus/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
