
Measuring Input Latency

To deliver a convincing virtual reality experience it is not enough to ensure a high update rate of the (e.g. visual) feedback for the user. The amount of delay between a user input and the change of the system's output is crucial for tricking the user into believing they are actually part of the simulated world. If the delay gets too high, it can even result in “simulator sickness”, especially in simulations using a head-mounted display.

This delay, or latency as I will call it from now on, depends on the input devices used, the kind of monitor, the rendering software and the driver settings. I will first quickly go over the parts of the system that add latency and then show how the total system latency can be measured.

Reasons for Latency

What latency can we expect between a user input and the updated image on the screen? You might think that because your game runs at 60 frames per second it would take 16.6 msec to send the new image to the display plus the 5 msec the monitor needs to switch the pixels (a number you can read on the data sheet). Sadly, the true latency for most games is more in the range of 100 msec instead of 21.6 msec. Let’s take a look at why this is the case:

Most of the time the input for a computer game comes from a keyboard, mouse or gamepad. These report new input only 125 times per second (a normal PC mouse) to 1000 times per second (a “gamer” mouse). The input is then processed by the operating system before it is handed to the application. While the game could receive the input from a gaming mouse in around 2 msec, I measured that a standard mouse adds 12 msec on average.
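A quick back-of-the-envelope calculation (my own sketch, not tied to any particular driver API) shows where these averages come from: an input event falls at a random point inside the polling interval, so on average it waits half an interval before the device even reports it; OS processing comes on top of that.

    #include <cstdio>

    // Average delay added by a device polled at 'rateHz': the input event falls
    // at a random point inside the polling interval, so on average we wait half
    // an interval before it is picked up. OS processing is not included here.
    double avgPollingDelayMs(double rateHz) {
        return 0.5 * 1000.0 / rateHz;
    }

    int main() {
        std::printf("125 Hz mouse:  +%.1f ms on average\n", avgPollingDelayMs(125.0));   // 4.0 ms
        std::printf("1000 Hz mouse: +%.1f ms on average\n", avgPollingDelayMs(1000.0));  // 0.5 ms
    }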

Assuming the game is running at 60 Hz, we get nearly 16.6 msec of latency if the input occurred right after the game queried the user input, and no latency if it occurred right before the input was queried.

In the old times(TM) the game would now simulate the game logic, physics etc. and then generate the rendering commands for the GPU (adding 16.6 msec to our list). In times of multi-core CPUs this might be pipelined: one thread calculates the physics and logic for frame N while another thread generates rendering commands based on the simulation results of frame N-1. This adds another 16.6 msec of latency.

The GPU does not render the image right away; it runs in parallel with the CPU code (this is why draw commands return almost instantly). It might collect all drawing commands for the whole frame and not start rendering until all commands are present. This adds another 16.6 msec to our latency list. Even worse: the driver might buffer the draw commands of multiple frames to ensure better utilisation of the hardware and an overall higher framerate (triple buffering is one name for such techniques). The application can force the driver not to queue multiple frames with glFinish() in OpenGL or by waiting for the result of a hardware query (e.g. an occlusion query); a minimal sketch of this follows below.
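Here is how that could look, assuming an OpenGL 3.2 (or newer) context and a function loader; renderFrame() and swapBuffers() are hypothetical placeholders for your own frame submission and platform swap:

    // Hypothetical hooks into the rest of the renderer:
    void renderFrame();     // submits all draw calls for the current frame
    void swapBuffers();     // platform-specific buffer swap

    void presentWithoutFrameQueuing() {
        renderFrame();
        swapBuffers();

        // Simplest option: glFinish() blocks until the GPU has executed everything,
        // so the driver can never queue up several frames.
        // glFinish();

        // Slightly friendlier: insert a fence after the swap and wait for it.
        // This plays the same role as waiting for an occlusion query result.
        GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
        glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, (GLuint64)20000000); // 20 ms timeout, in ns
        glDeleteSync(fence);
    }

Both options trade a bit of throughput for latency: the CPU now waits until the GPU has caught up instead of preparing further frames.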

When the image is ready to be displayed it could be sent to the screen right away, but often the driver waits until the monitor is about to draw a new image: the monitor does not switch the whole image at once but redraws it from top to bottom. If the image changed during this redraw we would see tearing lines, so the driver usually syncs the display of the new image with the vertical redraw (vsync). In the worst case we just missed the sync and have to wait another 16.6 msec.

Most monitors wait until a new frame has been completely transferred before they start to display it, adding another frame of latency. TVs might even cache multiple frames to perform 100 Hz up-sampling or other post-processing (which is why some TVs have a “gaming” mode that simply deactivates these sources of latency). Only a few monitors display an image (nearly) as soon as they get it from the GPU (e.g. some special gaming monitors as well as the Oculus Rift), often adding less than 5 msec of latency.

Now the display can start to update the image, which takes 16.6 msec to finish. When all pixels have received the command to switch, we still have to wait the pixel response time until the last of them has changed its colour – this is the one value the manufacturers are willing to tell us: around 5-8 msec (some gaming monitors are down to 1-2 msec). The following video shows how a display updates itself in 500x slow-motion (filmed at 1000 frames per second):

Adding everything up in a vsynced scenario with a gaming mouse we get:

  • ~2 msec (mouse)
  • 8 msec (average time we wait for the input to be processed by the game)
  • 16.6 msec (game simulation)
  • 16.6 msec (rendering code)
  • 16.6 msec (GPU is rendering the last frame, current frame is cached)
  • 16.6 msec (GPU rendering)
  • 8 msec (average for missing the vsync)
  • 16.6 msec (frame caching inside of the display)
  • 16.6 msec (redrawing the frame)
  • 5 msec (pixel switching)

Total: 122.6 msec.

We can now see why gaming monitors are so attractive: it’s not the faster switching of the pixels, but the 120 Hz update rate that cuts most sources of latency in half, provided the PC is fast enough to render at 120 FPS (a rough calculation of this is sketched below). Disabling vsync can also drastically reduce the latency, but it introduces tearing lines.
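The effect of a higher refresh rate can be estimated with a small calculation; this is just the list above re-added with a variable frame time (my own sketch, using the stage breakdown from this article):

    #include <cstdio>

    // Rough vsynced latency budget: six full-frame stages (simulation, render code,
    // GPU queuing, GPU rendering, display frame caching, display redraw) plus two
    // half-frame averages (waiting for the input query, missing the vsync),
    // plus the mouse and pixel response times.
    double totalLatencyMs(double refreshHz) {
        const double frame = 1000.0 / refreshHz;
        const double mouse = 2.0, pixels = 5.0;
        return mouse + 6.0 * frame + 2.0 * (frame / 2.0) + pixels;
    }

    int main() {
        std::printf("60 Hz:  ~%.0f ms\n", totalLatencyMs(60.0));   // ~124 ms (the text rounds to 122.6)
        std::printf("120 Hz: ~%.0f ms\n", totalLatencyMs(120.0));  // ~65 ms
    }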

In case the window manager does additional compositing, the latency can be even longer: OpenGL-based compositing on KDE added one more frame in my tests – even in fullscreen. Luckily you can deactivate it.

My own measurements with Half-Life 2 are in the over-100-msec range as well. I tested various combinations of rendering settings by firing a gun and measuring the time that passed until the weapon flash appeared. As the gun was in the middle of the frame, I could see the change when the redraw of the frame was half done; the complete screen update would take another 8 – 13 msec. Here are some values:

  • HL2 with vsync: 112 msec
  • HL2 without vsync: 76 msec
  • HL2 without vsync and drastically reduced image quality: 58 msec
  • HL2 without vsync, reduced quality and no multicore renderer: 50 msec

All timings are averages of at least 5 gun shots, recorded with a camera at 240 FPS.

If you are interested in what you as the developer can do to reduce latency, take a look at John Carmack’s article Latency Mitigation Strategies. Let me add one hint to that: we now have an OpenGL/WGL extension that can help us get the timing right to query the user input (again) right before the vsync: wgl_delay_before_swap – sadly, so far this is only supported by NVidia and only on Windows. I’d love to see a glx_delay_before_swap variant for Linux as well.
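A sketch of how such a loop could look follows; only wglDelayBeforeSwapNV and SwapBuffers come from the extension and the Win32 API, the helper functions are hypothetical placeholders, and the extension pointer would be fetched once via wglGetProcAddress:

    #include <windows.h>
    #include <GL/gl.h>
    #include <GL/wglext.h>   // defines PFNWGLDELAYBEFORESWAPNVPROC

    // Hypothetical hooks into the rest of the engine:
    void simulateAndRender();    // draw the frame as usual
    void sampleLateInput();      // read the freshest input state
    void applyLateViewUpdate();  // cheap last-minute update, e.g. patch the view matrix

    void renderOneFrame(HDC hdc, PFNWGLDELAYBEFORESWAPNVPROC wglDelayBeforeSwapNV) {
        simulateAndRender();

        // Block until roughly 2 ms before the next vertical blank ...
        if (wglDelayBeforeSwapNV(hdc, 0.002f)) {
            sampleLateInput();        // ... then query the input one last time
            applyLateViewUpdate();    // and apply it just before the swap
        }
        SwapBuffers(hdc);             // vsynced swap right after the delay
    }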

How to Measure the Latency

The easiest and most universal way to measure the system latency is to film the user performing the input and the output screen at the same time with a high-speed camera. This film can then be analysed frame by frame to get the total system latency. Luckily, this is no longer as expensive as it may sound: for a rough estimate a 60 FPS camera is sufficient, and even faster cameras are not too expensive anymore.

The cheapest option would be to use the camera you already have. Many phones can now record 60 FPS videos (note: it doesn’t have to be FullHD!). Some can even record 120 FPS (e.g. iPhone 5s, Galaxy S4, Galaxy Camera, Oppo Find 5).

A better option is a camera that gives the user more control over the shutter speed as well. This can reduce the motion blur in the individual frames and makes it easier to spot the exact frame in which the button was pressed. A good compact camera or any DSLR that can record 60p video will do. Note that we don’t have to worry about motion blur any more if the camera can record >= 240 frames per second, as the shutter speed will be fast by definition.

Latency measurement with a DSLR

Latency measurement with a DSLR at 60 FPS: the exposure time was set to 1/8000 of a second; shown is the first frame of the new (red) image. Based on the already coloured pixel lines we can derive a more accurate timing than from the 60 Hz camera frames alone.
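The refinement works roughly like this (my own helper, not a tool from this article): the redraw sweeps from top to bottom over one refresh period, so the fraction of already coloured lines tells us how long before the camera frame was captured the redraw actually started.

    #include <cstdio>

    // Refine a latency estimate from a video: 'framesUntilChange' is the index of
    // the first camera frame showing the new colour, 'colouredLines' how many of
    // 'totalLines' screen lines already show it in that frame.
    double refinedLatencyMs(int framesUntilChange, double cameraFps,
                            int colouredLines, int totalLines, double displayHz) {
        double coarse    = framesUntilChange * 1000.0 / cameraFps;
        double headStart = (double)colouredLines / totalLines * 1000.0 / displayHz;
        return coarse - headStart;   // the redraw began 'headStart' ms before that camera frame
    }

    int main() {
        // e.g. change first visible in camera frame 7 at 60 FPS, a third of the screen already red
        std::printf("~%.1f ms\n", refinedLatencyMs(7, 60.0, 360, 1080, 60.0));  // ~111 ms
    }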

So-called action cameras are specialised in recording at higher frame rates; the most prominent is the GoPro Hero, whose top-end model can record 240 FPS. But some other compact cameras can go even higher: the Casio Exilim ZR100 (and its successors) can record 1000 frames per second! The resolution then drops drastically to 224 by 64 pixels and the resulting videos are far from pretty, but it’s enough to count frames, given that the scene is lit brightly enough. The ZR100 can be found starting at 115€, which makes it nearly one third the price of a 240 FPS capable GoPro. The cheapest Nikon 1 cameras are in about the same price range as a GoPro; they can record up to 1200 FPS at slightly higher resolutions than the ZR100.

The following list shows some of the current cameras and phones that are able to record > 60 FPS together with their recording modes and the Amazon (or Apple) price at the time of writing in € (in case of phones it’s the price without a contract):

  • Nikon J1 (330€ with one lens):
    • 1200 FPS @ 320*120
    • 400 FPS @ 640*240
  • Casio EXILIM EX-ZR100 (115€):
    • 1000 FPS @ 224*64
    • 480 FPS @ 224*160
    • 240 FPS @ 432*320
    • 120 FPS @ 432*320
  • Casio EXILIM EX-ZR300 (133€):
    • 1000 FPS @ 224*64
    • 480 FPS @ 224*160
    • 240 FPS @ 512*384
    • 120 FPS @ 640*480
  • GoPro Hero3 Black (350€):
    • 240 FPS @ 640*480
    • 120 FPS @ 1280*720
  • Canon PowerShot S100 (250€):
    • 240 FPS @ 320*240
    • 120 FPS @ 640*480
  • iPhone 5s (699€):
    • 120 FPS @ 1280*720
  • Galaxy Camera (340€):
    • 120 FPS @ 768*512
  • Galaxy S4 (490€):
    • 120 FPS @ 800*450
  • Oppo Find 5 (500€):
    • 120 FPS @ 640*480

I found 480 FPS recording to be quite helpful and a good trade-off between image quality and timing accuracy. For slower applications 240 FPS is good enough, and if you only want a rough comparison even 60 FPS might work for you. It can be interesting to see the screen refresh in detail at 1000 FPS, but it doesn’t help much for measuring the latency.

It is important that two events can be detected in the video later on: the input event and the beginning of the new frame being drawn. As the video frame rate goes up, the exposure time goes down, so it is crucial to provide enough light. I also found it helpful to stick some bright masking tape onto my mouse button to see the button press more easily. To make the redraw as clear as possible, I changed my apps so that they either inverted the colour when the input event occurred or simply showed a special colour in fullscreen (a minimal version of such a test app is sketched below). This way the redraw can be seen very clearly, independent of where it starts (if vsync is deactivated).
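Such a test app only needs a handful of lines; here is a sketch using GLFW (my own example, not the exact tool used for the measurements in this article). The screen is white while the left mouse button is held down and black otherwise, so both the button press and the redraw are easy to spot on the video.

    #include <GLFW/glfw3.h>

    int main() {
        if (!glfwInit()) return 1;
        GLFWwindow* window = glfwCreateWindow(1280, 720, "latency test", nullptr, nullptr);
        if (!window) { glfwTerminate(); return 1; }
        glfwMakeContextCurrent(window);
        glfwSwapInterval(1);   // set to 0 to measure without vsync

        while (!glfwWindowShouldClose(window)) {
            glfwPollEvents();  // read the latest input state
            bool pressed = glfwGetMouseButton(window, GLFW_MOUSE_BUTTON_LEFT) == GLFW_PRESS;
            float c = pressed ? 1.0f : 0.0f;   // white while pressed, black otherwise
            glClearColor(c, c, c, 1.0f);
            glClear(GL_COLOR_BUFFER_BIT);
            glfwSwapBuffers(window);
        }
        glfwTerminate();
        return 0;
    }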

As already described, the timings will not be constant even if vsync is activated, as the user input is not synchronised with the application reading the input, and the camera is not synchronised with the display. This means that multiple measurements are necessary to get reliable results.
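Once the frame counts are read off the video, converting them to milliseconds and averaging is straightforward (a sketch with made-up example numbers, not measurements from this article):

    #include <cstdio>
    #include <cmath>
    #include <vector>

    int main() {
        const double cameraFps = 240.0;
        // Camera frames counted between the button press and the first visible change,
        // one entry per trial (example values):
        std::vector<int> framesUntilChange = {27, 26, 29, 25, 28};

        double sum = 0.0, sumSq = 0.0;
        for (int frames : framesUntilChange) {
            double ms = frames * 1000.0 / cameraFps;
            sum   += ms;
            sumSq += ms * ms;
        }
        double n      = (double)framesUntilChange.size();
        double mean   = sum / n;
        double stddev = std::sqrt(sumSq / n - mean * mean);
        std::printf("latency: %.1f ms +/- %.1f ms over %d trials\n", mean, stddev, (int)n);
    }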

To analyse the video, any player or editor that lets you step through the frames one by one is sufficient; even later versions of Photoshop will do (which has the nice benefit of allowing adjustment layers on top of the video to enhance the contrast).


4 thoughts on “Measuring Input Latency”
  • Chris Piechowicz says:

If anyone finds this… I’m trying to find the source of stutter or massive tearing in Wolfenstein: Enemy Territory whenever FPS and refresh rate aren’t aligned by the gods. I have a 1000 Hz gaming mouse, a 144 Hz gaming monitor, a GTX 770 with all GPU and CPU downclocking disabled… and yet the game tears/stutters like hell. I have searched the internet for years for answers and can’t find any. I’ve switched to a different machine and different operating systems; alas, it defeats me. The only other game/program that ever does this is Return to Castle Wolfenstein. I’ve never seen this in 30 years of gaming/programming/IT. I think only you guys are smart enough to figure this thing out. Please help.

  • Gary says:

    Thanks for this very interesting post.
    I need help with a difficult android camera performance issue and wondered if you could give me some advice.
I am developing a camera application where I need to display the camera preview without perceivable latency, so when the user looks at the world and the preview together he shouldn’t perceive a time gap between them.
    Every frame needs to be scaled, flipped, rotated and overlaid with a small image before being displayed but all these manipulations add less than 1ms to the computations so no problem there.
    The minimal latency I measure on my app, on some example projects and on the built-in camera is 120ms (tested on Note4, Galaxy 4 etc) which is definitely noticeable.
    I must reduce it to 60ms. The best frame rate I managed to get is a stable 31 fps. I tried both the old Camera API and Camera2 API both with SurfaceView and TextureView.
    The app needs to run only on a single phone model which is based on the Snapdragon 615 SoC and the OmniVision 13850 camera running OS 5.0.2. I need to display the preview on a 1080×608 area of the display. So there is no shortage of CPU power, as can be seen from the attached Systrace the processor is idle most of the time:
    GPU profiling shows me that rendering is well below the 16ms mark. The 13850 can provide 30, 60 and even 120 fps.
    If I understand the problem correctly, the limit comes from the camera VSync being capped at 32ms (to a maximum of 30 fps) by the Android system (maybe in the kernel drivers) or from the 13850 configuration.
According to Google, Android is using triple buffering, so at 32ms that’s 96ms, which with some overheads can explain the 120ms.
    I assume that if I could increase camera preview fps to 60 my latency will decrease to 60ms.
    There are many discussions how to achieve 60 fps for drawing graphics on the display but I can’t find much about getting 60 fps from the camera preview.
    Is my assumption correct?
    How do I decrease the camera preview latency to 60ms? (It’s a dedicated device so I am allowed to make changes to Linux and drivers).

    • Robert says:

Whether it’s possible to get more than 30 FPS out of a camera depends on the hardware and the provided API. I don’t know if it’s possible on the phones you are using. Maybe there is an API to get “slow-motion” video (e.g. 120 FPS at a lower resolution)? You could then drop every second image and only display at 60 Hz, but the read-out time would be only around 8 ms. Also note that the exposure time limits the maximal frame rate of the camera (not sure if that’s configurable on Android).

  • Marc Seebold says:

fyi: Related CHI’15 paper “Quantifying and Mitigating the Negative Effects of Local Latencies on Aiming in 3D Shooter Games”.
