Measuring Input Latency
To deliver a convincing virtual reality experience it is not enough to ensure a high update rate of the (e.g. visual) feedback for the user. The amount of delay between a user input and the change of the system's output is crucial to convince the user that they are actually part of the simulated world. If the delay gets too high, it can even result in “simulator sickness”, especially in simulations using a head-mounted display.
This delay, or latency as I will call it from now on, depends on the input devices used, the kind of monitor, the rendering software and driver settings. I will first quickly talk about the parts of the system that play a role in adding latency and then show how the total system latency can be measured.
Reasons for Latency
What latency can we expect between a user input and the updated image on the screen? You might think that because your game runs at 60 frames per second it would take 16.6 msec to send the new image to the display plus the 5 msec the monitor needs to switch the pixels (a number you read on the data sheet). Sadly, the true latency for most games is more in the range of 100 msec instead of 21.6 msec. Let’s take a look at why this is the case:
Most of the time the input for a computer game comes from a keyboard, mouse or gamepad. Those only report the input they receive 125 times (normal PC mouse) to 1000 times (“gamer” mouse) per second. The input is then processed by the operating system before it is handed to the application. While the game could receive the input from a gaming mouse in around 2 msec, I measured that a standard mouse adds 12 msec on average.
Assuming the game is running at 60 Hz, we get nearly 16.6 msec latency if the input occurred right after the game queried the user input and no latency if it occurred right before the input was queried.
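This averaging argument can be made concrete: if input events arrive at a uniformly random point within the sampling period, the expected wait is half a period. A minimal sketch:

```python
# Input sampled once per loop iteration: an event arriving at a
# uniformly random time waits, on average, half a period before it
# is read, and a full period in the worst case.
def avg_sampling_latency_ms(rate_hz):
    period_ms = 1000.0 / rate_hz
    return period_ms / 2.0

def worst_sampling_latency_ms(rate_hz):
    return 1000.0 / rate_hz

# A 60 Hz game loop adds ~8.3 msec on average, up to ~16.6 msec.
```

The same reasoning applies to every unsynchronised hand-off in the pipeline, which is why several "average" half-frame waits show up in the list below.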
In the old times(TM) the game would now simulate the game logic, physics etc. and then generate the rendering commands for the GPU (adding 16.6 msec to our list). In times of multi-core CPUs this might be pipelined: one thread is calculating the physics and logic for frame N while another thread is generating rendering commands based on the simulation results of frame N-1. This adds another 16.6 msec of latency.
The GPU does not render the image right away; it runs in parallel with the CPU code (this is why the draw commands return nearly instantly). It might collect all drawing commands for the whole frame and not start to render anything until all commands are present. This way another 16.6 msec are added to our latency list. Even worse: the driver might buffer the draw commands of multiple frames to ensure a better utilisation of the hardware and an overall higher framerate (Triple Buffering is one name for such techniques). The application can force the driver not to cache multiple frames with glFinish() in OpenGL or by waiting for the result of a hardware query (e.g. an occlusion query).
When the image is ready to be displayed it could be sent to the screen right away, but often the driver waits until the monitor is about to draw a new image: the monitor does not switch the whole image at once but redraws it from top to bottom. If the image changed during this redraw we would see tearing lines, so the driver often syncs the display of the new image with the vertical redraw (vsync). In the worst case we just missed the sync and have to wait for 16.6 msec.
Most monitors wait until a new frame has been completely transferred before they start to display it, adding another frame of latency. TVs might even cache multiple frames to perform 100 Hz up-sampling or other post-processing (which is why some TVs have a “gaming” mode that simply deactivates these sources of latency). Only a few monitors display an image (nearly) as soon as they get it from the GPU (e.g. some special gaming monitors as well as the Oculus Rift), often adding less than 5 msec of latency.
Now the display can start to update the image; it will take 16.6 msec until it is finished. When all pixels have received the command to switch, we still have to wait the pixel response time until the last of them has changed its colour – this is the one value the manufacturers are willing to tell us: around 5-8 msec (and some gaming monitors are down to 1-2 msec). The following video shows how a display updates itself in 500x slow-motion (filmed with 1000 frames per second):
Adding everything up in a vsynced scenario with a gaming mouse we get:
- ~2 msec (mouse)
- 8 msec (average time we wait for the input to be processed by the game)
- 16.6 msec (game simulation)
- 16.6 msec (rendering code)
- 16.6 msec (GPU is rendering the last frame, current frame is cached)
- 16.6 msec (GPU rendering)
- 8 msec (average for missing the vsync)
- 16.6 msec (frame caching inside of the display)
- 16.6 msec (redrawing the frame)
- 5 msec (pixel switching)
Total: 122.6 msec.
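As a quick sanity check, the sum can be written out explicitly, using exactly the averages from the list above:

```python
# Latency budget for the vsynced scenario with a gaming mouse,
# using the per-stage averages from the list above (in msec).
budget_ms = {
    "mouse polling":              2.0,
    "input wait (avg)":           8.0,
    "game simulation":            16.6,
    "rendering code":             16.6,
    "frame cached by driver":     16.6,
    "GPU rendering":              16.6,
    "missed vsync (avg)":         8.0,
    "frame caching in display":   16.6,
    "display redraw":             16.6,
    "pixel switching":            5.0,
}

total_ms = sum(budget_ms.values())  # 122.6 msec
```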
We can now see why gaming monitors are so attractive: it’s not the faster switching of the pixels but the 120 Hz update rate that cuts most sources of latency in half, provided the PC is fast enough to render at 120 FPS. Disabling vsync can also drastically reduce the latency but introduces tearing lines.
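The effect of the refresh rate can be sketched as a function: the budget above contains six full-frame stages and two average half-frame waits, while only the mouse polling (~2 msec) and the pixel response (~5 msec) do not scale with the frame time. (With exact 60 Hz frame times of 16.667 msec the result is ~123.7 msec, slightly above the 122.6 msec obtained with the rounded 16.6 msec values.)

```python
# Rough model: 6 full-frame stages plus 2 average half-frame waits,
# plus fixed costs (mouse polling + pixel response, ~7 msec) that do
# not scale with the refresh rate.
def pipeline_latency_ms(frame_ms, fixed_ms=7.0):
    return 6 * frame_ms + 2 * (frame_ms / 2.0) + fixed_ms

latency_60hz = pipeline_latency_ms(1000.0 / 60)    # ~123.7 msec
latency_120hz = pipeline_latency_ms(1000.0 / 120)  # ~65.3 msec
```

Going from 60 Hz to 120 Hz almost halves the total, exactly because nearly every stage in the chain is frame-sized.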
In case the window manager does additional compositing the latency can even be longer: OpenGL based compositing on KDE added one more frame in my tests – even in fullscreen. Luckily you can deactivate it.
My own measurements with Half-Life 2 are in the over-100 msec range as well. I tested various combinations of rendering settings by firing a gun and measuring the time that passed until the weapon flash appeared. As the gun was in the middle of the screen, I could see the change while the redraw of the frame was only half done; the complete screen update would take another 8–13 msec. Here are some values:
- HL2 with vsync: 112 msec
- HL2 without vsync: 76 msec
- HL2 without vsync and drastically reduced image quality: 58 msec
- HL2 without vsync, reduced quality and no multicore renderer: 50 msec
All timings are averages from at least 5 gun shots and recorded with a camera at 240 FPS.
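Converting counted video frames into milliseconds is simple division by the camera frame rate; the frame count below is an illustrative value, not one of my actual recordings:

```python
# Latency from a frame count in the recorded video: the number of
# camera frames between the input event and the visible change,
# divided by the recording frame rate. The result is quantised to
# one camera frame interval (~4.2 msec at 240 FPS).
def latency_ms(frame_count, camera_fps):
    return frame_count * 1000.0 / camera_fps

# e.g. 27 frames between mouse click and muzzle flash at 240 FPS
example = latency_ms(27, 240)  # 112.5 msec
```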
If you are interested in what you as a developer can do to reduce latency, take a look at John Carmack’s article Latency Mitigation Strategies. Let me add one hint to that: we now have an OpenGL/WGL extension that can help us get the timing right to query the user input (again) right before the vsync: wgl_delay_before_swap – sadly, so far this is only supported by NVidia and only on Windows; I’d love to see a glx_delay_before_swap variant for Linux as well.
How to measure the Latency
The easiest and most universal way to measure the system latency is to film the user performing the input and the output screen at the same time with a high-speed camera. This film can then be analysed frame by frame to get the total system latency. Luckily, this is no longer as expensive as it may sound: for a rough estimate a 60 FPS camera is sufficient, and even faster cameras are not too expensive anymore.
The cheapest option would be to use the camera you already have. Many phones can now record 60 FPS videos (note: it doesn’t have to be FullHD!). Some can even record 120 FPS (e.g. iPhone 5s, Galaxy S4, Galaxy Camera, Oppo Find 5).
A better option would be a camera where the user has more control over the shutter speed as well. This can reduce the motion blur in the individual frames and make it easier to identify the exact frame in which the button was pressed. A good compact camera or any DSLR which can record 60p video will do. Note that we don’t have to worry about motion blur any more if the camera can record >= 240 frames a second, as the shutter speed will be fast by definition.
So-called action cameras are specialised in recording at higher frame rates; the most prominent is the GoPro Hero. The top-end model can record 240 FPS. But some other compact cameras can go even higher: the Casio Exilim ZR100 (and its successors) can record up to 1000 frames per second! The resolution then drops drastically to 224 by 64 pixels and the resulting videos are far from pretty, but it’s enough to count frames, provided the scene is lit brightly enough. The ZR100 can be found starting at 115€, which makes it nearly one third the price of a 240 FPS capable GoPro. The cheapest Nikon 1 cameras are in about the same price range as a GoPro. These cameras can record up to 1200 FPS at slightly higher resolutions than the ZR100.
The following list shows some of the current cameras and phones that are able to record > 60 FPS together with their recording modes and the Amazon (or Apple) price at the time of writing in € (in case of phones it’s the price without a contract):
- Nikon J1 (330€ with one lens):
- 1200 FPS @ 320*120
- 400 FPS @ 640*240
- Casio EXILIM EX-ZR100 (115€):
- 1000 FPS @ 224*64
- 480 FPS @ 224*160
- 240 FPS @ 432*320
- 120 FPS @ 432*320
- Casio EXILIM EX-ZR300 (133€):
- 1000 FPS @ 224*64
- 480 FPS @ 224*160
- 240 FPS @ 512*384
- 120 FPS @ 640*480
- GoPro Hero3 Black (350€):
- 240 FPS @ 640*480
- 120 FPS @ 1280*720
- Canon PowerShot S100 (250€):
- 240 FPS @ 320*240
- 120 FPS @ 640*480
- iPhone 5s (699€):
- 120 FPS @ 1280*720
- Galaxy Camera (340€):
- 120 FPS @ 768*512
- Galaxy S4 (490€):
- 120 FPS @ 800*450
- Oppo Find 5 (500€):
- 120 FPS @ 640*480
I found 480 FPS recording to be quite helpful and a good trade-off between image quality and timing accuracy. For slower applications 240 FPS is good enough, and if you only want a rough comparison even 60 FPS might work for you. It can be interesting to see the screen refresh in detail at 1000 FPS, but it doesn’t help much for measuring the latency.
It is important that two events can be detected in the video later on: the input event and the beginning of the drawing of the new frame. As the video frame rate goes up, the exposure time goes down, so it is crucial to provide enough light. I also found it helpful to stick some bright masking tape onto my mouse button to make the button press easier to see. To make the redraw as visible as possible, I changed my apps so that they either inverted the colours when the input event occurred or just showed a special colour in fullscreen. This way the redraw can be seen very clearly, independent of where it starts (if vsync is deactivated).
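The "special colour in fullscreen" instrumentation is just a one-frame marker. A minimal sketch, with the actual drawing call left as a placeholder for whatever API the app uses:

```python
# Sketch of the flash-on-input instrumentation: the frame rendered
# after an input event is cleared with a bright marker colour so the
# camera can pick up exactly when the redraw reaches the screen.
# `clear` is a placeholder for the app's real clear/draw call.
flash_pending = False

def on_mouse_button_down():
    global flash_pending
    flash_pending = True

def render_frame(clear):
    global flash_pending
    if flash_pending:
        clear((1.0, 1.0, 1.0))   # full-screen white marker frame
        flash_pending = False    # only flash for a single frame
    else:
        clear((0.1, 0.1, 0.1))   # normal dark background
```

Because the marker fills the whole screen, the camera sees it regardless of where the monitor's top-to-bottom redraw happens to be when vsync is off.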
As already described, the timings will not be constant even if vsync is activated, as the user input is not synchronised with the application reading the input, and the camera is not synchronised with the display either. This means that multiple measurements will be necessary to get reliable results.
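In practice this just means recording several trials and averaging; the sample values below are hypothetical, chosen only to illustrate the frame-sized jitter you can expect:

```python
# Hypothetical latencies (in msec) measured from five gun shots.
# Because input, application, and camera are all unsynchronised,
# individual readings jitter by up to a frame at each stage.
samples_ms = [108.3, 116.7, 112.5, 104.2, 120.8]

mean_ms = sum(samples_ms) / len(samples_ms)    # central estimate
spread_ms = max(samples_ms) - min(samples_ms)  # observed jitter
```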
To analyse the video, any player or editor that lets you step through the frames one by one will be sufficient; even later versions of Photoshop will do (which has the nice benefit of allowing adjustment layers on top of the video to enhance the contrast).