Replaying 3D traces with piglit

If you don’t know what is traces based rendering regression testing, read the appendix before continuing.

The Mesa community has witnessed an explosion of the Continuous Integration interest in the last two years.

In addition to checking the proper building of the project, integrating the testing of its functional correctness has become a priority. The user space graphics drivers exhibit a wide variety of types of tests and test suites. One kind of those tests are the traces based rendering regression testing.

The public effort to add this kind of tests into Mesa’s CI started with this mail from Alexandros Frantzis.

At some point, we had support for replaying OpenGL, Vulkan and D3D11 traces using apitrace, RenderDoc and GFXReconstruct with the in-tree tool tracie. However, it was a very custom solution made to the needs of Mesa so I proposed to move this codebase and integrate it into the piglit test suite. It was a natural step forward.

This is how replayer was born into piglit.

replayer

The first step to test a trace is, actually, obtaining a trace. I won’t go into the details about how to create one from scratch. The process is well documented on each of the tools listed above. However, the Mesa community has been collecting publicly distributable traces for a while and placing them in traces-db whose CI is copying them to Freedesktop.org’s MinIO instance.

To make things simple, once we have built and installed piglit, if we would like to test an apitrace created OpenGL trace, we can download from there with:

$ replayer.py download \
 	 --download-url https://minio-packet.freedesktop.org/mesa-tracie-public/ \
 	 --db-path ./traces-db \
 	 --force-download \
 	 glxgears/glxgears-2.trace

The parameters are self explanatory. The downloaded trace will now exist at ./traces-db/glxgears/glxgears-2.trace.

The next step will be to dump an image from the trace. Since it is a .trace file we will need to have apitrace installed in the system. If we do not specify the call(s) from which to dump the image(s), we will just get the last frame of the trace:

$ replayer.py dump ./traces-db/glxgears/glxgears-2.trace

The dumped PNG image will be at ./results/glxgears-2.trace-0000001413.png. Notice, the number suffix is the snapshot id from the trace.

Dumping from a trace may result in a range of different possible images. One example is when the trace makes use of uninitialized values, leading to undefined behaviors.

However, since the original aim was performing pre-merge rendering regression testing in Mesa’s CI, the idea is that replaying any of the provided traces would be quick and the dumped image will be consistent. In other words, if we would dump several times the same frame of a trace with the same GFX stack, the image will always be the same.

With this precondition, we can test whether 2 different images are the same just by doing a hash of its content. replayer can obtain the hash for the generated dumped image:

$ replayer.py checksum ./results/glxgears-2.trace-0000001413.png 
f8eba0fec6e3e0af9cb09844bc73bdc8

Now, if we would build a different commit of Mesa, we could check the generated image at this new point against the previously generated reference image. If everything goes well, we will see something like:

$ replayer.py compare trace \
 	 --download-url https://minio-packet.freedesktop.org/mesa-tracie-public/ \
 	 --device-name gl-vmware-llvmpipe \
 	 --db-path ./traces-db \
 	 --keep-image \
 	 glxgears/glxgears-2.trace f8eba0fec6e3e0af9cb09844bc73bdc8
[dump_trace_images] Info: Dumping trace ./traces-db/glxgears/glxgears-2.trace...
[dump_trace_images] Running: apitrace dump --calls=frame ./traces-db/glxgears/glxgears-2.trace
// process.name = "/usr/bin/glxgears"
1384 glXSwapBuffers(dpy = 0x56060e921f80, drawable = 31457282)

1413 glXSwapBuffers(dpy = 0x56060e921f80, drawable = 31457282)

error: drawable failed to resize: expected 1515x843, got 300x300
[dump_trace_images] Running: eglretrace --headless --snapshot=1413 --snapshot-prefix=./results/trace/gl-vmware-llvmpipe/glxgears/glxgears-2.trace- ./blog-traces-db/glxgears/glxgears-2.trace
Wrote ./results/trace/gl-vmware-llvmpipe/glxgears/glxgears-2.trace-0000001413.png

OK
[check_image]
    actual: f8eba0fec6e3e0af9cb09844bc73bdc8
  expected: f8eba0fec6e3e0af9cb09844bc73bdc8
[check_image] Images match for:
  glxgears/glxgears-2.trace

PIGLIT: {"images": [{"image_desc": "glxgears/glxgears-2.trace", "image_ref": "f8eba0fec6e3e0af9cb09844bc73bdc8.png", "image_render": "./results/trace/gl-vmware-llvmpipe/glxgears/glxgears-2.trace-0000001413-f8eba0fec6e3e0af9cb09844bc73bdc8.png"}], "result": "pass"}

replayer‘s compare subcommand is the one spitting a piglit formatted test expectations output.

Putting everything together

We can make the whole process way simpler by passing the replayer a YAML tests list file. For example:

$ cat testing-traces.yml
traces-db:
  download-url: https://minio-packet.freedesktop.org/mesa-tracie-public/

traces:
  - path: gputest/triangle.trace
    expectations:
      - device: gl-vmware-llvmpipe
        checksum: c8848dec77ee0c55292417f54c0a1a49
  - path: glxgears/glxgears-2.trace
    expectations:
      - device: gl-vmware-llvmpipe
        checksum: f53ac20e17da91c0359c31f2fa3f401e
$ replayer.py compare yaml \
 	 --device-name gl-vmware-llvmpipe \
 	 --yaml-file testing-traces.yml 
[check_image] Downloading file gputest/triangle.trace took 5s.
[dump_trace_images] Info: Dumping trace ./replayer-db/gputest/triangle.trace...
[dump_trace_images] Running: apitrace dump --calls=frame ./replayer-db/gputest/triangle.trace
// process.name = "/home/anholt/GpuTest_Linux_x64_0.7.0/GpuTest"
397 glXSwapBuffers(dpy = 0x7f0ad0005a90, drawable = 56623106)

510 glXSwapBuffers(dpy = 0x7f0ad0005a90, drawable = 56623106)


/home/anholt/GpuTest_Linux_x64_0.7.0/GpuTest
[dump_trace_images] Running: eglretrace --headless --snapshot=510 --snapshot-prefix=./results/trace/gl-vmware-llvmpipe/gputest/triangle.trace- ./replayer-db/gputest/triangle.trace
Wrote ./results/trace/gl-vmware-llvmpipe/gputest/triangle.trace-0000000510.png

OK
[check_image]
    actual: c8848dec77ee0c55292417f54c0a1a49
  expected: c8848dec77ee0c55292417f54c0a1a49
[check_image] Images match for:
  gputest/triangle.trace

[check_image] Downloading file glxgears/glxgears-2.trace took 5s.
[dump_trace_images] Info: Dumping trace ./replayer-db/glxgears/glxgears-2.trace...
[dump_trace_images] Running: apitrace dump --calls=frame ./replayer-db/glxgears/glxgears-2.trace
// process.name = "/usr/bin/glxgears"
1384 glXSwapBuffers(dpy = 0x56060e921f80, drawable = 31457282)

1413 glXSwapBuffers(dpy = 0x56060e921f80, drawable = 31457282)


/usr/bin/glxgears
error: drawable failed to resize: expected 1515x843, got 300x300
[dump_trace_images] Running: eglretrace --headless --snapshot=1413 --snapshot-prefix=./results/trace/gl-vmware-llvmpipe/glxgears/glxgears-2.trace- ./replayer-db/glxgears/glxgears-2.trace
Wrote ./results/trace/gl-vmware-llvmpipe/glxgears/glxgears-2.trace-0000001413.png

OK
[check_image]
    actual: f8eba0fec6e3e0af9cb09844bc73bdc8
  expected: f8eba0fec6e3e0af9cb09844bc73bdc8
[check_image] Images match for:
  glxgears/glxgears-2.trace

replayer features also the query subcommand, which is just a helper to read the YAML files with the tests configuration.

Testing the other kind of supported 3D traces doesn’t change much from what’s shown here. Just make sure to have the needed tools installed: RenderDoc, GFXReconstruct, the VK_LAYER_LUNARG_screenshot layer, Wine and DXVK. A good reference for building, installing and configuring these tools are Mesa’s GL and VK test containers building scripts.

replayer also accepts several configurations to tweak how to behave and where to find the actual tracing tools needed for replaying the different types of traces. Make sure to check the replay section in piglit’s configuration example file.

replayer‘s README.md file is also a good read for further information.

piglit

replayer is a test runner in a similar fashion to shader_runner or glslparsertest. We are now missing how does it integrate so we can do piglit runs which will produce piglit formatted results.

This is done through the replay test profile.

This profile needs a couple configuration values. Easiest is just to set the PIGLIT_REPLAY_DESCRIPTION_FILE and PIGLIT_REPLAY_DEVICE_NAME env variables. They are self explanatory, but make sure to check the documentation for this and other configuration options for this profile.

The following example features a similar run to the one done above invoking directly replayer but with piglit integration, providing formatted results:

$ PIGLIT_REPLAY_DESCRIPTION_FILE=testing-traces.yml PIGLIT_REPLAY_DEVICE_NAME=gl-vmware-llvmpipe piglit run replay -n replay-example replay-results
[2/2] pass: 2   
Thank you for running Piglit!
Results have been written to replay-results

We can create some summary based on the results:

# piglit summary console replay-results/
trace/gl-vmware-llvmpipe/glxgears/glxgears-2.trace: pass
trace/gl-vmware-llvmpipe/gputest/triangle.trace: pass
summary:
       name: replay-example
       ----  --------------
       pass:              2
       fail:              0
      crash:              0
       skip:              0
    timeout:              0
       warn:              0
 incomplete:              0
 dmesg-warn:              0
 dmesg-fail:              0
    changes:              0
      fixes:              0
regressions:              0
      total:              2
       time:       00:00:00

Creating an HTML summary may be also interesting, specially when finding failures!

Wishlist

Through different backends, replayer supports running apitrace, RenderDoc and GFXReconstruct traces. We may want to support other tracing tools in the future. The dummy backend used for functional testing is a good starting point when writing a new backend.
The solution chosen for checking whether we detect a rendering regression is dependent on having consistent results, as said before. It’d be great if we could add a secondary testing method whenever the expected rendered image is variable. From the top of my head, using exclusion masks could be a quick single-run solution when we know which specific areas in a rendered scenario are the ones fluctuating. For more complex variations, a multi-run based solution seems to be the best option. EzBench has a great statistical approach for this!
The current syntax of the YAML test list files implies running the compare subcommand with the default behavior of checking against the last frame of the tested trace. This means figuring out which call number is the one of the last frame first. It would be great to support providing the call numbers directly in the YAML files to be able to test more than just the last frame and, additionally, cut down the time taken to run the test.
The HTML generated summary allows us to see the reference and generated image during a test run side to side when it fails. It’d be great to have also some easy way of checking its differences. Using Rembrandt.js could be a possible solution.

Thanks a lot to the whole Mesa community for helping with the creation of this tool. Alexandros Frantzis, Rohan Garg and Tomeu Vizoso did a lot of the initial development for the in-tree tracie tool while Dylan Baker was very patient while reviewing my patches for the piglit integration.

Finally, thanks to Igalia for allowing me to work in this.

Appendix

In 3D computer graphics we say “traces”, for short, to name the files generated by 3D APIs capturing tools which store not only the calls to the specific 3D API but also the internal state of the 3D program during the capturing process: shaders, textures, buffers, etc.

Being able to “record” the execution of a 3D program is very useful. Usually, it will allow us to replay the execution without the need of the original program from which we generated the trace, it will also allow in-depth analysis for debugging and performance optimization, it’s a very good solution for sharing with other developers, and, in some cases, will allow us to check how the replay will happen with different GPUs.

In this post, however, I focus in a specific usage: rendering regression testing.

When doing a regression test what we would do is compare a specific metric obtained by replaying the trace with a specific version of the GFX software stack against the same metric obtained from a different version of the GFX stack. If the value of the metric changes we have found a regression (or an improvement!).

To make things simpler, we would like to check changes happening just in one of the many elements of the software stack. The most relevant component is the user space driver. In particular, I care about the Mesa drivers and the GNU/Linux stack.

Mainly, there are two kinds of regression testing we can do with a trace: performance or rendering regression testing. When doing a performance one, the checked metric(s) usually are in terms of speed or memory usage. In the case of the rendering ones what we would do is comparing the rendered output at one (or many) point during the trace replay. This output, a bitmap image, is the metric that we will compare in between two different points of the Mesa driver. If the images differ, we may have found a regression; artifacts, improper colors, etc, or an enhancement, if the reference image is the one featuring any of these problems.

frozen mumblings

tanty's lumber room: free software, life, art and so on …