This post is going to be dense. It’s the summation of the last month or so of work on StaticRectangle (sans PBR materials covered in my last post).

Let’s start with the results and discuss how we got there.

  • h toggles the Controls
  • ~ opens the terminal
  • Mouse look with left/right mouse buttons
  • w, a, s, d to move
  • Space Bar fly-up, Left Ctrl fly-down
  • Change the Render Pass in Render Options
  • Enable Show Lights in the Viewport to select Lights to tweak properties etc.
  • Select objects to change properties

Forward vs Deferred Rendering

To properly implement indirect lighting (and the other screenspace effects I had bodged until now), I needed to move away from a primarily forward renderer to a hybrid-deferred one.

Here’s a simplistic high-level distinction:

Forward

Draws all objects and lights in the same pass to produce the pixel’s colour. This means every light is evaluated for each object, so lighting gets more expensive the more lights and objects you have.

Deferred

Draws geometry and material properties into multiple fullscreen textures (normals, depth, albedo, metallic, roughness etc.) in one pass and defers lighting to a later pass. This way lighting cost scales with screen pixels, not scene complexity.
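As a rough illustration, here is a toy cost model of the two loop structures (illustrative only; real renderers cull lights, batch draws and use tiled/clustered variants, so these are worst-case upper bounds):

```python
def forward_light_evaluations(num_objects: int, num_lights: int) -> int:
    """Forward: every object's fragments evaluate every light."""
    cost = 0
    for _obj in range(num_objects):
        for _light in range(num_lights):
            cost += 1  # one lighting evaluation per object/light pair
    return cost

def deferred_light_evaluations(screen_pixels: int, num_lights: int) -> int:
    """Deferred: geometry writes the G-Buffer once, then lighting runs
    per screen pixel, independent of how many objects filled those pixels."""
    cost = 0
    for _px in range(screen_pixels):
        for _light in range(num_lights):
            cost += 1
    return cost
```

The point of the comparison: adding more objects to the scene raises the forward cost directly, while the deferred lighting cost stays pinned to the resolution.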

Why use one over the other?

Forward rendering is good when your scenes are simpler and GPU resources are limited, for example if you’re targeting mobile.

Deferred is good when you have many lights or want screenspace effects like SSAO, SSGI and SSR (and others, TBD) and can spare the GPU memory for all the textures. Screenspace techniques leverage the maps a G-Buffer provides, and their outputs can be used as inputs to the lighting pass.

Hybrid

So why hybrid-deferred? Some materials still require forward rendering. Transparent objects, for example, can’t be rendered into the G-Buffer because it only stores one surface per pixel, and transparency needs to blend multiple overlapping surfaces. The typical approach is to render all opaque materials into the G-Buffer, run the deferred lighting pass, then forward-render transparent objects back-to-front into the lit scene. The same goes for volumetrics, particles, refractive materials etc.

Image Based Lighting

HDR

The first requirement of Image Based Lighting is a High Dynamic Range Image (HDRI). HDR images store not only colour, but also brightness per pixel. This lets us know where the light sources are and how intense they are. These environment maps are typically captured as equirectangular projections, a full 360-degree spherical view mapped onto a rectangular image.

Cube Map

To use an equirectangular HDR, it’s converted to a cube map: six square textures representing the faces of a cube (up, down, left, right, front, back). Cube maps are more efficient to sample; you can pass in a normal or reflection direction and the hardware returns the right texel without having to unwrap the equirectangular projection manually each time.
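The conversion samples the equirectangular image once per cube-map texel, and the core of it is mapping a direction vector to equirectangular UVs. A minimal sketch (the +Y-up axis convention here is an assumption; engines differ):

```python
import math

def dir_to_equirect_uv(d):
    """Map a unit direction vector to (u, v) in [0, 1] on an equirectangular
    image. Convention assumed: +Y up, u wraps around with atan2(z, x)."""
    x, y, z = d
    u = 0.5 + math.atan2(z, x) / (2.0 * math.pi)
    # asin of the up component gives elevation; clamp guards float error.
    v = 0.5 - math.asin(max(-1.0, min(1.0, y))) / math.pi
    return u, v
```

Run in the other direction (cube-face texel to direction, then direction to UV) this fills each face of the cube map from the source image.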

Irradiance Map

The cube map contains the raw environment, but with diffuse lighting a surface doesn’t receive light from a single direction; it receives light from everywhere around it. Computing this per pixel at runtime would mean sampling thousands of texels from the cube map for every fragment. Instead, it is precomputed as an irradiance map: a low-resolution cube map where each texel stores the total incoming light for a surface facing that direction. At render time, you sample it with the surface normal and get the diffuse lighting in a single texture lookup.
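The precompute is a cosine-weighted convolution over the hemisphere. Here is a Python sketch of the offline step for one texel, simplified to a fixed +Z normal and scalar radiance (sample counts and the `env_sample` callback are illustrative):

```python
import math

def convolve_irradiance(env_sample, num_elev=16, num_azim=32):
    """Discrete cosine-weighted hemisphere convolution for one irradiance
    texel, with the normal fixed at +Z. env_sample(direction) -> radiance."""
    total, weight = 0.0, 0.0
    for i in range(num_elev):
        # Elevation measured from the normal; cos(theta) weights incoming
        # light, sin(theta) is the solid-angle term of this parameterisation.
        theta = (i + 0.5) / num_elev * (math.pi / 2.0)
        for j in range(num_azim):
            phi = (j + 0.5) / num_azim * 2.0 * math.pi
            d = (math.sin(theta) * math.cos(phi),
                 math.sin(theta) * math.sin(phi),
                 math.cos(theta))
            w = math.cos(theta) * math.sin(theta)
            total += env_sample(d) * w
            weight += w
    return total / weight
```

Because this runs offline (once per environment map), the thousands of samples per texel are paid for ahead of time instead of per fragment.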

Prefiltered Env Map

While the irradiance map handles diffuse lighting, the prefiltered environment map handles specular. Because surfaces have varying degrees of “roughness”, we generate multiple maps at varying blurriness (StaticRectangle currently does 5). Without them, each fragment would have to sample hundreds of texels and weight them by the BRDF at that roughness, which is not real-time friendly.
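At render time the shader picks a blur level from roughness, typically by mapping roughness linearly onto the mip chain and letting trilinear filtering blend between levels. A sketch (the linear mapping is the common choice; I’m not claiming it’s exactly StaticRectangle’s):

```python
def prefiltered_mip(roughness: float, mip_count: int = 5) -> float:
    """Map roughness in [0, 1] to a fractional mip level of the prefiltered
    environment map (5 levels, matching the count mentioned above)."""
    clamped = max(0.0, min(1.0, roughness))
    return clamped * (mip_count - 1)
```

A mirror-smooth surface (roughness 0) reads the sharp base level; a fully rough one reads the blurriest mip.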

BRDF (Bidirectional Reflectance Distribution Function)

The BRDF is a mathematical function that describes how light reflects off a surface in different directions1. A BRDF LUT is a small 2D texture where one axis is the viewing angle and the other is roughness. At render time, you sample this texture with the viewing angle and roughness, then multiply the result with the prefiltered environment sample. This avoids computing an expensive integral per pixel by splitting it into two cheaper texture lookups: one for the light, one for the surface response.
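Conceptually the combination looks like this (a Python sketch of the usual split-sum combine; the function names, and treating the LUT sample as an (A, B) scale/bias pair on F0, are my own illustration, not StaticRectangle’s code):

```python
def ibl_specular(prefiltered_rgb, f0, brdf_ab):
    """Combine the two split-sum lookups: the prefiltered environment colour
    and the BRDF LUT sample, where A scales the base reflectance F0 and B
    is an additive bias. All inputs are per-channel floats in linear space."""
    a, b = brdf_ab
    return tuple(env * (f * a + b) for env, f in zip(prefiltered_rgb, f0))
```

Two texture fetches and a multiply-add replace the full reflectance integral per pixel.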

Bringing it together

In StaticRectangle a Cook-Torrance2 BRDF LUT is generated at startup; this only has to happen once. An EnvLight can hold an hdrPath that, when loaded, generates the Cube Map, Irradiance Map and Prefiltered Env Map and caches them. The maps are then uploaded for use by the lighting shader. It’s a bind group that only needs to be created once when the HDR is loaded, and recreated only when the hdrPath changes.

You can test this out yourself by turning on Show Lights in the Viewport settings and changing the Environment Map between the three options there. The first time, each one has to load so there will be a slight delay; after that they are cached and retrieved instantly.

If you want to learn more about Image Based Lighting, this is a good place to start.

Global Illumination

Global Illumination is a lighting technique used to represent indirect light that has bounced off one or more surfaces before reaching the point being shaded.

In an offline path tracer or high-end real-time game engine, global illumination is computed more physically accurately: “photons” are traced from a light source as they bounce around the scene, leaving behind a mix of the colour of the objects and the light itself at diminishing intensities until the energy is eventually lost.

In StaticRectangle (and most real-time applications) this is far too expensive to do per frame.

Instead there are a few techniques that can help approximate it: a prefiltered environment map (discussed already with IBL), light probes (which capture the scene at strategic points to better approximate global lighting conditions) and screenspace global illumination (SSGI).

Screenspace Global Illumination

I started with SSGI, mainly because I had already been looking into it3, though in retrospect maybe light/reflectance probes would have been a more natural step from IBL. SSGI is still quite involved; it’s made up of 4 passes (technically 6, but two are up/downsamples scaling the passes between half and full resolution).

Stochastic Normals

Stochastic normals are random directions generated in the hemisphere above the surface at each pixel.

This randomness means each pixel only traces one ray per frame instead of hundreds, making it cheap enough for real-time. To generate this pass, you take the G-Buffer’s world-space normals, generate two pseudo-independent random numbers, one for elevation and the other for rotation (azimuth), then do a cosine-weighted hemisphere sample about the normal; that sample becomes the new normal for that pixel.
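The sampling step above can be sketched like this (standard cosine-weighted hemisphere sampling; the basis construction is one common approach, not necessarily StaticRectangle’s shader verbatim):

```python
import math

def cosine_sample_hemisphere(u1, u2, normal):
    """Cosine-weighted hemisphere sample about `normal`.
    u1 drives elevation, u2 the azimuth (rotation)."""
    r = math.sqrt(u1)
    phi = 2.0 * math.pi * u2
    # Sample in tangent space, with +Z along the normal.
    lx, ly, lz = r * math.cos(phi), r * math.sin(phi), math.sqrt(max(0.0, 1.0 - u1))
    # Build an orthonormal basis around the normal.
    nx, ny, nz = normal
    up = (1.0, 0.0, 0.0) if abs(nz) > 0.999 else (0.0, 0.0, 1.0)
    # tangent = normalize(cross(up, normal))
    tx = up[1] * nz - up[2] * ny
    ty = up[2] * nx - up[0] * nz
    tz = up[0] * ny - up[1] * nx
    tl = math.sqrt(tx * tx + ty * ty + tz * tz)
    tx, ty, tz = tx / tl, ty / tl, tz / tl
    # bitangent = cross(normal, tangent)
    bx, by, bz = ny * tz - nz * ty, nz * tx - nx * tz, nx * ty - ny * tx
    # Rotate the tangent-space sample into world space.
    return (lx * tx + ly * bx + lz * nx,
            lx * ty + ly * by + lz * ny,
            lx * tz + ly * bz + lz * nz)
```

Cosine weighting concentrates rays near the normal, where they contribute most to diffuse lighting, which is why one ray per pixel per frame is enough to converge over time.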

Side note: what you use as the elevation and rotation terms makes a big difference. I have gotten pretty good results using Interleaved Gradient Noise4 (IGN) for elevation and a frame-offset blue noise texture sample for the rotation, but I’d like to do more experimenting.
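For reference, IGN itself is a tiny closed-form function of the pixel coordinate. A Python sketch (the frame-offset scroll shown is one common way to animate it, not necessarily what StaticRectangle does):

```python
import math

def interleaved_gradient_noise(x: float, y: float, frame: int = 0) -> float:
    """Jimenez's Interleaved Gradient Noise for pixel (x, y), in [0, 1)."""
    # Common animation trick: scroll the pattern each frame.
    offset = 5.588238 * (frame % 64)
    x, y = x + offset, y + offset
    # frac(52.9829189 * frac(0.06711056 * x + 0.00583715 * y))
    inner = math.modf(0.06711056 * x + 0.00583715 * y)[0]
    return math.modf(52.9829189 * inner)[0]
```

The appeal over white noise is that neighbouring pixels get well-spread values, so the bilateral blur later has an easier time averaging them out.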

Raymarch

Once we have the stochastic normals, the next pass (after a downsample to half resolution) marches along the normal direction in small steps through the scene, checking at each step whether the ray hit something.

It doesn’t actually trace through the scene though; it traces through screen space, projecting the ray’s position at each step back onto the screen and comparing the ray’s depth with the depth buffer at that pixel.

If the ray is behind the surface stored in the depth buffer, it’s a hit. It then samples the previous frame’s5 colour at that point and returns it as the indirect light contribution for the originating pixel.
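The loop can be sketched like this, with a deliberately simplified screen space and made-up parameter names; the thickness band is an assumption, but most implementations need something like it to reject false hits far behind thin surfaces:

```python
def screen_space_march(start, direction, depth_at, steps=32, step_size=0.1,
                       thickness=0.2):
    """Minimal screen-space raymarch sketch. `start` and `direction` are
    (x, y, depth) in a simplified screen space; `depth_at(x, y)` reads the
    depth buffer. Returns the hit pixel, or None if the ray never lands."""
    x, y, d = start
    dx, dy, dd = direction
    for _ in range(steps):
        x, y, d = x + dx * step_size, y + dy * step_size, d + dd * step_size
        scene_depth = depth_at(x, y)
        # Ray has gone behind the surface stored in the depth buffer: a hit.
        if scene_depth < d < scene_depth + thickness:
            return (x, y)  # sample the previous frame's colour here
    return None  # marched into oblivion, no indirect contribution
```

In the real pass the positions are view-space points projected per step, but the hit test is the same depth comparison.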

Bilateral Blur

An inherent issue with marching stochastic directions per pixel is noise. Each ray shoots off in a different direction, causing discontinuity between pixels. The next pass helps smooth some of that out. We perform a bilateral blur, which to quote wikipedia6 “is a non-linear, edge-preserving, and noise-reducing smoothing filter.”

The important thing here is that we preserve the edges; we can’t soften everything or colours bleed into places they shouldn’t.

Typically a bilateral blur uses spatial closeness (closer pixels have more influence, the normal Gaussian falloff) and colour similarity (pixels closer in colour or intensity have more influence)7, but with our G-Buffer normal and depth passes we have extra knobs to tune. Pixels at similar depths with similar surface normals can be blurred together more aggressively, leaving the edges nice and crispy.
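A per-tap weight combining those terms might look like this (the parameter names and default falloffs are illustrative, not StaticRectangle’s actual tuning):

```python
import math

def bilateral_weight(offset, depth_diff, normal_dot,
                     sigma_spatial=2.0, sigma_depth=0.1, normal_power=8.0):
    """Weight for one blur tap: Gaussian spatial falloff, multiplied by
    depth similarity and normal similarity terms from the G-Buffer.
    `offset` is pixel distance, `normal_dot` is dot(centre_n, tap_n)."""
    spatial = math.exp(-(offset * offset) / (2.0 * sigma_spatial ** 2))
    depth = math.exp(-abs(depth_diff) / sigma_depth)
    normal = max(0.0, normal_dot) ** normal_power
    return spatial * depth * normal
```

The product form means any one term can veto a tap: a pixel across a depth discontinuity or on a differently facing surface gets almost no influence, which is exactly the edge preservation we want.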

Temporal Accumulation

The final step (after an upsample back to full resolution) blends the current frame’s result with the history of all previous frames. With every pixel tracing a random direction each frame, the full hemisphere is eventually covered and the output converges into a smooth, well-sampled result.

The blend is controlled by a single weight (it’s called Temporal Strength in the Render Options > SSGI Settings), which is how much to blend the newly generated frame versus the accumulated history.

Lower strengths mean more history, giving a smooth noiseless result but slower response and more obvious ghosting. Higher strengths mean new frames have more influence, responding faster but keeping more of the noise left after the bilateral blur.
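The blend itself is just an exponential moving average. A sketch (the parameter name mirrors the Temporal Strength setting; the per-channel tuple is illustrative):

```python
def temporal_accumulate(history, current, temporal_strength):
    """Blend this frame's SSGI result into the accumulated history.
    temporal_strength is the weight of the new frame: 0 keeps only
    history, 1 keeps only the current frame."""
    s = max(0.0, min(1.0, temporal_strength))
    return tuple(h * (1.0 - s) + c * s for h, c in zip(history, current))
```

Because each frame folds into the running average, a strength of 0.1 effectively averages roughly the last tens of frames of random directions, which is what makes the stochastic sampling converge.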

Conclusion

SSGI is a pretty cool technique, though in many cases it’s probably not worth the cost as the extra passes/textures do add up, especially on high pixel density displays like the MacBook/iPhone etc. It also only works well in enclosed spaces, where rays aren’t marched into oblivion and you’re not having to increase the step count too high.

That said, I haven’t looked at optimisations yet; there is a lot of tuning I can do and there are probably better algorithms/approaches I haven’t come across yet.

There are also some obvious limitations to screenspace effects; you’ll likely notice them pretty quickly, especially when viewing any of the SSGI passes from stochastic normals onward.

As soon as a surface disappears from the camera’s view, it no longer contributes to the indirect lighting. This doesn’t make physical sense of course; it’s just a limitation. It’s why this technique is complemented with light/reflection probes.

Another subject for another day.


  1. https://medium.com/@gerdoo/brdf-bc71b13a452 ↩︎

  2. https://graphicscompendium.com/gamedev/15-pbr ↩︎

  3. https://gamehacker1999.github.io/posts/SSGI/ ↩︎

  4. https://blog.demofox.org/2022/01/01/interleaved-gradient-noise-a-different-kind-of-low-discrepancy-sequence/ ↩︎

  5. The “previous frame” part is important. SSGI can’t read the current frame’s lighting because it hasn’t been computed yet. Using the previous frame introduces one frame of latency, but at the FPS we’re expecting, it’s imperceptible. The temporal accumulation pass smooths it all out anyway. ↩︎

  6. https://en.wikipedia.org/wiki/Bilateral_filter ↩︎

  7. https://medium.com/@chinmayiadsul/the-art-of-blur-in-image-processing-part-3-bilateral-blur-the-diplomat-0fcfd3d5ee30 ↩︎