# Adaptive Incident Radiance Field Sampling and Reconstruction Using Deep Reinforcement Learning

## Contribution

- Addresses the light-field sampling and reconstruction problem with deep learning techniques and offline datasets.
- Proposes a novel R-network that explores the image and direction spaces of the radiance field to effectively filter and reconstruct the incident radiance field.
- Presents a novel RL-based Q-network to guide the adaptive rendering process.

## Related Work

- Image-Space Methods
- Light field Reconstruction Methods
- Light field Adaptive Sampling Methods
- Filtering using DNN
- DRL

## Representation of Incident Radiance Field

- 4D incident radiance field
- Image space: the space of pixels or shading points
- Direction space: the space of an incident hemisphere centered on the average normal of a group of shading points

- Radiance field blocks
- Partition the image space (pixels) into tiles
- Partition the direction space into bins

- A radiance block $B^j$ is defined by a bounded domain of the direction space and can be expressed as:

$$
B^j\overset{\Delta}{=}\{(\theta,\phi)\;;\;0\leq\theta_0^j\leq\theta\leq\theta_1^j<\tfrac{\pi}{2},\;0\leq\phi_0^j\leq\phi\leq\phi_1^j<2\pi\}
$$

- $[\theta_0^j,\theta_1^j]$ and $[\phi_0^j,\phi_1^j]$ are the elevation and azimuth angle bounds of $B^j$
- The j-th partition in the direction space at the particular pixel $\pmb x$ is $B_{\pmb x}^j$
- The j-th partition in the direction space shared by all pixels in the tile is $B_T^j$
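As a concrete sketch of this definition, a radiance block can be represented by its four angle bounds; the class and the 2×2 example partition below are hypothetical illustrations, not the paper's implementation:

```python
from dataclasses import dataclass
import math

@dataclass
class RadianceBlock:
    """A bounded domain B^j of the direction space (elevation x azimuth)."""
    theta0: float  # elevation lower bound, in [0, pi/2)
    theta1: float  # elevation upper bound
    phi0: float    # azimuth lower bound, in [0, 2*pi)
    phi1: float    # azimuth upper bound

    def contains(self, theta: float, phi: float) -> bool:
        # A direction (theta, phi) lies in B^j iff both angle bounds hold.
        return self.theta0 <= theta <= self.theta1 and self.phi0 <= phi <= self.phi1

# A 2x2 partition of the hemisphere into four blocks.
blocks = [
    RadianceBlock(t0, t1, p0, p1)
    for (t0, t1) in [(0.0, math.pi / 4), (math.pi / 4, math.pi / 2)]
    for (p0, p1) in [(0.0, math.pi), (math.pi, 2 * math.pi)]
]
```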

- Guided by the Q-network
- Adaptively partition the direction space into a radiance field hierarchy with nodes of various sizes
- Recursively partition the azimuth angle $\phi$ and the cosine-weighted elevation angle $\theta$ in half
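The cosine-weighted split of the elevation range can be made concrete. Assuming "cosine-weighted" means equal projected solid angle (an interpretation, not stated explicitly above), the weight of $[\theta_0,\theta_1]$ is proportional to $\sin^2\theta_1-\sin^2\theta_0$, which gives a closed-form midpoint:

```python
import math

def cosine_weighted_midpoint(theta0: float, theta1: float) -> float:
    """Split an elevation range into two halves of equal cosine-weighted
    (projected) solid angle: the weight of [t0, t1] is proportional to
    sin^2(t1) - sin^2(t0), so the midpoint equalizes sin^2."""
    s = 0.5 * (math.sin(theta0) ** 2 + math.sin(theta1) ** 2)
    return math.asin(math.sqrt(s))

# Splitting the full elevation range [0, pi/2] yields pi/4.
mid = cosine_weighted_midpoint(0.0, math.pi / 2)
```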

- Reconstruct an incident radiance field per tile
- As the inputs of the networks
- The hierarchy is built in the hemisphere of the local frame of a tile
- Defined from the average normal of pixels in the tile

- Project the result to the individual frames of the pixels
- Special case: the hemisphere of an individual pixel contains incident directions (i.e., uncovered domains) that are not covered by the hemisphere of the average local frame
- These uncovered domains can lead to poor runtime performance
- Assign a uniform PDF to each uncovered domain for unbiased sampling

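A minimal sketch of the uniform-PDF fallback, assuming the caller already knows whether a direction is covered by the average-frame hemisphere and the total uncovered solid angle (both parameters are hypothetical names):

```python
def sampling_pdf(covered: bool, block_pdf: float,
                 uncovered_solid_angle: float) -> float:
    """PDF of an incident direction at a pixel (illustrative sketch).

    Directions covered by the average-frame hemisphere use the
    reconstructed block PDF; uncovered directions fall back to a uniform
    density over the uncovered solid angle, so every direction keeps a
    nonzero probability and the estimator stays unbiased."""
    if covered:
        return block_pdf
    return 1.0 / uncovered_solid_angle
```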

## Radiance Field Reconstruction Using the R-network

Radiance field reconstruction $\mathcal N$ of an incident radiance block $B^j$ is

$$
\hat L_{in}^{B^j}(\pmb x)=\mathcal N(\pmb X,\Xi;\pmb w)
$$

- $\hat L_{in}^{B^j}(\pmb x)$: the output of the network
- The average incident radiance in $B^j$ at pixel $\pmb x$

- $\pmb X$: the incident radiance samples in the domain of $B^j$
- $\Xi$: indicate auxiliary features
- e.g., the position, normal and depth

- $\pmb w$: the trainable weight and bias term of $\mathcal N$

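For intuition, the interface of the reconstruction N(X, Xi; w) can be sketched with a trivial stand-in predictor; the real R-network is a trained CNN, and all names below are hypothetical:

```python
import numpy as np

def reconstruct_block(X: np.ndarray, Xi: np.ndarray, w=None) -> float:
    """Stand-in for N(X, Xi; w): predicts the average incident radiance in
    block B^j at pixel x. Here a trivial baseline simply averages the raw
    radiance samples X; the auxiliary features Xi and weights w are unused
    in this placeholder."""
    return float(X.mean())

samples = np.array([0.2, 0.4, 0.6])  # radiance samples inside block B^j
aux = np.zeros(7)                    # e.g., position, normal, depth
estimate = reconstruct_block(samples, aux)
```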

## Filtering 4D Radiance Space

- Challenges
- The number of samples per radiance block is smaller than the number of samples per pixel, because each pixel is split into multiple blocks
- Due to the curse of dimensionality, performing convolutions in a 4D space requires more memory, training time, and data

### R-Network

Differences compared to image-space filtering

- Samples are dispersed in many directions
- Inputs are sparse at individual radiance field blocks

- Direct convolution in the 4D light-field space requires high memory and computation

Four different CNNs are explored:

- Image network
- Uses image-space auxiliary features and performs image-space convolution

- Direction network
- Works with features and convolution in the direction space

- Image-direction network (final R-network)
- Direction-image network


### Image-Direction Network

The image-direction network can be partitioned into image and direction parts

- $\pmb X^j_\Gamma$: the feature map associated with directional block $j$ and pixel tile $\Gamma$
- The image part takes image-space auxiliary feature maps $\pmb G_\Gamma$ (i.e., surface normals, positions, and depth) and radiance feature maps $\pmb R^j_{\Gamma}$ (mean, variance, and gradient of the radiance) as inputs
- Its output is the direction-space feature map $\pmb F_{\pmb d_\Gamma}^j$
- The direction part takes the feature maps learned by the image part as input and simultaneously convolves the radiance predictions of all radiance field blocks
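The two-part data flow can be sketched with shapes only, replacing the learned convolution stacks with 1×1 channel mixes; all sizes and weights below are made-up placeholders, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 8               # tile resolution (image space)
NB = 16                 # number of radiance field blocks (direction space)
CG, CR, CF = 26, 4, 8   # geometric, radiance, and learned feature channels

# Image part: per-block radiance features are concatenated with the shared
# geometric features; a 1x1 convolution (a matmul over channels) stands in
# for the image-space convolution stack.
G = rng.standard_normal((CG, H, W))         # geometric feature maps
R = rng.standard_normal((NB, CR, H, W))     # radiance features per block
W_img = rng.standard_normal((CF, CG + CR))  # placeholder learned weights

F = np.empty((NB, CF, H, W))
for j in range(NB):
    X = np.concatenate([G, R[j]], axis=0)      # (CG + CR, H, W)
    F[j] = np.einsum('oc,chw->ohw', W_img, X)  # direction-space features

# Direction part: mix information across all blocks at each pixel to
# predict the radiances of every block simultaneously.
W_dir = rng.standard_normal((NB, NB * CF))
L_hat = np.einsum('ok,khw->ohw', W_dir, F.reshape(NB * CF, H, W))
# L_hat holds one predicted radiance per block per pixel.
```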

The geometrical features have a total of 26 channels as follows:

- Three channels for the average normal, one channel for the average variance in the normals, and six channels for the gradients of the average normal
- Three channels for the average position, one channel for the average variance in the positions, and six channels for the gradients of the average position
- One channel for the average depth, one channel for the variance in the depth, and two channels for the gradients of the average depth
- Two channels for the gradients of the average radiance of all blocks

The radiance features have a total of four channels, which comprise:

- Three channels for radiance
- One channel for average variance in the radiance
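The channel layout above can be written out explicitly as a sanity check; this is a sketch with zero-filled placeholders, and the tile size is arbitrary:

```python
import numpy as np

H = W = 8  # arbitrary tile resolution

# Geometric features: 26 channels, in the order listed above.
geo = np.concatenate([
    np.zeros((3, H, W)),  # average normal
    np.zeros((1, H, W)),  # variance of the normals
    np.zeros((6, H, W)),  # gradients of the average normal (3ch x 2 axes)
    np.zeros((3, H, W)),  # average position
    np.zeros((1, H, W)),  # variance of the positions
    np.zeros((6, H, W)),  # gradients of the average position
    np.zeros((1, H, W)),  # average depth
    np.zeros((1, H, W)),  # variance of the depth
    np.zeros((2, H, W)),  # gradients of the average depth
    np.zeros((2, H, W)),  # gradients of the average radiance of all blocks
], axis=0)

# Radiance features: 4 channels per block.
rad = np.concatenate([
    np.zeros((3, H, W)),  # radiance (RGB)
    np.zeros((1, H, W)),  # variance of the radiance
], axis=0)
```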

### Experiments on Reconstruction Networks

## DRL-based Adaptive Sampling

Sample distribution and radiance field resolution greatly influence the results

- A higher number of samples provides richer information, which benefits even well-trained denoising CNNs
- Adaptively refining the radiance field is a commonly used strategy to preserve lighting details with a limited budget

Propose the use of the DRL-based Q-network to guide the sampling and refinement of the radiance field hierarchy

- Use DRL to train the network: attempting to exhaustively cover all possible radiance field hierarchies and sampling distributions in search of the ground truth is impractical
- Treat adaptive sampling as a dynamic process that iteratively takes action to refine radiance field blocks into smaller blocks or to increase the number of samples
- The trained Q-network evaluates the value of each action at the runtime to guide the adaptive process

Two factors are critical when building the hierarchy:

- The structure of the hierarchy, i.e., the method for discretizing the radiance field
- Noted in a previous adaptive method: a higher grid resolution can effectively capture high-frequency lighting features, but it comes with an overhead

- An adaptive sample distribution
- More samples (i.e., a greater sample density) are placed in those noisy areas (blocks or nodes) to reduce reconstruction errors


### Deep Q-Learning

- Input states: the global radiance field information (e.g., geometry information, radiance samples and radiance field hierarchy)
- Output: the predicted quality value (Q-value) of each possible action, which determines the next action to take
- Action:
- Resample the block by doubling its sample density per block (the number of samples per block)
- Decrease the variance in the radiance feature, which suppresses noise

- Refine the radiance field block into 4×4 new blocks by equally partitioning each axis, while maintaining the average number of samples per block by adding new samples
- The goal of maintaining the sample density per block is to prevent the degeneration of the reconstruction quality due to the sparser samples
- Increase the resolution of the grid, which can capture high-frequency details

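A minimal sketch of the two actions' bookkeeping, assuming the 4×4 refinement described above; the names and the exact sample accounting are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class BlockState:
    samples: int      # samples currently placed in the block's domain
    n_subblocks: int  # 1 until the block is refined

RESAMPLE, REFINE = 0, 1

def apply_action(s: BlockState, a: int, split: int = 4) -> BlockState:
    """Apply one adaptive-sampling action (illustrative sketch).

    RESAMPLE doubles the sample density of the block; REFINE splits the
    block into split x split sub-blocks and adds enough new samples that
    the average number of samples per (sub-)block stays constant."""
    if a == RESAMPLE:
        return BlockState(s.samples * 2, s.n_subblocks)
    children = split * split
    return BlockState(s.samples * children, s.n_subblocks * children)
```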
- The quality value $Q$ and the reward $r$ of an action are defined per radiance field block. For radiance field block $B^j$ at a pixel, the quality value of taking action $a$ in state $s^j$ is defined by the Bellman equation:

$$
Q^j(s^j,a)=r(s^j,a)+\gamma\max_{a'}Q^j(s'^j,a')
$$

- $s^j$ and $s'^j$ correspond to the states before and after action $a$ is taken
- $a'$ is a possible next action
- $r(s^j,a)$ denotes the reward of the action
- $\gamma$ is a decay parameter between 0 and 1

- Estimate the Q-value of an action:
- Approximate equation as follows:

$$

Q^j(s^j,a)\approx r(s^j,a)+\gamma\max_{a'}r(s'^j,a')

$$
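Under this approximation, evaluating a Q-value reduces to one immediate reward plus the discounted best achievable next reward; a one-line sketch (names are hypothetical):

```python
def q_value(r_now, next_rewards, gamma=0.9):
    """One-step approximation of the Q-value: the immediate reward plus the
    discounted maximum over the rewards of the possible next actions
    (standing in for max over Q)."""
    return r_now + gamma * max(next_rewards)

q = q_value(0.5, [0.1, 0.3, 0.2], gamma=0.9)
```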

Define the reward $r(s^j, a)$ as follows:

$$

r(s^j,a)=E^j(s^j)-E^j(s'^j)

$$

- $E^j(s^j)$: the reconstruction error of block $B^j$

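With this reward, a greedy runtime policy simply prefers the action with the largest predicted drop in reconstruction error; a sketch with hypothetical numbers:

```python
def reward(err_before: float, err_after: float) -> float:
    """Reward r(s^j, a): the drop in reconstruction error E^j of block B^j
    caused by action a; positive when the action reduced the error."""
    return err_before - err_after

# Greedy use (hypothetical error estimates): pick the action whose
# predicted post-action error yields the largest reward.
current_error = 0.6
predicted_errors = {"resample": 0.4, "refine": 0.25}
best = max(predicted_errors,
           key=lambda a: reward(current_error, predicted_errors[a]))
```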

### Reinforcement Learning Process

### Q-network Structure

## Adaptive Sampling and Rendering

The adaptive sampling and the rendering pipeline contain three steps given the trained networks:

- Use the trained Q-network to guide adaptive sampling and refine the radiance field blocks
- Result in a hierarchy of radiance field blocks

- Use the trained R-network to reconstruct the incoming radiances from the hierarchy
- Apply the reconstruction result for the final rendering
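The three steps can be summarized as a small orchestration sketch; all three callables are hypothetical stubs, not the paper's API:

```python
def render_tile(tile, q_network, r_network):
    """Three-step pipeline sketch: the Q-network guides adaptive sampling
    into a radiance field block hierarchy, the R-network reconstructs the
    incident radiance from that hierarchy, and the result is handed to the
    renderer."""
    hierarchy = q_network(tile)      # step 1: adaptive sampling/refinement
    radiance = r_network(hierarchy)  # step 2: radiance field reconstruction
    return radiance                  # step 3: consumed by final rendering

# Trivial stubs standing in for the trained networks.
out = render_tile("tile0", lambda t: [t], lambda h: len(h))
```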

### Adaptive Sampling Algorithm

### Reconstruction and Final Rendering

- Use the image-direction R-network to reconstruct the radiance field blocks of the hierarchy $H_T$
- Generate a fast preview: simply use the reconstructed incident radiance field to evaluate and integrate the product of the incident radiance and the BRDF
- Render the unbiased image: treat the reconstructed radiance field and the BRDF as two PDFs to generate the sampling directions, combine those two samplers via MIS
- The BRDF samples can be analytically drawn from a cumulative distribution function
- The reconstructed radiance field samples are generated by initially selecting a radiance field block from a discrete PDF and then proportionally sampling a point from the block according to the cosine weighting term
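The two-stage draw from the reconstructed field, combined with a balance-heuristic MIS weight for merging it with the BRDF sampler, might look like the following simplified sketch (cosine weighting inside the block is omitted, and all names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_block_direction(block_weights, rng):
    """Draw an incident direction from the reconstructed radiance field:
    choose a block from a discrete PDF proportional to its reconstructed
    radiance, then jitter uniformly inside the block (the full method also
    applies the cosine weighting term within the block)."""
    p = np.asarray(block_weights, dtype=float)
    p /= p.sum()                     # discrete PDF over blocks
    j = int(rng.choice(len(p), p=p)) # select a block
    u, v = rng.random(2)             # coordinates inside the chosen block
    return j, u, v

def mis_balance(pdf_field: float, pdf_brdf: float) -> float:
    """Balance-heuristic MIS weight for the radiance field sampler when
    combined with the BRDF sampler."""
    return pdf_field / (pdf_field + pdf_brdf)

j, u, v = sample_block_direction([0.1, 0.7, 0.2], rng)
w = mis_balance(0.6, 0.2)
```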

- As multi-bounce vertices are sparse, they are not compatible with the input format of the networks
- Switch to standard multiple importance sampling
- Adopt other photon-guided methods if the lighting is complex

## Result

## Limitations

- The BRDF term is not considered in the reconstruction
- Focus on first-bounce radiance field reconstruction