# Real-Time Global Illumination Decomposition of Videos

**Author & Institution**

*ABHIMITRA MEKA**, Max Planck Institute for Informatics, Saarland Informatics Campus and Google*MOHAMMAD SHAFIEI**, Max Planck Institute for Informatics, Saarland Informatics Campus*MICHAEL ZOLLHÖFER*, Stanford University*CHRISTIAN RICHARDT*, University of Bath*CHRISTIAN THEOBALT*, *Max Planck* Institute for Informatics, Saarland Informatics Campus, Germany

**Link**: http://gvv.mpi-inf.mpg.de/projects/LiveIlluminationDecomposition/

## Abstract

- Propose the first approach for the decomposition of a monocular color video into direct and indirect illumination components in real time
- In separate layers, the contribution made to the scene appearance by the scene reflectance, the light sources and the reflections from various coherent scene regions to one another
- Works for regular videos and produces temporally coherent decomposition layers at real-time frame rates
- Core: several sparsity priors that enable the estimation of the per-pixel direct and indirect illumination layers based on a small set of jointly estimated base reflectance colors
- The resulting variational decomposition problem uses a new formulation based on sparse and dense sets of non-linear equations that we solve efficiently using a novel alternating data-parallel optimization strategy

- Existing techniques that invert global light transport require image capture under multiplexed controlled lighting, or only enable the decomposition of a single image at slow off-line frame rates

## 1. Introduction

- Contributions
- Joint illumination decomposition of direct and indirect illumination layers, and estimation and refinement of base colors that constitute the scene reflectance
- A sparsity-based automatic estimation of the underlying reflectance when a user identifies regions of strong interreflections
- A novel parallelized sparse–dense optimizer to solve a mixture of high-dimensional sparse problems jointly with lowdimensional dense problems at real-time frame rates

## 2. Related Work

- Inverse Rendering
- The colors in an image depend on scene geometry, material appearance and illumination
- Reconstructing these components from a single image or video is a challenging and illposed problem called
*inverse rendering* - Many complex image editing tasks can be achieved using a purely image-based decomposition without full inverse rendering

- Global Illumination Decomposition
- To decompose the captured radiance of a scene into direct and indirect components, some methods actively illuminate the scene to investigate the effect of light transport

- Intrinsic Images
- Many approaches have been introduced for the task of intrinsic image decomposition that explains a photograph using physically interpretable images such as reflectance and shading

- Intrinsic Video
- Layer-based Image Editing
- A physically accurate decomposition is not required to achieve complex image editing tasks such as recoloring of objects
- Instead, a decomposition into multiple semitransparent layers is often sufficient

## 3. Overview

- The first real-time method for temporally coherent illumination decomposition of a video into a reflectance layer, direct illumination layer and multiple indirect illumination layers
- Propose a novel sparsity-driven formulation for the estimation and refinement of a base color palette, which is used for decomposing the video frames
- Algorithm starts by automatically estimating a set of base colors that represent scene reflectances
- Automatic and only occasionally requires a minimal set of user clicks on the first video frame to identify regions of strong inter-reflections and refine the base colors
- Propagate the user input automatically to the rest of the video by a spatiotemporal region-growing method

- Perform the illumination decomposition
- Formulation results in a mixture of dense and sparse non-convex high-dimensional optimization problems, which they solve efficiently using a custom-tailored parallel iterative non-linear solver that they implement on the GPU

- Evaluate on a variety of synthetic and real-world scenes, and provide comparisons that show that our method outperforms state-of-the-art illumination decomposition, intrinsic decomposition and layer-based image editing techniques, both qualitatively and quantitatively
- Demonstrate that real-time illumination decomposition of videos enables a range of advanced, illumination-aware video editing applications that are suitable for photo-real augmented reality applications, such as inter-reflection-aware recoloring and retexturing

## 4. Problem Formulation

Algorithm: video frame -> a reflectance layer + a direct illumination layer + multiple indirect illumination layers

Simplifying assumption

- Assume that the scene is Lambertian, i.e., surfaces exhibit no view-dependent effects and hence their reflectance can be parameterized as a diffuse albedo with RGB components
- Assume that all light sources in the scene produce only white colored light. Hence, the direct illumination in the scene can be expressed by a grayscale or single channel image
- Assume that the second reflection bounce (or the first inter-reflection) of light is the primary source of indirect illumination in the scene, while the contribution of subsequent bounces of light is negligible
- Assume that the motion of the camera in the video is smooth with significant overlap between adjacent frames
- Assume that no new objects or materials come into view after the first frame

The algorithm factors each video frame $\pmb I$ into a per-pixel product of the reflectance $\pmb R$ and the illumination $\pmb S$

$$

\pmb I(\pmb x)=\pmb R(\pmb x)\odot \pmb S(\pmb x)

$$- $\pmb x$: pixel location
- $\odot$: element-wise product
- For diffuse objects, the reflectance layer captures the surface albedo, and the illumination layer $\pmb S$ jointly captures the direct and indirect illumination effects
- Represent the illumination layer as a colored RGB image to allow indirect illumination effects to be expressed in the illumination layer

Further decompose the illumination layer into a grayscale direct illumination layer resulting from the white illuminant, and multiple indirect colored illumination layers resulting from inter-reflections from colored objects in the scene

We start by estimating a set of base colors that consists of $K$ unique reflectance colors ${\pmb b_k}$ that represent the scene

- $K$ is specified by the user, as superfluous clusters will be removed automatically
- This set of base colors serves as the basis for the illumination decomposition
- The base colors help constrain the values of pixels in the reflectance layer $\pmb R$

For every surface point in the scene, assume that a single indirect bounce of light may occur from every base reflectance color, in addition to the direct illumination

The global illumination in the scene is modeled using a linear decomposition of the illumination layer $\pmb S$ into a direct illumination layer $T_0$ and the sum of the $K$ indirect illumination layers $\{T_k\}_{0<k\leq K}$

$$

\pmb I(\pmb x)=\pmb R(\pmb x)\odot \sum_{k=0}^K\pmb b_k T_k(\pmb x)

$$- $\pmb b_0$: represents the color of the illuminant: white in paper’s case, i.e. $\pmb b_0=(1,1,1)$
- $T_0(\pmb x)$: indicates the light transport contribution from the direct illumination
- The contribution from each base color $\pmb b_k$ at a given pixel location $\pmb x$ is measured by the map $T_k(\pmb x)$
- Provides the net contribution by the base reflectance color to the global scene illumination

- Obtain the set of base colors automatically using a real-time clustering technique
- Once the base colors are obtained, the scene clustering can be further refined using a few simple user-clicks
- This refines only the regions of clustering but not the base colors themselves

## 5. Base Color Estimation

- Initialize the set of base colors by clustering the dominant colors in the first video frame
- This clustering step not only provides an initial base color estimate, but also a segmentation of the video into regions of approximately uniform reflectance
- If needed, the clustering in a video frame undergoes a user-guided correction

- The base colors are used for the illumination decomposition
- Further refined
- Used to compute the direct and indirect illumination layers

### 5.1. Chromaticity Clustering

Cluster the first video frame by color to approximate the regions of uniform reflectance that are observed in scenes with sparsely colored objects

- Based on a much faster histogram-based k-means clustering approach
- Perform the clustering of each RGB video frame in a discretized chromaticity space

Workflow

- Obtain chromaticity image by dividing the input image by its intensity $\pmb C(\pmb x)=\pmb I(\pmb x)/|\pmb I(\pmb x)|$
- Compute a histogram of the chromaticity image with 10 partitions along each axis
- Perform weighted k-means clustering to obtain cluster center chromaticity values, using the population of the bins as the weight and the mid-point of the bin as sample values
- The user provides an upper limit of the number of clusters visible in the scene $K$

- Collapse adjacent similar clusters by measuring the pairwise chromaticity distance between estimated cluster centers
- If this distance is below a threshold of 0.2, merge the smaller cluster into the larger cluster
- The average RGB colors of all pixels assigned to each cluster then yield the set of initial base colors

The histogram-based clustering approach significantly reduces the segmentation complexity, independent of the image size

- Produces a segmentation of the input frame, by assigning each pixel to its closest cluster
- Provides a coarse approximation of the reflectance layer $\pmb R_{\mathrm{cluster}}$
- Use $\pmb R_{\mathrm{cluster}}$ as an initialization for the reflectance layer $\pmb R$ in the energy optimization

### 5.2. Misclustering Correction

- Since the clustering directly depends on the color of a pixel, regions of strong inter-reflections may be erroneously assigned to the base color of an indirect illuminant instead of the base color representing the reflectance of the region
- Such a misclustering is difficult to correct automatically because of the inherent ambiguity of the illumination decomposition problem
- Rely on minimal manual interaction to identify misclustered regions and then automatically correct the underlying reflectance base color in all subsequent frames

#### 5.2.1. Region Identification and Tracking

Identifying the true reflectance of a pixel in the presence of strong inter-reflections from other objects is an ambiguous task

- In case of direct illumination, the observed color value of a pixel is obtained by modulating the reflectance solely by the color of the illuminant
- In the case of inter-reflections, there is further modulation by light reflected from other objects, which then depends on their reflectance properties
- Such regions are easy to identify by a user
- Ask the user to simply click on such a region only in the first frame it occurs

- Automatically identify the full region by flood filling it using connected-components analysis based on the cluster identifier

Real-time tracking of non-rigidly deforming, non-convex marked regions in subsequent frames

- Given the marked pixel region in the previous frame
- Probe the same pixel locations in the current frame to identify pixels with the same cluster ID as in the previous frame
- Flood fill starting from these valid pixels to obtain the tracked marked region in the new frame
- Do not flood fill for pixels inside the regions to keep this operation efficient
- Observe that one or two valid pixels are sufficient to correctly identify the entire misclustered region

#### 5.2.2. Reflectance Correction

- Once all pixels in a misclustered region are identified in a video frame (either marked or tracked), exploit the sparsity constraint of the indirect illumination layers to solve for the correct reflectance base color
- Perform multiple full illumination decompositions for each identified region, evaluating each base color’s suitability as the region’s reflectance
- For each base color, measure the sparsity obtained over the region using the illumination sparsity term
- The base color that provides the sparsest solution of the decomposition is then used as the corrected reflectance

- The intuition behind such a sparsity prior is that using the correct underlying reflectance should lead to an illumination layer which is explained by the color spill from only a sparse number of nearby objects

## 6. Illumination Decomposition

Decomposition each input video frame $\pmb I$ into its reflectance layer $\pmb R$, its direct illumination layer $T_0$ and a set of indirect illumination layers ${T_k}$

corresponding to the base colors ${\pmb b_k}$- The direct illumination layer $T_0$ represents the direct contribution to the scene by the external light sources
- The indirect illumination layers $\{T_k\}$ capture the inter-reflections that occur within the scene

Formulate the illumination decomposition as an energy minimization problem with the following energy:

$$

E_{\mathrm{decomp}}(\pmb \chi)=E_{\mathrm{data}}(\pmb \chi)+E_{\mathrm{reflectance}}(\pmb \chi)+E_{\mathrm{illumination}}(\pmb \chi)

$$- $\pmb \chi=\{\pmb R,T_0,\{T_k\}\}$ is the set of variables to be optimized
- The base colors ${\pmb b_k}$ stay fixed
- Optimize this energy using a novel fast GPU solver to obtain real-time performance

**Data Fidelity Term**

This constraint enforces that the decomposition result reproduces the input image:

$$

E_{\mathrm{data}}(\pmb \chi)=\lambda_{\mathrm{data}}\cdot \sum_{\pmb x}\left\Vert\pmb I(\pmb x)-\pmb R(\pmb x)\odot \sum_{k=0}^K\pmb b_kT_k(\pmb x) \right\Vert^2

$$

- $\lambda_{\mathrm {data}}$: the weight for this energy term
- $T_k$: the $(K+1)$ illumination layers of the decomposition(one direct layer $T_0$ and $K$ indirect layers $\{T_k\}$)

### 6.1. Reflectance Priors

- Constrain the estimated reflectance layer $\pmb R$ using three priors:

$$

E_{\mathrm{reflectance}}(\pmb \chi)=E_{\mathrm{clustering}}(\pmb \chi)+E_{\mathrm{r-sparsity}}(\pmb \chi)+E_{\mathrm{r-consistency}}(\pmb \chi)

$$

**Reflectance Clustering Prior**

Use the cluster to guide the decomposition, as the chromaticity-clustered image $\pmb R_{\mathrm{cluster}}$ is an approximation of the reflectance layer $\pmb R$

Constrain the reflectance map to remain close to the clustered image using the following energy term:

$$

E_{\mathrm{clustering}}(\pmb \chi)=\lambda_{\mathrm{clustering}}\cdot \sum_{\pmb x} \parallel\pmb r(\pmb x)-\pmb r_{\mathrm{cluster}}(\pmb x) \parallel_2^2

$$- $\pmb r$: represent the quantity $\pmb R$ in the log-domain, i.e. $\pmb r=\ln\pmb R$
- $\pmb r_{\mathrm{cluster}}$: the clustered reflectance map

**Reflectance Sparsity Prior**

- Encourages a piecewise constant reflectance map using gradient sparsity
- Natural scenes generally consist of a small set of objects and materials, hence the reflectance layer is expected to have sparse gradients
- Such a spatially sparse solution for the reflectance image can be obtained by minimizing the $\mathcal ℓ_p-\mathrm{norm}$ ($p\in[0,1]$) of the gradient magnitude $\parallel\nabla \pmb r\parallel_2$

$$

E_{\mathrm{r-sparsity}}=\lambda_{\mathrm{r-sparsity}}\cdot\sum_{\pmb x}\parallel\nabla \pmb r(\pmb x)\parallel_2^p

$$

**Spatiotemporal Reflectance Consistency Prior**

- Enforces that the reflectance stays temporally consistent by connecting every pixel with a set of randomly sampled pixels in a small spatiotemporal window by constraining the reflectance of the pixels to be close under a defined chromaticity-closeness condition
- For each pixel $\pmb x$ in the reflectance image, connect it to $N_s$ randomly sampled pixels $\pmb y_i$
- Samples are chosen from reflectance images of the current and previous frames $t_i$

$$

\begin{align}

E_{\mathrm{r-consistency}}(\pmb \chi)&=\lambda_{\mathrm{r-consistency}}\cdot \sum_{i=1}^{N_s}g_i(\pmb x)\cdot \parallel\pmb r(\pmb x)-\pmb r_{t_i}(\pmb y_i)\parallel_2^2\\

g_i(\pmb x)&=\begin{cases}

w_{iw}(\pmb x)&\mathrm{if}\ \parallel\pmb c(\pmb x)-\pmb c_{t_i}(\pmb y_i)\parallel_2<\tau_{cc}\\

0&\mathrm{otherwise}

\end{cases}

\end{align}

$$

- $\tau_{cc}$: a chromaticity consistency threshold

### 6.2. Illumination Priors

- Constrain the illumination $\pmb S$ to be close to monochrome and the indirect illumination layers ${T_k}$ to have a sparse decomposition, spatial smoothness and non-negativity

$$

E_{\mathrm{illumination}}(\pmb \chi)=E_{\mathrm{monochrome}}(\pmb \chi)+E_{\mathrm{i-sparsity}}(\pmb \chi)+E_{\mathrm{smoothness}}(\pmb \chi)+E_{\mathrm{non-neg}}(\pmb \chi)

$$

**Soft-Retinex Weighted Monochromaticity Prior**

- The illumination layer is a combination of direct and indirect illumination effects
- Indirect effects such as inter-reflections tend to be spatially local with smooth color gradients whereas under the white-illumination assumption, the direct bounce does not contribute any color to the illumination layer
- Expect the illumination $\pmb S$ to be mostly monochromatic except at small spatial pockets where smooth color gradients occur due to inter-reflections

$$

E_{\mathrm{monochrome}}(\pmb \chi)=\lambda_{\mathrm{monochrome}}\cdot w_{\mathrm{SR}}\cdot \sum_{\pmb x}\sum_{\pmb c}(\pmb S_c(\pmb x)-|\pmb S(\pmb x)|)^2

$$

$c\in{R,G,B}$

$|\pmb S|$: the intensity of the illumination layer $\pmb S$

This constraint pulls the color channels of each pixel close to the grayscale intensity of the pixel, hence encouraging monochromaticity

$w_{\mathrm{SR}}$: the soft-color-Retinex weight

$$

w_{\mathrm{SR}}=1-\exp(-50\cdot\Delta\pmb C)

$$- $\Delta\pmb C$: the maximum of the chromaticity gradient of the input image in any of the four spatial directions at the pixel location

The soft-color-Retinex weight is high only for large chromaticity gradients, which represent reflectance edges

Hence, monochromaticity of the illumination layer is enforced only close to the reflectance edges and not at locations of slowly varying chromaticity, which represent inter-reflections

Relying on local chromaticity gradients may be problematic when there are regions of uniform colored reflectance, but in such regions the reflectance sparsity priors tend to be stronger and overrule the monochromaticity prior.

**Illumination Decomposition Sparsity**

Enforce that the illumination decomposition is sparse in terms of the layers that are activated per-pixel, i.e., those that influence the pixel with their corresponding base color

Assume during image formation in the real world, a large part of the observed radiance for a scene point comes from a small subpart of the scene

Apply the sparsity-inducing $\mathcal ℓ_1-\mathrm{norm}$ to the indirect illumination layers

$$

E_{\mathrm{i-sparsity}}(\pmb \chi)=\lambda_{\mathrm{i-sparsity}}\cdot\sum_{\pmb x}\left\Vert\{T_k(\pmb x)\}^K_{k=1} \right\Vert_1

$$

**Spatial Smoothness**

Encourage the decomposition to be spatially piecewise smooth using an $ℓ1-\mathrm{sparsity}$ prior in the gradient domain

Enforces piecewise constancy of each direct or indirect illumination layer

$$

E_{\mathrm{smoothness}}(\pmb \chi)=\lambda_{\mathrm{smoothness}}\cdot \sum_{\pmb x}\sum_{k=0}^K\parallel\nabla T_k(\pmb x)\parallel_1

$$This allows to have sharp edges in the decomposition layers

**Non-Negativity of Light Transport**

Light transport is an inherently additive process

- Light bouncing around in the scene adds radiance to scene points
- But never subtracts from them

The quantity of transported light is always positive

$$

E_{\mathrm{non-neg}}(\pmb \chi)=\lambda_{\mathrm{non-neg}}\cdot \sum_{\pmb x}\sum_{k=0}^K\max(-T_k(\pmb x),0)

$$- If the decomposition layer $T_k(\pmb x)$ is non-negative, there is no penalty
- If $T_k(\pmb x)$ becomes negative, a linear penalty is enforced

### 6.3. Base Color Refinement

Refine the base colors further on the first video frame to approach the ground-truth reflectance of the materials in the scene

The refinement of base colors is formulated as an incremental update $\Delta \pmb b_k$ of the base colors $\pmb b_k$ in the original data fidelity term, along with intensity and chromaticity regularizers

$$

\begin{align}

E_{\mathrm{refine}}(\pmb \chi)&=\lambda_{\mathrm{data}}\sum_{\pmb x}\left\Vert \pmb I(\pmb x)-\pmb R(\pmb x)\odot \sum_{k=0}^K(\pmb b_k+\Delta\pmb b_k)T_k(\pmb x) \right\Vert^2\\

&+\lambda_{\mathrm{IR}}\sum_{k=1}^K\parallel\Delta\pmb b_k\parallel_2^2+\lambda_{\mathrm{CR}}\sum_{k=1}^K\parallel(\pmb C(\pmb b_k)+\Delta\pmb b_k)-\pmb C(\pmb b_k)\parallel_2^2

\end{align}

$$- $\pmb \chi={\Delta\pmb b_k}$: the vector of unknowns to be optimized
- $\lambda_{\mathrm{IR}}$: the weight for the intensity regularizer that ensures small base color updates
- $\lambda_{\mathrm{CR}}$: the weight of the chromaticity regularizer
- Constrains base color updates $\Delta\pmb b_k$ to remain close in chromaticity $\pmb C(\cdot)$ to the initially estimated base color $\pmb b_k$

These regularizers ensure that base color updates do not lead to oscillations in the optimization process

The refinement energy is solved in combination with the illumination decomposition energy

- Resulting in an estimation of the unknown variables that together promotes decomposition sparsity

This refinement of the base colors leads to a dense Jacobian matrix, because the unknown variables ${\Delta \pmb b_𝑘}$ in the energy are influenced by all pixels in the image

### 6.4. Handling the Sparsity-Inducing Norms

Some energy terms contain sparsity-inducing $ℓ𝑝-\mathrm{norms}$ ($p \in [0, 1]$)

Handle these objectives in a unified manner using Iteratively Re-weighted Least Squares

Approximate the $ℓ𝑝-\mathrm{norms}$ by a nonlinear least-squares objective based on re-weighting, i.e., replace the corresponding residuals $\pmb r$ as follows:

$$

\begin{align}

\parallel\pmb r\parallel_p&=\parallel\pmb r\parallel_2^2\cdot \parallel\pmb r\parallel_2^{p-2}\\

&\approx \parallel\pmb r\parallel_2^2\cdot \parallel\pmb r_{\mathrm {old}}\parallel_2^{p-2}

\end{align}

$$in each step of the applied iterative solver

- $\pmb r_{\mathrm{old}}$: the corresponding residual after the previous iteration step

#### 6.4.1. Handling Non-negativity Constraints

The non-negativity objective $E_{\mathrm{non-neg}}(\pmb \chi)$ contains a maximum function that is non-differentiable at zero

Handle this objective by replacing the maximum with a re-weighted least-squares term, $\max(-T_k(\pmb x),0)=w_kT_k^2(\pmb x)$, using

$$

w_k=\begin{cases}

0&\mathrm{if}\ T_k(\pmb x)>0\\

(|T_k(\pmb x)+\epsilon|)^{-1}&\mathrm{otherwise}

\end{cases}

$$- $\epsilon = 0.002$: a small constant that prevents division by zero

Transforms the non-convex energy into a non-linear least-squares optimization problem

## 7. Data-Parallel GPU Optimization

The decomposition problems are all non-convex optimizations based on an objective $E$ with unknowns $\pmb \chi$

The best decomposition $\pmb \chi^\ast$ by solving the following minimization problem:

$$

\pmb \chi^\ast=\arg\min_{\pmb \chi} E(\pmb \chi)

$$The optimization problems are in general non-linear least-squares form and can be tackled by the iterative Gauss–Newton algorithm that approximates the optimum $\pmb \chi^\ast\approx \pmb \chi_k$ by a sequence of solutions $\pmb \chi_k=\pmb \chi_{k-1}+\pmb \delta_k^\ast$

The optimal linear update $\pmb \delta_k^\ast$ is given by the solution of the associated normal equations:

$$

\pmb\delta_k^\ast=\arg\min_{\pmb\delta_k}\parallel\pmb F(\pmb \chi_{k-1})+\pmb\delta_k\pmb J(\pmb \chi_{k-1})\parallel_2^2

$$- $\pmb F$: a vector field that stacks all residuals, i.e., $E(\pmb\chi)=\parallel\pmb F(\pmb\chi)\parallel_2^2$
- $\pmb J$: its Jacobian matrix

Obtaining real-time performance is challenging even with recent state-of-the-art data-parallel iterative non-linear least-squares solution strategies

- To avoid cluttered notation, we will omit the parameters and simply write $\pmb J$ instead of $\pmb J(\pmb\chi)$
- For our decomposition energies, the Jacobian $\pmb J$ is a large matrix with usually more than 70 million rows and 4 million columns
- Previous approaches assume $\pmb J$ to be a sparse matrix, meaning that only a few residuals are influenced by each variable
- While this holds for the columns of $\pmb J$ that corresponds to the variables that are associated with the decomposition layers, it does not hold for the columns that store the derivatives with respect to the base color updates ${\Delta \pmb b_k}$, since the base colors influence each residual of $E_{\mathrm{data}}$

- $\pmb J=\begin{bmatrix}\pmb S_{\pmb J}&\pmb D_{\pmb J}\end{bmatrix}$ has two sub-blocks:
- $\pmb S_{\pmb J}$: a large sparse matrix with only a few non-zero entries per row
- $\pmb D_{\pmb J}$: a dense matrix with the same number of rows, but only a few columns

- The evaluation of the Jacobian $\pmb J$ requires a different specialized parallelization for the dense and sparse parts

### 7.1. Sparse-Dense Splitting

Tackle the described problem using a sparse–dense splitting approach that splits the variables $\chi$ into a sparse set $\mathcal T$ (decomposition layers) and a dense set $\mathcal B$ (base color updates)

- Afterwards, we optimize for $\mathcal B$ and $\mathcal T$ independently in an iterative flip-flop manner

Algorithm:

Optimize for $\mathcal T$, while keeping $\mathcal B$ fixed

- The resulting optimization problem is a sparse non-linear least-squares problem
- Improve upon the previous solution by performing a nonlinear Gauss–Newton step
- The corresponding normal equations are solved using 16 steps of data-parallel preconditioned conjugate gradient
- Parallelize over the rows of the system matrix using one thread per row (variable)

After updating the ‘sparse’ variables $\mathcal T$, keep them fixed and solve for the ‘dense’ variables $\mathcal B$

- The resulting optimization problem is a dense least-squares problem with a small $3𝐾 \times 3𝐾$ system matrix (normally $𝐾$ is between 4 and 7 due to merged clusters)
- Materialize the normal equations in device memory based on a sequence of outer products, using one thread per entry of $\pmb J^T\pmb J$

The system is mapped to the CPU and robustly solved using singular value decomposition. After updating ‘dense’ variables $\mathcal B$,

we again solve for ‘sparse’ variables $\mathcal T$ and iterate this process until convergence

## 8. Results and Evaluation

**Parameters**

- $\lambda_{\mathrm{clustering}}=200$
- $\lambda_{\mathrm{r-sparsity}}=20$
- $p=1$
- $\lambda_{\mathrm{i-sparsity}}=3$
- $\lambda_{\mathrm{smoothness}}=3$
- $\lambda_{\mathrm{non-neg}}=1000$
- $\lambda_{\mathrm{data}}=5000$
- $\lambda_{\mathrm{IR}}=10$
- $\lambda_{\mathrm{CR}}=10$
- $\lambda_{\mathrm{r-consistency}}=\lambda_{\mathrm{monochrome}}=10$

**Runtime performance**

- Platform: Intel Core i7 with 2.7 GHz, 32 GB RAM and an NVIDIA GeForce GTX 980
- Resolution: 640x512 pixels
- Performance:
- Illumination decomposition: 14ms
- Base color refinement: 2s
- Misclustering correction: 1s

- Perform the last two steps, base color refinement and misclustering correction, only once at the beginning of the video
- Runs at real-time frame rates (⩾30 Hz)
- Enables real-time video editing applications

**Demo**

## 9. Limitations

- Breaking the assumptions can lead to inaccurate estimations
- The method can face challenges if objects enter or leave the scene during the course of the video
- The inter-reflections caused by out-of-view objects cannot be properly modeled, since the corresponding base color might not be available
- If an object with an unseen color enters the scene for the first time, and the base colors are already exceeded, its inter-reflections cannot be modeled
- Complex, textured scenes with many different colors are challenging to decompose, since this requires many base colors, leading to a large number of variables and an even more under-constrained optimization problem
- More general indoor and outdoor scenes are not the ideal use cases for the method