Author & Institution
ABHIMITRA MEKA*, Max Planck Institute for Informatics, Saarland Informatics Campus and Google
MOHAMMAD SHAFIEI*, Max Planck Institute for Informatics, Saarland Informatics Campus
MICHAEL ZOLLHÖFER, Stanford University
CHRISTIAN RICHARDT, University of Bath
CHRISTIAN THEOBALT, Max Planck Institute for Informatics, Saarland Informatics Campus, Germany
Link: http://gvv.mpiinf.mpg.de/projects/LiveIlluminationDecomposition/
Abstract
 Propose the first approach for the decomposition of a monocular color video into direct and indirect illumination components in real time
 Separates, into individual layers, the contributions that the scene reflectance, the light sources, and the mutual reflections between coherent scene regions make to the scene appearance
 Works for regular videos and produces temporally coherent decomposition layers at real-time frame rates
 Core: several sparsity priors that enable the estimation of the per-pixel direct and indirect illumination layers based on a small set of jointly estimated base reflectance colors
 The resulting variational decomposition problem is formulated via sparse and dense sets of non-linear equations that are solved efficiently using a novel alternating data-parallel optimization strategy
 Existing techniques that invert global light transport require image capture under multiplexed controlled lighting, or only enable the decomposition of a single image at slow offline frame rates
1. Introduction
 Contributions
 Joint illumination decomposition of direct and indirect illumination layers, and estimation and refinement of base colors that constitute the scene reflectance
 A sparsity-based automatic estimation of the underlying reflectance when a user identifies regions of strong interreflections
 A novel parallelized sparse–dense optimizer to solve a mixture of high-dimensional sparse problems jointly with low-dimensional dense problems at real-time frame rates
2. Related Work
 Inverse Rendering
 The colors in an image depend on scene geometry, material appearance and illumination
 Reconstructing these components from a single image or video is a challenging and ill-posed problem called inverse rendering
 Many complex image editing tasks can be achieved using a purely image-based decomposition without full inverse rendering
 Global Illumination Decomposition
 To decompose the captured radiance of a scene into direct and indirect components, some methods actively illuminate the scene to investigate the effect of light transport
 Intrinsic Images
 Many approaches have been introduced for the task of intrinsic image decomposition that explains a photograph using physically interpretable images such as reflectance and shading
 Intrinsic Video
 Layer-based Image Editing
 A physically accurate decomposition is not required to achieve complex image editing tasks such as recoloring of objects
 Instead, a decomposition into multiple semitransparent layers is often sufficient
3. Overview
 The first real-time method for temporally coherent illumination decomposition of a video into a reflectance layer, a direct illumination layer and multiple indirect illumination layers
 Propose a novel sparsity-driven formulation for the estimation and refinement of a base color palette, which is used for decomposing the video frames
 Algorithm starts by automatically estimating a set of base colors that represent scene reflectances
 The estimation is automatic and only occasionally requires a minimal set of user clicks on the first video frame to identify regions of strong interreflections and refine the base colors
 Propagate the user input automatically to the rest of the video by a spatiotemporal region-growing method
 Perform the illumination decomposition
 Formulation results in a mixture of dense and sparse non-convex high-dimensional optimization problems, which are solved efficiently using a custom-tailored parallel iterative non-linear solver implemented on the GPU
 Evaluate on a variety of synthetic and real-world scenes, and provide comparisons that show that the method outperforms state-of-the-art illumination decomposition, intrinsic decomposition and layer-based image editing techniques, both qualitatively and quantitatively
 Demonstrate that real-time illumination decomposition of videos enables a range of advanced, illumination-aware video editing applications that are suitable for photoreal augmented reality, such as interreflection-aware recoloring and retexturing
4. Problem Formulation
Algorithm: video frame → a reflectance layer + a direct illumination layer + multiple indirect illumination layers

Simplifying assumption
 Assume that the scene is Lambertian, i.e., surfaces exhibit no view-dependent effects and hence their reflectance can be parameterized as a diffuse albedo with RGB components
 Assume that all light sources in the scene produce only white light. Hence, the direct illumination in the scene can be expressed by a grayscale, i.e. single-channel, image
 Assume that the second reflection bounce (or the first interreflection) of light is the primary source of indirect illumination in the scene, while the contribution of subsequent bounces of light is negligible
 Assume that the motion of the camera in the video is smooth with significant overlap between adjacent frames
 Assume that no new objects or materials come into view after the first frame

The algorithm factors each video frame $\pmb I$ into a per-pixel product of the reflectance $\pmb R$ and the illumination $\pmb S$
$$
\pmb I(\pmb x)=\pmb R(\pmb x)\odot \pmb S(\pmb x)
$$ $\pmb x$: pixel location
 $\odot$: element-wise product
 For diffuse objects, the reflectance layer captures the surface albedo, and the illumination layer $\pmb S$ jointly captures the direct and indirect illumination effects
 Represent the illumination layer as a colored RGB image to allow indirect illumination effects to be expressed in the illumination layer
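The per-pixel model $\pmb I = \pmb R \odot \pmb S$ can be sketched in a few lines of NumPy; the tiny image below is purely illustrative, and the recovery step simply shows that, with a known illumination layer, the reflectance follows by element-wise division:

```python
import numpy as np

# Toy 2x2 RGB image: reflectance (albedo) and illumination layers (made-up values).
R = np.array([[[0.8, 0.2, 0.2], [0.2, 0.8, 0.2]],
              [[0.2, 0.2, 0.8], [0.5, 0.5, 0.5]]])  # reflectance layer
S = np.array([[[1.0, 1.0, 1.0], [0.5, 0.5, 0.5]],
              [[1.0, 0.9, 0.8], [0.3, 0.3, 0.3]]])  # RGB illumination layer

I = R * S  # element-wise (Hadamard) product per pixel and channel

# Given I and a known S, the reflectance is recovered by element-wise division
# (guarding against zero illumination):
R_rec = I / np.maximum(S, 1e-8)
```

The decomposition problem in the paper is the hard inverse direction: neither $\pmb R$ nor $\pmb S$ is known, which is why the priors below are needed.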

Further decompose the illumination layer into a grayscale direct illumination layer resulting from the white illuminant, and multiple indirect colored illumination layers resulting from interreflections from colored objects in the scene

We start by estimating a set of base colors that consists of $K$ unique reflectance colors $\{\pmb b_k\}$ that represent the scene
 $K$ is specified by the user, as superfluous clusters will be removed automatically
 This set of base colors serves as the basis for the illumination decomposition
 The base colors help constrain the values of pixels in the reflectance layer $\pmb R$

For every surface point in the scene, assume that a single indirect bounce of light may occur from every base reflectance color, in addition to the direct illumination

The global illumination in the scene is modeled using a linear decomposition of the illumination layer $\pmb S$ into a direct illumination layer $T_0$ and the sum of the $K$ indirect illumination layers $\{T_k\}_{0<k\leq K}$
$$
\pmb I(\pmb x)=\pmb R(\pmb x)\odot \sum_{k=0}^K\pmb b_k T_k(\pmb x)
$$ $\pmb b_0$: represents the color of the illuminant, white in the paper's case, i.e. $\pmb b_0=(1,1,1)$
 $T_0(\pmb x)$: indicates the light transport contribution from the direct illumination
 The contribution from each base color $\pmb b_k$ at a given pixel location $\pmb x$ is measured by the map $T_k(\pmb x)$
 Provides the net contribution by the base reflectance color to the global scene illumination
 Obtain the set of base colors automatically using a real-time clustering technique
 Once the base colors are obtained, the scene clustering can be further refined using a few simple user clicks
 This refines only the regions of clustering but not the base colors themselves
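The layered model $\pmb I(\pmb x)=\pmb R(\pmb x)\odot\sum_{k=0}^K \pmb b_k T_k(\pmb x)$ can be sketched for a single pixel; the two hypothetical base colors and transport values below are illustrative, not from the paper:

```python
import numpy as np

# K = 2 hypothetical base colors, plus the white illuminant b_0.
b = np.array([[1.0, 1.0, 1.0],   # b_0: white illuminant (direct light)
              [0.9, 0.1, 0.1],   # b_1: reddish base color
              [0.1, 0.9, 0.1]])  # b_2: greenish base color

T = np.array([0.7, 0.2, 0.0])    # transport contributions T_0..T_2 at pixel x
R = np.array([0.5, 0.5, 0.5])    # reflectance at pixel x (gray surface)

S = (b * T[:, None]).sum(axis=0)  # illumination layer: sum_k b_k * T_k(x)
I = R * S                         # observed pixel color
```

Here the pixel receives direct light ($T_0 = 0.7$) plus a red interreflection ($T_1 = 0.2$), so the illumination $S$ is white with a red spill.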

5. Base Color Estimation
 Initialize the set of base colors by clustering the dominant colors in the first video frame
 This clustering step not only provides an initial base color estimate, but also a segmentation of the video into regions of approximately uniform reflectance
 If needed, the clustering in a video frame undergoes a userguided correction
 The base colors are used for the illumination decomposition
 Further refined
 Used to compute the direct and indirect illumination layers
5.1. Chromaticity Clustering

Cluster the first video frame by color to approximate the regions of uniform reflectance that are observed in scenes with sparsely colored objects
 Based on a much faster histogram-based k-means clustering approach
 Perform the clustering of each RGB video frame in a discretized chromaticity space

Workflow
 Obtain the chromaticity image by dividing the input image by its per-pixel intensity: $\pmb C(\pmb x)=\pmb I(\pmb x)/\bar{\pmb I}(\pmb x)$, where $\bar{\pmb I}(\pmb x)$ denotes the intensity of $\pmb I(\pmb x)$
 Compute a histogram of the chromaticity image with 10 partitions along each axis
 Perform weighted k-means clustering to obtain cluster center chromaticity values, using the population of the bins as the weight and the midpoint of the bin as sample values
 The user provides an upper limit of the number of clusters visible in the scene $K$
 Collapse adjacent similar clusters by measuring the pairwise chromaticity distance between estimated cluster centers
 If this distance is below a threshold of 0.2, merge the smaller cluster into the larger cluster
 The average RGB colors of all pixels assigned to each cluster then yield the set of initial base colors
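The workflow above can be sketched as follows. The synthetic two-color image, the initialization at the most populated bins, and the iteration count are assumptions for illustration, and the cluster-merging step is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy image: 500 reddish and 500 bluish pixels (N x 3 RGB values in [0,1]).
I = np.vstack([rng.uniform([0.6, 0.1, 0.1], [0.9, 0.3, 0.3], (500, 3)),
               rng.uniform([0.1, 0.1, 0.6], [0.3, 0.3, 0.9], (500, 3))])

C = I / I.sum(axis=1, keepdims=True)   # chromaticity (r, g, b), channels sum to 1

# 10x10 histogram over the (r, g) chromaticity plane.
H, re, ge = np.histogram2d(C[:, 0], C[:, 1], bins=10, range=[[0, 1], [0, 1]])
rm, gm = (re[:-1] + re[1:]) / 2, (ge[:-1] + ge[1:]) / 2
pts = np.stack(np.meshgrid(rm, gm, indexing="ij"), axis=-1).reshape(-1, 2)
w = H.reshape(-1)                      # bin populations act as sample weights

K = 2
centers = pts[np.argsort(w)[-K:]].copy()   # init at the two most populated bins
for _ in range(20):                        # weighted Lloyd (k-means) iterations
    lbl = np.argmin(((pts[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
    for k in range(K):
        m = (lbl == k) & (w > 0)
        if m.any():
            centers[k] = np.average(pts[m], axis=0, weights=w[m])
```

Because k-means runs on at most 100 bin midpoints rather than all pixels, the cost is independent of the image size, which is the point of the histogram step.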

The histogrambased clustering approach significantly reduces the segmentation complexity, independent of the image size
 Produces a segmentation of the input frame, by assigning each pixel to its closest cluster
 Provides a coarse approximation of the reflectance layer $\pmb R_{\mathrm{cluster}}$
 Use $\pmb R_{\mathrm{cluster}}$ as an initialization for the reflectance layer $\pmb R$ in the energy optimization
5.2. Misclustering Correction
 Since the clustering directly depends on the color of a pixel, regions of strong interreflections may be erroneously assigned to the base color of an indirect illuminant instead of the base color representing the reflectance of the region
 Such a misclustering is difficult to correct automatically because of the inherent ambiguity of the illumination decomposition problem
 Rely on minimal manual interaction to identify misclustered regions and then automatically correct the underlying reflectance base color in all subsequent frames
5.2.1. Region Identification and Tracking

Identifying the true reflectance of a pixel in the presence of strong interreflections from other objects is an ambiguous task
 In case of direct illumination, the observed color value of a pixel is obtained by modulating the reflectance solely by the color of the illuminant
 In the case of interreflections, there is further modulation by light reflected from other objects, which then depends on their reflectance properties
 Such regions are easy to identify by a user
 Ask the user to simply click on such a region only in the first frame it occurs
 Automatically identify the full region by flood filling it using connected-components analysis based on the cluster identifier

Real-time tracking of non-rigidly deforming, non-convex marked regions in subsequent frames
 Given the marked pixel region in the previous frame
 Probe the same pixel locations in the current frame to identify pixels with the same cluster ID as in the previous frame
 Flood fill starting from these valid pixels to obtain the tracked marked region in the new frame
 Do not flood fill for pixels inside the regions to keep this operation efficient
 Observe that one or two valid pixels are sufficient to correctly identify the entire misclustered region
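A minimal CPU sketch of the flood-fill step on a toy cluster-ID grid; the paper's version runs in real time on tracked regions, and this only illustrates the connected-component growth from a seed pixel:

```python
from collections import deque

def flood_fill(cluster_id, seed):
    """Collect all 4-connected pixels sharing the seed pixel's cluster ID."""
    h, w = len(cluster_id), len(cluster_id[0])
    target = cluster_id[seed[0]][seed[1]]
    region, queue = {seed}, deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and (ny, nx) not in region
                    and cluster_id[ny][nx] == target):
                region.add((ny, nx))
                queue.append((ny, nx))
    return region

# Toy 3x3 grid of cluster IDs; clicking pixel (0, 0) grows the '1' region.
ids = [[1, 1, 2],
       [1, 2, 2],
       [3, 3, 2]]
region = flood_fill(ids, (0, 0))
```

For tracking, the same routine would be re-seeded in the next frame from the probed pixels that kept their cluster ID.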
5.2.2. Reflectance Correction
 Once all pixels in a misclustered region are identified in a video frame (either marked or tracked), exploit the sparsity constraint of the indirect illumination layers to solve for the correct reflectance base color
 Perform multiple full illumination decompositions for each identified region, evaluating each base color’s suitability as the region’s reflectance
 For each base color, measure the sparsity obtained over the region using the illumination sparsity term
 The base color that provides the sparsest solution of the decomposition is then used as the corrected reflectance
 The intuition behind such a sparsity prior is that using the correct underlying reflectance should lead to an illumination layer which is explained by the color spill from only a sparse number of nearby objects
6. Illumination Decomposition

Decompose each input video frame $\pmb I$ into its reflectance layer $\pmb R$, its direct illumination layer $T_0$ and a set of indirect illumination layers $\{T_k\}$
corresponding to the base colors $\{\pmb b_k\}$. The direct illumination layer $T_0$ represents the direct contribution to the scene by the external light sources
 The indirect illumination layers $\{T_k\}$ capture the interreflections that occur within the scene

Formulate the illumination decomposition as an energy minimization problem with the following energy:
$$
E_{\mathrm{decomp}}(\pmb \chi)=E_{\mathrm{data}}(\pmb \chi)+E_{\mathrm{reflectance}}(\pmb \chi)+E_{\mathrm{illumination}}(\pmb \chi)
$$ $\pmb \chi=\{\pmb R,T_0,\{T_k\}\}$ is the set of variables to be optimized
 The base colors $\{\pmb b_k\}$ stay fixed
 Optimize this energy using a novel fast GPU solver to obtain real-time performance
Data Fidelity Term
This constraint enforces that the decomposition result reproduces the input image:
$$
E_{\mathrm{data}}(\pmb \chi)=\lambda_{\mathrm{data}}\cdot \sum_{\pmb x}\left\Vert\pmb I(\pmb x)-\pmb R(\pmb x)\odot \sum_{k=0}^K\pmb b_kT_k(\pmb x) \right\Vert^2
$$
 $\lambda_{\mathrm {data}}$: the weight for this energy term
 $T_k$: the $(K+1)$ illumination layers of the decomposition (one direct layer $T_0$ and $K$ indirect layers $\{T_k\}$)
6.1. Reflectance Priors
 Constrain the estimated reflectance layer $\pmb R$ using three priors:
$$
E_{\mathrm{reflectance}}(\pmb \chi)=E_{\mathrm{clustering}}(\pmb \chi)+E_{\mathrm{rsparsity}}(\pmb \chi)+E_{\mathrm{rconsistency}}(\pmb \chi)
$$
Reflectance Clustering Prior

Use the clustering to guide the decomposition, as the chromaticity-clustered image $\pmb R_{\mathrm{cluster}}$ is an approximation of the reflectance layer $\pmb R$

Constrain the reflectance map to remain close to the clustered image using the following energy term:
$$
E_{\mathrm{clustering}}(\pmb \chi)=\lambda_{\mathrm{clustering}}\cdot \sum_{\pmb x} \parallel\pmb r(\pmb x)-\pmb r_{\mathrm{cluster}}(\pmb x) \parallel_2^2
$$ $\pmb r$: represents the quantity $\pmb R$ in the log domain, i.e. $\pmb r=\ln\pmb R$
 $\pmb r_{\mathrm{cluster}}$: the clustered reflectance map
Reflectance Sparsity Prior
 Encourages a piecewise constant reflectance map using gradient sparsity
 Natural scenes generally consist of a small set of objects and materials, hence the reflectance layer is expected to have sparse gradients
 Such a spatially sparse solution for the reflectance image can be obtained by minimizing the $\ell_p$-norm ($p\in[0,1]$) of the gradient magnitude $\parallel\nabla \pmb r\parallel_2$
$$
E_{\mathrm{rsparsity}}=\lambda_{\mathrm{rsparsity}}\cdot\sum_{\pmb x}\parallel\nabla \pmb r(\pmb x)\parallel_2^p
$$
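The gradient-sparsity term can be evaluated on a toy piecewise-constant log-reflectance image; the forward differences and border handling below are assumptions for illustration:

```python
import numpy as np

# Toy single-channel albedo with one vertical reflectance edge.
r = np.log(np.array([[0.5, 0.5, 0.9],
                     [0.5, 0.5, 0.9]]))          # log-reflectance r = ln R

gy = np.diff(r, axis=0, append=r[-1:, :])        # forward differences along y
gx = np.diff(r, axis=1, append=r[:, -1:])        # forward differences along x
grad_mag = np.sqrt(gx ** 2 + gy ** 2)            # per-pixel ||grad r||_2

p = 1.0                                          # the paper's default exponent
E_rsparsity = np.sum(grad_mag ** p)              # only the edge pixels contribute
```

Only the two pixels on the reflectance edge carry a non-zero gradient, so the energy stays small for piecewise-constant reflectance, which is exactly what the prior rewards.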
Spatiotemporal Reflectance Consistency Prior
 Enforces that the reflectance stays temporally consistent: every pixel is connected to a set of randomly sampled pixels in a small spatiotemporal window, and the reflectances of connected pixels are constrained to be close whenever a chromaticity-closeness condition holds
 For each pixel $\pmb x$ in the reflectance image, connect it to $N_s$ randomly sampled pixels $\pmb y_i$
 Samples are chosen from reflectance images of the current and previous frames $t_i$
$$
\begin{align}
E_{\mathrm{rconsistency}}(\pmb \chi)&=\lambda_{\mathrm{rconsistency}}\cdot \sum_{\pmb x}\sum_{i=1}^{N_s}g_i(\pmb x)\cdot \parallel\pmb r(\pmb x)-\pmb r_{t_i}(\pmb y_i)\parallel_2^2\\
g_i(\pmb x)&=\begin{cases}
w_{iw}(\pmb x)&\mathrm{if}\ \parallel\pmb c(\pmb x)-\pmb c_{t_i}(\pmb y_i)\parallel_2<\tau_{cc}\\
0&\mathrm{otherwise}
\end{cases}
\end{align}
$$
 $\tau_{cc}$: a chromaticity consistency threshold
6.2. Illumination Priors
 Constrain the illumination $\pmb S$ to be close to monochrome and the indirect illumination layers $\{T_k\}$ to have a sparse decomposition, spatial smoothness and non-negativity
$$
E_{\mathrm{illumination}}(\pmb \chi)=E_{\mathrm{monochrome}}(\pmb \chi)+E_{\mathrm{isparsity}}(\pmb \chi)+E_{\mathrm{smoothness}}(\pmb \chi)+E_{\mathrm{nonneg}}(\pmb \chi)
$$
Soft-Retinex Weighted Monochromaticity Prior
 The illumination layer is a combination of direct and indirect illumination effects
 Indirect effects such as interreflections tend to be spatially local with smooth color gradients whereas under the whiteillumination assumption, the direct bounce does not contribute any color to the illumination layer
 Expect the illumination $\pmb S$ to be mostly monochromatic except at small spatial pockets where smooth color gradients occur due to interreflections
$$
E_{\mathrm{monochrome}}(\pmb \chi)=\lambda_{\mathrm{monochrome}}\cdot \sum_{\pmb x}w_{\mathrm{SR}}(\pmb x)\cdot\sum_{c}(\pmb S_c(\pmb x)-\bar{\pmb S}(\pmb x))^2
$$

$c\in\{R,G,B\}$

$\bar{\pmb S}$: the grayscale intensity of the illumination layer $\pmb S$

This constraint pulls the color channels of each pixel close to the grayscale intensity of the pixel, hence encouraging monochromaticity

$w_{\mathrm{SR}}$: the soft-color-Retinex weight
$$
w_{\mathrm{SR}}=1-\exp(-50\cdot\Delta\pmb C)
$$ $\Delta\pmb C$: the maximum of the chromaticity gradient of the input image in any of the four spatial directions at the pixel location

The soft-color-Retinex weight is high only for large chromaticity gradients, which represent reflectance edges

Hence, monochromaticity of the illumination layer is enforced only close to the reflectance edges and not at locations of slowly varying chromaticity, which represent interreflections

Relying on local chromaticity gradients may be problematic when there are regions of uniform colored reflectance, but in such regions the reflectance sparsity priors tend to be stronger and overrule the monochromaticity prior.
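The soft-color-Retinex weight can be sketched on a toy single-channel chromaticity image; the edge-padding at the borders is an assumption for illustration:

```python
import numpy as np

# Toy chromaticity image with one sharp vertical edge (a reflectance edge).
C = np.array([[0.2, 0.2, 0.8],
              [0.2, 0.2, 0.8],
              [0.2, 0.2, 0.8]])

pad = np.pad(C, 1, mode="edge")
diffs = [np.abs(pad[1:-1, 1:-1] - pad[:-2, 1:-1]),  # up neighbor
         np.abs(pad[1:-1, 1:-1] - pad[2:, 1:-1]),   # down neighbor
         np.abs(pad[1:-1, 1:-1] - pad[1:-1, :-2]),  # left neighbor
         np.abs(pad[1:-1, 1:-1] - pad[1:-1, 2:])]   # right neighbor
dC = np.max(diffs, axis=0)        # per-pixel maximum chromaticity gradient
w_SR = 1.0 - np.exp(-50.0 * dC)   # near 1 at sharp edges, near 0 in flat regions
```

Pixels adjacent to the sharp edge get weight close to 1 (monochromaticity enforced), while the flat left column gets weight 0, so slowly varying color spill there would be left untouched.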
Illumination Decomposition Sparsity

Enforce that the illumination decomposition is sparse in terms of the layers that are activated per-pixel, i.e., those that influence the pixel with their corresponding base color

Assume that, during image formation in the real world, a large part of the observed radiance of a scene point comes from a small subpart of the scene

Apply the sparsity-inducing $\ell_1$-norm to the indirect illumination layers
$$
E_{\mathrm{isparsity}}(\pmb \chi)=\lambda_{\mathrm{isparsity}}\cdot\sum_{\pmb x}\left\Vert\{T_k(\pmb x)\}^K_{k=1} \right\Vert_1
$$
Spatial Smoothness

Encourage the decomposition to be spatially piecewise smooth using an $\ell_1$-sparsity prior in the gradient domain

Enforces piecewise constancy of each direct or indirect illumination layer
$$
E_{\mathrm{smoothness}}(\pmb \chi)=\lambda_{\mathrm{smoothness}}\cdot \sum_{\pmb x}\sum_{k=0}^K\parallel\nabla T_k(\pmb x)\parallel_1
$$ 
This still allows sharp edges in the decomposition layers
NonNegativity of Light Transport

Light transport is an inherently additive process
 Light bouncing around in the scene adds radiance to scene points
 But never subtracts from them

The quantity of transported light is always positive
$$
E_{\mathrm{nonneg}}(\pmb \chi)=\lambda_{\mathrm{nonneg}}\cdot \sum_{\pmb x}\sum_{k=0}^K\max(-T_k(\pmb x),0)
$$ If the decomposition layer $T_k(\pmb x)$ is nonnegative, there is no penalty
 If $T_k(\pmb x)$ becomes negative, a linear penalty is enforced
6.3. Base Color Refinement

Refine the base colors further on the first video frame to approach the ground-truth reflectance of the materials in the scene

The refinement of base colors is formulated as an incremental update $\Delta \pmb b_k$ of the base colors $\pmb b_k$ in the original data fidelity term, along with intensity and chromaticity regularizers
$$
\begin{align}
E_{\mathrm{refine}}(\pmb \chi)&=\lambda_{\mathrm{data}}\sum_{\pmb x}\left\Vert \pmb I(\pmb x)-\pmb R(\pmb x)\odot \sum_{k=0}^K(\pmb b_k+\Delta\pmb b_k)T_k(\pmb x) \right\Vert^2\\
&+\lambda_{\mathrm{IR}}\sum_{k=1}^K\parallel\Delta\pmb b_k\parallel_2^2+\lambda_{\mathrm{CR}}\sum_{k=1}^K\parallel\pmb C(\pmb b_k+\Delta\pmb b_k)-\pmb C(\pmb b_k)\parallel_2^2
\end{align}
$$ $\pmb \chi=\{\Delta\pmb b_k\}$: the vector of unknowns to be optimized
 $\lambda_{\mathrm{IR}}$: the weight for the intensity regularizer that ensures small base color updates
 $\lambda_{\mathrm{CR}}$: the weight of the chromaticity regularizer
 Constrains base color updates $\Delta\pmb b_k$ to remain close in chromaticity $\pmb C(\cdot)$ to the initially estimated base color $\pmb b_k$

These regularizers ensure that base color updates do not lead to oscillations in the optimization process

The refinement energy is solved in combination with the illumination decomposition energy
 Resulting in an estimation of the unknown variables that together promotes decomposition sparsity

This refinement of the base colors leads to a dense Jacobian matrix, because the unknown variables $\{\Delta \pmb b_k\}$ in the energy are influenced by all pixels in the image
6.4. Handling the SparsityInducing Norms

Some energy terms contain sparsity-inducing $\ell_p$-norms ($p \in [0, 1]$)

Handle these objectives in a unified manner using Iteratively Reweighted Least Squares

Approximate the $\ell_p$-norms by a non-linear least-squares objective based on reweighting, i.e., replace the corresponding residuals $\pmb r$ as follows:
$$
\begin{align}
\parallel\pmb r\parallel_2^p&=\parallel\pmb r\parallel_2^2\cdot \parallel\pmb r\parallel_2^{p-2}\\
&\approx \parallel\pmb r\parallel_2^2\cdot \parallel\pmb r_{\mathrm {old}}\parallel_2^{p-2}
\end{align}
$$in each step of the applied iterative solver
 $\pmb r_{\mathrm{old}}$: the corresponding residual after the previous iteration step
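A minimal IRLS sketch on a scalar $\ell_1$ fitting problem, showing how the $\parallel\pmb r_{\mathrm{old}}\parallel_2^{p-2}$ reweighting turns a non-smooth objective into a sequence of weighted least-squares solves; the toy data and iteration count are illustrative:

```python
import numpy as np

# Fit a single scalar x to data under the l1 objective sum_i |x - d_i|.
d = np.array([1.0, 2.0, 3.0, 10.0])   # observations, with one outlier
x = d.mean()                          # ordinary least-squares initialization (4.0)
p, eps = 1.0, 1e-8                    # target lp exponent and a zero-guard
for _ in range(50):
    r_old = np.abs(x - d)             # residuals from the previous iterate
    w = (r_old + eps) ** (p - 2)      # IRLS weights ||r_old||^(p-2)
    x = np.sum(w * d) / np.sum(w)     # closed-form weighted least-squares solve
```

Unlike the least-squares mean (4.0), the reweighted iterates are robust to the outlier and settle near 3.0, illustrating why the $\ell_p$ priors yield sparse, outlier-tolerant solutions.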
6.4.1. Handling Nonnegativity Constraints

The non-negativity objective $E_{\mathrm{nonneg}}(\pmb \chi)$ contains a maximum function that is non-differentiable at zero

Handle this objective by replacing the maximum with a reweighted least-squares term, $\max(-T_k(\pmb x),0)\approx w_kT_k^2(\pmb x)$, using
$$
w_k=\begin{cases}
0&\mathrm{if}\ T_k(\pmb x)>0\\
(|T_k(\pmb x)|+\epsilon)^{-1}&\mathrm{otherwise}
\end{cases}
$$ $\epsilon = 0.002$: a small constant that prevents division by zero

Transforms the non-convex energy into a non-linear least-squares optimization problem
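The reweighted non-negativity penalty can be checked on toy layer values: positive entries incur no cost, while negative entries are penalized approximately linearly (values illustrative):

```python
import numpy as np

eps = 0.002                           # zero-guard from the paper's formulation
T = np.array([0.5, 0.0, -0.1, -0.5])  # toy transport values, two of them invalid

# w = 0 for positive T, 1/(|T| + eps) otherwise.
w = np.where(T > 0, 0.0, 1.0 / (np.abs(T) + eps))
penalty = w * T ** 2                  # ~|T| for negative T, exactly 0 otherwise
```

For `T = -0.5` the penalty is `0.25 / 0.502 ≈ 0.498`, i.e. close to the linear penalty `|T| = 0.5`, so the least-squares surrogate reproduces the intended behavior up to the small `eps` bias.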
7. DataParallel GPU Optimization

The decomposition problems are all non-convex optimizations based on an objective $E$ with unknowns $\pmb \chi$

The best decomposition $\pmb \chi^\ast$ is obtained by solving the following minimization problem:
$$
\pmb \chi^\ast=\arg\min_{\pmb \chi} E(\pmb \chi)
$$ 
The optimization problems are in general of non-linear least-squares form and can be tackled by the iterative Gauss–Newton algorithm, which approximates the optimum $\pmb \chi^\ast\approx \pmb \chi_k$ by a sequence of solutions $\pmb \chi_k=\pmb \chi_{k-1}+\pmb \delta_k^\ast$

The optimal linear update $\pmb \delta_k^\ast$ is given by the solution of the associated normal equations:
$$
\pmb\delta_k^\ast=\arg\min_{\pmb\delta_k}\parallel\pmb F(\pmb \chi_{k-1})+\pmb J(\pmb \chi_{k-1})\,\pmb\delta_k\parallel_2^2
$$ $\pmb F$: a vector field that stacks all residuals, i.e., $E(\pmb\chi)=\parallel\pmb F(\pmb\chi)\parallel_2^2$
 $\pmb J$: its Jacobian matrix
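A minimal Gauss–Newton sketch on a toy two-residual problem, iterating the update above via the normal equations $(\pmb J^T\pmb J)\,\pmb\delta = -\pmb J^T\pmb F$; the problem itself is made up for illustration:

```python
import numpy as np

def F(x):
    """Toy residual vector: ideally x^2 = 2 and x^3 = 3 (no exact solution)."""
    return np.array([x[0] ** 2 - 2.0, x[0] ** 3 - 3.0])

def J(x):
    """Jacobian of the residuals with respect to x."""
    return np.array([[2.0 * x[0]], [3.0 * x[0] ** 2]])

x = np.array([1.0])               # initialization
for _ in range(20):               # Gauss-Newton: x_k = x_{k-1} + delta*
    Jx, Fx = J(x), F(x)
    delta = np.linalg.solve(Jx.T @ Jx, -Jx.T @ Fx)
    x = x + delta
```

The iterates settle at the least-squares compromise near 1.437, a stationary point of $\parallel\pmb F(x)\parallel_2^2$; the paper solves the same kind of normal equations, but with millions of unknowns on the GPU.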


Obtaining real-time performance is challenging even with recent state-of-the-art data-parallel iterative non-linear least-squares solution strategies
 To avoid cluttered notation, we will omit the parameters and simply write $\pmb J$ instead of $\pmb J(\pmb\chi)$
 For our decomposition energies, the Jacobian $\pmb J$ is a large matrix with usually more than 70 million rows and 4 million columns
 Previous approaches assume $\pmb J$ to be a sparse matrix, meaning that only a few residuals are influenced by each variable
 While this holds for the columns of $\pmb J$ that correspond to the variables that are associated with the decomposition layers, it does not hold for the columns that store the derivatives with respect to the base color updates $\{\Delta \pmb b_k\}$, since the base colors influence each residual of $E_{\mathrm{data}}$
 $\pmb J=\begin{bmatrix}\pmb S_{\pmb J}&\pmb D_{\pmb J}\end{bmatrix}$ has two subblocks:
 $\pmb S_{\pmb J}$: a large sparse matrix with only a few nonzero entries per row
 $\pmb D_{\pmb J}$: a dense matrix with the same number of rows, but only a few columns
 The evaluation of the Jacobian $\pmb J$ requires a different specialized parallelization for the dense and sparse parts
7.1. SparseDense Splitting

Tackle the described problem using a sparse–dense splitting approach that splits the variables $\chi$ into a sparse set $\mathcal T$ (decomposition layers) and a dense set $\mathcal B$ (base color updates)
 Afterwards, we optimize for $\mathcal B$ and $\mathcal T$ independently in an iterative flip-flop manner

Algorithm:

Optimize for $\mathcal T$, while keeping $\mathcal B$ fixed
 The resulting optimization problem is a sparse nonlinear leastsquares problem
 Improve upon the previous solution by performing a nonlinear Gauss–Newton step
 The corresponding normal equations are solved using 16 steps of data-parallel preconditioned conjugate gradient
 Parallelize over the rows of the system matrix using one thread per row (variable)

After updating the ‘sparse’ variables $\mathcal T$, keep them fixed and solve for the ‘dense’ variables $\mathcal B$
 The resulting optimization problem is a dense least-squares problem with a small $3K \times 3K$ system matrix (normally $K$ is between 4 and 7 due to merged clusters)
 Materialize the normal equations in device memory based on a sequence of outer products, using one thread per entry of $\pmb J^T\pmb J$

The system is mapped to the CPU and robustly solved using singular value decomposition. After updating ‘dense’ variables $\mathcal B$,
we again solve for ‘sparse’ variables $\mathcal T$ and iterate this process until convergence
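The flip-flop scheme can be sketched on a toy linear model with a high-dimensional block and a low-dimensional block; unlike the paper's PCG/SVD combination, both toy subproblems below are solved with dense least squares, and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(60, 20))    # stands in for the sparse Jacobian block S_J
B = rng.normal(size=(60, 3))     # stands in for the dense block D_J
t_true = rng.normal(size=20)     # 'sparse' variables (decomposition layers)
b_true = rng.normal(size=3)      # 'dense' variables (base color updates)
y = A @ t_true + B @ b_true      # observations generated by the toy model

t, b = np.zeros(20), np.zeros(3)
for _ in range(300):             # flip-flop: alternate the two subproblems
    t = np.linalg.lstsq(A, y - B @ b, rcond=None)[0]  # solve for t, b fixed
    b = np.linalg.lstsq(B, y - A @ t, rcond=None)[0]  # solve for b, t fixed
residual = np.linalg.norm(y - A @ t - B @ b)
```

Each half-step is a small, well-structured least-squares solve, and alternating them drives the joint residual to zero; this is the same block-coordinate idea the solver applies to the non-linear sparse-dense system.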

8. Results and Evaluation
Parameters
 $\lambda_{\mathrm{clustering}}=200$
 $\lambda_{\mathrm{rsparsity}}=20$
 $p=1$
 $\lambda_{\mathrm{isparsity}}=3$
 $\lambda_{\mathrm{smoothness}}=3$
 $\lambda_{\mathrm{nonneg}}=1000$
 $\lambda_{\mathrm{data}}=5000$
 $\lambda_{\mathrm{IR}}=10$
 $\lambda_{\mathrm{CR}}=10$
 $\lambda_{\mathrm{rconsistency}}=\lambda_{\mathrm{monochrome}}=10$
Runtime performance
 Platform: Intel Core i7 with 2.7 GHz, 32 GB RAM and an NVIDIA GeForce GTX 980
 Resolution: 640×512 pixels
 Performance:
 Illumination decomposition: 14 ms
 Base color refinement: 2 s
 Misclustering correction: 1 s
 Perform the last two steps, base color refinement and misclustering correction, only once at the beginning of the video
 Runs at real-time frame rates (⩾30 Hz)
 Enables real-time video editing applications
Demo
9. Limitations
 Breaking the assumptions can lead to inaccurate estimations
 The method can face challenges if objects enter or leave the scene during the course of the video
 The interreflections caused by outofview objects cannot be properly modeled, since the corresponding base color might not be available
 If an object with an unseen color enters the scene for the first time, and the base colors are already exceeded, its interreflections cannot be modeled
 Complex, textured scenes with many different colors are challenging to decompose, since this requires many base colors, leading to a large number of variables and an even more under-constrained optimization problem
 More general indoor and outdoor scenes are not the ideal use cases for the method