ROF and TV-L1 denoising with the Primal-Dual algorithm

This notebook shows how the ROF and TV-L1 variational denoising models can be implemented using the Primal-Dual optimization algorithm. While the denoising results may be inferior to those of some other methods, it is still a nice toy example showing how convex variational methods can be applied to image processing problems.

An in-depth discussion and justification of the described approaches can be found in:

A. Chambolle et al., "An introduction to Total Variation for Image Analysis" (PDF)

Also take a look at the GPU4Vision project for more information on variational methods in computer vision.

Disclaimer: this notebook was written to familiarize myself with Total Variation-based methods and as a trial of IPython Notebook-based publishing.

In [1]:
# imports and notebook setup
%pylab inline   # pulls in numpy (as np) and the pylab namespace (rand, randn, hstack, ones, ...)
from nbutils import showarray

Let's take some image and spoil it with noise

In [2]:
from scipy.misc import lena
img_ref = lena()[140:,120:][:256,:256] / 255.0

def make_noisy(img):
    # add gaussian noise
    img = img + 0.1 * np.random.normal(size=img.shape)
    # add some outliers on the right side of the image
    m = rand(*img.shape) < 0.2
    m[:,:160] = 0
    img[m] = rand(m.sum())
    return img

img_obs = make_noisy(img_ref)
showarray(hstack([img_ref, ones((256, 16)), img_obs]))

We will compare the following denoising models:

  • ROF

$$ \min_x \|\nabla x\| + \frac{\lambda}{2} \|x - f\|^2 $$

  • TV-L1

$$ \min_x \|\nabla x\| + \lambda \|x - f\| $$

where $x$ is the denoised image, $f$ is the observed image and $\lambda$ is the regularisation coefficient.
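
The two models differ only in the data term: ROF penalizes deviations from the observation quadratically, while TV-L1 penalizes them linearly, which makes it far less sensitive to outliers. A tiny illustration of this (not part of the original notebook; the $\lambda$ values are simply the ones used later on):

# data-term cost of a single pixel that ends up a distance d away from its observation
for d in [0.05, 0.2, 0.8]:
    rof_cost  = 0.5 * 8.0 * d**2   # quadratic: moving far away from an outlier quickly becomes expensive
    tvl1_cost = 1.0 * abs(d)       # linear: discarding an outlier costs comparatively little
    print "d=%.2f  ROF: %.3f  TV-L1: %.3f" % (d, rof_cost, tvl1_cost)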

Let's define the discrete $\nabla$ operator and its transpose $\nabla^T$, which will be needed later:

In [3]:
def nabla(I):
    h, w = I.shape
    G = np.zeros((h, w, 2), I.dtype)
    G[:, :-1, 0] -= I[:, :-1]
    G[:, :-1, 0] += I[:, 1:]
    G[:-1, :, 1] -= I[:-1]
    G[:-1, :, 1] += I[1:]
    return G

def nablaT(G):
    h, w = G.shape[:2]
    I = np.zeros((h, w), G.dtype)
    # note that we just reversed left and right sides
    # of each line to obtain the transposed operator
    I[:, :-1] -= G[:, :-1, 0]
    I[:, 1: ] += G[:, :-1, 0]
    I[:-1]    -= G[:-1, :, 1]
    I[1: ]    += G[:-1, :, 1]
    return I
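
As a quick sanity check (not in the original notebook) we can verify numerically that `nablaT` is indeed the transpose of `nabla`, i.e. that $\langle \nabla u,\, g \rangle = \langle u,\, \nabla^T g \rangle$ for arbitrary arrays:

# the two inner products should agree up to floating point rounding
u = rand(64, 64)
g = rand(64, 64, 2)
print np.allclose((nabla(u)*g).sum(), (u*nablaT(g)).sum())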

Now we can define functions for computing the current solution's energy:

In [4]:
# little auxiliary routine
def anorm(x):
    '''Calculate L2 norm over the last array dimension'''
    return np.sqrt((x*x).sum(-1))

def calc_energy_ROF(X, observation, clambda):
    Ereg = anorm(nabla(X)).sum()
    Edata = 0.5 * clambda * ((X - observation)**2).sum()
    return Ereg + Edata

def calc_energy_TVL1(X, observation, clambda):
    Ereg = anorm(nabla(X)).sum()
    Edata = clambda * np.abs(X - observation).sum()
    return Ereg + Edata

# some reasonable lambdas
lambda_ROF = 8.0
lambda_TVL1 = 1.0

print "ROF:",
print calc_energy_ROF(img_obs, img_obs, lambda_ROF), 
print calc_energy_ROF(img_ref, img_obs, lambda_ROF)
print "TV-L1:",
print calc_energy_TVL1(img_obs, img_obs, lambda_TVL1), 
print calc_energy_TVL1(img_ref, img_obs, lambda_TVL1)
ROF: 14756.2926322 8239.92956335
TV-L1: 14756.2926322 9555.75949906

OK, at least we see that the noisy image's energy is greater than that of the reference.

Saddle point formulation

Now we can apply the Primal-Dual algorithm to minimize the defined energy. The algorithm works on general problems of the form

$$ \min_x \; F(Kx) + G(x) \tag{1} $$

where $F$ and $G$ are some convex functions and $K$ is a linear operator. First we define the convex conjugate (dual) of a function:

$$ F^*(p) = \sup_ {x \in X} \; \langle p, x \rangle - F(x) \tag{2} $$

It can be shown that for convex functions $F = F^{**}$, i.e.

$$ F(x) = F^{**}(x) = \sup_ {p} \; \langle p, x \rangle - F^*(p)$$

Applying this transformation to the function $F$ in (1), we get the following saddle point problem:

$$ \min_x \; \max_p \; \langle Kx, p\rangle + G(x) - F^*(p) \tag{3} $$

In both the ROF and TV-L1 denoising models the regularisation term is $F(Kx) = \sum_i \| \nabla x_i \|$, where $\nabla x_i$ is the two-dimensional intensity gradient vector at image pixel $i$. Then, according to (2),

$$ F^*(p) = \delta_P(p) = \begin{cases} 0 & p \in P \\ +\infty & p \notin P \end{cases} $$

where $P = \{p : \forall i \; \| p_i \| \leq 1\}$
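
To see where this comes from (a short calculation that the original text skips): applying (2) to $F(y) = \sum_i \|y_i\|$ gives, pixel by pixel,

$$ \sup_{y_i} \; \langle p_i, y_i \rangle - \|y_i\| = \begin{cases} 0 & \|p_i\| \leq 1 \\ +\infty & \|p_i\| > 1 \end{cases} $$

since $\langle p_i, y_i \rangle \leq \|p_i\|\,\|y_i\|$, and the supremum is unbounded along $y_i = t\,p_i$ whenever $\|p_i\| > 1$. Summing over pixels yields exactly the indicator function $\delta_P$.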

Proximity operator

Before proceeding to the Primal-Dual algorithm we have to define the proximity operator of a function:

$$ (I + \tau \partial F)^{-1}(x) = \arg\min_{y} \; \frac{1}{2}\|y-x\|^2 + \tau F(y) $$

This operator is equivalent to an implicit gradient descent step for the function. In order to use the Primal-Dual algorithm, the functions $F^*$ and $G$ must have easy-to-compute proximity operators. For example:

$$ \begin{eqnarray} F^*(p) = \delta_P(p) \quad &\Leftrightarrow & \quad (I + \sigma \partial F^*)^{-1}(p) &= \mathbf{project}_P (p) \\ G_{ROF}(x) = \frac{\lambda}{2} \|x - f\|^2 \quad &\Leftrightarrow & \quad (I + \tau \partial G_{ROF})^{-1}(x) &= \frac{x + \lambda\tau f}{1 + \lambda\tau} \\ G_{TV-L1}(x) = \lambda \|x - f\| \quad &\Leftrightarrow & \quad (I + \tau \partial G_{TV-L1})^{-1}(x) &= \mathbf{shrink} (x, f, \lambda\tau) \end{eqnarray} $$

Where

$$\mathbf{project}_P(p) = \frac{p}{\max(\|p\|, 1)}$$

is a pixel-wise Euclidean projection onto the feasible set $P$, and

$$ \mathbf{shrink} (x, f, \lambda\tau) = \begin{cases} x - \lambda\tau & x > f + \lambda\tau \\ x + \lambda\tau & x < f - \lambda\tau \\ f & |x-f| \leq \lambda\tau \end{cases}$$
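
As a quick check of the ROF proximity operator above (a step not spelled out in the original): its objective $\frac{1}{2}\|y-x\|^2 + \frac{\tau\lambda}{2}\|y-f\|^2$ is smooth, so setting the derivative with respect to $y$ to zero gives

$$ (y - x) + \tau\lambda\,(y - f) = 0 \quad \Rightarrow \quad y = \frac{x + \lambda\tau f}{1 + \lambda\tau} $$

which is exactly the update we will use in `solve_ROF` below. The TV-L1 prox is obtained the same way, except that the absolute value is not differentiable at $x = f$, which produces the case analysis captured by $\mathbf{shrink}$.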

Here are the implementations of the auxiliary functions $\mathbf{project}$ and $\mathbf{shrink}$:

In [5]:
def project_nd(P, r):
    '''perform a pixel-wise projection onto r-radius balls'''
    nP = np.maximum(1.0, anorm(P)/r)
    return P / nP[...,np.newaxis]
    
def shrink_1d(X, F, step):
    '''pixel-wise scalar shrinking'''
    return X + np.clip(F - X, -step, step)
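
A small numeric check (again, not part of the original notebook) that these behave as the formulas above prescribe:

# after projection no vector should be longer than the radius
P = project_nd(randn(8, 8, 2) * 3, 1.0)
print anorm(P).max() <= 1.0 + 1e-9
# shrinking: move values below f up, values above f down, clamp inside the +-step band to f
print shrink_1d(array([0.0, 1.0, 0.45]), array([0.5, 0.5, 0.5]), 0.1)   # -> [ 0.1  0.9  0.5 ]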

Primal-Dual algorithm

Now we are ready to implement the Primal-Dual algorithm. Here it is in its general form:

  • Choose step sizes $\sigma > 0$ and $\tau > 0$, so that $\sigma\tau L^2 < 1$, where $L = \|K\|$, and set $\theta = 1.$
  • Choose some initial values for the primal variable $x_0$ and the dual variable $p_0$. Our model is convex, so any starting point should converge to the global minimum.
  • Iterate until convergence: $$ \begin{cases} p_{n+1} = (I + \sigma\partial F^*)^{-1} (p_n + \sigma K x_n) \\ \hat x_{n+1} = (I + \tau\partial G)^{-1} (x_n - \tau K^T p_{n+1}) \\ x_{n+1} = \hat x_{n+1} + \theta (\hat x_{n+1} - x_n) \end{cases} $$

Everything is ready to write some code! We'll start with the ROF model.

In [6]:
def solve_ROF(img, clambda, iter_n=101):
    # setting step sizes and other params
    L2 = 8.0   # upper bound on the squared operator norm of nabla
    tau = 0.02
    sigma = 1.0 / (L2*tau)
    theta = 1.0

    X = img.copy()
    P = nabla(X)
    for i in xrange(iter_n):
        # dual ascent step followed by projection onto the feasible set P
        P = project_nd( P + sigma*nabla(X), 1.0 )
        # primal descent step: closed-form prox of the quadratic data term
        lt = clambda * tau
        X1 = (X - tau * nablaT(P) + lt * img) / (1.0 + lt)
        # over-relaxation
        X = X1 + theta * (X1 - X)
        if i % 10 == 0:
            print "%.2f" % calc_energy_ROF(X, img, clambda),
    print
    return X

showarray(solve_ROF(img_obs, 8.0))
8977.08 6172.76 5772.63 5722.87 5715.17 5712.49 5710.94 5709.92 5709.19 5708.66 5708.27

While ROF denoising worked reasonably well in the left part of the image, where the noise is normally distributed, outliers are still present in the right part. We may try to decrease the $\lambda$ value to suppress them, but this leads to oversmoothing of the reconstructed image:

In [7]:
showarray(solve_ROF(img_obs, 4.0))
8285.19 4991.98 3989.75 3615.00 3478.13 3426.19 3406.99 3400.07 3397.33 3396.02 3395.23

Now let's try the TV-L1 model:

In [8]:
def solve_TVL1(img, clambda, iter_n=101):
    # setting step sizes and other params
    L2 = 8.0
    tau = 0.02
    sigma = 1.0 / (L2*tau)
    theta = 1.0

    X = img.copy()
    P = nabla(X)
    for i in xrange(iter_n):
        P = project_nd( P + sigma*nabla(X), 1.0 )
        # primal step: prox of the L1 data term is pixel-wise shrinking towards img
        X1 = shrink_1d(X - tau*nablaT(P), img, clambda*tau)
        X = X1 + theta * (X1 - X)
        if i % 10 == 0:
            print "%.2f" % calc_energy_TVL1(X, img, clambda),
    print
    return X

showarray(solve_TVL1(img_obs, 1.0))
11689.44 9244.72 8612.70 8264.14 8058.28 7918.95 7820.75 7745.22 7690.99 7645.88 7613.68

Outliers are gone, while details are preserved.

Saddle-point problem formulation for multiple observations

Suppose we want to reconstruct a signal from multiple noisy observations with the TV-L1 model.

Let's generate some test data first.

In [9]:
def make_spotty(img, r=3, n=1000):
    img = img.copy()
    h, w = img.shape
    for i in xrange(n):
        x, y = int32(rand(2)*(w-r, h-r))
        img[y:y+r, x:x+r] = round(rand())   # paint a random r x r square either black or white
    return img

observations = [make_spotty(make_noisy(img_ref)) for i in xrange(5)]
showarray(observations[0])

We are going to optimize the following function:

$$ \min_x \|\nabla x\| + \lambda \sum_i \|x - f_i\| $$

or, in general

$$ \min_x \; F(Kx) + \sum_i G_i(x) $$

Let's apply the dual transformation to all the $F$ and $G_i$ functions, obtaining:

$$ \min_x \; \max_{p,r_i} \; \langle Kx, p\rangle - F^*(p) + \sum_i [ \langle x, r_i\rangle - G_i^*(r_i) ]$$

$$G_i(x) = \lambda |x-f_i| \quad \Rightarrow \quad G_i^*(r) = rf_i + \delta_{|r| \leq \lambda}(r) $$

Primal-Dual algorithm iteration then becomes:

$$ \begin{cases} p_{n+1} = (I + \sigma\partial F^*)^{-1} (p_n + \sigma K x_n) \\ r^i_{n+1} = (I + \sigma\partial G_i^*)^{-1} (r^i_n + \sigma x_n) \\ \hat x_{n+1} = x_n - \tau (K^T p_{n+1} + \sum_i r^i_{n+1}) \\ x_{n+1} = \hat x_{n+1} + \theta (\hat x_{n+1} - x_n) \end{cases} $$
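
The only new proximity operator here is the one for $G_i^*$. Since $G_i^*(r) = r f_i + \delta_{|r| \leq \lambda}(r)$, a one-line calculation (not spelled out in the original) gives

$$ (I + \sigma \partial G_i^*)^{-1}(y) = \mathrm{clip}\,(y - \sigma f_i,\; -\lambda,\; \lambda) $$

which is exactly the `clip` update applied to `Rs` in the code below.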

In [10]:
def solve_TVL1_multy(imgs, clambda, iter_n=101):
    # setting step sizes and other params
    L2 = 8.0
    tau = 0.02
    sigma = 1.0 / (L2*tau)
    theta = 1.0

    imgs = array(imgs)
    X = imgs[0].copy()
    P = nabla(X)
    Rs = np.zeros_like(imgs)
    for i in xrange(iter_n):
        P = project_nd( P + sigma*nabla(X), 1.0 )
        # prox of each dual variable r_i: shift by sigma*f_i and clip to [-clambda, clambda]
        Rs = clip(Rs + sigma*(X-imgs), -clambda, clambda)
        X1 = X - tau*(nablaT(P) + Rs.sum(0))
        X = X1 + theta * (X1 - X)
        if i % 10 == 0:
            print ".",
    print
    return X

showarray(solve_TVL1_multy(observations, 0.5))
. . . . . . . . . . .

This post is generated from an IPython Notebook, which can be found in my GitHub repository.
