2018-12-13 Sparse coding using (F)ISTA

In this post we will show how one can infer the sparse representation of an image, knowing an appropriate generative model for its synthesis. In particular, we will implement a convolutional version of the iterative shrinkage-thresholding algorithm (ISTA) and its fast version, FISTA. For computational efficiency, all convolutions will be implemented by a Fast Fourier Transform. We will benchmark this on a realistic image size of $1024 \times 1024$ and give some timing results on a standard laptop.

Let's first initialize the notebook:

In [1]:
from __future__ import division, print_function
import numpy as np
np.set_printoptions(precision=6, suppress=True)
import os
%matplotlib inline
#%config InlineBackend.figure_format='retina'
%config InlineBackend.figure_format = 'svg'
import matplotlib.pyplot as plt
phi = (np.sqrt(5)+1)/2
fig_width = 10
figsize = (fig_width, fig_width/phi)
#from IPython.display import display, HTML
#def show_video(filename): 
#    return HTML(data='<video src="{}" loop autoplay width="600" height="600"></video>'.format(filename))
%load_ext autoreload
%autoreload 2

problem statement: sparse events in a static image

In the first case that we will study, images are produced by the combination of a single kernel with sparse events. Think of it as some brushstrokes on a canvas, or some drops of rain falling on the surface of a pond...

For simplicity, the combination of these events will be considered linear. This is mostly true, for instance, in watercolor or when combining transparent surfaces, provided we ignore the saturation of the sensor. To handle linearity, we will use linear algebra and matrices: we will denote the (raveled) image as $x$ and these "events" as $y$. The kernel that defines the brushstrokes is denoted $A$ and will be made explicit below. In matrix form, the generative model of the image is written:

$$ x = A y $$
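
Concretely, since $A$ implements a convolution here (applied through the FFT, hence with periodic boundary conditions), the synthesis amounts to a pointwise product in the Fourier domain:

$$ \mathcal{F}\{x\} = E \cdot \mathcal{F}\{y\} $$

where $E$ denotes the (real) envelope of the kernel defined below.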

First, coefficients are drawn from a Laplace distribution and sparse events are generated by keeping only the largest ones (thresholding at a given quantile of their absolute values). These will then be convolved with a kernel defined using the MotionClouds library. Let's first illustrate this for a small image size:

In [2]:
N_X, N_Y = 2**10, 2**10
N_X, N_Y = 2**7, 2**7
N_X, N_Y = 2**8, 2**8

rho = 1.e-3
seed = 42
rng = np.random.RandomState(seed)
sf_0 = .05
B_sf = .025
y_lambda = 1.61803
In [3]:
events = y_lambda * rng.laplace(size=(N_X, N_Y))
In [4]:
events_max = np.max(np.abs(events))
fig, ax = plt.subplots(figsize=figsize)
ax.hist(events.ravel(), bins=np.linspace(-events_max, events_max, 100, endpoint=True))
ax.set_xlabel('coefficients')
ax.set_yscale('log')
ax.set_ylabel('#');
In [5]:
print('mean, std=', events.mean(), events.std())
mean, std= -0.0030762160361596726 2.283752278229822

From this continuous distribution of coefficients, we will zero out the smallest values to achieve a desired sparsity (in the $\ell_0$ pseudo-norm sense):

In [6]:
threshold = np.quantile(np.absolute(events).ravel(), 1-rho)
print('threshold=', threshold)
threshold= 11.163882960534961
In [7]:
events_thr = events  * ((events < -threshold) + (events > threshold))
In [8]:
fig, ax = plt.subplots(figsize=figsize)
ax.hist(events_thr.ravel(), bins=np.linspace(-events_max, events_max, 100, endpoint=True))
ax.set_xlabel('coefficients')
ax.set_yscale('log')
ax.set_ylabel('#');
In [9]:
print('mean, std=', events_thr.mean(), events_thr.std())
mean, std= -0.0019611999454255454 0.4153429162267828
In [10]:
fig, ax = plt.subplots(figsize=(fig_width, fig_width))
ax.imshow(events_thr, cmap=plt.viridis());

Let's now generate the image by convolving it with an isotropic kernel, as defined in the https://github.com/NeuralEnsemble/MotionClouds library:

In [11]:
import MotionClouds as mc
fx, fy, ft = mc.get_grids(N_X, N_Y, 1)
opts = dict(V_X=0., V_Y=0., B_V=0, B_theta=np.inf, sf_0=sf_0, B_sf=B_sf)
envelope = mc.envelope_gabor(fx, fy, ft, **opts).squeeze()
In [12]:
fig, ax = plt.subplots(figsize=(fig_width, fig_width))
#ax.imshow(env.squeeze(), cmap=plt.plasma());
ax.imshow(envelope[(N_X//2-N_X//4):(N_X//2+N_X//4), (N_Y//2-N_Y//4):(N_Y//2+N_Y//4)], cmap=plt.plasma()); 

(This shape reminds me of something... D'OH!)

Then we use the Fourier transform to actually perform the convolution (as implemented in the Motion Clouds library):

In [13]:
x = mc.random_cloud(envelope[:, :, None], events=events_thr[:, :, None])
x = x.reshape((N_X, N_Y))
print('x.shape=', x.shape)
x.shape= (256, 256)

This has all the nice properties of the 2D Discrete Fourier Transform; see for example this textbook (in French).

In [14]:
fig, ax = plt.subplots(figsize=(fig_width, fig_width))
vmax = np.absolute(x).max()
ax.imshow(x, cmap=plt.gray(), vmin=-vmax, vmax=vmax);
In [15]:
x_max = np.max(np.abs(x))
fig, ax = plt.subplots(figsize=figsize)
ax.hist(x.ravel(), bins=np.linspace(-x_max, x_max, 100, endpoint=True))
ax.set_xlabel('luminance')
ax.set_yscale('log')
ax.set_ylabel('#');

All in one function:

In [16]:
def MC_env(N_X, N_Y, opts=dict(V_X=0., V_Y=0., B_V=0, B_theta=np.inf, sf_0=sf_0, B_sf=B_sf)):
    fx, fy, ft = mc.get_grids(N_X, N_Y, 1)
    return mc.envelope_gabor(fx, fy, ft, **opts).squeeze()
envelope = MC_env(N_X, N_Y)
In [17]:
%%timeit
envelope = MC_env(N_X, N_Y)
5.36 ms ± 686 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [18]:
def random_cloud(envelope, events):
    (N_X, N_Y) = envelope.shape
    #fx, fy, ft = mc.get_grids(N_X, N_Y, N_frame)    
    F_events = np.fft.fftn(events)
    F_events = np.fft.fftshift(F_events)
    
    Fz = F_events * envelope
    # de-centering the spectrum
    Fz = np.fft.ifftshift(Fz)
    #Fz[0, 0, 0] = 0. # removing the DC component
    z = np.fft.ifftn(Fz).real
    return z
In [19]:
%%timeit
random_cloud(envelope, events_thr)
24 ms ± 8.66 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [20]:
def model(envelope, events, verbose=False):
    if verbose: print('envelope.shape = ', envelope.shape)
    if verbose: print('events.shape = ', events.shape)
    N_X, N_Y = envelope.shape
    x = random_cloud(envelope, events=events)
    #x = x.reshape((N_X, N_Y))
    if verbose: print('x.shape=', x.shape)
    return x
x = model(envelope, events_thr, verbose=True)
envelope.shape =  (256, 256)
events.shape =  (256, 256)
x.shape= (256, 256)
In [21]:
%%timeit
model(envelope, events_thr)
39.6 ms ± 13.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Merging both functions:

In [22]:
def model(envelope, events, verbose=False):
    if verbose: print('envelope.shape = ', envelope.shape)
    if verbose: print('events.shape = ', events.shape)
    N_X, N_Y = envelope.shape
    F_events = np.fft.fftn(events)
    F_events = np.fft.fftshift(F_events)
    Fz = F_events * envelope
    Fz = np.fft.ifftshift(Fz)
    x = np.fft.ifftn(Fz).real
    if verbose: print('x.shape=', x.shape)
    return x
x = model(envelope, events_thr, verbose=True)
envelope.shape =  (256, 256)
events.shape =  (256, 256)
x.shape= (256, 256)
In [23]:
%%timeit
model(envelope, events_thr)
25.6 ms ± 12.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [24]:
def generative_model(envelope, rho=rho, y_lambda=y_lambda, seed=seed, verbose=False):
    N_X, N_Y = envelope.shape
    if verbose: print('N_X, N_Y = ', envelope.shape)
    rng = np.random.RandomState(seed)
    events = y_lambda * rng.laplace(size=(N_X, N_Y))

    threshold = np.quantile(np.absolute(events).ravel(), 1.-rho)
    events = events  * ((events < -threshold) + (events > threshold))

    x = model(envelope, events, verbose=verbose)
    return events, x
events, x = generative_model(envelope, verbose=True)
N_X, N_Y =  (256, 256)
envelope.shape =  (256, 256)
events.shape =  (256, 256)
x.shape= (256, 256)
In [25]:
fig, ax = plt.subplots(figsize=(fig_width, fig_width))
vmax = np.absolute(x).max()
ax.imshow(x, cmap=plt.gray(), vmin=-vmax, vmax=vmax);
In [26]:
envelope = MC_env(N_X, N_Y)
In [27]:
%%timeit
events, x = generative_model(envelope)
19.3 ms ± 8.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Now, how do I retrieve the events knowing $x$? This is a form of deconvolution:

Deconvolution in the noiseless case: invertibility of the image to retrieve events

It's relatively easy to use the shape of the envelope to retrieve the events.
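
In the Fourier domain, this deconvolution simply divides the image spectrum by the envelope wherever it does not vanish (the small eps term in the code below only guards against division by zero):

$$ \hat{Y}(f) = \begin{cases} X(f) / E(f) & \text{if } E(f) \neq 0 \\ 0 & \text{otherwise} \end{cases} $$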

In [28]:
def invert(x, envelope, eps=1.e-16):
    Fx = np.fft.fft2(x)
    Fx = np.fft.fftshift(Fx)
    F_deconv = envelope * (1.-1.*(envelope==0)) / (envelope**2 + eps*(envelope==0))
    # applying the filter
    F_events = Fx*F_deconv
    # de-centering the spectrum
    F_events = np.fft.ifftshift(F_events)
    events = np.fft.ifft2(F_events).real 
    return events

envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope, verbose=True)

events_pred = invert(x, envelope)
N_X, N_Y =  (256, 256)
envelope.shape =  (256, 256)
events.shape =  (256, 256)
x.shape= (256, 256)
In [29]:
fig, axs = plt.subplots(1, 2, figsize=(fig_width, fig_width))
axs[0].imshow(events, cmap=plt.viridis())
axs[1].imshow(events_pred, cmap=plt.viridis());

Some statistics to show the efficiency of this method:

In [30]:
#print('enveloppe energy=', np.sqrt(((envelope)**2).mean()) )
def SE_y(events, events_pred):
    return ((events-events_pred)**2).sum()/(events**2).sum()
print('relative Squared error=', SE_y(events, events_pred))
def L1_y(events, events_pred):
    return np.abs(events-events_pred).sum()/np.abs(events).sum()
print('relative L1 error=', L1_y(events, events_pred))
#print('energy rec=', ((events_pred)**2).sum())
#print('energy input=', ((events.squeeze())**2).sum())
#print('ratio energy=', ((events_pred)**2).sum()/(events**2).sum())
#print('Squared ratio energy=', np.sqrt((events**2).sum()/(events_pred**2).sum()))
relative Squared error= 2.2295665818809154e-05
relative L1 error= 0.14996931608556324
In [31]:
def SE_x(x, envelope, events_pred):
    return ((x-model(envelope, events_pred))**2).sum()/(x**2).sum()
print('relative Squared image error=', SE_x(x, envelope, events_pred))
relative Squared image error= 1.6343098641935313e-31

Reconstruction is perfect up to the machine's numerical precision. This is true because the problem is invertible in this instance and because the invert function implements the pseudo-inverse $A^+ = (A^T A)^{-1} A^T$ of the linear transform $x = A y$, generalized to the convolutional case defined in the model function.

TODO: give a short proof using the fact that A is a Toeplitz matrix
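
In the meantime, here is a sketch of the argument, assuming periodic boundary conditions (so that $A$ is block-circulant, a special case of Toeplitz): the 2D DFT diagonalizes $A$, i.e. $A = F^{*} \, \mathrm{diag}(E) \, F$ with $F$ the unitary DFT and $E$ the real envelope. Hence $A^\top A = F^{*} \, \mathrm{diag}(E^2) \, F$ and

$$ A^{+} = F^{*} \, \mathrm{diag}\!\left( \frac{E}{E^{2}} \, \mathbf{1}_{E \neq 0} \right) F $$

which is exactly the filter computed in the invert function above.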

This method is fast:

In [32]:
envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope)
In [33]:
%%timeit
events_pred = invert(x, envelope)
8.67 ms ± 1.42 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

And it can be further accelerated using a pre-shifted filter:

In [34]:
def invert_f(x, envelope, eps=1.e-16):
    Fx = np.fft.fft2(x)
    F_deconv = envelope * (1.-1.*(envelope==0)) / (envelope**2 + eps*(envelope==0))
    # applying the filter
    F_events = Fx*F_deconv
    events = np.fft.ifft2(F_events).real 
    return events

envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope)

envelope_f = np.fft.fftshift(envelope)
events_pred = invert_f(x, envelope_f)
print('relative L1 error=', L1_y(events, events_pred))
print('relative Squared error=', SE_y(events, events_pred))
relative L1 error= 0.14996931608556324
relative Squared error= 2.2295665818809154e-05
In [35]:
print('relative Squared image error=', SE_x(x, envelope, events_pred))
relative Squared image error= 1.6343098641935313e-31
In [36]:
envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope)
envelope = np.fft.fftshift(envelope)
In [37]:
%%timeit
events_pred = invert_f(x, envelope)
9.51 ms ± 1.5 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

Instead of using the standard fft as in

In [38]:
%timeit np.fft.fft2(x)
2.92 ms ± 208 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

... one could use the knowledge that the input signal is real:

In [39]:
%timeit np.fft.rfft2(x)
2.11 ms ± 565 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Specifically, it is then necessary to crop the envelope accordingly in frequency space, see https://docs.scipy.org/doc/numpy/reference/generated/numpy.fft.rfft.html :

In [40]:
np.fft.rfft2(x).shape
Out[40]:
(256, 129)

such that finally:

In [41]:
def invert_r(x, envelope, eps=1.e-16):
    envelope = np.fft.fftshift(envelope)[:, :(envelope.shape[1]//2+1)]
    Fx = np.fft.rfft2(x)
    F_deconv = envelope * (1.-1.*(envelope==0)) / (envelope**2 + eps*(envelope==0))
    # applying the filter
    F_events = Fx*F_deconv
    events = np.fft.irfft2(F_events).real 
    return events

#mc.N_X, mc.N_Y = 2**7, 2**7
envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope, verbose=True)

events_pred = invert_r(x, envelope)
print('relative L1 error=', L1_y(events, events_pred))
print('relative Squared error=', SE_y(events, events_pred))
print('relative Squared image error=', SE_x(x, envelope, events_pred))
N_X, N_Y =  (256, 256)
envelope.shape =  (256, 256)
events.shape =  (256, 256)
x.shape= (256, 256)
relative L1 error= 0.14996931608556324
relative Squared error= 2.2295665819360203e-05
relative Squared image error= 2.1694317901733363e-31
In [42]:
fig, axs = plt.subplots(1, 2, figsize=(fig_width, fig_width))
axs[0].imshow(events, cmap=plt.viridis())
axs[1].imshow(events_pred, cmap=plt.viridis());
In [43]:
envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope, verbose=True)
N_X, N_Y =  (256, 256)
envelope.shape =  (256, 256)
events.shape =  (256, 256)
x.shape= (256, 256)
In [44]:
%%timeit
events_pred = invert_r(x, envelope)
5.18 ms ± 724 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Deconvolution in the noisy case (white noise): (partial) failure to retrieve sparse coefficients

In [45]:
noise = .05 # relative level of noise
In [46]:
envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope, verbose=True)
x_noise = x + noise * rng.normal(size=x.shape)
N_X, N_Y =  (256, 256)
envelope.shape =  (256, 256)
events.shape =  (256, 256)
x.shape= (256, 256)
In [47]:
fig, ax = plt.subplots(figsize=(fig_width, fig_width))
vmax = np.absolute(x_noise).max()
ax.imshow(x_noise, cmap=plt.gray(), vmin=-vmax, vmax=vmax);
In [48]:
x_noise = x + noise * rng.normal(size=x.shape)
events_pred = invert_r(x_noise, envelope)
print('relative L1 error=', L1_y(events, events_pred))
print('relative Squared error=', SE_y(events, events_pred))
print('relative Squared image error=', SE_x(x, envelope, events_pred))
relative L1 error= 226195359.8001237
relative Squared error= 80325663487969.6
relative Squared image error= 0.003006707653847025

Error as a function of noise level:

In [49]:
envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope, verbose=True)
N_X, N_Y =  (256, 256)
envelope.shape =  (256, 256)
events.shape =  (256, 256)
x.shape= (256, 256)
In [50]:
N_scan = 15
SE = np.zeros((N_scan, 2))
noises = np.logspace(-3, -.3, N_scan, base=10)
for i_noise, noise_ in enumerate(noises):
    x_noise = x + noise_ * rng.normal(size=x.shape)
    events_pred = invert_r(x_noise, envelope)
    SE[i_noise, 0] = np.sqrt(((x-model(envelope, events_pred))**2).sum()/(x**2).sum())
    SE[i_noise, 1] = np.sqrt(((events-events_pred)**2).sum()/(events**2).sum())

# recording for future comparisons
SE_noise = {'full': SE}
In [51]:
fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
    ax[i].loglog(noises, np.ones_like(SE[:, i]), 'k--')
    ax[i].loglog(noises, SE[:, i], label='full')
    ax[i].set_xlabel('noise')
    ax[i].set_ylabel(SE_l);
ax[0].legend(loc='best');
/usr/local/lib/python3.7/site-packages/matplotlib/axes/_base.py:3507: UserWarning: Attempting to set identical bottom==top results
in singular transformations; automatically expanding.
bottom=1.0, top=1.0
  self.set_ylim(upper, lower, auto=None)
In [52]:
x_noise = x + noise * rng.normal(size=x.shape)
events_pred = invert_r(x_noise, envelope)
print('relative L1 error=', L1_y(events, events_pred))
print('relative Squared error=', SE_y(events, events_pred))
print('relative Squared image error=', SE_x(x, envelope, events_pred))
relative L1 error= 234037674.07285565
relative Squared error= 85660314503623.25
relative Squared image error= 0.0030150699518216267

Even for this simple model, the error grows too fast with respect to the noise. We need to find a way to handle this. A first solution comes from inspecting the histogram of the events' coefficients:

In [53]:
fig, ax = plt.subplots(figsize=figsize)
ax.hist(events_pred.ravel(), bins=100)
ax.set_xlabel('coefficients')
ax.set_yscale('log')
ax.set_ylabel('#');

Knowing the sparsity of the input, one can keep only the proportion of coefficients that are large enough and remove the others:

In [54]:
threshold_minus = np.quantile(events_pred.ravel(), rho/2)
threshold_plus = np.quantile(events_pred.ravel(), 1.-rho/2)
print('threshold_minus=', threshold_minus, 'threshold_plus=', threshold_plus)
threshold_minus= -12991959.688145997 threshold_plus= 13141467.32904197
In [55]:
events_pred_thr = events_pred  * ((events_pred < threshold_minus) + (events_pred > threshold_plus))
print('relative L1 error (full)=', L1_y(events, events_pred))
print('relative Squared error (full)=', SE_y(events, events_pred))
print('relative L1 error (thr)=', L1_y(events, events_pred_thr))
print('relative Squared error (thr)=', SE_y(events, events_pred_thr))
relative L1 error (full)= 234037674.07285565
relative Squared error (full)= 85660314503623.25
relative L1 error (thr)= 1100238.2535608897
relative Squared error (thr)= 1196466058828.0068
In [56]:
print('relative Squared image error (full)=', SE_x(x, envelope, events_pred))
print('relative Squared image error (thr)=', SE_x(x, envelope, events_pred_thr))
relative Squared image error (full)= 0.0030150699518216267
relative Squared image error (thr)= 161132128835.1909

Doing so globally, with a single threshold on the absolute values, we get:

In [57]:
threshold = np.quantile(np.absolute(events_pred).ravel(), 1.-rho)
print('threshold=', threshold)
threshold= 13086615.761472493
In [58]:
events_pred_thr = events_pred  * ((events_pred < -threshold) + (events_pred > threshold))
print('relative L1 error (thr)=', L1_y(events, events_pred_thr))
print('relative Squared error (thr)=', SE_y(events, events_pred_thr))
print('relative Squared image error (thr)=', SE_x(x, envelope, events_pred_thr))
relative L1 error (thr)= 1100565.05213663
relative Squared error (thr)= 1197113790395.2651
relative Squared image error (thr)= 142503144091.8907

The result is worse for the image reconstruction: indeed, the coefficients were found by the least-squares method, for which the pseudo-inverse is the optimal solution, so deviating from this solution increases the reconstruction error in the image domain.

Yet it is better for the extraction of the sparse coefficients, as we zeroed out those coefficients that we classified as spurious using our knowledge of the sparseness of the vector.

A remaining problem is that we do not know the sparseness a priori. Let's see how the errors change for different estimates of it, computed with respect to either the original or the noisy image:
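
Since this quantile-based hard thresholding is reused many times below, it could be wrapped in a small helper. A minimal sketch (hard_threshold is not part of the notebook; the cells below keep the explicit form):

def hard_threshold(events_pred, rho_est):
    # hypothetical helper, not used in the cells below:
    # keep only the fraction rho_est of coefficients with the largest magnitude
    threshold = np.quantile(np.abs(events_pred).ravel(), 1. - rho_est)
    return events_pred * (np.abs(events_pred) > threshold)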

In [59]:
N_scan = 15
SE_thr = np.zeros((N_scan, 2))
SE_est = np.zeros((N_scan, 2))
rhos = np.logspace(-3, -1, N_scan, base=10)
for i_rho, rho_est in enumerate(rhos):
    threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
    events_pred_thr = events_pred  * ((events_pred < -threshold) + (events_pred > threshold))
    
    SE_thr[i_rho, 0] = np.sqrt(((x-model(envelope, events_pred_thr))**2).sum()/(x**2).sum())
    SE_thr[i_rho, 1] = np.sqrt(((events-events_pred_thr)**2).sum()/(events**2).sum())

    SE_est[i_rho, 0] = np.sqrt(((x_noise-model(envelope, events_pred_thr))**2).sum()/(x**2).sum())
    SE_est[i_rho, 1] = np.sqrt(((events_pred-events_pred_thr)**2).sum()/(events_pred**2).sum())
    
In [60]:
fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
    ax[i].loglog(rhos, np.ones_like(SE_thr[:, i]), 'k--')
    ax[i].loglog(rhos, SE_thr[:, i], label='Original')
    ax[i].loglog(rhos, SE_est[:, i], '--', label='Noisy')
    ax[i].set_xlabel('rho_est')
    ax[i].set_ylabel(SE_l);
ax[0].legend(loc='best');        

Now assuming that we have an estimate of rho, let's see how this method compares to the previous one ('full') for varying levels of noise:

In [61]:
rho_est = rho
SE_thr = np.zeros((N_scan, 2))
for i_noise, noise_ in enumerate(noises):
    x_noise = x + noise_ * rng.normal(size=x.shape)
    events_pred = invert_r(x_noise, envelope)
    
    threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
    events_pred_thr = events_pred  * ((events_pred < -threshold) + (events_pred > threshold))
    SE_thr[i_noise, 0] = np.sqrt(((x-model(envelope, events_pred_thr))**2).sum()/(x**2).sum())
    SE_thr[i_noise, 1] = np.sqrt(((events-events_pred_thr)**2).sum()/(events**2).sum())

SE_noise['thr']= SE_thr
In [62]:
fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
    ax[i].plot(noises, np.ones(len(noises)), 'k--')
    for method in ['full', 'thr']:
        ax[i].plot(noises, SE_noise[method][:, i], label=method)
    ax[i].set_xlabel('noise')
    ax[i].set_ylabel(SE_l)
    ax[i].set_xscale('log')
    ax[i].set_yscale('log')
ax[0].legend(loc='best');    

Thresholding thus gives a better estimate of the coefficients.

If the noise level is known, one can also use Wiener filtering.
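
Assuming additive white noise of standard deviation $\sigma$ (and, implicitly, a flat unit spectrum for the events), the regularized inverse filter implemented below reads:

$$ \hat{Y}(f) = \frac{E(f)}{E(f)^{2} + \sigma^{2}} \, X(f) $$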

In [63]:
sigma_noise = noise
def invert_w(x, envelope, eps=1.e-16, sigma_noise=sigma_noise):
    envelope = np.fft.fftshift(envelope)[:, :(envelope.shape[1]//2+1)]
    Fx = np.fft.rfft2(x)
    F_deconv = envelope / (envelope**2 + sigma_noise**2 + eps*(envelope==0))
    # applying the filter
    F_events = Fx*F_deconv
    events = np.fft.irfft2(F_events).real 
    return events

#mc.N_X, mc.N_Y = 2**7, 2**7
envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope, verbose=True)

events_pred = invert_w(x, envelope)
print('relative L1 error (thr)=', L1_y(events, events_pred_thr))
print('relative Squared error (thr)=', SE_y(events, events_pred_thr))
print('relative Squared image error (full)=', SE_x(x, envelope, events_pred))
N_X, N_Y =  (256, 256)
envelope.shape =  (256, 256)
events.shape =  (256, 256)
x.shape= (256, 256)
relative L1 error (thr)= 10501976.149614768
relative Squared error (thr)= 109176345654527.05
relative Squared image error (full)= 6.028972045891317e-06
In [64]:
fig, axs = plt.subplots(1, 2, figsize=(fig_width, fig_width))
axs[0].imshow(events, cmap=plt.viridis())
axs[1].imshow(events_pred, cmap=plt.viridis());
In [65]:
rho_est = rho
SE_thr = np.zeros((N_scan, 2))
for i_noise, noise_ in enumerate(noises):
    x_noise = x + noise_ * rng.normal(size=x.shape)
    events_pred = invert_w(x_noise, envelope, sigma_noise=noise_)
    
    threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
    events_pred_thr = events_pred  * ((events_pred < -threshold) + (events_pred > threshold))
    SE_thr[i_noise, 0] = np.sqrt(((x-model(envelope, events_pred_thr))**2).sum()/(x**2).sum())
    SE_thr[i_noise, 1] = np.sqrt(((events-events_pred_thr)**2).sum()/(events**2).sum())

SE_noise['wie']= SE_thr
In [66]:
fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
    ax[i].plot(noises, np.ones(len(noises)), 'k--')
    for method in ['full', 'thr', 'wie']:
        ax[i].plot(noises, SE_noise[method][:, i], label=method)
    ax[i].set_xlabel('noise')
    ax[i].set_ylabel(SE_l)
    ax[i].set_xscale('log')
    ax[i].set_yscale('log')
ax[0].legend(loc='best');    

Could we repeat this procedure iteratively?

A first solution is to define a global cost as the LASSO problem:

$$ \mathcal{C} = .5 \cdot \| x - A y \|^2 + \lambda \| y \|_1 $$

And to apply a gradient descent technique on this cost:

conjugate gradient descent algorithm

setting up variables:

In [67]:
envelope = MC_env(N_X, N_Y)
events = np.zeros((N_X, N_Y))
events[N_X//2, N_Y//2] = 1
impulse = model(envelope, events)
ATA = model(envelope, impulse)
ATA_pred = model(envelope**2, events)
In [68]:
print(np.absolute(ATA_pred).max(), np.absolute(ATA).max())
4.5155254264533315 4.5155254264533315
In [69]:
assert(((ATA_pred-ATA)**2).sum() < 1.e-16)
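
This assert checks a convenient identity: applying the kernel is a pointwise multiplication by the envelope $E$ in Fourier space, so applying it twice multiplies by $E^{2}$,

$$ \mathcal{F}\{ A^\top A \, y \} = E^{2} \cdot \mathcal{F}\{ y \} $$

(here $A^\top = A$ since the envelope is real and symmetric), so that $A^\top A \, y$ is obtained in a single pass as model(envelope**2, y).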
In [70]:
#fig, ax = plt.subplots(figsize=(fig_width, fig_width))
vmax = np.absolute(impulse).max()
vmax = np.absolute(ATA).max()
#ax.imshow(impulse, cmap=plt.gray(), vmin=-vmax, vmax=vmax);
fig, axs = plt.subplots(1, 2, figsize=(fig_width, fig_width/phi))
axs[0].imshow(ATA[(N_X//2-N_X//4):(N_X//2+N_X//4), (N_Y//2-N_Y//4):(N_Y//2+N_Y//4)], cmap=plt.gray(), vmin=-vmax, vmax=vmax); 
axs[1].imshow(ATA_pred[(N_X//2-N_X//4):(N_X//2+N_X//4), (N_Y//2-N_Y//4):(N_Y//2+N_Y//4)], cmap=plt.gray(), vmin=-vmax, vmax=vmax); 
In [71]:
print('impulse.max()', impulse.max())
print('envelope.mean()', envelope.mean())
print('ATA.max()', ATA.max())
print('(envelope**2).mean()', (envelope**2).mean())
impulse.max() 0.3466512040353435
envelope.mean() 0.34665120403534355
ATA.max() 4.5155254264533315
(envelope**2).mean() 4.5155254264533315
In [72]:
# envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope, verbose=True)
x_noise = x + noise * rng.normal(size=x.shape)
N_X, N_Y =  (256, 256)
envelope.shape =  (256, 256)
events.shape =  (256, 256)
x.shape= (256, 256)
In [73]:
fig, axs = plt.subplots(1, 2, figsize=(fig_width, fig_width/phi))
axs[0].imshow(envelope[(N_X//2-N_X//4):(N_X//2+N_X//4), (N_Y//2-N_Y//4):(N_Y//2+N_Y//4)], cmap=plt.plasma()); 
axs[1].imshow(envelope[(N_X//2-N_X//4):(N_X//2+N_X//4), (N_Y//2-N_Y//4):(N_Y//2+N_Y//4)]**2, cmap=plt.plasma()); 

Let's project the image onto the kernel (that is, apply the transpose operator $A^\top$ to the noisy image):

In [74]:
events_pred = model(envelope, x_noise, verbose=True) #/ 26.80927 # / np.sqrt(((envelope)**2).mean()) 

print('relative Squared image error (full)=', SE_x(x, envelope, events_pred))
print('relative L1 error (full)=', L1_y(events, events_pred))
print('relative Squared error (full)=', SE_y(events, events_pred))
envelope.shape =  (256, 256)
events.shape =  (256, 256)
x.shape= (256, 256)
relative Squared image error (full)= 117306.98132599112
relative L1 error (full)= 765.910072564179
relative Squared error (full)= 1487.9376011900608
In [75]:
threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
events_pred_thr = events_pred_thr  * ((events_pred_thr < -threshold) + (events_pred_thr > threshold))

print('relative L1 error (thr)=', L1_y(events, events_pred_thr))
print('relative Squared error (thr)=', SE_y(events, events_pred_thr))
print('relative Squared image error (thr)=', SE_x(x, envelope, events_pred_thr))
relative L1 error (thr)= 1.0
relative Squared error (thr)= 1.0
relative Squared image error (thr)= 1.0

Other stats for introspection:

In [76]:
print('Mean enveloppe energy=', (envelope**2).mean())
print('Squared enveloppe energy=', np.sqrt((envelope**2).mean()) )
print('relative Squared error=',SE_y(events, events_pred_thr))
print('energy rec=', ((events_pred)**2).sum())
print('energy input=', ((events)**2).sum())
print('ratio energy=', ((events_pred)**2).sum()/(events**2).sum())
print('Squared ratio energy=', np.sqrt((events_pred**2).sum()/(events**2).sum()))
Mean enveloppe energy= 4.5155254264533315
Squared enveloppe energy= 2.1249765708010364
relative Squared error= 1.0
energy rec= 16919556.629121374
energy input= 11305.850264944267
ratio energy= 1496.5311084636746
Squared ratio energy= 38.68502434358384
In [77]:
events, x = generative_model(envelope, seed=42)
x_noise = x + noise * rng.normal(size=x.shape)
In [78]:
events_pred = invert_r(x_noise, envelope)

threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
events_pred_thr = events_pred  * ((events_pred < -threshold) + (events_pred > threshold))
In [79]:
error = x_noise - model(envelope, events_pred)
grad = model(envelope, error)
grad_pred = model(envelope, x_noise) - model(envelope**2, events_pred)
In [80]:
assert(((grad-grad_pred)**2).sum() < 1.e-16)

We will include information about sparsity by modifying the gradient, knowing the LASSO formulation (see https://blogs.princeton.edu/imabandit/2013/04/09/orf523-conditional-gradient-descent-and-structured-sparsity/):
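
Recall that $\lambda \, \mathrm{sign}(y)$ is a subgradient of the $\ell_1$ penalty, so the full (sub)gradient of the cost reads

$$ \nabla_y \mathcal{C} = - A^\top ( x - A y ) + \lambda \, \mathrm{sign}(y) $$

In the cells below, grad accumulates the negative of this quantity, so that the update simply adds a (decreasing) fraction of grad to the current estimate.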

In [81]:
y_lambda_est = y_lambda
grad -= y_lambda_est * np.sign(events_pred)
In [82]:
N_iter = 300
eta = .05
start_iter = 10
rho_est = rho
y_lambda_est = 1. # y_lambda
sigma_noise = 2.e-3
In [83]:
N_scan = 15
default = dict(N_iter=N_iter, eta=eta, start_iter=start_iter, rho_est=rho_est, 
               y_lambda_est=y_lambda_est, sigma_noise=sigma_noise)
print('default=', default)
default= {'N_iter': 300, 'eta': 0.05, 'start_iter': 10, 'rho_est': 0.001, 'y_lambda_est': 1.0, 'sigma_noise': 0.002}
In [84]:
iters = range(N_iter)
SE_iter = np.zeros((N_iter, 2))
SE_thr = np.zeros((N_iter, 2))
# initialization
norm_enveloppe = 1# np.sqrt((envelope**2).mean())
#events_pred = 0. * model(envelope, x_noise) / norm_enveloppe
events_pred = invert_w(x_noise, envelope, sigma_noise=noise)

threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
events_pred_thr = events_pred  * ((events_pred < -threshold) + (events_pred > threshold))
events_pred = events_pred_thr #np.zeros_like(x_noise)

# iterations
for i_iter in iters:
    # predicting
    grad = model(envelope, x_noise) - model(envelope**2, events_pred)
    grad -= y_lambda_est * np.sign(events_pred)
    events_pred += eta / (i_iter+start_iter) * grad

    # thresholding
    threshold = np.quantile(np.absolute(events_pred).ravel(), 1.-rho_est)
    events_pred_thr = events_pred  * ((events_pred < -threshold) + (events_pred > threshold))

    # measuring efficiency
    #SE_iter[i_iter, 0] = np.sqrt((error**2).sum()/(x**2).sum())
    SE_iter[i_iter, 0] = np.sqrt(((x-model(envelope, events_pred))**2).sum()/(x**2).sum())
    SE_iter[i_iter, 1] = np.sqrt(((events-events_pred)**2).sum()/(events**2).sum())

    SE_thr[i_iter, 0] = np.sqrt(((x-model(envelope, events_pred_thr))**2).sum()/(x**2).sum())
    SE_thr[i_iter, 1] = np.sqrt(((events-events_pred_thr)**2).sum()/(events**2).sum())
In [85]:
fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
    ax[i].plot(iters, SE_iter[:, i], label='full')
    ax[i].plot(iters, SE_thr[:, i], label='thrs')
    ax[i].set_xlabel('iterations')
    ax[i].set_ylabel(SE_l)
    #ax[i].set_xscale('log')
    ax[i].set_yscale('log')
ax[0].legend(loc='best');