In this post we would try to show how one could infer the sparse representation of an image knowing an appropriate generative model for its synthesis. We will start by a linear inversion (pseudo-inverse deconvolution), and then move to a gradient descent algorithm. Finally, we will implement a convolutional version of the iterative shrinkage thresholding algorithm (ISTA) and its fast version, the FISTA.

For computational efficiency, all convolutions will be implemented by a Fast Fourier Tansform, so that a standard convolution will be mathematically exactly similar. We will benchmark this on a realistic image size of $512 \times 512$ giving some timing results on a standard laptop.

Let's first initialize the notebook:

In [1]:

from __future__ import division, print_function
import numpy as np
np.set_printoptions(precision=6, suppress=True)
import os
%matplotlib inline
#%config InlineBackend.figure_format='retina'
%config InlineBackend.figure_format = 'svg'
import matplotlib.pyplot as plt
phi = (np.sqrt(5)+1)/2
fig_width = 15
figsize = (fig_width, fig_width/phi)
%load_ext autoreload
%autoreload 2

problem statement: sparse events in a static image¶

In the first case that we will study, images are produced by the combination of a single kernel with sparse events. Think of it like some brushstrokes on a canvas, somes drops of rain that fell on the surface of a pond...

For simplicity, the combination of these events will be considered linear. This is mostly true for instance in watercolor or in combining transparent surfaces and by ignoring saturation of the sensor for instance. To handle linearity, we will use linear algebra and matrices: We will denote as $x$ the (raveled) image and these "events" as $y$. The kernels that define the brushstrokes is denoted as $A$ and will be explicited below. Generally, in matrix form, the genrativve model of the image is written:

$$ x = A y $$

First, coefficients are drawn from a Laplace distribution and sparse events are generated by thresholding these coefficients. These will be convolved with a kernel defined using the MotionCloudslibrary. Let's first illustrate that for a small image size:

In [2]:

N_X, N_Y = 2**9, 2**9 # BENCHMARK
# N_X, N_Y = 2**8, 2**8 # DEBUG

rho = 1.e-3
sf_0 = .15
B_sf = .15
y_lambda = 1.61803

In [3]:

seed = 2020
rng = np.random.RandomState(seed)
events = y_lambda * rng.laplace(size=(N_X, N_Y))

In [4]:

events_max = np.max(np.abs(events))
fig, ax = plt.subplots(figsize=figsize)
ax.hist(events.ravel(), bins=np.linspace(-events_max, events_max, 100, endpoint=True))
ax.set_xlabel('coefficients')
ax.set_yscale('log')
ax.set_ylabel('#');

In [5]:

print('mean, std=', events.mean(), events.std())

mean, std= -0.008859802575012121 2.2900579569626447

From this continuous distribution of coefficients, we will zero out to achieve a desired sparsity (in the $\ell_0$ pseudoi-norm sense):

In [6]:

threshold = np.quantile(np.absolute(events).ravel(), 1-rho)
print('threshold=', threshold)

threshold= 11.432255842859089

In [7]:

events_thr = events  * ((events < -threshold) + (events > threshold))

In [8]:

fig, ax = plt.subplots(figsize=figsize)
ax.hist(events_thr.ravel(), bins=np.linspace(-events_max, events_max, 100, endpoint=True))
ax.set_xlabel('coefficients')
ax.set_yscale('log')
ax.set_ylabel('#');

In [9]:

print('mean, std=', events_thr.mean(), events_thr.std())

mean, std= 0.00031693423294735503 0.4103409801780267

In [10]:

fig, ax = plt.subplots(figsize=(fig_width, fig_width))
ax.imshow(events_thr, cmap=plt.viridis());

Let's now generate the image by convolving it with an isotropic kernel, as defined in the https://github.com/NeuralEnsemble/MotionClouds library:

In [11]:

import MotionClouds as mc
fx, fy, ft = mc.get_grids(N_X, N_Y, 1)
opts = dict(V_X=0., V_Y=0., B_V=0, B_theta=np.inf, sf_0=sf_0, B_sf=B_sf)
envelope = mc.envelope_gabor(fx, fy, ft, **opts).squeeze()
#env_smooth = mc.retina(fx, fy, ft, df=0.07)[:, :, 0]

In [12]:

fig, ax = plt.subplots(figsize=(fig_width, fig_width))
#ax.imshow(envelope[(N_X//2-N_X//4):(N_X//2+N_X//4), (N_Y//2-N_Y//4):(N_Y//2+N_Y//4)], cmap=plt.plasma()); 
ax.imshow(envelope, cmap=plt.plasma());

(This shape reminds me something... DOOH!)

Then we use the Fourier transform to actually perform the convolution (as implemented in the Motion Clouds library):

In [13]:

x = mc.random_cloud(envelope[:, :, None], events=events_thr[:, :, None])
x = x.reshape((N_X, N_Y))
print('x.shape=', x.shape)

x.shape= (512, 512)

This has all the nice properties of the 2D Discrete Fourier Transform, see for example this textbook (in French).

In [14]:

fig, ax = plt.subplots(figsize=(fig_width, fig_width))
vmax = np.absolute(x).max()
ax.imshow(x, cmap=plt.gray(), vmin=-vmax, vmax=vmax);

In [15]:

x_max = np.max(np.abs(x))
fig, ax = plt.subplots(figsize=figsize)
ax.hist(x.ravel(), bins=np.linspace(-x_max, x_max, 100, endpoint=True))
ax.set_xlabel('luminance')
ax.set_yscale('log')
ax.set_ylabel('#');

Note: one could have also used https://en.wikipedia.org/wiki/Shot_noise

All in one function:

In [16]:

def MC_env(N_X, N_Y, opts=dict(V_X=0., V_Y=0., B_V=0, B_theta=np.inf, sf_0=sf_0, B_sf=B_sf, alpha=0), do_norm=True, verbose=False):
    fx, fy, ft = mc.get_grids(N_X, N_Y, 1)
    envelope = mc.envelope_gabor(fx, fy, ft, **opts).squeeze()
    envelope /= envelope.max()
    if do_norm: envelope /= np.sqrt((envelope**2).sum())
    if verbose: print('(envelope**2).sum()=', (envelope**2).sum())
    return envelope
envelope = MC_env(N_X, N_Y, verbose=True)

(envelope**2).sum()= 1.0000000000000002

In [17]:

%%timeit
envelope = MC_env(N_X, N_Y)

13.5 ms ± 398 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [18]:

def random_cloud(envelope, events):
    (N_X, N_Y) = envelope.shape
    #fx, fy, ft = mc.get_grids(N_X, N_Y, N_frame)    
    F_events = np.fft.fftn(events)
    F_events = np.fft.fftshift(F_events)
    
    Fz = F_events * envelope
    # de-centering the spectrum
    Fz = np.fft.ifftshift(Fz)
    #Fz[0, 0, 0] = 0. # removing the DC component
    z = np.fft.ifftn(Fz).real
    return z

In [19]:

%%timeit
random_cloud(envelope, events_thr)

15 ms ± 307 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [20]:

def model(envelope, events, verbose=False):
    if verbose: print('envelope.shape = ', envelope.shape)
    if verbose: print('events.shape = ', events.shape)
    N_X, N_Y = envelope.shape
    x = random_cloud(envelope, events=events)
    #x = x.reshape((N_X, N_Y))
    if verbose: print('x.shape=', x.shape)
    return x
x = model(envelope, events_thr, verbose=True)

envelope.shape =  (512, 512)
events.shape =  (512, 512)
x.shape= (512, 512)

In [21]:

%%timeit
model(envelope, events_thr)

14.7 ms ± 449 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Merging both functions:

In [22]:

def model(envelope, events, verbose=False):
    if verbose: print('envelope.shape = ', envelope.shape)
    if verbose: print('events.shape = ', events.shape)
    N_X, N_Y = envelope.shape
    F_events = np.fft.fftn(events)
    F_events = np.fft.fftshift(F_events)
    Fz = F_events * envelope
    Fz = np.fft.ifftshift(Fz)
    x = np.fft.ifftn(Fz).real
    if verbose: print('x.shape=', x.shape)
    return x
x = model(envelope, events_thr, verbose=True)

envelope.shape =  (512, 512)
events.shape =  (512, 512)
x.shape= (512, 512)

In [23]:

%%timeit
model(envelope, events_thr)

14.9 ms ± 433 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [24]:

def generative_model(envelope, rho=rho, y_lambda=y_lambda, seed=seed, verbose=False):
    N_X, N_Y = envelope.shape
    if verbose: print('N_X, N_Y = ', envelope.shape)
    rng = np.random.RandomState(seed)
    events = y_lambda * rng.laplace(size=(N_X, N_Y))

    threshold = np.quantile(np.absolute(events).ravel(), 1.-rho)
    events = events  * ((events < -threshold) + (events > threshold))

    x = model(envelope, events, verbose=verbose)
    return events, x
events, x = generative_model(envelope, verbose=True)

N_X, N_Y =  (512, 512)
envelope.shape =  (512, 512)
events.shape =  (512, 512)
x.shape= (512, 512)

In [25]:

fig, ax = plt.subplots(figsize=(fig_width, fig_width))
vmax = np.absolute(x).max()
ax.imshow(x, cmap=plt.gray(), vmin=-vmax, vmax=vmax);

In [26]:

envelope = MC_env(N_X, N_Y)

In [27]:

%%timeit
events, x = generative_model(envelope)

25.9 ms ± 591 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Now, how do I retrieve the events knowing x? This is a form of deconvolution:

Deconvolution in the noiseless case: invertibility of the image to retrieve events¶

It's relatively easy to use the shape of the enveloppe to retrieve the events:

In [28]:

eps = 2.e-6
thr = 5.e-2
def deconv(envelope, eps=eps, thr=thr, do_norm=False):
    # mask = 1 / ( 1 + np.exp( -(envelope/envelope.max() - thr) / eps ) )# coefficients near zero
    fr = mc.frequency_radius(fx, fy, ft).squeeze()
    mask =  1 / ( 1 + np.exp( (fr - .45) / .025 ) ) # high frequency
    mask *=  1 / ( 1 + np.exp( -(fr - thr) / .01 ) )# low frequency
    
    F_deconv = envelope / (envelope**2 + eps*(1-mask)) # avoid division by zero
    F_deconv *= mask
    if do_norm: F_deconv /= np.sqrt((F_deconv**2).sum())    
    return F_deconv / ((envelope*F_deconv)).mean()

F_deconv = deconv(envelope, eps=eps, thr=thr)

fig, ax = plt.subplots(figsize=(fig_width, fig_width))
#cmap = ax.imshow(F_deconv[(N_X//2-N_X//4):(N_X//2+N_X//4), (N_Y//2-N_Y//4):(N_Y//2+N_Y//4)], cmap=plt.plasma())
cmap = ax.imshow(F_deconv, cmap=plt.plasma())
plt.colorbar(cmap, shrink=.8);

In [29]:

fig, ax = plt.subplots(figsize=(fig_width, fig_width))
cmap = ax.imshow(envelope*F_deconv, cmap=plt.plasma())
plt.colorbar(cmap, shrink=.8);

In [30]:

# generate
envelope = MC_env(N_X, N_Y)
events = np.zeros((N_X, N_Y))
events[N_X//2, N_Y//2] = 1.
impulse = model(envelope, events)
# invert
Fx = np.fft.fft2(impulse)
F_deconv = deconv(envelope, eps=eps, thr=thr)
# applying the filter
F_events = Fx*np.fft.ifftshift(F_deconv)
# de-centering the spectrum
impulse_pred = np.fft.ifft2(F_events).real 
print('impulse_pred.max()', impulse_pred.max(), 'impulse_pred[N_X//2, N_Y//2]', impulse_pred[N_X//2, N_Y//2])

impulse_pred.max() 1.0 impulse_pred[N_X//2, N_Y//2] 1.0

In [31]:

fig, axs = plt.subplots(1, 2, figsize=(fig_width, fig_width))
axs[0].imshow(impulse, cmap=plt.gray())
axs[1].imshow(impulse_pred, cmap=plt.viridis());

In [32]:

def invert(x, envelope, eps=eps, thr=thr):
    Fx = np.fft.fft2(x)
    F_deconv = deconv(envelope, eps=eps, thr=thr)
    # applying the filter
    # de-centering the spectrum
    F_events = Fx*np.fft.ifftshift(F_deconv)
    events = np.fft.ifft2(F_events).real 
    return events

envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope, verbose=True)

events_pred = invert(x, envelope)

N_X, N_Y =  (512, 512)
envelope.shape =  (512, 512)
events.shape =  (512, 512)
x.shape= (512, 512)

In [33]:

fig, axs = plt.subplots(1, 2, figsize=(fig_width, fig_width))
axs[0].imshow(events, cmap=plt.viridis())
axs[1].imshow(events_pred, cmap=plt.viridis());

In [34]:

fig, ax = plt.subplots(figsize=figsize)
counts, bins, _ = ax.hist(events.ravel(), fc='b', bins=100, alpha=.5)
counts, bins, _ = ax.hist(events_pred.ravel(), fc='r', bins=bins, alpha=.5)
ax.set_xlabel('coefficients')
ax.set_yscale('log')
ax.set_ylabel('#');

Some statistics to show the efficiency of this method:

In [35]:

print('mean, std=', events.mean(), events.std())

mean, std= 0.00031693423294735503 0.4103409801780267

In [36]:

print('mean, std=', events_pred.mean(), events_pred.std())

mean, std= 8.131516293641283e-20 0.5460662014027519

In [37]:

#print('enveloppe energy=', np.sqrt(((envelope)**2).mean()) )
def SE_y(events, events_pred):
    return ((events-events_pred)**2).sum()/(events**2).sum()
print('relative Squared error on coefficients: SE_y=', SE_y(events, events_pred))
def L1_y(events, events_pred):
    return np.abs(events-events_pred).sum()/np.abs(events).sum()
print('relative L1 error=', L1_y(events, events_pred))
#print('energy rec=', ((events_pred)**2).sum())
#print('energy input=', ((events.squeeze())**2).sum())
#print('ratio energy=', ((events_pred)**2).sum()/(events**2).sum())
#print('Squared ratio energy=', np.sqrt((events**2).sum()/(events_pred**2).sum()))

relative Squared error on coefficients: SE_y= 0.7709950804764737
relative L1 error= 5.898850045732038

In [38]:

def SE_x(x, envelope, events_pred):
    return ((x-model(envelope, events_pred))**2).sum()/(x**2).sum()
print('relative Squared image error=', SE_x(x, envelope, events_pred))

relative Squared image error= 0.8535097755381738

Reconstruction is perfect up to the machine's numerical precision. This is true because the problem is invertible in that instance and that the invert function implements a generalization of the pseudo-inverse $A^+ = (A^T A )^{-1} A$ of the linear transform $x = A y$ generalized to the convolutional case defined in the model function.

Note that a FFT and convolution operations are linear operators, follonwing the fact that A is a Toeplitz matrix derived from the kernel we used.

This method is fast:

In [39]:

envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope)

In [40]:

%%timeit
events_pred = invert(x, envelope)

24.5 ms ± 1.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Instead of using the standard fft as in

In [41]:

%timeit np.fft.fft2(x)

6.44 ms ± 345 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

... one could use the knowledge that the input signal is real:

In [42]:

%timeit np.fft.rfft2(x)

3.98 ms ± 156 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Specifically, it is necessary to reshape the enveloppe in frequency space, see https://docs.scipy.org/doc/numpy/reference/generated/numpy.fft.rfft.html :

In [43]:

np.fft.rfft2(x).shape

Out[43]:

(512, 257)

such that finally:

In [44]:

def invert_r(x, envelope, eps=eps, thr=thr):
    Fx = np.fft.rfft2(x)
    F_deconv = deconv(envelope, eps=eps, thr=thr)
    F_deconv = np.fft.ifftshift(F_deconv)[:, :(envelope.shape[1]//2+1):]
    # applying the filter
    # de-centering the spectrum
    F_events = Fx*F_deconv
    events = np.fft.irfft2(F_events).real 
    return events

#mc.N_X, mc.N_Y = 2**7, 2**7
envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope, verbose=True)

events_pred = invert_r(x, envelope)
print('relative L1 error=', L1_y(events, events_pred))
print('relative Squared error=', SE_y(events, events_pred))
print('relative Squared image error=', SE_x(x, envelope, events_pred))

N_X, N_Y =  (512, 512)
envelope.shape =  (512, 512)
events.shape =  (512, 512)
x.shape= (512, 512)
relative L1 error= 5.898850045732038
relative Squared error= 0.7709950804764737
relative Squared image error= 0.8535097755381738

In [45]:

fig, axs = plt.subplots(1, 2, figsize=(fig_width, fig_width))
axs[0].imshow(events, cmap=plt.viridis())
axs[1].imshow(events_pred, cmap=plt.viridis());

In [46]:

envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope, verbose=True)

N_X, N_Y =  (512, 512)
envelope.shape =  (512, 512)
events.shape =  (512, 512)
x.shape= (512, 512)

In [47]:

%%timeit
events_pred = invert_r(x, envelope)

17.8 ms ± 1.68 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Deconvolution in the noisy case (colored noise): (partial) failure to retrieve sparse coefficients¶

In [48]:

envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope, verbose=True)

N_X, N_Y =  (512, 512)
envelope.shape =  (512, 512)
events.shape =  (512, 512)
x.shape= (512, 512)

In [49]:

def add_noise(x, noise, B_sf=.5, do_white=False):
    if do_white:
        im_noise = rng.normal(size=x.shape)
    else:
        env_noise = mc.envelope_gabor(fx, fy, ft, B_theta=np.inf, sf_0=sf_0, B_sf=B_sf)
        im_noise = mc.random_cloud(env_noise).squeeze()
        im_noise /= im_noise.std()

    x_noise = x + noise * x.std() * im_noise
    
    return x_noise

noise = .5
x_noise = add_noise(x, noise)

In [50]:

fig, ax = plt.subplots(figsize=(fig_width, fig_width))
vmax = np.absolute(x_noise).max()
ax.imshow(x_noise, cmap=plt.gray(), vmin=-vmax, vmax=vmax);

In [51]:

x_noise = add_noise(x, noise)
events_pred = invert(x_noise, envelope)
print('relative L1 error=', L1_y(events, events_pred))
print('relative Squared error=', SE_y(events, events_pred))
print('relative Squared image error=', SE_x(x, envelope, events_pred))

relative L1 error= 21.712250715957364
relative Squared error= 1.2835350862888342
relative Squared image error= 1.518171234489418

Error as a function of noise level:

In [52]:

envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope, verbose=True)

N_X, N_Y =  (512, 512)
envelope.shape =  (512, 512)
events.shape =  (512, 512)
x.shape= (512, 512)

In [53]:

N_scan = 15
SE = np.zeros((N_scan, 2))
noises = np.logspace(-2, .5, N_scan, base=10, endpoint=True)
for i_noise, noise_ in enumerate(noises):
    x_noise = add_noise(x, noise_)
    events_pred = invert(x_noise, envelope)
    SE[i_noise, 0] = SE_x(x, envelope, events_pred)
    SE[i_noise, 1] = SE_y(events, events_pred)

# recording for future comparisons
SE_noise = {'full': SE}

In [54]:

fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
    ax[i].loglog(noises, np.ones_like(SE[:, i]), 'k--')
    ax[i].loglog(noises, SE[:, i], label='full')
    ax[i].set_xlabel('noise')
    ax[i].set_ylabel(SE_l);
ax[0].legend(loc='lower right');

In [55]:

x_noise = add_noise(x, noise)
events_pred = invert(x_noise, envelope)
print('relative L1 error=', L1_y(events, events_pred))
print('relative Squared error=', SE_y(events, events_pred))
print('relative Squared image error=', SE_x(x, envelope, events_pred))

relative L1 error= 21.62688356397757
relative Squared error= 1.2790255656451477
relative Squared image error= 1.5243018532816033

Even for this simple model, the error grows too fast wrt to noise. We need to find a solution for handling this. A first solution comes from inspecting the histogram of events' coefficients:

In [56]:

fig, ax = plt.subplots(figsize=figsize)
counts, bins, _ = ax.hist(events.ravel(), fc='b', bins=100, alpha=.5)
counts, bins, _ = ax.hist(events_pred.ravel(), fc='r', bins=bins, alpha=.5)
ax.set_xlabel('coefficients')
ax.set_yscale('log')
ax.set_ylabel('#');

Konwing the sparsity of the input, one could select only the proportion of coefficients which are big enough and remove the other ones:

In [57]:

threshold = np.quantile(np.abs(events_pred).ravel(), 1-rho)
print('threshold=', threshold)

threshold= 10.14246437921788

In [58]:

threshold_minus = np.quantile(events_pred.ravel(), rho/2)
threshold_plus = np.quantile(events_pred.ravel(), 1.-rho/2)
print('threshold_minus=', threshold_minus, 'threshold_plus=', threshold_plus)

threshold_minus= -6.99502093139905 threshold_plus= 11.1569027723831

In [59]:

events_pred_thr = events_pred  * ((events_pred < threshold_minus) + (events_pred > threshold_plus))
print('relative L1 error (full)=', L1_y(events, events_pred))
print('relative Squared error (full)=', SE_y(events, events_pred))
print('relative L1 error (thr)=', L1_y(events, events_pred_thr))
print('relative Squared error (thr)=', SE_y(events, events_pred_thr))

relative L1 error (full)= 21.62688356397757
relative Squared error (full)= 1.2790255656451477
relative L1 error (thr)= 0.038263579961269924
relative Squared error (thr)= 0.01437538545902251

In [60]:

fig, ax = plt.subplots(figsize=figsize)
counts, bins, _ = ax.hist(events.ravel(), fc='b', bins=100, alpha=.5)
counts, bins, _ = ax.hist(events_pred_thr.ravel(), fc='r', bins=bins, alpha=.5)
ax.set_xlabel('coefficients')
ax.set_yscale('log')
ax.set_ylabel('#');

In [61]:

print('relative Squared image error (full)=', SE_x(x, envelope, events_pred))
print('relative Squared image error (thr)=', SE_x(x, envelope, events_pred_thr))

relative Squared image error (full)= 1.5243018532816033
relative Squared image error (thr)= 0.01693486843514906

In [62]:

def L0_y(events, events_pred):
    error = ((1- (events==0))  * (events_pred==0)).sum() # Type I error : false positive
    error += ((1- (events_pred==0))  * (events==0)).sum() # Type II error : false negative
    return error / N_X / N_Y #/ (1- (events==0)).sum()
print('relative L0 error=', L0_y(events, events_pred))

relative L0 error= 0.9989967346191406

Doing so globally, we get

In [63]:

threshold = np.quantile(np.absolute(events_pred).ravel(), 1.-rho)
print('threshold=', threshold)

threshold= 10.14246437921788

In [64]:

events_pred_thr = events_pred  * ((events_pred < -threshold) + (events_pred > threshold))
print('relative L1 error (thr)=', L1_y(events, events_pred_thr))
print('relative Squared error (thr)=', SE_y(events, events_pred_thr))
print('relative Squared image error (thr)=', SE_x(x, envelope, events_pred_thr))

relative L1 error (thr)= 0.019930734264362066
relative Squared error (thr)= 0.0006575179777211669
relative Squared image error (thr)= 0.0006386922129840186

The result is worst in the image reconstruction: Indeed, the coefficients were found using the least-squared method for which the pseudo-inverse is the optimal solution. Deviating from this solution increases reconstruction error in the image domain.

Yet, it is better for the extraction of the sparse coefficients as we zero-ed out some of which we classified using our knowledge of the sparseness of the vector.

Yet, a problem is that we do not know a priori the sparseness. Let's see how this changes with different estimates, either by knowing the original or noisy images:

In [65]:

N_scan_rho = 45

x_noise = add_noise(x, noise)
events_pred = invert(x, envelope)
events_pred_noise = invert(x_noise, envelope)

SE_thr = np.zeros((N_scan_rho, 2))
SE_noisy = np.zeros((N_scan_rho, 2))
rhos = np.logspace(-5, -2, N_scan_rho, base=10, endpoint=True)
for i_rho, rho_ in enumerate(rhos):
    
    threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_)
    events_pred_thr = events_pred  * ((events_pred < -threshold) + (events_pred > threshold))
    
    SE_thr[i_rho, 0] = SE_x(x, envelope, events_pred_thr)
    SE_thr[i_rho, 1] = SE_y(events_pred, events_pred_thr)

    threshold = np.quantile(np.absolute(events_pred_noise).ravel(), 1-rho_)
    events_pred_thr = events_pred_noise  * ((events_pred_noise < -threshold) + (events_pred_noise > threshold))

    SE_noisy[i_rho, 0] = SE_x(x_noise, envelope, events_pred_thr)
    SE_noisy[i_rho, 1] = SE_y(events_pred_noise, events_pred_thr)

In [66]:

fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
    ax[i].loglog(rhos, np.ones_like(SE_thr[:, i]), 'k--')
    ax[i].loglog(rhos, SE_thr[:, i], label='Original')
    ax[i].loglog(rhos, SE_noisy[:, i], '--', label='Noisy')
    ax[i].set_xlabel('rho_est')
    ax[i].set_ylabel(SE_l);
ax[0].legend(loc='best');

Now assuming that we have an estimate of rho, let's see how this method compares to the previous one ('full') for varying levels of noise:

In [67]:

rho_est = rho
SE_thr = np.zeros((N_scan, 2))
for i_noise, noise_ in enumerate(noises):
    x_noise = add_noise(x, noise_)
    
    events_pred = invert(x_noise, envelope)
    threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
    events_pred_thr = events_pred  * ((events_pred < -threshold) + (events_pred > threshold))
    
    SE_thr[i_noise, 0] = SE_x(x, envelope, events_pred_thr)
    SE_thr[i_noise, 1] = SE_y(events, events_pred_thr)

SE_noise['thr']= SE_thr

In [68]:

fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
    ax[i].plot(noises, np.ones(len(noises)), 'k--')
    for method in ['full', 'thr']:
        ax[i].plot(noises, SE_noise[method][:, i], label=method)
    ax[i].set_xlabel('noise')
    ax[i].set_ylabel(SE_l)
    ax[i].set_xscale('log')
    ax[i].set_yscale('log')
ax[0].legend(loc='best');

Thresholding coefficients gives a much better estimate of coefficients.

If we know the level of noise, one can also use Wiener filtering:

In [69]:

sigma_noise = noise

envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope, verbose=True)

events_pred = invert(x, envelope, eps=sigma_noise * x.std())
print('relative L1 error (thr)=', L1_y(events, events_pred_thr))
print('relative Squared error (thr)=', SE_y(events, events_pred_thr))
print('relative Squared image error (full)=', SE_x(x, envelope, events_pred))

N_X, N_Y =  (512, 512)
envelope.shape =  (512, 512)
events.shape =  (512, 512)
x.shape= (512, 512)
relative L1 error (thr)= 0.21189481957141518
relative Squared error (thr)= 0.1132681320687763
relative Squared image error (full)= 4.484507063662472

In [70]:

fig, axs = plt.subplots(1, 2, figsize=(fig_width, fig_width))
axs[0].imshow(events, cmap=plt.viridis())
axs[1].imshow(events_pred, cmap=plt.viridis());

In [71]:

fig, ax = plt.subplots(figsize=figsize)
counts, bins, _ = ax.hist(events.ravel(), fc='b', bins=100, alpha=.5)
counts, bins, _ = ax.hist(events_pred.ravel(), fc='r', bins=bins, alpha=.5)
ax.set_xlabel('coefficients')
ax.set_yscale('log')
ax.set_ylabel('#');

In [72]:

rho_est = rho
SE_thr = np.zeros((N_scan, 2))
for i_noise, noise_ in enumerate(noises):
    x_noise = add_noise(x, noise_)
    
    events_pred = invert(x_noise, envelope, eps=noise_ * x.std())
    threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
    events_pred_thr = events_pred  * ((events_pred < -threshold) + (events_pred > threshold))

    SE_thr[i_noise, 0] = SE_x(x, envelope, events_pred_thr)
    SE_thr[i_noise, 1] = SE_y(events, events_pred_thr)

SE_noise['wie']= SE_thr

In [73]:

fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
    ax[i].plot(noises, np.ones(len(noises)), 'k--')
    for method in ['full', 'thr', 'wie']:
        ax[i].plot(noises, SE_noise[method][:, i], label=method)
    ax[i].set_xlabel('noise')
    ax[i].set_ylabel(SE_l)
    ax[i].set_xscale('log')
    ax[i].set_yscale('log')
ax[0].legend(loc='best');

Could we repeat this procedure more times?

A first solution is to define a global cost as the LASSO problem :

$$ \mathcal{C} = .5 \cdot \| x - A y \|^2 + \lambda \| y \|_1 $$

And to apply a gradient descent technique on this cost:

conjugate gradient descent algorithm¶

setting up variables:

In [74]:

envelope = MC_env(N_X, N_Y)
events = np.zeros((N_X, N_Y))
events[N_X//2, N_Y//2] = 1
impulse = model(envelope, events)
ATA = model(envelope, impulse)
ATA_pred = model(envelope**2, events)

In [75]:

print(np.absolute(ATA_pred).max(), np.absolute(ATA).max())

3.814697265625001e-06 3.814697265625e-06

In [76]:

assert(((ATA_pred-ATA)**2).sum() < 1.e-16)

In [77]:

#fig, ax = plt.subplots(figsize=(fig_width, fig_width))
vmax = np.absolute(impulse).max()
vmax = np.absolute(ATA).max()
#ax.imshow(impulse, cmap=plt.gray(), vmin=-vmax, vmax=vmax);
fig, axs = plt.subplots(1, 2, figsize=(fig_width, fig_width/phi))
axs[0].imshow(ATA[(N_X//2-N_X//4):(N_X//2+N_X//4), (N_Y//2-N_Y//4):(N_Y//2+N_Y//4)], cmap=plt.gray(), vmin=-vmax, vmax=vmax); 
axs[1].imshow(ATA_pred[(N_X//2-N_X//4):(N_X//2+N_X//4), (N_Y//2-N_Y//4):(N_Y//2+N_Y//4)], cmap=plt.gray(), vmin=-vmax, vmax=vmax);

In [78]:

print('impulse.max()', impulse.max())
print('envelope.mean()', envelope.mean())
print('ATA.max()', ATA.max())
print('(envelope**2).mean()', (envelope**2).mean())

impulse.max() 0.0012951797917240277
envelope.mean() 0.0012951797917240284
ATA.max() 3.814697265625e-06
(envelope**2).mean() 3.814697265625001e-06

In [79]:

# envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope, verbose=True)
x_noise = add_noise(x, noise)

N_X, N_Y =  (512, 512)
envelope.shape =  (512, 512)
events.shape =  (512, 512)
x.shape= (512, 512)

In [80]:

fig, axs = plt.subplots(1, 2, figsize=(fig_width, fig_width/phi))
#axs[0].imshow(envelope[(N_X//2-N_X//4):(N_X//2+N_X//4), (N_Y//2-N_Y//4):(N_Y//2+N_Y//4)], cmap=plt.plasma()); 
#axs[1].imshow(envelope[(N_X//2-N_X//4):(N_X//2+N_X//4), (N_Y//2-N_Y//4):(N_Y//2+N_Y//4)]**2, cmap=plt.plasma()); 
axs[0].imshow(envelope, cmap=plt.plasma()); 
axs[1].imshow(envelope**2, cmap=plt.plasma());

Let's project the image

In [81]:

events_pred = model(envelope, x_noise, verbose=True) #/ 26.80927 # / np.sqrt(((envelope)**2).mean()) 

print('relative Squared image error (full)=', SE_x(x, envelope, events_pred))
print('relative L1 error (full)=', L1_y(events, events_pred))
print('relative Squared error (full)=', SE_y(events, events_pred))

envelope.shape =  (512, 512)
events.shape =  (512, 512)
x.shape= (512, 512)
relative Squared image error (full)= 0.9999602187033221
relative L1 error (full)= 1.000140303243882
relative Squared error (full)= 0.9999923618964663

In [82]:

rho_est = rho
threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
events_pred_thr = events_pred_thr  * ((events_pred_thr < -threshold) + (events_pred_thr > threshold))

print('relative L1 error (thr)=', L1_y(events, events_pred_thr))
print('relative Squared error (thr)=', SE_y(events, events_pred_thr))
print('relative Squared image error (thr)=', SE_x(x, envelope, events_pred_thr))

relative L1 error (thr)= 0.5828322420932677
relative Squared error (thr)= 0.48138052189460406
relative Squared image error (thr)= 0.6334377713182282

Other stats for introspection:

In [83]:

print('Mean enveloppe energy=', (envelope**2).mean())
print('Squared enveloppe energy=', np.sqrt((envelope**2).mean()) )
print('relative Squared error=',SE_y(events, events_pred_thr))
print('energy rec=', ((events_pred)**2).sum())
print('energy input=', ((events)**2).sum())
print('ratio energy=', ((events_pred)**2).sum()/(events**2).sum())
print('Squared ratio energy=', np.sqrt((events_pred**2).sum()/(events**2).sum()))

Mean enveloppe energy= 3.814697265625001e-06
Squared enveloppe energy= 0.001953125
relative Squared error= 0.48138052189460406
energy rec= 4.00839680485629e-06
energy input= 44139.75965486855
ratio energy= 9.081147781950303e-11
Squared ratio energy= 9.529505644024931e-06

In [84]:

events, x = generative_model(envelope, seed=42)
x_noise = add_noise(x, noise)

In [85]:

events_pred = invert(x_noise, envelope)

threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
events_pred_thr = events_pred  * ((events_pred < -threshold) + (events_pred > threshold))

In [86]:

error = x_noise - model(envelope, events_pred)
grad = model(envelope, error)
grad_pred = model(envelope, x_noise) - model(envelope**2, events_pred)

In [87]:

assert(((grad-grad_pred)**2).sum() < 1.e-16)

We will include information about sparsity by modifying the gradient knowing the LASSO formulation (see https://blogs.princeton.edu/imabandit/2013/04/09/orf523-conditional-gradient-descent-and-structured-sparsity/ ) :

In [88]:

y_lambda_est = y_lambda
grad -= y_lambda_est * np.sign(events_pred)

In [89]:

N_iter = 1000
eta = .01
start_iter = 10
rho_est = rho
y_lambda_est = 1. # y_lambda

In [90]:

N_scan = 15
default = dict(N_iter=N_iter, eta=eta, start_iter=start_iter, rho_est=rho_est, 
               y_lambda_est=y_lambda_est)
print('default =', default)

default = {'N_iter': 1000, 'eta': 0.01, 'start_iter': 10, 'rho_est': 0.001, 'y_lambda_est': 1.0}

In [91]:

iters = range(N_iter)
SE_iter = np.zeros((N_iter, 2))
SE_thr = np.zeros((N_iter, 2))
# intialization
norm_enveloppe = 1# np.sqrt((envelope**2).mean())
#events_pred = 0. * model(envelope, x_noise) / norm_enveloppe
events_pred = invert(x_noise, envelope)

# threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
# events_pred_thr = events_pred  * ((events_pred < -threshold) + (events_pred > threshold))
# events_pred = events_pred_thr #np.zeros_like(x_noise)

# iterations
for i_iter in iters:
    # predicting
    grad = model(envelope, x_noise) - model(envelope**2, events_pred)
    grad -= y_lambda_est * np.sign(events_pred)
    events_pred += eta / (i_iter+start_iter) * grad

    # measuring efficiency
    SE_iter[i_iter, 0] = SE_x(x, envelope, events_pred)
    SE_iter[i_iter, 1] = SE_y(events, events_pred)        

    # thresholding
    threshold = np.quantile(np.absolute(events_pred).ravel(), 1.-rho_est)
    events_pred_thr = events_pred  * ((events_pred < -threshold) + (events_pred > threshold))

    SE_thr[i_iter, 0] = SE_x(x, envelope, events_pred_thr)
    SE_thr[i_iter, 1] = SE_y(events, events_pred_thr)

In [92]:

fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
    ax[i].plot(iters, SE_iter[:, i], label='full')
    ax[i].plot(iters, SE_thr[:, i], '--', label='thr')
    ax[i].set_xlabel('iterations')
    ax[i].set_ylabel(SE_l)
    #ax[i].set_xscale('log')
    ax[i].set_yscale('log')
ax[0].legend(loc='best');

Let's now test this function over its different parameters:

In [93]:

events, x = generative_model(envelope, seed=42)
x_noise = add_noise(x, noise)


def cg(x_noise, envelope, 
       N_iter=N_iter, eta=eta, start_iter=start_iter, rho_est=rho_est, 
       y_lambda_est=y_lambda_est):  
    
    #events_pred = np.zeros_like(x_noise)
    events_pred = invert(x_noise, envelope)
    #threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
    #events_pred *= ((events_pred < -threshold) + (events_pred > threshold))

    iters = range(int(N_iter))

    # iterations
    for i_iter in iters:
        # predicting
        grad = model(envelope, x_noise) - model(envelope**2, events_pred)
        grad -= y_lambda_est * np.sign(events_pred)
        events_pred += eta/(i_iter+int(start_iter)) * grad
        
    return events_pred

Testing with and without the sparsity:

In [94]:

events_pred = cg(x_noise, envelope, y_lambda_est=y_lambda_est)
print('relative L1 error (full)=', L1_y(events, events_pred))
print('relative Squared error (full)=', SE_y(events, events_pred))
print('relative Squared image error (full)=', SE_x(x, envelope, events_pred))
threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
events_pred_thr = events_pred  * ((events_pred < -threshold) + (events_pred > threshold))

print('relative L1 error (thr)=', L1_y(events, events_pred_thr))
print('relative Squared error (thr)=', SE_y(events, events_pred_thr))
print('relative Squared image error (thr)=', SE_x(x, envelope, events_pred_thr))

relative L1 error (full)= 18.500830950464188
relative Squared error (full)= 1.1431711547236103
relative Squared image error (full)= 1.3258956031599767
relative L1 error (thr)= 0.020796171513273086
relative Squared error (thr)= 0.0017684529114802398
relative Squared image error (thr)= 0.002634544977301539

In [95]:

events_pred = cg(x_noise, envelope, y_lambda_est=0)
print('relative L1 error (full)=', L1_y(events, events_pred))
print('relative Squared error (full)=', SE_y(events, events_pred))
print('relative Squared image error (full)=', SE_x(x, envelope, events_pred))
threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
events_pred_thr = events_pred  * ((events_pred < -threshold) + (events_pred > threshold))

print('relative L1 error (thr)=', L1_y(events, events_pred_thr))
print('relative Squared error (thr)=', SE_y(events, events_pred_thr))
print('relative Squared image error (thr)=', SE_x(x, envelope, events_pred_thr))

relative L1 error (full)= 21.884097998802357
relative Squared error (full)= 1.2864606042875937
relative Squared image error (full)= 1.5113587512109732
relative L1 error (thr)= 0.020353119402202434
relative Squared error (thr)= 0.0017621233158293223
relative Squared image error (thr)= 0.002646032322600538

In [96]:

noise

Out[96]:

0.5

In [97]:

SE = {}
SE_thr = {}
for variable in default.keys():    
    SE[variable] = np.zeros((N_scan, 2))
    SE_thr[variable] = np.zeros((N_scan, 2))
    values = np.logspace(-1, 1, N_scan, base=10, endpoint=True) * default[variable]
    for i_value, value in enumerate(values):
        opts = default.copy()
        opts[variable] = value
        print('variable', variable, '=', '%.4f' % opts[variable], ' - default =', '%.4f' % default[variable])
        x_noise = add_noise(x, noise)
        events_pred = cg(x_noise, envelope, **opts)
       
        SE[variable][i_value, 0] = SE_x(x, envelope, events_pred)
        SE[variable][i_value, 1] = SE_y(events, events_pred)        
    
        threshold = np.quantile(np.absolute(events_pred).ravel(), 1-opts['rho_est'])
        events_pred_thr = events_pred  * ((events_pred < -threshold) + (events_pred > threshold))

        SE_thr[variable][i_value, 0] = SE_x(x, envelope, events_pred_thr)
        SE_thr[variable][i_value, 1] = SE_y(events, events_pred_thr)

variable N_iter = 100.0000  - default = 1000.0000
variable N_iter = 138.9495  - default = 1000.0000
variable N_iter = 193.0698  - default = 1000.0000
variable N_iter = 268.2696  - default = 1000.0000
variable N_iter = 372.7594  - default = 1000.0000
variable N_iter = 517.9475  - default = 1000.0000
variable N_iter = 719.6857  - default = 1000.0000
variable N_iter = 1000.0000  - default = 1000.0000
variable N_iter = 1389.4955  - default = 1000.0000
variable N_iter = 1930.6977  - default = 1000.0000
variable N_iter = 2682.6958  - default = 1000.0000
variable N_iter = 3727.5937  - default = 1000.0000
variable N_iter = 5179.4747  - default = 1000.0000
variable N_iter = 7196.8567  - default = 1000.0000
variable N_iter = 10000.0000  - default = 1000.0000
variable eta = 0.0010  - default = 0.0100
variable eta = 0.0014  - default = 0.0100
variable eta = 0.0019  - default = 0.0100
variable eta = 0.0027  - default = 0.0100
variable eta = 0.0037  - default = 0.0100
variable eta = 0.0052  - default = 0.0100
variable eta = 0.0072  - default = 0.0100
variable eta = 0.0100  - default = 0.0100
variable eta = 0.0139  - default = 0.0100
variable eta = 0.0193  - default = 0.0100
variable eta = 0.0268  - default = 0.0100
variable eta = 0.0373  - default = 0.0100
variable eta = 0.0518  - default = 0.0100
variable eta = 0.0720  - default = 0.0100
variable eta = 0.1000  - default = 0.0100
variable start_iter = 1.0000  - default = 10.0000
variable start_iter = 1.3895  - default = 10.0000
variable start_iter = 1.9307  - default = 10.0000
variable start_iter = 2.6827  - default = 10.0000
variable start_iter = 3.7276  - default = 10.0000
variable start_iter = 5.1795  - default = 10.0000
variable start_iter = 7.1969  - default = 10.0000
variable start_iter = 10.0000  - default = 10.0000
variable start_iter = 13.8950  - default = 10.0000
variable start_iter = 19.3070  - default = 10.0000
variable start_iter = 26.8270  - default = 10.0000
variable start_iter = 37.2759  - default = 10.0000
variable start_iter = 51.7947  - default = 10.0000
variable start_iter = 71.9686  - default = 10.0000
variable start_iter = 100.0000  - default = 10.0000
variable rho_est = 0.0001  - default = 0.0010
variable rho_est = 0.0001  - default = 0.0010
variable rho_est = 0.0002  - default = 0.0010
variable rho_est = 0.0003  - default = 0.0010
variable rho_est = 0.0004  - default = 0.0010
variable rho_est = 0.0005  - default = 0.0010
variable rho_est = 0.0007  - default = 0.0010
variable rho_est = 0.0010  - default = 0.0010
variable rho_est = 0.0014  - default = 0.0010
variable rho_est = 0.0019  - default = 0.0010
variable rho_est = 0.0027  - default = 0.0010
variable rho_est = 0.0037  - default = 0.0010
variable rho_est = 0.0052  - default = 0.0010
variable rho_est = 0.0072  - default = 0.0010
variable rho_est = 0.0100  - default = 0.0010
variable y_lambda_est = 0.1000  - default = 1.0000
variable y_lambda_est = 0.1389  - default = 1.0000
variable y_lambda_est = 0.1931  - default = 1.0000
variable y_lambda_est = 0.2683  - default = 1.0000
variable y_lambda_est = 0.3728  - default = 1.0000
variable y_lambda_est = 0.5179  - default = 1.0000
variable y_lambda_est = 0.7197  - default = 1.0000
variable y_lambda_est = 1.0000  - default = 1.0000
variable y_lambda_est = 1.3895  - default = 1.0000
variable y_lambda_est = 1.9307  - default = 1.0000
variable y_lambda_est = 2.6827  - default = 1.0000
variable y_lambda_est = 3.7276  - default = 1.0000
variable y_lambda_est = 5.1795  - default = 1.0000
variable y_lambda_est = 7.1969  - default = 1.0000
variable y_lambda_est = 10.0000  - default = 1.0000

In [98]:

for variable in default.keys():    
    values = np.logspace(-1, 1, N_scan, base=10, endpoint=True) * default[variable]
    
    fig, ax = plt.subplots(1, 2, figsize=figsize)
    for i, SE_l in enumerate(['SE_x', 'SE_y']):
        ax[i].plot(values, np.ones_like(SE_thr[variable][:, i]), 'k--')
        ax[i].plot(values, SE[variable][:, i], label='full')
        ax[i].plot(values, SE_thr[variable][:, i], '--', label='thr')
        ax[i].set_xlabel(variable)
        ax[i].set_ylabel(SE_l);
        fig.suptitle(variable)
        ax[i].set_xscale('log')
        ax[i].set_yscale('log')
    ax[0].legend(loc='best');

And now for different levels of noise:

In [99]:

rho_est = rho
SE = np.zeros((N_scan, 2))
for i_noise, noise_ in enumerate(noises):
    x_noise = add_noise(x, noise_)
    events_pred = cg(x_noise, envelope)
    
    threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
    events_pred_thr = events_pred  * ((events_pred < -threshold) + (events_pred > threshold))

    SE[i_noise, 0] = SE_x(x, envelope, events_pred_thr)
    SE[i_noise, 1] = SE_y(events, events_pred_thr)

In [100]:

SE_noise['cg'] = SE

fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
    ax[i].plot(noises, np.ones(len(noises)), 'k--')
    ax[i].plot(noises, SE[:, i], 'g--')
    for method in ['full', 'thr', 'cg']:
        ax[i].plot(noises, SE_noise[method][:, i], label=method)
    ax[i].set_xlabel('noise')
    ax[i].set_ylabel(SE_l)
    ax[i].set_xscale('log')
    ax[i].set_yscale('log')
ax[0].legend(loc='best');

It converges to a better solution as the simple thresholding, but yet the convergence is slow. Another solution is deployed in iterative thresholding.

ISTA: iterative shrinkage-thresholding algorithm¶

Some nice pointers about ISTA:

the FISTA paper_files/Breck_2009.pdf)
lectures on FISTA http://stat.cmu.edu/~ryantibs/convexopt/lectures/prox-grad.pdf http://www.seas.ucla.edu/~vandenbe/236C/lectures/fista.pdf
https://blogs.princeton.edu/imabandit/2013/04/11/orf523-ista-and-fista/

An implementation in matlab: https://github.com/seunghwanyoo/ista_lasso - and one in python on a toy problem: https://gist.github.com/agramfort/ac52a57dc6551138e89b

As we have seen we can invert events from a noisy image, but the error is big...

In [101]:

env = MC_env(N_X, N_Y)
events, x = generative_model(env, verbose=True)

x_noise = add_noise(x, noise)
events_pred = invert(x_noise, envelope)
print('relative Squared image error (full)=', SE_x(x, envelope, events_pred))
print('relative L1 error (full)=', L1_y(events, events_pred))
print('relative Squared error (full)=', SE_y(events, events_pred))

N_X, N_Y =  (512, 512)
envelope.shape =  (512, 512)
events.shape =  (512, 512)
x.shape= (512, 512)
relative Squared image error (full)= 1.528970399238033
relative L1 error (full)= 21.828188780275898
relative Squared error (full)= 1.2919741704512568

One idea is to the same procedure iteratively, as we know that the events matrix is sparse:

In [102]:

def shrinkage(events_pred, threshold, stype='soft'):
    if stype=='hard':
        events_pred_thr = events_pred  * ((events_pred < -threshold) + (events_pred > threshold))
    else:
        events_pred_thr = np.sign(events_pred) * np.maximum(np.abs(events_pred)-threshold, 0.)
    return events_pred_thr

fig, ax = plt.subplots(figsize=figsize)
events_pred = np.linspace(-1, 1, 200, endpoint=True)
for stype in ['hard', 'soft']:
    ax.plot(events_pred, shrinkage(events_pred, threshold=.25, stype=stype), label=stype, alpha=.5)

ax.legend(loc='best')
ax.set_xlabel('coefficient')
ax.set_ylabel('shrinked coefficient');

In [103]:

N_iter = 2000
eta = .005
y_lambda_est = 4.
rho_est = rho
start_iter = 2

In [104]:

N_scan = 15
default = dict(N_iter=N_iter, rho_est=rho_est, y_lambda_est=y_lambda_est, eta=eta, start_iter=start_iter)
print('default =', default)

default = {'N_iter': 2000, 'rho_est': 0.001, 'y_lambda_est': 4.0, 'eta': 0.005, 'start_iter': 2}

In [105]:

iters = range(N_iter)
SE_iter = np.zeros((N_iter, 2, 2))
SE_thr = np.zeros((N_iter, 2, 2))
for i_type, stype in enumerate(['soft', 'hard']):
    if True:
        # intialization - with the denoising solution:
        events_pred = invert(x_noise, envelope)
        # thresholding
        #threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
        #events_pred = shrinkage(events_pred, threshold=threshold, stype='hard')    
    else:
        events_pred = np.zeros_like(x_noise)

    AT_x_noise = model(envelope, x_noise)
    envelope2 = envelope**2
    # iterations
    for i_iter in iters:
        # predicting
        grad = AT_x_noise - model(envelope**2, events_pred)
        #grad = model(envelope, x_noise - model(envelope, events_pred))
        #grad -= y_lambda_est * np.sign(events_pred)
        
        eta_ = eta / np.sqrt(i_iter+start_iter)
        events_pred += eta_ * grad
        
        # shrinkage operator
        events_pred = shrinkage(events_pred, threshold=eta_*y_lambda_est, stype=stype)    
        
        # measuring efficiency
        SE_iter[i_iter, 0, i_type] = SE_x(x, envelope, events_pred)
        SE_iter[i_iter, 1, i_type] = SE_y(events, events_pred)        
        
        # thresholding
        threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
        events_pred_thr = shrinkage(events_pred, threshold=threshold, stype='hard')    

        SE_thr[i_iter, 0, i_type] = SE_x(x, envelope, events_pred_thr)
        SE_thr[i_iter, 1, i_type] = SE_y(events, events_pred_thr)

In [106]:

SE_iter[-1, ...]

Out[106]:

array([[0.283206, 1.528892],
       [0.232429, 1.291956]])

In [107]:

fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
    ax[i].plot(iters,np.ones_like(SE_iter[:, i, 0]), 'k--')
    for i_type, stype in enumerate(['soft', 'hard']):
        ax[i].plot(iters, SE_iter[:, i, i_type], label=stype + '_full')
        ax[i].plot(iters, SE_thr[:, i, i_type], '--', label=stype + '_thr')
        ax[i].set_xlabel('iterations')
        ax[i].set_ylabel(SE_l)
    #ax[i].set_xscale('log')
    ax[i].set_yscale('log')
ax[0].legend(loc='best');

Now testing ISTA to explore the effect of parameters:

In [108]:

def ISTA(x_noise, envelope, eta=eta, start_iter=start_iter, y_lambda_est=y_lambda_est, rho_est=rho_est, N_iter=N_iter, stype='soft'):
    # intialization
    events_pred = invert(x_noise, envelope)
    AT_x_noise = model(envelope, x_noise)
    envelope2 = envelope**2
    
    # iterations
    iters = range(int(N_iter))
    for i_iter in iters:
        # predicting
        grad = AT_x_noise - model(envelope2, events_pred)
        
        eta_ = eta / np.log(i_iter+start_iter)
        events_pred += eta_ * grad

        # thresholding
        events_pred = shrinkage(events_pred, threshold=eta_*y_lambda_est, stype=stype)            
        
    return events_pred

Testing different parameters:

In [109]:

SE = {}
SE_thr = {}
for variable in default.keys():    
    SE[variable] = np.zeros((N_scan, 2))
    SE_thr[variable] = np.zeros((N_scan, 2))
    values = np.logspace(-1, 1, N_scan, base=10, endpoint=True) * default[variable]
    for i_value, value in enumerate(values):
        opts = default.copy()
        opts[variable] = value
        print('variable', variable, '=', '%.3f' % opts[variable], ' - default =', '%.3f' % default[variable])
        x_noise = add_noise(x, noise)
        events_pred = ISTA(x_noise, envelope, **opts)

        SE[variable][i_value, 0] = SE_x(x, envelope, events_pred)
        SE[variable][i_value, 1] = SE_y(events, events_pred)        
    
        threshold = np.quantile(np.absolute(events_pred).ravel(), 1-opts['rho_est'])
        events_pred_thr = events_pred  * ((events_pred < -threshold) + (events_pred > threshold))

        SE_thr[variable][i_value, 0] = SE_x(x, envelope, events_pred_thr)
        SE_thr[variable][i_value, 1] = SE_y(events, events_pred_thr)

variable N_iter = 200.000  - default = 2000.000
variable N_iter = 277.899  - default = 2000.000
variable N_iter = 386.140  - default = 2000.000
variable N_iter = 536.539  - default = 2000.000
variable N_iter = 745.519  - default = 2000.000
variable N_iter = 1035.895  - default = 2000.000
variable N_iter = 1439.371  - default = 2000.000
variable N_iter = 2000.000  - default = 2000.000
variable N_iter = 2778.991  - default = 2000.000
variable N_iter = 3861.395  - default = 2000.000
variable N_iter = 5365.392  - default = 2000.000
variable N_iter = 7455.187  - default = 2000.000
variable N_iter = 10358.949  - default = 2000.000
variable N_iter = 14393.713  - default = 2000.000
variable N_iter = 20000.000  - default = 2000.000
variable rho_est = 0.000  - default = 0.001
variable rho_est = 0.000  - default = 0.001
variable rho_est = 0.000  - default = 0.001
variable rho_est = 0.000  - default = 0.001
variable rho_est = 0.000  - default = 0.001
variable rho_est = 0.001  - default = 0.001
variable rho_est = 0.001  - default = 0.001
variable rho_est = 0.001  - default = 0.001
variable rho_est = 0.001  - default = 0.001
variable rho_est = 0.002  - default = 0.001
variable rho_est = 0.003  - default = 0.001
variable rho_est = 0.004  - default = 0.001
variable rho_est = 0.005  - default = 0.001
variable rho_est = 0.007  - default = 0.001
variable rho_est = 0.010  - default = 0.001
variable y_lambda_est = 0.400  - default = 4.000
variable y_lambda_est = 0.556  - default = 4.000
variable y_lambda_est = 0.772  - default = 4.000
variable y_lambda_est = 1.073  - default = 4.000
variable y_lambda_est = 1.491  - default = 4.000
variable y_lambda_est = 2.072  - default = 4.000
variable y_lambda_est = 2.879  - default = 4.000
variable y_lambda_est = 4.000  - default = 4.000
variable y_lambda_est = 5.558  - default = 4.000
variable y_lambda_est = 7.723  - default = 4.000
variable y_lambda_est = 10.731  - default = 4.000
variable y_lambda_est = 14.910  - default = 4.000
variable y_lambda_est = 20.718  - default = 4.000
variable y_lambda_est = 28.787  - default = 4.000
variable y_lambda_est = 40.000  - default = 4.000
variable eta = 0.001  - default = 0.005
variable eta = 0.001  - default = 0.005
variable eta = 0.001  - default = 0.005
variable eta = 0.001  - default = 0.005
variable eta = 0.002  - default = 0.005
variable eta = 0.003  - default = 0.005
variable eta = 0.004  - default = 0.005
variable eta = 0.005  - default = 0.005
variable eta = 0.007  - default = 0.005
variable eta = 0.010  - default = 0.005
variable eta = 0.013  - default = 0.005
variable eta = 0.019  - default = 0.005
variable eta = 0.026  - default = 0.005
variable eta = 0.036  - default = 0.005
variable eta = 0.050  - default = 0.005
variable start_iter = 0.200  - default = 2.000
variable start_iter = 0.278  - default = 2.000
variable start_iter = 0.386  - default = 2.000
variable start_iter = 0.537  - default = 2.000
variable start_iter = 0.746  - default = 2.000
variable start_iter = 1.036  - default = 2.000
variable start_iter = 1.439  - default = 2.000
variable start_iter = 2.000  - default = 2.000
variable start_iter = 2.779  - default = 2.000
variable start_iter = 3.861  - default = 2.000
variable start_iter = 5.365  - default = 2.000
variable start_iter = 7.455  - default = 2.000
variable start_iter = 10.359  - default = 2.000
variable start_iter = 14.394  - default = 2.000
variable start_iter = 20.000  - default = 2.000

In [110]:

for variable in default.keys():
    values = np.logspace(-1, 1, N_scan, base=10, endpoint=True) * default[variable]
    
    fig, ax = plt.subplots(1, 2, figsize=figsize)
    for i, SE_l in enumerate(['SE_x', 'SE_y']):
        ax[i].plot(values, np.ones_like(SE_thr[variable][:, i]), 'k--')
        ax[i].plot(values, SE[variable][:, i], label='full')
        ax[i].plot(values, SE_thr[variable][:, i], '--', label='thr')
        ax[i].set_xlabel(variable)
        ax[i].set_ylabel(SE_l);
        fig.suptitle(variable)
        ax[i].set_xscale('log')
        ax[i].set_yscale('log')
    ax[0].legend(loc='best');

In [111]:

SE_ista = np.zeros((N_scan, 2))
for i_noise, noise_ in enumerate(noises):
    x_noise = add_noise(x, noise_)
    events_pred_thr = ISTA(x_noise, envelope)
    SE_ista[i_noise, 0] = SE_x(x, envelope, events_pred_thr)
    SE_ista[i_noise, 1] = SE_y(events, events_pred_thr)

In [112]:

SE_noise['ista']= SE_ista

fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
    ax[i].plot(noises, np.ones(len(noises)), 'k--')
    for method in ['full', 'thr', 'cg', 'ista']:
        ax[i].plot(noises, SE_noise[method][:, i], label=method)
    ax[i].set_xlabel('noise')
    ax[i].set_ylabel(SE_l)
    ax[i].set_xscale('log')
    ax[i].set_yscale('log')
ax[0].legend(loc='best');

FISTA: Fast ISTA¶

In [113]:

def FISTA(x_noise, envelope, rho_est=rho_est, N_iter=N_iter, stype='soft'):
    # intialization
    events_pred = events_pred_thr = events_pred_thr_ = invert(x_noise, envelope)
    t = 1
    # iterations
    iters = range(int(N_iter))
    for i_iter in iters:
        t_old = t
        t = .5 * (1 + np.sqrt(1 + 4*t**2))
        
        events_old = events_pred
        events_pred_thr_old = events_pred_thr_
        
        grad = model(envelope, x_noise) - model(envelope**2, events_pred_thr)
        #grad -= y_lambda_est * np.sign(events_pred_thr)
        events_pred += eta / (i_iter+1) * grad
        
    return events_pred

In [114]:

N_scan = 15
default = dict(N_iter=1000, rho_est=rho)
print('default =', default)

default = {'N_iter': 1000, 'rho_est': 0.001}

In [115]:

SE = {}
SE_thr = {}
for variable in default.keys():    
    SE[variable] = np.zeros((N_scan, 2))
    SE_thr[variable] = np.zeros((N_scan, 2))
    values = np.logspace(-1, 1, N_scan, base=10, endpoint=True) * default[variable]
    for i_value, value in enumerate(values):
        opts = default.copy()
        opts[variable] = value
        print('variable', variable, '=', '%.3f' % opts[variable], ' - default =', '%.3f' % default[variable])
        x_noise = add_noise(x, noise)
        events_pred = FISTA(x_noise, envelope, **opts)
       
        SE[variable][i_value, 0] = SE_x(x, envelope, events_pred)
        SE[variable][i_value, 1] = SE_y(events, events_pred)        
    
        threshold = np.quantile(np.absolute(events_pred).ravel(), 1-opts['rho_est'])
        events_pred_thr = events_pred  * ((events_pred < -threshold) + (events_pred > threshold))

        SE_thr[variable][i_value, 0] = SE_x(x, envelope, events_pred_thr)
        SE_thr[variable][i_value, 1] = SE_y(events, events_pred_thr)

variable N_iter = 100.000  - default = 1000.000
variable N_iter = 138.950  - default = 1000.000
variable N_iter = 193.070  - default = 1000.000
variable N_iter = 268.270  - default = 1000.000
variable N_iter = 372.759  - default = 1000.000
variable N_iter = 517.947  - default = 1000.000
variable N_iter = 719.686  - default = 1000.000
variable N_iter = 1000.000  - default = 1000.000
variable N_iter = 1389.495  - default = 1000.000
variable N_iter = 1930.698  - default = 1000.000
variable N_iter = 2682.696  - default = 1000.000
variable N_iter = 3727.594  - default = 1000.000
variable N_iter = 5179.475  - default = 1000.000
variable N_iter = 7196.857  - default = 1000.000
variable N_iter = 10000.000  - default = 1000.000
variable rho_est = 0.000  - default = 0.001
variable rho_est = 0.000  - default = 0.001
variable rho_est = 0.000  - default = 0.001
variable rho_est = 0.000  - default = 0.001
variable rho_est = 0.000  - default = 0.001
variable rho_est = 0.001  - default = 0.001
variable rho_est = 0.001  - default = 0.001
variable rho_est = 0.001  - default = 0.001
variable rho_est = 0.001  - default = 0.001
variable rho_est = 0.002  - default = 0.001
variable rho_est = 0.003  - default = 0.001
variable rho_est = 0.004  - default = 0.001
variable rho_est = 0.005  - default = 0.001
variable rho_est = 0.007  - default = 0.001
variable rho_est = 0.010  - default = 0.001

In [116]:

for variable in default.keys():
    values = np.logspace(-1, 1, N_scan, base=10, endpoint=True) * default[variable]
    
    fig, ax = plt.subplots(1, 2, figsize=figsize)
    for i, SE_l in enumerate(['SE_x', 'SE_y']):
        ax[i].plot(values, np.ones_like(SE_thr[variable][:, i]), 'k--')
        ax[i].plot(values, SE[variable][:, i], label='full')
        ax[i].plot(values, SE_thr[variable][:, i], '--', label='thr')
        ax[i].set_xlabel(variable)
        ax[i].set_ylabel(SE_l);
        fig.suptitle(variable)
        ax[i].set_xscale('log')
        ax[i].set_yscale('log')
    ax[0].legend(loc='best');

In [117]:

SE_fista = np.zeros((N_scan, 2))
for i_noise, noise_ in enumerate(noises):
    x_noise = add_noise(x, noise_)
    events_pred_thr = FISTA(x_noise, envelope)
    SE_fista[i_noise, 0] = SE_x(x, envelope, events_pred_thr)
    SE_fista[i_noise, 1] = SE_y(events, events_pred_thr)
SE_noise['fista']= SE_fista

In [118]:

fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
    ax[i].loglog(noises, np.ones_like(SE_noise[method][:, i]), 'k--')
    for method in ['full', 'thr', 'cg', 'ista', 'fista']:
        ax[i].plot(noises, SE_noise[method][:, i], label=method)
    ax[i].set_xlabel('noise')
    ax[i].set_ylabel(SE_l)
ax[0].legend(loc='best');

Checking the efficiency for different sparseness levels

In [119]:

#N_X, N_Y = 2**10, 2**10
#N_X, N_Y = 2**7, 2**7
#noise = .04
envelope = MC_env(N_X, N_Y)

SE_thr = np.zeros((N_scan_rho, 2))
SE_ista = np.zeros((N_scan_rho, 2))
SE_fista = np.zeros((N_scan_rho, 2))
rhos = np.logspace(-4, -2, N_scan_rho, base=10, endpoint=True)
for i_rho, rho_ in enumerate(rhos):
    events, x = generative_model(envelope, rho=rho_)

    x_noise = add_noise(x, noise)

    events_pred = invert(x_noise, envelope)
    threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_)
    events_pred_thr = events_pred  * ((events_pred < -threshold) + (events_pred > threshold))
    
    SE_thr[i_noise, 0] = SE_x(x, envelope, events_pred_thr)
    SE_thr[i_noise, 1] = SE_y(events, events_pred_thr)     
    
    events_pred_thr = ISTA(x_noise, envelope, rho_est=rho_)

    SE_ista[i_rho, 0] = SE_x(x, envelope, events_pred_thr)
    SE_ista[i_rho, 1] = SE_y(events, events_pred_thr)     

    events_pred_thr = FISTA(x_noise, envelope, rho_est=rho_)

    SE_fista[i_rho, 0] = SE_x(x, envelope, events_pred_thr)
    SE_fista[i_rho, 1] = SE_y(events, events_pred_thr)

In [120]:

fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
    ax[i].loglog(rhos, np.ones_like(SE_thr[:, i]), 'k--')
    ax[i].loglog(rhos, SE_thr[:, i], label='Original')
    ax[i].loglog(rhos, SE_ista[:, i], '--', label='Noisy')
    ax[i].loglog(rhos, SE_fista[:, i], '--', label='FISTA')
    ax[i].set_xlabel('rho_est')
    ax[i].set_ylabel(SE_l);
ax[0].legend(loc='best');

there is obviously something wrong with ISTA / FISTA
TODO: learn enveloppe http://yann.lecun.com/exdb/publis/pdf/gregor-icml-10.pdf
TODO: Noisy case: correlated noise

some book keeping for the notebook¶

In [121]:

%reload_ext watermark
%watermark -i -h -m -v -p numpy,MotionClouds,matplotlib,scipy,imageio  -r -g -b

Python implementation: CPython
Python version       : 3.9.10
IPython version      : 7.27.0

numpy       : 1.22.3
MotionClouds: 20200212
matplotlib  : 3.5.1
scipy       : 1.6.0
imageio     : 2.16.1

Compiler    : Clang 13.0.0 (clang-1300.0.29.3)
OS          : Darwin
Release     : 21.3.0
Machine     : x86_64
Processor   : i386
CPU cores   : 36
Architecture: 64bit

Hostname: fortytwo

Git hash: 6afc708ac96955ef9573fafc11530bf1488d5ecb

Git repo: https://github.com/laurentperrinet/sciblog.git

Git branch: master