Sparse coding of large images
In this post we will show how to infer the sparse representation of an image, knowing an appropriate generative model for its synthesis. We will start with a linear inversion (a pseudo-inverse deconvolution), then move to a gradient descent algorithm. Finally, we will implement a convolutional version of the iterative shrinkage-thresholding algorithm (ISTA) and its fast variant, FISTA.
For computational efficiency, all convolutions will be implemented with a Fast Fourier Transform, so that they are mathematically equivalent to standard convolutions. We will benchmark this on a realistic image size of $512 \times 512$ and give some timing results on a standard laptop.
Let's first initialize the notebook:
from __future__ import division, print_function
import numpy as np
np.set_printoptions(precision=6, suppress=True)
import os
%matplotlib inline
#%config InlineBackend.figure_format='retina'
%config InlineBackend.figure_format = 'svg'
import matplotlib.pyplot as plt
phi = (np.sqrt(5)+1)/2
fig_width = 15
figsize = (fig_width, fig_width/phi)
%load_ext autoreload
%autoreload 2
Problem statement: sparse events in a static image
In the first case that we will study, images are produced by the combination of a single kernel with sparse events. Think of it as brushstrokes on a canvas, or some drops of rain falling on the surface of a pond...
For simplicity, the combination of these events will be considered linear. This is mostly true, for instance, in watercolor or when combining transparent surfaces, and when ignoring the saturation of the sensor. To handle linearity, we will use linear algebra and matrices: we will denote the (raveled) image as $x$ and these "events" as $y$. The kernel that defines the brushstrokes is denoted as $A$ and will be made explicit below. Generally, in matrix form, the generative model of the image is written:
$$ x = A y $$
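As a sanity check of this matrix notation (a minimal sketch with a hypothetical toy size, not part of the benchmark), one can verify on a 1D signal that circular convolution by a kernel is exactly a product by a circulant (Toeplitz-like) matrix $A$, which is why the FFT-based implementation below is equivalent to the linear model:
N = 8  # hypothetical toy size
rng_toy = np.random.RandomState(0)
kernel, y_toy = rng_toy.normal(size=N), rng_toy.normal(size=N)
# circulant matrix whose i-th column is the kernel circularly shifted by i
A = np.stack([np.roll(kernel, i) for i in range(N)], axis=1)
# the matrix product matches the FFT-based circular convolution
assert np.allclose(A @ y_toy, np.fft.ifft(np.fft.fft(kernel) * np.fft.fft(y_toy)).real)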
First, coefficients are drawn from a Laplace distribution, and sparse events are generated by thresholding these coefficients. These will be convolved with a kernel defined using the MotionClouds library. Let's first illustrate that for a given image size:
N_X, N_Y = 2**9, 2**9 # BENCHMARK
# N_X, N_Y = 2**8, 2**8 # DEBUG
rho = 1.e-3
sf_0 = .15
B_sf = .15
y_lambda = 1.61803
seed = 2020
rng = np.random.RandomState(seed)
events = y_lambda * rng.laplace(size=(N_X, N_Y))
events_max = np.max(np.abs(events))
fig, ax = plt.subplots(figsize=figsize)
ax.hist(events.ravel(), bins=np.linspace(-events_max, events_max, 100, endpoint=True))
ax.set_xlabel('coefficients')
ax.set_yscale('log')
ax.set_ylabel('#');
print('mean, std=', events.mean(), events.std())
mean, std= -0.008859802575012121 2.2900579569626447
From this continuous distribution of coefficients, we will zero out all but a fraction $\rho$ to achieve the desired sparsity (in the $\ell_0$ pseudo-norm sense):
threshold = np.quantile(np.absolute(events).ravel(), 1-rho)
print('threshold=', threshold)
threshold= 11.432255842859089
events_thr = events * ((events < -threshold) + (events > threshold))
fig, ax = plt.subplots(figsize=figsize)
ax.hist(events_thr.ravel(), bins=np.linspace(-events_max, events_max, 100, endpoint=True))
ax.set_xlabel('coefficients')
ax.set_yscale('log')
ax.set_ylabel('#');
print('mean, std=', events_thr.mean(), events_thr.std())
mean, std= 0.00031693423294735503 0.4103409801780267
fig, ax = plt.subplots(figsize=(fig_width, fig_width))
ax.imshow(events_thr, cmap=plt.viridis());
Let's now generate the image by convolving these events with an isotropic kernel, as defined in the https://github.com/NeuralEnsemble/MotionClouds library:
import MotionClouds as mc
fx, fy, ft = mc.get_grids(N_X, N_Y, 1)
opts = dict(V_X=0., V_Y=0., B_V=0, B_theta=np.inf, sf_0=sf_0, B_sf=B_sf)
envelope = mc.envelope_gabor(fx, fy, ft, **opts).squeeze()
#env_smooth = mc.retina(fx, fy, ft, df=0.07)[:, :, 0]
fig, ax = plt.subplots(figsize=(fig_width, fig_width))
#ax.imshow(envelope[(N_X//2-N_X//4):(N_X//2+N_X//4), (N_Y//2-N_Y//4):(N_Y//2+N_Y//4)], cmap=plt.plasma());
ax.imshow(envelope, cmap=plt.plasma());
(This shape reminds me of something... DOOH!)
Then we use the Fourier transform to actually perform the convolution (as implemented in the Motion Clouds library):
x = mc.random_cloud(envelope[:, :, None], events=events_thr[:, :, None])
x = x.reshape((N_X, N_Y))
print('x.shape=', x.shape)
x.shape= (512, 512)
This has all the nice properties of the 2D Discrete Fourier Transform, see for example this textbook (in French).
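In particular, since the model is a circular convolution, it is equivariant to circular shifts of the events; a quick sketch to check this (the shift values are arbitrary):
shift = (7, 3)  # an arbitrary circular shift
x_shifted = mc.random_cloud(envelope[:, :, None], events=np.roll(events_thr, shift, axis=(0, 1))[:, :, None]).reshape((N_X, N_Y))
print(np.allclose(x_shifted, np.roll(x, shift, axis=(0, 1))))  # expect True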
fig, ax = plt.subplots(figsize=(fig_width, fig_width))
vmax = np.absolute(x).max()
ax.imshow(x, cmap=plt.gray(), vmin=-vmax, vmax=vmax);
x_max = np.max(np.abs(x))
fig, ax = plt.subplots(figsize=figsize)
ax.hist(x.ravel(), bins=np.linspace(-x_max, x_max, 100, endpoint=True))
ax.set_xlabel('luminance')
ax.set_yscale('log')
ax.set_ylabel('#');
Note: one could have also used https://en.wikipedia.org/wiki/Shot_noise
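For instance, a hedged sketch of such an alternative generator (Bernoulli-distributed events with Laplacian amplitudes; this parameterization is an assumption, not what is used in the rest of the post):
# shot-noise-like alternative: Bernoulli events with Laplacian amplitudes
events_shot = rng.binomial(n=1, p=rho, size=(N_X, N_Y)) * y_lambda * rng.laplace(size=(N_X, N_Y))
print('fraction of active coefficients =', (events_shot != 0).mean())  # approximately rho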
All in one function:
def MC_env(N_X, N_Y, opts=dict(V_X=0., V_Y=0., B_V=0, B_theta=np.inf, sf_0=sf_0, B_sf=B_sf, alpha=0), do_norm=True, verbose=False):
fx, fy, ft = mc.get_grids(N_X, N_Y, 1)
envelope = mc.envelope_gabor(fx, fy, ft, **opts).squeeze()
envelope /= envelope.max()
if do_norm: envelope /= np.sqrt((envelope**2).sum())
if verbose: print('(envelope**2).sum()=', (envelope**2).sum())
return envelope
envelope = MC_env(N_X, N_Y, verbose=True)
(envelope**2).sum()= 1.0000000000000002
%%timeit
envelope = MC_env(N_X, N_Y)
13.5 ms ± 398 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
def random_cloud(envelope, events):
(N_X, N_Y) = envelope.shape
#fx, fy, ft = mc.get_grids(N_X, N_Y, N_frame)
F_events = np.fft.fftn(events)
F_events = np.fft.fftshift(F_events)
Fz = F_events * envelope
# de-centering the spectrum
Fz = np.fft.ifftshift(Fz)
#Fz[0, 0, 0] = 0. # removing the DC component
z = np.fft.ifftn(Fz).real
return z
%%timeit
random_cloud(envelope, events_thr)
15 ms ± 307 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
def model(envelope, events, verbose=False):
if verbose: print('envelope.shape = ', envelope.shape)
if verbose: print('events.shape = ', events.shape)
N_X, N_Y = envelope.shape
x = random_cloud(envelope, events=events)
#x = x.reshape((N_X, N_Y))
if verbose: print('x.shape=', x.shape)
return x
x = model(envelope, events_thr, verbose=True)
envelope.shape = (512, 512)
events.shape = (512, 512)
x.shape= (512, 512)
%%timeit
model(envelope, events_thr)
14.7 ms ± 449 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Merging both functions:
def model(envelope, events, verbose=False):
if verbose: print('envelope.shape = ', envelope.shape)
if verbose: print('events.shape = ', events.shape)
N_X, N_Y = envelope.shape
F_events = np.fft.fftn(events)
F_events = np.fft.fftshift(F_events)
Fz = F_events * envelope
Fz = np.fft.ifftshift(Fz)
x = np.fft.ifftn(Fz).real
if verbose: print('x.shape=', x.shape)
return x
x = model(envelope, events_thr, verbose=True)
envelope.shape = (512, 512)
events.shape = (512, 512)
x.shape= (512, 512)
%%timeit
model(envelope, events_thr)
14.9 ms ± 433 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
def generative_model(envelope, rho=rho, y_lambda=y_lambda, seed=seed, verbose=False):
N_X, N_Y = envelope.shape
if verbose: print('N_X, N_Y = ', envelope.shape)
rng = np.random.RandomState(seed)
events = y_lambda * rng.laplace(size=(N_X, N_Y))
threshold = np.quantile(np.absolute(events).ravel(), 1.-rho)
events = events * ((events < -threshold) + (events > threshold))
x = model(envelope, events, verbose=verbose)
return events, x
events, x = generative_model(envelope, verbose=True)
N_X, N_Y = (512, 512)
envelope.shape = (512, 512)
events.shape = (512, 512)
x.shape= (512, 512)
fig, ax = plt.subplots(figsize=(fig_width, fig_width))
vmax = np.absolute(x).max()
ax.imshow(x, cmap=plt.gray(), vmin=-vmax, vmax=vmax);
envelope = MC_env(N_X, N_Y)
%%timeit
events, x = generative_model(envelope)
25.9 ms ± 591 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Now, how do I retrieve the events knowing x? This is a form of deconvolution:
Deconvolution in the noiseless case: invertibility of the image to retrieve events
It's relatively easy to use the shape of the envelope to retrieve the events:
eps = 2.e-6
thr = 5.e-2
def deconv(envelope, eps=eps, thr=thr, do_norm=False):
# mask = 1 / ( 1 + np.exp( -(envelope/envelope.max() - thr) / eps ) )# coefficients near zero
fr = mc.frequency_radius(fx, fy, ft).squeeze()
mask = 1 / ( 1 + np.exp( (fr - .45) / .025 ) ) # high frequency
mask *= 1 / ( 1 + np.exp( -(fr - thr) / .01 ) )# low frequency
F_deconv = envelope / (envelope**2 + eps*(1-mask)) # avoid division by zero
F_deconv *= mask
if do_norm: F_deconv /= np.sqrt((F_deconv**2).sum())
return F_deconv / ((envelope*F_deconv)).mean()
F_deconv = deconv(envelope, eps=eps, thr=thr)
fig, ax = plt.subplots(figsize=(fig_width, fig_width))
#cmap = ax.imshow(F_deconv[(N_X//2-N_X//4):(N_X//2+N_X//4), (N_Y//2-N_Y//4):(N_Y//2+N_Y//4)], cmap=plt.plasma())
cmap = ax.imshow(F_deconv, cmap=plt.plasma())
plt.colorbar(cmap, shrink=.8);
fig, ax = plt.subplots(figsize=(fig_width, fig_width))
cmap = ax.imshow(envelope*F_deconv, cmap=plt.plasma())
plt.colorbar(cmap, shrink=.8);
# generate
envelope = MC_env(N_X, N_Y)
events = np.zeros((N_X, N_Y))
events[N_X//2, N_Y//2] = 1.
impulse = model(envelope, events)
# invert
Fx = np.fft.fft2(impulse)
F_deconv = deconv(envelope, eps=eps, thr=thr)
# applying the filter
F_events = Fx*np.fft.ifftshift(F_deconv)
# de-centering the spectrum
impulse_pred = np.fft.ifft2(F_events).real
print('impulse_pred.max()', impulse_pred.max(), 'impulse_pred[N_X//2, N_Y//2]', impulse_pred[N_X//2, N_Y//2])
impulse_pred.max() 1.0 impulse_pred[N_X//2, N_Y//2] 1.0
fig, axs = plt.subplots(1, 2, figsize=(fig_width, fig_width))
axs[0].imshow(impulse, cmap=plt.gray())
axs[1].imshow(impulse_pred, cmap=plt.viridis());
def invert(x, envelope, eps=eps, thr=thr):
Fx = np.fft.fft2(x)
F_deconv = deconv(envelope, eps=eps, thr=thr)
# applying the filter
# de-centering the spectrum
F_events = Fx*np.fft.ifftshift(F_deconv)
events = np.fft.ifft2(F_events).real
return events
envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope, verbose=True)
events_pred = invert(x, envelope)
N_X, N_Y = (512, 512)
envelope.shape = (512, 512)
events.shape = (512, 512)
x.shape= (512, 512)
fig, axs = plt.subplots(1, 2, figsize=(fig_width, fig_width))
axs[0].imshow(events, cmap=plt.viridis())
axs[1].imshow(events_pred, cmap=plt.viridis());
fig, ax = plt.subplots(figsize=figsize)
counts, bins, _ = ax.hist(events.ravel(), fc='b', bins=100, alpha=.5)
counts, bins, _ = ax.hist(events_pred.ravel(), fc='r', bins=bins, alpha=.5)
ax.set_xlabel('coefficients')
ax.set_yscale('log')
ax.set_ylabel('#');
Some statistics to show the efficiency of this method:
print('mean, std=', events.mean(), events.std())
mean, std= 0.00031693423294735503 0.4103409801780267
print('mean, std=', events_pred.mean(), events_pred.std())
mean, std= 8.131516293641283e-20 0.5460662014027519
#print('enveloppe energy=', np.sqrt(((envelope)**2).mean()) )
def SE_y(events, events_pred):
return ((events-events_pred)**2).sum()/(events**2).sum()
print('relative Squared error on coefficients: SE_y=', SE_y(events, events_pred))
def L1_y(events, events_pred):
return np.abs(events-events_pred).sum()/np.abs(events).sum()
print('relative L1 error=', L1_y(events, events_pred))
#print('energy rec=', ((events_pred)**2).sum())
#print('energy input=', ((events.squeeze())**2).sum())
#print('ratio energy=', ((events_pred)**2).sum()/(events**2).sum())
#print('Squared ratio energy=', np.sqrt((events**2).sum()/(events_pred**2).sum()))
relative Squared error on coefficients: SE_y= 0.7709950804764737
relative L1 error= 5.898850045732038
def SE_x(x, envelope, events_pred):
return ((x-model(envelope, events_pred))**2).sum()/(x**2).sum()
print('relative Squared image error=', SE_x(x, envelope, events_pred))
relative Squared image error= 0.8535097755381738
Up to the band-pass mask used in deconv (which discards the lowest and highest spatial frequencies), reconstruction is exact to the machine's numerical precision, as the impulse response above shows. This is possible because the problem is invertible in this instance: the invert function implements a generalization of the pseudo-inverse $A^+ = (A^\top A)^{-1} A^\top$ of the linear transform $x = A y$, extended to the convolutional case defined in the model function.
Note that the FFT and the convolution are linear operators, following from the fact that $A$ is a Toeplitz matrix derived from the kernel we used.
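To make this concrete, here is a small self-contained 1D check (a toy sketch with hypothetical sizes, not part of the original benchmark) that for an invertible circulant matrix the pseudo-inverse is itself circulant, with spectrum $1/\hat{A}$; this is exactly what deconv computes, up to regularization and masking:
N = 8  # hypothetical toy size
kernel = np.zeros(N)
kernel[0], kernel[1], kernel[-1] = 3., 1., 1.  # a symmetric, well-conditioned kernel
A = np.stack([np.roll(kernel, i) for i in range(N)], axis=1)
A_pinv = np.linalg.pinv(A)  # equals (A^T A)^{-1} A^T, here simply the inverse
# the spectrum of the pseudo-inverse is the inverse of the kernel's spectrum
print(np.allclose(np.fft.fft(A_pinv[:, 0]), 1. / np.fft.fft(kernel)))  # expect True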
This method is fast:
envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope)
%%timeit
events_pred = invert(x, envelope)
24.5 ms ± 1.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Instead of using the standard fft, as in
%timeit np.fft.fft2(x)
6.44 ms ± 345 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
... one could use the knowledge that the input signal is real:
%timeit np.fft.rfft2(x)
3.98 ms ± 156 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Specifically, it is necessary to reshape the envelope in frequency space, see https://docs.scipy.org/doc/numpy/reference/generated/numpy.fft.rfft.html :
np.fft.rfft2(x).shape
(512, 257)
such that finally:
def invert_r(x, envelope, eps=eps, thr=thr):
Fx = np.fft.rfft2(x)
F_deconv = deconv(envelope, eps=eps, thr=thr)
F_deconv = np.fft.ifftshift(F_deconv)[:, :(envelope.shape[1]//2+1):]
# applying the filter
# de-centering the spectrum
F_events = Fx*F_deconv
events = np.fft.irfft2(F_events).real
return events
#mc.N_X, mc.N_Y = 2**7, 2**7
envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope, verbose=True)
events_pred = invert_r(x, envelope)
print('relative L1 error=', L1_y(events, events_pred))
print('relative Squared error=', SE_y(events, events_pred))
print('relative Squared image error=', SE_x(x, envelope, events_pred))
N_X, N_Y = (512, 512)
envelope.shape = (512, 512)
events.shape = (512, 512)
x.shape= (512, 512)
relative L1 error= 5.898850045732038
relative Squared error= 0.7709950804764737
relative Squared image error= 0.8535097755381738
fig, axs = plt.subplots(1, 2, figsize=(fig_width, fig_width))
axs[0].imshow(events, cmap=plt.viridis())
axs[1].imshow(events_pred, cmap=plt.viridis());
envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope, verbose=True)
N_X, N_Y = (512, 512)
envelope.shape = (512, 512)
events.shape = (512, 512)
x.shape= (512, 512)
%%timeit
events_pred = invert_r(x, envelope)
17.8 ms ± 1.68 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Deconvolution in the noisy case (colored noise): (partial) failure to retrieve sparse coefficients
envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope, verbose=True)
N_X, N_Y = (512, 512)
envelope.shape = (512, 512)
events.shape = (512, 512)
x.shape= (512, 512)
def add_noise(x, noise, B_sf=.5, do_white=False):
if do_white:
im_noise = rng.normal(size=x.shape)
else:
env_noise = mc.envelope_gabor(fx, fy, ft, B_theta=np.inf, sf_0=sf_0, B_sf=B_sf)
im_noise = mc.random_cloud(env_noise).squeeze()
im_noise /= im_noise.std()
x_noise = x + noise * x.std() * im_noise
return x_noise
noise = .5
x_noise = add_noise(x, noise)
fig, ax = plt.subplots(figsize=(fig_width, fig_width))
vmax = np.absolute(x_noise).max()
ax.imshow(x_noise, cmap=plt.gray(), vmin=-vmax, vmax=vmax);
x_noise = add_noise(x, noise)
events_pred = invert(x_noise, envelope)
print('relative L1 error=', L1_y(events, events_pred))
print('relative Squared error=', SE_y(events, events_pred))
print('relative Squared image error=', SE_x(x, envelope, events_pred))
relative L1 error= 21.712250715957364
relative Squared error= 1.2835350862888342
relative Squared image error= 1.518171234489418
Error as a function of noise level:
envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope, verbose=True)
N_X, N_Y = (512, 512)
envelope.shape = (512, 512)
events.shape = (512, 512)
x.shape= (512, 512)
N_scan = 15
SE = np.zeros((N_scan, 2))
noises = np.logspace(-2, .5, N_scan, base=10, endpoint=True)
for i_noise, noise_ in enumerate(noises):
x_noise = add_noise(x, noise_)
events_pred = invert(x_noise, envelope)
SE[i_noise, 0] = SE_x(x, envelope, events_pred)
SE[i_noise, 1] = SE_y(events, events_pred)
# recording for future comparisons
SE_noise = {'full': SE}
fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
ax[i].loglog(noises, np.ones_like(SE[:, i]), 'k--')
ax[i].loglog(noises, SE[:, i], label='full')
ax[i].set_xlabel('noise')
ax[i].set_ylabel(SE_l);
ax[0].legend(loc='lower right');
x_noise = add_noise(x, noise)
events_pred = invert(x_noise, envelope)
print('relative L1 error=', L1_y(events, events_pred))
print('relative Squared error=', SE_y(events, events_pred))
print('relative Squared image error=', SE_x(x, envelope, events_pred))
relative L1 error= 21.62688356397757
relative Squared error= 1.2790255656451477
relative Squared image error= 1.5243018532816033
Even for this simple model, the error grows too fast with respect to the noise level. We need to find a way to handle this. A first solution comes from inspecting the histogram of the events' coefficients:
fig, ax = plt.subplots(figsize=figsize)
counts, bins, _ = ax.hist(events.ravel(), fc='b', bins=100, alpha=.5)
counts, bins, _ = ax.hist(events_pred.ravel(), fc='r', bins=bins, alpha=.5)
ax.set_xlabel('coefficients')
ax.set_yscale('log')
ax.set_ylabel('#');
Knowing the sparsity of the input, one could select only the proportion of coefficients which are large enough and discard the others:
threshold = np.quantile(np.abs(events_pred).ravel(), 1-rho)
print('threshold=', threshold)
threshold= 10.14246437921788
threshold_minus = np.quantile(events_pred.ravel(), rho/2)
threshold_plus = np.quantile(events_pred.ravel(), 1.-rho/2)
print('threshold_minus=', threshold_minus, 'threshold_plus=', threshold_plus)
threshold_minus= -6.99502093139905 threshold_plus= 11.1569027723831
events_pred_thr = events_pred * ((events_pred < threshold_minus) + (events_pred > threshold_plus))
print('relative L1 error (full)=', L1_y(events, events_pred))
print('relative Squared error (full)=', SE_y(events, events_pred))
print('relative L1 error (thr)=', L1_y(events, events_pred_thr))
print('relative Squared error (thr)=', SE_y(events, events_pred_thr))
relative L1 error (full)= 21.62688356397757
relative Squared error (full)= 1.2790255656451477
relative L1 error (thr)= 0.038263579961269924
relative Squared error (thr)= 0.01437538545902251
fig, ax = plt.subplots(figsize=figsize)
counts, bins, _ = ax.hist(events.ravel(), fc='b', bins=100, alpha=.5)
counts, bins, _ = ax.hist(events_pred_thr.ravel(), fc='r', bins=bins, alpha=.5)
ax.set_xlabel('coefficients')
ax.set_yscale('log')
ax.set_ylabel('#');
print('relative Squared image error (full)=', SE_x(x, envelope, events_pred))
print('relative Squared image error (thr)=', SE_x(x, envelope, events_pred_thr))
relative Squared image error (full)= 1.5243018532816033
relative Squared image error (thr)= 0.01693486843514906
def L0_y(events, events_pred):
    error = ((1 - (events==0)) * (events_pred==0)).sum()  # missed events : false negatives
    error += ((1 - (events_pred==0)) * (events==0)).sum()  # spurious events : false positives
    return error / N_X / N_Y  # normalized by the number of pixels
print('relative L0 error=', L0_y(events, events_pred))
relative L0 error= 0.9989967346191406
Doing this with a single global threshold on the absolute values, we get:
threshold = np.quantile(np.absolute(events_pred).ravel(), 1.-rho)
print('threshold=', threshold)
threshold= 10.14246437921788
events_pred_thr = events_pred * ((events_pred < -threshold) + (events_pred > threshold))
print('relative L1 error (thr)=', L1_y(events, events_pred_thr))
print('relative Squared error (thr)=', SE_y(events, events_pred_thr))
print('relative Squared image error (thr)=', SE_x(x, envelope, events_pred_thr))
relative L1 error (thr)= 0.019930734264362066
relative Squared error (thr)= 0.0006575179777211669
relative Squared image error (thr)= 0.0006386922129840186
One could expect the thresholded solution to be worse in terms of image reconstruction: the coefficients were found using the least-squares method, for which the pseudo-inverse is the optimal solution, and deviating from that solution should increase the reconstruction error in the image domain.
Yet it is better for the extraction of the sparse coefficients, as we zeroed out the coefficients which we classified as spurious using our knowledge of the sparseness of the vector.
A remaining problem is that we do not know the sparseness a priori. Let's see how the errors change with different estimates of it, obtained from either the original or the noisy image:
N_scan_rho = 45
x_noise = add_noise(x, noise)
events_pred = invert(x, envelope)
events_pred_noise = invert(x_noise, envelope)
SE_thr = np.zeros((N_scan_rho, 2))
SE_noisy = np.zeros((N_scan_rho, 2))
rhos = np.logspace(-5, -2, N_scan_rho, base=10, endpoint=True)
for i_rho, rho_ in enumerate(rhos):
threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_)
events_pred_thr = events_pred * ((events_pred < -threshold) + (events_pred > threshold))
SE_thr[i_rho, 0] = SE_x(x, envelope, events_pred_thr)
SE_thr[i_rho, 1] = SE_y(events_pred, events_pred_thr)
threshold = np.quantile(np.absolute(events_pred_noise).ravel(), 1-rho_)
events_pred_thr = events_pred_noise * ((events_pred_noise < -threshold) + (events_pred_noise > threshold))
SE_noisy[i_rho, 0] = SE_x(x_noise, envelope, events_pred_thr)
SE_noisy[i_rho, 1] = SE_y(events_pred_noise, events_pred_thr)
fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
ax[i].loglog(rhos, np.ones_like(SE_thr[:, i]), 'k--')
ax[i].loglog(rhos, SE_thr[:, i], label='Original')
ax[i].loglog(rhos, SE_noisy[:, i], '--', label='Noisy')
ax[i].set_xlabel('rho_est')
ax[i].set_ylabel(SE_l);
ax[0].legend(loc='best');
Now, assuming that we have an estimate of rho, let's see how this method compares to the previous one ('full') for varying levels of noise:
rho_est = rho
SE_thr = np.zeros((N_scan, 2))
for i_noise, noise_ in enumerate(noises):
x_noise = add_noise(x, noise_)
events_pred = invert(x_noise, envelope)
threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
events_pred_thr = events_pred * ((events_pred < -threshold) + (events_pred > threshold))
SE_thr[i_noise, 0] = SE_x(x, envelope, events_pred_thr)
SE_thr[i_noise, 1] = SE_y(events, events_pred_thr)
SE_noise['thr']= SE_thr
fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
ax[i].plot(noises, np.ones(len(noises)), 'k--')
for method in ['full', 'thr']:
ax[i].plot(noises, SE_noise[method][:, i], label=method)
ax[i].set_xlabel('noise')
ax[i].set_ylabel(SE_l)
ax[i].set_xscale('log')
ax[i].set_yscale('log')
ax[0].legend(loc='best');
Thresholding gives a much better estimate of the coefficients.
If the level of noise is known, one can also use Wiener filtering:
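In the Fourier domain (writing $\hat{A}$ for the envelope, that is, the spectrum of the kernel), this amounts to the regularized inverse filter

$$ \hat{y} = \frac{\hat{A}}{\hat{A}^2 + \epsilon} \cdot \hat{x} $$

which is what deconv already computes (up to its band-pass mask), with $\epsilon$ set proportionally to the noise level as in the cell below.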
sigma_noise = noise
envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope, verbose=True)
events_pred = invert(x, envelope, eps=sigma_noise * x.std())
print('relative L1 error (thr)=', L1_y(events, events_pred_thr))
print('relative Squared error (thr)=', SE_y(events, events_pred_thr))
print('relative Squared image error (full)=', SE_x(x, envelope, events_pred))
N_X, N_Y = (512, 512)
envelope.shape = (512, 512)
events.shape = (512, 512)
x.shape= (512, 512)
relative L1 error (thr)= 0.21189481957141518
relative Squared error (thr)= 0.1132681320687763
relative Squared image error (full)= 4.484507063662472
fig, axs = plt.subplots(1, 2, figsize=(fig_width, fig_width))
axs[0].imshow(events, cmap=plt.viridis())
axs[1].imshow(events_pred, cmap=plt.viridis());
fig, ax = plt.subplots(figsize=figsize)
counts, bins, _ = ax.hist(events.ravel(), fc='b', bins=100, alpha=.5)
counts, bins, _ = ax.hist(events_pred.ravel(), fc='r', bins=bins, alpha=.5)
ax.set_xlabel('coefficients')
ax.set_yscale('log')
ax.set_ylabel('#');
rho_est = rho
SE_thr = np.zeros((N_scan, 2))
for i_noise, noise_ in enumerate(noises):
x_noise = add_noise(x, noise_)
events_pred = invert(x_noise, envelope, eps=noise_ * x.std())
threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
events_pred_thr = events_pred * ((events_pred < -threshold) + (events_pred > threshold))
SE_thr[i_noise, 0] = SE_x(x, envelope, events_pred_thr)
SE_thr[i_noise, 1] = SE_y(events, events_pred_thr)
SE_noise['wie']= SE_thr
fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
ax[i].plot(noises, np.ones(len(noises)), 'k--')
for method in ['full', 'thr', 'wie']:
ax[i].plot(noises, SE_noise[method][:, i], label=method)
ax[i].set_xlabel('noise')
ax[i].set_ylabel(SE_l)
ax[i].set_xscale('log')
ax[i].set_yscale('log')
ax[0].legend(loc='best');
Could we repeat this procedure iteratively?
A first solution is to define a global cost, as in the LASSO problem:
$$ \mathcal{C} = .5 \cdot \| x - A y \|^2 + \lambda \| y \|_1 $$
and then to apply a gradient descent technique on this cost.
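Since the envelope is real and even in frequency, $A$ is symmetric here, so $A^\top$ is applied by another call to model; the (sub)gradient of the cost, whose opposite drives the updates below, reads:

$$ \nabla_y \mathcal{C} = - A^\top ( x - A y ) + \lambda \cdot \mathrm{sign}(y) $$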
Conjugate gradient descent algorithm
Setting up variables:
envelope = MC_env(N_X, N_Y)
events = np.zeros((N_X, N_Y))
events[N_X//2, N_Y//2] = 1
impulse = model(envelope, events)
ATA = model(envelope, impulse)
ATA_pred = model(envelope**2, events)
print(np.absolute(ATA_pred).max(), np.absolute(ATA).max())
3.814697265625001e-06 3.814697265625e-06
assert(((ATA_pred-ATA)**2).sum() < 1.e-16)
#fig, ax = plt.subplots(figsize=(fig_width, fig_width))
vmax = np.absolute(impulse).max()
vmax = np.absolute(ATA).max()
#ax.imshow(impulse, cmap=plt.gray(), vmin=-vmax, vmax=vmax);
fig, axs = plt.subplots(1, 2, figsize=(fig_width, fig_width/phi))
axs[0].imshow(ATA[(N_X//2-N_X//4):(N_X//2+N_X//4), (N_Y//2-N_Y//4):(N_Y//2+N_Y//4)], cmap=plt.gray(), vmin=-vmax, vmax=vmax);
axs[1].imshow(ATA_pred[(N_X//2-N_X//4):(N_X//2+N_X//4), (N_Y//2-N_Y//4):(N_Y//2+N_Y//4)], cmap=plt.gray(), vmin=-vmax, vmax=vmax);
print('impulse.max()', impulse.max())
print('envelope.mean()', envelope.mean())
print('ATA.max()', ATA.max())
print('(envelope**2).mean()', (envelope**2).mean())
impulse.max() 0.0012951797917240277
envelope.mean() 0.0012951797917240284
ATA.max() 3.814697265625e-06
(envelope**2).mean() 3.814697265625001e-06
# envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope, verbose=True)
x_noise = add_noise(x, noise)
N_X, N_Y = (512, 512)
envelope.shape = (512, 512)
events.shape = (512, 512)
x.shape= (512, 512)
fig, axs = plt.subplots(1, 2, figsize=(fig_width, fig_width/phi))
#axs[0].imshow(envelope[(N_X//2-N_X//4):(N_X//2+N_X//4), (N_Y//2-N_Y//4):(N_Y//2+N_Y//4)], cmap=plt.plasma());
#axs[1].imshow(envelope[(N_X//2-N_X//4):(N_X//2+N_X//4), (N_Y//2-N_Y//4):(N_Y//2+N_Y//4)]**2, cmap=plt.plasma());
axs[0].imshow(envelope, cmap=plt.plasma());
axs[1].imshow(envelope**2, cmap=plt.plasma());
Let's project the image onto the kernel (that is, apply $A^\top$ to the noisy image):
events_pred = model(envelope, x_noise, verbose=True) #/ 26.80927 # / np.sqrt(((envelope)**2).mean())
print('relative Squared image error (full)=', SE_x(x, envelope, events_pred))
print('relative L1 error (full)=', L1_y(events, events_pred))
print('relative Squared error (full)=', SE_y(events, events_pred))
envelope.shape = (512, 512)
events.shape = (512, 512)
x.shape= (512, 512)
relative Squared image error (full)= 0.9999602187033221
relative L1 error (full)= 1.000140303243882
relative Squared error (full)= 0.9999923618964663
rho_est = rho
threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
events_pred_thr = events_pred * ((events_pred < -threshold) + (events_pred > threshold))
print('relative L1 error (thr)=', L1_y(events, events_pred_thr))
print('relative Squared error (thr)=', SE_y(events, events_pred_thr))
print('relative Squared image error (thr)=', SE_x(x, envelope, events_pred_thr))
relative L1 error (thr)= 0.5828322420932677
relative Squared error (thr)= 0.48138052189460406
relative Squared image error (thr)= 0.6334377713182282
Other stats for introspection:
print('Mean envelope energy=', (envelope**2).mean())
print('Squared envelope energy=', np.sqrt((envelope**2).mean()) )
print('relative Squared error=',SE_y(events, events_pred_thr))
print('energy rec=', ((events_pred)**2).sum())
print('energy input=', ((events)**2).sum())
print('ratio energy=', ((events_pred)**2).sum()/(events**2).sum())
print('Squared ratio energy=', np.sqrt((events_pred**2).sum()/(events**2).sum()))
Mean envelope energy= 3.814697265625001e-06
Squared envelope energy= 0.001953125
relative Squared error= 0.48138052189460406
energy rec= 4.00839680485629e-06
energy input= 44139.75965486855
ratio energy= 9.081147781950303e-11
Squared ratio energy= 9.529505644024931e-06
events, x = generative_model(envelope, seed=42)
x_noise = add_noise(x, noise)
events_pred = invert(x_noise, envelope)
threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
events_pred_thr = events_pred * ((events_pred < -threshold) + (events_pred > threshold))
error = x_noise - model(envelope, events_pred)
grad = model(envelope, error)
grad_pred = model(envelope, x_noise) - model(envelope**2, events_pred)
assert(((grad-grad_pred)**2).sum() < 1.e-16)
We will include information about sparsity by modifying the gradient according to the LASSO formulation (see https://blogs.princeton.edu/imabandit/2013/04/09/orf523-conditional-gradient-descent-and-structured-sparsity/ ):
y_lambda_est = y_lambda
grad -= y_lambda_est * np.sign(events_pred)
N_iter = 1000
eta = .01
start_iter = 10
rho_est = rho
y_lambda_est = 1. # y_lambda
N_scan = 15
default = dict(N_iter=N_iter, eta=eta, start_iter=start_iter, rho_est=rho_est,
y_lambda_est=y_lambda_est)
print('default =', default)
default = {'N_iter': 1000, 'eta': 0.01, 'start_iter': 10, 'rho_est': 0.001, 'y_lambda_est': 1.0}
iters = range(N_iter)
SE_iter = np.zeros((N_iter, 2))
SE_thr = np.zeros((N_iter, 2))
# initialization
norm_enveloppe = 1  # np.sqrt((envelope**2).mean())
#events_pred = 0. * model(envelope, x_noise) / norm_enveloppe
events_pred = invert(x_noise, envelope)
# threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
# events_pred_thr = events_pred * ((events_pred < -threshold) + (events_pred > threshold))
# events_pred = events_pred_thr #np.zeros_like(x_noise)
# iterations
for i_iter in iters:
# predicting
grad = model(envelope, x_noise) - model(envelope**2, events_pred)
grad -= y_lambda_est * np.sign(events_pred)
events_pred += eta / (i_iter+start_iter) * grad
# measuring efficiency
SE_iter[i_iter, 0] = SE_x(x, envelope, events_pred)
SE_iter[i_iter, 1] = SE_y(events, events_pred)
# thresholding
threshold = np.quantile(np.absolute(events_pred).ravel(), 1.-rho_est)
events_pred_thr = events_pred * ((events_pred < -threshold) + (events_pred > threshold))
SE_thr[i_iter, 0] = SE_x(x, envelope, events_pred_thr)
SE_thr[i_iter, 1] = SE_y(events, events_pred_thr)
fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
ax[i].plot(iters, SE_iter[:, i], label='full')
ax[i].plot(iters, SE_thr[:, i], '--', label='thr')
ax[i].set_xlabel('iterations')
ax[i].set_ylabel(SE_l)
#ax[i].set_xscale('log')
ax[i].set_yscale('log')
ax[0].legend(loc='best');
Let's now test this function over its different parameters:
events, x = generative_model(envelope, seed=42)
x_noise = add_noise(x, noise)
def cg(x_noise, envelope,
N_iter=N_iter, eta=eta, start_iter=start_iter, rho_est=rho_est,
y_lambda_est=y_lambda_est):
#events_pred = np.zeros_like(x_noise)
events_pred = invert(x_noise, envelope)
#threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
#events_pred *= ((events_pred < -threshold) + (events_pred > threshold))
iters = range(int(N_iter))
# iterations
for i_iter in iters:
# predicting
grad = model(envelope, x_noise) - model(envelope**2, events_pred)
grad -= y_lambda_est * np.sign(events_pred)
events_pred += eta/(i_iter+int(start_iter)) * grad
return events_pred
Testing with and without the sparsity term:
events_pred = cg(x_noise, envelope, y_lambda_est=y_lambda_est)
print('relative L1 error (full)=', L1_y(events, events_pred))
print('relative Squared error (full)=', SE_y(events, events_pred))
print('relative Squared image error (full)=', SE_x(x, envelope, events_pred))
threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
events_pred_thr = events_pred * ((events_pred < -threshold) + (events_pred > threshold))
print('relative L1 error (thr)=', L1_y(events, events_pred_thr))
print('relative Squared error (thr)=', SE_y(events, events_pred_thr))
print('relative Squared image error (thr)=', SE_x(x, envelope, events_pred_thr))
relative L1 error (full)= 18.500830950464188
relative Squared error (full)= 1.1431711547236103
relative Squared image error (full)= 1.3258956031599767
relative L1 error (thr)= 0.020796171513273086
relative Squared error (thr)= 0.0017684529114802398
relative Squared image error (thr)= 0.002634544977301539
events_pred = cg(x_noise, envelope, y_lambda_est=0)
print('relative L1 error (full)=', L1_y(events, events_pred))
print('relative Squared error (full)=', SE_y(events, events_pred))
print('relative Squared image error (full)=', SE_x(x, envelope, events_pred))
threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
events_pred_thr = events_pred * ((events_pred < -threshold) + (events_pred > threshold))
print('relative L1 error (thr)=', L1_y(events, events_pred_thr))
print('relative Squared error (thr)=', SE_y(events, events_pred_thr))
print('relative Squared image error (thr)=', SE_x(x, envelope, events_pred_thr))
relative L1 error (full)= 21.884097998802357
relative Squared error (full)= 1.2864606042875937
relative Squared image error (full)= 1.5113587512109732
relative L1 error (thr)= 0.020353119402202434
relative Squared error (thr)= 0.0017621233158293223
relative Squared image error (thr)= 0.002646032322600538
noise
0.5
SE = {}
SE_thr = {}
for variable in default.keys():
SE[variable] = np.zeros((N_scan, 2))
SE_thr[variable] = np.zeros((N_scan, 2))
values = np.logspace(-1, 1, N_scan, base=10, endpoint=True) * default[variable]
for i_value, value in enumerate(values):
opts = default.copy()
opts[variable] = value
print('variable', variable, '=', '%.4f' % opts[variable], ' - default =', '%.4f' % default[variable])
x_noise = add_noise(x, noise)
events_pred = cg(x_noise, envelope, **opts)
SE[variable][i_value, 0] = SE_x(x, envelope, events_pred)
SE[variable][i_value, 1] = SE_y(events, events_pred)
threshold = np.quantile(np.absolute(events_pred).ravel(), 1-opts['rho_est'])
events_pred_thr = events_pred * ((events_pred < -threshold) + (events_pred > threshold))
SE_thr[variable][i_value, 0] = SE_x(x, envelope, events_pred_thr)
SE_thr[variable][i_value, 1] = SE_y(events, events_pred_thr)
(scan log: each of N_iter, eta, start_iter, rho_est and y_lambda_est is swept over 15 logarithmically spaced values from 0.1× to 10× its default)
for variable in default.keys():
values = np.logspace(-1, 1, N_scan, base=10, endpoint=True) * default[variable]
fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
ax[i].plot(values, np.ones_like(SE_thr[variable][:, i]), 'k--')
ax[i].plot(values, SE[variable][:, i], label='full')
ax[i].plot(values, SE_thr[variable][:, i], '--', label='thr')
ax[i].set_xlabel(variable)
ax[i].set_ylabel(SE_l);
fig.suptitle(variable)
ax[i].set_xscale('log')
ax[i].set_yscale('log')
ax[0].legend(loc='best');
And now for different levels of noise:
rho_est = rho
SE = np.zeros((N_scan, 2))
for i_noise, noise_ in enumerate(noises):
x_noise = add_noise(x, noise_)
events_pred = cg(x_noise, envelope)
threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
events_pred_thr = events_pred * ((events_pred < -threshold) + (events_pred > threshold))
SE[i_noise, 0] = SE_x(x, envelope, events_pred_thr)
SE[i_noise, 1] = SE_y(events, events_pred_thr)
SE_noise['cg'] = SE
fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
ax[i].plot(noises, np.ones(len(noises)), 'k--')
ax[i].plot(noises, SE[:, i], 'g--')
for method in ['full', 'thr', 'cg']:
ax[i].plot(noises, SE_noise[method][:, i], label=method)
ax[i].set_xlabel('noise')
ax[i].set_ylabel(SE_l)
ax[i].set_xscale('log')
ax[i].set_yscale('log')
ax[0].legend(loc='best');
It converges to a better solution than simple thresholding, yet the convergence is slow. Another solution is deployed in iterative thresholding.
ISTA: iterative shrinkage-thresholding algorithm
Some nice pointers about ISTA:
- the FISTA paper by Beck & Teboulle (2009), "A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems"
- lectures on FISTA http://stat.cmu.edu/~ryantibs/convexopt/lectures/prox-grad.pdf http://www.seas.ucla.edu/~vandenbe/236C/lectures/fista.pdf
- https://blogs.princeton.edu/imabandit/2013/04/11/orf523-ista-and-fista/
An implementation in matlab: https://github.com/seunghwanyoo/ista_lasso - and one in python on a toy problem: https://gist.github.com/agramfort/ac52a57dc6551138e89b
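In our notation, each ISTA step combines a gradient step on the quadratic term of the cost with a soft thresholding (the shrinkage operator implemented below):

$$ y^{(k+1)} = \mathcal{T}_{\eta \lambda} \left( y^{(k)} + \eta \cdot A^\top ( x - A y^{(k)} ) \right) $$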
As we have seen, we can invert the events from a noisy image, but the error is large...
envelope = MC_env(N_X, N_Y)
events, x = generative_model(envelope, verbose=True)
x_noise = add_noise(x, noise)
events_pred = invert(x_noise, envelope)
print('relative Squared image error (full)=', SE_x(x, envelope, events_pred))
print('relative L1 error (full)=', L1_y(events, events_pred))
print('relative Squared error (full)=', SE_y(events, events_pred))
N_X, N_Y = (512, 512)
envelope.shape = (512, 512)
events.shape = (512, 512)
x.shape= (512, 512)
relative Squared image error (full)= 1.528970399238033
relative L1 error (full)= 21.828188780275898
relative Squared error (full)= 1.2919741704512568
One idea is to apply the same procedure iteratively, as we know that the events matrix is sparse:
def shrinkage(events_pred, threshold, stype='soft'):
if stype=='hard':
events_pred_thr = events_pred * ((events_pred < -threshold) + (events_pred > threshold))
else:
events_pred_thr = np.sign(events_pred) * np.maximum(np.abs(events_pred)-threshold, 0.)
return events_pred_thr
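The soft variant is the proximal operator of the $\ell_1$ norm, which is precisely what the ISTA update above applies at each iteration:

$$ \mathcal{T}_\theta(v) = \mathrm{sign}(v) \cdot \max(|v| - \theta, 0) $$

while the hard variant simply zeroes out the coefficients below the threshold.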
fig, ax = plt.subplots(figsize=figsize)
events_pred = np.linspace(-1, 1, 200, endpoint=True)
for stype in ['hard', 'soft']:
ax.plot(events_pred, shrinkage(events_pred, threshold=.25, stype=stype), label=stype, alpha=.5)
ax.legend(loc='best')
ax.set_xlabel('coefficient')
ax.set_ylabel('shrunk coefficient');
N_iter = 2000
eta = .005
y_lambda_est = 4.
rho_est = rho
start_iter = 2
N_scan = 15
default = dict(N_iter=N_iter, rho_est=rho_est, y_lambda_est=y_lambda_est, eta=eta, start_iter=start_iter)
print('default =', default)
default = {'N_iter': 2000, 'rho_est': 0.001, 'y_lambda_est': 4.0, 'eta': 0.005, 'start_iter': 2}
iters = range(N_iter)
SE_iter = np.zeros((N_iter, 2, 2))
SE_thr = np.zeros((N_iter, 2, 2))
for i_type, stype in enumerate(['soft', 'hard']):
if True:
        # initialization - with the denoising solution:
events_pred = invert(x_noise, envelope)
# thresholding
#threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
#events_pred = shrinkage(events_pred, threshold=threshold, stype='hard')
else:
events_pred = np.zeros_like(x_noise)
AT_x_noise = model(envelope, x_noise)
envelope2 = envelope**2
# iterations
for i_iter in iters:
# predicting
grad = AT_x_noise - model(envelope**2, events_pred)
#grad = model(envelope, x_noise - model(envelope, events_pred))
#grad -= y_lambda_est * np.sign(events_pred)
eta_ = eta / np.sqrt(i_iter+start_iter)
events_pred += eta_ * grad
# shrinkage operator
events_pred = shrinkage(events_pred, threshold=eta_*y_lambda_est, stype=stype)
# measuring efficiency
SE_iter[i_iter, 0, i_type] = SE_x(x, envelope, events_pred)
SE_iter[i_iter, 1, i_type] = SE_y(events, events_pred)
# thresholding
threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_est)
events_pred_thr = shrinkage(events_pred, threshold=threshold, stype='hard')
SE_thr[i_iter, 0, i_type] = SE_x(x, envelope, events_pred_thr)
SE_thr[i_iter, 1, i_type] = SE_y(events, events_pred_thr)
SE_iter[-1, ...]
array([[0.283206, 1.528892],
       [0.232429, 1.291956]])
fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
ax[i].plot(iters,np.ones_like(SE_iter[:, i, 0]), 'k--')
for i_type, stype in enumerate(['soft', 'hard']):
ax[i].plot(iters, SE_iter[:, i, i_type], label=stype + '_full')
ax[i].plot(iters, SE_thr[:, i, i_type], '--', label=stype + '_thr')
ax[i].set_xlabel('iterations')
ax[i].set_ylabel(SE_l)
#ax[i].set_xscale('log')
ax[i].set_yscale('log')
ax[0].legend(loc='best');
Now testing ISTA to explore the effect of parameters:
def ISTA(x_noise, envelope, eta=eta, start_iter=start_iter, y_lambda_est=y_lambda_est, rho_est=rho_est, N_iter=N_iter, stype='soft'):
    # initialization
events_pred = invert(x_noise, envelope)
AT_x_noise = model(envelope, x_noise)
envelope2 = envelope**2
# iterations
iters = range(int(N_iter))
for i_iter in iters:
# predicting
grad = AT_x_noise - model(envelope2, events_pred)
eta_ = eta / np.log(i_iter+start_iter)
events_pred += eta_ * grad
# thresholding
events_pred = shrinkage(events_pred, threshold=eta_*y_lambda_est, stype=stype)
return events_pred
Testing different parameters:
SE = {}
SE_thr = {}
for variable in default.keys():
SE[variable] = np.zeros((N_scan, 2))
SE_thr[variable] = np.zeros((N_scan, 2))
values = np.logspace(-1, 1, N_scan, base=10, endpoint=True) * default[variable]
for i_value, value in enumerate(values):
opts = default.copy()
opts[variable] = value
print('variable', variable, '=', '%.3f' % opts[variable], ' - default =', '%.3f' % default[variable])
x_noise = add_noise(x, noise)
events_pred = ISTA(x_noise, envelope, **opts)
SE[variable][i_value, 0] = SE_x(x, envelope, events_pred)
SE[variable][i_value, 1] = SE_y(events, events_pred)
threshold = np.quantile(np.absolute(events_pred).ravel(), 1-opts['rho_est'])
events_pred_thr = events_pred * ((events_pred < -threshold) + (events_pred > threshold))
SE_thr[variable][i_value, 0] = SE_x(x, envelope, events_pred_thr)
SE_thr[variable][i_value, 1] = SE_y(events, events_pred_thr)
(scan log: each of N_iter, rho_est, y_lambda_est, eta and start_iter is swept over 15 logarithmically spaced values from 0.1× to 10× its default)
for variable in default.keys():
values = np.logspace(-1, 1, N_scan, base=10, endpoint=True) * default[variable]
fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
ax[i].plot(values, np.ones_like(SE_thr[variable][:, i]), 'k--')
ax[i].plot(values, SE[variable][:, i], label='full')
ax[i].plot(values, SE_thr[variable][:, i], '--', label='thr')
ax[i].set_xlabel(variable)
ax[i].set_ylabel(SE_l);
fig.suptitle(variable)
ax[i].set_xscale('log')
ax[i].set_yscale('log')
ax[0].legend(loc='best');
SE_ista = np.zeros((N_scan, 2))
for i_noise, noise_ in enumerate(noises):
x_noise = add_noise(x, noise_)
events_pred_thr = ISTA(x_noise, envelope)
SE_ista[i_noise, 0] = SE_x(x, envelope, events_pred_thr)
SE_ista[i_noise, 1] = SE_y(events, events_pred_thr)
SE_noise['ista']= SE_ista
fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
ax[i].plot(noises, np.ones(len(noises)), 'k--')
for method in ['full', 'thr', 'cg', 'ista']:
ax[i].plot(noises, SE_noise[method][:, i], label=method)
ax[i].set_xlabel('noise')
ax[i].set_ylabel(SE_l)
ax[i].set_xscale('log')
ax[i].set_yscale('log')
ax[0].legend(loc='best');
FISTA: Fast ISTA
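FISTA accelerates ISTA with a Nesterov-style momentum (Beck & Teboulle, 2009): the gradient-plus-shrinkage step is evaluated at an extrapolated point $z^{(k)}$ which combines the two last estimates,

$$ t_{k+1} = \frac{1 + \sqrt{1 + 4 t_k^2}}{2}, \qquad z^{(k+1)} = y^{(k)} + \frac{t_k - 1}{t_{k+1}} \left( y^{(k)} - y^{(k-1)} \right) $$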
def FISTA(x_noise, envelope, eta=eta, y_lambda_est=y_lambda_est, rho_est=rho_est, N_iter=N_iter, stype='soft'):
    # initialization (rho_est is kept in the signature for the scans below)
    events_pred = invert(x_noise, envelope)
    events_momentum = events_pred.copy()  # the extrapolated point z
    t = 1
    AT_x_noise = model(envelope, x_noise)
    envelope2 = envelope**2
    # iterations
    iters = range(int(N_iter))
    for i_iter in iters:
        events_old = events_pred
        # gradient step at the extrapolated point, followed by shrinkage
        grad = AT_x_noise - model(envelope2, events_momentum)
        events_pred = shrinkage(events_momentum + eta * grad, threshold=eta*y_lambda_est, stype=stype)
        # momentum update
        t_old = t
        t = .5 * (1 + np.sqrt(1 + 4*t_old**2))
        events_momentum = events_pred + (t_old - 1) / t * (events_pred - events_old)
    return events_pred
N_scan = 15
default = dict(N_iter=1000, rho_est=rho)
print('default =', default)
default = {'N_iter': 1000, 'rho_est': 0.001}
SE = {}
SE_thr = {}
for variable in default.keys():
SE[variable] = np.zeros((N_scan, 2))
SE_thr[variable] = np.zeros((N_scan, 2))
values = np.logspace(-1, 1, N_scan, base=10, endpoint=True) * default[variable]
for i_value, value in enumerate(values):
opts = default.copy()
opts[variable] = value
print('variable', variable, '=', '%.3f' % opts[variable], ' - default =', '%.3f' % default[variable])
x_noise = add_noise(x, noise)
events_pred = FISTA(x_noise, envelope, **opts)
SE[variable][i_value, 0] = SE_x(x, envelope, events_pred)
SE[variable][i_value, 1] = SE_y(events, events_pred)
threshold = np.quantile(np.absolute(events_pred).ravel(), 1-opts['rho_est'])
events_pred_thr = events_pred * ((events_pred < -threshold) + (events_pred > threshold))
SE_thr[variable][i_value, 0] = SE_x(x, envelope, events_pred_thr)
SE_thr[variable][i_value, 1] = SE_y(events, events_pred_thr)
(scan log: N_iter and rho_est are each swept over 15 logarithmically spaced values from 0.1× to 10× its default)
for variable in default.keys():
values = np.logspace(-1, 1, N_scan, base=10, endpoint=True) * default[variable]
fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
ax[i].plot(values, np.ones_like(SE_thr[variable][:, i]), 'k--')
ax[i].plot(values, SE[variable][:, i], label='full')
ax[i].plot(values, SE_thr[variable][:, i], '--', label='thr')
ax[i].set_xlabel(variable)
ax[i].set_ylabel(SE_l);
fig.suptitle(variable)
ax[i].set_xscale('log')
ax[i].set_yscale('log')
ax[0].legend(loc='best');
SE_fista = np.zeros((N_scan, 2))
for i_noise, noise_ in enumerate(noises):
x_noise = add_noise(x, noise_)
events_pred_thr = FISTA(x_noise, envelope)
SE_fista[i_noise, 0] = SE_x(x, envelope, events_pred_thr)
SE_fista[i_noise, 1] = SE_y(events, events_pred_thr)
SE_noise['fista']= SE_fista
fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
    ax[i].loglog(noises, np.ones(len(noises)), 'k--')
for method in ['full', 'thr', 'cg', 'ista', 'fista']:
ax[i].plot(noises, SE_noise[method][:, i], label=method)
ax[i].set_xlabel('noise')
ax[i].set_ylabel(SE_l)
ax[0].legend(loc='best');
Checking the efficiency for different sparseness levels:
#N_X, N_Y = 2**10, 2**10
#N_X, N_Y = 2**7, 2**7
#noise = .04
envelope = MC_env(N_X, N_Y)
SE_thr = np.zeros((N_scan_rho, 2))
SE_ista = np.zeros((N_scan_rho, 2))
SE_fista = np.zeros((N_scan_rho, 2))
rhos = np.logspace(-4, -2, N_scan_rho, base=10, endpoint=True)
for i_rho, rho_ in enumerate(rhos):
events, x = generative_model(envelope, rho=rho_)
x_noise = add_noise(x, noise)
events_pred = invert(x_noise, envelope)
threshold = np.quantile(np.absolute(events_pred).ravel(), 1-rho_)
events_pred_thr = events_pred * ((events_pred < -threshold) + (events_pred > threshold))
    SE_thr[i_rho, 0] = SE_x(x, envelope, events_pred_thr)
    SE_thr[i_rho, 1] = SE_y(events, events_pred_thr)
events_pred_thr = ISTA(x_noise, envelope, rho_est=rho_)
SE_ista[i_rho, 0] = SE_x(x, envelope, events_pred_thr)
SE_ista[i_rho, 1] = SE_y(events, events_pred_thr)
events_pred_thr = FISTA(x_noise, envelope, rho_est=rho_)
SE_fista[i_rho, 0] = SE_x(x, envelope, events_pred_thr)
SE_fista[i_rho, 1] = SE_y(events, events_pred_thr)
fig, ax = plt.subplots(1, 2, figsize=figsize)
for i, SE_l in enumerate(['SE_x', 'SE_y']):
ax[i].loglog(rhos, np.ones_like(SE_thr[:, i]), 'k--')
    ax[i].loglog(rhos, SE_thr[:, i], label='thr')
    ax[i].loglog(rhos, SE_ista[:, i], '--', label='ISTA')
    ax[i].loglog(rhos, SE_fista[:, i], '--', label='FISTA')
ax[i].set_xlabel('rho_est')
ax[i].set_ylabel(SE_l);
ax[0].legend(loc='best');
- there is obviously something wrong with ISTA / FISTA
- TODO: learn the envelope http://yann.lecun.com/exdb/publis/pdf/gregor-icml-10.pdf
- TODO: Noisy case: correlated noise
Some bookkeeping for the notebook
%reload_ext watermark
%watermark -i -h -m -v -p numpy,MotionClouds,matplotlib,scipy,imageio -r -g -b
Python implementation: CPython
Python version : 3.9.10
IPython version : 7.27.0
numpy : 1.22.3
MotionClouds: 20200212
matplotlib : 3.5.1
scipy : 1.6.0
imageio : 2.16.1
Compiler : Clang 13.0.0 (clang-1300.0.29.3)
OS : Darwin
Release : 21.3.0
Machine : x86_64
Processor : i386
CPU cores : 36
Architecture: 64bit
Hostname: fortytwo
Git hash: 6afc708ac96955ef9573fafc11530bf1488d5ecb
Git repo: https://github.com/laurentperrinet/sciblog.git
Git branch: master