testing COMPs-Pcum
In this notebook, we will study how homeostasis (cooperation) may be an essential ingredient for this algorithm, which selects filters on a winner-take-all basis (competition). This extension has been published as Perrinet, Neural Computation (2010) (see https://laurentperrinet.github.io/publication/perrinet-10-shl ). In particular, we will show how one can build non-linear functions, based on the activity of each filter, that implement homeostasis.
See also the other posts on unsupervised learning.
This is joint work with Victor Boutin.
Conclusion: using Pcum functions seems to work, but one needs to learn the non-linear functions in a smooth manner. This will be done in a future post.
%load_ext autoreload
%autoreload 2
import matplotlib.pyplot as plt
import matplotlib
%matplotlib inline
#%config InlineBackend.figure_format='svg'
import numpy as np
np.set_printoptions(formatter = dict( float = lambda x: "%.3g" % x ), precision=3, suppress=True, threshold=np.inf)
import time
starting with a short learning phase without homeostasis¶
DEBUG = True
DEBUG = False
if not DEBUG:
matname = '2017-05-31_Testing_COMPs'
DEBUG_DOWNSCALE = 1
else:
matname = '2017-05-31_Testing_COMPs-DEBUG'
DEBUG_DOWNSCALE = 10
seed = 42
nb_quant = 512
C = 5.
do_sym = False
from shl_scripts.shl_experiments import SHL
shl = SHL(DEBUG_DOWNSCALE=DEBUG_DOWNSCALE, seed=seed,
eta=0.05, verbose=2, record_each=50, n_iter=1000, eta_homeo=0., alpha_homeo=1.,
do_sym=do_sym, nb_quant=nb_quant, C=C)
data = shl.get_data(matname=matname)
test_size = data.shape[0]//2
data_training = data[:test_size, :]
data_test = data[test_size:, :]
#DEBUG: use a smaller test set to speed things up
# (note that this test set is taken from the start of the array and thus overlaps the training set)
test_size = data.shape[0]//20
data_training = data[:(data.shape[0]-test_size), :].copy()
data_test = data[:test_size, :].copy()
We start off with a short learning phase with no homeostasis, such that we end up with an unbalanced dictionary:
dico_partial_learning = shl.learn_dico(data=data_training, matname=matname)
dico_partial_learning.P_cum
fig, ax = shl.show_dico(dico_partial_learning, data=data, title=matname)
fig.show()
fig, ax = shl.time_plot(dico_partial_learning, variable='prob_active');
fig.show()
classical Matching Pursuit¶
#n_samples, n_pixels = data_test.shape
#n_dictionary, n_pixels = dico_partial_learning.dictionary.shape
# normalize each filter of the dictionary to unit L2 norm
norm_each_filter = np.sqrt(np.sum(dico_partial_learning.dictionary**2, axis=1))
print('L2 norm of each filter=', norm_each_filter)
dico_partial_learning.dictionary /= norm_each_filter[:, np.newaxis]
sparse_code_mp = shl.code(data_test, dico_partial_learning, matname=matname)
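As a quick look at the resulting code (a small check added here, not in the original notebook), the number of non-zero coefficients per sample should be of the order of the l0_sparseness parameter of the coder:
# inspect the sparseness of the MP code (uses the l0_sparseness parameter of the shl object)
print('shape of the sparse code =', sparse_code_mp.shape)
print('average number of non-zeros per sample =', np.count_nonzero(sparse_code_mp, axis=1).mean())
print('l0_sparseness =', shl.l0_sparseness)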
def plot_proba_histogram(coding):
    n_dictionary = coding.shape[1]
    # probability of each filter being selected, normalized over the dictionary
    p = np.count_nonzero(coding, axis=0) / coding.shape[0]
    p /= p.sum()
    # relative entropy of the selection probabilities (1 = uniform distribution);
    # never-selected filters are ignored to avoid log(0)
    rel_ent = np.sum(-p[p>0] * np.log(p[p>0])) / np.log(n_dictionary)
    #print('Entropy / Entropy_max=', rel_ent )
    fig = plt.figure(figsize=(16, 4))
    ax = fig.add_subplot(111)
    ax.bar(np.arange(n_dictionary), p*n_dictionary)
    ax.set_title('distribution of the selection probability - entropy= ' + str(rel_ent))
    ax.set_ylabel('pdf')
    ax.axis('tight')
    ax.set_xlim(0, n_dictionary)
    return fig, ax
fig, ax = plot_proba_histogram(sparse_code_mp)
p = np.count_nonzero(sparse_code_mp, axis=0)
print('relative proba of being selected', p/p.mean())
print('most selected filter : {}'.format(np.argmax(p)))
print('least selected filter : {}'.format(np.argmin(p)))
COMP : learning modulations¶
defining a rescaling function for coefficients to avoid overflows:
def rescaling(code, C=5., do_sym=True):
if do_sym:
return 1.-np.exp(-np.abs(code)/C)
else:
return (1.-np.exp(-code/C))*(code>0)
fig=plt.figure(figsize=(13, 8))
ax = plt.subplot(111)
x=np.linspace(0, 30, 100)
ax.plot(x, np.ones_like(x), '--')
ax.plot(x, rescaling(x, C=C, do_sym=do_sym))
ax.set_ylabel('rescaled coefficient')
ax.set_xlabel('raw coefficient')
ax.set_xscale('log');
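To make the saturation explicit, here are a few values (a quick numerical check added here): a coefficient equal to C maps to 1 - 1/e ≈ 0.63, and larger coefficients saturate towards 1.
# a few sample values of the rescaling non-linearity (uses the C and do_sym set above)
a = np.array([0., C, 2*C, 10*C])
print('raw coefficients      =', a)
print('rescaled coefficients =', rescaling(a, C=C, do_sym=do_sym))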
def get_P_cum(code, nb_quant=100):
    # for each filter, estimate the cumulative distribution of its rescaled coefficients
    n_samples, nb_filter = code.shape
    code_bins = np.linspace(0., 1, nb_quant, endpoint=True)
    P_cum = np.zeros((nb_filter, nb_quant))
    for i in range(nb_filter):
        p, bins = np.histogram(rescaling(code[:, i], C=C, do_sym=do_sym), bins=code_bins, density=True)
        p /= p.sum()
        P_cum[i, :] = np.hstack((0, np.cumsum(p)))
    return P_cum
P_cum_mp = get_P_cum(sparse_code_mp, nb_quant=nb_quant)
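As a quick sanity check (added here, not in the original notebook), each row of P_cum_mp is a cumulative histogram of rescaled coefficients, so it should start at 0, end at 1 and be non-decreasing:
# sanity checks on the cumulative modulation functions computed above
print('rows start at 0        :', np.allclose(P_cum_mp[:, 0], 0.))
print('rows end at 1          :', np.allclose(P_cum_mp[:, -1], 1.))
print('rows are non-decreasing:', np.all(np.diff(P_cum_mp, axis=1) >= 0))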
from shl_scripts.shl_tools import plot_P_cum
fig, ax = plot_P_cum(P_cum_mp, verbose=False, alpha=.15);
ax.set_ylim(0.85, 1.01)
#ax.set_xmargin(0.);
COMP : using modulations¶
Computing a quantile using np.interp:
code_bins = np.linspace(0., 1, nb_quant+1, endpoint=True)
def quantile(code_bins, Pcum, c):
    # map each rescaled coefficient c[i] to its quantile through the cumulative function of filter i
    q_i = np.zeros_like(c)
    for i in range(Pcum.shape[0]):
        q_i[i] = np.interp(c[i], code_bins[:-1], Pcum[i, :])
    return q_i
corr = (data_test @ dico_partial_learning.dictionary.T)[0, :]
P_cum = P_cum_mp.copy()
print('correlation=', corr)
print('rescaled correlation=', rescaling(corr, C=C, do_sym=do_sym))
print('quantile=', quantile(code_bins, P_cum, rescaling(corr, C=C, do_sym=do_sym)))
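As a small consistency check (added here, anticipating the equivalence test below), a flat cumulative function leaves the rescaled coefficients approximately unchanged, so the winner-take-all choice is preserved:
# with a flat (uniform) modulation function, the quantile transform is approximately the identity
# on the rescaled coefficients, so the argmax (the selected filter) should not change
n_dictionary = dico_partial_learning.dictionary.shape[0]
P_cum_flat = np.linspace(0, 1, nb_quant, endpoint=True)[np.newaxis, :] * np.ones((n_dictionary, 1))
z = rescaling(corr, C=C, do_sym=do_sym)
q_flat = quantile(code_bins, P_cum_flat, z)
print('max |quantile - rescaled| =', np.max(np.abs(q_flat - z)))
print('argmax on quantiles =', np.argmax(q_flat), '- argmax on rescaled =', np.argmax(z))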
We can thus redefine Matching Pursuit to include this non-linear function:
l0_sparseness = shl.l0_sparseness
def comp(data, dico, code_bins, P_cum, l0_sparseness=l0_sparseness, verbose=0):
    if verbose!=0:
        t0 = time.time()
    n_samples, n_dictionary = data.shape[0], dico.shape[0]
    sparse_code = np.zeros((n_samples, n_dictionary))
    corr = (data @ dico.T)
    Xcorr = (dico @ dico.T)
    for i_sample in range(n_samples):
        c = corr[i_sample, :].copy()
        if verbose!=0: ind_list=list()
        for i_l0 in range(int(l0_sparseness)):
            # COMP: select the winner on the quantiles of the rescaled correlations
            # (when P_cum is None, fall back to the rescaled correlations themselves)
            if P_cum is None:
                q_i = rescaling(c, C=C, do_sym=do_sym)
            else:
                q_i = quantile(code_bins, P_cum, rescaling(c, C=C, do_sym=do_sym))
            ind = np.argmax(q_i)
            if verbose!=0:
                ind_list.append(ind)
            # update the coefficient and the residual correlations as in classical Matching Pursuit
            c_ind = c[ind] / Xcorr[ind, ind]
            sparse_code[i_sample, ind] += c_ind
            c -= c_ind * Xcorr[ind, :]
        if verbose!=0:
            if i_sample in range(2):
                q_i = quantile(code_bins, P_cum, rescaling(c, C=C, do_sym=do_sym))
                print(ind_list, [q_i[i] for i in ind_list], np.median(q_i), q_i.max(), [c[i] for i in ind_list], c.min(), c.max())
    if verbose!=0:
        duration = time.time()-t0
        print('coding duration : {0}s'.format(duration))
    return sparse_code
sparse_code = comp(data_test, dico_partial_learning.dictionary, code_bins, P_cum, verbose=1)
sparse_code.shape
np.mean(np.count_nonzero(sparse_code, axis=0)/sparse_code.shape[1])
np.count_nonzero(sparse_code, axis=0)
def plot_scatter_MpVsComp(sparse_vector, my_sparse_code):
fig = plt.figure(figsize=(16, 16))
ax = fig.add_subplot(111)
a_min = np.min((sparse_vector.min(), my_sparse_code.min()))
a_max = np.max((sparse_vector.max(), my_sparse_code.max()))
ax.plot(np.array([a_min, a_max]), np.array([a_min, a_max]), 'k--', lw=2)
ax.scatter(sparse_vector.ravel(), my_sparse_code.ravel(), alpha=0.01)
ax.set_xlabel('MP')
ax.set_ylabel('COMP')
#ax.set_xlim(0)
#ax.set_ylim(0)
ax.axis('equal')
return fig, ax
#fig, ax = plot_scatter_MpVsComp(sparse_code_mp, sparse_code)
#fig.show()
testing that COMP with flat Pcum is equivalent to MP¶
n_samples, nb_filter = sparse_code.shape
P_cum = np.linspace(0, 1, nb_quant, endpoint=True)[np.newaxis, :] * np.ones((nb_filter, 1))
sparse_code_comp = comp(data_test, dico_partial_learning.dictionary, code_bins, P_cum, verbose=1)
print('Relative difference = ', np.sum((sparse_code_mp - sparse_code_comp)**2)/np.sum((sparse_code_mp)**2))
fig, ax = plot_proba_histogram(sparse_code_mp)
fig.show()
fig, ax = plot_proba_histogram(sparse_code_comp)
fig, ax = plot_scatter_MpVsComp(sparse_code_mp, sparse_code_comp)
gradient descent¶
P_cum_new = np.linspace(0, 1, nb_quant, endpoint=True)[np.newaxis, :] * np.ones((nb_filter, 1))
print('Shape of modulation function', P_cum_new.shape)
eta_homeo = .005
for i in range(1000//DEBUG_DOWNSCALE):
    sparse_code_comp = comp(data_test, dico_partial_learning.dictionary, code_bins, P_cum_new, verbose=0)
    P_cum_ = get_P_cum(sparse_code_comp, nb_quant=nb_quant)
    # smooth (moving-average) update of the modulation functions at rate eta_homeo
    P_cum_new = (1-eta_homeo) * P_cum_new + eta_homeo * P_cum_
    if i % (100//DEBUG_DOWNSCALE) == 0:
        print('Learning step', i)
        fig, ax = plot_proba_histogram(sparse_code_comp)
        plt.show()
One observes that the distribution gets progressively more uniform, which was our goal. However, this implementation makes the Matching Pursuit algorithm very slow, so we will need to speed things up. This will be done in the next post.
Note the shape of the initial non-linear functions:
from shl_scripts.shl_tools import plot_P_cum
fig, ax = plot_P_cum(P_cum_mp, verbose=False);
ax.set_ylim(0.85, 1.01);
Compare with their shape at convergence:
from shl_scripts.shl_tools import plot_P_cum
fig, ax = plot_P_cum(P_cum_new, verbose=False);
ax.set_ylim(0.85, 1.01);
Indeed, as the unsupervised learning is forced to be homeostatic, the learned filters ultimately converge to features of equal "salience", such that the non-linear functions become approximately equal at convergence: in an egalitarian network at convergence, there is no need for these rules anymore (unless you want to learn again).
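One way to quantify this on the present run (a rough check using the P_cum_new learned above) is to measure the spread of the modulation functions across filters:
# spread of the learned modulation functions across filters:
# the smaller the standard deviation, the more similar the non-linearities
print('mean std across filters =', P_cum_new.std(axis=0).mean())
print('max std across filters  =', P_cum_new.std(axis=0).max())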
Version used¶
%load_ext version_information
%version_information numpy, shl_scripts