Experimenting with transfer learning for visual categorization

Laurent Perrinet

2021-12-03 16:23

Hi! I am Jean-Nicolas Jérémie and the goal of this notebook is to provide a framework to implement (and experiment with) transfer learning on deep convolutional neuronal network (DCNN). In a nutshell, transfer learning allows to re-use the knowlegde learned on a problem, such as categorizing images from a large dataset, and apply it to a different (yet related) problem, performing the categorization on a smaller dataset. It is a powerful method as it allows to implement complex task de novo quite rapidly (in a few hours) without having to retrain the millions of parameters of a DCNN (which takes days of computations). The basic hypothesis is that it suffices to re-train the last classification layers (the head) while keeping the first layers fixed. Here, these networks teach us also some interesting insights into how living systems may perform such categorization tasks.

Based on our previous work, we will start from a VGG16 network loaded from the torchvision.models library and pre-trained on the Imagenet dataset wich allows to perform label detection on naturals images for $K = 1000$ labels. Our goal here will be to re-train the last fully-Connected layer of the network to perfom the same task but in a sub-set of $K = 10$ labels from the Imagenet dataset.

Moreover, we are going to evaluate different strategies of transfer learning:

VGG General : Substitute the last layer of the pyTorch VGG16 network ($K = 1000$ labels) with a new layer build from a specific subset ($K = 10$ labels).
VGG Linear : Add a new layer build from a specific subset ($K = 10$ labels) after the last Fully-Connected layer of the the pyTorch VGG16 network.
VGG Gray : Same architecture as the VGG General network but trained with grayscale images.
VGG Scale : Same architecture as the VGG General network but trained with images of different size.
VGG Full : Same architecture as the VGG General network but all the layers are trained (otherwise I trained the last Fully-Connected layer).

In this notebook, I will use the pyTorch library for running the networks and the pandas library to collect and display the results. This notebook was done during a master 2 internship at the Neurosciences Institute of Timone (INT) under the supervision of Laurent Perrinet. It is curated in the following github repo.

Implementing transfer learning on Vgg16 using pyTorch¶

In our previous work, as the VGG16 network was first trained on the entire dataset of $K=1000$ labels, and in order to recover the categorization confidence predicted by the model according to the specific subset of classes ($K = 10$ labels) on which it is tested, the output softmax mathematical function of the last layer of the network was slightly changed. By assuming that we know a priori that the image belongs to one (and only one) category from the sub-set the probabilities obtained would correspond to a confidence of categorization discriminating only the classes of interest and can be compared to a chance level of $1 /K$. This creates another network (which is not retrained) directly based on VGG:

VGG Subset : Just consider the specific subset ($K = 10$ labels) from the last layer of the pyTorch VGG16 network ($K = 1000$ labels).

This notebook aims in addition to test this hypothesis. Our use case consists of measuring whether there are differences in the likelihood of these networks during an image recognition task on a sub-set of $1000$ classes of the ImageNet library, with $K = 10$ (experiment 1). Additionally, we will implement some image transformations as up/down-sampling (experiment 2) or transforming to grayscale (experiment 3) to quantify their influence on the accuracy and computation time of each network.

Some useful links :

Let's first install requirements

In [1]:

%pip install --upgrade -r requirements.txt

/usr/lib/python3/dist-packages/secretstorage/dhcrypto.py:15: CryptographyDeprecationWarning: int_from_bytes is deprecated, use int.from_bytes instead
  from cryptography.utils import int_from_bytes
/usr/lib/python3/dist-packages/secretstorage/util.py:19: CryptographyDeprecationWarning: int_from_bytes is deprecated, use int.from_bytes instead
  from cryptography.utils import int_from_bytes
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: pip in /home/INT/perrinet.l/.local/lib/python3.8/site-packages (from -r requirements.txt (line 4)) (21.3.1)
Requirement already satisfied: matplotlib in /home/INT/perrinet.l/.local/lib/python3.8/site-packages (from -r requirements.txt (line 5)) (3.5.0)
Requirement already satisfied: numpy in /home/INT/perrinet.l/.local/lib/python3.8/site-packages (from -r requirements.txt (line 6)) (1.21.4)
Requirement already satisfied: imageio in /home/INT/perrinet.l/.local/lib/python3.8/site-packages (from -r requirements.txt (line 7)) (2.13.1)
Requirement already satisfied: torch in /home/INT/perrinet.l/.local/lib/python3.8/site-packages (from -r requirements.txt (line 8)) (1.10.0)
Requirement already satisfied: torchvision in /home/INT/perrinet.l/.local/lib/python3.8/site-packages (from -r requirements.txt (line 9)) (0.11.1)
Requirement already satisfied: pandas in /home/INT/perrinet.l/.local/lib/python3.8/site-packages (from -r requirements.txt (line 10)) (1.3.4)
Requirement already satisfied: requests in /home/INT/perrinet.l/.local/lib/python3.8/site-packages (from -r requirements.txt (line 11)) (2.26.0)
Requirement already satisfied: sklearn in /home/INT/perrinet.l/.local/lib/python3.8/site-packages (from -r requirements.txt (line 12)) (0.0)
Requirement already satisfied: scipy in /home/INT/perrinet.l/.local/lib/python3.8/site-packages (from -r requirements.txt (line 13)) (1.7.3)
Requirement already satisfied: seaborn in /home/INT/perrinet.l/.local/lib/python3.8/site-packages (from -r requirements.txt (line 14)) (0.11.2)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.8/dist-packages (from matplotlib->-r requirements.txt (line 5)) (1.3.1)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.8/dist-packages (from matplotlib->-r requirements.txt (line 5)) (0.10.0)
Requirement already satisfied: setuptools-scm>=4 in /home/INT/perrinet.l/.local/lib/python3.8/site-packages (from matplotlib->-r requirements.txt (line 5)) (6.3.2)
Requirement already satisfied: packaging>=20.0 in /usr/lib/python3/dist-packages (from matplotlib->-r requirements.txt (line 5)) (20.3)
Requirement already satisfied: python-dateutil>=2.7 in /usr/lib/python3/dist-packages (from matplotlib->-r requirements.txt (line 5)) (2.7.3)
Requirement already satisfied: pillow>=6.2.0 in /home/INT/perrinet.l/.local/lib/python3.8/site-packages (from matplotlib->-r requirements.txt (line 5)) (8.4.0)
Requirement already satisfied: pyparsing>=2.2.1 in /usr/lib/python3/dist-packages (from matplotlib->-r requirements.txt (line 5)) (2.4.6)
Requirement already satisfied: fonttools>=4.22.0 in /home/INT/perrinet.l/.local/lib/python3.8/site-packages (from matplotlib->-r requirements.txt (line 5)) (4.28.2)
Requirement already satisfied: typing-extensions in /home/INT/perrinet.l/.local/lib/python3.8/site-packages (from torch->-r requirements.txt (line 8)) (3.10.0.0)
Requirement already satisfied: pytz>=2017.3 in /usr/lib/python3/dist-packages (from pandas->-r requirements.txt (line 10)) (2019.3)
Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3/dist-packages (from requests->-r requirements.txt (line 11)) (2.8)
Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from requests->-r requirements.txt (line 11)) (2019.11.28)
Requirement already satisfied: charset-normalizer~=2.0.0 in /home/INT/perrinet.l/.local/lib/python3.8/site-packages (from requests->-r requirements.txt (line 11)) (2.0.7)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/lib/python3/dist-packages (from requests->-r requirements.txt (line 11)) (1.25.8)
Requirement already satisfied: scikit-learn in /home/INT/perrinet.l/.local/lib/python3.8/site-packages (from sklearn->-r requirements.txt (line 12)) (0.24.2)
Requirement already satisfied: six in /usr/lib/python3/dist-packages (from cycler>=0.10->matplotlib->-r requirements.txt (line 5)) (1.14.0)
Requirement already satisfied: tomli>=1.0.0 in /home/INT/perrinet.l/.local/lib/python3.8/site-packages (from setuptools-scm>=4->matplotlib->-r requirements.txt (line 5)) (1.2.2)
Requirement already satisfied: setuptools in /usr/lib/python3/dist-packages (from setuptools-scm>=4->matplotlib->-r requirements.txt (line 5)) (45.2.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in /home/INT/perrinet.l/.local/lib/python3.8/site-packages (from scikit-learn->sklearn->-r requirements.txt (line 12)) (2.1.0)
Requirement already satisfied: joblib>=0.11 in /home/INT/perrinet.l/.local/lib/python3.8/site-packages (from scikit-learn->sklearn->-r requirements.txt (line 12)) (1.0.1)
Note: you may need to restart the kernel to use updated packages.

In [2]:

%matplotlib inline
# uncommment to re-run training
#%rm -fr models
%mkdir -p DCNN_transfer_learning
%mkdir -p results
%mkdir -p models

Initialization of the libraries/variables¶

Our coding strategy is to build up a small library as a package of scripts in the DCNN_transfer_learning folder and to run all calls to that library from this notebook. This follows our previous work in which we benchmarked various DCNNs and which allowed us to select VGG16 network as a good compromise between performance and complexity.

First of all, a init.py script defines all our usefull variables like the new labels to learn, the number of training images or the root folder to use. Also, we import libraries to train the different networks and display the results.

In [1]:

scriptname = 'DCNN_transfer_learning/init.py'

In [2]:

%%writefile {scriptname}

# Importing libraries
import torch
import argparse
import json
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams['xtick.labelsize'] = 18
plt.rcParams['ytick.labelsize'] = 18
import numpy as np
import os
import requests
import time
from math import log 

from time import strftime, gmtime
datetag = strftime("%Y-%m-%d", gmtime())

HOST, device = os.uname()[1], torch.device("cuda" if torch.cuda.is_available() else "cpu")

# to store results
import pandas as pd

def arg_parse():
    DEBUG = 25
    DEBUG = 1
    parser = argparse.ArgumentParser(description='DCNN_transfer_learning/init.py set root')
    parser.add_argument("--root", dest = 'root', help = "Directory containing images to perform the training",
                        default = 'data', type = str)
    parser.add_argument("--folders", dest = 'folders', help =  "Set the training, validation and testing folders relative to the root",
                        default = ['test', 'val', 'train'], type = list)
    parser.add_argument("--N_images", dest = 'N_images', help ="Set the number of images per classe in the train folder",
                        default = [400//DEBUG, 200//DEBUG, 800//DEBUG], type = list)
    parser.add_argument("--HOST", dest = 'HOST', help = "Set the name of your machine",
                    default=HOST, type = str)
    parser.add_argument("--datetag", dest = 'datetag', help = "Set the datetag of the result's file",
                    default = datetag, type = str)
    parser.add_argument("--image_size", dest = 'image_size', help = "Set the default image_size of the input",
                    default = 256)
    parser.add_argument("--image_sizes", dest = 'image_sizes', help = "Set the image_sizes of the input for experiment 2 (downscaling)",
                    default = [64, 128, 256, 512], type = list)
    parser.add_argument("--num_epochs", dest = 'num_epochs', help = "Set the number of epoch to perform during the traitransportationning phase",
                    default = 25//DEBUG)
    parser.add_argument("--batch_size", dest = 'batch_size', help="Set the batch size", default = 16)
    parser.add_argument("--lr", dest = 'lr', help="Set the learning rate", default = 0.0001)
    parser.add_argument("--momentum", dest = 'momentum', help="Set the momentum", default = 0.9)
    parser.add_argument("--beta2", dest = 'beta2', help="Set the second momentum - use zero for SGD", default = 0.)
    parser.add_argument("--subset_i_labels", dest = 'subset_i_labels', help="Set the labels of the classes (list of int)",
                    default = [945, 513, 886, 508, 786, 310, 373, 145, 146, 396], type = list)
    parser.add_argument("--class_loader", dest = 'class_loader', help = "Set the Directory containing imagenet downloaders class",
                        default = 'imagenet_label_to_wordnet_synset.json', type = str)
    parser.add_argument("--url_loader", dest = 'url_loader', help = "Set the file containing imagenet urls",
                        default = 'Imagenet_urls_ILSVRC_2016.json', type = str)
    parser.add_argument("--model_path", dest = 'model_path', help = "Set the path to the pre-trained model",
                        default = 'models/re-trained_', type = str)
    parser.add_argument("--model_names", dest = 'model_names', help = "Modes for the new trained networks",
                        default = ['vgg16_lin', 'vgg16_gen', 'vgg16_scale', 'vgg16_gray', 'vgg16_full'], type = list)
    return parser.parse_args()

args = arg_parse()
datetag = args.datetag
json_fname = os.path.join('results', datetag + '_config_args.json')
load_parse = False # False to custom the config

if load_parse:
    with open(json_fname, 'rt') as f:
        print(f'file {json_fname} exists: LOADING')
        override = json.load(f)
        args.__dict__.update(override)
else:
    print(f'Creating file {json_fname}')
    with open(json_fname, 'wt') as f:
        json.dump(vars(args), f, indent=4)
    
# matplotlib parameters
colors = ['b', 'r', 'k', 'g', 'm','y']
fig_width = 20
phi = (np.sqrt(5)+1)/2 # golden ratio for the figures :-)

#to plot & display 
def pprint(message): #display function
    print('-'*len(message))
    print(message)
    print('-'*len(message))
    
#DCCN training
print('On date', args.datetag, ', Running benchmark on host', args.HOST, ' with device', device.type)

# Labels Configuration
N_labels = len(args.subset_i_labels)

paths = {}
N_images_per_class = {}
for folder, N_image in zip(args.folders, args.N_images):
    paths[folder] = os.path.join(args.root, folder) # data path
    N_images_per_class[folder] = N_image
    os.makedirs(paths[folder], exist_ok=True)
    
with open(args.class_loader, 'r') as fp: # get all the classes on the data_downloader
    imagenet = json.load(fp)

# gathering labels
labels = []
class_wnids = []
reverse_id_labels = {}
for a, img_id in enumerate(imagenet):
    reverse_id_labels[str('n' + (imagenet[img_id]['id'].replace('-n','')))] = imagenet[img_id]['label'].split(',')[0]
    labels.append(imagenet[img_id]['label'].split(',')[0])
    if int(img_id) in args.subset_i_labels:
        class_wnids.append('n' + (imagenet[img_id]['id'].replace('-n','')))    
        
# a reverse look-up-table giving the index of a given label (within the whole set of imagenet labels)
reverse_labels = {}
for i_label, label in enumerate(labels):
    reverse_labels[label] = i_label
# a reverse look-up-table giving the index of a given i_label (within the sub-set of classes)
reverse_subset_i_labels = {}
for i_label, label in enumerate(args.subset_i_labels):
    reverse_subset_i_labels[label] = i_label
    
# a reverse look-up-table giving the label of a given index in the last layer of the new model (within the sub-set of classes)
subset_labels = []
pprint('List of Pre-selected classes : ')
# choosing the selected classes for recognition
for i_label, id_ in zip(args.subset_i_labels, class_wnids) : 
    subset_labels.append(labels[i_label])
    print('-> label', i_label, '=', labels[i_label], '\nid wordnet : ', id_)
subset_labels.sort()

Overwriting DCNN_transfer_learning/init.py

In [3]:

%run -int {scriptname}

Creating file results/2021-12-08_config_args.json
On date 2021-12-08 , Running benchmark on host neo-ope-de04  with device cuda
-------------------------------
List of Pre-selected classes : 
-------------------------------
-> label 945 = bell pepper 
id wordnet :  n02056570
-> label 513 = cornet 
id wordnet :  n02058221
-> label 886 = vending machine 
id wordnet :  n02219486
-> label 508 = computer keyboard 
id wordnet :  n02487347
-> label 786 = sewing machine 
id wordnet :  n02643566
-> label 310 = ant 
id wordnet :  n03085013
-> label 373 = macaque 
id wordnet :  n03110669
-> label 145 = king penguin 
id wordnet :  n04179913
-> label 146 = albatross 
id wordnet :  n04525305
-> label 396 = lionfish 
id wordnet :  n07720875

IPython CPU timings (estimated):
  User   :       1.67 s.
  System :       3.47 s.
Wall time:       1.26 s.

Download the `train` & `val` dataset¶

In the dataset.py, we use an archive of the Imagenet urls (from fall 2011) to populate datasets based on the pre-selected classes listed in the DCNN_transfer_learning/init.py file. The following script is inspired by previous work in our group.

In [4]:

scriptname = 'DCNN_transfer_learning/dataset.py'

In [5]:

%%writefile {scriptname}

from DCNN_transfer_learning.init import *  
verbose = False

with open(args.url_loader) as json_file:
    Imagenet_urls_ILSVRC_2016 = json.load(json_file)

def clean_list(list_dir, patterns=['.DS_Store']):
    for pattern in patterns:
        if pattern in list_dir: list_dir.remove('.DS_Store')
    return list_dir

import imageio
def get_image(img_url, timeout=3., min_content=3, verbose=verbose):
    try:
        img_resp = imageio.imread(img_url)
        if (len(img_resp.shape) < min_content):
            print(f"Url {img_url} does not have enough content")
            return False
        else:
            if verbose : print(f"Success with url {img_url}")
            return img_resp
    except Exception as e:
        if verbose : print(f"Failed with {e} for url {img_url}")
        return False # did not work

import hashlib # jah.
# root folder
os.makedirs(args.root, exist_ok=True)
# train, val and test folders
for folder in args.folders : 
    os.makedirs(paths[folder], exist_ok=True)
    
list_urls = {}
list_img_name_used = {}
for class_wnid in class_wnids:
    list_urls[class_wnid] =  Imagenet_urls_ILSVRC_2016[str(class_wnid)]
    np.random.shuffle(list_urls[class_wnid])
    list_img_name_used[class_wnid] = []

    # a folder per class in each train, val and test folder
    for folder in args.folders : 
        class_name = reverse_id_labels[class_wnid]
        class_folder = os.path.join(paths[folder], class_name)
        os.makedirs(class_folder, exist_ok=True)
        list_img_name_used[class_wnid] += clean_list(os.listdir(class_folder)) # join two lists
    
# train, val and test folders
for folder in args.folders : 
    print(f'Folder \"{folder}\"')

    filename = f'results/{datetag}_dataset_{folder}_{args.HOST}.json'
    columns = ['img_url', 'img_name', 'is_flickr', 'dt', 'worked', 'class_wnid', 'class_name']
    if os.path.isfile(filename):
        df_dataset = pd.read_json(filename)
    else:
        df_dataset = pd.DataFrame([], columns=columns)

    for class_wnid in class_wnids:
        class_name = reverse_id_labels[class_wnid]
        print(f'Scraping images for class \"{class_name}\"')
        class_folder = os.path.join(paths[folder], class_name)
        while (len(clean_list(os.listdir(class_folder))) < N_images_per_class[folder]) and (len(list_urls[class_wnid]) > 0):

            # pick and remove element from shuffled list 
            img_url = list_urls[class_wnid].pop()
            
            if len(df_dataset[df_dataset['img_url']==img_url])==0 : # we have not yet tested this URL yet
                # Transform URL into filename
                # https://laurentperrinet.github.io/sciblog/posts/2018-06-13-generating-an-unique-seed-for-a-given-filename.html
                img_name = hashlib.sha224(img_url.encode('utf-8')).hexdigest() + '.png'
                tic = time.time()
                if img_url.split('.')[-1] in ['.tiff', '.bmp', 'jpe', 'gif']:
                    if verbose: print('Bad extension for the img_url', img_url)
                    worked, dt = False, 0.
                # make sure it was not used in other folders
                elif not (img_name in list_img_name_used[class_wnid]):
                    img_content = get_image(img_url, verbose=verbose)
                    worked = img_content is not False
                    if worked:
                        if verbose : print('Good URl, now saving', img_url, ' in', class_folder, ' as', img_name)
                        imageio.imsave(os.path.join(class_folder, img_name), img_content, format='png')
                        list_img_name_used[class_wnid].append(img_name)
                df_dataset.loc[len(df_dataset.index)] = {'img_url':img_url, 'img_name':img_name, 'is_flickr':1 if 'flickr' in img_url else 0, 'dt':time.time() - tic,
                                'worked':worked, 'class_wnid':class_wnid, 'class_name':class_name}
                df_dataset.to_json(filename)
                print(f'\r{len(clean_list(os.listdir(class_folder)))} / {N_images_per_class[folder]}', end='\n' if verbose else '', flush=not verbose)

        if (len(clean_list(os.listdir(class_folder))) < N_images_per_class[folder]) and (len(list_urls[class_wnid]) == 0): 
            print('Not enough working url to complete the dataset') 
    df_dataset.to_json(filename)

Overwriting DCNN_transfer_learning/dataset.py

In [6]:

%run -int {scriptname}

Creating file results/2021-12-01_config_args.json
On date 2021-12-01 , Running benchmark on host neo-ope-de04  with device cuda
-------------------------------
List of Pre-selected classes : 
-------------------------------
-> label 945 = bell pepper 
id wordnet :  n02056570
-> label 513 = cornet 
id wordnet :  n02058221
-> label 886 = vending machine 
id wordnet :  n02219486
-> label 508 = computer keyboard 
id wordnet :  n02487347
-> label 786 = sewing machine 
id wordnet :  n02643566
-> label 310 = ant 
id wordnet :  n03085013
-> label 373 = macaque 
id wordnet :  n03110669
-> label 145 = king penguin 
id wordnet :  n04179913
-> label 146 = albatross 
id wordnet :  n04525305
-> label 396 = lionfish 
id wordnet :  n07720875
Folder "test"
Scraping images for class "king penguin"
Scraping images for class "albatross"
Scraping images for class "ant"
Scraping images for class "macaque"
Scraping images for class "lionfish"
Scraping images for class "computer keyboard"
Scraping images for class "cornet"
Scraping images for class "sewing machine"
Scraping images for class "vending machine"
Scraping images for class "bell pepper"
Folder "val"
Scraping images for class "king penguin"
Scraping images for class "albatross"
Scraping images for class "ant"
Scraping images for class "macaque"
Scraping images for class "lionfish"
Scraping images for class "computer keyboard"
Scraping images for class "cornet"
Scraping images for class "sewing machine"
Scraping images for class "vending machine"
Scraping images for class "bell pepper"
Folder "train"
Scraping images for class "king penguin"
Scraping images for class "albatross"
Scraping images for class "ant"
Scraping images for class "macaque"
Scraping images for class "lionfish"
Scraping images for class "computer keyboard"
Scraping images for class "cornet"
Scraping images for class "sewing machine"
Scraping images for class "vending machine"
Scraping images for class "bell pepper"

IPython CPU timings (estimated):
  User   :       0.28 s.
  System :       0.25 s.
Wall time:       0.52 s.

Let's plot some statistics for the scrapped images:

In [14]:

for folder in args.folders : 
    filename = f'results/{datetag}_dataset_{folder}_{args.HOST}.json'
    if os.path.isfile(filename):
        df_dataset = pd.read_json(filename)

        df_type = pd.DataFrame({'urls_type': [len(df_dataset[df_dataset['is_flickr']==1]), 
                                              len(df_dataset[df_dataset['is_flickr']==0])]},
                          index=['is_flickr', 'not_flikr'])
        df_flikr = pd.DataFrame({'not_flikr': [df_dataset[df_dataset['is_flickr']==0]['worked'].sum(), 
                                               (len(df_dataset[df_dataset['is_flickr']==0]) - df_dataset[df_dataset['is_flickr']==0]['worked'].sum())],
                                 'is_flickr': [df_dataset[df_dataset['is_flickr']==1]['worked'].sum(), 
                                               (len(df_dataset[df_dataset['is_flickr']==1]) - df_dataset[df_dataset['is_flickr']==1]['worked'].sum())],
                                'url': [len(df_dataset[df_dataset['worked']==1]), len(df_dataset[df_dataset['worked']==0])]},
                                  index=['worked', 'not_working'])

        fig, axes = plt.subplots(figsize=(12,12),nrows=2, ncols=2)
        fig.suptitle('Stats for the folder '+ folder + ' (' + str(len(df_dataset)) + ' attempts) :', size = 18)
        df_flikr["url"].plot(rot=0, ax=axes[0,0], kind='bar', grid=True, fontsize=14)
        axes[0,0].set_xlabel('All URLs', size=14)
        df_flikr["not_flikr"].plot(rot=0, ax=axes[1,1], kind='bar', grid=True, fontsize=14)
        axes[1,1].set_xlabel('Non flikr URLs', size=14)
        df_flikr["is_flickr"].plot(rot=0, ax=axes[1,0], kind='bar', grid=True, fontsize=14)
        axes[1,0].set_xlabel('Flikr URLs', size=14)
        df_type["urls_type"].plot(rot=0, ax=axes[0,1], kind='bar', grid=True, fontsize=14)
        axes[0,1].set_xlabel('Different types of URLs', size=14)
        
    else:
        print(f'The file {filename} is not available...')

No description has been provided for this image

Let's show some random images from each label :

In [23]:

import imageio
folder = 'test'
N_image_i = 5
plot_classes = {}
for class_wnid in class_wnids:
    class_name = reverse_id_labels[class_wnid]
    class_folder = os.path.join(paths[folder], class_name)
    plot_classes[class_name] = os.listdir(class_folder)
x = 0
fig, axs = plt.subplots(len(plot_classes), N_image_i, figsize=(fig_width, fig_width))
for ax, class_name in zip(axs, plot_classes):
    for i_image in np.arange(N_image_i):
        ax = axs[x][i_image]
        path = os.path.join(paths[folder], class_name, plot_classes[class_name][i_image])
        ax.imshow(imageio.imread(path))
        ax.set_xticks([])
        ax.set_yticks([])  
        if i_image%5 == 0:
            ax.set_ylabel(class_name)
    x +=1
fig.set_facecolor(color='white')

Transfer learning and dataset config¶

In the model.py script, we first define the transform functions for the datasets. To perform image augmentation, we apply the pyTorch AutoAugment function to the train and val dataset. Then, we load the pretrained models and store them in memory.

In [4]:

scriptname = 'DCNN_transfer_learning/model.py'

In [5]:

%%writefile {scriptname}

from DCNN_transfer_learning.init import *

import torchvision
from torchvision import datasets, models, transforms
from torchvision.datasets import ImageFolder
import torch.nn as nn

# normalization used to train VGG
# see https://pytorch.org/hub/pytorch_vision_vgg/
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])
transforms_norm = transforms.Normalize(mean=mean, std=std) # to normalize colors on the imagenet dataset

import seaborn as sns
import sklearn.metrics
from scipy import stats
from scipy.special import logit

# VGG-16 datasets initialisation
def datasets_transforms(image_size=args.image_size, c=1, p=0, num_workers=1, batch_size=args.batch_size, **kwargs):
    data_transforms = {
        'train': transforms.Compose([
            transforms.Resize((int(image_size), int(image_size))),
            transforms.AutoAugment(), # https://pytorch.org/vision/master/transforms.html#torchvision.transforms.AutoAugment
            transforms.RandomGrayscale(p=p),
            transforms.ToTensor(),      # Convert the image to pyTorch Tensor data type.
            transforms_norm ]),

        'val': transforms.Compose([
            transforms.Resize((int(image_size), int(image_size))),
            transforms.AutoAugment(), # https://pytorch.org/vision/master/transforms.html#torchvision.transforms.AutoAugment
            transforms.RandomGrayscale(p=p),
            transforms.ToTensor(),      # Convert the image to pyTorch Tensor data type.
            transforms_norm ]),

        'test': transforms.Compose([
            transforms.Resize((int(image_size), int(image_size))),
            transforms.RandomGrayscale(p=p),
            transforms.ColorJitter(contrast=c), # https://pytorch.org/vision/0.8/_modules/torchvision/transforms/transforms.html#ColorJitter
            transforms.ToTensor(),      # Convert the image to pyTorch Tensor data type.
            transforms_norm ]),
    }

    image_datasets = {
        folder: datasets.ImageFolder(
            paths[folder], 
            transform=data_transforms[folder]
        )
        for folder in args.folders
    }

    dataloaders = {
        folder: torch.utils.data.DataLoader(
            image_datasets[folder], batch_size=batch_size,
            shuffle=False if folder == "test" else True, num_workers=num_workers
        )
        for folder in args.folders
    }

    dataset_sizes = {folder: len(image_datasets[folder]) for folder in args.folders}

    return dataset_sizes, dataloaders, image_datasets, data_transforms

(dataset_sizes, dataloaders, image_datasets, data_transforms) = datasets_transforms(image_size=args.image_size)

for folder in args.folders : print(f"Loaded {dataset_sizes[folder]} images under {folder}")
class_names = image_datasets['train'].classes
print("Classes: ", image_datasets['train'].classes)
n_output = len(os.listdir(paths['train']))

Overwriting DCNN_transfer_learning/model.py

In [6]:

%run -int {scriptname}

Creating file results/2021-12-08_config_args.json
On date 2021-12-08 , Running benchmark on host neo-ope-de04  with device cuda
-------------------------------
List of Pre-selected classes : 
-------------------------------
-> label 945 = bell pepper 
id wordnet :  n02056570
-> label 513 = cornet 
id wordnet :  n02058221
-> label 886 = vending machine 
id wordnet :  n02219486
-> label 508 = computer keyboard 
id wordnet :  n02487347
-> label 786 = sewing machine 
id wordnet :  n02643566
-> label 310 = ant 
id wordnet :  n03085013
-> label 373 = macaque 
id wordnet :  n03110669
-> label 145 = king penguin 
id wordnet :  n04179913
-> label 146 = albatross 
id wordnet :  n04525305
-> label 396 = lionfish 
id wordnet :  n07720875
Loaded 4002 images under test
Loaded 2088 images under val
Loaded 5331 images under train
Classes:  ['albatross', 'ant', 'bell pepper', 'computer keyboard', 'cornet', 'king penguin', 'lionfish', 'macaque', 'sewing machine', 'vending machine']

IPython CPU timings (estimated):
  User   :       1.36 s.
  System :       2.89 s.
Wall time:       0.58 s.

Training process¶

Finaly, we implement the training process in experiment_train.py, using a classic training script with pyTorch. For further statistical analyses, we extract factors (like the accuracy and loss) within a pandas object (a DataFrame).

In [7]:

scriptname = 'experiment_train.py'

In [8]:

%%writefile {scriptname}
from DCNN_transfer_learning.model import *

def train_model(model, num_epochs, dataloaders, lr=args.lr, momentum=args.momentum, beta2=args.beta2, log_interval=100, **kwargs):
    
    model.to(device)
    if beta2 > 0.: 
        optimizer = torch.optim.Adam(model.parameters(), lr=lr, betas=(momentum, beta2)) #, amsgrad=amsgrad)
    else:
        optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum) # to set training variables

    df_train = pd.DataFrame([], columns=['epoch', 'avg_loss', 'avg_acc', 'avg_loss_val', 'avg_acc_val', 'device_type']) 

    for epoch in range(num_epochs):
        loss_train = 0
        acc_train = 0
        for i, (images, labels) in enumerate(dataloaders['train']):
            images, labels = images.to(device), labels.to(device)
            
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)

            loss.backward()
            optimizer.step()

            loss_train += loss.item() * images.size(0)
            _, preds = torch.max(outputs.data, 1)
            acc_train += torch.sum(preds == labels.data)
            
        avg_loss = loss_train / dataset_sizes['train']
        avg_acc = acc_train / dataset_sizes['train']
           
        with torch.no_grad():
            loss_val = 0
            acc_val = 0
            for i, (images, labels) in enumerate(dataloaders['val']):
                images, labels = images.to(device), labels.to(device)

                outputs = model(images)
                loss = criterion(outputs, labels)

                loss_val += loss.item() * images.size(0)
                _, preds = torch.max(outputs.data, 1)
                acc_val += torch.sum(preds == labels.data)
        
            avg_loss_val = loss_val / dataset_sizes['val']
            avg_acc_val = acc_val / dataset_sizes['val']
        
        df_train.loc[epoch] = {'epoch':epoch, 'avg_loss':avg_loss, 'avg_acc':float(avg_acc),
                               'avg_loss_val':avg_loss_val, 'avg_acc_val':float(avg_acc_val), 'device_type':device.type}
        print(f"Epoch {epoch+1}/{num_epochs} : train= loss: {avg_loss:.4f} / acc : {avg_acc:.4f} - val= loss : {avg_loss_val:.4f} / acc : {avg_acc_val:.4f}")

    model.cpu()
    torch.cuda.empty_cache()
    return model, df_train

 
criterion = nn.CrossEntropyLoss()


# Training and saving the network

models_vgg = {}
opt = {}

models_vgg['vgg'] = torchvision.models.vgg16(pretrained=True)

# Downloading the model
model_filenames = {}
for model_name in args.model_names:
    model_filenames[model_name] = args.model_path + model_name + '.pt'
    filename = f'results/{datetag}_{args.HOST}_train_{model_name}.json'

    models_vgg[model_name] = torchvision.models.vgg16(pretrained=True)
    if model_name == 'vgg16_full':
        pass
    else:    
        for param in models_vgg[model_name].features.parameters():
            param.require_grad = False 

    if model_name == 'vgg16_lin':
        num_features = models_vgg[model_name].classifier[-1].out_features
        features = list(models_vgg[model_name].classifier.children())
        features.extend([nn.Linear(num_features, n_output)]) # Adding one layer on top of last layer
        models_vgg[model_name].classifier = nn.Sequential(*features)

    else : 
        num_features = models_vgg[model_name].classifier[-1].in_features
        features = list(models_vgg[model_name].classifier.children())[:-1] # Remove last layer
        features.extend([nn.Linear(num_features, n_output)]) # Add our layer with 10 outputs
        models_vgg[model_name].classifier = nn.Sequential(*features) # Replace the model classifier

    if os.path.isfile(model_filenames[model_name]):
        print("Loading pretrained model for..", model_name, ' from', model_filenames[model_name])
        if device.type == 'cuda':
            models_vgg[model_name].load_state_dict(torch.load(model_filenames[model_name])) #on GPU
        else:
            models_vgg[model_name].load_state_dict(torch.load(model_filenames[model_name], map_location=torch.device('cpu'))) #on CPU

    else:
        print("Re-training pretrained model...", model_filenames[model_name])
        since = time.time()

        p = 1 if model_name == 'vgg16_gray' else 0
        if model_name =='vgg16_scale':
            df_train = None
            for image_size_ in args.image_sizes: # starting with low resolution images 
                print(f"Traning {model_name}, image_size = {image_size_}, p (Grayscale) = {p}")
                (dataset_sizes, dataloaders, image_datasets, data_transforms) = datasets_transforms(image_size=image_size_, p=p)
                models_vgg[model_name], df_train_ = train_model(models_vgg[model_name], num_epochs=args.num_epochs//len(args.image_sizes),
                                                             dataloaders=dataloaders)
                df_train = df_train_ if df_train is None else df_train.append(df_train_, ignore_index=True)
        else :
            print(f"Traning {model_name}, image_size = {args.image_size}, p (Grayscale) = {p}")
            (dataset_sizes, dataloaders, image_datasets, data_transforms) = datasets_transforms(image_size=args.image_size, p=p)
            models_vgg[model_name], df_train = train_model(models_vgg[model_name], num_epochs=args.num_epochs,
                                                        dataloaders=dataloaders)
        torch.save(models_vgg[model_name].state_dict(), model_filenames[model_name])
        df_train.to_json(filename)
        elapsed_time = time.time() - since
        print(f"Training completed in {elapsed_time // 60:.0f}m {elapsed_time % 60:.0f}s")
        print()

Overwriting experiment_train.py

In [9]:

%run -int {scriptname}

Loaded 4002 images under test
Loaded 2088 images under val
Loaded 5331 images under train
Classes:  ['albatross', 'ant', 'bell pepper', 'computer keyboard', 'cornet', 'king penguin', 'lionfish', 'macaque', 'sewing machine', 'vending machine']
Loading pretrained model for.. vgg16_lin  from models/re-trained_vgg16_lin.pt
Loading pretrained model for.. vgg16_gen  from models/re-trained_vgg16_gen.pt
Loading pretrained model for.. vgg16_scale  from models/re-trained_vgg16_scale.pt
Loading pretrained model for.. vgg16_gray  from models/re-trained_vgg16_gray.pt
Loading pretrained model for.. vgg16_full  from models/re-trained_vgg16_full.pt

IPython CPU timings (estimated):
  User   :      28.51 s.
  System :       6.44 s.
Wall time:      11.24 s.

Here we display both average accuracy and loss during the training phase and during the validation one :

In [99]:

for model_name in args.model_names:
    filename = f'results/{datetag}_{args.HOST}_train_{model_name}.json'
    df_train = pd.read_json(filename)
    fig, axs = plt.subplots(figsize=(fig_width, fig_width/phi/2))
    ax = df_train['avg_loss'].plot(lw=2, marker='.', markersize=10)
    ax = df_train['avg_loss_val'].plot(lw=2, marker='.', markersize=10)
    ax.legend(["avg_loss", "avg_loss_val"], fontsize=18);
    ax.set_xlabel("Epoch", size=18)
    ax.spines['left'].set_position(('axes', -0.01))
    ax.set_xlim(-0.5, args.num_epochs)
    ax.grid(which='both')
    for side in ['top', 'right'] :ax.spines[side].set_visible(False)
    ax.set_ylim(0., 1.1)
    axs.set_title(f'Average values of the loss by epoch : {filename}' , size = 20)
    ax.get_legend().remove()
    fig.legend(bbox_to_anchor=(1.05, .5), loc='lower right', fontsize = 20)

In [100]:

for model_name in args.model_names:
    filename = f'results/{datetag}_{args.HOST}_train_{model_name}.json'
    df_train = pd.read_json(filename)
    fig, axs = plt.subplots(figsize=(fig_width, fig_width/phi/2))
    ax = df_train['avg_acc'].plot(lw=2, marker='.', markersize=10)
    ax = df_train['avg_acc_val'].plot(lw=2, marker='.', markersize=10)
    ax.legend(["avg_acc", "avg_acc_val"], fontsize=18);
    ax.set_xlabel("Epoch", size=18)
    ax.spines['left'].set_position(('axes', -0.01))
    ax.set_ylim(0.70, .992)
    ax.set_yscale("logit", one_half="1/2", use_overline=True)
    ax.grid(which='both')
    ax.set_xlim(-0.5, args.num_epochs+.5)
    for side in ['top', 'right'] :ax.spines[side].set_visible(False)
    axs.set_title(f'Average values of the accuracy by epoch : {filename}' , size = 20)
    ax.get_legend().remove()
    fig.legend(bbox_to_anchor=(1.05, .5), loc='lower right', fontsize=20)

Bonus: Scan of some parameters¶

If there is some GPU time time left, let's try to meta-optimize some parameters by testing how accuracy would vary. To avoid potential over-fitting problems, we perform that test on the val validation set which is separate from the test and train datasets.

In [11]:

scriptname = 'experiment_scan.py'

In [12]:

%%writefile {scriptname}

#import model's script and set the output file
from DCNN_transfer_learning.model import *

scan_dicts= {'batch_size' : [8, 13, 21, 34, 55],
             'lr': args.lr * np.logspace(-1, 1, 7, base=10),
             'momentum': 1 - np.logspace(-3, -.5, 7, base=10),
             'beta2': 1 - np.logspace(-5, -1, 7, base=10),
            }

def main(N_avg=10, num_epochs=args.num_epochs//4):
    from experiment_train import train_model

    for key in scan_dicts:
        filename = f'results/{datetag}_train_scan_{key}_{args.HOST}.json'
        print(f'{filename=}')
        if os.path.isfile(filename):
            df_scan = pd.read_json(filename)
        else:
            i_trial = 0
            measure_columns = [key, 'avg_loss_val', 'avg_acc_val', 'time']

            df_scan = pd.DataFrame([], columns=measure_columns) 
            for i_trial, value in enumerate(scan_dicts[key]):
                new_kwarg = {key: value}
                print('trial', i_trial, ' /', len(scan_dicts[key]))
                print('new_kwarg', new_kwarg)
                # Training and saving the network
                models_vgg_ = torchvision.models.vgg16(pretrained=True)
                # Freeze training for all layers
                # Newly created modules have require_grad=True by default
                for param in models_vgg_.features.parameters():
                    param.require_grad = False 

                num_features = models_vgg_.classifier[-1].in_features
                features = list(models_vgg_.classifier.children())[:-1] # Remove last layer
                features.extend([nn.Linear(num_features, n_output)]) # Add our layer with `n_output` outputs
                models_vgg_.classifier = nn.Sequential(*features) # Replace the model classifier

                since = time.time()

                (dataset_sizes, dataloaders, image_datasets, data_transforms) = datasets_transforms(image_size=args.image_size, p=0, **new_kwarg)
                models_vgg_, df_train = train_model(models_vgg_, num_epochs=num_epochs, dataloaders=dataloaders, **new_kwarg)

                elapsed_time = time.time() - since
                print(f"Training completed in {elapsed_time // 60:.0f}m {elapsed_time % 60:.0f}s")

                df_scan.loc[i_trial] = {key:value, 'avg_loss_val':df_train.iloc[-N_avg:-1]['avg_loss_val'].mean(), 
                                   'avg_acc_val':df_train.iloc[-N_avg:-1]['avg_acc_val'].mean(), 'time':elapsed_time}
                print(df_scan.loc[i_trial])
                i_trial += 1
            df_scan.to_json(filename)

main()

Overwriting experiment_scan.py

In [13]:

%run -int {scriptname}

IPython CPU timings (estimated):
  User   :       0.01 s.
  System :       0.00 s.
Wall time:       0.01 s.

In [14]:

for key in scan_dicts:
    filename = f'results/{datetag}_train_scan_{key}_{args.HOST}.json'
    print(filename)
    df_scan = pd.read_json(filename)
    print(df_scan)

results/2021-12-01_train_scan_batch_size_neo-ope-de04.json
   batch_size  avg_loss_val  avg_acc_val         time
0           8      0.126422     0.966556  5061.051990
1          13      0.121382     0.966889  4700.749532
2          21      0.117780     0.968000  4668.656569
3          34      0.118205     0.963889  4789.453682
4          55      0.120869     0.963333  4832.274955
results/2021-12-01_train_scan_lr_neo-ope-de04.json
         lr  avg_loss_val  avg_acc_val         time
0  0.000010      0.144348     0.953778  4776.644311
1  0.000022      0.127897     0.960167  4757.568244
2  0.000046      0.121650     0.963889  4778.429414
3  0.000100      0.111028     0.968167  4790.580558
4  0.000215      0.128553     0.966444  4807.239716
5  0.000464      0.139715     0.962389  4841.201148
6  0.001000      0.159941     0.960333  4843.080728
results/2021-12-01_train_scan_momentum_neo-ope-de04.json
   momentum  avg_loss_val  avg_acc_val         time
0  0.999000      1.700792     0.427444  4791.569364
1  0.997390      0.373107     0.908056  4792.389380
2  0.993187      0.222885     0.947667  4756.617264
3  0.982217      0.144883     0.964056  4720.620919
4  0.953584      0.120019     0.968278  4687.429672
5  0.878847      0.119867     0.964667  4751.313942
6  0.683772      0.116867     0.963167  4763.222023
results/2021-12-01_train_scan_beta2_neo-ope-de04.json
      beta2  avg_loss_val  avg_acc_val         time
0  0.999990      0.318622     0.918444  4817.681090
1  0.999954      0.394988     0.903278  4789.693172
2  0.999785      0.375904     0.913611  4791.000414
3  0.999000      0.517510     0.890778  4792.347842
4  0.995358      0.610045     0.899056  4818.462842
5  0.978456      1.191684     0.785556  4864.483053
6  0.900000      2.190897     0.474056  4873.013371

In [101]:

subplotpars = matplotlib.figure.SubplotParams(left=0.1, right=.95, bottom=0.25, top=.975, hspace=.6)

dfs_ = {}
for key in scan_dicts:
    filename = f'results/{datetag}_train_scan_{key}_{args.HOST}.json'
    dfs_[str(key)]  = pd.read_json(filename)

fig, axs = plt.subplots(len(dfs_), 1, figsize=(fig_width, fig_width*len(dfs_)/(phi*2)), subplotpars=subplotpars)
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)
plt.tick_params(axis='both', which='major', labelsize=10)

for ax, df_train, key in zip(axs, dfs_, scan_dicts):
    ax.plot(range(len(scan_dicts[key])), dfs_[df_train]["avg_acc_val"], alpha=0.5, lw=2, marker='.')
    ax.set_ylabel(f"Accuracy for {key}", size=18)
    ax.set_xlabel(f"Parameter : {key}", size=18)
    ax.set_xticks(range(len(scan_dicts[key])))
    ax.set_xticklabels([f'{s:.4f}' for s in scan_dicts[key]], rotation=60, size = 20)
    ax.spines['left'].set_position(('axes', -0.01))
    ax.set_ylim(0.40, .99)
    ax.set_yscale("logit", one_half="1/2", use_overline=True)
    ax.grid(which='both')
    for side in ['top', 'right'] :ax.spines[side].set_visible(False)
    #ax.get_legend().remove()
axs[0].set_title(f'Average values of the accuracy for different parameters :' , size = 20);

In [103]:

dfs_ = {}
for key in scan_dicts:
    filename = f'results/{datetag}_train_scan_{key}_{args.HOST}.json'
    dfs_[str(key)]  = pd.read_json(filename)
fig, axs = plt.subplots(len(dfs_), 1, figsize=(fig_width, fig_width*len(dfs_)/(phi*2)), subplotpars=subplotpars)
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)
plt.tick_params(axis='both', which='major', labelsize=10)
for ax, df_train, key in zip(axs, dfs_, scan_dicts):
    ax.plot(range(len(scan_dicts[key])), dfs_[df_train]["avg_loss_val"], alpha=0.5, lw=2, marker='.')
    ax.set_ylabel(f"Loss value for {key}", size=18)    
    ax.set_xlabel(f"Parameter :{key}", size= 16)
    ax.set_xticks(range(len(scan_dicts[key])))
    ax.set_xticklabels([f'{s:.4f}' for s in scan_dicts[key]], rotation=60, size = 20)
    ax.grid(which='both')
    for side in ['top', 'right'] :ax.spines[side].set_visible(False)
axs[0].set_title(f'Average values of the accuracy for different parameters :' , size = 20);

These results are useful to fine-tune the parameters in order to maximise the efficiency of the transfer learning method.

Experiment 1: Image processing and recognition for differents labels¶

The networks are now ready for a quantitative evaluation. The second part of this notebook offers a comparison between:

A pre-trained image recognition's networks, here VGG16 network, trained on the Imagenet dataset wich allows to work on naturals images for $1000$ labels, taken from the torchvision.models library
And five re-trained version of the same network VGG16 network based on a reduced Imagenet dataset wich allows to focus on naturals images from $10$ labels.

For further statistical analyses, we extract these differents factors (like the accuracy and the processing time for differents datasets at differents resolution) in a pandas.DataFrame object.

In [1]:

scriptname = 'experiment_basic.py'

In [2]:

%%writefile {scriptname}

#import model's script and set the output file
from experiment_train import *
filename = f'results/{datetag}_results_1_{args.HOST}.json'
print(f'{filename=}')

def main():
    if os.path.isfile(filename):
        df = pd.read_json(filename)
    else:
        i_trial = 0
        df = pd.DataFrame([], columns=['model', 'likelihood', 'fps', 'time', 'label', 'i_label', 'i_image', 'filename', 'device_type', 'top_1']) 
        (dataset_sizes, dataloaders, image_datasets, data_transforms) = datasets_transforms(image_size=args.image_size, batch_size=1)
        
        for i_image, (data, label) in enumerate(dataloaders['test']):            
            data, label = data.to(device), label.to(device)
            for model_name in models_vgg.keys():
                model = models_vgg[model_name]
                model = model.to(device)

                with torch.no_grad():
                    i_label_top = reverse_labels[image_datasets['test'].classes[label]]
                    tic = time.time()
                    out = model(data).squeeze(0)
                    _, indices = torch.sort(out, descending=True)
                    if model_name == 'vgg' : # our previous work
                        top_1 = labels[indices[0]]
                        percentage = torch.nn.functional.softmax(out[args.subset_i_labels], dim=0) * 100
                        likelihood = percentage[reverse_subset_i_labels[i_label_top]].item()
                    else :
                        top_1 = subset_labels[indices[0]] 
                        percentage = torch.nn.functional.softmax(out, dim=0) * 100
                        likelihood = percentage[label].item()
                    elapsed_time = time.time() - tic
                    
                print(f'The {model_name} model get {labels[i_label_top]} at {likelihood:.2f} % confidence in {elapsed_time:.3f} seconds, best confidence for : {top_1}')
                df.loc[i_trial] = {'model':model_name, 'likelihood':likelihood, 'time':elapsed_time, 'fps': 1/elapsed_time,
                                   'label':labels[i_label_top], 'i_label':i_label_top, 
                                   'i_image':i_image, 'filename':image_datasets['test'].imgs[i_image][0], 'device_type':device.type, 'top_1':top_1}
                i_trial += 1
        df.to_json(filename)

main()

Overwriting experiment_basic.py

In [3]:

%run -int {scriptname}

Creating file results/2021-12-06_config_args.json
On date 2021-12-06 , Running benchmark on host neo-ope-de04  with device cuda
-------------------------------
List of Pre-selected classes : 
-------------------------------
-> label 945 = bell pepper 
id wordnet :  n02056570
-> label 513 = cornet 
id wordnet :  n02058221
-> label 886 = vending machine 
id wordnet :  n02219486
-> label 508 = computer keyboard 
id wordnet :  n02487347
-> label 786 = sewing machine 
id wordnet :  n02643566
-> label 310 = ant 
id wordnet :  n03085013
-> label 373 = macaque 
id wordnet :  n03110669
-> label 145 = king penguin 
id wordnet :  n04179913
-> label 146 = albatross 
id wordnet :  n04525305
-> label 396 = lionfish 
id wordnet :  n07720875
Loaded 3992 images under test
Loaded 2088 images under val
Loaded 5318 images under train
Classes:  ['albatross', 'ant', 'bell pepper', 'computer keyboard', 'cornet', 'king penguin', 'lionfish', 'macaque', 'sewing machine', 'vending machine']
Loading pretrained model for.. vgg16_lin  from models/re-trained_vgg16_lin.pt
Loading pretrained model for.. vgg16_gen  from models/re-trained_vgg16_gen.pt
Loading pretrained model for.. vgg16_scale  from models/re-trained_vgg16_scale.pt
Loading pretrained model for.. vgg16_gray  from models/re-trained_vgg16_gray.pt
Loading pretrained model for.. vgg16_full  from models/re-trained_vgg16_full.pt
filename='results/2021-12-06_results_1_neo-ope-de04.json'

IPython CPU timings (estimated):
  User   :      24.46 s.
  System :      12.62 s.
Wall time:      12.31 s.

Here we collect our results, we can already display all the data in a table

In [4]:

filename = f'results/{datetag}_results_1_{args.HOST}.json'
df = pd.read_json(filename)
df

Out[4]:

	model	likelihood	fps	time	label	i_label	i_image	filename	device_type	top_1
0	vgg	99.994598	26.714631	0.037433	albatross	146	0	data/test/albatross/0001096bb6acdc6c546229f243...	cuda	albatross
1	vgg16_lin	99.995316	142.882098	0.006999	albatross	146	0	data/test/albatross/0001096bb6acdc6c546229f243...	cuda	albatross
2	vgg16_gen	99.837700	153.238976	0.006526	albatross	146	0	data/test/albatross/0001096bb6acdc6c546229f243...	cuda	albatross
3	vgg16_scale	99.953819	151.973043	0.006580	albatross	146	0	data/test/albatross/0001096bb6acdc6c546229f243...	cuda	albatross
4	vgg16_gray	99.920662	151.309668	0.006609	albatross	146	0	data/test/albatross/0001096bb6acdc6c546229f243...	cuda	albatross
...	...	...	...	...	...	...	...	...	...	...
23947	vgg16_lin	99.999512	148.151037	0.006750	vending machine	886	3991	data/test/vending machine/ffe141944f70eaa5c09d...	cuda	vending machine
23948	vgg16_gen	99.928513	149.593552	0.006685	vending machine	886	3991	data/test/vending machine/ffe141944f70eaa5c09d...	cuda	vending machine
23949	vgg16_scale	99.999535	163.215192	0.006127	vending machine	886	3991	data/test/vending machine/ffe141944f70eaa5c09d...	cuda	vending machine
23950	vgg16_gray	99.740562	164.715049	0.006071	vending machine	886	3991	data/test/vending machine/ffe141944f70eaa5c09d...	cuda	vending machine
23951	vgg16_full	99.999130	164.134930	0.006093	vending machine	886	3991	data/test/vending machine/ffe141944f70eaa5c09d...	cuda	vending machine

23952 rows × 10 columns

Image display¶

We first display the best categorizations as ranked by likelihood, all models combined :

In [5]:

import imageio
N_image_i = 8
N_image_j = 8
fig, axs = plt.subplots(N_image_i, N_image_j, figsize=(fig_width*1.3, fig_width))
for i_image, idx in enumerate(df.sort_values(by=['likelihood'], ascending=False).head(N_image_i*N_image_j).index):
    ax = axs[i_image%N_image_i][i_image//N_image_i]
    img_address = image_datasets['test'].imgs[df.loc[idx]['i_image']][0]
    ax.imshow(imageio.imread(img_address))
    ax.set_xticks([])
    ax.set_yticks([])
    color = 'g' if df.loc[idx]['top_1'] == df.loc[idx]['label'] else 'r'
    ax.set_xlabel(df.loc[idx]['top_1'] + ' | ' + df.loc[idx]['model'], color=color)
    likelihood = df.loc[idx]['likelihood']
    ax.set_ylabel(f'P ={likelihood:2.3f}%', color=color)
fig.set_facecolor(color='white')

Then we display the worst categorizations as ranked by likelihood, all models combined :

In [6]:

import imageio
N_image_i = 8
N_image_j = 8
fig, axs = plt.subplots(N_image_i, N_image_j, figsize=(fig_width*1.3, fig_width))
for i_image, idx in enumerate(df.sort_values(by=['likelihood'], ascending=True).head(N_image_i*N_image_j).index):
    ax = axs[i_image%N_image_i][i_image//N_image_i]
    img_address = image_datasets['test'].imgs[df.loc[idx]['i_image']][0]
    ax.imshow(imageio.imread(img_address))
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_xlabel(df.loc[idx]['top_1'] + ' | ' + df.loc[idx]['model'], color='r')
    likelihood = df.loc[idx]['likelihood']
    ax.set_ylabel(df.loc[idx]['label'], color='r')
fig.set_facecolor(color='white')

Accuracy, Precision Recall & F1 Score¶

We now compute the top-1 accuracy (which is a metric that describes how the model performs across all classes, here top 1 because we only take the best likelihood at the output of the networks), the precision (which reflects how reliable the model is in classifying samples as Positive) and the recall (which measures the model's ability to detect Positive samples) of each networks. We use the sklearn librairy to perform this analysis.

In [7]:

from sklearn.metrics import accuracy_score, precision_score, f1_score

df_precision = pd.DataFrame({model_name: {subset_label: precision_score(df[(df['model']==model_name) & (df['label']==subset_label)]["top_1"], 
                                                                  df[(df['model']==model_name) & (df['label']==subset_label)]["label"],
                                                                 average='micro')
                                    for subset_label in subset_labels} 
                       for model_name in models_vgg.keys()})

ax = df_precision.plot.bar(rot=60, figsize=(fig_width, fig_width//4), fontsize=18)
ax.set_ylim(0, 1)
ax.hlines(xmin=-.5, xmax=len(subset_labels)-.5, y=1/n_output, ls='--', ec='k', label='chance level')
plt.legend(bbox_to_anchor=(1.1, .35), loc='lower right')
ax.grid(which='both', axis='y')
for side in ['top', 'right'] :ax.spines[side].set_visible(False)
ax.set_title('Precision for each models - experiment 1', size=20)
ax.set_ylabel('Precision', size=20)
ax.set_xlabel('Label', size=20);

In [8]:

df_f1_score = pd.DataFrame({model_name: {subset_label: f1_score(df[(df['model']==model_name) & (df['label']==subset_label)]["top_1"], 
                                                                df[(df['model']==model_name) & (df['label']==subset_label)]["label"],
                                                                average='micro')
                                    for subset_label in subset_labels} 
                       for model_name in models_vgg.keys()})

ax = df_f1_score.plot.bar(rot=60, figsize=(fig_width, fig_width//4), fontsize=18)
ax.set_ylim(0, 1)
ax.hlines(xmin=-.5, xmax=len(subset_labels)-.5, y=1/n_output, ls='--', ec='k', label='chance level')
plt.legend(bbox_to_anchor=(1.1, .35), loc='lower right')
ax.grid(which='both', axis='y')
for side in ['top', 'right'] :ax.spines[side].set_visible(False)
ax.set_title('F1-score for each models - experiment 1', size=20)
ax.set_ylabel('F1-score', size=20)
ax.set_xlabel('Label', size=20);

In [9]:

df_acc = pd.DataFrame({model_name: {subset_label: accuracy_score(df[(df['model']==model_name) & (df['label']==subset_label)]["top_1"], 
                                                                 df[(df['model']==model_name) & (df['label']==subset_label)]["label"])
                                    for subset_label in subset_labels} 
                       for model_name in models_vgg.keys()})

ax = df_acc.plot.bar(rot=60, figsize=(fig_width, fig_width//4), fontsize=18)
ax.set_ylim(0, 1)
ax.hlines(xmin=-.5, xmax=len(subset_labels)-.5, y=1/n_output, ls='--', ec='k', label='chance level')
plt.legend(bbox_to_anchor=(1.1, .35), loc='lower right')
ax.grid(which='both', axis='y')
for side in ['top', 'right'] :ax.spines[side].set_visible(False)
ax.set_title('Accuracy top_1 : for each models - experiment 1', size=20)
ax.set_ylabel('Accuracy', size=20)
ax.set_xlabel('Label', size=20);

Evidence form accuracy and likelihood¶

These graphs show the frequency of the logit of the categorization likelihood for our five models and also for the original VGG16 network. The categorization likelihood represents the predicted likelihood of detection for a given label at the output of the network. In addition, I display the accuracies of our networks using the logit ("logistic unit") function wich is the inverse of the logistic sigmoid function. Where the logistic function converts evidence into probabilities, its inverse converts probabilities into evidence, a metric wich appears naturally in Bayesian statistics. Then, as most of them are close either to 100 as to 0, we used the Hartley unit) to quantify the difference in performance between these networks.

In [10]:

df_acc = pd.DataFrame({model_name: {subset_label: (logit(accuracy_score(df[(df['model']==model_name) & (df['label']==subset_label)]["top_1"], 
                                                                 df[(df['model']==model_name) & (df['label']==subset_label)]["label"])))/log(10)
                                    for subset_label in subset_labels} 
                       for model_name in models_vgg.keys()})

ax = df_acc.plot.bar(rot=60, figsize=(fig_width, fig_width//4), fontsize=18)

plt.legend(bbox_to_anchor=(1.1, .35), loc='lower right')
ax.grid(which='both', axis='y')
for side in ['top', 'right'] :ax.spines[side].set_visible(False)
ax.set_title('Evidence compute from the accuracy of each label: for each models - experiment 1', size=20)
ax.set_ylabel('Evidence (decibans)', size=20)
ax.set_xlabel('Label', size=20);

In [11]:

fig, axs = plt.subplots(len(models_vgg.keys()), 1, figsize=(fig_width, fig_width*phi/2), sharex=True, sharey=True)
for ax, color, model_name in zip(axs, colors, models_vgg.keys()):
    ax.set_ylabel('Frequency', fontsize=14)
    ((logit((df[df['model']==model_name]['likelihood']/100)/(1-(df[df['model']==model_name]['likelihood']/100))))/log(10)).plot.hist(bins=np.linspace(-4, 4, 100), lw=1, label=model_name, ax=ax, color=color, density=True)
    ax.legend(loc='upper left', fontsize=20)
    ax.vlines(ymin=0, ymax=1, x=(logit(.1/.9))/log(10), ls='--', ec='k', label='chance level')
    ax.get_legend().remove()
axs[-1].set_xlabel('Evidence (Hartley)', size=18)
axs[0].set_title('Distribution of the likelihood. Processed on : ' + args.HOST + '_' + str(df['device_type'][0]), size = 20);
fig.legend(bbox_to_anchor=(1.06, .35), loc='lower right', fontsize=20);

In [12]:

df_acc = pd.DataFrame({'accuracy': [accuracy_score(df[df['model']==model_name]["top_1"], df[df['model']==model_name]["label"]) for model_name in models_vgg.keys()]}, index=models_vgg.keys())
ax = df_acc.plot.bar(rot=0, figsize=(fig_width, fig_width//4), fontsize=18)
ax.set_ylim(0, 1)
ax.hlines(xmin=-.5, xmax=len(models_vgg.keys())-.5, y=1/n_output, ls='--', ec='k', label='chance level')
# https://matplotlib.org/stable/gallery/lines_bars_and_markers/bar_label_demo.html
ax.bar_label(ax.containers[0], padding=-24, color='black', fontsize=14, fmt='%.3f')
plt.legend(bbox_to_anchor=(1.1, .5), loc='lower right')
ax.grid(which='both', axis='y')
for side in ['top', 'right'] :ax.spines[side].set_visible(False)
ax.set_title('Average accuracy top_1 : for each models - experiment 1', size=20)
ax.set_xlabel('Model', size=20);

In [13]:

acc = []
lik = []
for model_name in models_vgg.keys():
    acc.append(accuracy_score(df[df['model']==model_name]["top_1"], df[df['model']==model_name]["label"]))
    lik.append((np.mean(df[df['model']==model_name]["likelihood"]))/100)

df_test = pd.DataFrame({"accuracy": acc, "mean likelihood": lik}, index = models_vgg.keys())
ax = df_test.plot.bar(rot=0, figsize=(fig_width, fig_width//4), fontsize=18)
ax.set_ylim(0, 1)
ax.hlines(xmin=-.5, xmax=len(models_vgg.keys())-.5, y=1/n_output, ls='--', ec='k', label='chance level')
# https://matplotlib.org/stable/gallery/lines_bars_and_markers/bar_label_demo.html
for container in ax.containers: ax.bar_label(container, padding=-50, color='black', fontsize=14, fmt='%.3f', rotation=90)
plt.legend(bbox_to_anchor=(1.1, .5), loc='lower right')
ax.grid(which='both', axis='y')
for side in ['top', 'right'] :ax.spines[side].set_visible(False)
ax.set_title(f'Experiment 1 - Accuracy vs Mean likelihood', size=20)
ax.set_xlabel('Model', size=20);
plt.show();

Notice that the VGG16 network seems over confident as his mean likelihood ($= 0.937$) is greater than his actual accuracy ($= 0.722$) while our re-trained models seem to get accuracies that match their mean likelihoods.

Overall, with all the metrics used in this part of the notebook, our five re-trained networks seems more efficient than the VGG16 network on the newly defined task. For the end of this notebook, we will focus on the likelihood and the f1 score of our networks. It is defined as the harmonic mean of the model’s precision and recall and thus conveniently combines these measures.

Computation time¶

A display of the differents computation time of each models on the same dataset for the sequence of trials :

In [14]:

fig, axs = plt.subplots(len(models_vgg.keys()), 1, figsize=(fig_width, fig_width*phi/2), sharex=True, sharey=True)
for ax, color, model_name in zip(axs, colors, models_vgg.keys()):
    ax.set_ylabel('Frequency', fontsize=14)
    df[df['model']==model_name]['time'].plot.hist(bins=350, lw=1, label=model_name,ax=ax, color=color, density=True)
    ax.set_xlim(df['time'].quantile(.01), df['time'].quantile(.99))
    ax.legend(bbox_to_anchor=(1.19, .5), loc='lower right', fontsize=20)
    ax.grid(which='both', axis='y')
    ax.set_xlabel('Processing time (s): ' + model_name, size=22)
axs[0].set_title('Distribution of the Processing time (s). Processed on : ' + args.HOST + '_' + str(df['device_type'][0]), size = 20);

Summary¶

The re-trained networks and the VGG16 network gets close results for most of the parameters extrated from the experiment 1, their difference rely on the accuracy of the categorization, as the VGG16 network is over confident compared to our networks. To make it even clearer we extracted a specific mean for each models :

Mean F1 score

In [15]:

for model_name in models_vgg.keys():
    mean_f1_score = f1_score(df[df['model']==model_name]["top_1"] , df[df['model']==model_name]["label"], average='micro')
    print(f'For the {model_name} model, the mean f1 score = {mean_f1_score*100:.4f} %' )

For the vgg model, the mean f1 score = 71.9689 %
For the vgg16_lin model, the mean f1 score = 96.0421 %
For the vgg16_gen model, the mean f1 score = 95.8667 %
For the vgg16_scale model, the mean f1 score = 95.8166 %
For the vgg16_gray model, the mean f1 score = 96.1673 %
For the vgg16_full model, the mean f1 score = 96.1423 %

Mean categorization likelihood

In [16]:

for model_name in models_vgg.keys():
    med_likelihood = np.mean(df[df['model']==model_name]["likelihood"])
    print(f'For the {model_name} model, the mean clasification likelihood = {med_likelihood:.4f} %' )

For the vgg model, the mean clasification likelihood = 93.6520 %
For the vgg16_lin model, the mean clasification likelihood = 95.4406 %
For the vgg16_gen model, the mean clasification likelihood = 95.2496 %
For the vgg16_scale model, the mean clasification likelihood = 95.2119 %
For the vgg16_gray model, the mean clasification likelihood = 95.1933 %
For the vgg16_full model, the mean clasification likelihood = 95.1924 %

Mean computation time

In [17]:

for model_name in models_vgg.keys():
    med_likelihood = np.mean(df[df['model']==model_name]["time"])
    print(f'For the {model_name} model, the mean computation time = {med_likelihood:.5f} s')

For the vgg model, the mean computation time = 0.00613 s
For the vgg16_lin model, the mean computation time = 0.00612 s
For the vgg16_gen model, the mean computation time = 0.00604 s
For the vgg16_scale model, the mean computation time = 0.00607 s
For the vgg16_gray model, the mean computation time = 0.00606 s
For the vgg16_full model, the mean computation time = 0.00605 s

Mean frame per second

In [18]:

for model_name in models_vgg.keys():
    med_likelihood = np.mean(df[df['model']==model_name]["fps"])
    print(f'For the {model_name} model, the mean fps = {med_likelihood:.3f} Hz' )

For the vgg model, the mean fps = 164.520 Hz
For the vgg16_lin model, the mean fps = 165.491 Hz
For the vgg16_gen model, the mean fps = 166.931 Hz
For the vgg16_scale model, the mean fps = 166.828 Hz
For the vgg16_gray model, the mean fps = 166.881 Hz
For the vgg16_full model, the mean fps = 167.084 Hz

Experiment 2: Image processing and recognition for differents resolutions¶

In order to infer on the robustness of our networks and the impact of the dataset transformation during the learning process I study that same indicators at different image resolutions.

In [11]:

scriptname = 'experiment_downsample.py'

In [12]:

%%writefile {scriptname}
#import model's script and set the output file
from experiment_train import *
filename = f'results/{datetag}_results_2_{args.HOST}.json'
print(f'{filename=}')

def main():
    if os.path.isfile(filename):
        df_downsample = pd.read_json(filename)
    else:
        i_trial = 0
        df_downsample = pd.DataFrame([], columns=['model', 'likelihood', 'fps', 'time', 'label', 'i_label', 'i_image', 'image_size', 'filename', 'device_type', 'top_1']) 
        # image preprocessing
        for image_size_ in args.image_sizes:
            (dataset_sizes, dataloaders, image_datasets, data_transforms) = datasets_transforms(image_size=image_size_, batch_size=1)
            print(f'Résolution de {image_size_=}')
            # Displays the input image of the model 
            for i_image, (data, label) in enumerate(dataloaders['test']):                
                data, label = data.to(device), label.to(device)

                for model_name in models_vgg.keys():
                    model = models_vgg[model_name]
                    model = model.to(device)

                    with torch.no_grad():
                        i_label_top = reverse_labels[image_datasets['test'].classes[label]]
                        tic = time.time()
                        out = model(data).squeeze(0)
                        _, indices = torch.sort(out, descending=True)
                        if model_name == 'vgg' : # our previous work
                            top_1 = labels[indices[0]]
                            percentage = torch.nn.functional.softmax(out[args.subset_i_labels], dim=0) * 100
                            likelihood = percentage[reverse_subset_i_labels[i_label_top]].item()
                        else :
                            top_1 = subset_labels[indices[0]] 
                            percentage = torch.nn.functional.softmax(out, dim=0) * 100
                            likelihood = percentage[label].item()
                        dt = time.time() - tic
                    #print(f'The {model_name} model get {labels[i_label_top]} at {likelihood:.2f} % confidence in {dt:.3f} seconds, best confidence for : {top_1}')
                    df_downsample.loc[i_trial] = {'model':model_name, 'likelihood':likelihood, 'time':dt, 'fps': 1/dt,
                                       'label':labels[i_label_top], 'i_label':i_label_top, 
                                       'i_image':i_image, 'filename':image_datasets['test'].imgs[i_image][0], 'image_size': image_size_, 'device_type':device.type, 'top_1':str(top_1)}
                    i_trial += 1

            df_downsample.to_json(filename)

main()

Overwriting experiment_downsample.py

In [30]:

%run -int {scriptname}

Résolution de image_size_=64
Résolution de image_size_=128
Résolution de image_size_=256
Résolution de image_size_=512

IPython CPU timings (estimated):
  User   :    2441.52 s.
  System :       9.80 s.
Wall time:    2456.94 s.

Here, again, we collect our results, and display all the data in a table

In [31]:

filename = f'results/{datetag}_results_2_{args.HOST}.json'
df_downsample = pd.read_json(filename)
df_downsample

Out[31]:

	model	likelihood	fps	time	label	i_label	i_image	image_size	filename	device_type	top_1
0	vgg	96.485046	167.651451	0.005965	albatross	146	0	64	data/test/albatross/0001096bb6acdc6c546229f243...	cuda	red-backed sandpiper
1	vgg16_lin	99.991264	135.694080	0.007370	albatross	146	0	64	data/test/albatross/0001096bb6acdc6c546229f243...	cuda	albatross
2	vgg16_gen	99.261597	270.547894	0.003696	albatross	146	0	64	data/test/albatross/0001096bb6acdc6c546229f243...	cuda	albatross
3	vgg16_scale	99.991653	189.530230	0.005276	albatross	146	0	64	data/test/albatross/0001096bb6acdc6c546229f243...	cuda	albatross
4	vgg16_gray	99.998032	312.424879	0.003201	albatross	146	0	64	data/test/albatross/0001096bb6acdc6c546229f243...	cuda	albatross
...	...	...	...	...	...	...	...	...	...	...	...
95803	vgg16_lin	94.555542	55.092524	0.018151	vending machine	886	3991	512	data/test/vending machine/ffe141944f70eaa5c09d...	cuda	vending machine
95804	vgg16_gen	99.633514	55.148301	0.018133	vending machine	886	3991	512	data/test/vending machine/ffe141944f70eaa5c09d...	cuda	vending machine
95805	vgg16_scale	99.899689	55.216545	0.018111	vending machine	886	3991	512	data/test/vending machine/ffe141944f70eaa5c09d...	cuda	vending machine
95806	vgg16_gray	97.880707	55.205644	0.018114	vending machine	886	3991	512	data/test/vending machine/ffe141944f70eaa5c09d...	cuda	vending machine
95807	vgg16_full	99.325089	55.124382	0.018141	vending machine	886	3991	512	data/test/vending machine/ffe141944f70eaa5c09d...	cuda	vending machine

95808 rows × 11 columns

Image display¶

The 64 worsts categorization likelihood, all models and sizes combined :

In [32]:

import imageio
N_image_i = 8
N_image_j = 8
fig, axs = plt.subplots(N_image_i, N_image_j, figsize=(fig_width*1.3, fig_width))
for i_image, idx in enumerate(df_downsample.sort_values(by=['likelihood'], ascending=True).head(N_image_i*N_image_j).index):
    ax = axs[i_image%N_image_i][i_image//N_image_i]
    ax.imshow(imageio.imread(image_datasets['test'].imgs[df_downsample.loc[idx]['i_image']][0]))
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_xlabel(df_downsample.loc[idx]['top_1'] + ' | ' + df_downsample.loc[idx]['model'] + ' | ' + str(df_downsample.loc[idx]['image_size']), color='r')
    label = df_downsample.loc[idx]['label']
    ax.set_ylabel(f'True= {label}', color='r')
fig.set_facecolor(color='white')

F1 score¶

And extract the f1 score for each networks :

In [33]:

for image_size in args.image_sizes:
    df_acc = pd.DataFrame({model_name: {subset_label: f1_score(df_downsample[(df_downsample['model']==model_name) & (df_downsample['image_size']==image_size)]["top_1"], 
                                                               df_downsample[(df_downsample['model']==model_name) & (df_downsample['image_size']==image_size)]["label"],
                                                               average = 'micro')
                                        for subset_label in subset_labels} 
                           for model_name in models_vgg.keys()})

    ax = df_acc.plot.bar(rot=60, figsize=(fig_width, fig_width//4), fontsize=18)
    ax.set_ylim(0, 1)
    ax.hlines(xmin=-.5, xmax=len(subset_labels)-.5, y=1/n_output, ls='--', ec='k', label='chance level')
    plt.legend(bbox_to_anchor=(1.1, .35), loc='lower right')
    ax.grid(which='both', axis='y')
    for side in ['top', 'right'] :ax.spines[side].set_visible(False)
    ax.set_title(f'F1-score for each models - experiment 2 - image size = {image_size}', size=20)
    ax.set_ylabel('F1-score', size=20)
    ax.set_xlabel('Label', size=20);
    plt.show();

In [34]:

df_acc = pd.DataFrame({model_name: {image_size: f1_score(df_downsample[(df_downsample['model']==model_name) & (df_downsample['image_size']==image_size)]["top_1"], 
                                                         df_downsample[(df_downsample['model']==model_name) & (df_downsample['image_size']==image_size)]["label"],
                                                         average='micro')
                                    for image_size in args.image_sizes} 
                       for model_name in models_vgg.keys()})

ax = df_acc.plot.bar(rot=0, figsize=(fig_width, fig_width//4), fontsize=18)
ax.set_ylim(0, 1)
ax.hlines(xmin=-.5, xmax=len(models_vgg.keys())-.5, y=1/n_output, ls='--', ec='k', label='chance level')
# https://matplotlib.org/stable/gallery/lines_bars_and_markers/bar_label_demo.html
for container in ax.containers: ax.bar_label(container, padding=-50, color='black', fontsize=14, fmt='%.3f', rotation=90)
plt.legend(bbox_to_anchor=(1.1, .35), loc='lower right')
ax.grid(which='both', axis='y')
for side in ['top', 'right'] :ax.spines[side].set_visible(False)
ax.set_title(f'Experiment 2 - F1 score at different image size for each models ', size=20)
ax.set_ylabel('F1 score', size=20)
ax.set_xlabel('Image size', size=20)
plt.show();

In [35]:

ax = df_acc.T.plot.bar(rot=0, figsize=(fig_width, fig_width//4), fontsize=18)
ax.set_ylim(0, 1)
ax.hlines(xmin=-.5, xmax=len(models_vgg.keys())-.5, y=1/n_output, ls='--', ec='k', label='chance level')
# https://matplotlib.org/stable/gallery/lines_bars_and_markers/bar_label_demo.html
for container in ax.containers: ax.bar_label(container, padding=-50, color='black', fontsize=14, fmt='%.3f', rotation=90)
plt.legend(bbox_to_anchor=(1.1, .35), loc='lower right')
ax.grid(which='both', axis='y')
for side in ['top', 'right'] :ax.spines[side].set_visible(False)
ax.set_title(f'Experiment 2 - F1 score for each models at different image size', size=20)
ax.set_ylabel('F1 score', size=20)
ax.set_xlabel('Model', size=20)
plt.show();

Computation time¶

A display of the differents computation time of each models on the same dataset for differents resolutions :

In [36]:

fig, axs = plt.subplots(figsize=(fig_width, fig_width/phi))
for color, model_name in zip(colors, models_vgg.keys()):
    axs = sns.violinplot(x="image_size", y="time", data=df_downsample, inner="quartile", hue='model')
    axs.set_title('Processing time (s) for each network at different image size. Processed on : ' + args.HOST + '_' + str(df_downsample['device_type'][0]), size = 20)
    axs.set_ylabel('Computation time (s)', size=18)
    axs.set_xlabel('Image size', size=18)
    axs.set_yscale('log')
    axs.grid(which='both', axis='y')
    for side in ['top', 'right'] :axs.spines[side].set_visible(False)
h, l = axs.get_legend_handles_labels()
axs.legend(h[:5], l[:5], loc='upper center', fontsize=16);

Categorization likelihood¶

Let's display the likelihood of each models on the same dataset for differents resolutions. Here accuracies are displayed as a violin plot to allow a better representation of the models.

In [37]:

fig, axs = plt.subplots(figsize=(fig_width, fig_width/phi))
axs = sns.violinplot(x="image_size", y="likelihood", data=df_downsample, inner="quartile", hue='model', cut = 0, scale = 'width')
axs.set_title('Categorization likelihood for each network at different image size. Processed on : ' + args.HOST + '_' + str(df_downsample['device_type'][0]), size=20)
axs.set_ylabel('Categorization likelihood (%)', size=18)
axs.set_xlabel('Image size', size=18)
axs.legend(bbox_to_anchor=(1.2, .45), loc='lower right', fontsize = 20)
ax.grid(which='both', axis='y')
for side in ['top', 'right'] :ax.spines[side].set_visible(False)
h, l = axs.get_legend_handles_labels()

Summary¶

We observe that for all networks combined, the best F1 score is reached for the $256 \times 256$ resolution and decreases whenever we increase ($512 \times 512$ pixels) or decrease ($64 \times 64$ pixels and $128 \times 128$ pixels) this variable. This is certainly due to the fact that the VGG16 networks were pre-trained on a dataset of $234 \times 234$ pixels images. The F1 score of the pre-trained networks is better than the VGG16 network. Among the five retrain networks the VGG_Scale is the one that shows the most stable F1 score. Another impact of the variation of the image size in input is the computation time necessary to perform the detection. As we increase the size by a factor of $8$ (from $64$ to $512$, and therefore a pixel number factor of $64$), the computation time increases by a factor about $4$ for all the networks

Mean f1 score

In [38]:

for model_name in models_vgg.keys():
    pprint(f'Benchmarking model {model_name}')
    for image_size in args.image_sizes:
        df_ = df_downsample[(df_downsample['model']==model_name) & (df_downsample['image_size']==image_size)]
        mean_f1_score = f1_score(df_["top_1"] , df_["label"] , average = 'micro')
        print(f'For size {image_size}, the mean f1 score = {mean_f1_score*100:.4f} %' )

----------------------
Benchmarking model vgg
----------------------
For size 64, the mean f1 score = 19.1884 %
For size 128, the mean f1 score = 55.7866 %
For size 256, the mean f1 score = 72.4699 %
For size 512, the mean f1 score = 62.8758 %
----------------------------
Benchmarking model vgg16_lin
----------------------------
For size 64, the mean f1 score = 74.4739 %
For size 128, the mean f1 score = 92.5100 %
For size 256, the mean f1 score = 96.5431 %
For size 512, the mean f1 score = 93.7876 %
----------------------------
Benchmarking model vgg16_gen
----------------------------
For size 64, the mean f1 score = 75.9269 %
For size 128, the mean f1 score = 92.8106 %
For size 256, the mean f1 score = 95.8918 %
For size 512, the mean f1 score = 93.4870 %
------------------------------
Benchmarking model vgg16_scale
------------------------------
For size 64, the mean f1 score = 77.6553 %
For size 128, the mean f1 score = 91.5581 %
For size 256, the mean f1 score = 95.6914 %
For size 512, the mean f1 score = 94.9649 %
-----------------------------
Benchmarking model vgg16_gray
-----------------------------
For size 64, the mean f1 score = 73.4469 %
For size 128, the mean f1 score = 92.3096 %
For size 256, the mean f1 score = 95.9168 %
For size 512, the mean f1 score = 94.2134 %
-----------------------------
Benchmarking model vgg16_full
-----------------------------
For size 64, the mean f1 score = 74.6493 %
For size 128, the mean f1 score = 92.8106 %
For size 256, the mean f1 score = 96.0421 %
For size 512, the mean f1 score = 94.8397 %

Mean categorization likelihood

In [39]:

for model_name in models_vgg.keys():
    pprint(f'Benchmarking model {model_name}')
    for image_size in args.image_sizes:
        med_likelihood = np.mean(df_downsample[(df_downsample['model']==model_name) & (df_downsample['image_size']==image_size)]["likelihood"])
        print(f'For size {image_size}, the mean clasification likelihood = {med_likelihood:.5f} %' )

----------------------
Benchmarking model vgg
----------------------
For size 64, the mean clasification likelihood = 71.30679 %
For size 128, the mean clasification likelihood = 90.60969 %
For size 256, the mean clasification likelihood = 94.11937 %
For size 512, the mean clasification likelihood = 89.01307 %
----------------------------
Benchmarking model vgg16_lin
----------------------------
For size 64, the mean clasification likelihood = 73.72051 %
For size 128, the mean clasification likelihood = 92.13664 %
For size 256, the mean clasification likelihood = 95.78246 %
For size 512, the mean clasification likelihood = 91.08758 %
----------------------------
Benchmarking model vgg16_gen
----------------------------
For size 64, the mean clasification likelihood = 75.01264 %
For size 128, the mean clasification likelihood = 92.37035 %
For size 256, the mean clasification likelihood = 95.28685 %
For size 512, the mean clasification likelihood = 89.37263 %
------------------------------
Benchmarking model vgg16_scale
------------------------------
For size 64, the mean clasification likelihood = 77.01949 %
For size 128, the mean clasification likelihood = 91.17951 %
For size 256, the mean clasification likelihood = 95.13552 %
For size 512, the mean clasification likelihood = 92.68679 %
-----------------------------
Benchmarking model vgg16_gray
-----------------------------
For size 64, the mean clasification likelihood = 72.49181 %
For size 128, the mean clasification likelihood = 91.75696 %
For size 256, the mean clasification likelihood = 95.12077 %
For size 512, the mean clasification likelihood = 89.65246 %
-----------------------------
Benchmarking model vgg16_full
-----------------------------
For size 64, the mean clasification likelihood = 73.41183 %
For size 128, the mean clasification likelihood = 92.25266 %
For size 256, the mean clasification likelihood = 95.37998 %
For size 512, the mean clasification likelihood = 90.24870 %

Mean computation time

In [40]:

for model_name in models_vgg.keys():
    pprint(f'Benchmarking model {model_name}')
    for image_size in args.image_sizes:
        med_likelihood = np.mean(df_downsample[(df_downsample['model']==model_name) & (df_downsample['image_size']==image_size)]["time"])
        print(f'For size {image_size}, the mean computation time = {med_likelihood:.3f} s' )

----------------------
Benchmarking model vgg
----------------------
For size 64, the mean computation time = 0.003 s
For size 128, the mean computation time = 0.004 s
For size 256, the mean computation time = 0.006 s
For size 512, the mean computation time = 0.017 s
----------------------------
Benchmarking model vgg16_lin
----------------------------
For size 64, the mean computation time = 0.003 s
For size 128, the mean computation time = 0.004 s
For size 256, the mean computation time = 0.006 s
For size 512, the mean computation time = 0.017 s
----------------------------
Benchmarking model vgg16_gen
----------------------------
For size 64, the mean computation time = 0.003 s
For size 128, the mean computation time = 0.004 s
For size 256, the mean computation time = 0.006 s
For size 512, the mean computation time = 0.017 s
------------------------------
Benchmarking model vgg16_scale
------------------------------
For size 64, the mean computation time = 0.003 s
For size 128, the mean computation time = 0.004 s
For size 256, the mean computation time = 0.006 s
For size 512, the mean computation time = 0.017 s
-----------------------------
Benchmarking model vgg16_gray
-----------------------------
For size 64, the mean computation time = 0.003 s
For size 128, the mean computation time = 0.004 s
For size 256, the mean computation time = 0.006 s
For size 512, the mean computation time = 0.017 s
-----------------------------
Benchmarking model vgg16_full
-----------------------------
For size 64, the mean computation time = 0.003 s
For size 128, the mean computation time = 0.004 s
For size 256, the mean computation time = 0.006 s
For size 512, the mean computation time = 0.017 s

Mean frame per second

In [41]:

for model_name in models_vgg.keys():
    pprint(f'Benchmarking model {model_name}')
    for image_size in args.image_sizes:
        med_likelihood = np.mean(df_downsample[(df_downsample['model']==model_name) & (df_downsample['image_size']==image_size)]["fps"])
        print(f'For size {image_size}, the mean fps = {med_likelihood:.3f} Hz' )

----------------------
Benchmarking model vgg
----------------------
For size 64, the mean fps = 304.122 Hz
For size 128, the mean fps = 271.853 Hz
For size 256, the mean fps = 155.893 Hz
For size 512, the mean fps = 57.927 Hz
----------------------------
Benchmarking model vgg16_lin
----------------------------
For size 64, the mean fps = 310.492 Hz
For size 128, the mean fps = 276.000 Hz
For size 256, the mean fps = 156.795 Hz
For size 512, the mean fps = 58.022 Hz
----------------------------
Benchmarking model vgg16_gen
----------------------------
For size 64, the mean fps = 314.709 Hz
For size 128, the mean fps = 279.379 Hz
For size 256, the mean fps = 157.803 Hz
For size 512, the mean fps = 58.190 Hz
------------------------------
Benchmarking model vgg16_scale
------------------------------
For size 64, the mean fps = 315.299 Hz
For size 128, the mean fps = 279.953 Hz
For size 256, the mean fps = 157.960 Hz
For size 512, the mean fps = 58.180 Hz
-----------------------------
Benchmarking model vgg16_gray
-----------------------------
For size 64, the mean fps = 315.056 Hz
For size 128, the mean fps = 280.027 Hz
For size 256, the mean fps = 157.838 Hz
For size 512, the mean fps = 58.159 Hz
-----------------------------
Benchmarking model vgg16_full
-----------------------------
For size 64, the mean fps = 316.148 Hz
For size 128, the mean fps = 280.455 Hz
For size 256, the mean fps = 157.872 Hz
For size 512, the mean fps = 58.186 Hz

Experiment 3: Image processing and recognition on grayscale images¶

Again, another analysis of the robustness using the likelihood indicators but now with a grayscale transformation and compare them with the likelihood indicators from experiment 1.

In [13]:

scriptname = 'experiment_grayscale.py'

In [14]:

%%writefile {scriptname}

#import model's script and set the output file
from experiment_train import *
filename = f'results/{datetag}_results_3_{args.HOST}.json'
print(f'{filename=}')

def main():
    if os.path.isfile(filename):
        df_gray = pd.read_json(filename)
    else:
        i_trial = 0
        df_gray = pd.DataFrame([], columns=['model', 'likelihood', 'fps', 'time', 'label', 'i_label', 'i_image', 'filename', 'device_type', 'top_1']) 
        # image preprocessing setting a grayscale output
        (dataset_sizes, dataloaders, image_datasets, data_transforms) = datasets_transforms(image_size=args.image_size, p=1, batch_size=1)

        # Displays the input image of the model 
        for i_image, (data, label) in enumerate(dataloaders['test']):
            data, label = data.to(device), label.to(device)

            for model_name in models_vgg.keys():
                model = models_vgg[model_name]
                model = model.to(device)

                with torch.no_grad():
                    i_label_top = reverse_labels[image_datasets['test'].classes[label]]
                    tic = time.time()
                    out = model(data).squeeze(0)
                    if model_name == 'vgg' :
                        percentage = torch.nn.functional.softmax(out[args.subset_i_labels], dim=0) * 100
                        _, indices = torch.sort(out, descending=True)
                        top_1 = labels[indices[0]]
                        likelihood = percentage[reverse_subset_i_labels[i_label_top]].item()
                    else :
                        percentage = torch.nn.functional.softmax(out, dim=0) * 100
                        _, indices = torch.sort(out, descending=True)
                        top_1 = subset_labels[indices[0]] 
                        likelihood = percentage[label].item()
                dt = time.time() - tic
                df_gray.loc[i_trial] = {'model':model_name, 'likelihood':likelihood, 'time':dt, 'fps': 1/dt,
                                   'label':labels[i_label_top], 'i_label':i_label_top, 
                                   'i_image':i_image, 'filename':image_datasets['test'].imgs[i_image][0], 'device_type':device.type, 'top_1':str(top_1)}
                print(f'The {model_name} model get {labels[i_label_top]} at {likelihood:.2f} % confidence in {dt:.3f} seconds, best confidence for : {top_1}')
                i_trial += 1
        df_gray.to_json(filename)

main()

Overwriting experiment_grayscale.py

In [44]:

%run -int {scriptname}

IPython CPU timings (estimated):
  User   :       0.18 s.
  System :       0.00 s.
Wall time:       0.18 s.

Collecting all the results, displaying all the data in a table

In [45]:

filename = f'results/{datetag}_results_3_{args.HOST}.json'
df_gray = pd.read_json(filename)
df_gray

Out[45]:

	model	likelihood	fps	time	label	i_label	i_image	filename	device_type	top_1
0	vgg	99.161537	15.158363	0.065970	albatross	146	0	data/test/albatross/0001096bb6acdc6c546229f243...	cuda	goose
1	vgg16_lin	99.995735	142.707087	0.007007	albatross	146	0	data/test/albatross/0001096bb6acdc6c546229f243...	cuda	albatross
2	vgg16_gen	99.977547	143.951127	0.006947	albatross	146	0	data/test/albatross/0001096bb6acdc6c546229f243...	cuda	albatross
3	vgg16_scale	99.864014	142.001693	0.007042	albatross	146	0	data/test/albatross/0001096bb6acdc6c546229f243...	cuda	albatross
4	vgg16_gray	99.967918	132.467044	0.007549	albatross	146	0	data/test/albatross/0001096bb6acdc6c546229f243...	cuda	albatross
...	...	...	...	...	...	...	...	...	...	...
23947	vgg16_lin	99.999252	156.387174	0.006394	vending machine	886	3991	data/test/vending machine/ffe141944f70eaa5c09d...	cuda	vending machine
23948	vgg16_gen	99.999962	157.769569	0.006338	vending machine	886	3991	data/test/vending machine/ffe141944f70eaa5c09d...	cuda	vending machine
23949	vgg16_scale	99.999855	157.757701	0.006339	vending machine	886	3991	data/test/vending machine/ffe141944f70eaa5c09d...	cuda	vending machine
23950	vgg16_gray	99.945679	157.219582	0.006361	vending machine	886	3991	data/test/vending machine/ffe141944f70eaa5c09d...	cuda	vending machine
23951	vgg16_full	99.989235	157.650968	0.006343	vending machine	886	3991	data/test/vending machine/ffe141944f70eaa5c09d...	cuda	vending machine

23952 rows × 10 columns

Image display¶

The 64 worsts categorization likelihood, all models combined :

In [46]:

import imageio
N_image_i = 8
N_image_j = 8
fig, axs = plt.subplots(N_image_i, N_image_j, figsize=(fig_width*1.3, fig_width))
for i_image, idx in enumerate(df_gray.sort_values(by=['likelihood'], ascending=True).head(N_image_i*N_image_j).index):
    ax = axs[i_image%N_image_i][i_image//N_image_i]
    ax.imshow(imageio.imread(image_datasets['test'].imgs[df_gray.loc[idx]['i_image']][0], pilmode="L"), cmap='gray')
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_xlabel('Top 1 :' + df_gray.loc[idx]['top_1'] + ' | ' + df_gray.loc[idx]['model'], color='r')
    label = df_gray.loc[idx]['label']
    ax.set_ylabel(f'True= {label}', color='r')
fig.set_facecolor(color='white')

F1 score¶

Mean f1 score

In [47]:

df_f1_score = pd.DataFrame({model_name: {subset_label: f1_score(df_gray[(df_gray['model']==model_name) & (df_gray['label']==subset_label)]["top_1"], 
                                                                df_gray[(df_gray['model']==model_name) & (df_gray['label']==subset_label)]["label"],
                                                                average='micro')
                                    for subset_label in subset_labels} 
                       for model_name in models_vgg.keys()})

ax = df_f1_score.plot.bar(rot=60, figsize=(fig_width, fig_width//4), fontsize=18)
ax.set_ylim(0, 1)
ax.hlines(xmin=-.5, xmax=len(subset_labels)-.5, y=1/n_output, ls='--', ec='k', label='chance level')
plt.legend(bbox_to_anchor=(1.1, .35), loc='lower right')
ax.grid(which='both', axis='y')
for side in ['top', 'right'] :ax.spines[side].set_visible(False)
ax.set_title('F1-score for each models - experiment 1', size=20)
ax.set_ylabel('F1-score', size=20)
ax.set_xlabel('Label', size=20);

In [48]:

from sklearn.metrics import accuracy_score, precision_score, f1_score
df_acc = pd.DataFrame({model_name: {label: f1_score(df_[(df_['model']==model_name)]["top_1"], 
                                                               df_[(df_['model']==model_name)]["label"],
                                                   average='micro')
                                    for label, df_ in zip(['original', 'gray'], [df, df_gray])} 
                       for model_name in models_vgg.keys()})

ax = df_acc.T.plot.bar(rot=0, figsize=(fig_width, fig_width//4), fontsize=18)
ax.set_ylim(0, 1)
ax.hlines(xmin=-.5, xmax=len(models_vgg.keys())-.5, y=1/n_output, ls='--', ec='k', label='chance level')
# https://matplotlib.org/stable/gallery/lines_bars_and_markers/bar_label_demo.html
for container in ax.containers: ax.bar_label(container, padding=-50, color='black', fontsize=14, fmt='%.3f', rotation=90)
plt.legend(bbox_to_anchor=(1.1, .5), loc='lower right')
ax.grid(which='both', axis='y')
for side in ['top', 'right'] :ax.spines[side].set_visible(False)
ax.set_title(f'Experiment 3 - color vs gray images', size=20)
ax.set_ylabel('F1 Score', size=14)
plt.show();

Computation time¶

A display of the differents computation time of each models on the same dataset for a single resolution :

In [49]:

fig, axs = plt.subplots(len(models_vgg.keys()), 1, figsize=(fig_width, fig_width*phi/2))
for color, df_, label, legend in zip(['gray', 'red'], [df_gray, df], ['black', 'color'], ['Grayscale', 'Regular']):
    for ax, model_name in zip(axs, models_vgg.keys()):
        ax.set_ylabel('Frequency', fontsize=20) 
        df_[df_['model']==model_name]['time'].plot.hist(bins=150, lw=1, label=str(legend+ ' ' + model_name), ax=ax, color=color, density=True)
        ax.legend(loc='upper right', fontsize=20)
        ax.set_xlim(df_gray['time'].quantile(.01), df_gray['time'].quantile(.99))
        ax.legend(bbox_to_anchor=(1.15, .5), loc='lower right')
axs[-1].set_xlabel('Processing time (s)', size=18)
axs[0].set_title('Processed on : ' + args.HOST + '_' + str(df['device_type'][0]), size = 20);

Categorization likelihood¶

Let's analyze the categorization likelihood of each models on the same dataset for color versus grayscale images. Here likelihood's are displayed as a violin plot to allow a better representation of the models.

In [50]:

import seaborn as sns

fig, axs = plt.subplots(figsize=(fig_width, fig_width/phi**2))
for color, df_, label in zip(['gray', 'red'], [df_gray, df], ['black', 'color']):
    axs = sns.violinplot(x="model", y="likelihood", data=df_, inner="quartile", cut=0, color=color, alpha=.5, scale = 'width', label = color)
ax.grid(which='both', axis='y')
for side in ['top', 'right'] :ax.spines[side].set_visible(False)
axs.set_title('Categorization likelihood for each network in color and grayscale. Processed on : ' + args.HOST + '_' + str(df_['device_type'][0]), size=20)
axs.set_ylabel('Categorization likelihood (%)', size=18)
axs.set_xlabel('Model', size=18)
fig.legend(['Color (Red)', 'Grayscale (Gray)'], bbox_to_anchor=(1.06, .45), fontsize=18,loc='center right');

Summary¶

The F1 score for all networks are affected by the use of grayscale images, even the VGG_Gray, although the latter is the most robust. This result may originate from the fact that it was pre-trained on RGB images. However, the performance remains largely above the chance level, and this for all networks, which shows that, without colors, the networks can still discriminate the $10$ classes in most cases. For some images, this information is necessary for the discrimination, a need that can be partly compensated by transfer learning in the VGG_Gray network.

Mean F1 score

In [51]:

for model_name in models_vgg.keys():
    mean_f1_score_orig = f1_score(df[df['model']==model_name]["top_1"] , df[df['model']==model_name]["label"], average='micro')
    mean_f1_score = f1_score(df_gray[df_gray['model']==model_name]["top_1"] , df_gray[df_gray['model']==model_name]["label"], average='micro')
    print(f'For the {model_name} model, the mean clasification likelihood = {mean_f1_score*100:.5f} % (color = {mean_f1_score_orig*100:.5f} % )' )

For the vgg model, the mean clasification likelihood = 56.11222 % (color = 71.96894 % )
For the vgg16_lin model, the mean clasification likelihood = 91.33267 % (color = 96.04208 % )
For the vgg16_gen model, the mean clasification likelihood = 92.71042 % (color = 95.86673 % )
For the vgg16_scale model, the mean clasification likelihood = 91.28257 % (color = 95.81663 % )
For the vgg16_gray model, the mean clasification likelihood = 95.04008 % (color = 96.16733 % )
For the vgg16_full model, the mean clasification likelihood = 92.18437 % (color = 96.14228 % )

Mean categorization likelihood

In [52]:

for model_name in models_vgg.keys():
    med_likelihood_orig = np.mean(df[df['model']==model_name]["likelihood"])
    med_likelihood = np.mean(df_gray[df_gray['model']==model_name]["likelihood"])
    print(f'For the {model_name} model, the mean clasification likelihood = {med_likelihood:.5f} % (color = {med_likelihood_orig:.5f} % )' )
    print(stats.ttest_1samp(df_gray[df_gray['model']==model_name]["likelihood"], np.mean(df[df['model']==model_name]["likelihood"])))

For the vgg model, the mean clasification likelihood = 87.33926 % (color = 93.65200 % )
Ttest_1sampResult(statistic=-14.649980053455742, pvalue=2.2429526996649943e-47)
For the vgg16_lin model, the mean clasification likelihood = 90.26159 % (color = 95.44061 % )
Ttest_1sampResult(statistic=-12.95031355669835, pvalue=1.3259253690031956e-37)
For the vgg16_gen model, the mean clasification likelihood = 91.33653 % (color = 95.24963 % )
Ttest_1sampResult(statistic=-10.437591406415853, pvalue=3.5127506993701646e-25)
For the vgg16_scale model, the mean clasification likelihood = 90.04138 % (color = 95.21186 % )
Ttest_1sampResult(statistic=-12.394191316766264, pvalue=1.2100071819816907e-34)
For the vgg16_gray model, the mean clasification likelihood = 93.94850 % (color = 95.19329 % )
Ttest_1sampResult(statistic=-3.9455179894920955, pvalue=8.099664974423112e-05)
For the vgg16_full model, the mean clasification likelihood = 90.30311 % (color = 95.19237 % )
Ttest_1sampResult(statistic=-12.409216564045165, pvalue=1.0101014943635101e-34)

Mean computation time

In [53]:

for model_name in models_vgg.keys():
    med_likelihood_orig = np.mean(df[df['model']==model_name]["time"])
    med_likelihood = np.mean(df_gray[df_gray['model']==model_name]["time"])
    print(f'For the {model_name} model, the mean computation time = {med_likelihood:.4f} s (color = {med_likelihood_orig:.4f} s )' )

For the vgg model, the mean computation time = 0.0064 s (color = 0.0061 s )
For the vgg16_lin model, the mean computation time = 0.0062 s (color = 0.0061 s )
For the vgg16_gen model, the mean computation time = 0.0062 s (color = 0.0060 s )
For the vgg16_scale model, the mean computation time = 0.0062 s (color = 0.0061 s )
For the vgg16_gray model, the mean computation time = 0.0062 s (color = 0.0061 s )
For the vgg16_full model, the mean computation time = 0.0062 s (color = 0.0061 s )

Mean frame per second

In [54]:

for model_name in models_vgg.keys():
    med_likelihood_orig = np.mean(df[df['model']==model_name]["fps"])
    med_likelihood = np.mean(df_gray[df_gray['model']==model_name]["fps"])
    print(f'For the {model_name} model, the mean fps = {med_likelihood:.3f} Hz (color = {med_likelihood_orig:.3f} Hz )' )

For the vgg model, the mean fps = 158.050 Hz (color = 164.520 Hz )
For the vgg16_lin model, the mean fps = 161.456 Hz (color = 165.491 Hz )
For the vgg16_gen model, the mean fps = 162.464 Hz (color = 166.931 Hz )
For the vgg16_scale model, the mean fps = 162.417 Hz (color = 166.828 Hz )
For the vgg16_gray model, the mean fps = 162.607 Hz (color = 166.881 Hz )
For the vgg16_full model, the mean fps = 162.751 Hz (color = 167.084 Hz )

Experiment 4: Image processing and recognition on contrasted images¶

Again, same likelihood indicators but now with a contrast filter.

In [1]:

scriptname = 'experiment_contrast.py'

In [2]:

%%writefile {scriptname}
#import model's script and set the output file
from experiment_train import *
filename = f'results/{datetag}_results_4_{args.HOST}.json'
print(f'{filename=}')

def main():
    if os.path.isfile(filename):
        df_contrast = pd.read_json(filename)
    else:
        i_trial = 0
        df_contrast = pd.DataFrame([], columns=['model', 'likelihood', 'fps', 'time', 'label', 'i_label', 'i_image', 'contrast', 'filename', 'device_type', 'top_1']) 
        # image preprocessing
        for contrast in np.arange(1,110,10):
            (dataset_sizes, dataloaders, image_datasets, data_transforms) = datasets_transforms(c=contrast, batch_size=1)
            print(f'Contrast de {contrast=}')
            # Displays the input image of the model 
            for i_image, (data, label) in enumerate(dataloaders['test']):                
                data, label = data.to(device), label.to(device)

                for model_name in models_vgg.keys():
                    model = models_vgg[model_name]
                    model = model.to(device)

                    with torch.no_grad():
                        i_label_top = reverse_labels[image_datasets['test'].classes[label]]
                        tic = time.time()
                        out = model(data).squeeze(0)
                        _, indices = torch.sort(out, descending=True)
                        if model_name == 'vgg' : # our previous work
                            top_1 = labels[indices[0]]
                            percentage = torch.nn.functional.softmax(out[args.subset_i_labels], dim=0) * 100
                            likelihood = percentage[reverse_subset_i_labels[i_label_top]].item()
                        else :
                            top_1 = subset_labels[indices[0]] 
                            percentage = torch.nn.functional.softmax(out, dim=0) * 100
                            likelihood = percentage[label].item()
                        dt = time.time() - tic
                    print(f'The {model_name} model get {labels[i_label_top]} at {likelihood:.2f} % confidence in {dt:.3f} seconds, best confidence for : {top_1}')
                    df_contrast.loc[i_trial] = {'model':model_name, 'likelihood':likelihood, 'time':dt, 'fps': 1/dt,
                                       'label':labels[i_label_top], 'i_label':i_label_top, 
                                       'i_image':i_image, 'filename':image_datasets['test'].imgs[i_image][0], 'contrast': contrast, 'device_type':device.type, 'top_1':str(top_1)}
                    i_trial += 1

    df_contrast.to_json(filename)

main()

Overwriting experiment_contrast.py

In [3]:

%run -int {scriptname}

Creating file results/2021-12-08_config_args.json
On date 2021-12-08 , Running benchmark on host neo-ope-de04  with device cuda
-------------------------------
List of Pre-selected classes : 
-------------------------------
-> label 945 = bell pepper 
id wordnet :  n02056570
-> label 513 = cornet 
id wordnet :  n02058221
-> label 886 = vending machine 
id wordnet :  n02219486
-> label 508 = computer keyboard 
id wordnet :  n02487347
-> label 786 = sewing machine 
id wordnet :  n02643566
-> label 310 = ant 
id wordnet :  n03085013
-> label 373 = macaque 
id wordnet :  n03110669
-> label 145 = king penguin 
id wordnet :  n04179913
-> label 146 = albatross 
id wordnet :  n04525305
-> label 396 = lionfish 
id wordnet :  n07720875
Loaded 4002 images under test
Loaded 2088 images under val
Loaded 5331 images under train
Classes:  ['albatross', 'ant', 'bell pepper', 'computer keyboard', 'cornet', 'king penguin', 'lionfish', 'macaque', 'sewing machine', 'vending machine']
Loading pretrained model for.. vgg16_lin  from models/re-trained_vgg16_lin.pt
Loading pretrained model for.. vgg16_gen  from models/re-trained_vgg16_gen.pt
Loading pretrained model for.. vgg16_scale  from models/re-trained_vgg16_scale.pt
Loading pretrained model for.. vgg16_gray  from models/re-trained_vgg16_gray.pt
Loading pretrained model for.. vgg16_full  from models/re-trained_vgg16_full.pt
filename='results/2021-12-08_results_4_neo-ope-de04.json'

IPython CPU timings (estimated):
  User   :      28.94 s.
  System :      15.74 s.
Wall time:      18.33 s.

In [5]:

filename = f'results/{datetag}_results_4_{args.HOST}.json'
df_contrast = pd.read_json(filename)
df_contrast

Out[5]:

	model	likelihood	fps	time	label	i_label	i_image	contrast	filename	device_type	top_1
0	vgg	99.973885	20.847063	0.047968	albatross	146	0	1	data/test/albatross/0001096bb6acdc6c546229f243...	cuda	albatross
1	vgg16_lin	99.999916	147.318464	0.006788	albatross	146	0	1	data/test/albatross/0001096bb6acdc6c546229f243...	cuda	albatross
2	vgg16_gen	99.813889	149.444310	0.006691	albatross	146	0	1	data/test/albatross/0001096bb6acdc6c546229f243...	cuda	albatross
3	vgg16_scale	97.847305	153.071202	0.006533	albatross	146	0	1	data/test/albatross/0001096bb6acdc6c546229f243...	cuda	albatross
4	vgg16_gray	99.998985	147.743985	0.006768	albatross	146	0	1	data/test/albatross/0001096bb6acdc6c546229f243...	cuda	albatross
...	...	...	...	...	...	...	...	...	...	...	...
264127	vgg16_lin	95.563713	154.094713	0.006490	vending machine	886	4001	101	data/test/vending machine/ffe141944f70eaa5c09d...	cuda	vending machine
264128	vgg16_gen	99.999878	155.017334	0.006451	vending machine	886	4001	101	data/test/vending machine/ffe141944f70eaa5c09d...	cuda	vending machine
264129	vgg16_scale	100.000000	155.477036	0.006432	vending machine	886	4001	101	data/test/vending machine/ffe141944f70eaa5c09d...	cuda	vending machine
264130	vgg16_gray	99.967804	154.862797	0.006457	vending machine	886	4001	101	data/test/vending machine/ffe141944f70eaa5c09d...	cuda	vending machine
264131	vgg16_full	99.999870	155.800453	0.006418	vending machine	886	4001	101	data/test/vending machine/ffe141944f70eaa5c09d...	cuda	vending machine

264132 rows × 11 columns

Image display¶

The 64 worsts categorization likelihood, all models combined :

In [6]:

import imageio
N_image_i = 8
N_image_j = 8
fig, axs = plt.subplots(N_image_i, N_image_j, figsize=(fig_width*1.3, fig_width))
for i_image, idx in enumerate(df_contrast.sort_values(by=['likelihood'], ascending=True).head(N_image_i*N_image_j).index):
    ax = axs[i_image%N_image_i][i_image//N_image_i]
    ax.imshow(imageio.imread(image_datasets['test'].imgs[df_contrast.loc[idx]['i_image']][0]))
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_xlabel(df_contrast.loc[idx]['top_1'] + ' | ' + df_contrast.loc[idx]['model'] + ' | ' + str(df_contrast.loc[idx]['contrast']), color='r')
    label = df_contrast.loc[idx]['label']
    ax.set_ylabel(f'True= {label}', color='r')
fig.set_facecolor(color='white')

F1 score¶

And extract the f1 score for each networks :

In [7]:

from sklearn.metrics import accuracy_score, precision_score, f1_score
for contrast in np.arange(1,110,10):
    pprint(f'Benchmarking contrast = {contrast}')
    df_acc = pd.DataFrame({model_name: {subset_label: f1_score(df_contrast[(df_contrast['model']==model_name) & (df_contrast['contrast']==contrast)]["top_1"], 
                                                               df_contrast[(df_contrast['model']==model_name) & (df_contrast['contrast']==contrast)]["label"],
                                                               average = 'micro')
                                        for subset_label in subset_labels} 
                           for model_name in models_vgg.keys()})

    ax = df_acc.plot.bar(rot=60, figsize=(fig_width, fig_width//4), fontsize=18)
    ax.set_ylim(0, 1)
    ax.hlines(xmin=-.5, xmax=len(subset_labels)-.5, y=1/n_output, ls='--', ec='k', label='chance level')
    plt.legend(bbox_to_anchor=(1.1, .35), loc='lower right')
    ax.grid(which='both', axis='y')
    for side in ['top', 'right'] :ax.spines[side].set_visible(False)
    ax.set_title(f'Experiment 4 - Categorization f1 score for each network at different contrast = {contrast}', size=20)
    ax.set_ylabel('F1 score', size=14)
    plt.show();

-------------------------
Benchmarking contrast = 1
-------------------------

--------------------------
Benchmarking contrast = 11
--------------------------

--------------------------
Benchmarking contrast = 21
--------------------------

--------------------------
Benchmarking contrast = 31
--------------------------

--------------------------
Benchmarking contrast = 41
--------------------------

--------------------------
Benchmarking contrast = 51
--------------------------

--------------------------
Benchmarking contrast = 61
--------------------------

--------------------------
Benchmarking contrast = 71
--------------------------

--------------------------
Benchmarking contrast = 81
--------------------------

--------------------------
Benchmarking contrast = 91
--------------------------

---------------------------
Benchmarking contrast = 101
---------------------------

In [8]:

df_acc = pd.DataFrame({model_name: {contrast: f1_score(df_contrast[(df_contrast['model']==model_name) & (df_contrast['contrast']==contrast)]["top_1"], 
                                                         df_contrast[(df_contrast['model']==model_name) & (df_contrast['contrast']==contrast)]["label"],
                                                         average='micro')
                                    for contrast in np.arange(1,110,10)} 
                       for model_name in models_vgg.keys()})

ax = df_acc.T.plot.bar(rot=0, figsize=(fig_width, fig_width/2), fontsize=20, width=.9)
ax.set_ylim(0, 1)
ax.hlines(xmin=-4.5, xmax=len(models_vgg.keys())+4.5, y=1/n_output, ls='--', ec='k', label='chance level')
# https://matplotlib.org/stable/gallery/lines_bars_and_markers/bar_label_demo.html
for container in ax.containers: ax.bar_label(container, padding=-50, color='black', fontsize=16, fmt='%.3f', rotation=90,fontweight='bold')
plt.legend(bbox_to_anchor=(1.15, .2), loc='lower right', fontsize = 15)
ax.grid(which='both', axis='y')
for side in ['top', 'right'] :ax.spines[side].set_visible(False)
ax.set_title(f'Experiment 4 - Categorization f1 score at different contrast for each network', size=25)
ax.set_ylabel('F1 score', size=25)
ax.set_xlabel('Model', size=25)
plt.show();

Computation time¶

As there is no big difference for the models, in order to have a clearer display, we chose to expose the characteristics for VGG_Gen. Here a display of the differents computation time on the same dataset for different contrast:

In [9]:

fig, axs = plt.subplots(figsize=(fig_width, fig_width/(phi*2)))
data = df_contrast.loc[df_contrast['model'] == 'vgg16_gen']
axs = sns.violinplot(x="contrast", y="time", data=data, inner="quartile")
axs.set_title('Processing time (s) for the VGG Gen network at different contrast. Processed on : ' + args.HOST + '_' + str(df_contrast['device_type'][0]), size = 20)
axs.set_ylabel('Computation time (s)', size=18)
axs.set_xlabel('Contrast factor', size=18)
axs.set_yscale('log')
axs.grid(which='both', axis='y')
for side in ['top', 'right'] :axs.spines[side].set_visible(False)
h, l = axs.get_legend_handles_labels()
axs.legend(h[:5], l[:5], loc='upper center', fontsize=16);

Categorization likelihood¶

Again, in order to have a clearer display, we chose to expose the the categorization likelihood for VGG_Gen at different contrast.

In [10]:

fig, axs = plt.subplots(figsize=(fig_width, fig_width/(phi*2)))
data = df_contrast.loc[df_contrast['model'] == 'vgg16_gen']
axs = sns.violinplot(x="contrast", y="likelihood", data=data, inner="quartile", cut = 0, scale = 'width')
axs.set_title('Categorization likelihood for the VGG Gen network at different contrast.. Processed on : ' + args.HOST + '_' + str(df_contrast['device_type'][0]), size=20)
axs.set_ylabel('Categorization likelihood (%)', size=18)
axs.set_xlabel('Contrast factor', size=18)
ax.grid(which='both', axis='y')
for side in ['top', 'right'] :ax.spines[side].set_visible(False)
h, l = axs.get_legend_handles_labels();

Summary¶

All network's F1 score are affected by changing the contrast. However, as in experiment 3, the performance remains largely above the chance level, and this for all the re-trained networks, this distinction may from the fact that they were re-trained with a contrast transformation, which shows that our training process partly compensated the degradation of the input's images.

Mean f1 score

In [11]:

for model_name in models_vgg.keys():
    pprint(f'Benchmarking model {model_name}')
    for contrast in np.arange(1,110,10):
        df_ = df_contrast[(df_contrast['model']==model_name) & (df_contrast['contrast']==contrast)]
        mean_f1_score = f1_score(df_["top_1"] , df_["label"] , average = 'micro')
        print(f'For contrast factor {contrast}, the mean f1 score = {mean_f1_score*100:.4f} %' )

----------------------
Benchmarking model vgg
----------------------
For contrast factor 1, the mean f1 score = 71.2644 %
For contrast factor 11, the mean f1 score = 43.5782 %
For contrast factor 21, the mean f1 score = 29.5602 %
For contrast factor 31, the mean f1 score = 23.8131 %
For contrast factor 41, the mean f1 score = 20.6397 %
For contrast factor 51, the mean f1 score = 18.7156 %
For contrast factor 61, the mean f1 score = 15.4423 %
For contrast factor 71, the mean f1 score = 14.8926 %
For contrast factor 81, the mean f1 score = 14.3928 %
For contrast factor 91, the mean f1 score = 13.3433 %
For contrast factor 101, the mean f1 score = 12.8936 %
----------------------------
Benchmarking model vgg16_lin
----------------------------
For contrast factor 1, the mean f1 score = 95.6272 %
For contrast factor 11, the mean f1 score = 90.3298 %
For contrast factor 21, the mean f1 score = 86.3068 %
For contrast factor 31, the mean f1 score = 83.2584 %
For contrast factor 41, the mean f1 score = 80.9095 %
For contrast factor 51, the mean f1 score = 79.0855 %
For contrast factor 61, the mean f1 score = 78.8856 %
For contrast factor 71, the mean f1 score = 77.4363 %
For contrast factor 81, the mean f1 score = 77.3863 %
For contrast factor 91, the mean f1 score = 77.2614 %
For contrast factor 101, the mean f1 score = 76.5367 %
----------------------------
Benchmarking model vgg16_gen
----------------------------
For contrast factor 1, the mean f1 score = 95.9770 %
For contrast factor 11, the mean f1 score = 89.6802 %
For contrast factor 21, the mean f1 score = 86.0070 %
For contrast factor 31, the mean f1 score = 82.8586 %
For contrast factor 41, the mean f1 score = 80.6597 %
For contrast factor 51, the mean f1 score = 80.5347 %
For contrast factor 61, the mean f1 score = 79.2104 %
For contrast factor 71, the mean f1 score = 77.3613 %
For contrast factor 81, the mean f1 score = 78.6357 %
For contrast factor 91, the mean f1 score = 76.8866 %
For contrast factor 101, the mean f1 score = 75.8371 %
------------------------------
Benchmarking model vgg16_scale
------------------------------
For contrast factor 1, the mean f1 score = 95.6022 %
For contrast factor 11, the mean f1 score = 90.1799 %
For contrast factor 21, the mean f1 score = 87.0815 %
For contrast factor 31, the mean f1 score = 84.1329 %
For contrast factor 41, the mean f1 score = 82.5337 %
For contrast factor 51, the mean f1 score = 81.7091 %
For contrast factor 61, the mean f1 score = 80.4598 %
For contrast factor 71, the mean f1 score = 79.8601 %
For contrast factor 81, the mean f1 score = 80.2099 %
For contrast factor 91, the mean f1 score = 79.1104 %
For contrast factor 101, the mean f1 score = 79.3603 %
-----------------------------
Benchmarking model vgg16_gray
-----------------------------
For contrast factor 1, the mean f1 score = 95.8771 %
For contrast factor 11, the mean f1 score = 89.0555 %
For contrast factor 21, the mean f1 score = 83.5332 %
For contrast factor 31, the mean f1 score = 79.0855 %
For contrast factor 41, the mean f1 score = 76.4868 %
For contrast factor 51, the mean f1 score = 74.8376 %
For contrast factor 61, the mean f1 score = 74.4878 %
For contrast factor 71, the mean f1 score = 73.0885 %
For contrast factor 81, the mean f1 score = 72.6137 %
For contrast factor 91, the mean f1 score = 71.7891 %
For contrast factor 101, the mean f1 score = 70.7646 %
-----------------------------
Benchmarking model vgg16_full
-----------------------------
For contrast factor 1, the mean f1 score = 96.1769 %
For contrast factor 11, the mean f1 score = 90.2549 %
For contrast factor 21, the mean f1 score = 86.8066 %
For contrast factor 31, the mean f1 score = 83.8081 %
For contrast factor 41, the mean f1 score = 81.4843 %
For contrast factor 51, the mean f1 score = 80.8096 %
For contrast factor 61, the mean f1 score = 80.0350 %
For contrast factor 71, the mean f1 score = 78.8356 %
For contrast factor 81, the mean f1 score = 78.6607 %
For contrast factor 91, the mean f1 score = 78.2609 %
For contrast factor 101, the mean f1 score = 77.5112 %

Mean categorization likelihood

In [12]:

for model_name in models_vgg.keys():
    pprint(f'Benchmarking model {model_name}')
    for contrast in np.arange(1,110,10):
        med_likelihood = np.mean(df_contrast[(df_contrast['model']==model_name) & (df_contrast['contrast']==contrast)]["likelihood"])
        print(f'For contrast factor {contrast}, the mean clasification likelihood = {med_likelihood:.5f} %' )

----------------------
Benchmarking model vgg
----------------------
For contrast factor 1, the mean clasification likelihood = 93.42450 %
For contrast factor 11, the mean clasification likelihood = 78.65400 %
For contrast factor 21, the mean clasification likelihood = 68.43755 %
For contrast factor 31, the mean clasification likelihood = 61.33069 %
For contrast factor 41, the mean clasification likelihood = 58.24161 %
For contrast factor 51, the mean clasification likelihood = 55.11642 %
For contrast factor 61, the mean clasification likelihood = 53.10443 %
For contrast factor 71, the mean clasification likelihood = 52.19755 %
For contrast factor 81, the mean clasification likelihood = 51.43817 %
For contrast factor 91, the mean clasification likelihood = 50.23643 %
For contrast factor 101, the mean clasification likelihood = 49.22434 %
----------------------------
Benchmarking model vgg16_lin
----------------------------
For contrast factor 1, the mean clasification likelihood = 95.00309 %
For contrast factor 11, the mean clasification likelihood = 88.55016 %
For contrast factor 21, the mean clasification likelihood = 83.98296 %
For contrast factor 31, the mean clasification likelihood = 80.66875 %
For contrast factor 41, the mean clasification likelihood = 78.24489 %
For contrast factor 51, the mean clasification likelihood = 76.74437 %
For contrast factor 61, the mean clasification likelihood = 75.91672 %
For contrast factor 71, the mean clasification likelihood = 74.48677 %
For contrast factor 81, the mean clasification likelihood = 74.11826 %
For contrast factor 91, the mean clasification likelihood = 73.70028 %
For contrast factor 101, the mean clasification likelihood = 73.11205 %
----------------------------
Benchmarking model vgg16_gen
----------------------------
For contrast factor 1, the mean clasification likelihood = 94.80409 %
For contrast factor 11, the mean clasification likelihood = 88.36317 %
For contrast factor 21, the mean clasification likelihood = 83.94363 %
For contrast factor 31, the mean clasification likelihood = 80.24268 %
For contrast factor 41, the mean clasification likelihood = 77.85029 %
For contrast factor 51, the mean clasification likelihood = 76.93702 %
For contrast factor 61, the mean clasification likelihood = 76.04521 %
For contrast factor 71, the mean clasification likelihood = 74.03292 %
For contrast factor 81, the mean clasification likelihood = 74.58919 %
For contrast factor 91, the mean clasification likelihood = 73.56544 %
For contrast factor 101, the mean clasification likelihood = 72.59151 %
------------------------------
Benchmarking model vgg16_scale
------------------------------
For contrast factor 1, the mean clasification likelihood = 94.84356 %
For contrast factor 11, the mean clasification likelihood = 89.58862 %
For contrast factor 21, the mean clasification likelihood = 85.81754 %
For contrast factor 31, the mean clasification likelihood = 82.65550 %
For contrast factor 41, the mean clasification likelihood = 80.74809 %
For contrast factor 51, the mean clasification likelihood = 80.00198 %
For contrast factor 61, the mean clasification likelihood = 78.79630 %
For contrast factor 71, the mean clasification likelihood = 77.94702 %
For contrast factor 81, the mean clasification likelihood = 77.92877 %
For contrast factor 91, the mean clasification likelihood = 77.03847 %
For contrast factor 101, the mean clasification likelihood = 77.13479 %
-----------------------------
Benchmarking model vgg16_gray
-----------------------------
For contrast factor 1, the mean clasification likelihood = 94.67260 %
For contrast factor 11, the mean clasification likelihood = 87.07480 %
For contrast factor 21, the mean clasification likelihood = 81.03234 %
For contrast factor 31, the mean clasification likelihood = 76.41387 %
For contrast factor 41, the mean clasification likelihood = 74.01669 %
For contrast factor 51, the mean clasification likelihood = 72.37908 %
For contrast factor 61, the mean clasification likelihood = 71.10579 %
For contrast factor 71, the mean clasification likelihood = 69.60282 %
For contrast factor 81, the mean clasification likelihood = 69.62824 %
For contrast factor 91, the mean clasification likelihood = 68.36464 %
For contrast factor 101, the mean clasification likelihood = 67.40389 %
-----------------------------
Benchmarking model vgg16_full
-----------------------------
For contrast factor 1, the mean clasification likelihood = 94.99972 %
For contrast factor 11, the mean clasification likelihood = 88.72732 %
For contrast factor 21, the mean clasification likelihood = 84.30309 %
For contrast factor 31, the mean clasification likelihood = 80.99912 %
For contrast factor 41, the mean clasification likelihood = 78.90924 %
For contrast factor 51, the mean clasification likelihood = 77.63452 %
For contrast factor 61, the mean clasification likelihood = 76.75700 %
For contrast factor 71, the mean clasification likelihood = 75.57893 %
For contrast factor 81, the mean clasification likelihood = 75.50289 %
For contrast factor 91, the mean clasification likelihood = 74.71195 %
For contrast factor 101, the mean clasification likelihood = 74.13195 %

Final summary¶

As a summary, we have shown here that implementing transfer learning on a subset of images reaches a higher accuracy than the VGG16 network. We have also shown that transfer learning could also be used to teach the network to different perturbations of the image, for instance changes in the resolution of in the colorspace. Thus, as there is no huge differences between the VGG_Gen and VGG_Full networks on this task, we infer that the training of a single layer is sufficient to perform the categorization. Again, as there is no huge differences between the VGG_Gen and the VGG_Lin networks on this task, the architecture used for the VGG_Gen network seems sufficient in order to perform categorization. Such framework is thus promising for further applications which aim at modelling higher-order cognitive processes.

Implementing transfer learning on Vgg16 using pyTorch¶

Initialization of the libraries/variables¶

Download the train & val dataset¶

Transfer learning and dataset config¶

Training process¶

Bonus: Scan of some parameters¶

Experiment 1: Image processing and recognition for differents labels¶

Image display¶

Accuracy, Precision Recall & F1 Score¶

Evidence form accuracy and likelihood¶

Computation time¶

Summary¶

Experiment 2: Image processing and recognition for differents resolutions¶

Image display¶

F1 score¶

Computation time¶

Categorization likelihood¶

Summary¶

Experiment 3: Image processing and recognition on grayscale images¶

Image display¶

F1 score¶

Computation time¶

Categorization likelihood¶

Summary¶

Experiment 4: Image processing and recognition on contrasted images¶

Image display¶

F1 score¶

Computation time¶

Categorization likelihood¶

Summary¶

Final summary¶

Download the `train` & `val` dataset¶