{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Exercise 6: ROC curves and image retrieval\n",
        "\n",
        "In order to complete the exercise you have to present these files to the teaching assistant. Some assignments contain questions that require sketching, writing or manual calculation. Write these answers down and bring them to the presentation as well. The tasks that are written in bold are optional. Without completing them you can get at most 75 points for the exercise (the total number of points is 100 and results in grade 10). Each optional task has the amout of additional points written next to it, sometimes there are more optional exercises and you do not have to complete all of them."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Introduction \n",
        "\n",
        "The exercise includes two assignments: In the first one you will get to know ROC curves that can be used to evaluate and compare classifiers and retrieval systems. In the second assignment you will implement and test several image retrieval systems that operate on different features and evalue them using ROC analysis."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {},
      "outputs": [],
      "source": [
        "%matplotlib notebook\n",
        "\n",
        "# Importing necessary libraries\n",
        "import numpy as np\n",
        "import os\n",
        "\n",
        "# We will be using skimage for image manipulation and MatPlotLib for drawing\n",
        "from skimage import data, io, transform\n",
        "from matplotlib import pyplot as plt\n",
        "\n",
        "# You will be using PyTorch for deep learning feature extraction\n",
        "import torch\n",
        "import torch.nn as nn\n",
        "from torchvision import models, transforms\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Assignment 1: The ROC analysis\n",
        "\n",
        "The purpose of this assignment is to learn the theory and practical use of ROC analysis. Therefore you should first read <a href=\"https://www.sciencedirect.com/science/article/pii/S016786550500303X\">a paper about ROC curves</a> (*if you do not have access to ScienceDirect from home, then you can use <a href=\"https://box.vicos.si/borja/mms/Introduction_to_ROC_Analysis.pdf\">this link</a> to obtain the paper*). While reading, pay special attention to sections $1$--$5$ and $7$--$8$. In the following tasks you will work on a given theoretical example of two classifiers ($C_1$ and $C_2$).\n",
        "\n",
        "We have a set of samples that we wish to classify in one of two classes and a ground truth class of each sample (denoted as $0$ and $1$). For each sample a classifier gives us a score based on which we can determine to which class should the sample belong to (score closer to $0$ means class $0$, score closer to $1$ means class $1$). Below are the results for $8$ samples, their ground truth values ($\\xi_\\mathrm{id}$) and the score values for both classifiers ($\\xi_{C_1}$ and $\\xi_{C_2}$).\n",
        "\n",
        "\\begin{equation}\n",
        "    \\begin{array}{*{20}c}\n",
        "            {\\xi_\\mathrm{id} = } & {\\left[ {} \\right.} & 1 & 0 & 1 & 1 & 1 & 0 & 0 & 0 & {\\left. {} \\right]}  \\\\\n",
        "            {\\xi_{C_1} = } & {\\left[ {} \\right.} & {0.5} & {0.3} & {0.6} & {0.22} & {0.4} & {0.51} & {0.2} & {0.33} & {\\left. {} \\right]}  \\\\\n",
        "            {\\xi_{C_2} = } & {\\left[ {} \\right.} & {0.04} & {0.1} & {0.68} & {0.22} & {0.4} & {0.11} & {0.8} & {0.53} & {\\left. {} \\right]}  \\\\\n",
        "    \\end{array}\n",
        "\\end{equation}"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "a) For the example above calculate and draw the ROC curves (by hand) for classifier $C_1$ as well as classifier $C_2$. Also calculate the area under the curve (AUC) for both classifiers. For the classifier $C_1$ select a decision threshold (working points) $\\vartheta_{th1}=0.33$ and use it to calculate the confusion matrix and the $F$ measure score. Do the same thing for the classifier $C_2$ using a threshold value $\\vartheta_{th2}=0.1$.\n",
        "\n",
        "<b>Question:</b> Based on <a href=\"https://www.sciencedirect.com/science/article/pii/S016786550500303X\">Fawcett theory</a> decide which classifier is better in the selected working points and motivate your decision. Which working point is optimal for each classifier?"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# TODO"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "b) You will now calculate ROC curves for <i>combined classifiers</i>. We have two binary classifiers, $C_1$ ($\\vartheta_{th1}=0.33$) and $C_2$ ($\\vartheta_{th2}=0.1$), which means that the classifier $C_1$ classifies a sample as class $1$, if its score is $\\xi_{C_1}(\\mathbf{x}_i) > \\vartheta_{th1}$, otherwise it classifies it as class $0$, similarly can be said about classifier $C_2$. The first combined classifier $C_3$ can be obtained as the intersection of decisions of the two basic classifiers, $C_3 = C_1 \\bigwedge C_2$ ($C_3$ classifies a sample as class $1$ if both basic classifiers classify it as class $1$). The second one can be obtained as an union of the decisions of the two basic classifiers, $C_4 = C_1 \\bigvee C_2$ ($C_4$ classifies a sample as class $1$ if at least one of the basic classifiers classifies it as class $1$). For each combined classifier calculate and plot its point in the ROC space together with the ROC curves from previous tasks."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# TODO"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "c) Implement function <b>get_roc(scores, groundtruth)</b> that expects a vector <b>scores</b> ($\\xi_\\mathrm{score}$) and a vector <b>groundtruth</b> ($\\xi_\\mathrm{id}$) as an input and returns a ROC curve in matrix <b>R</b> as well as the area under the curve value in variable <b>AUC</b>. The ROC curve should be encoded as a $2 \\times N$ matrix, the first row includes <i>true positive rate</i> values, the second row includes <i>false positive rate</i> values. You should study the <a href=\"https://www.sciencedirect.com/science/article/pii/S016786550500303X\">Algorithm 1</a> described in the given paper for the implementation. Validate the correctness of the algorithm by computing the ROC curve on the data from the first task."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# TODO: your get_roc(scores, groundtruth) implementation\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# TODO"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "d) <a href=\"https://www.sciencedirect.com/science/article/pii/S016786550500303X\">Fawcett et al.</a> explains how to select an optimal working point for a classifier from a ROC curve. The threshold should be at the point on the curve that is the closest to the point $[0,1]$ in the ROC space. Extend your function <b>get_roc</b> so that it will also calculate the optimal working point for the ROC curve and that it will return its position and the corresponding threshold value and $F$ score value as additional results. Test your code on classifiers $C_1$ and $C_2$. Based on the AUC decide which classifier is performing better over all threshold values, based on the $F$ score at the optimal working point decide which classifier is better at this point. Write a script that calculates and draws ROC curves for classifiers $C_1$ and $C_2$ on the same plot, for both classifiers mark the optimal working point and set the title of the plot to display values of the thresholds, $F$ scores and AUC for both classifiers."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# TODO"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Assignment 2: Image retrieval using histograms and correlation\n",
        "\n",
        "In this assignment you will implement several image retrival systems. The input to the system is a query image and the system should return the images in the database sorted by similarity to the query image. For each approach you will have to extract features from all the images, then compare your query image to all of the database images.\n",
        "You will also have to compare the performance of different features over all the images in your dataset by calcualting an $N \\times N$ similarity matrix.\n",
        "\n",
        "You will test your retrieval system on a dataset that is based on the <a href=\"http://www.vision.caltech.edu/Image_Datasets/Caltech101/\">Caltech 101 dataset</a>. It consists of approximately 5000 color images from 83 classes (some classes have been removed, others have been reduced in size to make all classes more balanced). You can use the funtion <b>prepare_dataset</b> to extract the images and their corresponding classes. The function returns a list of images along with the corresponding class labels. You can modify it to only extract (or ignore) specific classes and/or to resize the images for faster processing if you wish. For developing the system you are advised to use a smaller subset of classes with a smaller number of samples to speed up the computation (or alternatively, a subset of easier classes, e.g. <i>accordion</i>, <i>faces</i>, <i>strawberry</i> etc.)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Run this cell to download the data used in this exercise - the image dataset\n",
        "import zipfile, urllib.request, io\n",
        "zipfile.ZipFile(io.BytesIO(urllib.request.urlopen(\"http://data.vicos.si/lukacu/multimedia/exercise6.zip\").read())).extractall()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {},
      "outputs": [],
      "source": [
        "def prepare_dataset(n_samples = 10):\n",
        "    # Loads images from caltech_101 dataset and stores their classes\n",
        "    # uses n_samples images from each class\n",
        "    # only uses color images from each class\n",
        "    \n",
        "    # List of images\n",
        "    images_list = []\n",
        "    \n",
        "    # Corresponding list of classes\n",
        "    classes_list = []\n",
        "    \n",
        "    categories = os.walk('images')\n",
        "\n",
        "    is_first = 1\n",
        "    for subdir_info in categories:\n",
        "        if(is_first):\n",
        "            is_first = 0\n",
        "            \n",
        "        else:\n",
        "            dir_name = subdir_info[0].split('/')\n",
        " \n",
        "            images = [f for f in os.listdir(subdir_info[0]) if os.path.isfile(os.path.join(subdir_info[0], f))]\n",
        "\n",
        "            counter = 0\n",
        "            for image in images:\n",
        "                data = io.imread(os.path.join(subdir_info[0], image))\n",
        "                if(len(data.shape) == 3):\n",
        "                    if(counter < n_samples):\n",
        "                        images_list.append(image)\n",
        "                        classes_list.append(dir_name[-1])\n",
        "                        counter += 1\n",
        "                    \n",
        "    return images_list, classes_list"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "a) Implement a classifier based on color histograms. Write a function that computes 3D color histograms for all images in your database. Use the numpy function <b>histogramdd</b> in the RGB color space, then reshape the $3$D histograms to $1$D histograms and stack them together in a $2$D matrix (the matrix is composed in a way that the $i$-th row includes the histogram of the $i$-th image)..\n",
        "\n",
        "To compute the distance between the reference histogram and every other histogram in the database you will use Hellinger distance that is defined as:\n",
        "\n",
        "\\begin{equation}\n",
        "H(\\mathbf{h}_1,\\mathbf{h}_2) = \\sqrt{  \\frac{1}{2} \\sum_{i=0}^{N-1} \\Big( \\sqrt{h_1(i)} - \\sqrt{h_2(i)} \\Big)^2 }.\n",
        "\\end{equation}\n",
        "\n",
        "Note that low values of Hellinger distances signify high similarity while high values signify low similarity. This is exactly the opposite to what is expected by your ROC curve function that you will use in the following tasks. This can be fixed by redefining the histogram distance measure. If we assume that $H(\\mathbf{h}_1,\\mathbf{h}_2)$ denotes the Helliner distance between histograms $\\mathbf{h}_1$ and $\\mathbf{h}_2$, we can define the new distance simply as\n",
        "\n",
        "\\begin{equation}\n",
        "\\rho(\\mathbf{h}_1,\\mathbf{h}_2) = 1 - H(\\mathbf{h}_1,\\mathbf{h}_2).\n",
        "\\end{equation}\n",
        "\n",
        "Write a script that tests your system by loading the database (use $8$ bins per color channel), using the fifth image in the database as a reference image. Compute the distances to all other images, sort the distances and display the first five matches and the reference image in a same figure."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Example usage of \"histogramdd\" function:\n",
        "#\n",
        "# Load an image\n",
        "example_image = io.imread(\"images/airplanes/image_0001.jpg\")\n",
        "# Reshape image of size (h, w, 3) to (h * w, 3)\n",
        "example_image_reshaped = np.reshape(example_image, (example_image.shape[0] * example_image.shape[1], 3))\n",
        "# Compute histograms for each color channel (and normalise it)\n",
        "H, _ = np.histogramdd(example_image_reshaped, bins = 8, normed = True)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 194,
      "metadata": {},
      "outputs": [],
      "source": [
        "# TODO"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "b) Implement a system that uses normalized cross-correlation of grayscale images. The normalized cross-correlation between two sequences of same size $\\mathbf{X}$ and $\\mathbf{Y}$, denoted as $NCC(\\mathbf{X},\\mathbf{Y})$ is defined as scalar product between normalized sequences:\n",
        "\n",
        "\\begin{equation}\n",
        "NCC(\\mathbf{X},\\mathbf{Y}) = \\frac{1}{N} \\frac{\\sum  (x_i - \\bar{x}) (y_i - \\bar{y})}{ \\Big(\\sqrt{ \\frac{1}{N} \\sum (x_i - \\bar{x})^2 } \\Big) \\Big(\\sqrt{ \\frac{1}{N} \\sum (y_i - \\bar{y})^2 } \\Big) }.\n",
        "\\end{equation}\n",
        "\n",
        "where $\\bar{x}$ and $\\bar{y}$ denote the mean values of the elements in the sequences. Sequences are more similar if the correlation is higher. We can compute normalized cross-correlation for grayscale images of same size if we reshape them into vectors of intensity values. Repeat the testing of the system in the same manner than in the previous task, load images, convert them to grayscale, select a reference image, compute the distances and display the first five matches. What is the system sensitive to?\n",
        "\n",
        "More information is available <a href=\"http://en.wikipedia.org/wiki/Cross-correlation\">here</a>."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# TODO"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "c) Implement a retrieval system that uses CNN features to calculate image similarity. You will need the <b>PyTorch</b> library and a pretrained neural network of your choice. Note that architectures of the models are different and you need to check the model documentation to find out which layer contains useful features (but generally it is the penultimate layer). Its features should be one dimensional vectors that you can compare using the Hellinger distance. Plot the ROC curve for the system and comment on its performance related to the NCC and color histogram systems. Help yourself with the sample code below."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Get pretrained ALEXNET model (you can also use other models)\n",
        "# Full list of pretrained models is available at: https://pytorch.org/docs/stable/torchvision/models.html\n",
        "model = models.alexnet(pretrained=True)\n",
        "\n",
        "# Remove last fully-connected layer (Second to last layer usually contains useful features)\n",
        "new_classifier = nn.Sequential(*list(model.classifier.children())[:-1])\n",
        "model.classifier = new_classifier\n",
        "\n",
        "\n",
        "# Read an image (we will use PIL library for loading the image)\n",
        "image = io.imread(\"images/airplanes/image_0001.jpg\")\n",
        "\n",
        "# Image transforms (for preprocessing)\n",
        "# Note: You must check the documentation of your chosen model to see what kind of \n",
        "#       preprocessing procedure is required   \n",
        "transform = transforms.Compose([\n",
        "    transforms.ToPILImage(),\n",
        "    transforms.Resize(256), # Resize the image to 256x256\n",
        "    transforms.CenterCrop(224), # Central crop the image to 224x224\n",
        "    transforms.ToTensor(), # Convert it to a tensor\n",
        "    transforms.Normalize(mean=[0.485, 0.456, 0.406], # Subtract mean value\n",
        "                         std=[0.229, 0.224, 0.225])  # And normalize it using the std value\n",
        "])\n",
        "\n",
        "# Preprocess the input image and prepare a batch to be passed to the model\n",
        "image_preprocessed = transform(image)\n",
        "inference_batch = torch.unsqueeze(image_preprocessed, 0)\n",
        "\n",
        "# Put the model in EVAL (inference) mode\n",
        "model.eval()\n",
        "\n",
        "# Perform the inference on the given batch\n",
        "feature_vector = model(inference_batch)\n",
        "\n",
        "# Show values of the feature_vector tensor\n",
        "print(feature_vector)\n",
        "# Show the shape of the feature_vector tensor\n",
        "# Note: It should be a one dimensional tensor/vector\n",
        "print(feature_vector.shape)  \n",
        "\n",
        "# Convert tensor vector to numpy\n",
        "# Use this to check image similarities\n",
        "feature_vector_numpy = feature_vector.detach().numpy()\n",
        "\n",
        "# Uncomment to print the non-truncated feature vector\n",
        "# with np.printoptions(threshold=np.inf):\n",
        "#     print(feature_vector_numpy)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# TODO"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "d) The image retrieval systems can currently return distances between each image in the database to the reference image. If we want to build a binary classifier we have to determine the optimal threshold that can be used to decide if a classifier belongs to the class of the reference image or not. We will use ROC analysis that you have implemented in the previous assignment. Use the ROC curves to determine optimal threshold. You can use an image in the database as a reference image and compute distances to every other image in the dataset. If we select the first image we can generate a ROC curve for that image, however, this curve only tells us the properties of the system for this input and may not be generalizable. If we select a different image we can get a completely different ROC curve with a different optimal point. Instead, use the following procedure to compute the ROC curve over all images in the database.\n",
        "\n",
        "<ol>\n",
        "    <li>Load the database.</li>\n",
        "    <li>For each image in the database:\n",
        "        <ol>\n",
        "            <li>Use the selected image as a reference image.</li>\n",
        "            <li>Compute the distance between the reference image and the remaining images in the database and store the distances to vector $\\vartheta_\\mathrm{score}$<br>(the vector must not include distance to the reference image itself).</li>\n",
        "            <li>Use the provided label ground truth data to compose a vector of binary ground truth for the given reference image by comparing it to the class of the reference image. Save the binary ground truth to vector $\\vartheta_\\mathrm{class}$.</li>\n",
        "            <li>Extend the overall vectors for scores and ground truth by appending the new data: $\\varphi_\\mathrm{class} = \\varphi_\\mathrm{class} + \\vartheta_\\mathrm{class}$ and $\\varphi_\\mathrm{score} = \\varphi_\\mathrm{score} + \\vartheta_\\mathrm{score}$.</li>\n",
        "        </ol>\n",
        "    <li>Use the composed overall vectors $\\varphi_\\mathrm{score}$ and $\\varphi_\\mathrm{class}$ to compute the ROC curve.</li>\n",
        "</ol>\n",
        "\n",
        "Evaluate all three systems by plotting their ROC curves and compare them. Determine the optimal threshold value, plot it in the ROC space (on top of the ROC curve). Do not forget to label the axes of the plot."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# TODO: write your code here"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "e) <b>(5 points)</b> Extend the retrieval method based on color histograms by including spatial information. Divide the input image into subregions, calculate the histogram for reach region, then concatenate the histograms into a final feature vector. Experiment with different numbers of subregions and comment on the performance of the new method."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# TODO: write your code here"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "f) **(10 points)** Compare performance of several deep descriptors (e.g. ResNet family, VGG) using ROC analysis. Download pretrained networks and extract the feature extraction part of the model (remove the fully-connected layers at the end). It is a good idea to pre-compute the features for parts of the dataset and store them on the disk, then load them for further analysis to save time."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# TODO: write your code here"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "f) <b>(15 points)</b> Explore options of speeding up retrieval using space partitioning tecniques, e.g. hierarchical clustering. Demonstrate the benefit of your method by counting an average number of vector comparisons as well as the retrieval performance (you do not have to use ROC here, simply a top N elements accuracy). "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# TODO: write your code here"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "g) <b>(20 points)</b> Implement a bag-of-words (BoW) approach for image retrieval. First, extract local features from each image (you can use SIFT or a similar descriptor found in OpenCV library), then generate a codebook via a clustering method (as described <a href=\"https://en.wikipedia.org/wiki/Bag-of-words_model_in_computer_vision#Codebook_generation\">here</a>). The image feature vectors are then histograms of the codewords and can be used for comparison. You do not have to use the entire library to build a dictionary, use just a quater.\n",
        "\n",
        "See <a href=\"https://arxiv.org/abs/1304.5168\">the paper</a> for further info on implementation details for BoW image retrieval."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {},
      "outputs": [],
      "source": [
        "# TODO: write your code here"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3 (ipykernel)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.9.7"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}