{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Exercise 1: The basics of image processing\n", "=======================\n", "\n", "To complete the exercise, follow the instructions, complete the missing code and write the answers where required. All tasks, except the ones marked with **(N points)**, are mandatory. The optional tasks require more independent work and some extra effort. Without completing them you can get at most 75 points for the exercise (the total number of points is 100 and results in grade 10). Sometimes there are more optional tasks than needed and you do not have to complete all of them, since you can get at most 100 points.\n", "\n", "If you have not used Python, IPython and the Jupyter environment before, take a look at the following list of introductory tutorials:\n", "\n", " * [Introduction to Python 3](https://realpython.com/python-introduction/)\n", " * [Useful IPython facts](https://ipython.org/ipython-doc/3/interactive/tutorial.html)\n", " * [Introduction to Jupyter notebooks](https://www.dataquest.io/blog/jupyter-notebook-tutorial/)\n", " * [Introduction to NumPy, SciKit, MatPlotLib](https://cs231n.github.io/python-numpy-tutorial/)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# First, run this cell to download the data used in this exercise\n", "import zipfile, urllib.request, io\n", "zipfile.ZipFile(io.BytesIO(urllib.request.urlopen(\"http://data.vicos.si/lukacu/multimedia/Exercise1.zip\").read())).extractall()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Assignment 1: Basic image processing\n", "----------\n", "\n", "The aim of this assignment is to familiarize yourself with the basic functionality of SciKit, NumPy and MatPlotLib, as well as the use of matrices for storing image information. In this assignment, you will try to load an image, display it and manipulate its content with NumPy operations." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * Read the image from the file `umbrellas.jpg` and display it using the function `matplotlib.pyplot.imshow`. The image that you have loaded consists of three channels (Red, Green, and Blue), and is represented as a 3-D matrix with dimensions height × width × channels." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Use this IPython command to enable interactive plots (without it they will be rendered only as static images)\n", "%matplotlib notebook\n", "\n", "# We will be using the SciKit-Image library for image IO and image processing and MatPlotLib \n", "# for visualization in the notebook.\n", "from skimage import data, io\n", "from matplotlib import pyplot as plt\n", "\n", "image = io.imread(\"umbrellas.jpg\")\n", "plt.imshow(image) # Draw the image\n", "plt.show() # Display the image" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * You can query the dimensions of the matrix using the property `shape`. Observe that a color image has three layers (third dimension), while a grayscale image has only one layer and is also missing the third dimension. Also check the `dtype` of the matrix; by default, images are represented using the `uint8` type (unsigned integers in range 0 to 255)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(image.shape)\n", "\n", "gray = io.imread(\"phone.jpg\")\n", "print(gray.shape)\n", "\n", "print(image.dtype)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * Convert the color image into a grayscale one by averaging all three channels. Be careful when visualizing single channel images as they are not always visualized as grayscale. It is important to correctly set the colormap of the plot." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We will be using numpy library for low-level matrix operations.\n", "import numpy as np\n", "\n", "gray = np.mean(image, 2) # Function numpy.mean converts image to float64\n", "print(gray.dtype) # Display the data type of matrix\n", "\n", "plt.figure()\n", "plt.imshow(gray.astype(np.uint8), cmap = plt.cm.gray) # If we want to display image correctly we have to cast it to uint8 again.\n", "plt.show()\n", "\n", "plt.figure()\n", "plt.imshow(gray.astype(np.uint8), cmap = plt.cm.jet) # Change colormap to interpret values differently.\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * Cut out a rectangular sub-image, and display it as a new image. Mark the same region in the original image by setting its third (the blue) color channel to 0, and display the modified image." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cutout = image[100:200, 100:200, 1]\n", "plt.imshow(cutout)\n", "plt.show()\n", "\n", "# TODO: set a part of the image to 0 using range-indexing notation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * Display a grayscale image that has the selected region negated (its values are inverted)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO: use the same region as before. You can use matrix-scalar operations to achieve negation (255 - A)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * Compute and display a thresholded binary image. Thresholding is an image operation that produces a binary image (mask) of the same size as the source image; its values are 1 (true) and 0 (false), depending on whether the corresponding pixels in the source image have values greater or lower than the specified threshold. Use a threshold of 150, and display the resulting image. 
Experiment with different thresholds and write down your observations." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "image = io.imread(\"umbrellas.jpg\")\n", "\n", "gray = np.mean(image, 2)\n", "\n", "_, ax = plt.subplots(1, 3, figsize=(9, 4)) # Divide plot into a grid of axes, use each handle to draw on subplot\n", "\n", "ax[0].imshow(image)\n", "ax[1].imshow(gray, cmap=plt.cm.gray)\n", "\n", "thresholded = gray > 150\n", "ax[2].imshow(thresholded, cmap=plt.cm.gray)\n", "plt.show()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Assignment 2: Histograms\n", "----\n", " \n", "In this assignment, we will take a look at the construction of histograms. Histograms are a very useful tool in image analysis; as we will be using them extensively in the later exercises, it is recommended that you pay extra attention to how they are built. In this assignment, we will focus on the construction of histograms for single-channel (grayscale)\n", "images. We will use the function `skimage.exposure.histogram`, which has some useful modes. Check the documentation for the arguments `nbins`, `source_range` and `normalize` and write down what they do. What is the meaning of the variables returned by the function?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from skimage.exposure import histogram\n", "\n", "image = np.mean(io.imread(\"umbrellas.jpg\"), 2).astype(np.uint8)\n", "hvalues, hbins = histogram(image, nbins=256, source_range='dtype', normalize=False)\n", "\n", "print(hvalues)\n", "print(hbins)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * Compute histograms for more than one image of your choice and visualize the results using `matplotlib.pyplot.bar`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "_, ax = plt.subplots(2, 1, figsize=(4, 8))\n", "\n", "ax[0].imshow(image, cmap = plt.cm.gray)\n", "ax[1].bar(hbins, hvalues) # Use this function to draw a bar graph\n", "plt.show()\n", "\n", "# TODO: add at least two more images and their histograms to the left of the figure." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * For this task, you will implement a simple image operation called `histogram stretching`. Using the pseudo-code provided below, implement the function `histstretch`, which performs histogram stretching on the input grayscale image. For the outline of the algorithm, consult the slides from the lectures. As we are performing the same operation (with the same factors) on all image elements, the operation can be sped up by using matrix operations, which process the whole image at once. Do not use any existing function to simplify your work; this assignment should test your skills a bit.\n", " \n", " Hints: The maximum and the minimum grayscale value in the input image can be determined using the functions `np.max` and `np.min`.\n", " \n", " Test the function by writing a script that reads an image from the file `phone.jpg` (note that it is already a grayscale image), computes its histogram with 256 bins, and displays it. As you can observe from the histogram, the lowest grayscale value in the image is not 0, and the highest value is not 255. Perform the histogram stretching operation and visualize the results (display the image and plot its 256-bin histogram)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def histstretch(image):\n", " # TODO:\n", " # 1. determine the minimum and maximum value of I\n", " # 2. the minimum and maximum of the output image S are known\n", " # 3. 
use the stretch formula to compute the new value for each pixel\n", " pass\n", " \n", "# TODO: write code to test your function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Assignment 3: Color spaces\n", "------\n", " \n", "The color information can be encoded using different color spaces, with each color space having its own characteristics. This assignment will demonstrate how a relatively simple conversion between the RGB and the HSV color spaces helps us achieve interesting results." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * Read the image from the file `trucks.jpg`. Display the image on screen, both as a color image in the RGB color space, and each of its channels as a separate grayscale image." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "image = io.imread(\"trucks.jpg\")\n", "\n", "_, ax = plt.subplots(1, 4, figsize=(9, 3))\n", "\n", "ax[0].imshow(image)\n", "ax[0].set_title(\"Color\")\n", "ax[1].imshow(image[:, :, 0], cmap = plt.cm.gray)\n", "ax[1].set_title(\"Red\")\n", "ax[2].imshow(image[:, :, 1], cmap = plt.cm.gray)\n", "ax[2].set_title(\"Green\")\n", "ax[3].imshow(image[:, :, 2], cmap = plt.cm.gray)\n", "ax[3].set_title(\"Blue\")\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * Convert the image from the RGB color space to the HSV color space, using the built-in function `skimage.color.rgb2hsv`, and display each channel as a separate grayscale image. When working with the resulting matrices, take note of their type; the original image in the RGB color space is stored in a matrix of type `uint8`, while the converted image is stored in a matrix of type `float64` (real values in range 0 to 1). How do you interpret the channels of the HSV color space with respect to the original RGB channels?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from skimage.color import rgb2hsv\n", "\n", "rgbimage = io.imread(\"trucks.jpg\")\n", "\n", "hsvimage = rgb2hsv(rgbimage)\n", "\n", "_, ax = plt.subplots(1, 5, figsize=(9, 3))\n", "\n", "print(rgbimage.dtype, hsvimage.dtype) # Compare the types of the original and the converted image\n", "\n", "ax[0].imshow(rgbimage)\n", "ax[0].set_title(\"Color (RGB)\")\n", "ax[1].imshow(hsvimage[:, :, 0], cmap = plt.cm.gray)\n", "ax[1].set_title(\"Hue\")\n", "ax[2].imshow(hsvimage[:, :, 1], cmap = plt.cm.gray)\n", "ax[2].set_title(\"Saturation\")\n", "ax[3].imshow(hsvimage[:, :, 2], cmap = plt.cm.gray)\n", "ax[3].set_title(\"Value\")\n", "ax[4].imshow(hsvimage) # Do not do this, the image will be interpreted as RGB and the result will look strange.\n", "ax[4].set_title(\"Color (HSV as RGB)\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * Different color spaces are also useful when we wish to threshold the image. For example, in the RGB color space, it is difficult to determine regions that belong to a certain shade of a color. To demonstrate this, load the image from file `trucks.jpg`, and threshold its blue channel with the threshold value of 200. Display the original and the thresholded image next to each other." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * If we want to extract a custom color shade, it is much more intuitive to convert the image to the HSV color space. Modify your script so that the image is converted from the RGB to the HSV color space, and perform thresholding on the Hue channel. As the blue color occupies only a limited portion of the hue value range, we need to apply two thresholds - an upper and a lower one. 
For numpy matrices this can be done by applying a logical function of two masks (obtained with two different thresholds), e.g., `AB = np.logical_and(A, B)`.\n", "\n", " Experiment with different threshold values to find the optimal ones, and display the resulting thresholded image. To display the masked color in an easily interpretable way, you can use the function `isolate_color` that is included in the exercise material.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# To determine the threshold values for the blue color, you can use the following code snippet,\n", "# which displays the color spectrum corresponding to the whole hue component.\n", "\n", "plt.figure()\n", "plt.imshow(np.meshgrid(np.linspace(0, 1, 255), np.ones((10, 1)))[0], cmap=plt.cm.hsv)\n", "plt.show()\n", "\n", "# TODO" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **(5 points)** How would you make the thresholding in the HSV color space more robust? Hint: Why is it difficult to determine the hue for some regions? Could you use additional information to extract such regions? Verify your solution by implementing it.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * **(5 points)** How would you use the HSV color space to perform the histogram stretching operation to improve the contrast, but without distorting the colors? Find a color image with weak contrast and write a script that demonstrates your solution on it. Use `skimage.exposure.equalize_hist` to improve the image, but note that the function only works correctly on grayscale images, so an extra step is required to achieve the correct adjustment." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO\n", "\n", "from skimage.exposure import equalize_hist\n", "from skimage.color import hsv2rgb\n", "\n", "rgbimage = io.imread(\"umbrellas.jpg\")\n", "\n", "hsvimage = rgb2hsv(rgbimage)\n", "\n", "hsvimage[:, :, 2] = hsvimage[:, :, 2] / 0.1\n", "rgbimage = hsv2rgb(hsvimage)\n", "eimg = equalize_hist(rgbimage)\n", "\n", "plt.figure()\n", "plt.imshow(eimg)\n", "plt.show()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Assignment 4: Homography\n", "-----\n", " \n", "A homography is a bijective transformation between two projective spaces, in our case planes; the first plane is the source image plane, while the second plane is defined by input points that denote an area in which we wish to embed the source image. A homography is described by a matrix; in the case of a transformation between two planes, it has dimension\n", "3 × 3." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * Write a script in which you read and display the image from file `monitor.jpg`, and determine a polygon of four points. It is recommended that you pick the four points that correspond to the corners of the monitor in the image. Determine a suitable order of points (for example, begin at the top-right corner and continue in the counter-clockwise direction). Afterwards, display a polygon that is defined by the selected points (e.g. set the pixels within the polygon to white as shown in the example below). 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "image = io.imread(\"monitor.jpg\")\n", "\n", "p2 = [(50, 50), (400, 50), (500, 500), (40, 400)] # TODO: change destination point coordinates here\n", "fig = plt.figure()\n", "plt.imshow(image)\n", "xs, ys = zip(*p2) # Convert points to vectors of x and y coordinates (handy function in Python)\n", "plt.fill(xs, ys, edgecolor='r', fill=False)\n", "plt.show()\n", "\n", "def onclick(event):\n", "    value = str(event) # Text description of the click event, including its coordinates\n", "    print(value)\n", "\n", "# Keep a hard reference to the callback so that it is not cleared by the garbage collector\n", "ka = fig.canvas.mpl_connect('button_press_event', onclick)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * Create a script that reads an image and uses its dimensions and the selected points to compute the homography, using the function `estimate_homography`. This function returns the homography matrix H, which can be used to transform pixel coordinates from the source image plane using the following formula:\n", "\\begin{equation}\n", "\\label{eq:homography1}\n", "p^\\prime_b = \\mathbf{H}_{ab}p_a \\,\n", "\\end{equation}\n", "with the following individual parts of the equation:\n", "\\begin{equation}\n", "\\label{eq:homography2}\n", "p_a = \\begin{bmatrix} x_a\\\\y_a\\\\1\\end{bmatrix}, p^\\prime_b = \\begin{bmatrix} w^{\\prime}x_b\\\\w^{\\prime}y_{b}\\\\w^{\\prime}\\end{bmatrix}, \\mathbf{H}_{ab} = \\begin{bmatrix} h_{11}&h_{12}&h_{13}\\\\h_{21}&h_{22}&h_{23}\\\\h_{31}&h_{32}&h_{33} \\end{bmatrix}.\n", "\\end{equation}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def estimate_homography(p1, p2):\n", "    A = np.zeros((8,9))\n", "    # Linear system for the homography: two rows per point pair\n", "    for i in range(4): # Using the corners\n", "        A[i*2,:] = [ p1[i][0], p1[i][1], 1, 0, 0, 0, -p2[i][0]*p1[i][0], -p2[i][0]*p1[i][1], -p2[i][0] ]\n", "        A[i*2+1,:] = [0, 0, 0, p1[i][0], p1[i][1], 1, -p2[i][1]*p1[i][0], -p2[i][1]*p1[i][1], -p2[i][1] ]\n", "\n", "    # The solution is the right singular vector that belongs to the smallest singular value\n", "    [U,S,V] = np.linalg.svd(A)\n", "    return np.reshape(V[-1,:],(3,3))\n", "\n", "p1 = [(0, 0), (100, 0), (100, 100), (0, 100)] # Some dummy source points\n", "\n", "H = estimate_homography(p1, p2)\n", "\n", "print(H)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * As can be seen from the equations above, the resulting point $p^\\prime_b$ is in homogeneous form, i.e. before we can use its coordinates, they need to be divided by $w^{\\prime}$. The homography equation allows you to transform the coordinates of each pixel from the original image to its destination coordinates in the target image. Replace the pixels in the target image with the corresponding pixels from your source image.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "destination = io.imread(\"monitor.jpg\")\n", "source = io.imread(\"trucks.jpg\")\n", "\n", "p1 = [(0, 0), (source.shape[1], 0), (source.shape[1], source.shape[0]), (0, source.shape[0])]\n", "print(p2)\n", "H = estimate_homography(p1, p2)\n", "\n", "# * Iterate over the source image and map every source pixel to a destination one:\n", "# * Generate its homogeneous coordinate and project it using matrix H\n", "# * Obtain coordinates in the destination image by normalizing x and y with w and rounding\n", "# * Copy values from source to destination (be careful about the out-of-image coordinates)\n", "# * Display the resulting image\n", "\n", "# TODO" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " * **(10 points)** What is the problem with the pixel mapping approach that we have used in the previous task? Write a better pixel mapping, as discussed at the lectures, that does not have the same problems." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Introspection\n", "----\n", "\n", "Edit this cell to write your thoughts about the exercise: where you had problems and what your findings were. It is mandatory that you write at least a few sentences." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }