# Exercise 3: Audio Processing

To complete the exercise, follow the instructions, fill in the missing code and write the answers where required. All tasks are mandatory except those marked with (N points). The optional tasks require more independent work and some extra effort. Without completing them you can get at most 75 points for the exercise (the total number of points is 100, which corresponds to grade 10). Sometimes there are more optional tasks than you need; you do not have to complete all of them, and the total is capped at 100 points.

In this exercise, you will generate simple sounds, vary their parameters and perform frequency analysis. You will also familiarize yourself with basic audio filtering and effects.


In [8]:
%matplotlib notebook

import scipy
import numpy as np
import matplotlib.pyplot as plt

# Import library for sound visualization
import IPython.display as ipd
# Import librosa to work with sound
import librosa as lb
import librosa.display as lbd
import librosa.feature as lbf
import soundfile

In [2]:
# Run this cell to download the data used in this exercise
import zipfile, urllib.request, io
zipfile.ZipFile(io.BytesIO(urllib.request.urlopen("http://data.vicos.si/lukacu/multimedia/exercise3.zip").read())).extractall()

## Assignment 1: Generating sounds

The first assignment focuses on generating simple waveforms, plotting them and playing
them through speakers. It consists of four subtasks, the last of which is optional.

a) Generate a sine wave and plot it. The sine wave is a function of time

\begin{equation}
f(t) = A \sin{(\omega t + \phi)}
\end{equation}

where $A$ is the amplitude (from 0 to 1), $\omega$ is the angular frequency (i.e. the frequency in Hz multiplied by $2\pi$), and $\phi$ is the phase (in radians). Use the standard sampling frequency of 44.1 kHz. That means that you have to calculate the value of the waveform 44100 times for each second of your recording.

Note: Plot only the first oscillation of the selected sine wave.
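As a minimal sketch of the formula above (not the reference solution), the waveform can be sampled on a time grid with one value per sample. The amplitude, frequency, phase and duration used here are assumed values chosen for illustration, and `samples_per_period` is a hypothetical helper name.

```python
import numpy as np

sr = 44100          # sampling rate in Hz (given in the exercise)
duration = 1.0      # length of the signal in seconds (assumed)
A = 0.5             # amplitude (assumed)
freq = 440.0        # frequency in Hz (assumed)
phi = 0.0           # phase in radians (assumed)

# One time stamp per sample: 0, 1/sr, 2/sr, ...
t = np.arange(int(sr * duration)) / sr

# f(t) = A * sin(omega * t + phi), with omega = 2 * pi * freq
y = A * np.sin(2 * np.pi * freq * t + phi)

# Approximate number of samples in one oscillation
samples_per_period = int(round(sr / freq))
```

Plotting `t[:samples_per_period]` against `y[:samples_per_period]` with `plt.plot` then shows roughly one oscillation.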

![image.png](attachment:image.png)


In [17]:
sr = 44100 # Sampling rate

# TO-DO: Generate a sine wave y and plot it


Using IPython.display.Audio, you can play an audio signal:

In [4]:
ipd.Audio(y, rate=sr) # sr = sampling rate

soundfile.write allows you to save the NumPy array of the generated audio signal as a WAV file.

In [10]:
# y = audio signal
# sr = sample rate
soundfile.write('output_audio.wav', y, sr)

The function librosa.display.waveplot plots the amplitude envelope of a waveform (newer librosa releases provide librosa.display.waveshow instead). Plot only the first oscillation of the selected sine wave.

Note that if $y$ is monophonic, a filled curve is drawn between $\left[-\mathrm{abs}(y), \mathrm{abs}(y)\right]$. However, if $y$ is stereo, the curve is drawn between $\left[-\mathrm{abs}(y[1]), \mathrm{abs}(y[0])\right]$, so that the left and right channels are drawn above and below the axis, respectively.

In [11]:
plt.figure(figsize=(14, 5))
lbd.waveplot(y[:100], sr=sr)

b) Sounds encountered in real-life situations are never as clean as the sinusoid you generated in the previous subtask. Try adding some noise to the waveform, then plot and listen to the result. Experiment with different types of noise! For better visibility, plot only the first oscillation of the selected waveform.
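One possible way to add noise is sketched below, assuming Gaussian and uniform noise as the two "types" and assumed noise levels of 0.05 and 0.1. The names `y_noise1` and `y_noise2` match the playback cells that follow; everything else is illustrative.

```python
import numpy as np

sr = 44100
t = np.arange(sr) / sr
y = 0.5 * np.sin(2 * np.pi * 440.0 * t)   # clean sine (assumed parameters)

rng = np.random.default_rng(0)            # fixed seed so the result is repeatable

# Gaussian (white) noise with a standard deviation of 0.05
y_noise1 = y + rng.normal(0.0, 0.05, size=y.shape)

# Uniform noise drawn from [-0.1, 0.1)
y_noise2 = y + rng.uniform(-0.1, 0.1, size=y.shape)

# Keep all samples inside the [-1, 1] amplitude range
y_noise1 = np.clip(y_noise1, -1.0, 1.0)
y_noise2 = np.clip(y_noise2, -1.0, 1.0)
```

Raising the noise level relative to the sine amplitude makes the tone progressively harder to hear.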

In [76]:
# TO-DO: Add noise to the selected waveform, plot it and listen to it


In [77]:
ipd.Audio(y_noise1, rate=sr) # sr = sampling rate

In [14]:
ipd.Audio(y_noise2, rate=sr) # sr = sampling rate

c) Harmonics are what give different instruments their sound color or timbre. They are softer components at integer multiples of the primary frequency. Try adding multiples of the primary frequency at a lower amplitude to your sinusoid and listen to it. Experiment with odd and even multiples.
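A small sketch of this idea, assuming a primary frequency of 220 Hz and amplitudes that fall off as 1/m for the m-th multiple. The helper `add_harmonics` is a hypothetical name, not part of the exercise.

```python
import numpy as np

sr = 44100
t = np.arange(sr) / sr
f0 = 220.0  # primary frequency in Hz (assumed)

def add_harmonics(t, f0, multiples, amplitudes):
    """Sum a unit sine at f0 with sines at the given integer multiples of f0."""
    y = np.sin(2 * np.pi * f0 * t)
    for m, a in zip(multiples, amplitudes):
        y += a * np.sin(2 * np.pi * m * f0 * t)
    return y / np.max(np.abs(y))  # rescale so that the peak amplitude is 1

# Odd and even multiples, each softer than the primary frequency
y_odd = add_harmonics(t, f0, multiples=[3, 5, 7], amplitudes=[1 / 3, 1 / 5, 1 / 7])
y_even = add_harmonics(t, f0, multiples=[2, 4, 6], amplitudes=[1 / 2, 1 / 4, 1 / 6])
```

Both signals have the same pitch but should sound noticeably different when played with `ipd.Audio`.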

In [75]:
# TO-DO: Add multiples of the primary frequency at different amplitudes to your sinusoid
# plot and listen to the results
#ipd.Audio(y2, rate=sr)


d) $\star$ (10 points) Write a function to generate non-sinusoidal waveforms of your choice like square, triangle or sawtooth. You can also experiment with more exotic sounds, like chirp. Implement at least three different sounds.
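The three classic shapes can be sketched with NumPy alone by working with the fractional part of the elapsed cycle count. The function name `waveform` and its defaults are assumptions made for this example; `scipy.signal` also offers `square`, `sawtooth` and `chirp` if you prefer ready-made generators.

```python
import numpy as np

def waveform(kind, freq, sr=44100, duration=1.0):
    """Return a square, sawtooth or triangle wave with values in [-1, 1]."""
    t = np.arange(int(sr * duration)) / sr
    cycle = (freq * t) % 1.0          # position inside the current period, in [0, 1)
    if kind == "square":
        # +1 during the first half of each period, -1 during the second half
        return np.where(cycle < 0.5, 1.0, -1.0)
    if kind == "sawtooth":
        # Linear ramp from -1 up to +1 over each period
        return 2.0 * cycle - 1.0
    if kind == "triangle":
        # Fold the ramp: starts at +1, falls to -1 at mid-period, rises back to +1
        return 2.0 * np.abs(2.0 * cycle - 1.0) - 1.0
    raise ValueError("unknown waveform kind: " + str(kind))

y_square = waveform("square", 441.0)
y_saw = waveform("sawtooth", 441.0)
y_tri = waveform("triangle", 441.0)
```

A frequency of 441 Hz is used only because it gives exactly 100 samples per period at 44.1 kHz, which makes the first period easy to plot.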

In [74]:
# TO-DO: Implement and plot at least three different non-sinusoidal waveforms.


## Assignment 2: Frequency analysis

Due to the high sampling rates, visually interpreting the digital audio signal is usually difficult. Transforming the signal to the frequency spectrum allows us to interpret the signal content more directly.

a) Calculate the Fourier transform of a simple waveform using the SciPy function fft. You also need to divide the result by the number of points used for the FFT, which is equal to the signal length $N$ by default. Since the result is complex and symmetrical, you will only use the positive-frequency half to plot the frequency components. Take the absolute value of the result and then keep the first $\frac{N}{2}$ values; for a one-second signal $N = F_s$, where $F_s$ is the sampling rate, so this is the first $\frac{F_s}{2}$ values. The resulting spectrum should go from $0$ to $\frac{F_s}{2}$, which is the highest frequency that can be represented in the signal (per the Nyquist-Shannon sampling theorem). Plot the results for all signals you generated in Assignment 1.
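The steps above can be sketched as follows for a one-second, 440 Hz sine (assumed test signal). This example uses `np.fft.fft` in place of the SciPy function so that it depends on NumPy only; `scipy.fft.fft` can be called the same way here.

```python
import numpy as np

sr = 44100
N = sr                                  # one second of audio, so N equals the sampling rate
t = np.arange(N) / sr
y = 0.8 * np.sin(2 * np.pi * 440.0 * t)

y_fft = np.fft.fft(y) / N               # normalize by the number of FFT points
magnitude = np.abs(y_fft[: N // 2])     # keep the positive-frequency half
freqs = np.arange(N // 2) * sr / N      # frequency in Hz of each retained bin

# Frequency of the strongest component
peak_freq = freqs[np.argmax(magnitude)]
```

With this normalization a sine of amplitude $A$ appears as a peak of height $A/2$, because the other half of its energy sits in the discarded negative-frequency half. Plotting `freqs` against `magnitude` gives the spectrum.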

Question: How do the parameters of the sine formula ($A$, $\omega$ and $\phi$) influence the frequency spectrum?

In [99]:
# TO-DO: Calculate Fourier transform y_fft of the signal from Assignment 1.c and plot the results
# sample spacing


b) $\star$ (5 points) Aliasing can occur when the signal is sampled too sparsely, which causes high frequencies in the signal to be reflected back into the lower part of the spectrum and produce errors in the frequency analysis. Use one of the signals from Assignment 1 and sample it at a rate below the Nyquist rate (i.e. the sampling rate should be lower than twice the highest frequency present in the signal). Calculate and plot the frequency spectrum.
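A short sketch of the effect, assuming a 3000 Hz sine sampled once at 44.1 kHz and once at only 4 kHz. The helper `peak_frequency` is a hypothetical name introduced for this example.

```python
import numpy as np

def peak_frequency(freq, sr, duration=1.0):
    """Return the strongest frequency (Hz) found in a sine of `freq` Hz sampled at `sr`."""
    n = int(sr * duration)
    t = np.arange(n) / sr
    y = np.sin(2 * np.pi * freq * t)
    magnitude = np.abs(np.fft.fft(y) / n)[: n // 2]   # single-sided spectrum
    freqs = np.arange(n // 2) * sr / n
    return freqs[np.argmax(magnitude)]

f_ok = peak_frequency(3000.0, sr=44100)    # sampling rate well above 2 * 3000 Hz
f_alias = peak_frequency(3000.0, sr=4000)  # sampling rate below 2 * 3000 Hz
```

At 4 kHz the highest representable frequency is 2000 Hz, so the 3000 Hz tone is mirrored around it and shows up at 4000 - 3000 = 1000 Hz.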

Question: Considering the human hearing range, does the standard sampling frequency of 44.1 kHz seem arbitrary?

In [30]:
# TO-DO: Visualize the aliasing problem on one of the signals from Assignment 1
# sample spacing


## Assignment 3: Filtering

Audio signals can be processed to extract or attenuate certain frequency ranges. Since the design of audio filters is a large field, you will only focus on simple low- and high-pass Gaussian filters and their effects on audio signals.

Note: It might be hard to hear the difference on laptop speakers, so consider listening to the result with headphones.


a) Use the included function gaussian_kernel to calculate a kernel for performing a low-pass operation on an audio signal. Use the function np.convolve to perform the filtering. Plot and listen to the result. Choose a signal that will produce obvious results.

![image.png](attachment:image.png)

In [31]:
def gaussian_kernel(width, sigma):
    # width is the width of the produced kernel
    # sigma defines the shape of the Gaussian function

    x = np.linspace(-width / 2, width / 2, width)
    y = np.exp(-x ** 2 / (2 * sigma ** 2))
    y = y / np.sum(y)  # normalize

    return y
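To see what such a kernel does before applying it to a recording, the sketch below filters a synthetic signal made of a 200 Hz and an 8 kHz tone. The kernel width of 101 and sigma of 10 are assumed values, `tone_level` is a hypothetical helper, and `gaussian_kernel` is repeated only to keep the example self-contained.

```python
import numpy as np

def gaussian_kernel(width, sigma):
    # Same helper as provided in the exercise
    x = np.linspace(-width / 2, width / 2, width)
    y = np.exp(-x ** 2 / (2 * sigma ** 2))
    return y / np.sum(y)

sr = 44100
t = np.arange(sr) / sr
# Test signal: a low tone plus a high tone of equal amplitude
y = np.sin(2 * np.pi * 200.0 * t) + np.sin(2 * np.pi * 8000.0 * t)

kernel = gaussian_kernel(width=101, sigma=10)       # assumed values
y_filtered = np.convolve(y, kernel, mode="same")    # output has the input's length

def tone_level(signal, freq):
    """Spectrum magnitude at `freq` Hz (bins are 1 Hz apart for a 1 s signal)."""
    return np.abs(np.fft.fft(signal))[int(freq)] / len(signal)
```

Comparing `tone_level(y, 8000)` with `tone_level(y_filtered, 8000)` shows the high tone almost vanishing, while the 200 Hz tone is only slightly reduced. A larger sigma lowers the cutoff further.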

In [38]:
# Load simpleLoop audio file. 
y_sl, sr_sl = lb.load('simpleLoop.wav')

# TO-DO: Choose an appropriate Gaussian kernel
# Filter the signal y_sl using the provided gaussian_kernel function and your selected kernel

# Plot the results
plt.figure()
plt.plot(y_sl) # Plot the unfiltered signal
plt.plot(y_filtered) # Plot the filtered signal over it

In [34]:
# Listen to the original audio file
ipd.Audio(y_sl, rate=sr_sl)

In [39]:
# Listen to the filtered audio file
# What do you notice?
ipd.Audio(y_filtered, rate=sr_sl)

b) Convert the low-pass Gaussian kernel into a high-pass filter. This can be achieved by multiplying every other kernel coefficient by $-1$, so that the signs alternate. The resulting kernel will remove the low-frequency components of the signal and only retain the high frequencies. Test it on a sound of your choice. You can also use the provided sounds ('simpleLoop.wav', 'piano.wav').
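The sign alternation can be sketched as below, together with a check of the resulting frequency response. The kernel width of 21 and sigma of 1 are assumed values, and `gaussian_kernel` is repeated only so that the example runs on its own.

```python
import numpy as np

def gaussian_kernel(width, sigma):
    # Same helper as provided in the exercise
    x = np.linspace(-width / 2, width / 2, width)
    y = np.exp(-x ** 2 / (2 * sigma ** 2))
    return y / np.sum(y)

kernel_lp = gaussian_kernel(width=21, sigma=1)   # assumed values

# Flip the sign of every other coefficient: +, -, +, -, ...
signs = (-1.0) ** np.arange(len(kernel_lp))
kernel_hp = kernel_lp * signs

# Magnitude response from 0 Hz (first index) to the Nyquist frequency (last index)
response_lp = np.abs(np.fft.rfft(kernel_lp, n=1024))
response_hp = np.abs(np.fft.rfft(kernel_hp, n=1024))
```

Multiplying by alternating signs shifts the frequency response by half the sampling rate, so the pass band that sat at 0 Hz moves to the Nyquist frequency. The filtered signal is then obtained with `np.convolve(y, kernel_hp, mode="same")`.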

In [47]:
# Load the audio file
y_sl_hp, sr_sl_hp = lb.load('simpleLoop.wav')

# TO-DO: Convert the low-pass Gaussian kernel into a high-pass filter
# Filter the signal y_sl_hp using the provided gaussian_kernel function and your high-pass filter kernel
# Plot the results


In [48]:
# Listen to the filtered audio file
# What do you notice?
ipd.Audio(y_filtered_hp, rate=sr_sl_hp)

## Assignment 4: Effects

Special kinds of filters can also produce other effects. Here you will implement some of them.

a) **Delay**: A delay time-shifts the signal and adds it to itself. Write a function that introduces a delay of a specified duration. You can again use the function **np.convolve**, or compute the delay as a weighted sum of the original and the shifted signal using **scipy.ndimage.interpolation.shift** (also available as **scipy.ndimage.shift**). Experiment with different delay values, below and above 100 ms. Do you notice a difference?
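A minimal sketch of the weighted-sum variant, using plain array slicing instead of either library function. The function name `delay`, the `mix` parameter and the choice to lengthen the output so that the delayed copy is not cut off are all assumptions of this example.

```python
import numpy as np

def delay(y, sr, delay_s, mix=0.5):
    """Add a copy of y, delayed by `delay_s` seconds and scaled by `mix`, to y."""
    y = np.asarray(y, dtype=float)
    d = int(round(delay_s * sr))      # delay expressed in samples
    out = np.zeros(len(y) + d)        # room for the tail of the delayed copy
    out[: len(y)] += y                # original signal
    out[d:] += mix * y                # delayed, attenuated copy
    return out

# Tiny example: a single impulse at 10 samples per second, delayed by 0.3 s
impulse = np.zeros(10)
impulse[0] = 1.0
delayed = delay(impulse, sr=10, delay_s=0.3, mix=0.5)
```

In the example the impulse reappears three samples later at half its height.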

In [55]:
# TO-DO: write a function that introduces a delay of a specified duration


b) Echo: Echo is a combination of multiple delays combined with attenuation. Write a function that accepts the number of echoes and their corresponding damping factors. Display and play the results.
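One way to sketch this, assuming the echoes are equally spaced and that the number of echoes is given by the length of the list of damping factors. The function name `echo` and its parameters are illustrative.

```python
import numpy as np

def echo(y, sr, delay_s, dampings):
    """Add len(dampings) echoes, spaced `delay_s` seconds apart, to y."""
    y = np.asarray(y, dtype=float)
    d = int(round(delay_s * sr))                  # spacing between echoes in samples
    out = np.zeros(len(y) + d * len(dampings))    # room for the last echo's tail
    out[: len(y)] += y                            # original signal
    for i, damping in enumerate(dampings, start=1):
        # The i-th echo starts i * d samples later and is scaled by its damping factor
        out[i * d : i * d + len(y)] += damping * y
    return out

# Tiny example: one impulse followed by two progressively quieter echoes
impulse = np.zeros(10)
impulse[0] = 1.0
echoed = echo(impulse, sr=10, delay_s=0.2, dampings=[0.5, 0.25])
```

Decreasing damping factors give the familiar fading repetition; equal factors sound more like a stutter.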

In [73]:
# TO-DO: write a function that accepts the number of echoes and their corresponding damping factors


c) $\star$ (10 points) Flanger: An effect produced by introducing a delay that depends on an external oscillator. Put simply, the delay for each sample of the output is not constant but changes according to a sinusoidal function.
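A rough sketch of a flanger under simplifying assumptions: the delay is rounded to whole samples (no interpolation), it oscillates between 0 and an assumed maximum of 3 ms, and the delayed copy is mixed in at an assumed level of 0.7. The function name and all parameter defaults are illustrative.

```python
import numpy as np

def flanger(y, sr, max_delay_s=0.003, lfo_freq=0.5, mix=0.7):
    """Mix y with a copy whose delay is modulated by a low-frequency sine."""
    y = np.asarray(y, dtype=float)
    n = np.arange(len(y))
    # Oscillator value in [0, 1], one value per output sample
    lfo = 0.5 * (1.0 + np.sin(2 * np.pi * lfo_freq * n / sr))
    # Per-sample delay in whole samples, between 0 and max_delay_s * sr
    d = np.round(max_delay_s * sr * lfo).astype(int)
    src = n - d                                   # index of the delayed sample
    # Samples that would be read from before the start of the signal count as silence
    delayed = np.where(src >= 0, y[np.clip(src, 0, None)], 0.0)
    return y + mix * delayed

sr = 44100
t = np.arange(sr) / sr
y_flanged = flanger(np.sin(2 * np.pi * 440.0 * t), sr)
```

The effect is easier to hear on a harmonically rich signal than on a pure sine, since the moving delay creates notches that sweep through the spectrum.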

In [98]:
# TO-DO


d) $\star$ (10 points) Distortion: This effect changes the frequency content of the signal by adding gain to high energy frequencies and thus producing clipping. Implement it by using the formula from the lecture slides

\begin{equation}
y(n) = \frac{(1 + k) x(n)}{1 + k |x(n)|},
\end{equation}

where $k$ controls the amount of distortion.
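The formula translates almost directly into NumPy. The sketch below assumes the input is already scaled to the range $[-1, 1]$; the function name `distortion` is illustrative.

```python
import numpy as np

def distortion(x, k):
    """Apply y(n) = (1 + k) x(n) / (1 + k |x(n)|) to every sample of x."""
    x = np.asarray(x, dtype=float)
    return (1.0 + k) * x / (1.0 + k * np.abs(x))

samples = np.array([0.0, 0.5, 1.0, -1.0])
distorted = distortion(samples, k=2.0)   # values: 0.0, 0.75, 1.0, -1.0
```

For $k = 0$ the signal is unchanged. As $k$ grows, small and medium amplitudes are pushed towards $\pm 1$ while the extremes stay fixed, which flattens the peaks of the waveform.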

Question: How do these effects change the signal in both the time and frequency domains? To complete this task, it is important that you understand this and do not merely implement the effect.



In [80]:
# TO-DO
