Home [DSP] Section 5: 이산 푸리에 변환, 스펙토그램, STFT
Post
Cancel

[DSP] Section 5: 이산 푸리에 변환, 스펙토그램, STFT

[DSP] Section 5

[Ref | AudioSignalProcessingForML]

13. Discrete Fourier Transform (DFT)

Digitalization

Digital signal

  • $g(t) \rightarrow x(n)$
  • $t=nT$

Building a discrete Fourier transform

$\hat g(f) = \int g(t) \cdot e^{-i2\pi ft} dt$
$\hat x(f) = \sum_n x(n) \cdot e^{-i2\pi fn}$

!!Problem!!

  1. Continuous
  2. Freq. + Inf. Time

Hack 1: Time

  • consider $f$ to be non 0 in a finite time interval
  • x(0), x(1), …, x(N-1)

Hack 2: Freq.

  • compute transform for finite # of freq.
  • # feq (M) = # samples (N)

Hacking

  • $\hat x(f) = \sum_{n=0}^{N-1}x(n) \cdot e^{-i2\pi fn}$
  • $\hat x(k/N) = \sum_{n=0}^{N-1}x(n) \cdot e^{-i2\pi n {k \over N} }$
  • $F(k) = {k \over NT} = {k s_r \over N}$

Redundancy in DFT

$k \ = \ N/2 \ \rightarrow \ F(N/2) \ = \ s_r / 2$ -> Nyquist Frequency

From DFT to Fast Fourier Transform

  • DFT is computationally expensive ($N^2$)
  • FFT is more efficient ($Nlog_2N$)
  • FFT exploits redundancies across sinusoids
  • FFT works when N is a power of 2

14. Extracting Spectrograms from Audio with Python

1
2
3
4
5
import os
import numpy as np
import matplotlib.pyplot as plt
import librosa
import IPython.display as ipd
1
2
c:\ProgramData\Anaconda3\lib\site-packages\paramiko\transport.py:219: CryptographyDeprecationWarning: Blowfish has been deprecated
  "class": algorithms.Blowfish,
1
2
3
4
5
6
BASE_DIR = r"./audio/"

voiolin_file = "violin_c.wav"
sax_file = "sax.wav"
piano_file = "piano_c.wav"
noise_file = "noise.wav"
1
ipd.Audio(os.path.join(BASE_DIR, voiolin_file))
1
ipd.Audio(os.path.join(BASE_DIR, sax_file))
1
ipd.Audio(os.path.join(BASE_DIR, piano_file))
1
ipd.Audio(os.path.join(BASE_DIR, noise_file))
1
2
3
4
5
6
violin_c4, sr = librosa.load(os.path.join(BASE_DIR, voiolin_file))
sax_c4, sr = librosa.load(os.path.join(BASE_DIR, sax_file))
piano_c5, sr = librosa.load(os.path.join(BASE_DIR, piano_file))
noise, sr = librosa.load(os.path.join(BASE_DIR, noise_file))

sr
1
22050
1
violin_c4.shape
1
(59772,)
1
2
# FT
violin_ft = np.fft.fft(violin_c4)
1
violin_ft.shape
1
(59772,)
1
violin_ft[0]
1
(-0.8536464284237131+1.4155343563970746e-15j)
1
magnitude_spectrum_violin = np.abs(violin_ft)
1
magnitude_spectrum_violin[0]
1
0.8536464284237131
1
2
3
4
5
6
7
8
9
10
11
def plot_magnitude_spectrum(signal, title, sr):
    ft = np.fft.fft(signal)
    magnitude_spectrum = np.abs(ft)
    # plot magnitude spectrum
    plt.figure(figsize=(10, 3))
    
    frequeny = np.linspace(0, sr, len(magnitude_spectrum))
    plt.plot(frequeny, magnitude_spectrum)
    plt.xlabel("Frequency (Hz)")
    plt.title(title)
    plt.show()
1
plot_magnitude_spectrum(violin_c4, "violin", sr)

png

1
2
3
4
5
6
7
8
9
10
11
12
13
def plot_magnitude_spectrum(signal, title, sr, f_ratio=1):
    ft = np.fft.fft(signal)
    magnitude_spectrum = np.abs(ft)
    # plot magnitude spectrum
    plt.figure(figsize=(10, 3))
    
    frequeny = np.linspace(0, sr, len(magnitude_spectrum))
    num_frequency_bins = int(len(frequeny)*f_ratio)
    
    plt.plot(frequeny[:num_frequency_bins], magnitude_spectrum[:num_frequency_bins])
    plt.xlabel("Frequency (Hz)")
    plt.title(title)
    plt.show()
1
plot_magnitude_spectrum(violin_c4, "violin", sr, 1)

png

1
plot_magnitude_spectrum(violin_c4, "violin", sr, 0.1)

png

1
plot_magnitude_spectrum(sax_c4, "sax", sr, 0.1)

png

1
plot_magnitude_spectrum(piano_c5, "piano", sr, 0.1)

png

1
plot_magnitude_spectrum(noise, "noise", sr, 0.1)

png

15. Short-Time Fourier Transfom explained easily

Fourier Transform Problem

  • We Know WHAT, We don’t Know WHEN

STFT intution

Windowing

  • Apply windowing function to signal
  • $x_w(k) = x(k) \cdot w(k)$

    Overlapping frames

  • hop size (H)

From DFT to STFT

  • $\hat x(k) = \sum_{n=0}^{N-1}x(n) \cdot e^{-i2\pi n {k \over N} }$
  • $S(m, k) = \sum+{n=0}^{N-1}x(n+mH) \cdot w(n) \cdot e^{-i2\pi n {k \over N} }$

k: 근사한 주파수
m: frame # mH: 현재 프레임의 시작 위치

STFT에서는 N이 윈도우의 전체 길이에 해당

Outputs

  • DFT
    • Spectral vector (# frequency bins)
    • N complex Fourier coefficients
  • STFT
    • Spectral matrix (# frequency bins, # frames)
    • Complex Fourier coefficients
  • # frequency bins = ${framesize \over 2} + 1$
  • # frames = ${ {samples - framesize} \over hopsize} + 1$

Example STFT output

  • signal = 10k samples
  • Frame size = 1000
  • Hop size = 500

# frequency bins = 1000 / 2 + 1 = 501 -> (0, sampling rate/2)
# frames = (10000 - 1000) / 500 + 1 = 19

STFT -> (501, 19)

STFT parameters

  • Frame size
  • Hop size
  • Windowing function

Time / frequency trade off

  • $frame \ size \propto freq \ resolution \propto 1/time \ resolution$

Han windwo

$w(k) = 0.5 \cdot (1-cos( { {2\pi k} \over {K-1} } ) ), \ \ \ k = 1 … K$

Visualising sound

  • $Y(m, k) =S(m, k)^2$
  • heatmap (spectrogram)
This post is licensed under CC BY 4.0 by the author.

[DSP] Section 4: 푸리에 변환

23년 ~4월 1주차 주간 회고

Comments powered by Disqus.