[DSP] Section 7 [REF | AudioSignalProcessingForML]
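19. MFCCs (Mel-Spectrogram) Explained Easily

Mel-Frequency Cepstral Coefficients

Cepstral > Cepstrum > Spectrum

!!NOTE!! Each cepstral term is formed by rearranging the leading letters of its spectral counterpart:

| Cepstrum | Quefrency | Liftering | Rahmonic |
|---|---|---|---|
| Spectrum | Frequency | Filtering | Harmonic |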
Computing the cepstrum

$C(x(t)) = F^{-1}[\log(F[x(t)])]$
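The cepstrum formula can be tried out in NumPy. This is a minimal sketch, not a reference implementation: it assumes the log is taken of the magnitude spectrum (the "real cepstrum"), adds a small constant to avoid $\log(0)$, and uses a made-up harmonic test signal. The function name `real_cepstrum` is hypothetical.

```python
import numpy as np

def real_cepstrum(x):
    """Inverse FFT of the log-magnitude spectrum of x."""
    spectrum = np.fft.fft(x)
    # Assumption: use the magnitude and add a small constant to guard against log(0)
    log_magnitude = np.log(np.abs(spectrum) + 1e-10)
    return np.fft.ifft(log_magnitude).real

# Hypothetical test signal: a harmonic series on a 100 Hz fundamental
sr = 8000
t = np.arange(1024) / sr
x = sum(np.sin(2 * np.pi * 100 * k * t) / k for k in range(1, 20))

cepstrum = real_cepstrum(x)
print(cepstrum.shape)
```

The index of the cepstrum is the quefrency (in samples here), so low indices describe slow variation of the log-spectrum and high indices describe fast variation.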
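!!NOTE!! Glottal pulse > Vocal tract > Speech signal

Log-spectrum > Spectral envelope > Spectral detail

Formants = carry the identity of the sound: they are the peaks of the spectral envelope. In speech science and phonetics, a formant is a broad spectral maximum that results from an acoustic resonance of the human vocal tract.

Speech > Vocal tract freq. response > Glottal pulse

Speech = convolution of the vocal tract freq. response with the glottal pulse. Convolution in the time domain is multiplication in the frequency domain, and the logarithm turns that product into a sum, with $E(t)$ the glottal pulse and $H(t)$ the vocal tract freq. response:

$X(t) = E(t) \cdot H(t)$

$\log(X(t)) = \log(E(t) \cdot H(t))$

$\log(X(t)) = \log(E(t)) + \log(H(t))$

In the log-spectrum: Speech = Glottal pulse + Vocal tract freq. response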
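The step from convolution to a sum of log-spectra can be checked numerically. The sketch below uses random arrays as stand-ins for the glottal pulse and the vocal tract response (they are not real speech data) and zero-pads the FFTs so that the product of spectra matches the linear convolution.

```python
import numpy as np

rng = np.random.default_rng(0)
e = rng.standard_normal(64)   # hypothetical excitation (stand-in for the glottal pulse)
h = rng.standard_normal(16)   # hypothetical impulse response (stand-in for the vocal tract)

x = np.convolve(e, h)         # time domain: convolution, length 64 + 16 - 1 = 79

n = len(x)                    # zero-pad every FFT to the full convolution length
X = np.fft.rfft(x, n)
E = np.fft.rfft(e, n)
H = np.fft.rfft(h, n)

# Frequency domain: the spectra multiply
product_holds = np.allclose(X, E * H)
# Log-magnitude domain: the components add
log_sum_holds = np.allclose(np.log(np.abs(X)),
                            np.log(np.abs(E)) + np.log(np.abs(H)))
print(product_holds, log_sum_holds)
```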
Goal: Separating components

- Apply a low-pass filter to the log-spectrum to remove its high-frequency part (this removes $E(t)$ and keeps the spectral envelope $H(t)$)

Computing Mel-Frequency Cepstral Coefficients

Waveform > DFT > Log-Amplitude Spectrum > Mel-Scaling > Discrete Cosine Transform > MFCCs

Discrete Cosine Transform

- Simplified version of FT
- Get real-valued coefficients
- Decorrelate energy in different mel bands
- Reduce # dimensions to represent the spectrum

How many coefficients?

- Traditionally: first 12-13 coefficients
- First coefficients keep most information (e.g., formants, spectral envelope)
- Use $\Delta$ and $\Delta \Delta$ MFCCs
- Total 39 coefficients per frame (13 MFCCs + 13 $\Delta$ + 13 $\Delta \Delta$)

MFCCs advantages

- Describe the “large” structures of the spectrum
- Ignore fine spectral structures
- Work well in speech and music processing

MFCCs disadvantages

- Not robust to noise
- Extensive knowledge engineering
- Not efficient for synthesis

MFCCs applications

- Speech processing
  - Speech recognition
  - Speaker recognition
  - …
- Music processing
  - Music genre classification
  - Mood classification
  - Automatic tagging
  - …
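The MFCC pipeline listed above can be sketched end to end in NumPy. This is an illustration under stated assumptions, not what `librosa` does internally: it uses the common variant that applies the mel filterbank to the power spectrum before taking the log, the $2595 \log_{10}(1 + f/700)$ mel formula, triangular filters, and an unnormalised DCT-II. All function names, the filter count of 40, and the test tone are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # One common mel formula (assumption; other variants exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters with centres evenly spaced on the mel scale."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bank = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        left, centre, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def dct_ii(x, n_out):
    """Unnormalised DCT-II along axis 0, keeping the first n_out coefficients."""
    n = x.shape[0]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n)[None, :] + 1) / (2 * n))
    return basis @ x

def mfcc_sketch(signal, sr, n_fft=2048, hop=512, n_mels=40, n_mfcc=13):
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2           # (n_frames, n_fft // 2 + 1)
    mel_energy = mel_filterbank(n_mels, n_fft, sr) @ power.T   # (n_mels, n_frames)
    log_mel = np.log(mel_energy + 1e-10)                       # small constant avoids log(0)
    return dct_ii(log_mel, n_mfcc)                             # (n_mfcc, n_frames)

sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)    # hypothetical one-second test tone
mfccs_demo = mfcc_sketch(tone, sr)
print(mfccs_demo.shape)
```

Keeping only the first 13 DCT coefficients is where the dimensionality reduction happens: 40 log mel energies per frame are summarised by 13 values.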
```python
import os
import librosa
import librosa.display
import IPython.display as ipd
import matplotlib.pyplot as plt
import numpy as np
```

```python
base_dir = r"./raw/20_audio/"
audio_file = os.path.join(base_dir, "debussy.wav")

ipd.Audio(audio_file)
```
```python
signal, sr = librosa.load(audio_file)
print(signal.shape)
```

```python
# Pass the signal as the keyword argument y; recent librosa releases require it
mfccs = librosa.feature.mfcc(y=signal, n_mfcc=13, sr=sr)
print(mfccs.shape)
```
Visualise MFCCs

```python
plt.figure(figsize=(12, 5))
librosa.display.specshow(mfccs,
                         x_axis="time",
                         sr=sr)
plt.colorbar(format="%+2f")
plt.title("MFCCs")
plt.show()
```
Calculate delta and delta2 MFCCs

```python
delta_mfccs = librosa.feature.delta(mfccs)
delta2_mfccs = librosa.feature.delta(mfccs, order=2)

# Print the shape of each array (the second value is for delta2_mfccs)
print(f"delta: {delta_mfccs.shape}\ndelta2: {delta2_mfccs.shape}")
```

```
delta: (13, 1292)
delta2: (13, 1292)
```
```python
plt.figure(figsize=(12, 5))
librosa.display.specshow(delta_mfccs,
                         x_axis="time",
                         sr=sr)
plt.colorbar(format="%+2f")
plt.title("MFCCs Delta")
plt.show()
```

```python
plt.figure(figsize=(12, 5))
librosa.display.specshow(delta2_mfccs,
                         x_axis="time",
                         sr=sr)
plt.colorbar(format="%+2f")
plt.title("MFCCs Delta2")
plt.show()
```
```python
# Stack MFCCs, deltas and delta-deltas along the coefficient axis
comprehensive_mfccs = np.concatenate((mfccs, delta_mfccs, delta2_mfccs))
print(comprehensive_mfccs.shape)
```
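The 39-coefficient layout (13 MFCCs plus their $\Delta$ and $\Delta \Delta$) can be reproduced on synthetic data without an audio file. This sketch uses `np.gradient` (central differences) as a simple stand-in for the derivative; `librosa.feature.delta` uses a smoothed Savitzky-Golay estimate, so its values will differ. The array `mfccs_demo` is random and purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
mfccs_demo = rng.standard_normal((13, 100))   # hypothetical (n_mfcc, n_frames) matrix

# Central differences along the time axis as a simple derivative estimate
delta_demo = np.gradient(mfccs_demo, axis=1)
delta2_demo = np.gradient(delta_demo, axis=1)

# 13 + 13 + 13 = 39 coefficients per frame
features = np.concatenate((mfccs_demo, delta_demo, delta2_demo))
print(features.shape)
```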
21. Frequency-Domain Audio Features

Freq.-domain features

- Band energy ratio (BER)
- Spectral centroid (SC)
- Bandwidth (BW)
- …

Extracting freq.-domain features

Waveform > (STFT) > Spectrogram > Feature computation

Math conventions

- $m_t(n)$ -> Magnitude of signal at freq. bin $n$ and frame $t$
- $N$ -> # freq. bins

Band Energy Ratio (BER)

- Comparison of energy in the lower/higher freq. bands
- Measure of how dominant low frequencies are

\(BER_t = { {\sum_{n=1}^{F-1} m_t(n)^2} \over {\sum_{n=F}^N m_t(n)^2} }\)

- $m_t(n)^2$: power at frame $t$ and freq. bin $n$
- $F$: split freq. (as a bin index)
- $\sum_{n=1}^{F-1} m_t(n)^2$: power in the lower freq. bands
- $\sum_{n=F}^N m_t(n)^2$: power in the higher freq. bands

BER applications

- Music/speech discrimination
- Music classification

Spectral centroid (SC)

- Centre of gravity of the magnitude spectrum
- Freq. band where most of the energy is concentrated
- Measure of “brightness” of sound
- Weighted mean of the freq.

\(SC_t = { {\sum_{n=1}^N m_t(n) \cdot n} \over {\sum_{n=1}^N m_t(n)} }\)

- $n$: freq. bin
- $m_t(n)$: weight for $n$

SC applications

- Audio classification
- Music classification

Bandwidth

- Derived from the spectral centroid
- Spectral range around the centroid
- Variance from the spectral centroid
- Describes perceived timbre
- Weighted mean of the distances of freq. bands from SC
- Energy spread across frequency bands $\propto BW_t$

\(BW_t = { {\sum_{n=1}^N \left| n-SC_t \right| \cdot m_t(n) } \over {\sum_{n=1}^N m_t(n) } }\)

- $m_t(n)$: weight for $n$
- $\left| n-SC_t \right|$: distance of freq. band from the spectral centroid
BW applications

22. Implementing Band Energy Ratio from Scratch with Python
```python
import math
import matplotlib.pyplot as plt
import numpy as np
import librosa
import librosa.display
import IPython.display as ipd

debussy_file = "./raw/22_audio/debussy.wav"
redhot_file = "./raw/22_audio/redhot.wav"
```

```python
ipd.Audio(debussy_file)
```

```python
debussy, sr = librosa.load(debussy_file)
redhot, _ = librosa.load(redhot_file)
```
```python
FRAME_SIZE = 2048
HOP_SIZE = 512

debussy_spec = librosa.stft(debussy, n_fft=FRAME_SIZE, hop_length=HOP_SIZE)
redhot_spec = librosa.stft(redhot, n_fft=FRAME_SIZE, hop_length=HOP_SIZE)

print(debussy_spec.shape)
```
Calculate Band Energy Ratio

```python
def calculate_split_frequency_bin(split_frequency, sample_rate, num_frequency_bins):
    """Map a split frequency in Hz to the index of a frequency bin."""
    # The spectrogram covers 0 Hz up to the Nyquist frequency (sample_rate / 2)
    frequency_range = sample_rate / 2
    # Approximate width of one bin in Hz
    frequency_delta_per_bin = frequency_range / num_frequency_bins
    split_frequency_bin = math.floor(split_frequency / frequency_delta_per_bin)
    return int(split_frequency_bin)
```

```python
split_frequency_bin = calculate_split_frequency_bin(2000, 22050, 1025)
print(split_frequency_bin)
```
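The bin arithmetic can be cross-checked against the exact bin centre frequencies. The sketch below assumes the STFT settings used in this section (`n_fft = 2048`, sample rate 22050 Hz) and uses `np.fft.rfftfreq` to list the centre frequency of each of the 1025 bins. The helper above divides the Nyquist range by the number of bins, which is a close approximation of the true spacing `sample_rate / n_fft`, so the two results can differ by a bin near the split point. The variable names are hypothetical.

```python
import numpy as np

sample_rate = 22050
n_fft = 2048                      # FRAME_SIZE used for the STFT
split_frequency = 2000

# Centre frequency of every rfft bin: 0, sr/n_fft, 2*sr/n_fft, ..., sr/2
bin_frequencies = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)

# First bin whose centre frequency is >= the split frequency
first_high_bin = int(np.searchsorted(bin_frequencies, split_frequency))

# The helper's approximation: (sr / 2) / num_bins Hz per bin, then floor
approx_bin = int(split_frequency // ((sample_rate / 2) / len(bin_frequencies)))

print(len(bin_frequencies), first_high_bin, approx_bin)
```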
```python
def band_energy_ratio(spectrogram, split_frequency, sample_rate):
    """Compute the band energy ratio for every frame of an STFT spectrogram."""
    # The spectrogram has shape (num_frequency_bins, num_frames), so the number
    # of frequency bins is shape[0]; len(spectrogram[0]) would count frames instead
    split_frequency_bin = calculate_split_frequency_bin(split_frequency, sample_rate, spectrogram.shape[0])
    band_energy_ratio = []

    # calculate power spectrogram
    power_spectrogram = np.abs(spectrogram) ** 2
    # transpose so that each row is one frame
    power_spectrogram = power_spectrogram.T

    # calculate BER value for each frame
    for frame in power_spectrogram:
        sum_power_low_frequencies = frame[:split_frequency_bin].sum()
        sum_power_high_frequencies = frame[split_frequency_bin:].sum()
        band_energy_ratio_current_frame = sum_power_low_frequencies / sum_power_high_frequencies
        band_energy_ratio.append(band_energy_ratio_current_frame)

    return np.array(band_energy_ratio)
```
```python
ber_debussy = band_energy_ratio(debussy_spec, 2000, sr)
ber_redhot = band_energy_ratio(redhot_spec, 2000, sr)

print(f"{debussy_spec.T.shape}")
print(f"{ber_debussy.shape}")
```
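The per-frame loop can also be written without a Python loop. The following is a sketch of a vectorised alternative under the same convention (bins below the split index count as low, the rest as high). The function name `band_energy_ratio_vectorised` and the tiny demo spectrogram are hypothetical and exist only to make the arithmetic easy to follow.

```python
import numpy as np

def band_energy_ratio_vectorised(spectrogram, split_bin):
    """BER per frame for a (n_freq_bins, n_frames) complex or magnitude spectrogram."""
    power = np.abs(spectrogram) ** 2
    low = power[:split_bin].sum(axis=0)      # power below the split bin, per frame
    high = power[split_bin:].sum(axis=0)     # power at and above the split bin, per frame
    return low / high

# Hypothetical magnitude spectrogram: 4 frequency bins x 2 frames
demo_spec = np.array([[1.0, 2.0],
                      [1.0, 0.0],
                      [1.0, 1.0],
                      [1.0, 1.0]])

ber_demo = band_energy_ratio_vectorised(demo_spec, split_bin=2)
print(ber_demo)
```

In the first frame the low and high bands carry the same power, so the ratio is 1; in the second frame the low band carries twice the power of the high band.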
Visualise Band Energy Ratio curves

```python
frames = range(len(ber_debussy))
t = librosa.frames_to_time(frames, hop_length=HOP_SIZE)

print(len(t))
```

```python
plt.figure(figsize=(15, 5))

plt.plot(t, ber_debussy, color="b")
plt.plot(t, ber_redhot, color="r")

plt.show()
```
23. Spectral centroid and bandwidth

```python
debussy_file = "./raw/23_audio/debussy.wav"
redhot_file = "./raw/23_audio/redhot.wav"
duke_file = "./raw/23_audio/duke.wav"

debussy, sr = librosa.load(debussy_file)
redhot, _ = librosa.load(redhot_file)
duke, _ = librosa.load(duke_file)
```
Spectral centroid with Librosa

```python
FRAME_SIZE = 1024
HOP_LENGTH = 512

sc_debussy = librosa.feature.spectral_centroid(y=debussy, sr=sr, n_fft=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
sc_redhot = librosa.feature.spectral_centroid(y=redhot, sr=sr, n_fft=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
sc_duke = librosa.feature.spectral_centroid(y=duke, sr=sr, n_fft=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
```

Visualise spectral centroid

```python
# Build the time axis from the centroid array itself rather than from the
# BER array of the previous section
frames = range(len(sc_debussy))
t = librosa.frames_to_time(frames, hop_length=HOP_LENGTH)

plt.figure(figsize=(15, 5))

plt.plot(t, sc_debussy, color="b")
plt.plot(t, sc_redhot, color="r")
plt.plot(t, sc_duke, color="y")

plt.show()
```
Calculate bandwidth

```python
ban_debussy = librosa.feature.spectral_bandwidth(y=debussy, sr=sr, n_fft=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
ban_redhot = librosa.feature.spectral_bandwidth(y=redhot, sr=sr, n_fft=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
ban_duke = librosa.feature.spectral_bandwidth(y=duke, sr=sr, n_fft=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
```

```python
plt.figure(figsize=(15, 5))

plt.plot(t, ban_debussy, color="b")
plt.plot(t, ban_redhot, color="r")
plt.plot(t, ban_duke, color="y")

plt.show()
```
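The spectral centroid and bandwidth formulas from section 21 can also be written directly in NumPy. This sketch follows those formulas (weighted mean, and weighted mean absolute distance from the centroid) but weights frequencies in Hz rather than bin indices. Note that `librosa.feature.spectral_bandwidth` uses order `p=2` by default, so its values will generally differ from this first-order version. The function name and the toy spectrogram are hypothetical.

```python
import numpy as np

def spectral_centroid_and_bandwidth(magnitude, frequencies):
    """Per-frame SC and BW for a (n_freq_bins, n_frames) magnitude spectrogram."""
    # Normalise each frame so that its magnitudes act as weights summing to 1
    weights = magnitude / magnitude.sum(axis=0, keepdims=True)
    # SC: weighted mean of the frequencies
    centroid = (frequencies[:, None] * weights).sum(axis=0)
    # BW: weighted mean of the distances from the centroid
    distance = np.abs(frequencies[:, None] - centroid[None, :])
    bandwidth = (distance * weights).sum(axis=0)
    return centroid, bandwidth

# Hypothetical spectrogram: 4 bins at 0, 100, 200, 300 Hz and 2 frames
freqs = np.array([0.0, 100.0, 200.0, 300.0])
mag = np.array([[0.0, 1.0],
                [1.0, 1.0],
                [0.0, 1.0],
                [0.0, 1.0]])

sc_demo, bw_demo = spectral_centroid_and_bandwidth(mag, freqs)
print(sc_demo, bw_demo)
```

The first frame has all its energy in one bin, so the centroid sits on that bin and the bandwidth is zero; the second frame is flat, so the centroid is in the middle and the bandwidth is large.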