[DSP] Section 2: Audio Feature Extraction

Posted Feb 28, 2023

By nuyhc

[DSP] Section 2

[REF | AudioSignalProcessingForML]

4. Understanding Audio Signal

Audio Signal

Representation of sound
Encodes all info we need to reproduce sound

!!Problem!!

Analog vs Digital

Analog Signal

Continuous values for time
Continuous values for amplitude

Digital Signal

Sequence of discrete values
Data points can only take on a finite number of values

Analog to Digital Conversion (ADC)

Sampling
Quantization

Pulse-code modulation (PCM)

Sampling

Locating Samples

$t_n = nT$
Sampling Rate
$s_r = {1 \over T}$

Nyquist Frequency

$f_N = {s_r \over 2}$

Aliasing
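
The sampling relations above can be sketched with NumPy. This is a minimal illustration, not part of the referenced course: the 1 kHz sampling rate, the 900 Hz and 100 Hz tones, and the helper names are all assumptions chosen for the example. It computes the sample times $t_n = nT$ and the Nyquist frequency, then checks that a tone above the Nyquist frequency yields the same samples as a lower-frequency alias.

```python
import numpy as np

def sample_times(num_samples: int, sample_rate: float) -> np.ndarray:
    """Return t_n = n * T for n = 0 .. num_samples - 1, where T = 1 / sample_rate."""
    period = 1.0 / sample_rate
    return np.arange(num_samples) * period

def nyquist_frequency(sample_rate: float) -> float:
    """Highest frequency that can be represented without aliasing: s_r / 2."""
    return sample_rate / 2.0

sample_rate = 1000.0                      # Hz (illustrative value)
t = sample_times(1000, sample_rate)       # one second of sample times
f_nyquist = nyquist_frequency(sample_rate)

# A 900 Hz cosine lies above the Nyquist frequency (500 Hz), so after
# sampling it is indistinguishable from a 100 Hz cosine (1000 - 900).
tone_above_nyquist = np.cos(2 * np.pi * 900.0 * t)
alias = np.cos(2 * np.pi * 100.0 * t)
samples_match = np.allclose(tone_above_nyquist, alias)

print(f_nyquist, samples_match)
```

This is why the sampling rate has to be at least twice the highest frequency of interest: once the samples coincide, no later processing step can tell the two tones apart.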

Quantization

Resolution = number of bits
Bit depth
CD resolution = 16 bits

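Memory for 1 minute of sound (in MB, single channel)

$Memory_{MB} = { {BitDepth \times SR} \over {8 \times 1{,}048{,}576} } \times 60$
Dividing by 8 converts bits to bytes, dividing by 1,048,576 converts bytes to MB, and 60 is the number of seconds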
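
As a quick worked example of the memory formula, the sketch below evaluates it for CD-quality audio (16 bits, 44.1 kHz). The function name and the `channels` parameter are assumptions added for illustration; the formula in the notes covers a single channel.

```python
def memory_mb(bit_depth: int, sample_rate: int,
              seconds: float = 60.0, channels: int = 1) -> float:
    """Uncompressed PCM size in MB (1 MB = 1,048,576 bytes)."""
    total_bits = bit_depth * sample_rate * seconds * channels
    total_bytes = total_bits / 8          # 8 bits per byte
    return total_bytes / 1_048_576        # bytes -> MB

cd_mono_minute = memory_mb(16, 44_100)                 # about 5.05 MB
cd_stereo_minute = memory_mb(16, 44_100, channels=2)   # about 10.09 MB

print(round(cd_mono_minute, 2), round(cd_stereo_minute, 2))
```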

Dynamic Range

Difference between largest/smallest signal a system can record

Signal-to-quantization-noise ratio (SQNR)

Relationship between max signal strength and quantization error
Correlates with dynamic range
$SQNR \approx 6.02 \times Q$
Q: Bit Depth
$SQNR(16) \approx 96dB$
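
The rule of thumb $SQNR \approx 6.02 \times Q$ can be checked numerically. The sketch below is an illustration under stated assumptions: the uniform mid-rise quantizer, the uniformly distributed full-scale test signal, and all function names are choices made for this example. A uniform test signal is used because the 6.02 dB-per-bit figure holds for it directly; a full-scale sine would measure roughly 1.76 dB higher.

```python
import numpy as np

def sqnr_rule_of_thumb_db(bit_depth: int) -> float:
    """Approximate SQNR in dB for a given bit depth Q."""
    return 6.02 * bit_depth

def quantize(signal: np.ndarray, bit_depth: int) -> np.ndarray:
    """Uniform mid-rise quantizer over [-1, 1) with 2**bit_depth levels."""
    step = 2.0 / (2 ** bit_depth)
    return (np.floor(signal / step) + 0.5) * step

def measured_sqnr_db(signal: np.ndarray, bit_depth: int) -> float:
    """Ratio of signal power to quantization-error power, in dB."""
    noise = signal - quantize(signal, bit_depth)
    return float(10 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2)))

rng = np.random.default_rng(0)
signal = rng.uniform(-1.0, 1.0, size=200_000)   # full-scale test signal

estimate_16 = sqnr_rule_of_thumb_db(16)         # 96.32 dB
measured_8 = measured_sqnr_db(signal, 8)        # close to 6.02 * 8

print(estimate_16, round(measured_8, 2))
```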

How do we record sound?

Analog signal -> ADC -> Digital signal

How do we reproduce sound?

Digital signal -> DAC -> Analog signal

5. Types of audio features for ML

Why audio features?

Description of sound
Different features capture different aspects of sound
Build intelligent audio systems

Audio feature categorisation

Level of abstraction
Temporal scope
Music aspect
Signal domain
ML approach

Level of abstraction

High-level : instrumentation, key, chords, melody, …
Mid-level : pitch- and beat-related descriptors, MFCC, …
Low-level : amplitude envelope, energy, spectral centroid, …

Temporal scope

Instantaneous (~50ms)
Segment-level (seconds)
Global

Music aspect

Beat
Timbre
Pitch
Harmony
…

Signal domain

Time domain
- Amplitude envelope
- RMS energy
- …
Frequency domain (uses the Fourier transform)
- Band energy ratio
- Spectral centroid
- Spectral flux
- …
Time-frequency representation
- Spectrogram
- Mel-spectrogram
- Constant-Q transform

ML approach

Traditional ML
DL

Traditional ML (select a subset of the many available features and classify with them)

Amplitude envelope
RMS energy
Zero crossing rate
…
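
The three time-domain features listed above can each be computed on a single frame in a line or two of NumPy. This is a minimal sketch: the function names are hypothetical, taking the absolute value for the amplitude envelope and normalising the zero crossing rate by the number of adjacent sample pairs are convention choices assumed here, and the four-sample frame is a toy input.

```python
import numpy as np

def amplitude_envelope(frame: np.ndarray) -> float:
    """Largest absolute sample value in the frame."""
    return float(np.max(np.abs(frame)))

def rms_energy(frame: np.ndarray) -> float:
    """Root mean square of the samples in the frame."""
    return float(np.sqrt(np.mean(frame ** 2)))

def zero_crossing_rate(frame: np.ndarray) -> float:
    """Fraction of adjacent sample pairs whose sign differs."""
    negative = np.signbit(frame)
    crossings = np.count_nonzero(negative[1:] != negative[:-1])
    return crossings / (len(frame) - 1)

frame = np.array([0.5, -0.5, 0.5, -0.5])   # toy frame that flips sign every sample
features = [amplitude_envelope(frame), rms_energy(frame), zero_crossing_rate(frame)]

print(features)
```

A traditional ML model would receive a small vector like `features` (usually aggregated over many frames) rather than the raw samples.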

DL

Feed the entire feature representation into the model

Types of intelligent audio system

DSP -> rule-based system
Traditional ML -> feature engineering
DL -> automatic feature extraction

6. How to Extract Audio Features?

Time-domain feature pipeline

Analog signal -> ADC -> Digital signal -> framing -> feature computation -> aggregation (mean, median, …) -> feature value/vector/matrix
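
The digital part of this pipeline (framing, per-frame feature computation, aggregation) can be sketched as follows. It is an illustration only: the helper names, the 8 kHz synthetic sine used as input, the frame and hop sizes, and the choice of RMS as the feature are all assumptions made for the example.

```python
import numpy as np

def frame_signal(signal: np.ndarray, frame_size: int, hop_size: int) -> np.ndarray:
    """Split a 1-D signal into frames (one per row).

    Trailing samples that do not fill a whole frame are dropped.
    """
    num_frames = 1 + (len(signal) - frame_size) // hop_size
    return np.stack([signal[i * hop_size: i * hop_size + frame_size]
                     for i in range(num_frames)])

def time_domain_feature(signal, frame_size, hop_size, feature_fn, aggregate_fn=np.mean):
    """framing -> feature computation per frame -> aggregation."""
    frames = frame_signal(signal, frame_size, hop_size)
    per_frame = np.array([feature_fn(frame) for frame in frames])
    return per_frame, float(aggregate_fn(per_frame))

def rms(frame: np.ndarray) -> float:
    return float(np.sqrt(np.mean(frame ** 2)))

# Synthetic "digital signal": one second of a 440 Hz sine with amplitude 0.5.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = 0.5 * np.sin(2 * np.pi * 440 * t)

per_frame_rms, mean_rms = time_domain_feature(signal, 512, 512, rms)
print(len(per_frame_rms), round(mean_rms, 3))
```

Keeping `per_frame_rms` gives a feature vector over time, while `mean_rms` collapses it to a single value; swapping `aggregate_fn` for `np.median` gives the other aggregation named in the pipeline.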

Frames

Perceivable audio chunk
- $1 \text{ sample} @ 44.1\,\text{kHz} = 0.0227\,\text{ms}$
- $\text{Duration of 1 sample} \ll \text{ear's time resolution } (10\,\text{ms})$
Frame size is a power of 2 (keeps the FFT efficient)
Typical values: 256 - 8192
- $d_f = {1 \over s_r} \times K$
- $d_f$: frame duration
- K: frame size
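
To make the frame-duration formula concrete, the sketch below evaluates $d_f = K / s_r$ for the typical frame sizes at 44.1 kHz. The function names are assumptions for illustration.

```python
def sample_duration_ms(sample_rate: float) -> float:
    """Duration of a single sample in milliseconds."""
    return 1000.0 / sample_rate

def frame_duration_ms(frame_size: int, sample_rate: float) -> float:
    """d_f = (1 / s_r) * K, expressed in milliseconds."""
    return 1000.0 * frame_size / sample_rate

sample_rate = 44_100
one_sample = sample_duration_ms(sample_rate)       # about 0.0227 ms

# Typical power-of-2 frame sizes from 256 to 8192 samples.
durations = {size: frame_duration_ms(size, sample_rate)
             for size in (256, 512, 1024, 2048, 4096, 8192)}

for size, duration in durations.items():
    print(f"K = {size:5d} -> {duration:7.2f} ms")
```

At this sampling rate a 512-sample frame lasts about 11.6 ms, just above the 10 ms time resolution quoted above, which is why frames rather than single samples are the unit of analysis.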

Frequency-domain feature pipeline

Analog signal -> ADC -> Digital signal -> framing -> windowing -> FT -> feature computation -> aggregation (mean, median, …) -> feature value/vector/matrix
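
The frequency-domain pipeline adds windowing and a Fourier transform before the feature is computed. The sketch below is one possible instance under stated assumptions: spectral centroid as the feature, NumPy's `np.hanning` as the window, a synthetic 1 kHz tone sampled at 8 kHz as the input, and hypothetical function and parameter names.

```python
import numpy as np

def spectral_centroid_per_frame(signal: np.ndarray, sample_rate: float,
                                frame_size: int = 512, hop_size: int = 256) -> np.ndarray:
    """framing -> windowing -> FT -> spectral centroid for each frame."""
    window = np.hanning(frame_size)
    freqs = np.fft.rfftfreq(frame_size, d=1.0 / sample_rate)   # bin centre frequencies in Hz
    centroids = []
    for start in range(0, len(signal) - frame_size + 1, hop_size):
        frame = signal[start:start + frame_size] * window       # windowing
        magnitude = np.abs(np.fft.rfft(frame))                  # magnitude spectrum
        # Magnitude-weighted mean frequency; the small constant avoids
        # division by zero on silent frames.
        centroids.append(np.sum(freqs * magnitude) / (np.sum(magnitude) + 1e-12))
    return np.array(centroids)

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 1000 * t)            # pure 1 kHz tone

centroids = spectral_centroid_per_frame(signal, sample_rate)
aggregated = float(np.median(centroids))          # aggregation step

print(len(centroids), round(aggregated, 1))
```

For a pure tone the centroid sits at the tone's frequency, which makes this a convenient sanity check before running the same code on real recordings.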

Spectral leakage

Processed signal isn’t an integer number of periods
Endpoints are discontinuous
Discontinuities appear as high-frequency components not present in the original signal
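
Spectral leakage is easy to observe numerically. The sketch below compares a frame holding an integer number of periods with one holding a non-integer number, and measures how much spectral power falls outside the strongest bin. The function name, the 256-sample frame, and the choice of 8 versus 8.5 periods are assumptions for illustration.

```python
import numpy as np

def leakage_fraction(num_periods: float, frame_size: int = 256) -> float:
    """Share of spectral power that falls outside the strongest FFT bin."""
    k = np.arange(frame_size)
    frame = np.sin(2 * np.pi * num_periods * k / frame_size)
    power = np.abs(np.fft.rfft(frame)) ** 2
    return float(1.0 - power.max() / power.sum())

# Exactly 8 periods fit in the frame: the endpoints line up and
# essentially all power lands in a single bin.
leak_integer = leakage_fraction(8.0)

# 8.5 periods: the frame ends mid-cycle, the endpoints are discontinuous,
# and power spreads into neighbouring bins.
leak_non_integer = leakage_fraction(8.5)

print(leak_integer, leak_non_integer)
```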

Windowing

Apply windowing function to each frame
Tapers the samples at both ends of a frame toward zero
Generates a periodic signal

Hann window

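$w(k) = 0.5 \times (1 - \cos( { {2\pi k} \over {K-1} } ) ), \quad k = 0 \dots K-1$
$k$: sample index within the frame, $K$: frame size
Applying the window:
- $s_w(k) = s(k) \times w(k), \quad k = 0 \dots K-1$
Concatenating windowed frames end to end loses the signal near each frame boundary, where the window is close to zero
- -> Can be solved with overlapping frames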
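
The effect of overlapping frames can be illustrated by adding up the window weight that each sample receives. This is a sketch under stated assumptions: the frame is indexed from $k = 0$ to $K-1$ (the convention used by NumPy's `np.hanning`), the frame size is 512, the overlap is 50%, and the function names are hypothetical.

```python
import numpy as np

def hann_window(frame_size: int) -> np.ndarray:
    """w(k) = 0.5 * (1 - cos(2*pi*k / (K - 1))) for k = 0 .. K - 1."""
    k = np.arange(frame_size)
    return 0.5 * (1.0 - np.cos(2 * np.pi * k / (frame_size - 1)))

def total_window_weight(num_samples: int, frame_size: int, hop_size: int) -> np.ndarray:
    """Sum of the window weights each sample receives across all frames."""
    weight = np.zeros(num_samples)
    window = hann_window(frame_size)
    for start in range(0, num_samples - frame_size + 1, hop_size):
        weight[start:start + frame_size] += window
    return weight

K = 512
no_overlap = total_window_weight(8 * K, K, hop_size=K)         # frames placed back to back
half_overlap = total_window_weight(8 * K, K, hop_size=K // 2)  # 50% overlap

interior = slice(K, 7 * K)   # ignore the first and last frame edges
print(no_overlap[interior].min(), half_overlap[interior].min())
```

Without overlap the weight drops to zero at every frame boundary, so samples there contribute nothing to any feature. With 50% overlap every interior sample keeps a total weight close to 1.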

This post is licensed under CC BY 4.0 by the author.
