[DSP] Section 2
[REF | AudioSignalProcessingForML]
4. Understanding Audio Signal
Audio Signal
- Representation of sound
- Encodes all info we need to reproduce sound
!!Problem!!
- Analog vs Digital
Analog Signal
- Continuous values for time
- Continuous values for amplitude
Digital Signal
- Sequence of discrete values
- Data points can only take on a finite number of values
Analog to Digital Conversion (ADC)
- Sampling
- Quantization
Pulse-code modulation
Sampling
Locating Samples
Nyquist Frequency
- $f_N = {s_r \over 2}$
Aliasing
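A minimal sketch of the Nyquist relation above, plus where a tone above the Nyquist frequency ends up after sampling. The function names are illustrative, and the folding rule is the standard textbook result for a pure tone, which these notes do not spell out.

```python
def nyquist_frequency(sample_rate):
    """Highest frequency representable without aliasing: f_N = s_r / 2."""
    return sample_rate / 2


def aliased_frequency(freq, sample_rate):
    """Apparent frequency of a pure tone after sampling (spectral folding)."""
    folded = freq % sample_rate              # f and f + k * s_r give identical samples
    return min(folded, sample_rate - folded)  # reflect into the range 0 .. f_N


sr = 44_100
print(nyquist_frequency(sr))            # 22050.0
print(aliased_frequency(30_000, sr))    # 14100
```

A 30 kHz tone sampled at 44.1 kHz is indistinguishable from a 14.1 kHz tone, which is why content above $f_N$ is normally filtered out before the ADC.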
Quantization
- Resolution = num of bits
- Bit depth
- CD resolution = 16 bits
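A small sketch of uniform quantization, assuming amplitudes are normalised to the range -1.0 to 1.0 and stored as signed integer codes. The `quantize` and `dequantize` helpers are hypothetical names, not taken from the course.

```python
def quantize(sample, bit_depth):
    """Map an amplitude in [-1.0, 1.0] to a signed integer code."""
    max_code = 2 ** (bit_depth - 1) - 1       # 32767 for 16 bits
    clipped = max(-1.0, min(1.0, sample))     # guard against out-of-range input
    return round(clipped * max_code)


def dequantize(code, bit_depth):
    """Map an integer code back to an amplitude in [-1.0, 1.0]."""
    return code / (2 ** (bit_depth - 1) - 1)


levels_16 = 2 ** 16                            # distinct values at CD resolution
code = quantize(0.5, 16)
error = abs(0.5 - dequantize(code, 16))        # quantization error for this sample
print(levels_16, code, error)
```

The error is bounded by half a quantization step, so each extra bit of depth halves the worst-case error.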
Memory for 1 minute of sound (in MB, one channel)
- $((BitDepth \times SR / 1,048,576)/8) \times 60$
- ${ {BitDepth \times SR} \over 1,048,576} \times {1 \over 8} \times 60$
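The formula can be checked with a few lines of code. The `seconds` and `channels` parameters are additions for illustration; the notes assume 60 seconds and a single channel.

```python
def memory_mb(bit_depth, sample_rate, seconds=60, channels=1):
    """Uncompressed PCM size in MB (1 MB = 1,048,576 bytes)."""
    bits = bit_depth * sample_rate * seconds * channels
    return bits / 8 / 1_048_576


print(round(memory_mb(16, 44_100), 2))               # 5.05
print(round(memory_mb(16, 44_100, channels=2), 2))   # 10.09
```

One minute of mono CD-quality audio therefore takes roughly 5 MB.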
Dynamic Range
- Difference between largest/smallest signal a system can record
Signal-to-quantization-noise ratio
- Relationship between max signal strength and quantization error
- Correlates with dynamic range
- $SQNR \approx 6.02 \times Q$
- Q: Bit Depth
- $SQNR(16) \approx 96dB$
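A quick check of the rule of thumb for a few common bit depths. The constant 6.02 comes from $20 \log_{10} 2 \approx 6.02$ dB per bit; the function name is illustrative.

```python
import math

DB_PER_BIT = 20 * math.log10(2)    # about 6.0206 dB gained per extra bit


def sqnr_db(bit_depth):
    """Approximate SQNR in dB, using the rounded 6.02 dB-per-bit rule from the notes."""
    return 6.02 * bit_depth


for q in (8, 16, 24):
    print(q, round(sqnr_db(q), 1))
```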
How do we record sound?
- Analog signal -> ADC -> Digital signal
How do we reproduce sound?
- Digital signal -> DAC -> Analog signal
5. Types of audio features for ML
Why audio features?
- Description of sound
- Different features capture different aspects of sound
- Build intelligent audio systems
Audio feature categorisation
- Level of abstraction
- Temporal scope
- Music aspect
- Signal domain
- ML approach
Level of abstraction
- High-level : instrumentation, key, chords, melody, …
- Mid-level : pitch- and beat-related descriptors, MFCC, …
- Low-level : amplitude envelope, energy, spectral centroid, …
Temporal scope
- Instantaneous (~50ms)
- Segment-level (seconds)
- Global
Music aspect
- Beat
- Timbre
- Pitch
- Harmony
- …
Signal domain
- Time domain
- Amplitude envelope
- RMS energy
- …
- Frequency domain (uses the Fourier transform)
- Band energy ratio
- Spectral centroid
- Spectral flux
- …
- Time-frequency representation
- Spectrogram
- Mel-spectrogram
- Constant-Q transform
ML approach
- Traditional ML
- DL
Traditional ML (select a subset of the many available features, then classify)
- Amplitude envelope
- RMS energy
- Zero crossing rate
- …
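The three time-domain features listed above can be sketched for a single frame as follows. These are common textbook definitions and not necessarily the exact ones used in the course; in particular, zero crossing rate conventions vary, and here it is the fraction of adjacent sample pairs whose sign differs.

```python
import math


def amplitude_envelope(frame):
    """Maximum absolute amplitude in the frame."""
    return max(abs(s) for s in frame)


def rms_energy(frame):
    """Root mean square of the samples in the frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs that change sign."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


frame = [0.5, -0.5, 0.25, -0.25]
print(amplitude_envelope(frame))    # 0.5
print(zero_crossing_rate(frame))    # 1.0
print(rms_energy(frame))
```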
DL
- Feed in the whole feature representation, with no manual selection
Types of intelligent audio system
- DSP -> rule-based system
- Traditional ML -> feature engineering
- DL -> automatic feature extraction
6. How to Extract Audio Features?
Time-domain feature pipeline
- Analog signal -> ADC -> Digital signal -> framing -> feature computation -> aggregation (mean, median, …) -> feature value/vector/matrix
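The digital part of this pipeline (framing, per-frame feature, aggregation) can be sketched as below. A synthetic 440 Hz sine stands in for the ADC output, RMS is used as the example feature, and the helper names and the 8 kHz sample rate are assumptions for illustration.

```python
import math
from statistics import mean, median


def frame_signal(signal, frame_size, hop_size):
    """Split a signal into frames, dropping any trailing partial frame."""
    return [signal[i:i + frame_size]
            for i in range(0, len(signal) - frame_size + 1, hop_size)]


def rms(frame):
    """Per-frame feature: root mean square amplitude."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


sr = 8_000
signal = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]  # 1 s of 440 Hz

frames = frame_signal(signal, frame_size=512, hop_size=512)   # framing
per_frame = [rms(f) for f in frames]                          # feature computation
summary = (mean(per_frame), median(per_frame))                # aggregation
print(len(frames), summary)
```

Keeping `per_frame` gives a feature vector over time; aggregating it gives a single value per recording.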
Frames
- Perceivable audio chunk
- $1 \ \text{sample} \ @ \ 44.1 \ \text{kHz} = 0.0227 \ \text{ms}$
- $\text{Duration of 1 sample} \ll \text{ear's time resolution (10 ms)}$
- Power of 2 num samples
- Typical values: 256 - 8192
- $d_f = {1 \over s_r} \times K$
- $d_f$: frame duration (seconds)
- $s_r$: sample rate
- K: frame size (number of samples)
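A short worked example of the frame duration formula at 44.1 kHz for a few typical frame sizes. The function name is illustrative.

```python
def frame_duration_ms(frame_size, sample_rate):
    """d_f = K / s_r, converted to milliseconds."""
    return frame_size / sample_rate * 1000


for k in (256, 512, 1024, 8192):
    print(k, round(frame_duration_ms(k, 44_100), 2))
```

A 512-sample frame lasts about 11.6 ms, just above the 10 ms time resolution of the ear, whereas a single sample lasts only about 0.0227 ms.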
Frequency-domain feature pipeline
- Analog signal -> ADC -> Digital signal -> framing -> windowing -> FT -> feature computation -> aggregation (mean, median, …) -> feature value/vector/matrix
Spectral leakage
- Processed signal isn’t an integer number of periods
- Endpoints are discontinuous
- Discontinuities appear as high-frequency components not present in the original signal
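Leakage can be seen numerically by comparing a frame that holds a whole number of periods with one that does not. This sketch uses a naive DFT and a hypothetical `leakage_ratio` measure, defined here as the share of spectral energy outside the strongest bin; neither comes from the course.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the DFT bins from 0 up to the Nyquist bin (naive O(n^2) DFT)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def leakage_ratio(cycles, n=64):
    """Share of spectral energy that falls outside the strongest bin."""
    frame = [math.sin(2 * math.pi * cycles * t / n) for t in range(n)]
    energy = [m * m for m in dft_magnitudes(frame)]
    return 1 - max(energy) / sum(energy)


print(leakage_ratio(4.0))   # whole number of periods: essentially zero
print(leakage_ratio(4.5))   # non-integer number of periods: energy spreads across bins
```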
Windowing
- Apply windowing function to each frame
- Tapers samples toward zero at both ends of a frame
- Generates a periodic signal
Hann window
- $w(k) = 0.5 \times (1 - \cos( { {2\pi k} \over {K-1} } ) ), k=0 \dots K-1$
- k: sample index within the frame
- Applying the window:
- $s_w(k) = s(k) \times w(k), k=0 \dots K-1$
- Problem: when windowed frames are placed end to end, the signal near the frame boundaries is lost
- -> Can be solved with overlapping frames
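A minimal sketch of Hann windowing with overlapping frames. It assumes the sample index k runs from 0 to K-1, so that both ends of the window are exactly zero, and the `hop_size` parameter and helper names are illustrative.

```python
import math


def hann_window(frame_size):
    """Symmetric Hann window: w(k) = 0.5 * (1 - cos(2*pi*k / (K - 1)))."""
    return [0.5 * (1 - math.cos(2 * math.pi * k / (frame_size - 1)))
            for k in range(frame_size)]


def windowed_frames(signal, frame_size, hop_size):
    """Frame the signal with the given hop and multiply each frame by the window."""
    window = hann_window(frame_size)
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop_size):
        frame = signal[start:start + frame_size]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames


signal = [1.0] * 16
no_overlap = windowed_frames(signal, frame_size=8, hop_size=8)     # hop == frame size
half_overlap = windowed_frames(signal, frame_size=8, hop_size=4)   # 50% overlap
print(len(no_overlap), len(half_overlap))    # 2 3
```

With `hop_size` equal to `frame_size`, the samples around index 7 and 8 are scaled to nearly zero in every frame that contains them. With 50% overlap, the middle frame covers that region near the peak of its window, so the information is kept.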