[DSP] Section 1
[REF | AudioSignalProcessingForML]
1. Overview
- 이미지 딥러닝에 대한 정보는 많지만, 오디오 관련된 정보는 적은 편임
Main Q?
- 해당 기술(audio processing)을 어디에 적용할 수 있는가?
- Audio classification
- Speech Recognition / Speaker Verification
- Audio Denoising / Audio Upsampling
- Music Information Retrieval
- Music Intrument Classification
- Mood Classification
- …
- …
Content
- Sound Waves
- DAC / ADC
- Time-and Frequency-domain audio featues (e.g., rms, spectral centroid, …)
- Audio Transformaions
- FT / STFT
- Constant-Q
- Mel Spectograms
- Chromograms
- …
What you’ll learn
- Get a deep understanding of audio data
- Familiarise with freq./time-domain audio featues
- Extract features from raw audio
- Recognise what audio features to use for ML applications
- Preprocess audio data for ML
- Understand math behind audio transformations
- Use
librosa
for audio projects
2. Sound and waveforms
Sound
- Produced by vibration of an object
- Vibrations cause air molecules to oscillate
- Change in air pressure creates a wave
Mechanical wave
- Oscillation that travels through space
- Energy travels from one point to another
- The medium is deformed
Sound Wave
- Atmospheric Pressure
- Compression
- Rarefaction
Waveform
- 소리의 압력 변화
- 오디오의 특징(features)를 결정하는 중요한 기본
- Frequency
- Intensity
- Timbre
Periodic and Aperiodic Sound
- All Waveforms
- Periodic
- Simple (Single SineWaves)
- Complex (Multiple SineWaves)
- Aperiodic
- Continuous (Noise)
- Transient (Pulse)
- Periodic
Frequency and Amplitude
- Freq. -> higer
- Amp. -> louder
Pitch
- Logarithmic perception
- 2 frequencies are perceived similarly if thet differ by a power of 2
- A4(440Hz) / A5(880Hz)
Mapping pitch to freq.
\(F(p) = 2^{ {p-69} \over 12 } \times 440\) \(F(p+1)/F(p) = 2^{1/12} = 1.059\)
Cents
- 특정한 두 음의 음고(Pitch) 높낮이의 거리를 로그(Logarithm, log) 스케일로 표시하는 단위
- Octave divided in 1200 cents
- 100 cents in a semitone
- Noticeable pitch difference: 10-25 cents
3. Intensity, loudness, and timbre (세기와 음색)
Sound Power
- Rate at which energy is transferred
- Energy per unit of time emitted by a sound source in all directions
- Measured in watt (W)
Sound Intensity
- Sound power per unit area
- Measurd in $W/m^2$
Threshold of Hearing
- $TOH = 10^{-12}W/m^2$
Threshold of Pain
- $TOP = 10W/m^2$
Intensity level
- Logarithmic scale
- Measured in decibels (dB)
- Ration between two intensity values
- Use an intensity of reference (TOH)
$dB(I) = 10 \log_{10}({I \over I_{TOH} })$
- ~3dBs -> intensity doubles
Loudness
- Subjective perception of sound intensity
- Depends on duration / frequency of a sound
- Depends on age
- Measured in phons (복잡한 소리에 대한 음량 레벨의 로그 단위)
Timbre
- Colour of sound
- Diff between two sounds with same intensity, frequency, duration
What are the features of timbre?
- Timbre is multidimensional
- Sound envelope
- Harmonic content
Sound envelope
- Attack-Decay-Sustain-Release (ADSR) Model
Complex sound
- Superposition of sinusoids
- A partial a sinusoid used to describe a sound
- The lowest partial is called fundamental frequnecy
- A harmonic partial is a frequency that’s multiple of the fundamental frequency
- Inharmonicity indicates a deviation from a harmonic partial
Frequency modulation
- AKA vibrato
- Periodic variation in frequency
- In music, used for expressive purposes
Amplitude modulation
- AKA tremolo
- Periodic variation in amplitude
- In music, used for expressive purposes
Timbre recap
- Multifactorial sound dimension
- Amplitude envelope
- Distribution of energy across partials
- Signal modulation (frequency/amplitude)
Sound recap
- Sound is a wave
- Frequency, Intensity, Timbre
- Pitch, Loudness, Timbre
Comments powered by Disqus.