Home [DSP] Section 1: 소리의 특성
Post
Cancel

[DSP] Section 1: 소리의 특성

[DSP] Section 1

[REF | AudioSignalProcessingForML]

1. Overview

  • 이미지 딥러닝에 대한 정보는 많지만, 오디오 관련된 정보는 적은 편임

Main Q?

  • 해당 기술(audio processing)을 어디에 적용할 수 있는가?
    1. Audio classification
    2. Speech Recognition / Speaker Verification
    3. Audio Denoising / Audio Upsampling
    4. Music Information Retrieval
      1. Music Intrument Classification
      2. Mood Classification

Content

  • Sound Waves
  • DAC / ADC
  • Time-and Frequency-domain audio featues (e.g., rms, spectral centroid, …)
  • Audio Transformaions
    • FT / STFT
    • Constant-Q
    • Mel Spectograms
    • Chromograms

What you’ll learn

  • Get a deep understanding of audio data
  • Familiarise with freq./time-domain audio featues
  • Extract features from raw audio
  • Recognise what audio features to use for ML applications
  • Preprocess audio data for ML
  • Understand math behind audio transformations
  • Use librosa for audio projects

2. Sound and waveforms

Sound

  • Produced by vibration of an object
  • Vibrations cause air molecules to oscillate
  • Change in air pressure creates a wave

Mechanical wave

  • Oscillation that travels through space
  • Energy travels from one point to another
  • The medium is deformed

Sound Wave

  • Atmospheric Pressure
  • Compression
  • Rarefaction

Waveform

  • 소리의 압력 변화
  • 오디오의 특징(features)를 결정하는 중요한 기본
    • Frequency
    • Intensity
    • Timbre

Periodic and Aperiodic Sound

  • All Waveforms
    • Periodic
      • Simple (Single SineWaves)
      • Complex (Multiple SineWaves)
    • Aperiodic
      • Continuous (Noise)
      • Transient (Pulse)

Frequency and Amplitude

  • Freq. -> higer
  • Amp. -> louder

Pitch

  • Logarithmic perception
  • 2 frequencies are perceived similarly if thet differ by a power of 2
  • A4(440Hz) / A5(880Hz)

Mapping pitch to freq.

\(F(p) = 2^{ {p-69} \over 12 } \times 440\) \(F(p+1)/F(p) = 2^{1/12} = 1.059\)

Cents

  • 특정한 두 음의 음고(Pitch) 높낮이의 거리를 로그(Logarithm, log) 스케일로 표시하는 단위
  • Octave divided in 1200 cents
  • 100 cents in a semitone
  • Noticeable pitch difference: 10-25 cents

3. Intensity, loudness, and timbre (세기와 음색)

Sound Power

  • Rate at which energy is transferred
  • Energy per unit of time emitted by a sound source in all directions
  • Measured in watt (W)

Sound Intensity

  • Sound power per unit area
  • Measurd in $W/m^2$

Threshold of Hearing

  • $TOH = 10^{-12}W/m^2$

Threshold of Pain

  • $TOP = 10W/m^2$

Intensity level

  • Logarithmic scale
  • Measured in decibels (dB)
  • Ration between two intensity values
  • Use an intensity of reference (TOH)

$dB(I) = 10 \log_{10}({I \over I_{TOH} })$

  • ~3dBs -> intensity doubles

Loudness

  • Subjective perception of sound intensity
  • Depends on duration / frequency of a sound
  • Depends on age
  • Measured in phons (복잡한 소리에 대한 음량 레벨의 로그 단위)

Timbre

  • Colour of sound
  • Diff between two sounds with same intensity, frequency, duration

What are the features of timbre?

  • Timbre is multidimensional
  • Sound envelope
  • Harmonic content

Sound envelope

  • Attack-Decay-Sustain-Release (ADSR) Model

Complex sound

  • Superposition of sinusoids
  • A partial a sinusoid used to describe a sound
  • The lowest partial is called fundamental frequnecy
  • A harmonic partial is a frequency that’s multiple of the fundamental frequency
  • Inharmonicity indicates a deviation from a harmonic partial

Frequency modulation

  • AKA vibrato
  • Periodic variation in frequency
  • In music, used for expressive purposes

Amplitude modulation

  • AKA tremolo
  • Periodic variation in amplitude
  • In music, used for expressive purposes

Timbre recap

  • Multifactorial sound dimension
  • Amplitude envelope
  • Distribution of energy across partials
  • Signal modulation (frequency/amplitude)

Sound recap

  • Sound is a wave
  • Frequency, Intensity, Timbre
  • Pitch, Loudness, Timbre
This post is licensed under CC BY 4.0 by the author.

QA와 OP

23년 2월 3주차 주간 회고

Comments powered by Disqus.