1. Basic Concepts of Sound
This chapter looks at some basic concepts of audio, both analog and digital. Here are some resources:
The Scientist and Engineer's Guide to Digital Signal Processing ( www.dspguide.com/ ) by Steven W. Smith
Music and Computers: A Theoretical and Historical Approach ( http://music.columbia.edu/cmc/MusicAndComputers/ ) by Phil Burk, Larry Polansky, Douglas Repetto, Mary Roberts, Dan Rockmore
Sampled Audio
Audio is an analog phenomenon. Sounds are produced in all sorts of ways: through voices, instruments, and natural events such as trees falling in forests (whether or not there is anyone to hear). The sound received at a point can be plotted as amplitude against time and can take almost any functional form, including discontinuous ones.
The analysis of sound is frequently done by looking at its spectrum. Mathematically this is achieved by taking the Fourier transform, but the ear performs much the same transform naturally, just by its physical structure. Pure tones heard by the ear correspond to simple sine waves, and harmonics correspond to sine waves whose frequencies are integer multiples of the fundamental's frequency.
Analog systems, such as an analog audio amplifier, are designed to work with these spectral signals; they try to produce equal amplification across the audible spectrum.
Computers, and an increasingly large number of electronic devices, work on digital signals, composed of bits of 1s and 0s. Bits are combined into bytes with 256 possible values, into 16-bit words with 65,536 possible values, or into even larger combinations such as 32- or 64-bit words.
Sample Rate
Digitizing an analog signal means taking samples from that signal at regular intervals and representing those samples on a discrete scale. The frequency at which samples are taken is the sample rate. For example, audio on a CD is sampled at 44,100Hz, that is, 44,100 times each second. On a DVD, samples may be taken up to 192,000 times per second, a sampling rate of 192kHz. At the other end of the scale, the standard telephone sampling rate is 8kHz.
Figure 1-1 illustrates sampling.
Figure 1-1.
Analog and sampled signal (Wikipedia: http://en.wikipedia.org/wiki/Pulse-code_modulation )
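To make sampling concrete, here is a minimal sketch in C (the 440Hz frequency and ten-sample run are arbitrary choices for illustration): it evaluates a sine wave at the CD rate and quantizes each value to a 16-bit sample. Compile with -lm for the math library.

/* Minimal sampling sketch: evaluate a 440Hz sine wave at the
   CD rate of 44,100Hz and quantize each sample to 16 bits. */
#include <stdio.h>
#include <math.h>
#include <stdint.h>

int main(void) {
    const double rate = 44100.0;  /* samples per second */
    const double freq = 440.0;    /* concert A, chosen for illustration */
    int n;
    for (n = 0; n < 10; n++) {
        double analog = sin(2 * M_PI * freq * n / rate); /* continuous value */
        int16_t sample = (int16_t) (analog * 32767);     /* 16-bit quantization */
        printf("sample %d = %d\n", n, sample);
    }
    return 0;
}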
The sampling rate affects two major factors. First, the higher the sampling rate, the larger the size of the data. All other things being equal, doubling the sample rate will double the data requirements. On the other hand, the Nyquist-Shannon theorem ( http://en.wikipedia.org/wiki/Nyquist_theorem ) places limits on the accuracy of sampling continuous data: an analog signal can be reconstructed from a digital signal without distortion only if the highest frequency in the signal is less than one-half the sampling rate.
This is often where the arguments about the quality of vinyl versus CDs end up, as in Vinyl vs. CD myths refuse to die ( www.eetimes.com/electronics-blogs/audio-designline-blog/4033509/Vinyl-vs-CD-myths-refuse-to-die ). With a sampling rate of 44.1kHz, frequencies in the original signal above 22.05kHz may not be reproduced accurately when converted back to analog for a loudspeaker or headphones. Since the typical hearing range for humans extends only up to about 20,000Hz (and mine is now down to about 10,000Hz), this should not be a significant problem. But some audiophiles claim to have amazingly sensitive ears!
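The Nyquist limit can be demonstrated in a few lines of C (a sketch, using the telephone rate for convenience): at 8kHz the Nyquist frequency is 4kHz, and a 5kHz cosine produces exactly the same samples as a 3kHz one, so after digitization the two tones are indistinguishable.

/* Aliasing sketch: a 5kHz cosine sampled at 8kHz yields the same
   samples as a 3kHz cosine (8000 - 5000 = 3000), because 5kHz lies
   above the Nyquist frequency of 4kHz. Compile with -lm. */
#include <stdio.h>
#include <math.h>

int main(void) {
    const double rate = 8000.0;  /* telephone sampling rate */
    int n;
    for (n = 0; n < 8; n++) {
        double above = cos(2 * M_PI * 5000.0 * n / rate); /* above Nyquist */
        double alias = cos(2 * M_PI * 3000.0 * n / rate); /* its alias */
        printf("n=%d  5kHz: % .5f  3kHz: % .5f\n", n, above, alias);
    }
    return 0;
}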
Sample Format
The sample format is the other major feature of digitizing audio: the number of bits used to represent each sample value. For example, telephone signals use an 8kHz sampling rate and 8-bit resolution, so a telephone signal can convey only 2^8 (in other words, 256) distinct levels (see How Telephones Work at http://electronics.howstuffworks.com/telephone3.htm ).
Most CDs and computer systems use 16-bit formats, giving a very fine gradation of the signal and allowing a dynamic range of 96dB (see Audacity: Digital Sampling at http://manual.audacityteam.org/man/Digital_Audio ).
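That 96dB figure follows directly from the bit depth. As a rough sketch, each bit contributes about 6dB of dynamic range, since the range of an N-bit linear format is 20·log10(2^N) ≈ 6.02·N dB:

/* Dynamic range of N-bit linear PCM: 20 * log10(2^N), roughly 6.02 * N dB.
   For 16 bits this gives about 96dB, the figure quoted above.
   Compile with -lm. */
#include <stdio.h>
#include <math.h>

int main(void) {
    int bits;
    for (bits = 8; bits <= 24; bits += 8) {
        double range_db = 20.0 * log10(pow(2.0, bits));
        printf("%2d bits: %5.1f dB\n", bits, range_db);
    }
    return 0;
}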
Frames
A frame holds all the samples from one time instance. For a stereo device, each frame holds two samples, while for a five-speaker device, each frame holds five samples.
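Frame arithmetic is simple, and a small sketch in C shows how frame size and raw data rate follow from the channel count, sample size, and sample rate (the CD-style values here are just an example):

/* Frame and data-rate arithmetic for interleaved PCM: a frame holds
   one sample per channel, so its size is channels * bytes_per_sample,
   and the raw data rate is sample_rate * frame_size. */
#include <stdio.h>

int main(void) {
    int rate = 44100;          /* frames per second (CD rate) */
    int channels = 2;          /* stereo: two samples per frame */
    int bytes_per_sample = 2;  /* 16-bit samples */

    int frame_size = channels * bytes_per_sample; /* 4 bytes */
    int bytes_per_second = rate * frame_size;     /* 176,400 bytes/s */

    printf("frame size: %d bytes\n", frame_size);
    printf("data rate:  %d bytes/s\n", bytes_per_second);
    return 0;
}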
Pulse-Code Modulation
Pulse-code modulation (PCM) is the standard form of representing a digitized analog signal. According to Wikipedia ( http://en.wikipedia.org/wiki/Pulse-code_modulation ), "Pulse-code modulation is a method used to digitally represent sampled analog signals. It is the standard form for digital audio in computers and various Blu-ray, DVD, and CD formats, as well as other uses such as digital telephone systems. A PCM stream is a digital representation of an analog signal, in which the magnitude of the analog signal is sampled regularly at uniform intervals, with each sample being quantized to the nearest value within a range of digital steps.

PCM streams have two basic properties that determine their fidelity to the original analog signal: the sampling rate, which is the number of times per second that samples are taken, and the bit depth, which determines the number of possible digital values that each sample can take."
However, even though this is the standard, there are variations ( http://wiki.multimedia.cx/index.php?title=PCM ). The principal one concerns how the bytes of a multibyte sample are ordered in a word-based system: little-endian or big-endian ( http://searchnetworking.techtarget.com/definition/big-endian-and-little-endian ). The next variation is signed versus unsigned samples ( http://en.wikipedia.org/wiki/Signedness ).
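These two variations come up whenever PCM data moves between systems. The following sketch (illustrative helper functions, not part of any library) shows the usual fix-ups: swapping the bytes of a 16-bit sample to change endianness, and recentering an unsigned 8-bit sample to make it signed:

/* Two common PCM format fix-ups, sketched for illustration. */
#include <stdio.h>
#include <stdint.h>

/* Swap the two bytes of a 16-bit sample (endianness conversion). */
static uint16_t swap16(uint16_t s) {
    return (uint16_t) ((s >> 8) | (s << 8));
}

/* Unsigned 8-bit PCM centers silence at 128; signed centers it at 0. */
static int8_t u8_to_s8(uint8_t s) {
    return (int8_t) (s - 128);
}

int main(void) {
    printf("0x1234 swapped: 0x%04x\n", swap16(0x1234)); /* prints 0x3412 */
    printf("u8 128 as s8:   %d\n", u8_to_s8(128));      /* prints 0 (silence) */
    return 0;
}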
There are a number of other variations that are less important, such as whether the digitization is linear or logarithmic. See the MultimediaWiki at http://wiki.multimedia.cx/index.php?title=PCM for a discussion of these.
Overrun and Underrun
According to Introduction to Sound Programming with ALSA ( www.linuxjournal.com/article/6735?page=0,1 ), "When a sound device is active, data is transferred continuously between the hardware and application buffers. In the case of data capture (recording), if the application does not read the data in the buffer rapidly enough, the circular buffer is overwritten with new data. The resulting data loss is known as overrun. During playback, if the application does not pass data into the buffer quickly enough, it becomes starved for data, resulting in an error called underrun."
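The mechanism is easy to see in miniature. Below is an illustrative toy ring buffer (not the ALSA API) in which the producer reports when writing would clobber unread data, and the consumer reports when the buffer has run dry:

/* Toy ring buffer illustrating overrun and underrun conditions. */
#include <stdio.h>

#define SIZE 4

static int buf[SIZE];
static int head = 0, tail = 0, count = 0;

/* Returns -1 when the buffer is full: writing now would overrun. */
static int produce(int sample) {
    if (count == SIZE)
        return -1;
    buf[head] = sample;
    head = (head + 1) % SIZE;
    count++;
    return 0;
}

/* Returns -1 when the buffer is empty: playback would underrun. */
static int consume(int *sample) {
    if (count == 0)
        return -1;
    *sample = buf[tail];
    tail = (tail + 1) % SIZE;
    count--;
    return 0;
}

int main(void) {
    int i, s;
    for (i = 0; i < 6; i++)
        if (produce(i) < 0)
            printf("overrun: sample %d lost\n", i);
    while (consume(&s) == 0)
        ;  /* drain the buffer */
    if (consume(&s) < 0)
        printf("underrun: no data left\n");
    return 0;
}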
Latency
Latency is the amount of time that elapses from when a signal enters a system to when it (or its equivalent, such as an amplified version) leaves the system.
According to Ian Waugh's Fixing Audio Latency Part 1 ( www.practicalpc.co.uk/computing/sound/latency1.htm ), "Latency is a delay. It's most evident and problematic in computer-based music audio systems where it manifests as the delay between triggering a signal and hearing it, for example, pressing a key on your MIDI keyboard and hearing the sound play through your sound card.

It's like a delayed reaction, and if the delay is large, it becomes impossible to play anything in time because the sound you hear is always a little bit behind what you're playing, which is distracting.

This delay does not have to be large before it causes problems. Many people can work with a latency of about 40ms even though the delay is noticeable, although if you are playing pyrotechnic music lines, it may be too long."
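The relationship between buffer size and latency is simple arithmetic: a buffer of N frames at R frames per second adds N/R seconds of delay. A sketch (the buffer sizes are typical values, not prescriptions) shows that the 40ms figure mentioned above corresponds to roughly 1,764 frames at 44.1kHz:

/* Buffer latency: N frames at R frames per second delays by N/R seconds. */
#include <stdio.h>

int main(void) {
    int rate = 44100;                             /* frames per second */
    int frames[] = { 64, 256, 1024, 1764, 4096 }; /* common buffer sizes */
    int i;

    for (i = 0; i < 5; i++) {
        double ms = 1000.0 * frames[i] / rate;
        printf("%5d frames -> %6.1f ms\n", frames[i], ms);
    }
    return 0;
}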