For decades experienced Unix users have employed many text processing tools to make document editing tasks much easier. Console utilities such as sed, awk, cut, paste, and join, though useful in isolation, only realise their full potential when combined together through the use of pipes.
Recently Linux has been used for more than just processing ASCII text. The growing popularity of various multimedia formats, in the form of images and audio data, has spurred on the development of tools to deal with such files. Many of these tools have graphical user interfaces and cannot operate in the absence of user interaction. There is, however, a growing number of tools which can be operated in batch mode with their interfaces disabled. Some tools are even designed to be used from the command prompt or within shell scripts.
It is this class of tools that this article will explore. Complex media manipulation functions can often be effected by combining simple tools together using techniques normally applied to text processing filters. The focus will be on audio stream processing as these formats work particularly well with the Unix filter pipeline paradigm.
There are a multitude of sound file formats and converting between them is a frequent operation. The sound exchange utility sox fulfills this role and is invoked at the command prompt:
sox sample.wav sample.aiff
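The sampling rate, sample size (-b for 8-bit bytes, -w for 16-bit words), and number of channels can be changed in the same step:
sox sample.aiff -r 8000 -b -c 1 low.aiff
sox sample.aiff -r 44100 -w -c 2 high.aiff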
When sox cannot guess the destination format from the file extension it is necessary to specify this explicitly:
sox sample.wav -t aiff sample.000
sox sample.wav -t raw -r 11025 -sw -c 2 sample.000
sox -t raw -r 11025 -sw -c 2 sample.000 sample.aiff
One need not use the "-t raw" option if the file extension is .raw; however, this option is essential when the raw samples are coming from standard input or being sent to standard output. To do this, use "-" in place of the file name:
sox -t raw -r 11025 -sw -c 2 - sample.aiff < sample.raw
sox sample.aiff -t raw -r 11025 -sw -c 2 - > sample.raw
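Declaring one sampling rate when writing the raw stream and another when reading it back changes the playback speed, and with it the pitch:
sox sample.aiff -t raw -r 44100 -sw -c 2 - | sox -t raw -r 32000 -sw -c 2 - slow.aiff
sox sample.aiff -t raw -r 32000 -sw -c 2 - | sox -t raw -r 44100 -sw -c 2 - fast.aiff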
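How much slower or faster the result plays follows from the two rates: the same samples are replayed at the declared rate, so the duration scales by the written rate divided by the declared rate. A minimal sketch of that arithmetic, assuming a 10 second sample (the figure and the function name scaled_ms are made up for illustration):

```shell
# scaled_ms DURATION_MS WRITTEN_RATE DECLARED_RATE
# Prints the playback time, in milliseconds, of a stream that was written at
# WRITTEN_RATE samples per second but is read back as DECLARED_RATE.
# Integer arithmetic only, so the result is truncated to whole milliseconds.
scaled_ms() {
    echo $(( $1 * $2 / $3 ))
}

scaled_ms 10000 44100 32000   # the slow.aiff case
scaled_ms 10000 32000 44100   # the fast.aiff case
```

This prints 13781 and 7256: the ten second sample stretches to roughly 13.8 seconds in slow.aiff and shrinks to roughly 7.3 seconds in fast.aiff.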
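Because the raw stream is nothing more than a sequence of bytes, ordinary text utilities can cut it up. To keep only the first two seconds of a sample:
sox sample.aiff -t raw -r 44100 -sw -c 2 - | head -c 352800 | sox -t raw -r 44100 -sw -c 2 - twosecs.aiff
Likewise, to extract the last second of a sample: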
sox sample.aiff -t raw -r 44100 -sw -c 2 - | tail -c 176400 | sox -t raw -r 44100 -sw -c 2 - lastsec.aiff
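Combining the two, one can skip the first two seconds and keep the one that follows, thereby extracting the third second:
sox sample.aiff -t raw -r 44100 -sw -c 2 - | tail -c +352801 | head -c 176400 | sox -t raw -r 44100 -sw -c 2 - thirdsec.aiff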
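The byte counts handed to head and tail follow from the raw format: one second occupies rate x channels x bytes per sample, which for 44100 Hz, stereo, 16-bit audio is 176400 bytes. A small sketch of that arithmetic, with helper names (secs_to_bytes, skip_offset) that are made up for illustration:

```shell
# Raw format used throughout the examples: 44100 Hz, 2 channels, 16-bit.
RATE=44100
CHANNELS=2
WIDTH=2        # bytes per sample

# secs_to_bytes N: size in bytes of N whole seconds of audio.
secs_to_bytes() {
    echo $(( $1 * RATE * CHANNELS * WIDTH ))
}

# skip_offset N: argument for "tail -c +K" that skips the first N seconds.
# tail counts bytes from 1, so skipping B bytes means starting at byte B + 1.
skip_offset() {
    echo $(( $(secs_to_bytes "$1") + 1 ))
}

secs_to_bytes 1
secs_to_bytes 2
skip_offset 2
```

This prints 176400, 352800, and 352801, the three figures used in the commands above. Keeping the counts whole multiples of four bytes (two channels of two bytes each) keeps the cut aligned on sample boundaries.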
One can extract parts of different samples and join them together into one file via nested sub-shell commands:
(sox sample-1.aiff -t raw -r 44100 -sw -c 2 - | head -c 176400; sox sample-2.aiff -t raw -r 44100 -sw -c 2 - | head -c 176400) | sox -t raw -r 44100 -sw -c 2 - newsample.aiff
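The sub-shell grouping is easier to see with plain text standing in for audio. In the sketch below, the functions first and second are hypothetical stand-ins for the two sox commands, and tr stands in for the final sox that consumes the joined stream:

```shell
# Stand-ins for two raw sample streams; real use would put sox here.
first()  { printf 'AAAAAAAA'; }
second() { printf 'BBBBBBBB'; }

# The pipelines inside the parentheses run one after the other, and the
# sub-shell's combined standard output feeds the single pipe that follows.
( first | head -c 4; second | head -c 4 ) | tr 'AB' 'ab'
echo    # finish the line, since the streams contain no newline
```

This prints aaaabbbb: four bytes from each source, in order, delivered to the last command as one stream. The separator between the two inner pipelines (a semicolon or a newline) is what keeps the second command from being read as arguments to head.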
Sounds can be sent to the OSS (open sound system) device /dev/dsp with the "-t ossdsp" option:
sox sample.aiff -t ossdsp /dev/dsp
The play command supplied with sox is a convenient shorthand for this:
play sample.aiff
Audio samples played this way monopolise the output hardware. Any other sound-capable application must wait until the audio device is freed before it can play more samples. Desktop environments such as GNOME and KDE provide facilities to play more than one audio sample simultaneously. Samples may be issued by different applications at any time without having to wait, although not every audio application knows how to do this for each of the various desktops. sox is one program that lacks this capability. However, with a little investigation of the audio media services provided by GNOME and KDE, one can devise ways to overcome this shortcoming.
There are quite a few packages that allow audio device sharing. One common strategy is to run a background server to which client applications must send their samples to be played. The server then grabs control of the sound device and forwards the audio data to it. Should more than one client send samples at the same time the server mixes them together and sends a single combined stream to the output device.
The Enlightened Sound Daemon (ESD) uses this method. The server, esd, can often be found running in the background of GNOME desktops. The ESD package goes by the name esound on most distributions and includes a few simple client applications such as:
sox sample.aiff -t raw -r 44100 -sw -c 2 - | head -c 176400 | esdcat
sox sample.cdr -t raw -r 44100 -sw -c 2 - | esdcat
The Analog RealTime Synthesizer (ARtS) is similar to ESD but is often used with KDE. The background server is artsd with the corresponding client programs, artsplay and artscat. To play a sample:
sox sample.cdr -t raw -r 44100 -sw -c 2 - | tail -c 352800 | artscat
Neither ESD nor ARtS depends on any one particular desktop environment. With some work, one could in theory use ESD with KDE and ARtS with GNOME. Each can even be used within a console login session. Thus one can mix samples, encoded in a plethora of formats, with or without the graphical desktop interface.
Having covered what goes on the end of an audio pipeline, we should consider what can be placed at the start. Sometimes one would like to manipulate samples extracted from music files in MP3, MIDI, or module (MOD, XM, S3M, etc) format. Command line tools exist for each of these formats that will output raw samples to standard output.
For MP3 music one can use "maplay -s":
maplay -s music.mp3 | artscat
maplay -s mono22khz.mp3 | esdcat -r 22050 -m
maplay -s mono22khz.mp3 | artscat -r 22050 -c 1
mpg123 -s -r 44100 --stereo lowfi.mp3 | artscat
Users of Ogg Vorbis may use the following:
ogg123 -d raw -f - music.ogg | artscat
Music files can also be obtained in MIDI format. If (like me) you have an old sound card with poor sequencer hardware, you may find that timidity can work wonders. Normally this package converts MIDI files into sound samples for direct output to the sound device. Carefully chosen command line options can redirect this output:
timidity -Or1sl -o - -s 44100 music.mid | artscat
If you're a fan of the demo scene you might want to play a few music modules on your desktop. Fortunately mikmod can play most of the common module formats. The application can also output directly to the sound device or via ESD. The current stable version of libmikmod, 3.1.9, does not seem to be ARtS aware yet. One can remedy this using a command pipeline:
mikmod -d stdout -q -f 44100 music.mod | artscat
mikmod -d pipe,pipe=artscat -f 44100 music.mod
play sample.aiff echo 1 0.6 150 0.6
play sample.aiff vibro 20 0.9
play sample.aiff flanger 0.7 0.7 4 0.8 2
play sample.aiff phaser 0.6 0.6 4 0.6 2
play sample.aiff band 3000 700
play sample.aiff band 0 700
play sample.aiff chorus 0.7 0.7 20 1 5 2 -s
play sample.aiff reverse
mikmod -d stdout -q -f 44100 music.xm | sox -t raw -r 44100 -sw -c 2 - -t raw - chorus 0.7 0.7 80 0.5 2 1 -s | artscat
ogg123 -d raw -f - music.ogg | tail -c +705601 | artscat
timidity -Or1sl -o - -s 44100 music.mid | sox -t raw -r 44100 -sw -c 2 - -t raw - echo 1 0.6 80 0.6 | oggenc -o music.ogg --raw -
maplay -s mono32.mp3 | sox -v 0.5 -t raw -r 32000 -sw -c 1 - -t raw -r 44100 -c 2 - split | oggenc -o music.ogg --raw -
for x in *.aiff; do sox "$x" -v 0.5 -t raw -r 8000 -bu -c 1 -; done | sox -t raw -r 8000 -bu -c 1 - all.wav
Hopefully these examples hint at what can be accomplished with the pipeline technique. One cannot argue against using interactive applications with elaborate graphical user interfaces. They can often perform much more complicated tasks while saving the user from having to memorise pages of argument flags. However, there will always be instances where command pipelines are more suitable. Converting a large number of sound samples will require some form of scripting. Interactive programs cannot be invoked as part of an at or cron job.
Audio pipelines can also be used to save disk space. One need not store a dozen copies of what is essentially the same sample with different modifications applied. Instead, create a dozen scripts each with a different pipeline of filters. These can be invoked when the modified version of the sound sample is called for. The altered sound is generated on demand.
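One such on-demand script might look like the sketch below. The function name with_echo is made up for illustration, and the sox arguments are simply borrowed from the examples earlier in this article, so treat it as a starting point rather than a finished tool:

```shell
#!/bin/sh
# with_echo FILE: write FILE to standard output as raw samples
# (44100 Hz, 16-bit signed, stereo) with an echo effect applied.
# Nothing is stored on disk; the altered sound exists only in the pipe.
with_echo() {
    sox "$1" -t raw -r 44100 -sw -c 2 - echo 1 0.6 150 0.6
}

# Example use, left commented out because it needs sox and a running artsd:
# with_echo sample.aiff | artscat
```

A dozen variations then cost a dozen few-line scripts, each differing only in the effect arguments, while the original sample is stored once.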
I encourage you to experiment with the tools described in this article. Try combining them together in increasingly elaborate sequences. Most importantly, remember to have fun while doing so.