What is a Vocoder?

A vocoder is a sound effect that can make a human voice sound synthetic. It is often used to speak like a robot, with a metallic and monotonous voice.

Here is a sample taken from the song "Right type of mood" by Herbie, which was processed by vocoder software. Compare for yourself:

Original sample

Sample processed by vocoder

Background

To put it simply: whenever you speak, your voice consists of two components. The first component is your basic voice type, produced by your vocal cords. It varies in pitch but remains nearly constant in type and is quite unique. That's why you can tell people apart when you hear their voices. The second component is how you modulate the basic voice. Modulation means that you dynamically amplify and attenuate frequencies. This is done by the mouth and tongue when you speak.

Example: Say a long "ohh". To do this, you nearly close your mouth. Next, say a long "ahh". This time, you open your mouth. Your vocal cords produce the same sound for both "ohh" and "ahh", but the modulation makes them sound different.

The modulation signal is called the formant, because it forms and shapes the basic voice. The basic voice is called the carrier, because it carries the formant signal. The formant signal carries the information and has a much lower frequency than the carrier, a circumstance that can be used to reduce bandwidth consumption for telephone services. This was also the original purpose of the vocoder.

What does a vocoder do?

A vocoder aims to replace the carrier of your voice with a carrier from another source. Thus, it changes the sound of the voice but not the message when you speak. It takes formant and carrier from external sources and splits them up into bands (a band is a range of frequencies; an equalizer divides a signal the same way). Then the envelope (the modulation) is extracted from each formant band. This part is done by an envelope follower, an extreme low-pass filter. Next, the envelope of each formant band is modulated onto the matching carrier band, and the resulting bands are mixed together into the output signal.
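The steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the code of any particular vocoder: the function names (`bandpass`, `envelope`, `vocode`), the five band centers, the filter Q and the 30 Hz envelope cutoff are all chosen for the example. The band-pass is a standard second-order (biquad) filter, and the envelope follower is modeled as rectification followed by a one-pole low-pass.

```python
import math

def bandpass(signal, center_hz, sample_rate, q=4.0):
    """Second-order (biquad) band-pass around center_hz, unity gain at the center."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0            # b1 is zero for this filter type
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def envelope(signal, sample_rate, cutoff_hz=30.0):
    """Envelope follower: rectify, then smooth with a very low one-pole low-pass."""
    coeff = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, level = [], 0.0
    for x in signal:
        level = coeff * level + (1.0 - coeff) * abs(x)
        out.append(level)
    return out

def vocode(formant, carrier, sample_rate, centers_hz=(200, 400, 800, 1600, 3200)):
    """Impose the per-band envelope of `formant` onto the same bands of `carrier`."""
    n = min(len(formant), len(carrier))
    output = [0.0] * n
    for hz in centers_hz:
        env = envelope(bandpass(formant[:n], hz, sample_rate), sample_rate)
        band = bandpass(carrier[:n], hz, sample_rate)
        for i in range(n):
            output[i] += env[i] * band[i]       # modulate, then mix into the output
    return output

# Demo with synthetic signals (assumed values): a 110 Hz sawtooth as the carrier,
# and a formant that is silent for half a second, then a 400 Hz tone.
sample_rate = 8000
demo_carrier = [2.0 * ((i * 110 / sample_rate) % 1.0) - 1.0 for i in range(sample_rate)]
demo_formant = [0.0] * (sample_rate // 2) + [
    math.sin(2.0 * math.pi * 400 * i / sample_rate) for i in range(sample_rate // 2)
]
demo_output = vocode(demo_formant, demo_carrier, sample_rate)
```

In the demo the output stays silent while the formant is silent and follows the carrier once the formant tone starts, which is the behaviour described above. A real vocoder would typically use more bands and better filters.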

The benefit of doing this is that you can make the carrier speak or sing. As a side effect, the formant's voice type is irrelevant to the output, so everybody (even those with an ugly voice) can create cool and futuristic samples :-)

You usually use a human voice as the formant and an instrument as the carrier. This makes the instrument speak. Good results can be achieved with strings, brass, flutes or any other sound with nearly constant dynamics. Even chords may be used to give the result more depth.

Each input source may be a file or a microphone (if supported); the same goes for the output. If you use a mic as a source, please note: input is sampled in stereo but internally processed as two mono channels. One channel is considered to be the formant, the other the carrier.
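To illustrate the stereo handling, here is a small Python sketch that splits a stereo buffer into two mono channels. It assumes the samples are interleaved (left, right, left, right, ...) and that the left channel holds the formant and the right channel the carrier. Both points are assumptions for the example; the text above does not say which channel is which. The function name `split_stereo` is likewise made up here.

```python
def split_stereo(interleaved):
    """Split interleaved stereo samples [L0, R0, L1, R1, ...] into two mono lists."""
    if len(interleaved) % 2:
        raise ValueError("expected an even number of samples (left/right pairs)")
    left = interleaved[0::2]     # every even-indexed sample
    right = interleaved[1::2]    # every odd-indexed sample
    return left, right

# Assumption: left = formant (voice), right = carrier (instrument).
stereo_input = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7]
formant_channel, carrier_channel = split_stereo(stereo_input)
```

If the two channels turn out to be swapped for a given setup, exchanging the two returned lists is enough.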

Where can I find it?

Vocoder is included in the Debian packages swh-plugins (a collection of LADSPA plugins) and lv2vocoder.