This section covers the basic information needed to understand VoIP.
To set up a VoIP communication we need:
                          Base architecture

Voice )) ADC - Compression Algorithm - Assembling RTP in TCP/IP -----
                                                               ----> |
                                                               <---- |
Voice (( DAC - Decompress. Algorithm - Disass. RTP from TCP/IP  -----
The analog-to-digital conversion is done in hardware, typically by the ADC integrated in the sound card.
Today every sound card can convert a band of 22050 Hz at 16 bits (by the Nyquist principle, sampling it requires a frequency of 44100 Hz), giving a throughput of 2 bytes * 44100 (samples per second) = 88200 Bytes/s, or 176.4 kBytes/s for a stereo stream.
VoIP needs neither a 22 kHz bandwidth nor 16 bits per sample: the codings used for it are described next.
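The throughput arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `pcm_throughput` is made up for illustration, and the telephone-quality figures (8000 Hz, 8 bits) are the usual G.711 parameters, assumed here rather than taken from the text above.

```python
def pcm_throughput(sample_rate_hz, bits_per_sample, channels=1):
    """Return the raw (uncompressed) PCM data rate in bytes per second."""
    return sample_rate_hz * bits_per_sample * channels // 8

# Sound card figures from the text: 16 bit samples at 44100 Hz.
cd_mono = pcm_throughput(44100, 16)
cd_stereo = pcm_throughput(44100, 16, channels=2)

# Assumed telephone-quality figures: 8 bit samples at 8000 Hz.
phone = pcm_throughput(8000, 8)

print(cd_mono)    # 88200 bytes/s
print(cd_stereo)  # 176400 bytes/s
print(phone)      # 8000 bytes/s, that is 64 kbit/s
```

The gap between 88200 bytes/s and 8000 bytes/s shows why a narrower band and fewer bits per sample matter before any compression is applied.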
Now that we have digital data we can convert it to a standard format that can be transmitted quickly.
PCM, Pulse Code Modulation, Standard ITU-T G.711
ADPCM, Adaptive differential PCM, Standard ITU-T G.726
It encodes only the difference between the current voice sample and the previous one, requiring 32 kbps (see Standard ITU-T G.726).
LD-CELP, Standard ITU-T G.728
CS-ACELP, Standard ITU-T G.729 and G.729a
MP-MLQ, Standard ITU-T G.723.1, 6.3kbps, Truespeech
ACELP, Standard ITU-T G.723.1, 5.3kbps, Truespeech
LPC-10, able to reach 2.5 kbps!!
These last codecs are the most important because they can guarantee a very low minimum bandwidth by using source coding; the G.723.1 codecs also have a very high MOS (Mean Opinion Score, used to measure voice fidelity), but pay attention to the processing power they require, up to 26 MIPS!
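To illustrate the idea of difference coding mentioned for ADPCM, here is a minimal sketch of plain delta coding. It is NOT the G.726 algorithm: the real codec uses an adaptive predictor and quantizer, which are left out here. The function names and sample values are invented for the example.

```python
def delta_encode(samples):
    """Replace each sample by its difference from the previous sample."""
    previous = 0
    deltas = []
    for sample in samples:
        deltas.append(sample - previous)
        previous = sample
    return deltas


def delta_decode(deltas):
    """Rebuild the samples by accumulating the differences."""
    previous = 0
    samples = []
    for delta in deltas:
        previous += delta
        samples.append(previous)
    return samples


samples = [100, 102, 105, 104, 101]   # made-up voice samples
deltas = delta_encode(samples)
print(deltas)                          # [100, 2, 3, -1, -3]
print(delta_decode(deltas) == samples) # True
```

After the first value the differences are small, so they fit in fewer bits than the samples themselves. Sending 4 bits per difference at 8000 samples per second gives the 32 kbps quoted for G.726.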
Now we have the encoded data and we want to encapsulate it into the TCP/IP stack. We follow this structure:
        VoIP data packets
               RTP
               UDP
               IP
           I,II layers
VoIP data packets live in RTP (Real-Time Transport Protocol) packets which are inside UDP-IP packets.
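The cost of this layering can be estimated with a short calculation. The sketch below assumes the minimum header sizes (12 bytes for RTP without CSRC entries, 8 for UDP, 20 for IPv4 without options) and one packet every 20 ms; the packet interval and the function name `ip_bitrate` are assumptions made for the example, and layer I,II overhead is ignored.

```python
RTP_HEADER = 12   # bytes, fixed RTP header without CSRC entries
UDP_HEADER = 8    # bytes
IPV4_HEADER = 20  # bytes, IPv4 header without options


def ip_bitrate(codec_bps, packet_ms=20):
    """Bit rate at the IP layer for a codec stream sent every packet_ms."""
    packets_per_second = 1000 // packet_ms
    payload_bytes = codec_bps * packet_ms // 8000
    packet_bytes = payload_bytes + RTP_HEADER + UDP_HEADER + IPV4_HEADER
    return packet_bytes * 8 * packets_per_second


print(ip_bitrate(64000))  # G.711 at 64 kbps -> 80000 bps on the wire
print(ip_bitrate(32000))  # G.726 at 32 kbps -> 48000 bps on the wire
```

Under these assumptions the 40 bytes of headers add 16 kbps to every stream, so the lower the codec bit rate, the larger the share of bandwidth spent on headers.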
First, VoIP doesn't use TCP because it is too heavy for real-time applications, so UDP (datagram) is used instead.
Second, UDP has no control over the order in which packets arrive at the destination or how long it takes them to get there (datagram concept). Both of these are very important to overall voice quality (how well you can understand what the other person is saying) and conversation quality (how easy it is to carry out a conversation). RTP solves the problem by enabling the receiver to put the packets back into the correct order and not wait too long for packets that have either been lost or are taking too long to arrive (we don't need every single voice packet, but we do need a continuous, ordered flow of most of them).
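The reordering behaviour described above can be sketched as a small buffer keyed by sequence number. This is a simplified illustration, not how any particular RTP stack works: the class name `ReorderBuffer` and the `max_pending` limit are invented, real receivers decide by time rather than by packet count, and the 16-bit wraparound of RTP sequence numbers is ignored.

```python
class ReorderBuffer:
    """Release packets in sequence order, skipping ones that arrive too late."""

    def __init__(self, first_seq=0, max_pending=3):
        self.next_seq = first_seq       # sequence number we want to play next
        self.max_pending = max_pending  # how many packets we hold while waiting
        self.pending = {}               # seq -> payload, received out of order

    def push(self, seq, payload):
        """Store one packet and return the payloads that are now playable."""
        if seq < self.next_seq:
            return []                   # too late, its slot was already skipped
        self.pending[seq] = payload
        ready = []
        while self.pending:
            if self.next_seq in self.pending:
                ready.append(self.pending.pop(self.next_seq))
                self.next_seq += 1
            elif len(self.pending) > self.max_pending:
                # Stop waiting for the missing packet and move on.
                self.next_seq = min(self.pending)
            else:
                break
        return ready


buf = ReorderBuffer(first_seq=0, max_pending=2)
print(buf.push(0, "a"))  # ['a']
print(buf.push(2, "c"))  # [] (waiting for packet 1)
print(buf.push(1, "b"))  # ['b', 'c']
```

Packet 2 is held back until packet 1 arrives, then both are released in order. If packet 1 never arrived, the buffer would give up on it once more than `max_pending` later packets were waiting.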
                    Real Time Transport Protocol

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
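Where:
V is the version of RTP used (2)
P is the padding flag, set when padding bytes follow the payload
X is the extension flag, set when a header extension follows the fixed header
CC is the number of CSRC identifiers that follow the fixed header
M is a marker bit
PT is the payload type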
For a complete description of the RTP protocol and all its applications see the relevant RFCs, 1889 and 1890.
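As a rough illustration of the header layout, the following sketch packs and parses the 12-byte fixed RTP header with Python's `struct` module. The function names and the example values (sequence number 1, timestamp 160, the SSRC) are arbitrary; payload type 0 is the static type assigned to G.711 mu-law audio in RFC 1890. CSRC entries and header extensions are not handled.

```python
import struct


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=0):
    """Build the 12-byte fixed RTP header with V=2, P=0, X=0, CC=0."""
    byte0 = 2 << 6                                  # version in the top 2 bits
    byte1 = (marker << 7) | (payload_type & 0x7F)   # M bit, then 7-bit PT
    # "!" selects network byte order: two bytes, one 16-bit, two 32-bit fields.
    return struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)


def parse_rtp_header(data):
    """Split the first 12 bytes of an RTP packet into its header fields."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "padding": (byte0 >> 5) & 1,
        "extension": (byte0 >> 4) & 1,
        "csrc_count": byte0 & 0x0F,
        "marker": byte1 >> 7,
        "payload_type": byte1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


header = pack_rtp_header(payload_type=0, seq=1, timestamp=160, ssrc=0x12345678)
fields = parse_rtp_header(header)
print(header.hex())                # 80000001000000a012345678
print(fields["version"])           # 2
print(fields["sequence_number"])   # 1
```

A sender would append the codec payload to `header` and hand the result to a UDP socket. The receiver uses `sequence_number` to reorder packets and `timestamp` to schedule playback.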
There are also other protocols used in VoIP, like RSVP, that can manage Quality of Service (QoS).
RSVP is a signaling protocol that requests a certain amount of bandwidth and latency in every network hop that supports it.
For detailed info about RSVP see RFC 2205.
We have said many times that VoIP applications require real-time data streaming, because we expect an interactive voice exchange.
Unfortunately, TCP/IP cannot guarantee this; it only makes a "best effort" to deliver the data. So we need to introduce tricks and policies that can manage the packet flow in EVERY router we cross.
So here are:
For exhaustive information about QoS see Differentiated Services at IETF.
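From the application side, one way to take part in Differentiated Services is to mark outgoing voice packets with a code point. The sketch below assumes a Linux-like system where Python exposes `socket.IP_TOS`; the helper names are invented, and the Expedited Forwarding value (46) is the code point commonly used for voice. Marking is only a request: routers give it priority only if they are configured to honor it.

```python
import socket

EF = 46  # Expedited Forwarding code point, commonly used for voice traffic


def dscp_to_tos(dscp):
    """Place a 6-bit DSCP value in the upper six bits of the IP TOS byte."""
    return (dscp & 0x3F) << 2


def open_voice_socket(dscp=EF):
    """Open a UDP socket whose outgoing packets carry the given DSCP mark."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Assumes the platform supports IP_TOS on UDP sockets.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


print(hex(dscp_to_tos(EF)))  # 0xb8
```

Calling `open_voice_socket()` returns a socket that can be used to send RTP packets as usual; only the TOS byte of the IP header changes.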
The H323 protocol is used, for example, by Microsoft Netmeeting to make VoIP calls.
This protocol allows a variety of elements to talk to each other:
H323 allows not only VoIP but also video and data communications.
Concerning VoIP, H323 can carry the audio codecs G.711, G.722, G.723, G.728 and G.729, while for video it supports H261 and H263.
More info about H323 is available at Openh323 Standards, at this h323 web site and in its standard description: ITU H-series Recommendations.
You can find it implemented in various applications such as Microsoft Netmeeting, Net2Phone and DialPad, and also in the freeware products available at the Openh323 Web Site.