While various mechanisms exist for providing feedback from a telephone system to the person using a particular station device, the only true lowest common denominator is the media stream itself.
As a result, telephone systems working with the voice network use two sets of tones that can be easily generated and detected: one provides feedback, and the other indicates that modulated data is to be used. These are referred to as advisory tones and telecom tones, respectively.
DTMF
DTMF stands for Dual Tone Multi Frequency. It refers to a standard mechanism for encoding the 16 digits that can appear on a telephone dial pad as a combination of tones that are easily generated and detected in the media stream of a voice call. DTMF tones are also frequently referred to as touchtones.
As the name implies, DTMF tones are formed by unique combinations of two precisely defined tones. The scheme uses a set of four "high" tones and a set of four "low" tones. Each digit combines one low tone with one high tone, giving 4 × 4 = 16 combinations. Pairing tones this way makes the detection of digits much more reliable, because it is unlikely that some other source of media information on the call (such as a person speaking) would accidentally generate one of these precise combinations.
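To make the reliability argument concrete, the following Python sketch detects a DTMF digit in a block of samples by measuring the energy at each of the eight standard frequencies with the Goertzel algorithm, then accepting a digit only when a single low tone and a single high tone each dominate their group. The Goertzel technique, the 8 kHz sampling rate, and the `ratio` and `min_share` thresholds are assumptions chosen for illustration, not values taken from this text.

```python
import math

# Standard DTMF frequency groups in Hz (rows = low group, columns = high group).
LOW_FREQS = (697, 770, 852, 941)
HIGH_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq, rate):
    """Squared magnitude of the signal's spectrum at `freq` (Goertzel algorithm)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_digit(samples, rate=8000, ratio=10.0, min_share=0.1):
    """Return the DTMF key present in `samples`, or None.

    In each group the strongest tone must carry at least `min_share` of the
    block's energy and beat the runner-up by `ratio`; otherwise no digit is
    reported, which is what makes a stray voice or single tone unlikely to pass.
    """
    energy = sum(x * x for x in samples)
    if energy == 0.0:
        return None
    # A pure tone of amplitude A over N samples gives power ~ (A*N/2)**2,
    # i.e. (N/2) times its energy, so this scale turns power into a share.
    scale = len(samples) / 2.0 * energy

    def dominant(freqs):
        powers = sorted((goertzel_power(samples, f, rate), i)
                        for i, f in enumerate(freqs))
        (second, _), (best, idx) = powers[-2], powers[-1]
        if best / scale >= min_share and best > ratio * second:
            return idx
        return None

    row, col = dominant(LOW_FREQS), dominant(HIGH_FREQS)
    if row is None or col is None:
        return None
    return KEYPAD[row][col]
```

A block of about 25 ms (205 samples at 8 kHz) is long enough to separate adjacent DTMF frequencies in this sketch; a real detector would also check minimum tone duration and the relative levels of the two tones.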
Generating these tones is also easy to implement through a push-button interface, as shown below. Each column is associated with one of the four high-frequency tones, and each row is associated with one of the four low-frequency tones. Pressing a particular button connects the tone generators for its row and column to the call, producing the corresponding dual tone, or "touchtone."
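The keypad grid can be sketched as a lookup table plus a synthesizer that sums a row sine wave and a column sine wave, much as pressing a button connects two tone generators. The frequency values are the standard DTMF assignments; the function names, the 8 kHz sampling rate, the 100 ms default duration and the amplitude are illustrative assumptions.

```python
import math

# Row (low-group) and column (high-group) frequencies in Hz; the fourth
# column holds the A-D tones.
ROW_FREQS = (697, 770, 852, 941)
COL_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_pair(key):
    """Return the (low, high) frequency pair for a single keypad button."""
    if len(key) != 1:
        raise ValueError(f"expected one key, got {key!r}")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key.upper())
        if col != -1:
            return ROW_FREQS[row], COL_FREQS[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key, duration=0.1, rate=8000, amplitude=0.4):
    """Synthesize the dual tone for `key` as float samples in [-0.8, 0.8].

    Mimics pressing a button: the row and column tone generators are summed.
    """
    low, high = dtmf_pair(key)
    n = round(duration * rate)
    return [amplitude * (math.sin(2 * math.pi * low * i / rate)
                         + math.sin(2 * math.pi * high * i / rate))
            for i in range(n)]
```

For example, `dtmf_pair("5")` returns `(770, 1336)`, the second row and second column of the grid.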
The first three columns represent the twelve DTMF tones in widespread use. The last column represents four additional DTMF tones, labeled "A" through "D", which are generally referred to as military or AUTOVON tones. These additional tones should not be confused with the
The following are the tones that are most commonly encountered in a telephone network:
• Dial Tone
Dial tone is an advisory tone indicating that call processing has created a new call on behalf of a given device and a new command can be sent. It is typically associated with the initiated connection state.
• Billing
A billing tone is an advisory tone indicating that call processing is expecting to receive billing information, typically a credit card or calling card number. This tone is also referred to by some as bong tone.
• Busy
Busy tone is an advisory tone indicating that no device is currently available to which the call can be presented. Call progress for the call in question has stalled, so this tone is associated with a connection state of fail.
• Reorder
Reorder tone is an advisory tone indicating that a call has become blocked in a telephone system because a necessary network interface device or other switching facility is unavailable. The cause may be a misdialed number. This tone is also referred to by some as fast-busy. Call progress for the call in question has stalled, so this tone is associated with a connection state of fail.
• Special Information (SIT) Tones
Special information tones, or SIT tones, are sequences of three precisely defined tones used to indicate that call progress has stalled for some specific reason. SIT tones usually precede a prerecorded message describing the problem. They are associated with a connection state of fail. The four SIT tones currently in use are:
Vacant code – the number dialed is not assigned.
Intercept – all calls to the number dialed are being intercepted (typically because the number has changed).
No circuit – no circuits are available for the call.
Reorder – the call cannot be placed; the number may have been misdialed.
• Ringback
Ringback is an advisory tone indicating that call processing is attempting to connect the call to another device. Ringback is the "ringing" sound that a caller hears after placing a call and while waiting for it to be answered. It is associated with the ringing mode of the alerting connection state.
• Beep
A beep is the tone generated by a voice answering machine, voice mail system, or other media access device that records from a voice media stream. The beep is a prompt to the person calling, indicating that the recording has begun and that he or she should begin speaking (e.g., " . . . please leave your message after the beep . . . ").
• Record Warning
A record warning tone is a short 1400 Hz tone used to indicate that a conversation is being recorded. It is required by law in some places and is typically used in situations (such as 911 services and security dispatch) that demand a high degree of accountability or in which a conversation may need to be preserved as evidence.
• Fax CNG
Fax CNG, or fax calling tone, is a tone generated by a fax machine or fax modem that wishes to initiate data modulation for fax transmission on a given voice call. It is a telecom tone.
• Modem CNG
Modem CNG, or modem calling tone, is a tone generated by a modem that wishes to initiate data modulation on a given voice call. It is a telecom tone.
• Carrier
Carrier detection refers to sensing the presence of modulated data transmission on a given voice call. It is a telecom tone.
• Silence
Silence is the absence of any tones, voice, or modulated data in the media stream associated with a voice connection. A silence detection capability is quite useful for determining that a modem transmission has completed, that a caller may have hung up, or that a message can now be played.
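One simple way to build a silence detection capability is an energy threshold: split the stream into short frames, measure each frame's RMS level, and report silence once enough quiet frames occur in a row. The following Python sketch assumes float samples in the range -1 to 1; the 20 ms frame, the 0.01 threshold and the 500 ms minimum are hypothetical tuning values, and a production detector would likely need to adapt its threshold to the line's noise floor.

```python
import math


def rms(frame):
    """Root-mean-square level of a frame of float samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0


def silence_onsets(samples, rate=8000, frame_ms=20, threshold=0.01, min_ms=500):
    """Yield the sample index at which each silent stretch reaches `min_ms`.

    A frame counts as silent when its RMS level is below `threshold`; any
    non-silent frame resets the run of consecutive silent frames.
    """
    frame_len = rate * frame_ms // 1000
    needed = math.ceil(min_ms / frame_ms)
    run = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        if rms(samples[start:start + frame_len]) < threshold:
            run += 1
            if run == needed:
                yield start + frame_len  # end of the frame that completed the run
        else:
            run = 0
```

Each silent stretch is reported once, at the point where it first becomes long enough; a caller might use that event to end a modem session or to start playing a message.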
The actual frequencies and cadences corresponding to a tone of a particular meaning may vary widely from country to country or between implementations of telephony products. This makes reliable detection of these tones in every case very difficult, if not impossible, to guarantee. There is, however, sufficient standardization within a particular country or system to make tone detection for calls within it reliable.
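Because the meaning of a tone depends on locally defined cadences, a detector can match measured on/off timings against a per-country (or per-system) profile. The following Python sketch shows that structure and maps each match to the connection state given in the tone descriptions. The profile values only roughly follow common North American practice and are illustrative assumptions; a real product would load them from the applicable national specification or its own configuration, and would check tone frequencies as well as timing.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToneCadence:
    name: str
    on_ms: int   # tone-on time; off_ms == 0 means a continuous tone
    off_ms: int
    state: str   # associated connection state


# Illustrative profile only: values roughly follow common North American
# practice. Replace them with the applicable national specification.
NORTH_AMERICA: Tuple[ToneCadence, ...] = (
    ToneCadence("dial tone", 0, 0, "initiated"),
    ToneCadence("ringback", 2000, 4000, "alerting"),
    ToneCadence("busy", 500, 500, "fail"),
    ToneCadence("reorder", 250, 250, "fail"),
)


def classify(on_ms: float, off_ms: float, profile=NORTH_AMERICA,
             tolerance: float = 0.2) -> Optional[ToneCadence]:
    """Match a measured on/off cadence against a country or system profile.

    Each measured duration must fall within `tolerance` (a fraction) of the
    profile's value; a continuous tone is reported with off_ms == 0.
    """
    for tone in profile:
        if tone.off_ms == 0:
            if off_ms == 0:
                return tone
        elif (abs(on_ms - tone.on_ms) <= tolerance * tone.on_ms
              and abs(off_ms - tone.off_ms) <= tolerance * tone.off_ms):
            return tone
    return None
```

Supporting another country then means supplying a different profile tuple rather than changing the matching logic.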
3.7.4 Media Services
Media service interfaces provide external access to the contents of media streams using the specific capabilities of individual media resources. A particular media service interface, along with the media resources that can be used to access the media streams of a given call and any required media access devices, is known as a media service instance.
Some media resources capture information from a media stream. These are able to convert the media stream into a media data format for use with the media service instance. Other media resources are able to convert media data specified through the media service interface into a media stream for the call. Still others are able to do both simultaneously.
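The relationships just described (resources that capture, render, or do both, bundled with an interface into a media service instance for a call) can be sketched as a small Python class hierarchy. Every class and method name here is hypothetical and is not drawn from any particular telephony API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List


class MediaResource(ABC):
    """Hypothetical base for anything that touches a call's media stream."""
    can_capture = False  # media stream -> media data
    can_render = False   # media data -> media stream


class CaptureResource(MediaResource):
    can_capture = True

    @abstractmethod
    def capture(self, chunk: bytes) -> bytes:
        """Convert a chunk of the media stream into media data."""


class RenderResource(MediaResource):
    can_render = True

    @abstractmethod
    def render(self, data: bytes) -> bytes:
        """Convert media data into a chunk of the media stream."""


class PcmLoopback(CaptureResource, RenderResource):
    """Toy resource that does both, passing raw PCM through unchanged."""

    def capture(self, chunk: bytes) -> bytes:
        return chunk

    def render(self, data: bytes) -> bytes:
        return data


@dataclass
class MediaServiceInstance:
    """A media service interface bound to the resources serving one call."""
    service_type: str
    call_id: str
    resources: List[MediaResource] = field(default_factory=list)

    def can_capture(self) -> bool:
        return any(r.can_capture for r in self.resources)

    def can_render(self) -> bool:
        return any(r.can_render for r in self.resources)
```

Multiple inheritance lets a resource that "does both simultaneously" advertise both capabilities without a separate flag.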
The number of possible media service types is virtually unbounded, but the most popular involve media resources that work with raw sound, speech, modulated data, and digital data.
Live Sound Capture (Isochronous)
A live sound capture media service is able to capture the audio content from a media stream and deliver it to a component external to the telephone system such as a digitizer, tape or CD recorder, or even a speaker. The physical media interface involved might be digital or analog.
Live Sound Transmit (Isochronous)
A live sound transmit media service is able to obtain an isochronous stream of raw sound from an external source such as a computer audio output, a tape or CD player, radio, or even a microphone and transmit it through the telephone system. The physical media interface involved might be digital or analog.
Sound Record
A sound record media service is able to capture sound from the media stream and store it for future use. In this case, the media service interface is used simply to start and stop the recording and specify where and how the sound is to be stored.
Sound record is different from sound capture in that the telephone system itself is doing the recording, and the sound data never leaves the telephone system.
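As a minimal sketch of a sound record service, the following Python class exposes only start/stop control and a storage location, while audio chunks arrive from the telephone system's own media path. It assumes 16-bit mono PCM at 8 kHz stored with the standard-library `wave` module; the class name and the `on_media` callback are hypothetical.

```python
import wave


class SoundRecorder:
    """Hypothetical sound-record service: start, receive media, stop.

    The audio stays inside the system; the interface only controls when
    recording starts and stops and where (and in what format) it is stored.
    """

    def __init__(self, path, rate=8000, sample_width=2, channels=1):
        self.path = path
        self.rate = rate
        self.sample_width = sample_width  # bytes per sample
        self.channels = channels
        self._out = None

    def start(self):
        self._out = wave.open(self.path, "wb")
        self._out.setnchannels(self.channels)
        self._out.setsampwidth(self.sample_width)
        self._out.setframerate(self.rate)

    def on_media(self, pcm_bytes):
        """Called by the media path for each chunk; ignored unless recording."""
        if self._out is not None:
            self._out.writeframes(pcm_bytes)

    def stop(self):
        if self._out is not None:
            self._out.close()  # finalizes the WAV header
            self._out = None
```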
Sound Playback
A sound playback media service is able to play previously recorded sounds to the media stream. In this case, the media service interface is used simply to start and stop the playback and specify what sound is to be played.
Sound playback is different from sound transmit in that the telephone system itself is providing the prerecorded sound.
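Playback is the mirror image of recording: the service reads a stored sound and feeds it into the media stream in fixed-size chunks. This Python sketch assumes the recording is a WAV file and that the media path consumes 20 ms chunks (160 frames at 8 kHz); both are illustrative assumptions, and the function name is hypothetical.

```python
import wave


def playback_chunks(path, frames_per_chunk=160):
    """Yield a stored sound as fixed-size chunks of raw frames.

    160 frames is 20 ms at 8 kHz; the last chunk may be shorter.
    """
    with wave.open(path, "rb") as w:
        while True:
            chunk = w.readframes(frames_per_chunk)
            if not chunk:
                break
            yield chunk
```

Calling `close()` on the generator stops playback early and closes the file, which corresponds to the "stop" operation of the interface.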
Text-to-Speech
A text-to-speech (TTS), or speech synthesis, media service is able to transform text into a stream of speech-like sounds generated by a synthetic, electronic voice. The media service interface is used to specify the text to speak and the desired attributes of the speech (male or female voice, accent, prosody, volume, speed, etc.).
Text-to-speech is very useful because it allows arbitrary or dynamic text information to be spoken over the phone automatically. The alternative, prerecording all of the necessary information or, at a minimum, all of the necessary words that make up the information, is generally much more complicated and expensive.
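One widely used way to express the text and voice attributes for a speech synthesizer is the W3C Speech Synthesis Markup Language (SSML). The Python sketch below builds such a request; using SSML is an assumption for illustration, since the text does not prescribe a format, and the helper name and its defaults are hypothetical. Accent is only approximated here through the `xml:lang` attribute.

```python
from xml.sax.saxutils import escape, quoteattr

# A subset of the attribute values SSML 1.0 accepts.
RATES = {"x-slow", "slow", "medium", "fast", "x-fast", "default"}
VOLUMES = {"silent", "x-soft", "soft", "medium", "loud", "x-loud", "default"}
GENDERS = {"male", "female", "neutral"}


def tts_request(text, gender="female", rate="medium", volume="medium",
                lang="en-US"):
    """Build an SSML document asking a TTS engine to speak `text`."""
    if rate not in RATES or volume not in VOLUMES or gender not in GENDERS:
        raise ValueError("unsupported speech attribute")
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang={quoteattr(lang)}>'
        f'<voice gender={quoteattr(gender)}>'
        f'<prosody rate={quoteattr(rate)} volume={quoteattr(volume)}>'
        f'{escape(text)}'  # escape &, <, > so arbitrary text stays valid XML
        '</prosody></voice></speak>'
    )
```

Escaping matters because the point of TTS is speaking arbitrary or dynamic text, which may contain characters that are special in XML.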
Concatenated Speech
Concatenated speech is a media service comparable to speech synthesis, but it uses strings of whole prerecorded words or syllables rather than synthesizing each syllable. Concatenated speech generally provides much higher quality than text-to-speech, but is limited to a certain vocabulary of prerecorded words or sounds.
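The vocabulary limitation of concatenated speech is easy to see in code: output is built by joining prerecorded clips, so a word without a recording cannot be spoken at all. In this Python sketch the clips are raw 16-bit PCM byte strings held in a dictionary; the gap length and the helper names are hypothetical.

```python
DIGIT_WORDS = ("zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine")


def digits_as_words(digits):
    """Spell a digit string as vocabulary words, e.g. "42" -> four, two."""
    return [DIGIT_WORDS[int(d)] for d in digits]


def concatenate(words, clips, gap=b"\x00\x00" * 40):
    """Join prerecorded PCM clips for `words` with a short silent gap.

    `gap` defaults to 40 silent 16-bit samples (5 ms at 8 kHz). A KeyError
    is raised for any word that was never recorded: the vocabulary limit.
    """
    missing = [w for w in words if w not in clips]
    if missing:
        raise KeyError(f"not in recorded vocabulary: {missing}")
    return gap.join(clips[w] for w in words)
```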
Speaker Recognition
Speaker recognition media services identify the person speaking in the media stream based on voice energy characteristics unique to each individual.
Speech Recognition
Speech recognition services convert human speech in a media stream to text. The principal attributes of a speech recognition implementation are:
• Speaker-dependent/independent
Some speech recognition implementations must be trained to understand the speech patterns of a particular individual. These are called speaker-dependent systems. Other implementations have been extensively trained and can understand virtually any speaker of a given dialect. These are called speaker-independent implementations.
• Continuous/Discrete
Continuous speech recognition implementations can automatically identify word breaks, allowing people to speak continuously, as they normally would. Discrete speech recognition implementations cannot identify word breaks, so a person speaking to such a system must place distinct pauses between words.
• Vocabulary
Most speech recognition implementations rely on a particular set of speech grammar rules, which limits their vocabulary to a particular set of words. Implementations vary in the size of the vocabulary they can support and whether they are limited to a predetermined vocabulary and grammar.
The media service interface is used to specify, as necessary, the speaker and/or the vocabulary and grammar, and to deliver the text corresponding to the recognized speech.
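The following Python sketch illustrates how a grammar constrains recognition: a hypothetical recognizer returns several scored hypotheses, and the service delivers the best one that fits the active grammar. The grammar format (a list of rules, each a sequence of word sets), the n-best input and the function names are assumptions made for illustration; real grammar formats are considerably richer.

```python
# Hypothetical grammar: each rule is a sequence of slots, and each slot is
# the set of words acceptable at that position.
GRAMMAR = [
    [{"call", "dial"}, {"home", "office", "voicemail"}],
    [{"yes", "no"}],
]


def matches(words, grammar=GRAMMAR):
    """True if the word sequence fits one of the grammar's rules exactly."""
    return any(len(rule) == len(words)
               and all(w in slot for w, slot in zip(words, rule))
               for rule in grammar)


def best_in_grammar(hypotheses, grammar=GRAMMAR):
    """Return the highest-scoring (text, score) hypothesis that fits, or None."""
    for text, _score in sorted(hypotheses, key=lambda h: h[1], reverse=True):
        if matches(text.lower().split(), grammar):
            return text
    return None
```

Restricting output to the grammar trades flexibility for accuracy: an out-of-grammar utterance yields no result rather than a guess.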
Fax Printer
Fax printer media services refer to the fax receive-and-print functionality available in a fax machine. If a telephone system connects a fax printer media service to a call on which the presence of a fax CNG tone indicates an attempt to transmit a fax, the fax will be received and printed on the appropriate device.
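The routing decision described for the fax printer can be sketched as a detector for the fax calling tone followed by a request to attach the fax printer service. The sketch assumes the 1100 Hz CNG frequency defined in ITU-T T.30, float samples at 8 kHz, and a hypothetical `attach` callback supplied by call processing; the 50 ms frame, 400 ms minimum duration and 0.6 energy share are illustrative thresholds.

```python
import math

CNG_HZ = 1100  # fax calling tone frequency (ITU-T T.30)


def tone_share(frame, freq, rate):
    """Approximate fraction of the frame's energy at `freq` (Goertzel)."""
    coeff = 2 * math.cos(2 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in frame:
        s1, s2 = x + coeff * s1 - s2, s1
    power = s1 * s1 + s2 * s2 - coeff * s1 * s2
    energy = sum(x * x for x in frame)
    return power / (len(frame) / 2 * energy) if energy else 0.0


def is_fax_cng(samples, rate=8000, frame_ms=50, min_on_ms=400, share=0.6):
    """True once 1100 Hz dominates consecutive frames for `min_on_ms`."""
    frame_len = rate * frame_ms // 1000
    needed = math.ceil(min_on_ms / frame_ms)
    run = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        if tone_share(samples[start:start + frame_len], CNG_HZ, rate) >= share:
            run += 1
            if run >= needed:
                return True
        else:
            run = 0
    return False


def route_call(samples, attach):
    """Attach a fax printer service if CNG is heard; `attach` is hypothetical."""
    if is_fax_cng(samples):
        attach("fax printer")
        return True
    return False
```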
Fax Scanner
The fax scanner media service refers to the fax scan-and-send functionality available in a fax machine. It is the complement to the fax printer media service. If a telephone system connects a fax scanner media service, the fax scanner media service will attempt to establish a modulated fax data connection with another fax-capable device on the call, and then will transmit any sheets of paper fed into the fax machine's paper scanner.
Fax Modem
Fax modem media services provide fax data modulation for sending and receiving fax transmissions. (See the sidebar "Modulated Data" on page 98.) The media service interface is used to send and receive the compressed image data and fax transmission control information.
Data Modem
Data modem media services provide data modulation for establishing bidirectional modem communication. (See the sidebar "Modulated Data" on page 98.) The media service interface is used to configure the modem service and to send and receive asynchronous data.
Digital Data
A digital data media service provides access to the raw stream of digital data associated with a digital data media stream. The media service interface is used to convey the data.
Video Phone
A video phone media service is analogous to the fax scanner and fax printer media services, but applies to media streams containing video data. When attached to a media stream, this media service displays video on the video screen associated with the appropriate device, and captures and transmits video from a camera associated with the device.
Media services concepts are discussed in greater detail in Chapter 7.