In voice and video communication, quality usually determines whether the experience is good or bad. Besides qualitative descriptions such as 'quite good' or 'very bad', there is a numerical method of expressing voice/video QoE called the Mean Opinion Score (MOS). MOS gives a numerical indication of the perceived quality of the media received after it has been transmitted and, where applicable, compressed using codecs. MOS is expressed as a single number from 1 to 5, where 1 indicates the worst quality and 5 the best. MOS is inherently subjective, as it results from what people perceive during tests. However, there are objective methods that estimate MOS, as will be discussed later in this chapter. The end-user satisfaction level corresponding to each MOS score is shown in Table 3-1.
Table 3-1 MOS vs. Satisfaction Level
MOS Score Satisfaction level
5 Perfect (like face-to-face conversation or radio reception).
4 Fair. Imperfections can be perceived, but the sound is still clear. This is (supposedly) the range for cell phones.
3 Annoying.
2 Very annoying. Nearly impossible to communicate.
1 Impossible to communicate.
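As a minimal sketch, the mapping in Table 3-1 can be written as a lookup. The labels come from the table; the function name `satisfaction_level` and the rule of rounding a decimal MOS to the nearest whole score (halves rounded up) are assumptions made only for illustration.

```python
# Labels are taken from Table 3-1. Rounding a decimal MOS to the nearest
# whole score is an illustrative assumption, not part of the table.
SATISFACTION = {
    5: "Perfect",
    4: "Fair",
    3: "Annoying",
    2: "Very annoying",
    1: "Impossible to communicate",
}


def satisfaction_level(mos: float) -> str:
    """Return the Table 3-1 label for a MOS value between 1 and 5."""
    if not 1.0 <= mos <= 5.0:
        raise ValueError("MOS must lie between 1 and 5")
    nearest = int(mos + 0.5)  # round half up, e.g. 3.5 -> 4
    return SATISFACTION[nearest]
```

For example, `satisfaction_level(4.2)` falls in the row for a score of 4.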
The values do not need to be whole numbers. Certain thresholds and limits are often expressed as decimal values on the MOS spectrum shown in Figure 3-1. For instance, a value of 4.0 to 4.5 indicates complete satisfaction [103]. This is the normal range for the PSTN, and many VVoIP services target such quality, often with success. Values below 3.5 are considered unacceptable by many users. MOS can be used simply to compare VVoIP services and providers. More importantly, it is used to assess the performance of codecs under different network conditions. Accurately testing the quality of VVoIP is still considered a challenge; however, services have greatly improved over the last few years as providers have become more reliable and ISPs offer better connections. Having a metric to measure changes or degradation in the quality of a VVoIP connection can help in identifying problems. In practice, VVoIP calls are often in the 3.5 to 4.2 MOS range [13].
3.2.1 Types of MOS rating
In this section, we briefly review the different types of MOS rating: Listening MOS, Network MOS and Conversational MOS. In addition, we show the factors affecting each type of MOS (see [99] for more details about the types of MOS rating).
Listening MOS
Listening MOS is a rating of the listening quality (MOS-LQ) of the audio stream that is played to the user. This value takes into consideration audio fidelity and distortion, the speech and noise levels of the user, and any external distortions, and from this data predicts how a large group of users would rate the quality of the audio they hear.
The Listening MOS varies depending on:
The type of codec used (Narrowband or Wideband codec).
Audio capture device characteristics.
Occurrence of transcoding.
Background noise at the sender side.
Percentage of packet loss (either random or burst losses).
Speech level.
Since this type of MOS rating is a function of a large number of factors, it is preferable to measure the Listening MOS statistically, over many calls, rather than from a single call.
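One way to read "statistically" is to summarize the scores of many calls by their mean and a confidence interval. The sketch below is illustrative only: the function name `summarize_mos` is hypothetical, and the factor 1.96 assumes a normal approximation for a 95% interval, which is reasonable only for a fairly large number of calls.

```python
import math
import statistics


def summarize_mos(scores):
    """Return (mean, half_width) of an approximate 95% confidence interval.

    `scores` holds one Listening MOS value per call. The 1.96 factor is a
    normal-approximation assumption, not something prescribed by the text.
    """
    if len(scores) < 2:
        raise ValueError("at least two calls are needed")
    mean = statistics.mean(scores)
    stdev = statistics.stdev(scores)  # sample standard deviation
    half_width = 1.96 * stdev / math.sqrt(len(scores))
    return mean, half_width
```

The interval `mean ± half_width` narrows as more calls are included, which is the practical reason for preferring many calls over one.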
Network MOS
Network MOS is a rating of the quality of the audio/video played to the user that indicates the effect of network QoS factors on the QoE perceived by the end user. This value takes into consideration only network-related factors such as the codec, random packet loss, burst losses and jitter. The difference between Network MOS and Listening MOS is that Network MOS considers only the impact of the network on the QoE, whereas Listening MOS also considers the payload (speech level, noise level, etc.). This makes Network MOS useful for identifying the network conditions that affect the audio quality being delivered and for addressing the network impairments that degrade call quality. For each codec, there is a maximum possible Network MOS that represents the best possible quality under perfect network conditions. Because the maximum Network MOS varies with the scenario (different codecs are used), it is usually more informative to look at the average degradation of the Network MOS during the call. The average degradation can be broken down into how much is due to network jitter and how much is due to packet loss. For very small degradations, the cause of the degradation may not be available.
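The degradation view can be sketched as follows. The text does not specify how the degradation is attributed to jitter and packet loss, so this example simply assumes that each measurement interval already reports the two contributions in MOS points; the function name, the input layout, and the sample values are all hypothetical.

```python
def average_degradation(max_mos, samples):
    """Average Network MOS degradation over a call.

    max_mos: best possible Network MOS for the codec in use.
    samples: list of (jitter_degradation, loss_degradation) pairs, one per
             measurement interval, in MOS points (assumed to be given).
    Returns (avg_jitter, avg_loss, avg_total, avg_network_mos).
    """
    if not samples:
        raise ValueError("at least one sample is needed")
    n = len(samples)
    avg_jitter = sum(jitter for jitter, _ in samples) / n
    avg_loss = sum(loss for _, loss in samples) / n
    avg_total = avg_jitter + avg_loss
    return avg_jitter, avg_loss, avg_total, max_mos - avg_total


# Illustrative values only: a codec ceiling of 4.0 and two intervals.
jitter, loss, total, mos = average_degradation(4.0, [(0.25, 0.25), (0.5, 0.0)])
```

Reporting the degradation rather than the absolute score makes calls that use different codecs comparable, since each is measured against its own ceiling.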
Conversational MOS
Conversational MOS is a rating of the audio or multimedia stream played to the user that takes into consideration the listening or viewing quality of the audio/video played and sent across the network, the speech and noise levels of audio streams, echoes, and lip synchronization, which is considered an important factor in multimedia call quality assessment. Such a MOS value represents how a large group of people would rate the voice or multimedia quality of the connection for holding a VVoIP call.
The Conversational MOS varies based on the same factors as Listening MOS, as well as the following:
Echo.
Network delay.
Delay due to jitter buffering.
Delay due to devices and codecs.
As with the Listening MOS, it is better to calculate the Conversational MOS statistically rather than from a single call, because of the large number of factors that influence this type of MOS [99]. Throughout our research, we mainly use the Network MOS and the Conversational MOS. For instance, we use the Network MOS to monitor the network conditions that influence call quality and to take decisions, based on the expected Network MOS, that improve call quality. In addition, we use the Conversational MOS to understand the real perception of end users under different network conditions.