Analysis, monitoring, and management of quality of experience in video delivery services over IP


Universidad Politécnica de Madrid
Escuela Técnica Superior de Ingenieros de Telecomunicación

ANALYSIS, MONITORING, AND MANAGEMENT OF QUALITY OF EXPERIENCE IN VIDEO DELIVERY SERVICES OVER IP

Doctoral Thesis

Pablo Pérez García
Telecommunication Engineer

2013


Universidad Politécnica de Madrid
Departamento de Señales, Sistemas y Radiocomunicaciones
Escuela Técnica Superior de Ingenieros de Telecomunicación

Doctoral Thesis

ANALYSIS, MONITORING, AND MANAGEMENT OF QUALITY OF EXPERIENCE IN VIDEO DELIVERY SERVICES OVER IP

Author: Pablo Pérez García, Telecommunication Engineer
Director: Narciso García Santos, Doctor of Telecommunication Engineering

2013


Doctoral Thesis
ANALYSIS, MONITORING, AND MANAGEMENT OF QUALITY OF EXPERIENCE IN VIDEO DELIVERY SERVICES OVER IP

Author: Pablo Pérez García
Director: Narciso García Santos

Examination committee appointed by His Magnificence and Excellency the Rector of the Universidad Politécnica de Madrid on ...... 2013.

President: ......
Member: ......
Member: ......
Member: ......
Secretary: ......

The thesis was defended and read on ...... 2013 in ......

Grade: ......

THE PRESIDENT. THE MEMBERS. THE SECRETARY.


“If you make listening and observation your occupation you will gain much more than you can by talk.”
Robert Baden-Powell


UNIVERSIDAD POLITÉCNICA DE MADRID

Abstract

Doctoral Thesis
ANALYSIS, MONITORING, AND MANAGEMENT OF QUALITY OF EXPERIENCE IN VIDEO DELIVERY SERVICES OVER IP
by Pablo Pérez García

This thesis proposes a comprehensive approach to the monitoring and management of Quality of Experience (QoE) in multimedia delivery services over IP. It addresses the problem of preventing, detecting, measuring, and reacting to QoE degradations under the constraints of a service provider: the solution must scale to a wide IP network delivering individual media streams to thousands of users.

The solution proposed for the monitoring is called QuEM (Qualitative Experience Monitoring). It is based on the detection of degradations in the network Quality of Service (packet losses, bandwidth drops, ...) and the mapping of each degradation event to a qualitative description of its effect on the perceived Quality of Experience (audio mutes, video artifacts, ...). This mapping is based on the analysis of the transport and Network Abstraction Layer information of the coded stream, and allows a good characterization of the most relevant defects that exist in this kind of service: screen freezing, macroblocking, audio mutes, video quality drops, delay issues, and service outages. The results have been validated by subjective quality assessment tests. The methodology used for those tests has also been designed to mimic as closely as possible the conditions of a real user of those services: the impairments to evaluate are introduced randomly in the middle of a continuous video stream.

Based on the monitoring solution, several applications have been proposed as well: an unequal error protection system which provides higher protection to the parts of the stream which are more critical for the QoE, a solution which applies the same principles to minimize the impact of incomplete segment downloads in HTTP Adaptive Streaming, and a selective scrambling algorithm which ciphers only the most sensitive parts of the media stream. A fast channel change application is also presented, as well as a discussion about how to apply the previous results and concepts in a 3D video scenario.


UNIVERSIDAD POLITÉCNICA DE MADRID

Resumen (Summary)

Doctoral Thesis
ANALYSIS, MONITORING, AND MANAGEMENT OF QUALITY OF EXPERIENCE IN VIDEO DELIVERY SERVICES OVER IP
by Pablo Pérez García

This thesis studies the monitoring and management of Quality of Experience (QoE) in video distribution services over IP. It addresses the problem of how to prevent, detect, measure, and react to QoE degradations from the perspective of a service provider: the solution must scale to a large IP network delivering individual streams to thousands of users simultaneously.

The proposed monitoring solution is called QuEM (Qualitative Experience Monitoring). It is based on detecting degradations in the network quality of service (packet losses, abrupt bandwidth drops, ...) and inferring from each one a qualitative description of its effect on the perceived Quality of Experience (silences, video defects, ...). This analysis relies on the transport and Network Abstraction Layer information of the coded streams, and makes it possible to characterize the most relevant defects observed in this kind of service: freezes, blockiness, silences, loss of video quality, delays, and service interruptions. The results have been validated through subjective quality tests. The methodology used in those tests has in turn been developed to imitate as closely as possible the viewing conditions of a user of this kind of service: the defects under evaluation are introduced randomly in the middle of a continuous video sequence.

Some applications based on the monitoring solution have also been proposed: an unequal error protection system which offers more protection to the parts of the video most sensitive to losses, a solution to minimize the impact of interrupted segment downloads in HTTP Adaptive Streaming, and a selective encryption system which encrypts only the most sensitive parts of the video. A fast channel change solution has also been presented, as well as an analysis of the applicability of the previous results to a 3D video scenario.


Acknowledgements

This thesis would not have been possible without the help of all the people with whom I have been so lucky to share my way in these more than eight years. Let me express my gratitude to all of them in my mother tongue.

Life is a web of relationships, and listing all those that can be forged over the eight years this work has lasted would take more space than is probably reasonable to devote in a doctoral thesis. So I am likely being unfair to some people who, through carelessness, forgetfulness, or lack of space, will not be mentioned here. My apologies (and thanks) go to them in advance as well.

Thanks above all to Narciso García, who keeps managing to find gaps in his ever more complicated schedule to accompany me on this adventure. It is a privilege to have him as thesis director. Thanks also, very especially, to Jaime Ruiz, who has been much more than a manager during these eight years. I do not exaggerate when I say that, were it not for him, I could hardly have finished this work.

Thanks to the exceptional human and professional team with whom I have been lucky to work over these years at Telefónica I+D and Alcatel-Lucent. To Jesús Macías, who taught me to look at video in a different way. To Álvaro Villegas, on whose work a good part of mine rests. To Silvia Varela, for helping me find the right approach to this thorny matter of quality. To Enrique Estalayo and José M. Cubero, with whom I have shared so much in so many projects. To Ernesto Puerta, for the conversations about quantization and other arcane matters. To Javier López Poncela, for guiding me through the ins and outs of decoders.

Thanks also to the people of the Grupo de Tratamiento de Imágenes, who have kept making me feel at home throughout all these years. Very particularly to Jesús Gutiérrez, for all the work on the subjective quality tests: without him, finishing this thesis would have been much harder. Thanks also to Julián Cabrera and Fernando Jaureguizar, always ready to lend a hand with whatever was needed.

My sincere thanks to all those who, over these years, have also contributed their grain of sand to this thesis. To Juan Casal, for sharing his experience in video coding. To Rocío Bravo, for the help with television audiences. To all the partners of CENIT VISION, where a good part of the research I now present took shape.

Finally, many thanks to my family and friends. To my brothers Lucas and David, who marked the path to follow. To my brother Jesús, from whom I have learned what little I know about digital audio (and the odd television trick). To my mother Teresa, who has done so much to push me to finish the thesis. To my father Juan, who would surely have liked to see it finished, and with whom I also discussed some of the equations that appear in it. And to Graciela, for everything we have shared and what is yet to come; so much that it cannot be summed up in one sentence.

Thanks, in short, to everyone who has made it possible for this thesis to be written. Even for those whom, for lack of space, I have not had the chance to mention in these lines, I keep a fond memory in my heart. Thanks to you, who are taking the trouble to read these pages. And thanks to God for having put us in touch.

Contents

Abstract
Resumen
Acknowledgements
List of Figures
List of Tables
Abbreviations

1 Introduction
  1.1 Motivation
  1.2 Overview

2 Understanding Quality of Experience
  2.1 Quality of Experience and its relatives
  2.2 A word about multimedia services
    2.2.1 Players
    2.2.2 Coding standards and transport protocols
    2.2.3 Artifacts
  2.3 Who is who in the QoE metrics
    2.3.1 Subjective quality assessment
    2.3.2 Full-Reference quality metrics
    2.3.3 Reduced-Reference quality metrics
    2.3.4 No-Reference quality metrics
  2.4 Other topics related to QoE in IPTV services
    2.4.1 Media formats in IPTV deployments
  2.5 Conclusions

3 Designing QoE-Aware Multimedia Delivery Services
  3.1 Introduction
  3.2 Delivering multimedia over IP
    3.2.1 Architecture of a multimedia service delivery platform
    3.2.2 Impairing the Quality of Experience
  3.3 QuEM: a qualitative approach to QoE monitoring
    3.3.1 Problem statement
    3.3.2 System design
    3.3.3 Qualitative Impairment Detectors
    3.3.4 Severity Transfer Function
  3.4 A Subjective Assessment methodology to calibrate Quality Impairment Detectors
    3.4.1 Design principles
    3.4.2 Test methodology
    3.4.3 Selection of impairments
  3.5 QoE enablers
    3.5.1 Headend metadata architecture
    3.5.2 Intelligent Packet Rewrapper
    3.5.3 Edge Servers for IPTV and OTT
  3.6 Conclusions

4 Quality Impairment Detectors
  4.1 Introduction
  4.2 Video Packet Loss Effect Prediction (PLEP) model
    4.2.1 Description of the model
    4.2.2 Experiment
    4.2.3 Subjective analysis
  4.3 Audio packet loss effect
    4.3.1 Objective analysis
    4.3.2 Subjective analysis
  4.4 Coding quality and rate forced drops
    4.4.1 Analysis of feature-based RR/NR metrics as estimators of video coding quality
    4.4.2 Managing coding quality drops
  4.5 Outages
    4.5.1 Detection of outages
    4.5.2 Subjective impact of outages
  4.6 Latency
    4.6.1 Lag
    4.6.2 Channel Change time
    4.6.3 Latency trade-offs
  4.7 Mapping to Severity
  4.8 Conclusions

5 Applications
  5.1 Introduction
  5.2 Unequal Error Protection
    5.2.1 Priority Model
    5.2.2 Experimentation and results
    5.2.3 Applications
  5.3 Fine-grain segmenting for HTTP adaptive streaming
    5.3.1 Description of the solution
  5.4 Selective Scrambling
    5.4.1 Problem statement and requirements
    5.4.2 Algorithms
    5.4.3 Results
  5.5 Fast Channel Change
  5.6 Application to 3D Video

6 Conclusions

A Experimental setup
  A.1 Introduction
  A.2 Subjective Assessment based on QuEM approach
    A.2.1 Selection and preparation of content
    A.2.2 Selection of impairments
    A.2.3 Test sessions
  A.3 Subjective quality assessment of H.264 video encoders
  A.4 Test sequences from IPTV deployments

Bibliography


List of Figures

2.1 Layer and domain model for multimedia services
2.2 Protocol stack for multimedia services over IP
2.3 Models for objective quality assessment: FR/RR/NR
2.4 Hierarchical GOP structure

3.1 Network architecture for IPTV and OTT services
3.2 Delivery chain of a multimedia service
3.3 QuEM architecture design
3.4 Test sequences in ACR
3.5 Test sequences in our proposed method
3.6 Questionnaire for subjective assessment tests
3.7 Structure of the content streams in the subjective assessment test session
3.8 Schematic representation of a modular headend
3.9 RTP header and extension introduced by the rewrapper processing

4.1 Video sequence used for qualitative analysis
4.2 MSE and PLEP for all sequences under study, varying the loss position
4.3 Detail of MSE and PLEP for all sequences under study
4.4 MSE vs PLEP (log scale) and linear fit
4.5 % of different macroblocks vs PLEP and linear fit
4.6 % of different macroblocks and PLEP for all sequences under study, varying the loss position
4.7 Results of the subjective assessment for Video Loss impairments
4.8 Detailed results for each of the individual segments for Video Loss
4.9 Waveform of a lossy audio file
4.10 Effect of audio losses: measured vs. expected
4.11 Short-length audio losses
4.12 Results of the subjective assessment for Audio Loss impairments
4.13 Detailed results for each of the individual segments for Audio Loss
4.14 Results of TI and Contrast NR metrics
4.15 Results of the subjective assessment for Rate Drop impairments
4.16 Detailed results for each of the individual segments for Rate Drop
4.17 Results of the subjective assessment for Outage impairments
4.18 Detailed results for each of the individual segments for Outage
4.19 Simplified transmission chain for real-time video
4.20 Decoding delay for video and audio components of a MPEG-2 Transport Stream
4.21 Results for all the QuIDs mentioned in the chapter

5.1 Example of the packet priority model
5.2 Implementation of the prioritization model
5.3 Effect of the window size in packet prioritization results
5.4 Values of MSE comparing random vs. priority-based packet loss
5.5 Effect of varying the loss burst size
5.6 Contribution of each term to the prioritization equation
5.7 Effects of a limited bit budget to encode the priority
5.8 Priority-based HTTP Adaptive Streaming segment structure

A.1 Structure of the content streams in the subjective assessment test session
A.2 Summary of the subjective quality assessment test results
A.3 Subjective MOS for a football sequence

List of Tables

2.1 ACR and DCR evaluation scales

3.1 Priority values used in the RTP header extension

4.1 Coefficient of determination (R²) of MSE vs PLEP fit for several video sequences
4.2 PLEP impairments analyzed in the subjective assessment tests
4.3 Audio losses analyzed in the subjective assessment tests
4.4 Comparison of NR/RR results with subjective tests
4.5 Quality drops analyzed in the subjective assessment tests
4.6 Outage events analyzed in the subjective assessment tests
4.7 Example Channel Change time ranges and their mapping to QoE

5.1 Priority value for each slice type
5.2 Values of the Aggregated Gain Ratio
5.3 Bit budget assignation to encode priority
5.4 Minimum scrambling rate required to completely lose the video signal

A.1 Video test sequences: bitrate and resolution
A.2 Bitrate drops
A.3 Frame rate drops
A.4 Audio losses
A.5 Macroblocking errors
A.6 Video freezing
A.7 Impairment sets
A.8 Example of a sequence of impairments
A.9 Test sequences


Abbreviations

3G: Third generation of mobile communication technology
ACR: Absolute Category Rating
AL-FEC: Application Layer Forward Error Correction
ARQ: Automatic Repeat reQuest
AVC: Advanced Video Coding (also H.264 or MPEG-4 part 10)
CA: Conditional Access
CABAC: Context-Adaptive Binary Arithmetic Coding
CBR: Constant Bit Rate
CDN: Content Delivery Network
CoD: Content on Demand
DCR: Degradation Category Rating
DRM: Digital Rights Management
DSL: Digital Subscriber Line
DTS: Decoding Time Stamp
DVB: Digital Video Broadcasting
FCC: Fast Channel Change
FEC: Forward Error Correction
FR: Full Reference
GOP: Group Of Pictures
GPON: Gigabit-capable Passive Optical Network
HAS: HTTP Adaptive Streaming
HDS: HTTP Dynamic Streaming
HLS: HTTP Live Streaming
HNED: Home Network End Device
HTTP: Hypertext Transfer Protocol
IDR: Instantaneous Decoding Refresh
IP: Internet Protocol
IPTV: Television over Internet Protocol
ITU: International Telecommunication Union
LMB: Live Media Broadcast
LTE: Long Term Evolution
MDI: Media Delivery Index
MOS: Mean Opinion Score
MPEG: Moving Picture Experts Group
MSE: Mean Square Error
MVC: Multi-view Video Coding
NAL: Network Abstraction Layer
NR: No Reference
OTT: Over The Top multimedia delivery services
PCR: Program Clock Reference
PLEP: Packet Loss Effect Prediction metric
PLP: Packet Loss Pattern
PLR: Packet Loss Rate
PSNR: Peak Signal-to-Noise Ratio
PTS: Presentation Time Stamp
QoE: Quality of Experience
QoS: Quality of Service
QuEM: Qualitative Experience Monitoring
QuID: Quality Impairment Detector
RAP: Random Access Point
RET: RETransmission (synonym of ARQ)
RGW: Residential Gateway
RR: Reduced Reference
RTP: Real-Time Transport Protocol
SS: Smooth Streaming
STF: Severity Transfer Function
TCP: Transmission Control Protocol
UDP: User Datagram Protocol
VBR: Variable Bit Rate
VQEG: Video Quality Experts Group


To the loving memory of Juan
To Teresa


Chapter 1. Introduction

1.1. Motivation

There is little doubt about the social relevance of audiovisual delivery services since the first television broadcasts. During the second half of the 20th century, broadcast television channels controlled the audiovisual market and were the main communication path for information, culture, and entertainment. In recent decades, although traditional broadcasters remain quite relevant players in the content marketplace, their offer has been complemented by a plethora of new services: IP television, video on demand, web video portals, user-generated content, and so on.

The way in which content is consumed is changing rapidly, and two technological drivers have made this possible: digital video and IP networks. With the standardization of MPEG video in the 1990s, it became possible to consume video products at home with high quality and at an affordable cost. The popularization of the internet, at about the same time, made it possible to easily interconnect any two points in the world. The combination of both events allowed video content to be managed, stored, and distributed homogeneously with the rest of the information. In a sense, the distribution of video to households had become a problem of digital data communication and storage, and the main problem to solve was, consequently, finding enough bandwidth to meet the transmission requirements of video assets.

The first decade of the 21st century witnessed a quantitative change which resulted in a qualitative jump: improvements in video codec technologies and in the capacity of xDSL access networks made it possible to distribute real-time video over IP networks with a quality that could compete with that of television and DVDs. This gave birth to television over IP (IPTV), which introduced real interactivity and personalization into the audiovisual ecosystem. Within a few years, with subsequent generations of technological improvements, it became possible to offer a competing video distribution service even over the standard best-effort internet, in what has been called over-the-top (OTT) video delivery. This has significantly lowered the barriers to entry into the multimedia business. As this happens, new services are appearing beyond the classic television channels, ranging from huge video clubs over the internet to the distribution of personalized, or even user-generated, video content.

Together with the evolution of the services comes the problem of how to provide them with enough quality for the end users. The transmission of high-quality video can be demanding for the capabilities of IP networks, especially in the access segment. Errors happen, and service providers struggle to keep them under control. Monitoring Quality of Service (QoS) parameters, such as bit rate, packet loss rate, or delay, is not straightforward when the service is distributed over a complex IP network topology. And even when a suitable QoS monitoring system has been set up in the delivery network, it proves insufficient. The interesting concept to monitor is not strictly the QoS, but the QoE: the Quality of Experience perceived by the final customer.

There has been an important effort in the last decade to characterize the perceived quality of audiovisual content, as well as to find algorithms able to model it. A first method is to use subjective quality assessment tests, where a panel of viewers evaluates the perceived quality of the video clips under study. This can provide quite accurate information about video quality and user preferences, but at the high cost of having a group of users involved in the assessment.
The complementary approach is to develop objective quality metrics: algorithms which try to emulate the responses of those viewers through computer analysis of the video sequences. It has been a very active field of research, especially during the last decade. Dozens of algorithms have been developed, from simple measures of the mean square error between images up to complex metrics which include information about Human Visual System (HVS) perception and about the visual structure of the impairments introduced in the video by the coding and transmission chain.

However, few of those methods have had a relevant impact on the market. There are commercially available quality probes which implement this kind of algorithm, but they are typically used just to measure the quality of the video compression process, and not always in real time. For monitoring quality in the distribution and access network, only network-based measures are used: packet losses, router failures, and the like. Moreover, in recent years the manufacturers of measurement equipment seem to have reduced their efforts to introduce these complex metrics in their equipment.

There are good reasons for that. Video QoE metrics are complex to develop and expensive to deploy in the field. They also cover a very specialized field of interest, frequently critical in the video headend and video production departments, but much rarer in service definition and network operation. In many cases the teams operating the network already have an overwhelming amount of QoS data which they can hardly manage, so there is little use in increasing the complexity of this information. Besides, monitoring algorithms need to be implemented in heavily loaded routers or user terminals with low processing capacity, and therefore must be extremely lightweight in processing power, which may disqualify a large number of the metrics available in the literature. Finally, some metrics are even impossible to apply because the information is unavailable at the monitoring point, as is the case, for instance, when parts of the video stream are encrypted by digital rights management (DRM) or conditional access (CA) systems.

In summary, service providers are still using mainly QoS metrics to monitor their networks, but this happens because those are the ones that are applicable under the budgetary, computing, and information availability restrictions they have to cope with. There is still room for improvement, and this thesis aims to be a step in this direction, trying to reduce the gap between QoE expertise and multimedia delivery service providers. The focus of the work is precisely on analyzing how to model, monitor, and manage the Quality of Experience under the mentioned restrictions.

The research of the thesis has been carried out over the last eight years in the framework of the Grupo de Tratamiento de Imágenes research group at Universidad Politécnica de Madrid, in parallel to a professional career in the multimedia competence center of Alcatel-Lucent in Madrid.
In this time, services, products, research areas and standardization efforts have evolved significantly. During the first years of the research, the line that we are proposing in this thesis was almost nonexistent in the most relevant journals, save for a couple of remarkable exceptions. In recent years, however, there has been an increasing interest in the research and standardization of monitoring strategies which are easier to apply in real operation environments.

1.2. Overview

The aim of this thesis is to provide an architecture, models and results which make it possible for multimedia service providers to control the Quality of Experience offered by their service in a way which is relevant for their interests, practical, and better than QoS-only monitoring schemes. It intends to answer the most frequently asked questions that

a service provider can raise about the QoE it is offering: which elements determine the quality of the multimedia stream, which are the most relevant impairments in the perceived quality, what causes them, and how they can be monitored, prevented, and minimized. The thesis proposes a comprehensive strategy to address this problem as a whole, as well as detailed solutions for most of its elements. Part of the input used to create the approach presented in this thesis has come from the day-to-day experience of assessing IPTV service providers, designing solutions for them, and developing products for the content delivery market. All the assumptions made in the development of the thesis will be supported either by the work itself or by previous works published in the scientific literature. However, broader decisions, such as the relevance of the problem to study or the general approach to it, are influenced by the experience of listening to customers, capturing their requirements, and understanding the advantages and disadvantages of different measurement schemes from a service provider point of view. This fact has no effect on the scientific quality of this work, but it may help to understand its underlying motivation. As a consequence, the work is probably biased towards this application-oriented approach in two different ways. On the one hand, there is a stronger focus on ideas and concepts than on the training of mathematical models or the extensive analysis of experimental results. As it is virtually impossible to simulate the working conditions of every possible service provider in the world, the research has been aimed at building models which depend as little as possible on the context where they are applied, or which can be easily adapted to any specific deployment. In a word: clean and generic models have been preferred to trained and optimized ones.
On the other hand, there has been an explicit effort to make sure that any architecture or algorithm proposed in this thesis can be directly applied to real multimedia delivery services. In fact, some of them have already been included in products which are currently deployed in the field. The study starts by analyzing several aspects of the state of the art (Chapter 2). It defines what a multimedia delivery service is, which technologies it involves, and which are the most relevant problems for its quality. Although the market applicability of multimedia services is quite wide, the underlying technological problem is much more restricted. The existing techniques to model, analyze, and monitor multimedia quality are covered, with special focus on their applicability to content delivery services, and including the published studies which support or formalize the knowledge obtained by work experience. Chapter 3 contains guidelines to design a multimedia delivery service which takes into account the Quality of Experience. It describes a reference architecture model for the service with some QoE-specific elements. It also proposes a specific design for a monitoring

system, which explicitly includes the most relevant requirements that any commercially deployable system should fulfill. The design is complemented with a methodology of subjective assessment tests that can be used to select, validate and calibrate its quality monitoring metrics. Chapter 4 dives into the quality metrics themselves. It presents a novel approach to predict the effect of packet losses on video quality, as well as some complementary metrics for audio losses, coding quality drops and outage. The effect of latency on quality is analyzed as well. All the metrics are presented together with the results of their respective subjective assessment tests. Chapter 5 shows some applications which derive from the previous work and go beyond the pure monitoring of quality. The knowledge of the effect of packet losses can be used as input to a packet prioritization model, usable for error protection in IPTV channels or to improve the error resiliency of HTTP Adaptive Streaming schemes. Other proposed applications are a method to increase the effect of selective scrambling and a system to reduce zapping time in IPTV and hybrid environments. Finally, Chapter 6 presents the conclusions of the thesis, also summarizing which parts of it contain work which has been published in national and international scientific journals and conferences. There is also an appendix with some ancillary work: Appendix A, which describes in detail some subjective and objective quality assessment tests used for several results in Chapters 3 and 4.


Chapter 2. Understanding Quality of Experience

2.1. Quality of Experience and its relatives

Quality of Experience is defined as the overall acceptability of an application or service, as perceived subjectively by the end-user. It includes the complete end-to-end system effects (client, terminal, network, services infrastructure, etc.) and may be influenced by user expectations and context [43]. Some identifiable factors which have an impact on the QoE are the following [120, 121]:
• Individual interests of the viewer in the content.
• Audiovisual quality of the content.
• Viewing conditions, screen resolution and type...
• Interaction with the service or display device (e.g. zap time, remote control, EPG...).
• Individual experience and expectations (previous experiences...).
The concept of Quality of Experience is therefore quite wide, ranging from the subjective preferences of each user to the objective technical conditions under which the service was provided. Roughly speaking, there are elements related to the content itself (the movie, TV show...) and others related to the service (how the content is delivered and presented to the end user). Most analyses of the Quality of Experience are restricted to the service-related factors, which can be effectively monitored and managed

from an engineering point of view: media compression and synchronization, network transmission performance, channel zapping time... [43]. A step down in the abstraction scale, we find the audiovisual quality or multimedia quality (MMQ), which is the study of the quality of the video and audio signals (either separately or together). Within the framework of multimedia services, the multimedia quality is by far the most relevant element of the QoE, to the point that both terms are frequently used interchangeably. Likewise, the analysis of MMQ is typically focused on the video quality, which is the most critical in most multimedia services. An additional concept is the multimedia Quality of Service (M-QoS, or just QoS). By QoS we understand the complete and uninterrupted delivery of the multimedia stream through the network, from one end of the communication to the other. It is the quality offered by the transmission chain (from the output of the multiplexer to the input of the demultiplexer) [32], without taking into account the contribution of the encoder, decoder, capture, and display devices to the final quality. These three quality concepts are tightly related. The QoS describes the capabilities of the communication network (bandwidth, delay...) and their possible degradations (bit errors, packet losses, jitter...). It therefore limits the level of MMQ that can be obtained in two senses: on the one hand, limitations in bandwidth result in limitations in the coding quality of the sequence; on the other, QoS degradations can cause impairments in the transmitted multimedia signal and, hence, in its MMQ. The final QoE will depend on the final MMQ, as well as on other factors which are influenced by the QoS: interactivity, end-to-end latency, zap time...

2.2. A word about multimedia services
The concept of multimedia service which will be used in this work is, basically, the possibility of watching audiovisual content at home, usually assuming as well that the content is delivered to the household at the time when it is going to be viewed. Multimedia services, thus, have been universally present in homes for the last half century, first in the form of television broadcasting and, later, with the possibility of watching recorded content in video recording systems. However, in recent years, this scenario has been evolving rapidly, with the emergence of at least three significant technology changes, which have led to the three most relevant families of existing multimedia delivery services. The first one was the switch from analog to digital video, which increased the availability of different television channels to households, fostering the growth of channels

for specific target audiences (documentary, sport, children's channels...) and impacting strongly on the business models in the television marketplace. As a side (but relevant) effect, the experience of watching television changed, with increasing received quality (including high definition video), the appearance of new video defects, the increase in zapping times, the presence of Electronic Program Guides... This technology supports the existing television broadcast services: terrestrial, cable, and satellite. As a second step, some of those broadcast television services started their evolution towards all-IP delivery networks [1]. IP delivery networks offer easy integration with triple-play offers (voice, internet access and television), as well as inherent interactivity, which allows the delivery of personalized services and, especially, Video on Demand (VoD), a remote access to stored video content (i.e. the experience of “renting a film in a video-club” integrated with the television service). A response to this evolution is the standardization of IPTV architectures, such as DVB-IPTV [19], focusing on the delivery of continuous high-quality video services and covering the natural evolution of television services (High Definition, stereoscopic video...), and, in parallel, the deployment of IPTV platforms all over the world. The third technology change has been the arrival in the marketplace of the latest generation of smartphones and tablets, which have given rise to new video delivery services based on the streaming of multimedia content over unmanaged networks [23]. These services, which do not require a specialized end-to-end network, are experiencing very fast growth. As an example, the website of the BBC served 106 million requests for online video during the recent London 2012 Olympic Games [73].
The result is that, in the near future, multimedia services will have to handle a complex scenario ranging from 3.5-inch smartphone screens to 100-inch wall-mounted plasmas, covering services coming both from the “television” and from the “internet” worlds [72][107]. Consequently, content sources will span a wide range of formats and qualities, from user-generated content in social TV to high-budget 3D movies produced by Hollywood studios. Nevertheless, the core of multimedia delivery services is the same for all of them (television broadcasting, IPTV, or internet video): taking multimedia content and delivering it to an end user, providing the best possible Quality of Experience within the limitations imposed by the available network Quality of Service. In the rest of this section we will explore the common properties of all those multimedia services: the players or entities which take part in the service chain, the standards and protocols used to compress and transport the media stream, and the most relevant quality degradations or limitations that are present in those services. The focus will be on the multimedia

services over IP networks, but most of the concepts are applicable to other transmission means as well.

2.2.1. Players

The first step in the analysis of multimedia services is characterizing the players and their roles. We will use the model proposed by the DVB-IPTV standard [19] and depicted in Figure 2.1. This model is applicable to most service scenarios, and it has the advantage of showing the relationships among the different players (or “domains”) and how they map onto the OSI layer model.

Figure 2.1: Layer and domain model for multimedia services.

The Content Provider is “the entity that owns or is licensed to sell content or content assets”. The Content Provider may have a direct relationship with the end user for the management of usage rights to the content, or it can even be the entity which has the commercial agreement with the end user (the end user being then a direct customer of the Content Provider). However, regarding the content flow, the Content Provider delivers content assets only to the Service Provider. The content offered by the Content Provider is already “finished”, in the sense that it is a content asset which is deliverable to an end user (a TV channel, a live event, a movie...). All the complexity of content generation is outside this model and out of the scope of our work.

The Service Provider is “the entity providing a service to the end-user”. This is the one which has a direct logical connection with the end user for the purpose of delivering video content. The Service Provider is also responsible for controlling the Quality of Experience offered to the end user, and is therefore the subject of the quality monitoring services covered in our work. The Delivery Network is “the entity connecting clients and service providers”. According to DVB-IPTV, “the delivery network is transparent to the IP traffic, although there may be timing and packet loss issues relevant for A/V content streamed on IP”. In practice, however, the Service Provider will need to impose specific requirements on the delivery network, which leads to two different delivery scenarios:
• “Managed IPTV” (or simply “IPTV”). The Service Provider controls (and typically owns) the end-to-end IP distribution to the Home domain. The most relevant implication here is that it is possible to distribute UDP traffic over IP multicast with sufficient Quality of Service. This scenario has been the most important (sometimes the only one) in recent years, and therefore it has also been the main focus of our research and of this work.
• “Over The Top” content (or simply “OTT”). Video delivery is done “over the top” of the internet, i.e., using a delivery network which is neither owned nor controlled by the Service Provider. As such, some of the IPTV-related delivery network features (multicast support, controlled QoS) are not available. In this context, however, Service Providers normally make use of (or even own) Content Delivery Networks (CDNs). CDNs are distributed networks which deliver the video content efficiently to points of presence which are closer to the end users, thus shortening the part of the delivery chain which really goes “over the top”.
Home is “the domain where the A/V services are consumed”.
The Home domain is the property of the content consumer (the end customer or subscriber), and includes the User Terminal (or Home Network End Device, HNED, in DVB-IPTV terminology). Because IPTV is traditionally delivered to a TV screen, the Home domain is normally depicted as the end user’s own home. However, the User Terminal may also be a mobile device with a direct connection to the Delivery Network. The Home domain may, but does not need to, include a home local area network.

2.2.2. Coding standards and transport protocols

The multimedia codec and transport technologies used in IPTV and OTT services derive from the ones used in digital television. There are several families of digital television

standards around the world: Digital Video Broadcasting (DVB), adopted in Europe, Africa, Australia and parts of Asia; Advanced Television Systems Committee (ATSC), used mainly in North America; Integrated Services Digital Broadcasting (ISDB), used in Japan and most of Central and South America; and Digital Terrestrial Multimedia Broadcast (DTMB), adopted in China. All of them are quite similar in their basis: transport of audiovisual services, multiplexed in MPEG-2 Transport Stream, over different physical media and using different modulation techniques. When needed, we will take DVB as a reference, considering that the differences with other standards are almost insignificant for the purposes of our work. DVB (and others) standardize the transport of audiovisual services multiplexed in MPEG-2 Transport Stream [36]. Video elementary streams are coded in MPEG-2 video [37] or MPEG-4 AVC/H.264 [38], while audio is coded in MPEG-1, MPEG-2, Dolby AC3, or MPEG-4 AAC [18]. Both video codecs use similar concepts for compression: motion prediction (to exploit temporal redundancy), block transforms (to exploit local spatial redundancy), quantization of the transform coefficients, entropy coding of the resulting data, and packaging of the data into a bitstream which adds some headers of meta-information (such as the delimitation and characterization of the different video frames). Likewise, audio codecs are quite similar to each other in their basic concepts (encoding of different frequency sub-bands of a block of audio samples). As a result, the key elements which affect multimedia quality will be very similar among all the different scenarios for digital television, regardless of the underlying transport. Both IPTV and OTT platforms may offer several different services around the distribution of multimedia content. However, we will focus here on the pure delivery of content assets to the Home domain.
In both cases, there are two basic service types:
• Live content (Live Media Broadcast, or LMB, in DVB-IPTV terminology). The most typical examples are the live broadcast TV channels, which still are the main contributor in IPTV deployments and one of the most popular audiovisual services in any deployment. Its most important property is the real-time constraint: the end-to-end latency must remain constant for the whole play out of the stream to avoid discontinuities in the received multimedia session. Live content must be ingested, processed, and delivered by the Service Provider in real time.
• On-demand content (Content on Demand, or CoD, in DVB-IPTV terminology). This content is pre-loaded by the Content Provider into the Service Provider domain. It may take some time for the Service Provider to process it before it is ready for its delivery to the end user.

Figure 2.2: Protocol stack for multimedia services over IP.

Those audiovisual services are delivered over IP. Figure 2.2 shows the protocol stack used for this purpose, where there is a clear differentiation between the IPTV and OTT protocol families:
• MPEG-2 TS / RTP / UDP / IP. This is the standard scenario for an IPTV deployment over a managed network, as considered in [19], [76], and [55]. It follows a push paradigm: the server controls the bit rate of the delivery.
• HTTP Adaptive Streaming (HAS) / TCP / IP. This is the upcoming scenario for OTT environments. It follows a pull paradigm: the client decides which video segments it downloads and when.
HTTP Adaptive Streaming (HAS) is a solution used to deliver multimedia content to users where the bitrate is adapted to the network. Although the distribution of video over the internet can be done in dozens of different ways, the use of adaptive streaming is becoming the most popular one, especially in the context of OTT services offered by IPTV service providers [75]. It is also natively supported by most smartphones, tablets, and set-top boxes. HAS works as follows: the content is encoded at several bitrates, each variant being a concatenation of small segments containing a few seconds of the stream, with the property that at the video segment boundaries the terminal can switch from one variant (at a particular bitrate) to another (at a different bitrate) without any visible effect on the screen or the audio. Each of these segments is accessible as an independent asset with its own URL, so once it is present in an HTTP server it can be retrieved by a standard web client using pure HTTP mechanisms.
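
As an illustration of the pull paradigm described above, the following minimal Python sketch shows how a client might decide which variant to request for the next segment. It is not taken from any specific HAS implementation: the `Variant` class, the `select_variant` and `segment_url` helpers, the 0.8 safety factor, and the URL template are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One encoding of the content at a given bitrate (hypothetical model)."""
    bitrate_bps: int
    url_template: str  # assumed layout: one URL per segment index


def select_variant(variants, measured_throughput_bps, safety_factor=0.8):
    """Pick the highest-bitrate variant that fits the measured throughput.

    The safety factor (an assumption of this sketch) leaves some headroom;
    if no variant fits, the lowest-bitrate one is returned.
    """
    budget = measured_throughput_bps * safety_factor
    ordered = sorted(variants, key=lambda v: v.bitrate_bps)
    chosen = ordered[0]
    for variant in ordered:
        if variant.bitrate_bps <= budget:
            chosen = variant
    return chosen


def segment_url(variant, index):
    """Build the URL of one segment; each segment is an independent HTTP asset."""
    return variant.url_template.format(index=index)


variants = [
    Variant(500_000, "http://example.invalid/video/500k/seg_{index}.ts"),
    Variant(1_500_000, "http://example.invalid/video/1500k/seg_{index}.ts"),
    Variant(3_000_000, "http://example.invalid/video/3000k/seg_{index}.ts"),
]
next_variant = select_variant(variants, measured_throughput_bps=2_500_000)
print(segment_url(next_variant, 7))
```

Because the decision is taken again at every segment boundary, the client can move up or down between variants as the measured throughput changes, which is what makes the scheme adaptive.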

There are several different HAS implementations. The most widespread in the market come from the initiative of individual companies: Apple HTTP Live Streaming (HLS), Microsoft Smooth Streaming (SS), and Adobe HTTP Dynamic Streaming (HDS). All of them are based on the same principles and use similar codecs. Their main differences are the signaling of the segments and the multiplexing layer: HLS uses MPEG-2 Transport Stream, while SS and HDS use extensions of the ISO base media file format. MPEG has also recently standardized a proposal for HTTP adaptive streaming called MPEG DASH (Dynamic Adaptive Streaming over HTTP) [39]. MPEG DASH supports both MPEG-2 TS and ISO file format profiles.

2.2.3. Artifacts

The “perfect” media quality for a multimedia service is the quality of the audiovisual content just after the production process has finished. This reference “production quality” shows the product exactly as its creators wanted it to be. Of course, there might be defects in the capture, recording and production process, but, in a professional product, it is reasonable to assume that they will be very rare and have a small impact on the perceived quality. Producers must then deliver their products to the service provider. This is usually done by encoding the content with a very lightweight compression, to avoid a perceptible loss of quality, giving as a result a product with “contribution quality”. It can be assumed that a product with contribution quality has the highest possible multimedia quality, with no perceptible visual or sound artifact or impairment. However, due to the impairments produced in the delivery chain, the final multimedia quality received by the end users may be far from the contribution quality. We will consider three main types of impairments, according to the place where they are generated: compression artifacts, transmission errors, and display errors [113].
Other terminologies and classifications are also possible [2, 7]. Compression artifacts are defects introduced when compressing the video from contribution to distribution quality, which must fit into the bitrate budget that the service provider has reserved for that specific media stream. In this compression process, several impairments can be introduced [105]:
• Blocking effect appears as a pattern of square-shaped blocks in the compressed image. It is caused by the independent quantization of adjacent groups of pixels, which are processed in 4x4, 8x8, or 16x16 blocks, leading to discontinuities at the block boundaries. This effect is easy to notice due to the regularity of the

generated pattern, and it is typically the most salient defect in MPEG-2 video. In AVC video it is partially mitigated by the use of smaller blocks and the effect of the deblocking filter.
• Blurring is the loss of spatial detail and edge sharpness in the image. It is generated by the application of strong quantization to the high frequency components, and it is emphasized by the application of deblocking filters, thus being typically the most relevant artifact in AVC video.
• Flickering is a defect introduced in highly textured regions which are compressed with different quantization factors over time (normally having higher quality in key frames than in predicted frames). As a result, the coding quality of those regions fluctuates periodically over time, and so does the perceived level of detail.
• Ringing (also known as Gibbs effect) produces ring-like periodic intensity variations around image edges in areas which should not have a perceptible texture. It is caused by a strong quantization of high frequency coefficients in edgy regions.
• Chromatic dispersion is produced by the suppression of high frequency components in the chrominance signal, resulting in cross-talk and loss of color definition in areas with strong color variation.
• Motion jerkiness is caused by the use of a lower frame rate than the one needed to properly display the image motion.
Transmission errors are produced by the loss, corruption or excessive delay of some packets in the transmission chain, which results in stream discontinuities or buffer underrun events in the receiver. They typically result in stronger versions of the compression defects:
• Macroblocking: a highly visible blocking effect produced by the loss of video information, which forces the receiver to build the picture using wrong references (normally repeating a correctly received frame instead of the lost one).
The result is a strong blocking pattern, sometimes also causing other perceptual artifacts (parts of the image, typically blocks or horizontal stripes, with a different color or texture than what they should have).
• Freezing or continual jerky motion, caused by the unrecoverable loss of video frames.
• Mute or audio glitches, caused by the loss of packets with audio information.
• Outage or temporal loss of service due to network problems.
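
Since all these impairments originate in lost or late packets, a monitoring point can already detect their cause from transport headers alone. The sketch below is given only as an illustration under simplifying assumptions, not as a metric proposed in this work: it counts the packets missing from a list of received RTP sequence numbers. It assumes that packets arrive in order, without reordering, and the function name is hypothetical. RTP sequence numbers are 16 bits wide, so the gap between consecutive packets is computed modulo 65536.

```python
RTP_SEQ_MODULUS = 65536  # RTP sequence numbers are 16-bit and wrap around


def count_lost_packets(sequence_numbers):
    """Count packets missing between consecutive received sequence numbers.

    Assumes in-order arrival. A repeated number (gap of 0) is ignored;
    reordered packets would be miscounted by this simple sketch.
    """
    lost = 0
    previous = None
    for seq in sequence_numbers:
        if previous is not None:
            gap = (seq - previous) % RTP_SEQ_MODULUS
            if gap > 1:
                lost += gap - 1
        previous = seq
    return lost


received = [65533, 65534, 0, 1, 4]  # 65535, 2 and 3 never arrived
print(count_lost_packets(received))
```

A count like this is a pure QoS measure: it says how many packets were lost, but not whether the loss produced macroblocking, a freeze, or an audio glitch.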

Finally, there is a heterogeneous set of errors that can be caused in the user terminal and display, such as an incorrect aspect ratio display [113] or a malfunction in the terminal itself. Transmission errors are normally the most damaging for the perceived QoE. In a study done with a real IPTV deployment [7], it was shown that about 82% of the multimedia quality impairments reported by customers were directly related to them: “Breaking Up Into Blocks” (macroblocking, 29%), “Screen Freezes” (20%), “Choppy Screen Transitions” (or jerky motion, 18%) and “Distorted audio” (mute or glitches, 15%). As the customers were requested to report perceived errors, it is possible that a fraction of them were caused in the encoding process. However, the description of the errors as given by customers suggests that most of them refer to the “stronger” (and more visible) effects of the artifacts, i.e. the ones resulting from transmission errors. The remaining 18% of the errors is divided into “Edges Shimmer” (11%), visible artifacts around edges in the image (caused by coding artifacts, as the edges are one of the places where they are more visible), and “Error Stoppage” (7%), or problems with the end terminal (which “has to be reset”).

2.3. Who is who in the QoE metrics

In contrast with the relatively fast standardization of audio [41] and speech [51, 52] quality metrics, the efforts to standardize video quality metrics have produced slower results [15]. The Video Quality Experts Group (VQEG) has been the most relevant contributor to this standardization process [111, 112], producing an extensive evaluation of quality metrics which has led to some standardization initiatives [45, 46, 47, 48, 49, 50].
The study of multimedia quality, and more specifically of video quality, has been of great interest for the last 15 years, and therefore it is relatively easy to find good surveys, reviews and classifications of the different existing metrics and approaches [15, 33, 78, 92, 121]. This section will present the most widely used classification of video quality assessment strategies, as well as some example methods which are relevant for our work. More detailed surveys can be found in the given references. The first division in the quality assessment approaches is between subjective and objective methods. Subjective quality assessment implies having a panel of users watching the target content and evaluating its quality by giving a score to each fragment of content under study. The result is normally presented in terms of Mean Opinion Score (MOS), which is the average of the results from the different users, possibly after some statistical processing such as the removal of outliers. Objective quality assessment is done

automatically by computing processes which analyze the multimedia stream to produce some quality values. In most cases, the aim of objective metrics is to provide MOS values which correlate well with those provided by subjective assessments, which are used as a benchmark.

Figure 2.3: Models for objective quality assessment: Full-reference method (top), Reduced-reference method (middle), No-reference method (bottom).

Objective quality assessment methods can be classified into three different types, depending on how much information they use from the original signal (see Figure 2.3):
• Full-Reference (FR). The impaired signal is compared with the original one to obtain a quality value. This is the most appropriate method to use in cases where it is possible to have access to the original and impaired signals simultaneously (for instance, to analyze the compression defects introduced by a video encoder).
• Reduced-Reference (RR). Reduced descriptions of the original and impaired signals are generated, and they are compared to produce a quality value. This model is useful when the original signal is not available at the measurement point (for instance, when they are at different points in the network), but it is possible to receive ancillary data through a lower bitrate channel.
• No-Reference (NR). The quality measure is generated only by analyzing the impaired signal, without having any information about the original. This is the most generic model, because it can be introduced in a non-intrusive way at any point of the transmission chain.
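
To make the Full-Reference case concrete, the sketch below computes the mean square error between an original and an impaired frame, together with the peak signal-to-noise ratio (PSNR) that is commonly derived from it. This is only an illustrative example of the simplest kind of FR measure; PSNR is not one of the metrics discussed in this section, and the function names and the flat-list representation of a frame (8-bit luminance samples) are assumptions of the example.

```python
import math


def mean_squared_error(reference, impaired):
    """Mean square error between two frames given as equal-length sample lists."""
    if not reference or len(reference) != len(impaired):
        raise ValueError("frames must be non-empty and have the same size")
    total = sum((r - i) ** 2 for r, i in zip(reference, impaired))
    return total / len(reference)


def psnr_db(reference, impaired, peak=255):
    """PSNR in dB for samples whose maximum value is `peak` (255 for 8 bits)."""
    mse = mean_squared_error(reference, impaired)
    if mse == 0:
        return math.inf  # identical frames: no measurable error
    return 10 * math.log10(peak ** 2 / mse)


original = [16, 128, 235, 64]
decoded = [16, 130, 235, 64]  # one sample differs by 2 after coding
print(mean_squared_error(original, decoded))
print(psnr_db(original, decoded))
```

Note that both functions need the original frame, which is exactly what makes the measure Full-Reference and prevents its use at a point of the network where only the impaired stream is available.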

A second classification criterion for objective metrics refers to the type of data they use:
• Picture metrics, which operate in the baseband domain, analyzing the pixel values of the original and/or decoded frames to produce their results.
• Bitstream metrics, which operate in the coded domain, analyzing the video stream without fully decoding it or, in some cases, analyzing just the quality of service information (losses, delays...). Bitstream metrics are usually No-Reference as well.

2.3.1. Subjective quality assessment

The aim of quality assessment is to know, for a specific set of content assets and impairments, what the opinion of an average user would be. As such, the best way to know it is in fact to ask the users. Subjective quality assessment methods provide guidelines about how to ask users about multimedia quality in the most effective way. There are several standards which provide these methods of subjective assessment, mainly ITU-R BT.500 [42], ITU-T P.910 [53], and ITU-T P.911 [54]. All of them are quite similar in the way they propose to structure, perform and evaluate tests. Most of the subjective assessment tests reported in the literature are based on these standards, the VQEG validation tests being the most relevant example [119]. In test sessions, a number of “subjects” are asked to watch a set of audiovisual clips and rate their quality. The total number of viewers for a test must be between 4 and 40 (they can be effectively distributed in different viewing sessions). In general, at least 15 observers should participate in the experiment. They should not be professionally involved in multimedia quality evaluation, and they should have normal or corrected-to-normal visual acuity and color vision.
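
The scores collected in such a session are usually summarized as a Mean Opinion Score. The following sketch shows one possible way to do it, assuming a five-level rating scale and no outlier screening; the 95% confidence interval uses a normal approximation, which is a simplification chosen for this example and not the procedure of any of the cited standards. The function name is hypothetical.

```python
import math
import statistics


def mean_opinion_score(scores, lowest=1, highest=5):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    `scores` holds one rating per observer for the same clip. The interval
    uses the normal approximation 1.96 * s / sqrt(n) (an assumption here).
    """
    if len(scores) < 2:
        raise ValueError("at least two scores are needed")
    if any(not lowest <= s <= highest for s in scores):
        raise ValueError("score outside the rating scale")
    mos = statistics.mean(scores)
    half_width = 1.96 * statistics.stdev(scores) / math.sqrt(len(scores))
    return mos, half_width


ratings = [5, 4, 3, 4, 4]  # toy panel; a real test would use 15 or more observers
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

The width of the interval shrinks with the square root of the number of observers, which gives an intuition of why panels of about 15 viewers or more are recommended.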
The location and the displays where the tests are conducted must comply with a set of requirements regarding lighting, screen brightness and contrast, and the distance and angle from viewers to screen. Guidelines are provided to work either with professional monitors or with domestic TV sets [42]. Sessions should not last more than half an hour. At the beginning of the session, viewers are presented with a set of example clips where they can see the type of defects that they are supposed to judge. The content samples to be evaluated may be preceded by about five “dummy presentations”, whose results are not taken into account, to stabilize the observers’ opinion. Besides, the video clips under study should be distributed randomly along the session.

Table 2.1: ACR and DCR evaluation scales.

Score  ACR        DCR
5      Excellent  Imperceptible
4      Good       Perceptible but not annoying
3      Fair       Slightly annoying
2      Poor       Annoying
1      Bad        Very annoying

Different evaluation strategies are used. Although there are some variations in the details from one standard to another, they are basically the following [54]:

• Absolute Category Rating (ACR), or Single Stimulus method (SS). The test sequences are presented one at a time and are rated independently on a category scale. After each presentation, the subjects are asked to evaluate the quality of the sequence presented using an absolute scale, normally with five levels (see Table 2.1). Nine-level and eleven-level rating scales are also suggested to increase resolution, but they do not seem to produce significantly different results [35].

• Degradation Category Rating (DCR), or Double Stimulus Impairment Scale method (DSIS). In this case, each presentation consists of two different video clips: the reference content (without impairments) and the processed or impaired version of the same content. Both videos are watched consecutively, and the subject is asked to rate the impairment of the second stimulus in relation to the reference. Five-level scales are also used (see Table 2.1).

• Pair Comparison method (PC). Test sequences are presented in pairs as in the case of DCR, but now the sequences are two different processed versions of the same original one (i.e. with two different levels or types of impairments). After each pair is presented, the subject has to select which one is preferred in the context of the test scenario.

• Single Stimulus Continuous Quality Evaluation (SSCQE). This method considers long-duration sequences (3 to 30 min).
While the sequence is being played, subjects are asked to continuously evaluate the quality of the sequence, normally by controlling a slider.

The proposed duration of sequences is about 10 seconds, with an additional 10-second period (showing a grey screen) to rate each of the sequences. When sequence pairs are used (DCR and PC), both sequences within a pair should be separated by a short (about 2 seconds) grey screen.

2.3.2. Full-Reference quality metrics

Full-Reference metrics compare the original and impaired versions of the sequence, thus having access to more information than RR or NR metrics. For this reason, FR metrics were the first to be developed, and they are also the ones which produce the most accurate results.

For years, video engineers have used simple FR objective metrics such as the Peak Signal to Noise Ratio (PSNR) or the Mean Square Error (MSE) of the impaired video with respect to the reference. They are computed as follows:

$$\mathrm{MSE} = \frac{1}{MN} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \left[ I(i,j) - K(i,j) \right]^2 \qquad (2.1)$$

$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{(\max I)^2}{\mathrm{MSE}} \right) \qquad (2.2)$$

where I(i, j) and K(i, j) are the two compared images, whose size is M × N pixels, and max I is the maximum possible intensity value for any pixel in the image (for instance, 255 for 8-bit pixel values).

These metrics compare the pictures on a pixel-by-pixel basis, ignoring the image structure, and their capability to predict the perceived MOS is quite limited. However, they are still used for some applications, and especially as a benchmark for other FR quality metrics: the acceptability criterion for any FR quality metric is having a correlation with subjective MOS which is significantly better (statistically speaking) than that obtained by PSNR [111].

The first attempts to improve the performance of PSNR and MSE resulted from the application of psychophysical models of the Human Vision System (HVS) to improve the measurements, in a way that has been known to produce good results in audio quality estimation (and in the development of audio codecs) [78, 120].
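As a minimal sketch of Equations 2.1 and 2.2, the following Python code computes the MSE and PSNR of two grayscale frames. It assumes that each frame is given as a list of rows of integer pixel values; the function names `mse` and `psnr` and the `max_value` parameter are illustrative choices, not part of any standard or reference implementation.

```python
import math


def mse(reference, impaired):
    """Mean square error between two equally sized grayscale images (Eq. 2.1).

    Both images are assumed to be non-empty lists of M rows with N pixels each.
    """
    rows = len(reference)
    cols = len(reference[0])
    same_shape = (
        len(impaired) == rows
        and all(len(row) == cols for row in reference)
        and all(len(row) == cols for row in impaired)
    )
    if not same_shape:
        raise ValueError("both images must have the same M x N size")

    total = 0.0
    for ref_row, imp_row in zip(reference, impaired):
        for i_px, k_px in zip(ref_row, imp_row):
            total += (i_px - k_px) ** 2  # squared pixel-by-pixel difference
    return total / (rows * cols)


def psnr(reference, impaired, max_value=255):
    """Peak signal-to-noise ratio in dB (Eq. 2.2).

    max_value is the maximum possible pixel intensity (255 for 8-bit pixels).
    Identical images have zero MSE, so the PSNR is reported as infinity.
    """
    error = mse(reference, impaired)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)


if __name__ == "__main__":
    original = [[0, 0], [0, 0]]
    degraded = [[0, 0], [0, 2]]
    print(mse(original, degraded))   # 1.0
    print(psnr(original, degraded))  # about 48.13 dB
```

Note that the computation only accumulates pixel-by-pixel differences: two impairments with the same total squared error yield the same PSNR regardless of where they fall in the picture, which illustrates why these metrics ignore the image structure.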
A second family of FR algorithms appeared with a different approach: trying to detect impairments related to the known processing applied to the image, the expected impairments that can appear or, in general, how the image itself is affected. Some metrics having this “engineering approach” [121] were able to outperform the PSNR in the second round of the VQEG tests for television signals [111]. They are the ones included in the ITU-T Recommendation J.144 [45], the first standard for FR video quality metrics:

• BTFR (BT Full Reference). It makes a weighted linear composition of several individual measures, such as: percent of correctly estimated blocks, PSNR of matching blocks, segmental PSNR (error in the matching vectors), energy of edge differences, texture degradation and pyramidal PSNR.

• EPSNR (Edge PSNR). It measures the PSNR between both images, considering only the regions where there are edges. The result is afterwards scaled non-linearly to generate a MOS value.

• CPqD-IES. The image is segmented into three regions: flat, edges and textured. The Absolute Sobel Difference (ASD) is computed for each region, i.e. the result of applying a Sobel filter and finding the MSE of the resulting images. The result is introduced into a trained model to obtain the final MOS value.

• VQM. This metric also computes seven different parameters of the image, which are afterwards added linearly with experimentally obtained weights. The measured features are: loss of spatial information, loss of horizontal and vertical edges, gain of horizontal and vertical edges, chroma spread, spatial information gain at edges, errors in high-contrast areas and extreme chrominance errors. An implementation of VQM is publicly available on the internet [89].

Subsequent test projects of the VQEG have resulted in additional ITU-T Recommendations for slightly different scenarios. For instance, ITU-T J.341 [49] introduces VQualHD, another FR metric specialized for HDTV contents, which combines picture similarity, spatial degradation, and temporal degradation to obtain a quality metric. ITU-T J.247 [47] proposes metrics for multimedia environments, more focused on “internet” frame resolutions and bit rates (lower than in digital television, as a general rule).
ITU-T J.147 [46] proposes embedding hidden data in the original signal and measuring their degradation in the received one.

In addition to these, it is relevant to mention the Structural Similarity Index (SSIM) [116]. SSIM considers image degradation as a perceived change in structural information. Structural information is the idea that pixels have strong inter-dependencies, especially when they are spatially close. The metric is computed over several windows in the image, and its value between two windows x and y (assumed to be in the same position of two different images) is:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} \qquad (2.3)$$

where µ represents the average, σ² the variance and σxy the covariance of the signals, and c1 and c2 are constants used to stabilize the division when the denominator is small. Although the metric has some limitations [13], SSIM has become increasingly popular over recent years, since it seems to offer better results than PSNR while being a simple metric to implement (the source code is available on the internet as well).

In any case, most of the FR metrics (and especially the ones included in ITU-T recommendations) have been specifically designed to provide good MOS estimations for relatively subtle impairments, such as the ones generated by video encoders. However, when the errors are generated by packet losses or other network problems, and therefore are more aggressive perceptually, PSNR, SSIM and VQM all show reasonably good correlation with MOS [40]. For such cases, it can be more useful to use simpler metrics (such as PSNR or SSIM) rather than the complex schemes proposed by the standards.

2.3.3. Reduced-Reference quality metrics

The basic strategy used to design Reduced-Reference metrics is to extract a set of statistical parameters that characterize the video and compare them between the original and the impaired sequences (see [15] for a short survey). We can distinguish between two types of features:

• Features which describe image properties: temporal and spatial information [63, 98, 117], structural similarity [106], image statistics [114]...

• Known impairments on the image, normally by applying No-Reference quality estimators to both pictures (original and impaired) and comparing the results [16].

Simple RR measures can be combined to generate a more complex metric, in a similar way to how FR metrics are built by combining several individual measures.
This is the case of the RR metrics selected by the RR-NR project of VQEG [112], which are now included in the ITU-T Recommendations J.249 (for Standard Definition TV) [48] and J.342 (for High Definition) [50]:

• Yonsei University metric. It is a Reduced-Reference version of the EPSNR included in ITU-T J.144 [45]. The algorithm selects some pixels in the edge region of the original image and computes their PSNR with respect to the same pixels in the impaired image.