[Figure 1.8 panels: 4:4:4, for every 2x2 Y pixels, 4 Cb and 4 Cr pixels (no subsampling); 4:1:1, for every 4x1 Y pixels, 1 Cb and 1 Cr pixel (subsampling by 4:1 horizontally only). Legend: Y pixel; Cb and Cr pixel.]
Figure 1.8. BT.601 chrominance subsampling formats. Note that the two adjacent lines in any one component belong to two different fields.
In the 4:1:1 format, the chrominance components are subsampled along each line by a factor of 4, i.e., there are 1 Cb sample and 1 Cr sample for every 4 Y samples. This sampling method, however, yields very asymmetric resolutions in the horizontal and vertical directions. Another sampling format has therefore been developed, which subsamples the Cb and Cr components by half in both the horizontal and vertical directions. In this format, there are also 1 Cb sample and 1 Cr sample for every 4 Y samples. But to avoid confusion with the previously defined 4:1:1 format, this format is designated as 4:2:0. For applications requiring very high resolutions, the 4:4:4 format is defined, which samples the chrominance components at exactly the same resolution as the luminance component. The relative positions of the luminance and chrominance samples for different formats are shown in Fig. 1.8.
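The difference between the 4:1:1 and 4:2:0 formats can be made concrete with a short sketch. The Python code below (NumPy is assumed to be available, and the function name `subsample_chroma` is hypothetical) derives a subsampled Cb or Cr plane from a full-resolution one by simple box averaging. It is only an illustration: practical systems use better anti-aliasing filters and standard-specific sample positions, and for interlaced BT.601 video the 2x2 averaging shown here would mix lines from two different fields, which a real 4:2:0 converter has to take into account.

```python
import numpy as np

def subsample_chroma(c: np.ndarray, fmt: str) -> np.ndarray:
    """Subsample one full-resolution chrominance plane (Cb or Cr).

    Illustrative box averaging only; assumes the plane dimensions are
    divisible by the subsampling factors and ignores field structure.
    """
    h, w = c.shape
    c = c.astype(np.float64)
    if fmt == "4:4:4":
        return c  # no subsampling
    if fmt == "4:2:2":
        # average horizontal pairs: half resolution horizontally only
        return c.reshape(h, w // 2, 2).mean(axis=2)
    if fmt == "4:1:1":
        # average groups of 4 samples along each line: quarter resolution horizontally
        return c.reshape(h, w // 4, 4).mean(axis=2)
    if fmt == "4:2:0":
        # average 2x2 blocks: half resolution both horizontally and vertically
        return c.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    raise ValueError(f"unknown format {fmt}")

cb = np.arange(480 * 720, dtype=np.float64).reshape(480, 720)
for fmt in ("4:2:2", "4:1:1", "4:2:0"):
    print(fmt, subsample_chroma(cb, fmt).shape)
```

For a 480x720 plane, 4:1:1 yields a 480x180 chroma plane and 4:2:0 yields 240x360. Both hold one quarter as many samples as the Y plane, so they cost the same number of bits but distribute the chrominance resolution differently between the two directions.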
In Chap. 4, we will discuss solutions for converting videos with different spatial/temporal resolutions. The conversion between different color subsampling formats is considered in one of the exercise problems.
The raw data rate of a BT.601 signal depends on the color subsampling factor. With the most common 4:2:2 format, there are two chrominance samples (one Cb and one Cr) per two Y samples, each represented with 8 bits. Therefore, the equivalent bit rate for each Y sample is $N_b = 16$ bits, and the raw data rate is $f_s N_b = 216$ Mbps. The raw data rate corresponding to the active area is $f_{s,t} f'_{s,y} f'_{s,x} N_b = 166$ Mbps. With the 4:2:0 format, there are two chrominance samples per four Y samples, and the equivalent bit rate for each Y sample is $N_b = 12$ bits. Therefore, the raw data rate is 162 Mbps, with 124 Mbps in the active area. For the 4:4:4 format, the equivalent bit rate for each Y sample is $N_b = 24$ bits, and the raw data rate is 324 Mbps, with 249 Mbps in the active area. The resolutions and data rates of different BT.601 signals are summarized in Table 1.3.
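The data-rate arithmetic above can be checked with a few lines of Python. This sketch assumes 8 bits per sample, the BT.601 luminance sampling rate $f_s = 13.5$ MHz, and an active area of 720 pixels by 480 lines at 30 frames/s (or 576 lines at 25 frames/s); the helper names are hypothetical.

```python
F_S = 13.5e6  # BT.601 luminance sampling rate f_s, in samples/s

# Number of Cb (and, equally, Cr) samples per Y sample for each format.
CHROMA_PER_Y = {"4:4:4": 1.0, "4:2:2": 0.5, "4:1:1": 0.25, "4:2:0": 0.25}

def bits_per_y_sample(fmt: str, bits: int = 8) -> float:
    """Equivalent bits per Y sample, N_b: the Y sample plus its share of Cb and Cr."""
    return bits * (1 + 2 * CHROMA_PER_Y[fmt])

def bt601_rates_mbps(fmt: str, frame_rate: float = 30, active_lines: int = 480,
                     active_pixels: int = 720) -> tuple[float, float]:
    """Return (f_s * N_b, f_st * f'_sy * f'_sx * N_b), i.e. total and active-area rates in Mbps."""
    n_b = bits_per_y_sample(fmt)
    total = F_S * n_b / 1e6
    active = frame_rate * active_lines * active_pixels * n_b / 1e6
    return total, active

for fmt in ("4:2:2", "4:2:0", "4:4:4"):
    total, active = bt601_rates_mbps(fmt)
    print(f"{fmt}: N_b = {bits_per_y_sample(fmt):.0f} bits, "
          f"raw {total:.0f} Mbps, active {active:.0f} Mbps")
```

Because 720 x 480 x 30 and 720 x 576 x 25 both equal 10,368,000 samples/s, calling `bt601_rates_mbps(fmt, 25, 576)` returns the same values for the 625-line, 25 frames/s system.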
The BT.601 formats are used in high-quality digital video applications, with the 4:4:4 and 4:2:2 formats typically used for video production and editing, and the 4:2:0 format used for video distribution, e.g., movies on digital video disks (DVD), video-on-demand (VOD), etc. The MPEG2 video compression standard was primarily developed for compression of BT.601 4:2:0 signals, although it can also handle videos in lower or higher resolutions. A typical 4:2:0 signal with a raw active data rate of 124 Mbps can be compressed down to about 4-8 Mbps. We will introduce the MPEG2 video coding algorithm in Sec. 13.5.
1.5.3 Other Digital Video Formats and Applications
In addition to the BT.601 format, several other standard digital video formats have been defined. Table 1.3 summarizes these video formats, along with their main applications, compression methods, and compressed bit rates. The CIF (Common Intermediate Format), specified by the International Telecommunication Union, Telecommunication Standardization Sector (ITU-T), has about half the resolution of BT.601 4:2:0 in both the horizontal and vertical dimensions and was developed for video conferencing applications; the QCIF, which is a quarter of CIF, is used for videophone-type applications. Both are non-interlaced. The ITU-T H.261 coding standard was developed to compress videos in either format to $p \times 64$ Kbps, with $p = 1, 2, \ldots, 30$, for transport over ISDN (Integrated Services Digital Network) lines, which only allow transmission rates in multiples of 64 Kbps. Typically, a CIF signal with a raw data rate of 37.3 Mbps can be compressed down to about 128 to 384 Kbps with reasonable quality, while a QCIF signal with a raw data rate of 9.3 Mbps can be compressed to 64-128 Kbps. A later standard, H.263, can achieve better quality than H.261 at the same bit rate. For example, it is possible to compress a QCIF picture to about 20 Kbps while maintaining a quality similar to or better than that of H.261 at 64 Kbps. This enables videophone over a 28.8 Kbps modem line.
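The rates quoted above imply large compression ratios. The sketch below, which assumes 1 Mbps = 1000 Kbps and uses hypothetical helper names, lists the $p \times 64$ Kbps channel rates targeted by H.261 and computes the ratio of raw to compressed rate for the CIF and QCIF examples in the text.

```python
# H.261 targets ISDN channel rates of p x 64 Kbps, p = 1, 2, ..., 30.
H261_RATES_KBPS = [p * 64 for p in range(1, 31)]

def compression_ratio(raw_mbps: float, coded_kbps: float) -> float:
    """Ratio of raw to compressed bit rate (assumes 1 Mbps = 1000 Kbps)."""
    return raw_mbps * 1000 / coded_kbps

examples = [
    ("CIF, H.261 at 384 Kbps", 37.3, 384),
    ("CIF, H.261 at 128 Kbps", 37.3, 128),
    ("QCIF, H.261 at 64 Kbps", 9.3, 64),
    ("QCIF, H.263 at 20 Kbps", 9.3, 20),
]
for label, raw, coded in examples:
    print(f"{label}: about {compression_ratio(raw, coded):.0f}:1")
```

Even at the highest H.261 rate, 30 x 64 = 1920 Kbps, a 37.3 Mbps CIF signal still has to be compressed by roughly 19:1.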
In parallel with the effort of ITU-T, ISO-MPEG also defined a series of digital video formats. The SIF (Source Intermediate Format) is essentially a quarter of the size of the active area in the BT.601 signal, and is about the same as CIF. This format is targeted for video applications requiring medium quality, such as video games and CD movies. As with BT.601, there are two SIF formats: one with a frame rate of 30 Hz and 240 lines, and another with a frame rate of 25 Hz and 288 lines; both have 352 pixels/line. There is also a corresponding set of SIF-I formats, which are 2:1 interlaced. The MPEG-1 algorithm can compress a typical SIF video with a raw data rate of 30 Mbps to about 1.1 Mbps with a quality similar to that seen on a VHS VCR, which is lower than that of broadcast television. The rate of 1.1 Mbps enables the playback of digital movies on CD-ROMs, which have an access rate of 1.5 Mbps. Distribution of MPEG1 movies on video CDs (VCD) marked the entrance of digital video into the consumer market in the early 1990s. MPEG2-based DVDs, which started in the mid 1990s, opened the era of high-quality digital video entertainment. MPEG2 technology is also the cornerstone of the next-generation TV system, which will be fully digital, employing digital compression and transmission technology. Table 1.3 lists the details of the video formats discussed above, along with their main applications, compression methods, and compressed bit rates. More on compression standards will be presented in Chap. 13.
The BT.601 format is the standard picture format for digital TV (DTV). To further enhance the video quality, several HDTV formats have also been standardized by the Society of Motion Picture and Television Engineers (SMPTE); these are also listed in Table 1.3. A distinct feature of HDTV is its wider aspect ratio, 16:9 as opposed to 4:3 in SDTV. The picture resolution is doubled to tripled in both the horizontal and vertical dimensions. Furthermore, progressive scan is used in most of these formats to reduce interlacing artifacts. A high profile has been developed in the MPEG2 video compression standard for compressing HDTV video. Typically, it can reduce the data rate to about 20 Mbps while retaining the very high quality required. This video bit rate is chosen so that the combined bit stream with audio, when transmitted using digital modulation techniques, can still fit into a 6 MHz terrestrial channel, which is the assigned channel bandwidth for HDTV broadcast in the U.S.
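The raw rates listed for the HDTV formats in Table 1.3, and the compression needed to reach about 20 Mbps, follow directly from the picture size and frame rate. This sketch assumes 8-bit 4:2:0 sampling (12 bits per Y sample on average) and counts a 60I format as 30 full frames per second, which is consistent with Table 1.3 listing 746 Mbps for both 30P and 60I; the function name is hypothetical.

```python
def raw_rate_mbps(width: int, height: int, frames_per_s: float,
                  bits_per_y: float = 12) -> float:
    """Raw data rate in Mbps; 12 bits per Y sample corresponds to 8-bit 4:2:0."""
    return width * height * frames_per_s * bits_per_y / 1e6

hdtv = [
    ("SMPTE 296M, 720 lines, 60P", 1280, 720, 60),
    ("SMPTE 295M, 1080 lines, 60I", 1920, 1080, 30),  # 60 fields/s = 30 frames/s
]
for name, w, h, fps in hdtv:
    raw = raw_rate_mbps(w, h, fps)
    print(f"{name}: raw {raw:.0f} Mbps, about {raw / 20:.0f}:1 to reach 20 Mbps")
```

The same function reproduces the 24P and 30P entries of Table 1.3 (265 and 332 Mbps for 1280x720, 597 and 746 Mbps for 1920x1080).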
1.5.4 Digital Video Recording
To store video in digital formats, various digital video tape recorder (DVTR) formats have been developed, which differ in the video format handled and in the technology for error-correction coding and storage density. Table 1.4 lists some standard and proprietary tape formats. The D1-D5 formats store a video in its raw, uncompressed format, while others pre-compress the video. Only a conservative amount of compression is employed so as not to degrade the video quality beyond that acceptable for the intended application. A good review of digital VTRs can be found in [11]. An extensive treatment of the underlying physics of magnetic recording and the operation of DVTRs can be found in the book by Watkinson [12].
In addition to magnetic tape recorders, VCD and DVD are two video storage devices using optical disks. By incorporating MPEG1 and MPEG2 compression technologies, they can store SIF and BT.601 videos, respectively, with sufficient quality. At present, VCD and DVD are read-only, so they are mainly used for distribution of pre-recorded video rather than as tools for recording video by consumers.
Besides video recording systems using magnetic tapes, hard-disk-based systems, such as TiVo and ReplayTV, are also on the horizon. These systems enable consumers to record up to 30 hours of live TV programs onto hard disks in MPEG-2 compressed formats, which can be viewed later with the usual VCR features such as fast forward, slow motion, etc. They also allow instant pause of a live program
Table 1.3. Digital Video Formats for Different Applications
Video Format   Y Size        Color Sampling   Frame Rate    Raw Data (Mbps)
HDTV over air, cable, satellite: MPEG2 video, 20-45 Mbps
SMPTE 296M     1280x720      4:2:0            24P/30P/60P   265/332/664
SMPTE 295M     1920x1080     4:2:0            24P/30P/60I   597/746/746
Video production: MPEG2, 15-50 Mbps
BT.601         720x480/576   4:4:4            60I/50I       249
BT.601         720x480/576   4:2:2            60I/50I       166
High-quality video distribution (DVD, SDTV): MPEG2, 4-8 Mbps
BT.601         720x480/576   4:2:0            60I/50I       124
Intermediate-quality video distribution (VCD, WWW): MPEG1, 1.5 Mbps
SIF            352x240/288   4:2:0            30P/25P       30
Video conferencing over ISDN/Internet: H.261/H.263, 128-384 Kbps
CIF            352x288       4:2:0            30P           37
Video telephony over wired/wireless modem: H.263, 20-64 Kbps
QCIF           176x144       4:2:0            30P           9.1
may eventually overtake tape-based systems, which are slower and have less storage capacity.
1.5.5 Video Quality Measure
To conduct video processing, it is necessary to define an objective measure of the difference between an original video and the processed one. This is especially important, e.g., in video coding applications, where one must measure the distortion caused by compression. Ideally, such a measure should correlate well with the perceived difference between two video sequences. Finding such a measure, however, turns out to be an extremely difficult task. Although various quality measures have been proposed, those that correlate well with visual perception are quite complicated to compute. Most video processing systems of today are designed to minimize the mean square error (MSE) between two video sequences $\psi_1$ and $\psi_2$, which is defined as
$$\mathrm{MSE} = \sigma_e^2 = \frac{1}{N} \sum_k \sum_{m,n} \left( \psi_1(m,n,k) - \psi_2(m,n,k) \right)^2, \tag{1.5.5}$$
Table 1.4. Digital Video Tape Formats
Tape Format       Video Format            Source Rate   Compressed Rate   Compression Method          Intended Application
Uncompressed formats
SMPTE D1          BT.601 4:2:2            216 Mbps      N/A               N/A                         Professional
SMPTE D2          BT.601 composite        114 Mbps      N/A               N/A                         Professional
SMPTE D3          BT.601 composite        114 Mbps      N/A               N/A                         Professional/Consumer
SMPTE D5          BT.601 4:2:2 (10 bit)   270 Mbps      N/A               N/A                         Professional
Compressed formats
Digital Betacam   BT.601 4:2:2            166 Mbps      80 Mbps           Frame DCT                   Professional
Betacam SX        BT.601 4:2:2            166 Mbps      18 Mbps           MPEG2 (I and B mode only)   Consumer
DVCPRO50          BT.601 4:2:2            166 Mbps      50 Mbps           Frame/field DCT             Professional
DVCPRO25 (DV)     BT.601 4:1:1            124 Mbps      25 Mbps           Frame/field DCT             Consumer
where $N$ is the total number of pixels in either sequence. For a color video, the MSE is computed separately for each color component.
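A minimal NumPy sketch of Eq. (1.5.5) follows. It assumes a sequence is stored as an array indexed (frame, row, column), with a trailing axis for the color components of a color video; the function names are hypothetical. Converting to floating point before subtracting matters for 8-bit data, since unsigned integer subtraction would wrap around.

```python
import numpy as np

def mse(seq1: np.ndarray, seq2: np.ndarray) -> float:
    """MSE of Eq. (1.5.5): mean squared difference over all N pixels of all frames."""
    d = seq1.astype(np.float64) - seq2.astype(np.float64)  # avoid uint8 wraparound
    return float(np.mean(d ** 2))

def mse_per_component(seq1: np.ndarray, seq2: np.ndarray) -> list[float]:
    """For color video shaped (frames, rows, cols, components): one MSE per component."""
    return [mse(seq1[..., c], seq2[..., c]) for c in range(seq1.shape[-1])]

a = np.zeros((2, 4, 4, 3), dtype=np.uint8)
b = a.copy()
b[..., 0] = 2  # error of 2 in the first component
b[..., 2] = 1  # error of 1 in the third component
print(mse_per_component(a, b))  # [4.0, 0.0, 1.0]
```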
Instead of the MSE, the peak signal-to-noise ratio (PSNR) in decibels (dB) is more often used as a quality measure in video coding. The PSNR is defined as
$$\mathrm{PSNR} = 10 \log_{10} \frac{\psi_{\max}^2}{\sigma_e^2}, \tag{1.5.6}$$
where $\psi_{\max}$ is the peak (maximum) intensity value of the video signal. For the most common 8 bit/color video, $\psi_{\max} = 255$. Note that for a fixed peak value, the PSNR is completely determined by the MSE. The PSNR is more commonly used than the MSE because people tend to associate the quality of an image with a certain range of PSNR. As a rule of thumb, for the luminance component, a PSNR over 40 dB typically indicates an excellent image (i.e., very close to the original), between 30 and 40 dB usually means a good image (i.e., the distortion is visible but
It is worth noting that to compute the PSNR between two sequences, it is incorrect to calculate the PSNR between every two corresponding frames and then take the average of the PSNR values obtained over individual frames. Rather, one should compute the MSE between corresponding frames, average the resulting MSE values over all frames, and finally convert the MSE value to PSNR.
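The recommended procedure can be sketched as follows, assuming 8-bit data ($\psi_{\max} = 255$) and sequences stored as (frame, row, column) arrays; `sequence_psnr` and `mean_of_frame_psnrs` are hypothetical names. The second function implements the shortcut the text warns against, for comparison only.

```python
import numpy as np

def psnr_from_mse(mse, peak: float = 255.0):
    """Eq. (1.5.6): PSNR in dB from an MSE value (scalar or array)."""
    return 10.0 * np.log10(peak ** 2 / mse)

def frame_mses(seq1: np.ndarray, seq2: np.ndarray) -> np.ndarray:
    """MSE of each pair of corresponding frames."""
    d = seq1.astype(np.float64) - seq2.astype(np.float64)
    return np.mean(d ** 2, axis=(1, 2))

def sequence_psnr(seq1, seq2, peak: float = 255.0) -> float:
    """Recommended: average the per-frame MSEs first, then convert to PSNR."""
    return float(psnr_from_mse(frame_mses(seq1, seq2).mean(), peak))

def mean_of_frame_psnrs(seq1, seq2, peak: float = 255.0) -> float:
    """Shortcut the text warns against: per-frame PSNRs averaged in dB."""
    return float(psnr_from_mse(frame_mses(seq1, seq2), peak).mean())

# Two frames: one with an error of 1 at every pixel, one with an error of 10.
orig = np.zeros((2, 8, 8))
coded = orig.copy()
coded[0] += 1.0
coded[1] += 10.0
print(f"{sequence_psnr(orig, coded):.2f} dB")        # 31.10 dB
print(f"{mean_of_frame_psnrs(orig, coded):.2f} dB")  # 38.13 dB
```

Because the logarithm is concave, the mean of per-frame PSNRs is never lower than the PSNR of the mean MSE, so the shortcut can only overstate quality. It also breaks down when any frame is reproduced exactly, since that frame's PSNR is infinite.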
A measure that is sometimes used in place of the MSE, mainly for reduced computation, is the mean absolute difference (MAD). This is defined as
$$\mathrm{MAD} = \frac{1}{N} \sum_k \sum_{m,n} \left| \psi_1(m,n,k) - \psi_2(m,n,k) \right|. \tag{1.5.7}$$
For example, in motion estimation, the MAD is usually used to find the best matching block in another frame for a given block in the current frame.
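As an illustration of the motion-estimation use just mentioned, the sketch below performs an exhaustive (full) search over a small window and keeps the displacement with the lowest MAD. The block size, the search range, the convention that (dy, dx) is the offset from the current block to its match in the reference frame, and the function names are assumptions chosen for the example, not prescriptions from the text. Candidates that fall outside the reference frame are simply skipped.

```python
import numpy as np

def mad(block1: np.ndarray, block2: np.ndarray) -> float:
    """MAD of Eq. (1.5.7) restricted to one pair of blocks."""
    return float(np.mean(np.abs(block1.astype(np.float64) - block2.astype(np.float64))))

def full_search(cur: np.ndarray, ref: np.ndarray, top: int, left: int,
                bsize: int = 8, srange: int = 4) -> tuple[float, int, int]:
    """Return (best MAD, dy, dx) for the bsize x bsize block of cur at (top, left)."""
    block = cur[top:top + bsize, left:left + bsize]
    best = None
    for dy in range(-srange, srange + 1):
        for dx in range(-srange, srange + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            cost = mad(block, ref[y:y + bsize, x:x + bsize])
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(32, 32)).astype(np.uint8)
cur = np.roll(ref, shift=(2, -3), axis=(0, 1))  # cur[y, x] = ref[y - 2, x + 3]
print(full_search(cur, ref, 12, 12))  # (0.0, -2, 3)
```

Using the MAD instead of the MSE avoids one multiplication per pixel for each of the $(2 \cdot 4 + 1)^2 = 81$ candidate positions here, which is the computational saving the text refers to.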
It is well known that the MSE or PSNR does not correlate very well with the visual distortion between two images. But these measures have been used almost exclusively as objective distortion measures in image/video coding, motion-compensated prediction, and image restoration, partly because of their mathematical tractability and partly because of the lack of better alternatives. Designing objective distortion measures that are easy to compute and yet correlate well with visual distortion is still an open research issue. In this book, we will mostly use MSE or PSNR as the distortion measure.
1.6 Summary
Color Generation, Perception, and Specification (Sec. 1.1)
The color of a light depends on its spectral content. Any color can be created by mixing three primary colors. The most common primary set includes red, green, and blue colors.
The human eye perceives color by having receptors (cones) in the retina that are tuned to red, green, and blue wavelengths. The color sensation can be described by three attributes: luminance (i.e., brightness), hue (color tone), and saturation (color purity). The human eye is most sensitive to luminance, then to hue, and finally to saturation.
A color can be specified by three numbers: either those corresponding to the contributions of the three primary colors (i.e., tristimulus values), or a luminance value and two chrominance values.
Analog Video (Sec. 1.3)
Analog videos used in broadcasting TV, video camcorder, etc., video display,
Interlaced scan is a mechanism to trade off vertical resolution for enhanced temporal resolution. But it also leads to interlacing artifacts.
Analog Color TV Systems (Sec. 1.4)
There are three analog color TV systems worldwide: NTSC, PAL, and SECAM. They all use 2:1 interlace, but differ in frame rate, line number, color coordinate, and luminance and chrominance multiplexing.
In color TV systems, the luminance and two chrominance components as well as the associated audio signal are multiplexed into a composite signal using modulation (frequency shifting) techniques. The multiplexing methods are designed so that the color TV system is downward compatible with the monochrome TV system. Furthermore, the modulation frequencies for individual components are chosen to minimize the interference among them.
Digital Video (Sec. 1.5)
BT.601 is a digital video format resulting from sampling the analog color TV signals. The sampling rate is chosen so that the horizontal sampling rate is similar to the vertical sampling rate, and so that the data rates for NTSC and PAL/SECAM systems are the same.
The chrominance components can be sampled at a lower rate than the luminance component. There are different color subsampling formats defined in BT.601.
Compression is necessary to reduce the raw data rate of a digital video, and hence the storage/transmission cost. Different video compression standards have been developed for videos intended for different applications.
1.7 Problems
1.1 Describe the mechanism by which humans perceive color.
1.2 What is the perceived color if you have a light that has approximately the same energy at frequencies corresponding to red, green, and blue, and is zero at other frequencies? What about red and green frequencies only?
1.3 What is the perceived color if you mix red, green, and blue dyes in equal proportion? What about red and green dyes only?
1.4 For the following colors in the RGB coordinate, determine their values in the
1.5 For the following colors in the digital RGB coordinate, determine their values in the YCbCr coordinate.
(a) (255,255,255); (b) (0,255,0); (c) (255,255,0); (d) (0,255,255).
1.6 In Sec. 1.5.2, we say that the maximum value of Cr corresponds to red, whereas the minimum value yields cyan. Similarly, the maximum and minimum values of Cb correspond to blue and yellow, respectively. Verify these statements using the YCbCr to RGB coordinate transformation.
1.7 In Fig. 1.4, we show the spectrum of a typical raster signal. Why is the spectrum of the video signal nearly periodic? What does the width of the harmonic lobes depend on?
1.8 What are the pros and cons of progressive vs. interlaced scans? For the same line number per frame, what is the relation between the maximum temporal frequency that a progressive raster can have and that of an interlaced raster which divides each frame into two fields? What about the relation between the maximum vertical frequencies?
1.9 In Sec. 1.4.3, we estimated the bandwidth of the NTSC signal based on its scan parameters. Following the same approach, estimate the bandwidth of the PAL and SECAM signals.
1.10 Describe the process for forming a composite color video signal. How should you select the color sub-carrier frequency and audio sub-carrier frequency?
1.11 What are the pros and cons of using component vs. composite format?
1.12 Project: Using an oscilloscope, i) draw the waveform, and ii) measure the spectrum of a composite video signal output from a TV set or a camcorder.
1.13 Project: Digitize a composite video signal using an A/D converter, and use Matlab to determine the spectrum. Also perform filtering to separate the luminance, chrominance, and audio signals.
1.8 Bibliography
[1] K. B. Benson, editor. Television Engineering Handbook. McGraw Hill, 1992. Revised by J. C. Whitaker.
[2] J. F. Blinn. NTSC: nice technology, super color. IEEE Computer Graphics and Applications Magazine, pages 17-23, Mar. 1993.
[3] R. M. Boynton. Human Color Vision. Holt, Rinehart and Winston, New York, 1979.
[5] B. Grob and C. E. Herndon. Basic Television and Video Systems. Glencoe McGraw-Hill, 6th edition, 1999.
[6] Y. Hashimoto, M. Yamamoto, and T. Asaida. Cameras and display systems. Proc. of IEEE, pages 1032-1043, July 1995.
[7] B. G. Haskell, A. Puri, and A. N. Netravali. Digital Video: An Introduction to MPEG-2. Chapman & Hall, New York, 1997.
[8] ITU-R. BT.601-5: Studio encoding parameters of digital television for standard 4:3 and wide-screen 16:9 aspect ratios, 1998. (Formerly CCIR 601.)
[9] A. N. Netravali and B. G. Haskell. Digital Pictures: Representation, Compression and Standards. Plenum Press, 2nd edition, 1995.
[10] D. H. Pritchard. US color television fundamentals. IEEE Trans. Consum. Electron., CE-23:467-478, 1977.
[11] M. Umemoto, Y. Eto, and T. Fukinuki. Digital video recording. Proc. of IEEE, pages 1044-1054, July 1995.
[12] J. Watkinson. The Art of Digital Video. Focal Press, Oxford, 2nd edition,
1994.
[13] G. Wyszecki and W. S. Stiles. Color Science. John Wiley, New York, 1967.
[14] T. Young. On the theory of light and colors. Philosophical Transactions of the
FOURIER ANALYSIS OF VIDEO SIGNALS AND PROPERTIES OF THE HUMAN VISUAL SYSTEM
Fourier analysis is an important tool for signal analysis. We assume that the reader is familiar with Fourier transforms for one- and two-dimensional (1D and 2D) spaces as well as signal processing tools using such transforms. In this chapter, we first extend these results to K dimensions (K-D), where K can be any positive integer. We then focus on their applications for video signals, which are three-dimensional (3D). We will explore the meaning of spatial and temporal frequencies, and their inter-relationship. Finally, we discuss visual sensitivity to different frequency components.
2.1 Multi-dimensional Continuous Space Signals and Systems
Most of the theorems and techniques for multi-dimensional signals and systems are direct extensions of those developed for 1D and 2D signals and systems. In this section, we introduce some important concepts and theorems for signal analysis in the K-D real space, $\mathcal{R}^K = \{[x_1, x_2, \ldots, x_K]^T \mid x_k \in \mathcal{R}, k \in \mathcal{K}\}$, where $\mathcal{R}$ is the set of real numbers, and $\mathcal{K} = \{1, 2, \ldots, K\}$. We start by defining K-D signals, common operations between K-D signals, and special K-D signals. We then define the Fourier transform representation of K-D signals. Finally, we define K-D systems and the properties of linear and shift-invariant systems. This presentation is intentionally kept brief. We also intentionally leave out discussion of the convergence conditions of various integral formulations. For a more substantial treatment of the