low cost framework A for real-time marker based 3-D human expressionmodeling Technology Journal of Applied Researchand

(1)

Availableonlineatwww.sciencedirect.com

Journal of Applied Research and Technology

www.jart.ccadet.unam.mx JournalofAppliedResearchandTechnology15(2017)61–77

Original

A low cost framework for real-time marker based 3-D human expression modeling

Alexander Woodward

^a

, Yuk Hin Chan

^b

, Rui Gong

^b

, Minh Nguyen

^b

, Trevor Gee

^b

, Patrice Delmas

^b

, Georgy Gimel’farb

^b

, Jorge Alberto Marquez Flores

^c^,∗

aDepartmentofGeneralSystemsSciences,TheGraduateSchoolofArtsandSciences,TheUniversityofTokyo,Japan

bDepartmentofComputerScience,TheUniversityofAuckland,NewZealand

cCentrodeCienciasAplicadasyDesarrolloTecnológico,UniversidadNacionalAutónomadeMéxico,CircuitoExteriorS/N,CiudadUniversitariaAP70-186, C.P.04510,México,D.F,Mexico

Received7April2016;accepted18November2016 Availableonline22February2017

Abstract

Thisworkpresentsarobust,andlow-costframeworkforreal-timemarkerbased3-Dhumanexpressionmodelingusingoff-the-shelfstereo web-camerasandinexpensiveadhesivemarkersappliedtotheface.Thesystemhaslowcomputationalrequirements,runsonstandardhardware, andisportablewithminimalset-uptimeandnotraining.Itdoesnotrequireacontrolledlabenvironment(lightingorset-up)andisrobustunder varyingconditions,i.e.illumination,facialhair,orskintonevariation.Stereoweb-camerasperform3-Dmarkertrackingtoobtainheadrigidmotion andthenon-rigidmotionofexpressions.Trackedmarkersarethenmappedontoa3-Dfacemodelwithavirtualmuscleanimationsystem.Muscle inversekinematicsupdatemusclecontractionparametersbasedonmarkermotioninordertocreateavirtualcharacter’sexpressionperformance.

Theparametrizationofthemuscle-basedanimationencodesafaceperformancewithlittlebandwidth.Additionally,aradialbasisfunction mappingapproachwasusedtoeasilyremapmotioncapturedatatoanyfacemodel.Inthiswaytheautomatedcreationofapersonalized3-Dface modelandanimationsystemfrom3-Ddataiselucidated.

Theexpressivepowerofthesystemanditsabilitytorecognizenewexpressionswasevaluatedonagroupoftestsubjectswithrespecttothesix universallyrecognizedfacialexpressions.Resultsshowthattheuseofabstractmuscledefinitionreducestheeffectofpotentialnoiseinthemotion capturedataandallowstheseamlessanimationofanyvirtualanthropomorphicfacemodelwithdataacquiredthroughhumanfaceperformance.

Keywords:Facialmotioncapture;Markerbasedmotioncapture;Expressionrecognition;Lowcost;Stereovision

1. Introduction

This work presents a complete system for modeling the appearanceandexpressionsofavirtualcharacterbasedonthe facialmovementofahumansubject.Stereoweb-camerasper- formmarkerbasedmotioncapturetoobtainheadrigidmotion andthenon-rigidmotionofexpressions.Tracked3-Dpointsare thenmappedontoa3-Dfacemodelwithavirtualmuscleani- mationsystem.Muscleinversekinematics(IK)updatesmuscle

∗Correspondingauthor.

E-mailaddress:[email protected](J.A.MarquezFlores).

PeerReviewundertheresponsibilityofUniversidadNacionalAutónomade México.

contraction parametersbased onmarker motion tocreate the character’sexpressionperformance.

Adesigngoalwastoallowany 3-Dhumanfacemodelto be used withany setof facemotion capture data.Thisgives theabilitytoretrofitexistingmodels.Italsoallowsthesystem todriveawidevariety offacemodels thatdo nothavetobe humaninnature. To dothis, mapping of markersinto anew facespacewasperformedwithradialbasisfunctions(RBFs), wheretherequiredcorrespondencesbetweenmarkersandver- ticesofthemeshwerepredefined.Thisproceduretakesonlya fewminutesandneedonlybedoneonceforacertainmarkerset andfacemodel,sincethemarkertemplate,markercorrespon- dences,andRBFmappingcoefficientscanbesavedforfuture use.

http://dx.doi.org/10.1016/j.jart.2017.01.002

1665-6423/©2017UniversidadNacionalAutónomadeMéxico,CentrodeCienciasAplicadasyDesarrolloTecnológico.Thisisanopenaccessarticleunderthe CCBY-NC-NDlicense(http://creativecommons.org/licenses/by-nc-nd/4.0/).

(2)

Theadvantagesofthesystemaremultipleandarewithexist- ingsystems,describedinSection2andcomparedinSection8.

Overall,oursystemislow-costasonlystandardhardwareand camerasarerequired,withnoneedforspeciallighting,making thesystemavailabletoawiderangeofusersandsituations.Ithas alowset-uptimewithnoneedforlongandrepetitivetraining whenusersarechanged.The systemisrobustunderdiffering conditions,i.e.illumination,facialhair,skintone.Despiteusing off-the-shelfself-mademarkers,itgenerateslessnoiseintracked pointsandismorereliablethanmanymarkerlessapproaches includingthosetriedinSection8.

Ouranimationmodelusesabstractmuscledefinition,there- foreitprovidesausefulconstraintonpossiblefaceexpressions andthiscanhelpdealwithnoiseinthe motion capturedata.

Also,theabstractmuscledefinitionallowsmappingandanima- tionbetweendifferentfacemodels,offeringastraightforward waytoanimateawidenumberof 3-Dmodels withthe same animationdata-e.g.capturingahumanfaceperformanceand applyingittoavirtualanthropomorphicanimalface.

Wealsocombineareducedcharactercreationtimebyusing virtualmuscles,anovelwayofcombiningmarkertrackingwith afaceanimationsystemusingmuscleinversekinematics(IK) anditslowcosthardwarerequirements,capableofrunningon standardhardware.OurframeworkrequiresonlyastandardPC withweb-camerasandcoloredmarkers(self-adhesivelabels), availablefromstationerysuppliers.Thisfulfilsamajordesign goaltomakethesystemsuitableforend-userenvironments.As anextension, providedminimalchanges,aMicrosoft Kinetic couldbeusedtoextractdepthandmarkerpositionsthatcould bemappedtoourcurrentsetupinsteadofstereocameras.

Marker based motion capture was chosen for its ease of motiontracking,dealingwithnoisyenvironmentsandproviding lowcomputationalcostsandasimplealgorithmicformulation overmarkerlessbasedapproaches.Apairofstereocameraswas chosenoverasinglecameraapproachtoaccuratelyestimate3-D markerpositions.Inversekinematicsiswidelyusedinrobotics as atool for animating articulated figures andto reduce the amountofdataneededtospecifyananimationframe(Welman, 1993).WefocusontheJacobiantransposeapproachtoinverse kinematics,Baxter(2000),foritscomputationaltractabilityand easeofimplementation.Thisresearchaimstocreateacomplete systemfor expressionmodeling,toinvestigate theexpressive powerofaminimalsetofmarkers,andtopursueananimation approachthatismodelbasedinsteadofimagebased.

Itishypothesizedthat givenasuitablycomplex animation systemaminimalnumberofinputmarkerpositionsareneededto createrealisticexpressions.Thisassumesthatnoisehasminimal influenceonaccuracyandcanbesubsumedbytheanimation system.ThisworkparallelsChoe,Lee,andKo(2001),however theirapproachrequiredfaceandmotiondatafromthesametest subject.Inoursystemradialbasisfunctions(RBFs)wereused toalignandpairmarkerdatawithcorrespondingpointsonan arbitrary3-Dhumanfacemodel.Thisdecouplesthesubject’s faceshapefromtheshapeofthe3-Dmodel.Inadditionitwas foundthatthe3D model’sshapedifferednoticeablyfromthe actualsubject’s marker locations. Therefore using RBFswas importantforobtainingvisuallyrealisticresults.

Thisarticleisstructuredas follows:firstly,relatedworkis presented inSection2followedbyadesignoverviewinSec- tion3,listingthecorecomponentsofthesystem.The3Dface animationsystemisdescribedinSection4.Thenadescription ofthemarkertrackingprocessisdescribedinSection5.Marker mapping betweentrackedpointsanda3Dfacemodelisthen describedinSection6.Thisincludesanovelwayofcreatinga personalizedfacemodelfrom3Dreconstructiondata.Anovel methodtodrivefaceanimationthroughageneralmuscleinverse kinematicsframeworkisthengiveninSection7.Aselectionof results andanalysis ofsystemperformanceexplorethe effec- tiveness of the generatedexpressionsin Section9. Then, the systemsabilitytorecognizeexpressionsistestedinSection10.

Finally,theconclusionandfutureworkarepresented.

2. Relatedwork

Thereareanumberoffacialexpressioncapturetechniques thatcanbegroupedintoeithermarkerormarkerlessapproaches.

Marker basedapproacheshavebeen themainstayof industry applications, duetotheir reliabilityintrackingandlowcom- putational cost.On the other hand,marker based approaches are algorithmicinnatureandofferthepossibilityoffreeinga performerfrom requiringalengthysetupprocedure.Because of this, a greatdeal of researchhasbeen placed ondevelop- ingthem.Furthermore,(facial)motioncapture’simportancefor animationhasresultedinitbecominganintegralcomponentof commercial3-DmodelingpackagessuchasMaya(2013)and 3Ds(2013).

Expression retargeting refers to techniques that take the expressive performanceof onesubject andretarget it ontoa 3-Dmodelof differentproportions.Thisisusefulforprojects thatusefantasticalcreaturesasavatarsthathaveanthropomor- phicfacialexpressions.Acommonapproachistouseasetof expressionbaseshapes,oftencalled“blend-shapes”,whichare blendedtogetherandselectedbasedonretargetedmotiondata, e.g. Liu, Ma,Chang, Wang, andDebevec(2011) andWeise, Bouaziz,Li,andPauly(2011a).Asanotherexample,thework of Stoiber,Seguier,andBreton(2010)usedadatabaseof2-D facemotiontocreateasetof“bases”-ourapproachiscompu- tationallysimplerandmoreaccessible,butour3-Dexpression data couldbe used insuch aframeworktogive moredepth information.

Famous examples of marker based motion capture canbe foundinthemoviesPolarExpress(thefacialexpressioncapture of Tom Hanks’character), andinAvatar, whereanumberof painted markerswere trackedby miniaturecameras mounted ontotheactors.Interestingly,thefacialanimationforthepopular character Gollumof the Lordof the Ringstrilogy, wasartist created andonlyinspired by the facial performance of actor AndySerkis(Gollum,2013).

OneexampleofamarkerbasedapproachistheLightStage, used insuch moviesas Spiderman2. The basicapproach,as describedin(Hawkinsetal.,2004),used30strobelights,placed inamoveablesemicirculararc,whichwereusedtocapturethe reflectancepropertiesofanactor’sface.Additionally,3Ddataof theactor’sfaceshapewasrecordedunderdifferentexpressions.

(3)

Markermotioncapturewas thenused todriveaperformance whichblendsandwarpsthedifferentexpressionstogether.

The Playable Universal Capture system, described in Borshukov,Montgomery,andWerner(2006),presentsanother technique for facial animation.A high-resolution scan of an actor’sfacewastaken,whichwasthenmappedandanimated using separatelyrecorded marker based motion capture data.

Facetexturewasacquiredasavideostreamandmappedtothe 3-Dmodel, synchronizedtothemotion captureperformance.

Anumber of facial expression clips can be smoothly cycled between,butoverallthisisnotanapproachwhosemotiondata canbeeasilyretargetedtootherfacemodels.

Afewexamplesofsuccessful markerlessfacemotion cap- tureapproachesareMova(2004),amarkerlessfacialexpression capturesolutionthat requiresthe application ofspecial phos- phorescent make-up. The work of Chai, Xiao, and Hodgins (2003), where a database of motion capture was correlated withtracked facefeature pointstocreate newperformances.

Andlastly,thewellknownActiveAppearanceModel(AAM) (Cootes,Edwards,&Taylor,1998;Xiao,Baker,Matthews,&

Kanade,2004)approach.Here,textureandshapevariationwere modeledthroughpriorlearningonadatabase.Expressiontrack- ingismade possibleifsets of expressionsare alsoannotated inthedatabase.However,thecomputationalcomplexityofthe approach,alongwiththetrainingstage,aremuchhigherthanthe approachpresentedhere.Also,theproportionsofunseenfaces mustsuccessfullylie withinthe expressionspaceconstructed fromthedatabaseforthesystemtoworkproperly-oursystem isnotrestrictedinthissense.

The MirrorMoCap system by Lin and Ouhyoung (Lin &

Ouhyoung,2005)usesmorethan300fluorescentmarkersanda mirrorsystemtoaccuratelytracktheshapeoftheface.Theuse ofthemirrorsystemallowsawiderangleofthefacetobecap- turedwithasinglecamera.Thesystemoperatesatsub-video framerates, and requires specialised illumination andmirror equipmentsetup.

Sifakis,Neverov,andFedkiw(2005)presentedafacialani- mationmethod using inverse kinematics similarto our work here.Howevertheirsystemdoesnotdealwithreal-timeexpres- sioninformation from cameras.Their system also requires a muchhigher numberof datapointsonthe facemodelandis designedforusewithprofessionalmotioncapturedata.

The Faceshift system by Weise, Bouaziz, Li, and Pauly (2011b)usesadepthandtexturesensor(e.g.Kinect)toenable facial expressioncontrol of a digital avatarin real-time.The systemisnon-intrusive,requiringnomarkerstobeplacedon thesubject, butrequirestraining tomodelasubject’s expressions.Thetrainingstepadaptsauser’sexpressionstoageneric blendshapemodelandisperformedoff-linepriortorecording thedesiredperformance.Aftertraining,thesystemiscapable ofreconstructingfinedetailsofanexpression,suchasthelips andeyebrowshapeandmovement.Thesystemisavailableasa commercialproduct¹withasubscriptionfeeof$1500USDper year.

1 www.faceshift.com

Weobtainedatrialversionoftheproducttoevaluateitseffec- tiveness. Training requires theuser toperform 21 predefined facialexpressions,namely,neutral,open mouth,smile, brows down,browsup,sneer,jawleft,jawright,jawfront,mouthleft, mouthright,dimple, chinraised, kiss,lipfunnel,frown,‘M’, teethshown,puff,chew,lipdown.Theuserisrequiredtorotate the headtotheleft andright duringthe expressionrecording toobtaindataforthesideofthefacemodel.Duringourtest- ing,wefoundthatanumberofexpressionssuchas‘lipdown’

weredifficulttoperformastheyrequireunnaturalmusclemove- ments. Howeverthedevelopersmention thesystemwillhave morenaturalexpressionstotraininfutureversions.

Overallthe training tookbetween15and30min. Wealso foundthatitwassometimeshardtoproperlyreplicatesomeof the21predefinedfacialexpressions.Thiscouldleadtobadly trained and represented 3-D face, with far from satisfactory results.Withaproperlytrainedsystem,wereplicatedthesixuni- versalexpressions.WeobservedthattheFaceshiftsystemisin generalfluid,responsiveandcapabletotracksomeexpressions.

Theuser-specific3-Drepresentationdidbearsomeresemblance totheoriginalandthereplicatedexpressionsdidsomehowrelate totheoriginalexpressions.Faceshiftstillhadtroublewithcor- rectlyidentifyingfacialhairandfinemovementofmusclesand requirestrainingforeachnewuserwhichcontradictsthegoal ofoursystemi.e.auniversaluser-independentsystemrequiring notraining.

Choeetal.(2001)modelsthefaceusing13differentfacial musclesalongwithaskindeformationmodelusingafiniteele- ment model (FEM) for the skin surface. Facial performance captureisaccomplished bythreecalibratedandsynchronized camerasobserving24retro-reflectivemarkersgluedtotheface.

The steepestdescentmethodisused tofindmuscle actuation parametersthatbestfitthemarkerpositions.Similartoourwork, expressionretargetingisdoneusingthemuscleparametersfrom observedexpressionstoanimateanotherface.

Acommonfeatureamongstmarkerbasedsystemsisahigh hardwarerequirementandlengthysetupprocedure.Weinstead focusonwhatcanbedonewithstereoweb-camerasandcheap andeasilyappliedadhesivestickers.In addition,oncemotion capturedataisacquireditgenerallyneedstobecleanedupor manipulatedbeforeitisinauseablestate.Welookatremoving thisdifficultybyusinganRBFretargetingapproach,projecting motionvectorsontovirtualmusclevectorstoconstrainmove- mentstovalidones.Byautomatingourapproachwealsowantto removethehighlearningcurveoftheprofessional3-Dmodeling softwareenvironmentsforfacialexpressioncreation.

Regarding the applications of this technology, the investi- gation ofmoodandemotionsdatesback tothe seminalwork of Ekman andFriesen (1971). With the emergence of social interfacesandvirtualworldstherehasbeenarenewedinterest inwaystodirectlyinputormodifycontentbasedonausers’

instantaneousmoodoremotion(Kaliouby&Robinson,2004).

TheexplorationofemotionsandmoodsinHCIhasledtoappli- cationssuchasthepopularMobiMood,whereusersconveytheir moodstatusviashorttypedmessages.Suchapplicationshavea strongappealonmobilephonesandincreasingcomputingpower and3-Ddisplaycapabilities(Tegra,2013)shouldallowamore

(4)

direct andautomatedway toconveymood andemotions.As such,ourpresentedworkalignswellwiththecurrenttrendsand needswithinthemulti-mediadomainforthepursuitofthebet- terunderstanding,generationanddetectionoftheinformation expressedbyaface(Rachurietal.,2010).

This research expands and improves on that presented in Woodward,Delmas,Gimel’farb,andMárquez(2007).

3. Systemdesign

The system comprises three modules: facial animation, markertracking,andmarkertomodelassignmentwhichlinks theothertwomodules.Eachmoduleisdescribedinbriefhere andthenelaborateduponintheirrelevantsections.

1 Facialanimation:Thismodulecomprisesthe3-Dfacemodel withapredefinedvirtualmuscleanimationsystem,described inSection4.Theabstractmuscleapproachmeansthataface model does not need to be designed withknowledge of a certainmarkerconfiguration,makingiteasytochangemarker setupsontheface.Apersonalizedfacemodelwascreatedto matcheach test subject using theRBF mapping approach, describedinSection6.2.

2 MarkerTracking:Themarkertrackingmodulecapturesstereo images usingtwo off-the-shelf Fire-iweb-camerasrunning overthe 1394bus.Itprovides synchronizedframeacquisi- tion at 30 fps and camera calibration routines andmarker templatecreationfortracking.Markerprojectionsareassoci- atedbetweenstereoimageframesandcanbetriangulatedto recover3-Dpositions.Trackingcanthenberecordedtofile, each frame being time-stamped. The stereo system makes pointtrackingmorerobustcomparedtoasinglecamerasys- tem sincewe areabletotriangulatepointsusing thestereo geometry.

3 Marker tomodel assignment: Thismodule creates a map- pingbetween3Dmarkersandthemodelforcalculatingthe animation.Doingsoallowsanymarkerconfigurationandtest subjecttobeusedwithanymeshthatisfittedwiththefaceani- mationmodule.Oncemappingiscompleted,faceanimation canbedrivenbythemarkersthroughinversekinematics.

SoftwarewaswritteninC++andusedMicrosoftMFCforthe GUIandOpenGLfor3-Dvisualization.Thesystemoperateson aWindowsbasedplatformusingamediumgradePC.

AstereoconfigurationoftwoUnibrainFire-iweb-cameras running over a IEEE-1394a (Firewire) interface for synchro- nizedvideostreamswasused.TheFire-icameraswerechosen as theyarecapable offrame ratesup to30fps andallowfor colorbasedmarkertracking.IEEE1394waschosenoverUSB 2.0asacamerainterfaceasithasbettersupportforindustrial camerasandeasyaccessandcontrolofIIDC-1394compliant camerasthrough the CMU1394camera driverandAPI.The Fire-icameraoperateat640×480pixelresolutionandhavea 4.3mmfocallength.

Cameraswereplacedverticallyversussidebysidetoallow for themaximumfieldofviewof markerandtoavoidtrack- ing difficulties for markers placed on the sides of the face.

Tsai’sgeometric cameracalibration(Tsai, 1987)was usedto estimateintrinsicandextrinsiccameraparameters.Oncemark- ers wereidentifiedinthestereoimagepair,knowledgeof the systemgeometrywasnecessaryforrecovering3-Dmarkerloca- tionsthroughtriangulation.Acalibrationboxwith63circular markingswasused.Experimentsinreconstructingcalibration markingswithknowntruelocationshaveshownthattheerrors are stablebetween0.4mand1mfromtheworldorigin,with ameanerrorofapproximately1mmandstandarddeviationof 0.48mm.However,inpracticethedelineationandlocalization ofcoloredmarkersismorepronetonoisethanthecalibrationtar- get–noticeableinthejitterseeninrecoveredmarkerpositions.

Thetestsubjectwasplacedapproximately1mfromthecameras.

4. 3-Dfacemodelandexpressionanimation

Afacialanimationsystemusingavirtualmuscleapproach, first implemented and described in Woodward and Delmas (2004),wasextendedforthisresearch.Twenty-onevirtualmus- cles wereplacedinanatomicallybasedpositionswithina3D humanfacemodelas detailedinTable1 andFig.1.Muscles aredefinedseparatelyfromthefacemodel,providingflexibility in designand givean abstract description of a facialexpres- sion.Movableeyeswerecreatedsothefacecouldalteritsgaze.

In addition,jaw movementwas modeled by rotating vertices aroundajawaxis.Theanimationapproachwasbasedonwork by Parke, Water, Terzopolous andLee;their GeoFace model servedasabasemesh(Lee,Terzopoulos,&Waters,1995;Parke

&Waters,1996;Terzopoulos&Waters,1990;Waters,1987).

The face mesh can be deformed using a computationally efficientgeometricapproachwhereverticesaremovedindepen- dentlybythevirtualmuscles,asshowninFig.2.Muscleswere positioned usingan interactiveapplication developedinC++.

Expressions are represented as a vector of muscle activation parametersandfaceanimationperformancescanbegenerated throughasimplescriptingsystemthatcontrolstheseparameters overtime.

4.1. Musclemotiondeﬁnitions

Alinearmuscleandanellipsoidmusclewereimplemented toapplydirectdeformationofthemesh.Facemodelverticesthat

Table1

Animationsystemabstractmusclelist.

Number Musclename Muscletype Location

1-2 Zygomaticmajor Linear L-R

3-4 Frontalismajor Linear L-R

5-6 Frontalissecondary Linear L-R

7-8 Orbicularisoculi Ellipsoid L-R

9 Orbicularisoris Ellipsoid Mouth

10-11 Frontalisouter Linear L-R

12-13 Frontalisinner Linear L-R

14-15 Lateralcorugator Linear L-R

16-17 Innerlabiinasi Linear L-R

18-19 Labiinasi Linear L-R

20-21 Angulardepressor Linear L-R

(5)

Fig.1.Facialanimationsystemmuscleplacement.Musclenamesaregivenin Table1.

Fig.2.Thefacemeshbefore,left,andafter,right,alinearmusclecontraction.

areinfluencedbyaparticularmusclearedeterminedatsystem startup.

Linear muscle: this has an origin, v2, specifying muscle attachment to the skull, and an anchor point v1, specifying theinsertionintotheskintissue.Consequently,m=v1−v2is known asthe muscle vector. Verticesinside aradialdistance fromv2 andinsideanangularrangetothemuscle vectorare affectedbythemuscle.

Avertexpofthefacemesh,undertheinfluenceofaparticular linearmuscle, movesto anewlocation, p,by the following equation:

p=p+akr q

|q| (1)

where k is the contraction increment, q=p−v2 and a=1−cos(α)/cos(ω)isanangularscalingfactorthat reduces movementwhentheanglebetweenmandq,α,increases.The rangeofangularinfluenceofthemuscleisgivenbyω.Theradial

displacementscalingfactor,r,affects p basedonitsdistance fromthemuscleoriginv2:

r=

⎧⎪

⎪⎪

⎨

⎪⎪

⎪⎩ cos

π 2

D−R_s (Rf−R_s)

if Rs <=D<Rf

cos

π 2(1− D

R_s)

if D<Rs

(2)

whereD=|q|,Rf isthelimitofamuscle’sinfluenceandRs is thedistancefromwhichamuscle’sinfluencestartstodecrease tozeroatthemuscleoriginv2.

Ellipsoidmuscle:drawsverticestowardsthemusclecenter.It hasaradialrangeandnoangularrange.Becauseofitsfunction, this muscle is also known as asphincter muscle and can be defined byan ellipsoid, having a majorandtwo minor axes, theirlengthsbeinglx,ly,lz respectively.Avertexp=(x,y,z), undertheinfluenceofaparticularellipsoidmuscle,movesina similarmannerasifinfluencedbyalinearmuscle,butwithout theangularterm,a:

p=p+kr v2−p

|v2−p| (3)

4.2. Jawmotion

Jaw movementshifts facevertices a greater distance than the abstract muscles. In this situation, it ispossible that ver- ticesinfluencedbyjawmovementcanmoveoutofthezoneof influenceformusclesthatshouldalsoinfluencethesamearea (i.e.theorbicularisoris,angulardepressors,zygomaticmajors).

Consequently,thejawmodelwasimprovedbydeterminingifa vertexhadmovedoutsidetherangeofinfluenceofthesemus- cles,thenthevertexisunaltereduntil itreturns withinrange.

Thischeckneedonlybeperformedonverticesunderthecon- fluenceofthejawandatleastonemuscleanddoesnotweigh onthecomputationtime.

5. Markertracking

Marker tracking begins by localizing candidate markers withineachstereoimage.Forthisexperiment,13coloredself- adhesivebluemarkerswereusedandtheirimagelocationswere determinedwithacolorpredicateoperatingintheYUVcolor space(Barton&Delmas,2002).Thisnumberwaschosenasa tradeoffbetweencomputationalcomplexity,capturingenough non-rigidexpressionmotion,andsetuptime.Futureworkwill lookintoextendingthenumberofmarkersanditsimpactonthe functionalityofthemusclebasedsystem.

Asetofadhesivecolorstickerswereusedaslowcostmark- ers. They were chosen for their easy color detection, speed of applicationtoafacewithoutdamagetotheskin,andtheir low cost, being available from most stationery stores. From ourexperiments,commerciallyavailablereflectivemarkersthat areappliedtothefacewithspiritgumaretrickytoattachand remove, thusaddinganextraelement totestsubject prepara- tion.Theyarealsoexpensivetopurchasebecauseoftheirniche market.

(6)

Fig.3.Thefaceanimationsystemshowing(a)muscleplacements,(b)textureonly,and(c)theunderlyingmesh.

Fig.4.Markerplacementontheface,fortherealandvirtualcases.Whitepoints onthevirtualmodelindicateanchormarkers:theforehead,totheoutersideof theleftandrighteyes,andbelowthenose.

Eachtestsubjectpreparedthemselvesbyapplyingmarkersto theirfacewhilelookinginamirror.Tohelpguidethetestsubject amannequinwaspreparedwithanexamplesetofmarkersplaced onitsface.

Markerswereplacedinfaciallocationsthat bestrepresent facemotion.ThecurrentmarkertestsetupisdisplayedinFig.4.

Algorithm1. Markertrackingalgorithm

1: whileStereoimagepairsfromavideostreamdo

2: forSubregionsofimagesifmarkertemplateinitialised,elsefor entireimagedo

3: Colorsegmentationusingacolorpredicate,medianfilter,and connectedcomponentsanalysis.

4: endfor

5: Mergingnearbycomponentsofsimilarcolor.

6: Computingcentroidsofconnectedcomponentsaspotential markerlocations.

7: ifMarkertemplateinitialisedthen

8: Assignnewmarkerpositions;ifmarkertemporarilylostretain previousposition.

9: Triangulatemarkerpositionsbasedonmarkertemplate correspondences.

10: endif

11: Optionally:

12: ifSuitablecandidatemarkersfoundthen

13: Initialisemarkertemplatebasedontheirlocations.

14: endif

15: endwhile

Amarkertemplatewasconstructedandtrackedthroughthe videostream,theprocedureisgiveninAlgorithm1.Themarker templatedefinesthenumberofmarkerstobetrackedwithinthe

imagealongwiththeirpositionandvelocity.Thetemplatepro- videsrobustnessagainstanyextraerroneouslocalizationsfrom being considered. Afterdefinition,onlythe areasaroundcur- rentmarkerlocationsneedtobeprocessedandsearchedin,thus reducing computation time. Marker correspondence between stereoframescanbefoundwithrespecttothescanorderand epipolargeometryconditionofthestereocameras.Thesecorre- spondencescanthenbetriangulatedtorecoverthe3-Dlocations.

Asubsetofthetemplatemarkersweredesignatedasanchor points (shown as white points in Fig. 4). These anchors are located wherea good estimate of the head rigid motion can betaken.Theyshouldnotbeaffectedbyfacialexpressionsas theyareusedtodefinealocalheadcoordinateframe.Thecon- structionofalocalcoordinateframeisdescribedinSection5.1.

Amarkerisconsideredtohavedisappeared ifanewposi- tionthatisbelowacertaindistancethresholdcannotbefound – inthiscasetheprevious markerposition iscarried through intothecurrentframe.Inpracticethisworksadequatelyforthe temporarylossofmarkers,e.g.inthecaseofanobjectsuchas thehandbrieflyoccludingthefacewhenthefaceisnotmoving substantially.

Thepresenceofnoisewithinthesystemcausesjitterinthe triangulated3Dmarkerpositions.Noisesourcesincludeoptical distortions,imaging noiseandinaccuraciesinthecalibration.

Pastexperimentsshowedthatasmallamountofpositionalaver- aginghalvedpositionalvariance(Woodward&Delmas,2005a).

Therefore,anempiricallydetermined,temporalmeanfiltering overthreeframeswasappliedtostabilize3-Dmarkerpositions.

Thisgaveanoticeableimprovementinmarkerstability,reducing thestandarddeviation inmarker positionestimationfrom 4.5mmto1.6mmat1.2m.

Eachstereoframewastime-stampedandtherelevantdatais savedalongwiththeframe’striangulatedmarkerpointsasthe systemruns inreal-time.Alternatively,the systemallowsfor stereoimagesequencestobesavedandprocessedoff-line.

5.1. Headlocalcoordinateframeandreferencemarker positions

Inordertoestimatethenon-rigidmotionoffacialexpressions rigidheadmotionmustbeaccountedfor.Thiswasachievedby constructingalocalcoordinatesystembasedonthefouranchor markers,showninFig.4.Thecenterpointofthefourmarkers

(7)

actsasthecenterofgravity,ortranslationalcomponent.Thefour markerpointsa1,a2,a3,a4arearrangedsothatthevectorfrom a1toa2isorthogonaltothevectorformedfroma3toa4.These canbeusedtocreatethelocalcoordinateframeR=(r1r2r3)as follows:

r1= (a2−a4)

|a2−a4|, r3=r1×(a3−a1)

|a3−a1|, r2=r1×r3 (4) Thecalculationensuresthatthecoordinateframeisorthogonal.

Rigidmotionwasremovedfromeachframebysubtractingthe centerof gravity from the triangulated marker positions and multiplyingtheresultsbyR^T.

Asmallamountofmovementwasapparentforsomeofthe anchorpoints,particularlytheoneplacedabovethelip.Thiswas unavoidableforthisapproachbutwasdealtwithbyknowingthat aminimumofthreepointscanbeusedtocreateaco-ordinate frame for rigid motion. Also, for each test subject a neutral expressionwas recorded to provide default referencemarker positions(aneutralstate),forallmarkers,withinthelocalhead frame.Thisreferenceframecanbeacquiredduringthefirstfew secondsofmarkermotioncapture.Todosothetestsubjectis askedtokeeptheirfacestillandwithoutfacialmovement(head movementpermissible).Referencemarkerpositionsallowfor thecalculationofdivergencevectorsfromtheneutralstate,and arealsousedintheRBFmappingprocedureinSection6.These vectorsareusedtocreatethevirtualperformance.

6. Markertomodelassignmentandpersonalized3-D facemodelcreation

ARBFmappingwasmadebetween3-Dmarkerpointsand correspondingpointsonafacemeshbasedontheneutralref- erence marker frame andthe neutral face mesh. The marker performancecanthenbetranslatedintothevirtualmeshlocal coordinateframebygivingthe3-DmarkerpositionstotheRBF mapper(asinEquation(6)).Theresultantmappedpositionscan thenbemeasuredasdivergencevectorsofmarkerverticesfrom themeshneutralstate.

6.1. Radialbasisfunctionsformapping

RadialBasisFunction(RBF)interpolantspossessdesirable smooth properties (Farfield Technology Ltd, 2011) and have successfullymappedgenericfacemodelstosubjectspecificface data(Woodward&Delmas,2004,2005b).Theyareparticularly desirableasascattereddatainterpolantandhavebeenapplied inareassuchas approximatingnoisy data,surfacesmoothing andreconstruction(Carretal.,2001;FarfieldTechnologyLtd, 2011),imagemorphing, andevenfor facialanimation (Noh, Fidaleo,&Neumann,2000).

Thepresentedfacemappingapproachtakesauserspecified subsetofthegenericfacemodel’sverticesandassignsvaluesat theselocationsthatrepresentdisplacementsforasinglespatial coordinatebetweenthe genericmodel andtargetfacemodel.

Thesecorrespondencesareassignedbetweenfiducialpointson eachfacesurface.Thus,themappingofallpointsneednotbe

specifiedmanually;thesetwillbemuchlessthanthenumber ofpointscontainedinthesourcedataset.

ThiscanbethoughtofasspecifyingasetofNsamplesfrom a3-Dfunction,f(x),atcoordinates,X=(x1,...,x_N),xi=(xi,yi, zi),x_i ∈ R³,whosevalues,f=(f1,...,f_N),representdisplace- mentsbetweenthetwofacemodelsalongaparticularaxis.A RBF,s(x),isthenfittedtothisdatatoapproximatef(x).Thisis performedthreetimes,onceforeach spatialaxis,tocalculate 3-Ddisplacements.

ARBFhasthegeneralform:

s(x)=p(x)+^N

i=1

a_iφ(|x−xi|) (5)

where p(x)=c1+c2x+c3y+c4z is a linear polynomial term definingaplanein3-space,aiisarealnumberweight,|x−xi| specifiestheEuclideandistancebetweenxandx_iandφisabasis function.Thebiharmonic spline,φ(r)=r,isused asthe basis functionasfunctionsofthreevariablesarebeingfitted,where ristheEuclideandistance.Thebiharmonicsplineisasmooth- inginterpolatorappropriatefor3-Dobjectssuchashumanface models(Carretal.,2001).

Forthemappingofa3-Dpoint,x,fromthegenericmodel toanequivalentpoint,v,thatfitstheproportionsoftherecon- structeddata,Equation(5)becomes:

v=s(x)=p(x)+

N i=1

aiφ(|x−xi|) (6)

Thecoefficientsofthepolynomialvectortermpandthereal valued vectorweights a_i canbedetermined usinglinearleast squares.TheRBFapproachfindsamappingthatwill depend onthequalityofcorrespondencesidentifiedbetweendatasets.

Creationofthepersonalizedfacemodeliscompletedbyenter- ingthepointsofthegenericfacemodelintoEquation(6).The outputpositionsaretheinterpolatedvaluesbestapproximating theoriginalreconstructedfacedataset.

6.2. Creatingapersonalizedmodel

Results from the associated work on 3D face reconstruc- tionbyAn,Woodward,Delmas,andChen(2005),Woodward, Delmas,andGimel’farb(2008),Woodwardetal.(2006)canbe combined withthegeneric facemodel andanimationsystem describedheretocreateapersonalized3-Dmodelabletocre- atenovelexpressions.Radialbasisfunctions(RBF)wereused for anon-rigidmappingbetweenapersonal3-Dfaceandthe genericface.RBFsrequireonlyasmallnumberofcorrespon- dencesbetweendatasetsthatdescribethedifferentproportions betweenthe twofaces. Thevertexdistributionandresolution ofthegenericfacemodelmustbepreserved,sothisinterpola- tionstrategywasappropriate.Thepersonalizedfacesfoundin Section9werecreatedinthismanner.

(8)

7. Muscleinversekinematicstodrivefaceanimation Triangulated3-Dmarkerpositionscannowbemappedand appliedtothefacemodel.Changesintheirpositionsdrivethe facemusclesthroughinversekinematics.Thissectiondescribes theprocessofcreatingacompletevirtualperformancegivena setof3Dmarkers,theirmotionovertime,andafaceanimation system.

Markerdivergencesweremeasuredfromtheneutralframe, mapped into avirtual facemodel’s localspace. Eachmarker wasmatchedwithavertexofthemesh.Uponinitializationthe musclesthatinfluenceeachmarkermustbefound.Thisissimple aseachmusclehasanareaofinfluence.Whenamarkermoves, virtualmusclesmust beupdatedtodeformthe meshtoplace thecorrespondingvertexonthemeshascloseaspossibletothe marker.Inversekinematicswasusedtoachievethis.

Inversekinematicsisthecalculationofparametersforakine- maticchaintomeetadesiredgoalpositiong,whenstartingfrom aninitialpositione.Appliedhere,thedesiredendpositionwould bea3-Dmarkerpositiontrackedbythevisionsystem.Theini- tialor currentposition isconsideredtobethelocation ofthe vertexonthemeshcorrespondingtothemarker.

In the literature the vertex eis known as the endeffector ofakinematicchain.Thekinematicchainconsistsofasetof joints:namelythesubsetofmusclesorjawaffectingacertain marker. Since only the contraction values of each muscle or therotationofthejawaredealtwith,eachjointpossessesone degreeoffreedom (DOF).Adegreeoffreedom (DOF) refers toananimatableparameterforeachvertex-markerpairwithin thevirtualface,wheretheanimatableparametersarethemus- clecontractionvaluesandjawrotation.Eachvertex-markerpair willhaveanassociatedsetofmuscleswhichaffectit,theirnum- beristhevertex’sDOF.Likewise,eachendeffector,ormarker positionhas3-DOF,describingitspositionin3-space.Collec- tivelyalloftheseparametersdescribethesystemstateandtheir change provides thevirtual characterperformance. The loca- tionsofmarkerssufficetoplanfortheanimationand3-Dmotion capturedataprovidesasimplerwaytogenerateaperformance basedwhenopposedtoforwardkinematics,whereeachmuscle contractionvaluemustbedetermined.

7.1. Solutionformulation

Let=(φ1,φ2,...,φN) ∈ R^Nrepresentthevectorofjoint parameters(themuscleDOFs)ande=(e1,e2,e3)representa vectordescribingtheendeffectorDOFs,inthiscaseavertex position.Letf bethe forwardkinematicsmapof akinematic chain.ThisfunctioncomputesendeffectorDOFsfrommuscle DOFs:

e=f() (7)

Consideringasinglefacevertex,e,anditsmarkercorrespon- dence(targetposition)g=(g1,g2,g3),theinverse kinematics problemistocompute thevectorof muscleDOFs, ,sothe

vertexofafacecoincideswiththemarkerposition;e=gandthe desiredpositionalchangeofeisd=g−e:

=f⁻¹(g) (8)

Unlike forward kinematics, solving f⁻¹ is non-trivial as f is non-linearandtheremaybemultipleornopossiblesolutions.

AniterativenumericalsolutioncanbefoundwiththeJacobian matrix J of f, i.e. the function canbe followed to aminima throughtheknowledgecontainedinJ:

J(e,)= de d =

⎡

⎢⎢

⎢⎣

∂e1

∂φ1

··· ∂e1

∂φ_N ... . .. ...

∂e_M

∂φ1 ··· ∂e_M

∂φ_N

⎤

⎥⎥

⎥⎦

(9)

where M=3 tospecify a vertex position inR³ andN is the numberof musclesaffectingavertex(which isautomatically calculatedduringinitialization).Foranincrementalstepinpose, thechangeinendeffectorposition,causedbyavariationinjoint DOFstate,canbeestimatedbyfirstorderapproximation:

d≈ de

d·=J(e,)·=J· (10) Hence,forinversekinematicsanupdatevalueiscalcu- latedbasedon:

≈J⁻¹·d (11)

whereifpossiblee≈d.Ingeneral,Eq.(11)cannotbesolved uniquely and various strategies can be employed for choos- ing .Since f isnon-linear, theJacobian approximation is validonlynearthecurrentconfigurationofe,,therefore,the approachtakenistoiterativelyupdatejointDOFsbyvaluesof

untilEquation(8)issatisfied.

Afastandsimpleiterativegradientdescentsolutionwascho- sen that uses the Jacobian transpose of marker positionwith respect tomusclecontraction(orjaw rotationarounditsaxis) (Baxter, 2000). This method is computationally inexpensive sincethereisnoinversionof theJacobianmatrixtoperform.

Italsolocalizes computationssincethe DOFscanbeupdated beforetheentireJacobianiscomputed.

7.2. TheJacobianTransposemethod

ThetransposeofJisusedinsteadoftheinverse:

=J^Te (12)

wherethestepeisgivenas:

e=β(g−e) (13)

inthe direction of thegoalmarker positiong. The scalefac- torβisusedtolimitthestepsizesincenon-smoothfunctions are involved.The value ofβ could be tunedfor each marker sincesmallerstepsizesmustbetakenwhenthenumberofinflu- encingmusclesincreasesandinturnthecomplexityofmarker

(9)

movement. Theside effectof thisis anincrease inthe num- berofiterationsrequiredforasolution.TheJacobianTranspose methodis justified in terms of the principal ofvirtual work, whichprovidesarelationshipbetweenanendeffector move- mentandaninternaljoint-byanalogythedistancetothegoal g isconsidered tobe aforcepulling e, andjoint torques are equated to changes in joint DOFs, (Baxter, 2000; Buss

&Kim, 2004).Thisanalogy isinexactandnon-linearitypre- vents the end effector reaching the goalin exactly onestep, whichiswhytheprocessmustbeiterated.Ifβissmallenough thenthemethodalwaysmovesclosertothegoalg:bytaking d=g−eandEquation(10),d≈JJ^Te=βJJ^Td,itholds that JJ^Td·d≥0,i.e.themovementofewillalwaysbeinadirection lessthan90degreeswithd(Buss,2004).

TheJacobianmatrixcanbebuiltbyloopingthroughthejoints inthekinematicchainsandevaluatingtheirDOFs.Itrelatesthe changein endeffectorposition withthechange inkinematic musclecontractionvalues.EntriesofJspecifyDOFsanddepend onthetypeofjoint–muscularcontractionisalinearjointand jawrotationisanangularjoint.

Foramuscle DOFavertexeisdrawtowardthe muscle’s origino.ThepossiblevaluesfortheDOFareboundedasmuscles haveaphysicalcontractionrange.AchangeintheDOFφgives amovementalongthisdirection:

∂e

∂φ =(o−e) (14)

A1-DOFrotationaljointmodelsjawmotion,itsentryinthe Jacobian measureshow theend effector,e, changes during a rotationaboutitsaxis:

∂e

∂φ =a×(e−r) (15)

Therateofchangeinthepositionofeisinatangentialdirection totherotationaxisa,scaledbythedistanceoftheaxistoe,r. Equations(14)and(15)formcolumnsofJ.

7.3. Pseudo-codeformuscleinversekinematics

Pseudo-codeisgiveninAlgorithm2;markerCountgivesthe numberofmarkersbeing trackedandaffectedMuscleCount(j) specifiesthenumberofmusclesaffectingaparticularmarkerj.

Algorithm2. Muscleinversekinematics

1: forallisuchthat0≤i≤iterationsdo 2: foralljsuchthat0≤j≤markerCountdo

3: forallksuchthat0≤k≤affectedMuscleCount(j)do 4: EstimateJacobiantransposeentry,J^T_k,from

Equations(14)or(15)dependingonmuscletype.Matrixis implicitasentriescanbeprocessedinorder.

5: Pickapproximatesteptotake-asinEquation(13).

6: Computechangeinmuscle(joint)DOF-asin Equation(12).

7: ApplychangetoDOF-φk=φk+φk. 8: Contractvirtualmusclewiththenewparameter

valueφktoobtainnewpositionforej.

9: endfor

10: endfor

11: endfor

Theresultinganimationcanbesavedtofileorprocessedin real-timedependingonthe processingpowerofthehardware platform.

8. Benchmarkingagainststateofthearttechniques 8.1. Markervsmarkerlessﬁducialpointstracking

Weperformedadditionalexperimentstocomparetheaccu- racyofmarker-basedtrackingwiththestateofthearttechniques in2-Dand3-Dfacefeaturepointsextraction,namely,Active Appearance Models (Edwards,Taylor,& Cootes,1998), 2-D ConstrainedLocalModels(Wang,Lucey,&Cohn,2008)and 3-D CLM (akaCLM-Z) (Baltrusaitis,Robinson,&Morency, 2012).Wefirstcreatedanewdatabaseforthecomparisonexper- iment which includes 7 subjects, each recording two similar synchronizedstereovideo sequences– onewithmarkersand anotherwithout.Eachvideoshowsthesubjectgoingthroughthe sixuniversalexpressionsofhappiness,sadness, surprise,fear, disgustandangerplusaneutralface(seeFig.5).Thelengthof eachsequencerangedfrom336frames(11.2s)to528frames (17.6 s),averaging 442 frames(14.7s)for acombined 3000 images. Thesequenceswere recorded inaquicklysetupand relativelycontrolledenvironmentwithgoodlighting,afeature- lessbackground,agoodquality3-Dcamera(FujiFilm3-DW3) andallthesubjectswereplacedinthesamelocation.Following the frameworkdeveloped inWoodwardet al.(2012)andour latestresultsinstereomatchingpublishedinGimel’farb,Gong, Nicolescu,andDelmas(2012),thestereocamerawascalibrated toremovelensdistortionandrectifytheimages.Stereomatch- ing(here1-DBeliefPropagation)wasthenappliedtoproduce depth maps.The latterswerebothbothused forCLM-Z and toobtain3-Dcoordinatesofanypointonthefaceandthe3-D coordinatesofour13markersmodel.

8.1.1. ActiveAppearanceModel

Foreachtechniqueweobtainedcodeformthewebanddid noretrainonourimages(sinceoursystemiftraining-free).We selected the code offered by T. Cootes http://personalpages.

manchester.ac.uk/staff/timothy.f.cootes/software/am toolsdoc/

index.htmlas theAAMtool-kitcomes withtraining toolthat allows manually drawing of contours on a series of facial images; andthenadaption ofthe contoursetasthe sequence evolves.Thisallowedaneffectivetrainingofthisimplementa- tiononasetof theauthors’faceimages.Theresultingmodel was abletomatchneutral facesquitewell(sinceneutral was thefirstexpressioninthesequence),butfailedeachtimeanew expressionwasmade(e.g.seeFig.6).Thegeneralimpression wasthatalotoftedioustrainingwouldberequiredtogenerate agoodmodeleachtimenewanewfaceistobeprocessed.

8.1.2. 2-DCLM

We evaluated the 2-D CLM version implemented by Yan, Xiaoguang, available at https://sites.google.com/site/

xgyanhome/home/projects/clm-implementation, which followed:Wangetal.(2008)andCristinacceandCootes(2008).

We used the model provided with the source code, thus no

(10)

Fig.5.Imagesextractedfromourcomparisondatabase:testsubject:Trevor.Fromlefttoright,showingthe7universalexpressionsofneutral,happiness,sadness, surprise,fear,disgustandanger.

Fig.6.AAMtracking:testsubject:Trevor.Fromlefttoright,showingAAMtrackingresultsforaneutral,happyandsurprisefacialexpressions.

training was needed.If notinitalizedclose tothe actualface it wasfound that that theinitial template didnot fit wellthe firstimagefacefeatures.Howeverafteracoupleofframes,the template evolved andappearedtofind the correct orientation oftheface.Thisimplementationwasslowtoadapttochanges infacialexpressionsanddidnottrackmouth movementvery well.Whileverylittleadditionsweremadetothesourcecode to run properly, manual hardcoding of the initial template locationroughlyinthevicinityofthefacefeaturesofthefirst image(achievedbychangingconstantshardcodedinthesource code)wasnecessary.Ourinitialtestsfound2-DCLMproneto errorsduetotheexistenceoffacialhair,lightingandskintone variationsas well as an open mouth throughoutimages (e.g.

seeFig.7).

BothAAMand2-DCLMfailedtocorrectlytrackfacefea- turesonalimitedsampleofimagesofourdatabaseandwere excludedfromthecomparisonwithourmarker-basedapproach leavingCLM-Zasthelaststate-of-the-arttechniquetocompare to.

8.1.3. CLM-Z

OurCLM-Zimplementationwaskindlyprovidedby².While thislibrarydidcomewithseveraldifferentimplementations,we hadtowriteourownfront-endprogramtogetthesourcecode torun through allthe images of ourdatabase and renderthe relevantpointstoadatafile.TheCLM-Zimplementationcame withapre-trainedclassifierandthusnotrainingwasrequired.

2 http://www.cl.cam.ac.uk/research/rainbow/projects/clmz/

Upon start up, the algorithm grabs the first image in the sequence toinitialize itstemplate. Thismaytakearelatively longtime,averagingupto2minforamid-rangePC.Onceini- tialized,thesubsequentupdatesandfeatureextractionthrough successive frameswasquitefast, achievingclosetoreal-time performance.

Weusedanimplementationof CLM-Zwithagenericface templateforthemarker-lesssequences.CLM-Ztrackedpoints roughlyintheregionoftheappropriatefacialfeatures.CLM-Z copedwellwithheadtiltingintheneutralexpressionstate.There werenojumpsinfeaturepointtrackingonstaticsequencesas mostpointsremainedintheneutralstate.However,wefound

Fig.7.2-DCLMfacefeaturedetection:testsubject:Trevor.Fromlefttoright, showing2-DCLMtrackingresultsforaneutralandsurprisefacialexpressions.

(11)

Fig.8.ExamplesoftrackingerrorswithCLM-Zacrossfourdifferentsequences.Oftentheeyesaremisaligned,facialhairconfusesthesizeandlocationofthe mouth,andthefineshapeofthemouthisalsolost.

thatCLM-Zfrequentlyfailedtobeinperfectsynchronization withthe images, oftenproducing lagging or misaligned fea- tures (see Fig. 8). On sequences with full facial expressions evolving, the detectedeyeshape seemedto bemuch smaller thanthe actualeye. Aswell, themouth was ofteninaccurate withfacialhair confusedasamouthinsomecases.Tracking stopped followingthemouth forwide smilesor if themouth waswideopen,e.g.surpriseactions(seeFig.9).Acrossthe7 sequences,CLM-Zfailedtoaccuratelyfindeyeshapeandloca- tionin33%oftheimages(from5%to100%errorsacrosseachof the7sequencesformouthshapeextractionandbetween0%and 100%erroracrosseachofthe7sequencesforeyeshapeextrac- tion), 67% for mouth shape and location, respectively. None of thesequencesrecorded perfect extractionof facefeatures, necessitatingtheuseofmanualextractionforsomecases.CLM- Zfailedwhenfacial hairwasencountered; eyedetectionwas weakinmostcasesandmouthcornerdetectionfailedwithwide smilesorunexpectedshapes,afeaturecommontomostcontour basedandappearancebasedtechniqueswhendealingwiththe mouth.

8.1.4. Marker-basedtracking

For our marker-based system, we used our sequences withmarkers. The markers were first segmentedusing color

Fig.9.3D-CLM(CLM-Z)facefeaturedetection:testsubject:Trevor.From lefttoright,showingCLM-Ztrackingresultsforaneutralandsurprisefacial expressions.

segmentation inHSV colorspace,thena basicblobtracking algorithmwasusedtotracktheblobsacrosssubsequentframes.

Onalltestsequences,themarker’sbluecolorwasclearlydis- tinctfromfacialfeaturesallowingittobetrackedacrossframes easilyandwithhighaccuracy.Markerswereonlylostwhenthe subject’sheadwasrotatedsuchthatthemarkerwashiddenfrom thecamera,orwhenthesubject’sheadmovesoutofthefieldof viewofthecamera.Incontrasttothebest-performingCLM-Z, only3.8% ofthe 3000images of ourdatabase failedtohave all 13markers trackedacross bothcameras,with allbut two sequenceshaving 100%perfect extraction.Overall, ofall the markerlessapproachestried,CLM-Zhadaslowstart-uptime butperformed thebest,sinceit waseasytoseethe dominat- ingexpression.AAMperformedtheworstduetopoortraining.

Thepointsextractedinallthreemodelswerefar fromperfect withthe most accuratefiducial pointsextractionachievedby ourmarker-basedapproach.Facialhair,lightingandskintone aswellaspronouncedopenmouthexpressionsheavilyaffected facefeaturetrackingformarkerlesstechniques.Ofnoteisthat our markers were createdfrom scratch using bluepaper and skin-friendlyglueataclosetozerocost,incontrastwithmore professional marker requirements. Most failed images origi- nated from blue colorsaturation around some markersagain consideringafastsetupusingoff-the-shelftripod,lighting,cam- erasettings,backgroundandsubjectspositioning.Followingthe frameworkdevelopedinWoodwardetal.(2012)andourlatest resultsinstereomatchingpublishedinGimel’farbetal.(2012), the stereo cameras were calibrated to remove lens distortion andrectify theimages. Thecameras wereautomatically syn- chronized over the firewire bus. Thisensured stable tracking ofmarkersandnoclock driftoccurred.Stereomatching(1-D BeliefPropagation)was thenappliedtoproducedepthmaps.

Asmentionedearly,thiswasusedforCLM-Z,toobtainthe3-D coordinatesofanypointonthefaceandthe3Dcoordinatesof our13markersmodel.

9. Experimentalresultsandsystemanalysis

AselectionofresultsareshowninFig.10;threetestsubjects performed the six universally recognizable facialexpressions (Ekman&Friesen,1971)infrontofthesystem,apersonalized 3-Dmodelwascreatedfromtheirfacesandtrackingdatadrove thefacialanimationsystem.

Experiments show that the proposed system is capable of reproducing facial expressionsfrom marker motion.From

(12)

Fig.10.Expressioncreationforarangeofsubjectsand.Foreachsubfigure,fromlefttoright,showingtheoriginalleftimagefromthestereopair,thevirtualresult, andtheassociatedmeshandmotionvectorsofthemarkerpointsfromaneutralstate.

running the system it was found that different test subjects articulatedthesameexpressionusingdifferentmuscles.Also, itwasdifficultforatestsubjecttoperformanexpressionwhen noemotional tiewas involved.Thinkingabout whatmuscles tomoveandgeneratinganexpressionsometimesrequiredself experimentationandlookingatone-selfinthemirror.Therefore itiseasytounderstandwhythereisaneedfordirectorsinmovie performancecapturesituations.

Detailsinimportantareasofthefacethatarenotcurrently modeledincludetheeyelids,lips,teethandinnermouth.This

impacts onthe visualaccuracyof thevirtual expressionsand futureworkwouldlooktoimprovethefacemodelintheseareas.

ThislossofdetailcanbeseeninFig.9,Matthew,whereimpor- tant cuesfor the expressionsare conveyed through the eyes.

The surpriseexpressionfortestsubjectMatthew(Fig.10(i)), clearlyshowsthelossofinformationinthemouthregionand howimportantthisregioncanbeforconveyingexpressions.

For Matthew’s disgust expression, inFig. 10 (j), shadow- ing cuesare lostaroundthe flangesof thenose inthevirtual expression,despitethemotionvectorsbeingcorrectlycaptured.

(13)

Duringaperformance,somepeoplehadbettercontrolover certainmusclesof their facethan others.An example of this wasthecontroloftheinnerandouterfrontalismusclesofthe foreheadthatinfluencetheeyebrows,ortheindividualcontrol ofthelabiinasiontheflangesofthenose.FortestsubjectAli therewaslittlemovementinthemotionvectorsforhisexpres- sions, reflecting intheir poorerrepresentations in the system whencomparedtoothertestsubjects.Conversely,testsubject Alex’sexpressionimagesandmotionvectorsshowedlargescale movement,reflectingthevisualcorrectnessinexpressionrepro- ductionshowninFig.10 (c)–(f).

Finetuningof the presetmuscle locations andparameters when mapping a new face model was sometimes needed to improveresultsortocorrectmuscleswithtoogreatanactivity onthemesh.Anexampleofthisistheasymmetryinfacemove- mentfortestsubjectAli’seyebrowswheretherighteyewould oftenappearslightlylargerafteranexpressionwassimulated(as seeninFig.10(a)).Incorrectasymmetryintheeyebrowsandin thesadnessexpressionwasalsonoticeableforAlex(Fig.10(d)).

Thisasymmetrycanbeunderstoodbyconsideringthemapping procedure.Inrealityeachperson’smuscleshaveslightlydiffer- entlocationsandstrengths,thusmoveinslightlydifferentways -asmallamountofpositionalandmorphologicaldifferenceisto beexpectedbetweenthesamemusclesindifferentpeople.One waytoimprovethiswouldbetosetplausiblelimitsofmuscle contractionrangesandletthesystemgothroughatrainingphase whereitwouldrescaleitsactivitybasedonasubjectperforming arangeofexpressions.

Oneofthemainissueswiththeexpressionsystemisthatthere arepotentiallymultiplesolutionsforavertexpositionwhenitis undertheinfluenceofmorethantwomuscles,aneffectthatis amplifiedwithnoisydata.Inthissituation,equalcontributions ofmuscle contraction couldbeassumed, butthismaynotbe the case in reality if some muscles are stronger than others.

Thisambiguitydoesnotexistwhenhandcraftedexpressionsare createdfor animation.Two opportunitiesfor improvingupon thisissuearethecreationofamoreadvancedanimationsystem thatmayincorporatepriorlearning,e.g.theuseofarestricted setofbasisexpressions,andinvestigatingtheeffectofadding moremarkers.

Amarkerbasedsystemcanonlypickupthesparsemovement ofthefacesincetrackedpointsarenotdenseenoughtoaccount

forfinemovement.Thiswasdifficultassomepeoplenaturally hadminimalfacemotionforsomeexpressions.

Cameranoisewasmoreapparentindarkregionsandaround edges.ColorpredicatedetectionusedtheYUVcolorspace,so lowintensity regionsweremoreambiguousandwerethresh- olded out of consideration. Noise in the detection of 2-D marker positions affected the quality of 3-D triangulation, giving fluctuations in the location of marker centroids. This was dealt with through temporal smoothing as described in Section 5. The majority of face motion can be described in 2-D andtheerrorfrom thetwo camerasinfluencesthedeter- mination of marker location, however a 2-D system would find it much harder to account for rigid 3-D head motion andany slight shiftingof perspective in theface as the head moves.

The RBF approach provided an effective means of mapping markersto aface model. There isno restriction on the choice of face model and it does not have to represent the testsubject.RBFisinexpensiveandthemappingcoefficients for aparticular setupneed only be estimated once based on a neutral face expression. However, asymmetries in the 3- D mapping process may affect expressions. This is due to positionaldifferencesintheplacementofmarkersonatestsub- ject’sfacewithrespecttothe selectedvertices ofthe generic model.

10. Expressionestimation

Facial expressionscan be handcrafted using the face ani- mationsystemandsettingmusclecontractionparameters. An experimentwasperformedtotestifthemarkertrackingsystem could be used toestimateagiven expressionfroma prebuilt set of expressions. The setof six universalexpressions were modeledwiththeanimationsystemandsaved.Thetestsubject thenaimed togeneratetheseexpressionsas best as possible.

Fig.11displaysthesixuniversalexpressionsthatwerecreated withtheanimationsystem.

A simple method of expression estimation from marker motioncapturecanbeperformedbytakingtheEuclideandis- tancebetweenthecurrentmusclecontractionvectora,derived fromthecurrentmarkerconfiguration,andacollectionofpre- built expressionsb_i, i∈ [1, 6], described in the same form.

Fig.11.Thesixuniversalexpressionsprebuiltintothesystem:(a)happiness,(b)sadness,(c)surprise,(d)fear,(e)disgust,(f)anger.

(14)

0 0.5 1 1.5 2 2.5

2.66 2.31 1.99 1.66 1.31 0.99 0.66 0.31 0.00

Time (s)

Distance

a. Happiness

0 0.5 1 1.5 2 2.5

2.66 2.33 1.99 1.66 1.33 0.99 0.66 0.33 0.00

Time (s)

Distance

b. Sadness

0 0.5 1 1.5 2 2.5

2.66 2.33 2.00 1.66 1.33 1.00 0.66 0.33 0.00

Time (s)

Distance

c. Surprise

0 0.5 1 1.5 2 2.5

2.66 2.33 2.00 1.66 1.33 1.00 0.66 0.33 0.00

Time (s)

Distance

d. Fear

0 0.5 1 1.5 2 2.5

2.67 2.34 2.00 1.67 1.34 1.00 0.67 0.34 0.00

Time (s)

Distance

Happiness Sadness Surprise Fear Disgust Anger e. Disgust

0 0.5 1 1.5 2 2.5

2.67 2.33 2.00 1.67 1.34 1.00 0.67 0.34 0.00

Time (s)

Distance

f. Anger

Fig.12.Euclideandistancesofmusclecontractionvectorfrompresetexpressions;testsubjectAlex.

It should be noted that because each individual expresses themselvesdifferently,futureworkshouldcreateanextensive databaseofprebuiltexpressionsthatcanbetterrepresentvaria- tionswithineachclass.

Fig.12displayscurvesofeuclideandistanceovertimefrom eachpresetexpressionforasetofcapturedsequencesinwhich

thetestsubject,Alex,performedthehappiness,sadness,surprise, fear,disgust,andangerexpressions.Thelesserthedistancethe closeracapturedmarkersequenceistoapresetexpression.

Thetestsequencesbeganwiththesubjectinaneutralface stateandthenmovedtowardanexpressionatmaximalmagni- tude.Itshouldbenotedthatthetailofthecurvesinthesequences