Availableonlineatwww.sciencedirect.com
Journal of Applied Research and Technology
www.jart.ccadet.unam.mx JournalofAppliedResearchandTechnology15(2017)61–77
Original
A low cost framework for real-time marker based 3-D human expression modeling
Alexander Woodward
a, Yuk Hin Chan
b, Rui Gong
b, Minh Nguyen
b, Trevor Gee
b, Patrice Delmas
b, Georgy Gimel’farb
b, Jorge Alberto Marquez Flores
c,∗aDepartmentofGeneralSystemsSciences,TheGraduateSchoolofArtsandSciences,TheUniversityofTokyo,Japan
bDepartmentofComputerScience,TheUniversityofAuckland,NewZealand
cCentrodeCienciasAplicadasyDesarrolloTecnológico,UniversidadNacionalAutónomadeMéxico,CircuitoExteriorS/N,CiudadUniversitariaAP70-186, C.P.04510,México,D.F,Mexico
Received7April2016;accepted18November2016 Availableonline22February2017
Abstract
Thisworkpresentsarobust,andlow-costframeworkforreal-timemarkerbased3-Dhumanexpressionmodelingusingoff-the-shelfstereo web-camerasandinexpensiveadhesivemarkersappliedtotheface.Thesystemhaslowcomputationalrequirements,runsonstandardhardware, andisportablewithminimalset-uptimeandnotraining.Itdoesnotrequireacontrolledlabenvironment(lightingorset-up)andisrobustunder varyingconditions,i.e.illumination,facialhair,orskintonevariation.Stereoweb-camerasperform3-Dmarkertrackingtoobtainheadrigidmotion andthenon-rigidmotionofexpressions.Trackedmarkersarethenmappedontoa3-Dfacemodelwithavirtualmuscleanimationsystem.Muscle inversekinematicsupdatemusclecontractionparametersbasedonmarkermotioninordertocreateavirtualcharacter’sexpressionperformance.
Theparametrizationofthemuscle-basedanimationencodesafaceperformancewithlittlebandwidth.Additionally,aradialbasisfunction mappingapproachwasusedtoeasilyremapmotioncapturedatatoanyfacemodel.Inthiswaytheautomatedcreationofapersonalized3-Dface modelandanimationsystemfrom3-Ddataiselucidated.
Theexpressivepowerofthesystemanditsabilitytorecognizenewexpressionswasevaluatedonagroupoftestsubjectswithrespecttothesix universallyrecognizedfacialexpressions.Resultsshowthattheuseofabstractmuscledefinitionreducestheeffectofpotentialnoiseinthemotion capturedataandallowstheseamlessanimationofanyvirtualanthropomorphicfacemodelwithdataacquiredthroughhumanfaceperformance.
©2017UniversidadNacionalAutónomadeMéxico,CentrodeCienciasAplicadasyDesarrolloTecnológico.Thisisanopenaccessarticleunder theCCBY-NC-NDlicense(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Keywords:Facialmotioncapture;Markerbasedmotioncapture;Expressionrecognition;Lowcost;Stereovision
1. Introduction
This work presents a complete system for modeling the appearanceandexpressionsofavirtualcharacterbasedonthe facialmovementofahumansubject.Stereoweb-camerasper- formmarkerbasedmotioncapturetoobtainheadrigidmotion andthenon-rigidmotionofexpressions.Tracked3-Dpointsare thenmappedontoa3-Dfacemodelwithavirtualmuscleani- mationsystem.Muscleinversekinematics(IK)updatesmuscle
∗Correspondingauthor.
E-mailaddress:[email protected](J.A.MarquezFlores).
PeerReviewundertheresponsibilityofUniversidadNacionalAutónomade México.
contraction parametersbased onmarker motion tocreate the character’sexpressionperformance.
Adesigngoalwastoallowany 3-Dhumanfacemodelto be used withany setof facemotion capture data.Thisgives theabilitytoretrofitexistingmodels.Italsoallowsthesystem todriveawidevariety offacemodels thatdo nothavetobe humaninnature. To dothis, mapping of markersinto anew facespacewasperformedwithradialbasisfunctions(RBFs), wheretherequiredcorrespondencesbetweenmarkersandver- ticesofthemeshwerepredefined.Thisproceduretakesonlya fewminutesandneedonlybedoneonceforacertainmarkerset andfacemodel,sincethemarkertemplate,markercorrespon- dences,andRBFmappingcoefficientscanbesavedforfuture use.
http://dx.doi.org/10.1016/j.jart.2017.01.002
1665-6423/©2017UniversidadNacionalAutónomadeMéxico,CentrodeCienciasAplicadasyDesarrolloTecnológico.Thisisanopenaccessarticleunderthe CCBY-NC-NDlicense(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Theadvantagesofthesystemaremultipleandarewithexist- ingsystems,describedinSection2andcomparedinSection8.
Overall,oursystemislow-costasonlystandardhardwareand camerasarerequired,withnoneedforspeciallighting,making thesystemavailabletoawiderangeofusersandsituations.Ithas alowset-uptimewithnoneedforlongandrepetitivetraining whenusersarechanged.The systemisrobustunderdiffering conditions,i.e.illumination,facialhair,skintone.Despiteusing off-the-shelfself-mademarkers,itgenerateslessnoiseintracked pointsandismorereliablethanmanymarkerlessapproaches includingthosetriedinSection8.
Ouranimationmodelusesabstractmuscledefinition,there- foreitprovidesausefulconstraintonpossiblefaceexpressions andthiscanhelpdealwithnoiseinthe motion capturedata.
Also,theabstractmuscledefinitionallowsmappingandanima- tionbetweendifferentfacemodels,offeringastraightforward waytoanimateawidenumberof 3-Dmodels withthe same animationdata-e.g.capturingahumanfaceperformanceand applyingittoavirtualanthropomorphicanimalface.
Wealsocombineareducedcharactercreationtimebyusing virtualmuscles,anovelwayofcombiningmarkertrackingwith afaceanimationsystemusingmuscleinversekinematics(IK) anditslowcosthardwarerequirements,capableofrunningon standardhardware.OurframeworkrequiresonlyastandardPC withweb-camerasandcoloredmarkers(self-adhesivelabels), availablefromstationerysuppliers.Thisfulfilsamajordesign goaltomakethesystemsuitableforend-userenvironments.As anextension, providedminimalchanges,aMicrosoft Kinetic couldbeusedtoextractdepthandmarkerpositionsthatcould bemappedtoourcurrentsetupinsteadofstereocameras.
Marker based motion capture was chosen for its ease of motiontracking,dealingwithnoisyenvironmentsandproviding lowcomputationalcostsandasimplealgorithmicformulation overmarkerlessbasedapproaches.Apairofstereocameraswas chosenoverasinglecameraapproachtoaccuratelyestimate3-D markerpositions.Inversekinematicsiswidelyusedinrobotics as atool for animating articulated figures andto reduce the amountofdataneededtospecifyananimationframe(Welman, 1993).WefocusontheJacobiantransposeapproachtoinverse kinematics,Baxter(2000),foritscomputationaltractabilityand easeofimplementation.Thisresearchaimstocreateacomplete systemfor expressionmodeling,toinvestigate theexpressive powerofaminimalsetofmarkers,andtopursueananimation approachthatismodelbasedinsteadofimagebased.
Itishypothesizedthat givenasuitablycomplex animation systemaminimalnumberofinputmarkerpositionsareneededto createrealisticexpressions.Thisassumesthatnoisehasminimal influenceonaccuracyandcanbesubsumedbytheanimation system.ThisworkparallelsChoe,Lee,andKo(2001),however theirapproachrequiredfaceandmotiondatafromthesametest subject.Inoursystemradialbasisfunctions(RBFs)wereused toalignandpairmarkerdatawithcorrespondingpointsonan arbitrary3-Dhumanfacemodel.Thisdecouplesthesubject’s faceshapefromtheshapeofthe3-Dmodel.Inadditionitwas foundthatthe3D model’sshapedifferednoticeablyfromthe actualsubject’s marker locations. Therefore using RBFswas importantforobtainingvisuallyrealisticresults.
Thisarticleisstructuredas follows:firstly,relatedworkis presented inSection2followedbyadesignoverviewinSec- tion3,listingthecorecomponentsofthesystem.The3Dface animationsystemisdescribedinSection4.Thenadescription ofthemarkertrackingprocessisdescribedinSection5.Marker mapping betweentrackedpointsanda3Dfacemodelisthen describedinSection6.Thisincludesanovelwayofcreatinga personalizedfacemodelfrom3Dreconstructiondata.Anovel methodtodrivefaceanimationthroughageneralmuscleinverse kinematicsframeworkisthengiveninSection7.Aselectionof results andanalysis ofsystemperformanceexplorethe effec- tiveness of the generatedexpressionsin Section9. Then, the systemsabilitytorecognizeexpressionsistestedinSection10.
Finally,theconclusionandfutureworkarepresented.
2. Relatedwork
Thereareanumberoffacialexpressioncapturetechniques thatcanbegroupedintoeithermarkerormarkerlessapproaches.
Marker basedapproacheshavebeen themainstayof industry applications, duetotheir reliabilityintrackingandlowcom- putational cost.On the other hand,marker based approaches are algorithmicinnatureandofferthepossibilityoffreeinga performerfrom requiringalengthysetupprocedure.Because of this, a greatdeal of researchhasbeen placed ondevelop- ingthem.Furthermore,(facial)motioncapture’simportancefor animationhasresultedinitbecominganintegralcomponentof commercial3-DmodelingpackagessuchasMaya(2013)and 3Ds(2013).
Expression retargeting refers to techniques that take the expressive performanceof onesubject andretarget it ontoa 3-Dmodelof differentproportions.Thisisusefulforprojects thatusefantasticalcreaturesasavatarsthathaveanthropomor- phicfacialexpressions.Acommonapproachistouseasetof expressionbaseshapes,oftencalled“blend-shapes”,whichare blendedtogetherandselectedbasedonretargetedmotiondata, e.g. Liu, Ma,Chang, Wang, andDebevec(2011) andWeise, Bouaziz,Li,andPauly(2011a).Asanotherexample,thework of Stoiber,Seguier,andBreton(2010)usedadatabaseof2-D facemotiontocreateasetof“bases”-ourapproachiscompu- tationallysimplerandmoreaccessible,butour3-Dexpression data couldbe used insuch aframeworktogive moredepth information.
Famous examples of marker based motion capture canbe foundinthemoviesPolarExpress(thefacialexpressioncapture of Tom Hanks’character), andinAvatar, whereanumberof painted markerswere trackedby miniaturecameras mounted ontotheactors.Interestingly,thefacialanimationforthepopular character Gollumof the Lordof the Ringstrilogy, wasartist created andonlyinspired by the facial performance of actor AndySerkis(Gollum,2013).
OneexampleofamarkerbasedapproachistheLightStage, used insuch moviesas Spiderman2. The basicapproach,as describedin(Hawkinsetal.,2004),used30strobelights,placed inamoveablesemicirculararc,whichwereusedtocapturethe reflectancepropertiesofanactor’sface.Additionally,3Ddataof theactor’sfaceshapewasrecordedunderdifferentexpressions.
Markermotioncapturewas thenused todriveaperformance whichblendsandwarpsthedifferentexpressionstogether.
The Playable Universal Capture system, described in Borshukov,Montgomery,andWerner(2006),presentsanother technique for facial animation.A high-resolution scan of an actor’sfacewastaken,whichwasthenmappedandanimated using separatelyrecorded marker based motion capture data.
Facetexturewasacquiredasavideostreamandmappedtothe 3-Dmodel, synchronizedtothemotion captureperformance.
Anumber of facial expression clips can be smoothly cycled between,butoverallthisisnotanapproachwhosemotiondata canbeeasilyretargetedtootherfacemodels.
Afewexamplesofsuccessful markerlessfacemotion cap- tureapproachesareMova(2004),amarkerlessfacialexpression capturesolutionthat requiresthe application ofspecial phos- phorescent make-up. The work of Chai, Xiao, and Hodgins (2003), where a database of motion capture was correlated withtracked facefeature pointstocreate newperformances.
Andlastly,thewellknownActiveAppearanceModel(AAM) (Cootes,Edwards,&Taylor,1998;Xiao,Baker,Matthews,&
Kanade,2004)approach.Here,textureandshapevariationwere modeledthroughpriorlearningonadatabase.Expressiontrack- ingismade possibleifsets of expressionsare alsoannotated inthedatabase.However,thecomputationalcomplexityofthe approach,alongwiththetrainingstage,aremuchhigherthanthe approachpresentedhere.Also,theproportionsofunseenfaces mustsuccessfullylie withinthe expressionspaceconstructed fromthedatabaseforthesystemtoworkproperly-oursystem isnotrestrictedinthissense.
The MirrorMoCap system by Lin and Ouhyoung (Lin &
Ouhyoung,2005)usesmorethan300fluorescentmarkersanda mirrorsystemtoaccuratelytracktheshapeoftheface.Theuse ofthemirrorsystemallowsawiderangleofthefacetobecap- turedwithasinglecamera.Thesystemoperatesatsub-video framerates, and requires specialised illumination andmirror equipmentsetup.
Sifakis,Neverov,andFedkiw(2005)presentedafacialani- mationmethod using inverse kinematics similarto our work here.Howevertheirsystemdoesnotdealwithreal-timeexpres- sioninformation from cameras.Their system also requires a muchhigher numberof datapointsonthe facemodelandis designedforusewithprofessionalmotioncapturedata.
The Faceshift system by Weise, Bouaziz, Li, and Pauly (2011b)usesadepthandtexturesensor(e.g.Kinect)toenable facial expressioncontrol of a digital avatarin real-time.The systemisnon-intrusive,requiringnomarkerstobeplacedon thesubject, butrequirestraining tomodelasubject’s expres- sions.Thetrainingstepadaptsauser’sexpressionstoageneric blendshapemodelandisperformedoff-linepriortorecording thedesiredperformance.Aftertraining,thesystemiscapable ofreconstructingfinedetailsofanexpression,suchasthelips andeyebrowshapeandmovement.Thesystemisavailableasa commercialproduct1withasubscriptionfeeof$1500USDper year.
1 www.faceshift.com
Weobtainedatrialversionoftheproducttoevaluateitseffec- tiveness. Training requires theuser toperform 21 predefined facialexpressions,namely,neutral,open mouth,smile, brows down,browsup,sneer,jawleft,jawright,jawfront,mouthleft, mouthright,dimple, chinraised, kiss,lipfunnel,frown,‘M’, teethshown,puff,chew,lipdown.Theuserisrequiredtorotate the headtotheleft andright duringthe expressionrecording toobtaindataforthesideofthefacemodel.Duringourtest- ing,wefoundthatanumberofexpressionssuchas‘lipdown’
weredifficulttoperformastheyrequireunnaturalmusclemove- ments. Howeverthedevelopersmention thesystemwillhave morenaturalexpressionstotraininfutureversions.
Overallthe training tookbetween15and30min. Wealso foundthatitwassometimeshardtoproperlyreplicatesomeof the21predefinedfacialexpressions.Thiscouldleadtobadly trained and represented 3-D face, with far from satisfactory results.Withaproperlytrainedsystem,wereplicatedthesixuni- versalexpressions.WeobservedthattheFaceshiftsystemisin generalfluid,responsiveandcapabletotracksomeexpressions.
Theuser-specific3-Drepresentationdidbearsomeresemblance totheoriginalandthereplicatedexpressionsdidsomehowrelate totheoriginalexpressions.Faceshiftstillhadtroublewithcor- rectlyidentifyingfacialhairandfinemovementofmusclesand requirestrainingforeachnewuserwhichcontradictsthegoal ofoursystemi.e.auniversaluser-independentsystemrequiring notraining.
Choeetal.(2001)modelsthefaceusing13differentfacial musclesalongwithaskindeformationmodelusingafiniteele- ment model (FEM) for the skin surface. Facial performance captureisaccomplished bythreecalibratedandsynchronized camerasobserving24retro-reflectivemarkersgluedtotheface.
The steepestdescentmethodisused tofindmuscle actuation parametersthatbestfitthemarkerpositions.Similartoourwork, expressionretargetingisdoneusingthemuscleparametersfrom observedexpressionstoanimateanotherface.
Acommonfeatureamongstmarkerbasedsystemsisahigh hardwarerequirementandlengthysetupprocedure.Weinstead focusonwhatcanbedonewithstereoweb-camerasandcheap andeasilyappliedadhesivestickers.In addition,oncemotion capturedataisacquireditgenerallyneedstobecleanedupor manipulatedbeforeitisinauseablestate.Welookatremoving thisdifficultybyusinganRBFretargetingapproach,projecting motionvectorsontovirtualmusclevectorstoconstrainmove- mentstovalidones.Byautomatingourapproachwealsowantto removethehighlearningcurveoftheprofessional3-Dmodeling softwareenvironmentsforfacialexpressioncreation.
Regarding the applications of this technology, the investi- gation ofmoodandemotionsdatesback tothe seminalwork of Ekman andFriesen (1971). With the emergence of social interfacesandvirtualworldstherehasbeenarenewedinterest inwaystodirectlyinputormodifycontentbasedonausers’
instantaneousmoodoremotion(Kaliouby&Robinson,2004).
TheexplorationofemotionsandmoodsinHCIhasledtoappli- cationssuchasthepopularMobiMood,whereusersconveytheir moodstatusviashorttypedmessages.Suchapplicationshavea strongappealonmobilephonesandincreasingcomputingpower and3-Ddisplaycapabilities(Tegra,2013)shouldallowamore
direct andautomatedway toconveymood andemotions.As such,ourpresentedworkalignswellwiththecurrenttrendsand needswithinthemulti-mediadomainforthepursuitofthebet- terunderstanding,generationanddetectionoftheinformation expressedbyaface(Rachurietal.,2010).
This research expands and improves on that presented in Woodward,Delmas,Gimel’farb,andMárquez(2007).
3. Systemdesign
The system comprises three modules: facial animation, markertracking,andmarkertomodelassignmentwhichlinks theothertwomodules.Eachmoduleisdescribedinbriefhere andthenelaborateduponintheirrelevantsections.
1 Facialanimation:Thismodulecomprisesthe3-Dfacemodel withapredefinedvirtualmuscleanimationsystem,described inSection4.Theabstractmuscleapproachmeansthataface model does not need to be designed withknowledge of a certainmarkerconfiguration,makingiteasytochangemarker setupsontheface.Apersonalizedfacemodelwascreatedto matcheach test subject using theRBF mapping approach, describedinSection6.2.
2 MarkerTracking:Themarkertrackingmodulecapturesstereo images usingtwo off-the-shelf Fire-iweb-camerasrunning overthe 1394bus.Itprovides synchronizedframeacquisi- tion at 30 fps and camera calibration routines andmarker templatecreationfortracking.Markerprojectionsareassoci- atedbetweenstereoimageframesandcanbetriangulatedto recover3-Dpositions.Trackingcanthenberecordedtofile, each frame being time-stamped. The stereo system makes pointtrackingmorerobustcomparedtoasinglecamerasys- tem sincewe areabletotriangulatepointsusing thestereo geometry.
3 Marker tomodel assignment: Thismodule creates a map- pingbetween3Dmarkersandthemodelforcalculatingthe animation.Doingsoallowsanymarkerconfigurationandtest subjecttobeusedwithanymeshthatisfittedwiththefaceani- mationmodule.Oncemappingiscompleted,faceanimation canbedrivenbythemarkersthroughinversekinematics.
SoftwarewaswritteninC++andusedMicrosoftMFCforthe GUIandOpenGLfor3-Dvisualization.Thesystemoperateson aWindowsbasedplatformusingamediumgradePC.
AstereoconfigurationoftwoUnibrainFire-iweb-cameras running over a IEEE-1394a (Firewire) interface for synchro- nizedvideostreamswasused.TheFire-icameraswerechosen as theyarecapable offrame ratesup to30fps andallowfor colorbasedmarkertracking.IEEE1394waschosenoverUSB 2.0asacamerainterfaceasithasbettersupportforindustrial camerasandeasyaccessandcontrolofIIDC-1394compliant camerasthrough the CMU1394camera driverandAPI.The Fire-icameraoperateat640×480pixelresolutionandhavea 4.3mmfocallength.
Cameraswereplacedverticallyversussidebysidetoallow for themaximumfieldofviewof markerandtoavoidtrack- ing difficulties for markers placed on the sides of the face.
Tsai’sgeometric cameracalibration(Tsai, 1987)was usedto estimateintrinsicandextrinsiccameraparameters.Oncemark- ers wereidentifiedinthestereoimagepair,knowledgeof the systemgeometrywasnecessaryforrecovering3-Dmarkerloca- tionsthroughtriangulation.Acalibrationboxwith63circular markingswasused.Experimentsinreconstructingcalibration markingswithknowntruelocationshaveshownthattheerrors are stablebetween0.4mand1mfromtheworldorigin,with ameanerrorofapproximately1mmandstandarddeviationof 0.48mm.However,inpracticethedelineationandlocalization ofcoloredmarkersismorepronetonoisethanthecalibrationtar- get–noticeableinthejitterseeninrecoveredmarkerpositions.
Thetestsubjectwasplacedapproximately1mfromthecameras.
4. 3-Dfacemodelandexpressionanimation
Afacialanimationsystemusingavirtualmuscleapproach, first implemented and described in Woodward and Delmas (2004),wasextendedforthisresearch.Twenty-onevirtualmus- cles wereplacedinanatomicallybasedpositionswithina3D humanfacemodelas detailedinTable1 andFig.1.Muscles aredefinedseparatelyfromthefacemodel,providingflexibility in designand givean abstract description of a facialexpres- sion.Movableeyeswerecreatedsothefacecouldalteritsgaze.
In addition,jaw movementwas modeled by rotating vertices aroundajawaxis.Theanimationapproachwasbasedonwork by Parke, Water, Terzopolous andLee;their GeoFace model servedasabasemesh(Lee,Terzopoulos,&Waters,1995;Parke
&Waters,1996;Terzopoulos&Waters,1990;Waters,1987).
The face mesh can be deformed using a computationally efficientgeometricapproachwhereverticesaremovedindepen- dentlybythevirtualmuscles,asshowninFig.2.Muscleswere positioned usingan interactiveapplication developedinC++.
Expressions are represented as a vector of muscle activation parametersandfaceanimationperformancescanbegenerated throughasimplescriptingsystemthatcontrolstheseparameters overtime.
4.1. Musclemotiondefinitions
Alinearmuscleandanellipsoidmusclewereimplemented toapplydirectdeformationofthemesh.Facemodelverticesthat
Table1
Animationsystemabstractmusclelist.
Number Musclename Muscletype Location
1-2 Zygomaticmajor Linear L-R
3-4 Frontalismajor Linear L-R
5-6 Frontalissecondary Linear L-R
7-8 Orbicularisoculi Ellipsoid L-R
9 Orbicularisoris Ellipsoid Mouth
10-11 Frontalisouter Linear L-R
12-13 Frontalisinner Linear L-R
14-15 Lateralcorugator Linear L-R
16-17 Innerlabiinasi Linear L-R
18-19 Labiinasi Linear L-R
20-21 Angulardepressor Linear L-R
Fig.1.Facialanimationsystemmuscleplacement.Musclenamesaregivenin Table1.
Fig.2.Thefacemeshbefore,left,andafter,right,alinearmusclecontraction.
areinfluencedbyaparticularmusclearedeterminedatsystem startup.
Linear muscle: this has an origin, v2, specifying muscle attachment to the skull, and an anchor point v1, specifying theinsertionintotheskintissue.Consequently,m=v1−v2is known asthe muscle vector. Verticesinside aradialdistance fromv2 andinsideanangularrangetothemuscle vectorare affectedbythemuscle.
Avertexpofthefacemesh,undertheinfluenceofaparticular linearmuscle, movesto anewlocation, p,by the following equation:
p=p+akr q
|q| (1)
where k is the contraction increment, q=p−v2 and a=1−cos(α)/cos(ω)isanangularscalingfactorthat reduces movementwhentheanglebetweenmandq,α,increases.The rangeofangularinfluenceofthemuscleisgivenbyω.Theradial
displacementscalingfactor,r,affects p basedonitsdistance fromthemuscleoriginv2:
r=
⎧⎪
⎪⎪
⎨
⎪⎪
⎪⎩ cos
π 2
D−Rs (Rf−Rs)
if Rs <=D<Rf
cos
π 2(1− D
Rs)
if D<Rs
(2)
whereD=|q|,Rf isthelimitofamuscle’sinfluenceandRs is thedistancefromwhichamuscle’sinfluencestartstodecrease tozeroatthemuscleoriginv2.
Ellipsoidmuscle:drawsverticestowardsthemusclecenter.It hasaradialrangeandnoangularrange.Becauseofitsfunction, this muscle is also known as asphincter muscle and can be defined byan ellipsoid, having a majorandtwo minor axes, theirlengthsbeinglx,ly,lz respectively.Avertexp=(x,y,z), undertheinfluenceofaparticularellipsoidmuscle,movesina similarmannerasifinfluencedbyalinearmuscle,butwithout theangularterm,a:
p=p+kr v2−p
|v2−p| (3)
4.2. Jawmotion
Jaw movementshifts facevertices a greater distance than the abstract muscles. In this situation, it ispossible that ver- ticesinfluencedbyjawmovementcanmoveoutofthezoneof influenceformusclesthatshouldalsoinfluencethesamearea (i.e.theorbicularisoris,angulardepressors,zygomaticmajors).
Consequently,thejawmodelwasimprovedbydeterminingifa vertexhadmovedoutsidetherangeofinfluenceofthesemus- cles,thenthevertexisunaltereduntil itreturns withinrange.
Thischeckneedonlybeperformedonverticesunderthecon- fluenceofthejawandatleastonemuscleanddoesnotweigh onthecomputationtime.
5. Markertracking
Marker tracking begins by localizing candidate markers withineachstereoimage.Forthisexperiment,13coloredself- adhesivebluemarkerswereusedandtheirimagelocationswere determinedwithacolorpredicateoperatingintheYUVcolor space(Barton&Delmas,2002).Thisnumberwaschosenasa tradeoffbetweencomputationalcomplexity,capturingenough non-rigidexpressionmotion,andsetuptime.Futureworkwill lookintoextendingthenumberofmarkersanditsimpactonthe functionalityofthemusclebasedsystem.
Asetofadhesivecolorstickerswereusedaslowcostmark- ers. They were chosen for their easy color detection, speed of applicationtoafacewithoutdamagetotheskin,andtheir low cost, being available from most stationery stores. From ourexperiments,commerciallyavailablereflectivemarkersthat areappliedtothefacewithspiritgumaretrickytoattachand remove, thusaddinganextraelement totestsubject prepara- tion.Theyarealsoexpensivetopurchasebecauseoftheirniche market.
Fig.3.Thefaceanimationsystemshowing(a)muscleplacements,(b)textureonly,and(c)theunderlyingmesh.
Fig.4.Markerplacementontheface,fortherealandvirtualcases.Whitepoints onthevirtualmodelindicateanchormarkers:theforehead,totheoutersideof theleftandrighteyes,andbelowthenose.
Eachtestsubjectpreparedthemselvesbyapplyingmarkersto theirfacewhilelookinginamirror.Tohelpguidethetestsubject amannequinwaspreparedwithanexamplesetofmarkersplaced onitsface.
Markerswereplacedinfaciallocationsthat bestrepresent facemotion.ThecurrentmarkertestsetupisdisplayedinFig.4.
Algorithm1. Markertrackingalgorithm
1: whileStereoimagepairsfromavideostreamdo
2: forSubregionsofimagesifmarkertemplateinitialised,elsefor entireimagedo
3: Colorsegmentationusingacolorpredicate,medianfilter,and connectedcomponentsanalysis.
4: endfor
5: Mergingnearbycomponentsofsimilarcolor.
6: Computingcentroidsofconnectedcomponentsaspotential markerlocations.
7: ifMarkertemplateinitialisedthen
8: Assignnewmarkerpositions;ifmarkertemporarilylostretain previousposition.
9: Triangulatemarkerpositionsbasedonmarkertemplate correspondences.
10: endif
11: Optionally:
12: ifSuitablecandidatemarkersfoundthen
13: Initialisemarkertemplatebasedontheirlocations.
14: endif
15: endwhile
Amarkertemplatewasconstructedandtrackedthroughthe videostream,theprocedureisgiveninAlgorithm1.Themarker templatedefinesthenumberofmarkerstobetrackedwithinthe
imagealongwiththeirpositionandvelocity.Thetemplatepro- videsrobustnessagainstanyextraerroneouslocalizationsfrom being considered. Afterdefinition,onlythe areasaroundcur- rentmarkerlocationsneedtobeprocessedandsearchedin,thus reducing computation time. Marker correspondence between stereoframescanbefoundwithrespecttothescanorderand epipolargeometryconditionofthestereocameras.Thesecorre- spondencescanthenbetriangulatedtorecoverthe3-Dlocations.
Asubsetofthetemplatemarkersweredesignatedasanchor points (shown as white points in Fig. 4). These anchors are located wherea good estimate of the head rigid motion can betaken.Theyshouldnotbeaffectedbyfacialexpressionsas theyareusedtodefinealocalheadcoordinateframe.Thecon- structionofalocalcoordinateframeisdescribedinSection5.1.
Amarkerisconsideredtohavedisappeared ifanewposi- tionthatisbelowacertaindistancethresholdcannotbefound – inthiscasetheprevious markerposition iscarried through intothecurrentframe.Inpracticethisworksadequatelyforthe temporarylossofmarkers,e.g.inthecaseofanobjectsuchas thehandbrieflyoccludingthefacewhenthefaceisnotmoving substantially.
Thepresenceofnoisewithinthesystemcausesjitterinthe triangulated3Dmarkerpositions.Noisesourcesincludeoptical distortions,imaging noiseandinaccuraciesinthecalibration.
Pastexperimentsshowedthatasmallamountofpositionalaver- aginghalvedpositionalvariance(Woodward&Delmas,2005a).
Therefore,anempiricallydetermined,temporalmeanfiltering overthreeframeswasappliedtostabilize3-Dmarkerpositions.
Thisgaveanoticeableimprovementinmarkerstability,reduc- ing thestandarddeviation inmarker positionestimationfrom 4.5mmto1.6mmat1.2m.
Eachstereoframewastime-stampedandtherelevantdatais savedalongwiththeframe’striangulatedmarkerpointsasthe systemruns inreal-time.Alternatively,the systemallowsfor stereoimagesequencestobesavedandprocessedoff-line.
5.1. Headlocalcoordinateframeandreferencemarker positions
Inordertoestimatethenon-rigidmotionoffacialexpressions rigidheadmotionmustbeaccountedfor.Thiswasachievedby constructingalocalcoordinatesystembasedonthefouranchor markers,showninFig.4.Thecenterpointofthefourmarkers
actsasthecenterofgravity,ortranslationalcomponent.Thefour markerpointsa1,a2,a3,a4arearrangedsothatthevectorfrom a1toa2isorthogonaltothevectorformedfroma3toa4.These canbeusedtocreatethelocalcoordinateframeR=(r1r2r3)as follows:
r1= (a2−a4)
|a2−a4|, r3=r1×(a3−a1)
|a3−a1|, r2=r1×r3 (4) Thecalculationensuresthatthecoordinateframeisorthogonal.
Rigidmotionwasremovedfromeachframebysubtractingthe centerof gravity from the triangulated marker positions and multiplyingtheresultsbyRT.
Asmallamountofmovementwasapparentforsomeofthe anchorpoints,particularlytheoneplacedabovethelip.Thiswas unavoidableforthisapproachbutwasdealtwithbyknowingthat aminimumofthreepointscanbeusedtocreateaco-ordinate frame for rigid motion. Also, for each test subject a neutral expressionwas recorded to provide default referencemarker positions(aneutralstate),forallmarkers,withinthelocalhead frame.Thisreferenceframecanbeacquiredduringthefirstfew secondsofmarkermotioncapture.Todosothetestsubjectis askedtokeeptheirfacestillandwithoutfacialmovement(head movementpermissible).Referencemarkerpositionsallowfor thecalculationofdivergencevectorsfromtheneutralstate,and arealsousedintheRBFmappingprocedureinSection6.These vectorsareusedtocreatethevirtualperformance.
6. Markertomodelassignmentandpersonalized3-D facemodelcreation
ARBFmappingwasmadebetween3-Dmarkerpointsand correspondingpointsonafacemeshbasedontheneutralref- erence marker frame andthe neutral face mesh. The marker performancecanthenbetranslatedintothevirtualmeshlocal coordinateframebygivingthe3-DmarkerpositionstotheRBF mapper(asinEquation(6)).Theresultantmappedpositionscan thenbemeasuredasdivergencevectorsofmarkerverticesfrom themeshneutralstate.
6.1. Radialbasisfunctionsformapping
RadialBasisFunction(RBF)interpolantspossessdesirable smooth properties (Farfield Technology Ltd, 2011) and have successfullymappedgenericfacemodelstosubjectspecificface data(Woodward&Delmas,2004,2005b).Theyareparticularly desirableasascattereddatainterpolantandhavebeenapplied inareassuchas approximatingnoisy data,surfacesmoothing andreconstruction(Carretal.,2001;FarfieldTechnologyLtd, 2011),imagemorphing, andevenfor facialanimation (Noh, Fidaleo,&Neumann,2000).
Thepresentedfacemappingapproachtakesauserspecified subsetofthegenericfacemodel’sverticesandassignsvaluesat theselocationsthatrepresentdisplacementsforasinglespatial coordinatebetweenthe genericmodel andtargetfacemodel.
Thesecorrespondencesareassignedbetweenfiducialpointson eachfacesurface.Thus,themappingofallpointsneednotbe
specifiedmanually;thesetwillbemuchlessthanthenumber ofpointscontainedinthesourcedataset.
ThiscanbethoughtofasspecifyingasetofNsamplesfrom a3-Dfunction,f(x),atcoordinates,X=(x1,...,xN),xi=(xi,yi, zi),xi ∈ R3,whosevalues,f=(f1,...,fN),representdisplace- mentsbetweenthetwofacemodelsalongaparticularaxis.A RBF,s(x),isthenfittedtothisdatatoapproximatef(x).Thisis performedthreetimes,onceforeach spatialaxis,tocalculate 3-Ddisplacements.
ARBFhasthegeneralform:
s(x)=p(x)+N
i=1
aiφ(|x−xi|) (5)
where p(x)=c1+c2x+c3y+c4z is a linear polynomial term definingaplanein3-space,aiisarealnumberweight,|x−xi| specifiestheEuclideandistancebetweenxandxiandφisabasis function.Thebiharmonic spline,φ(r)=r,isused asthe basis functionasfunctionsofthreevariablesarebeingfitted,where ristheEuclideandistance.Thebiharmonicsplineisasmooth- inginterpolatorappropriatefor3-Dobjectssuchashumanface models(Carretal.,2001).
Forthemappingofa3-Dpoint,x,fromthegenericmodel toanequivalentpoint,v,thatfitstheproportionsoftherecon- structeddata,Equation(5)becomes:
v=s(x)=p(x)+
N i=1
aiφ(|x−xi|) (6)
Thecoefficientsofthepolynomialvectortermpandthereal valued vectorweights ai canbedetermined usinglinearleast squares.TheRBFapproachfindsamappingthatwill depend onthequalityofcorrespondencesidentifiedbetweendatasets.
Creationofthepersonalizedfacemodeliscompletedbyenter- ingthepointsofthegenericfacemodelintoEquation(6).The outputpositionsaretheinterpolatedvaluesbestapproximating theoriginalreconstructedfacedataset.
6.2. Creatingapersonalizedmodel
Results from the associated work on 3D face reconstruc- tionbyAn,Woodward,Delmas,andChen(2005),Woodward, Delmas,andGimel’farb(2008),Woodwardetal.(2006)canbe combined withthegeneric facemodel andanimationsystem describedheretocreateapersonalized3-Dmodelabletocre- atenovelexpressions.Radialbasisfunctions(RBF)wereused for anon-rigidmappingbetweenapersonal3-Dfaceandthe genericface.RBFsrequireonlyasmallnumberofcorrespon- dencesbetweendatasetsthatdescribethedifferentproportions betweenthe twofaces. Thevertexdistributionandresolution ofthegenericfacemodelmustbepreserved,sothisinterpola- tionstrategywasappropriate.Thepersonalizedfacesfoundin Section9werecreatedinthismanner.
7. Muscleinversekinematicstodrivefaceanimation Triangulated3-Dmarkerpositionscannowbemappedand appliedtothefacemodel.Changesintheirpositionsdrivethe facemusclesthroughinversekinematics.Thissectiondescribes theprocessofcreatingacompletevirtualperformancegivena setof3Dmarkers,theirmotionovertime,andafaceanimation system.
Markerdivergencesweremeasuredfromtheneutralframe, mapped into avirtual facemodel’s localspace. Eachmarker wasmatchedwithavertexofthemesh.Uponinitializationthe musclesthatinfluenceeachmarkermustbefound.Thisissimple aseachmusclehasanareaofinfluence.Whenamarkermoves, virtualmusclesmust beupdatedtodeformthe meshtoplace thecorrespondingvertexonthemeshascloseaspossibletothe marker.Inversekinematicswasusedtoachievethis.
Inversekinematicsisthecalculationofparametersforakine- maticchaintomeetadesiredgoalpositiong,whenstartingfrom aninitialpositione.Appliedhere,thedesiredendpositionwould bea3-Dmarkerpositiontrackedbythevisionsystem.Theini- tialor currentposition isconsideredtobethelocation ofthe vertexonthemeshcorrespondingtothemarker.
In the literature the vertex eis known as the endeffector ofakinematicchain.Thekinematicchainconsistsofasetof joints:namelythesubsetofmusclesorjawaffectingacertain marker. Since only the contraction values of each muscle or therotationofthejawaredealtwith,eachjointpossessesone degreeoffreedom (DOF).Adegreeoffreedom (DOF) refers toananimatableparameterforeachvertex-markerpairwithin thevirtualface,wheretheanimatableparametersarethemus- clecontractionvaluesandjawrotation.Eachvertex-markerpair willhaveanassociatedsetofmuscleswhichaffectit,theirnum- beristhevertex’sDOF.Likewise,eachendeffector,ormarker positionhas3-DOF,describingitspositionin3-space.Collec- tivelyalloftheseparametersdescribethesystemstateandtheir change provides thevirtual characterperformance. The loca- tionsofmarkerssufficetoplanfortheanimationand3-Dmotion capturedataprovidesasimplerwaytogenerateaperformance basedwhenopposedtoforwardkinematics,whereeachmuscle contractionvaluemustbedetermined.
7.1. Solutionformulation
Let=(φ1,φ2,...,φN) ∈ RNrepresentthevectorofjoint parameters(themuscleDOFs)ande=(e1,e2,e3)representa vectordescribingtheendeffectorDOFs,inthiscaseavertex position.Letf bethe forwardkinematicsmapof akinematic chain.ThisfunctioncomputesendeffectorDOFsfrommuscle DOFs:
e=f() (7)
Consideringasinglefacevertex,e,anditsmarkercorrespon- dence(targetposition)g=(g1,g2,g3),theinverse kinematics problemistocompute thevectorof muscleDOFs, ,sothe
vertexofafacecoincideswiththemarkerposition;e=gandthe desiredpositionalchangeofeisd=g−e:
=f−1(g) (8)
Unlike forward kinematics, solving f−1 is non-trivial as f is non-linearandtheremaybemultipleornopossiblesolutions.
AniterativenumericalsolutioncanbefoundwiththeJacobian matrix J of f, i.e. the function canbe followed to aminima throughtheknowledgecontainedinJ:
J(e,)= de d =
⎡
⎢⎢
⎢⎢
⎢⎣
∂e1
∂φ1
··· ∂e1
∂φN ... . .. ...
∂eM
∂φ1 ··· ∂eM
∂φN
⎤
⎥⎥
⎥⎥
⎥⎦
(9)
where M=3 tospecify a vertex position inR3 andN is the numberof musclesaffectingavertex(which isautomatically calculatedduringinitialization).Foranincrementalstepinpose, thechangeinendeffectorposition,causedbyavariationinjoint DOFstate,canbeestimatedbyfirstorderapproximation:
d≈ de
d·=J(e,)·=J· (10) Hence,forinversekinematicsanupdatevalueiscalcu- latedbasedon:
≈J−1·d (11)
whereifpossiblee≈d.Ingeneral,Eq.(11)cannotbesolved uniquely and various strategies can be employed for choos- ing .Since f isnon-linear, theJacobian approximation is validonlynearthecurrentconfigurationofe,,therefore,the approachtakenistoiterativelyupdatejointDOFsbyvaluesof
untilEquation(8)issatisfied.
Afastandsimpleiterativegradientdescentsolutionwascho- sen that uses the Jacobian transpose of marker positionwith respect tomusclecontraction(orjaw rotationarounditsaxis) (Baxter, 2000). This method is computationally inexpensive sincethereisnoinversionof theJacobianmatrixtoperform.
Italsolocalizes computationssincethe DOFscanbeupdated beforetheentireJacobianiscomputed.
7.2. TheJacobianTransposemethod
ThetransposeofJisusedinsteadoftheinverse:
=JTe (12)
wherethestepeisgivenas:
e=β(g−e) (13)
inthe direction of thegoalmarker positiong. The scalefac- torβisusedtolimitthestepsizesincenon-smoothfunctions are involved.The value ofβ could be tunedfor each marker sincesmallerstepsizesmustbetakenwhenthenumberofinflu- encingmusclesincreasesandinturnthecomplexityofmarker
movement. Theside effectof thisis anincrease inthe num- berofiterationsrequiredforasolution.TheJacobianTranspose methodis justified in terms of the principal ofvirtual work, whichprovidesarelationshipbetweenanendeffector move- mentandaninternaljoint-byanalogythedistancetothegoal g isconsidered tobe aforcepulling e, andjoint torques are equated to changes in joint DOFs, (Baxter, 2000; Buss
&Kim, 2004).Thisanalogy isinexactandnon-linearitypre- vents the end effector reaching the goalin exactly onestep, whichiswhytheprocessmustbeiterated.Ifβissmallenough thenthemethodalwaysmovesclosertothegoalg:bytaking d=g−eandEquation(10),d≈JJTe=βJJTd,itholds that JJTd·d≥0,i.e.themovementofewillalwaysbeinadirection lessthan90degreeswithd(Buss,2004).
TheJacobianmatrixcanbebuiltbyloopingthroughthejoints inthekinematicchainsandevaluatingtheirDOFs.Itrelatesthe changein endeffectorposition withthechange inkinematic musclecontractionvalues.EntriesofJspecifyDOFsanddepend onthetypeofjoint–muscularcontractionisalinearjointand jawrotationisanangularjoint.
Foramuscle DOFavertexeisdrawtowardthe muscle’s origino.ThepossiblevaluesfortheDOFareboundedasmuscles haveaphysicalcontractionrange.AchangeintheDOFφgives amovementalongthisdirection:
∂e
∂φ =(o−e) (14)
A1-DOFrotationaljointmodelsjawmotion,itsentryinthe Jacobian measureshow theend effector,e, changes during a rotationaboutitsaxis:
∂e
∂φ =a×(e−r) (15)
Therateofchangeinthepositionofeisinatangentialdirection totherotationaxisa,scaledbythedistanceoftheaxistoe,r. Equations(14)and(15)formcolumnsofJ.
7.3. Pseudo-codeformuscleinversekinematics
Pseudo-codeisgiveninAlgorithm2;markerCountgivesthe numberofmarkersbeing trackedandaffectedMuscleCount(j) specifiesthenumberofmusclesaffectingaparticularmarkerj.
Algorithm2. Muscleinversekinematics
1: forallisuchthat0≤i≤iterationsdo 2: foralljsuchthat0≤j≤markerCountdo
3: forallksuchthat0≤k≤affectedMuscleCount(j)do 4: EstimateJacobiantransposeentry,JTk,from
Equations(14)or(15)dependingonmuscletype.Matrixis implicitasentriescanbeprocessedinorder.
5: Pickapproximatesteptotake-asinEquation(13).
6: Computechangeinmuscle(joint)DOF-asin Equation(12).
7: ApplychangetoDOF-φk=φk+φk. 8: Contractvirtualmusclewiththenewparameter
valueφktoobtainnewpositionforej.
9: endfor
10: endfor
11: endfor
Theresultinganimationcanbesavedtofileorprocessedin real-timedependingonthe processingpowerofthehardware platform.
8. Benchmarkingagainststateofthearttechniques 8.1. Markervsmarkerlessfiducialpointstracking
Weperformedadditionalexperimentstocomparetheaccu- racyofmarker-basedtrackingwiththestateofthearttechniques in2-Dand3-Dfacefeaturepointsextraction,namely,Active Appearance Models (Edwards,Taylor,& Cootes,1998), 2-D ConstrainedLocalModels(Wang,Lucey,&Cohn,2008)and 3-D CLM (akaCLM-Z) (Baltrusaitis,Robinson,&Morency, 2012).Wefirstcreatedanewdatabaseforthecomparisonexper- iment which includes 7 subjects, each recording two similar synchronizedstereovideo sequences– onewithmarkersand anotherwithout.Eachvideoshowsthesubjectgoingthroughthe sixuniversalexpressionsofhappiness,sadness, surprise,fear, disgustandangerplusaneutralface(seeFig.5).Thelengthof eachsequencerangedfrom336frames(11.2s)to528frames (17.6 s),averaging 442 frames(14.7s)for acombined 3000 images. Thesequenceswere recorded inaquicklysetupand relativelycontrolledenvironmentwithgoodlighting,afeature- lessbackground,agoodquality3-Dcamera(FujiFilm3-DW3) andallthesubjectswereplacedinthesamelocation.Following the frameworkdeveloped inWoodwardet al.(2012)andour latestresultsinstereomatchingpublishedinGimel’farb,Gong, Nicolescu,andDelmas(2012),thestereocamerawascalibrated toremovelensdistortionandrectifytheimages.Stereomatch- ing(here1-DBeliefPropagation)wasthenappliedtoproduce depth maps.The latterswerebothbothused forCLM-Z and toobtain3-Dcoordinatesofanypointonthefaceandthe3-D coordinatesofour13markersmodel.
8.1.1. ActiveAppearanceModel
Foreachtechniqueweobtainedcodeformthewebanddid noretrainonourimages(sinceoursystemiftraining-free).We selected the code offered by T. Cootes http://personalpages.
manchester.ac.uk/staff/timothy.f.cootes/software/am toolsdoc/
index.htmlas theAAMtool-kitcomes withtraining toolthat allows manually drawing of contours on a series of facial images; andthenadaption ofthe contoursetasthe sequence evolves.Thisallowedaneffectivetrainingofthisimplementa- tiononasetof theauthors’faceimages.Theresultingmodel was abletomatchneutral facesquitewell(sinceneutral was thefirstexpressioninthesequence),butfailedeachtimeanew expressionwasmade(e.g.seeFig.6).Thegeneralimpression wasthatalotoftedioustrainingwouldberequiredtogenerate agoodmodeleachtimenewanewfaceistobeprocessed.
8.1.2. 2-DCLM
We evaluated the 2-D CLM version implemented by Yan, Xiaoguang, available at https://sites.google.com/site/
xgyanhome/home/projects/clm-implementation, which fol- lowed:Wangetal.(2008)andCristinacceandCootes(2008).
We used the model provided with the source code, thus no
Fig.5.Imagesextractedfromourcomparisondatabase:testsubject:Trevor.Fromlefttoright,showingthe7universalexpressionsofneutral,happiness,sadness, surprise,fear,disgustandanger.
Fig.6.AAMtracking:testsubject:Trevor.Fromlefttoright,showingAAMtrackingresultsforaneutral,happyandsurprisefacialexpressions.
training was needed.If notinitalizedclose tothe actualface it wasfound that that theinitial template didnot fit wellthe firstimagefacefeatures.Howeverafteracoupleofframes,the template evolved andappearedtofind the correct orientation oftheface.Thisimplementationwasslowtoadapttochanges infacialexpressionsanddidnottrackmouth movementvery well.Whileverylittleadditionsweremadetothesourcecode to run properly, manual hardcoding of the initial template locationroughlyinthevicinityofthefacefeaturesofthefirst image(achievedbychangingconstantshardcodedinthesource code)wasnecessary.Ourinitialtestsfound2-DCLMproneto errorsduetotheexistenceoffacialhair,lightingandskintone variationsas well as an open mouth throughoutimages (e.g.
seeFig.7).
BothAAMand2-DCLMfailedtocorrectlytrackfacefea- turesonalimitedsampleofimagesofourdatabaseandwere excludedfromthecomparisonwithourmarker-basedapproach leavingCLM-Zasthelaststate-of-the-arttechniquetocompare to.
8.1.3. CLM-Z
OurCLM-Zimplementationwaskindlyprovidedby2.While thislibrarydidcomewithseveraldifferentimplementations,we hadtowriteourownfront-endprogramtogetthesourcecode torun through allthe images of ourdatabase and renderthe relevantpointstoadatafile.TheCLM-Zimplementationcame withapre-trainedclassifierandthusnotrainingwasrequired.
2 http://www.cl.cam.ac.uk/research/rainbow/projects/clmz/
Upon start up, the algorithm grabs the first image in the sequence toinitialize itstemplate. Thismaytakearelatively longtime,averagingupto2minforamid-rangePC.Onceini- tialized,thesubsequentupdatesandfeatureextractionthrough successive frameswasquitefast, achievingclosetoreal-time performance.
Weusedanimplementationof CLM-Zwithagenericface templateforthemarker-lesssequences.CLM-Ztrackedpoints roughlyintheregionoftheappropriatefacialfeatures.CLM-Z copedwellwithheadtiltingintheneutralexpressionstate.There werenojumpsinfeaturepointtrackingonstaticsequencesas mostpointsremainedintheneutralstate.However,wefound
Fig.7.2-DCLMfacefeaturedetection:testsubject:Trevor.Fromlefttoright, showing2-DCLMtrackingresultsforaneutralandsurprisefacialexpressions.
Fig.8.ExamplesoftrackingerrorswithCLM-Zacrossfourdifferentsequences.Oftentheeyesaremisaligned,facialhairconfusesthesizeandlocationofthe mouth,andthefineshapeofthemouthisalsolost.
thatCLM-Zfrequentlyfailedtobeinperfectsynchronization withthe images, oftenproducing lagging or misaligned fea- tures (see Fig. 8). On sequences with full facial expressions evolving, the detectedeyeshape seemedto bemuch smaller thanthe actualeye. Aswell, themouth was ofteninaccurate withfacialhair confusedasamouthinsomecases.Tracking stopped followingthemouth forwide smilesor if themouth waswideopen,e.g.surpriseactions(seeFig.9).Acrossthe7 sequences,CLM-Zfailedtoaccuratelyfindeyeshapeandloca- tionin33%oftheimages(from5%to100%errorsacrosseachof the7sequencesformouthshapeextractionandbetween0%and 100%erroracrosseachofthe7sequencesforeyeshapeextrac- tion), 67% for mouth shape and location, respectively. None of thesequencesrecorded perfect extractionof facefeatures, necessitatingtheuseofmanualextractionforsomecases.CLM- Zfailedwhenfacial hairwasencountered; eyedetectionwas weakinmostcasesandmouthcornerdetectionfailedwithwide smilesorunexpectedshapes,afeaturecommontomostcontour basedandappearancebasedtechniqueswhendealingwiththe mouth.
8.1.4. Marker-basedtracking
For our marker-based system, we used our sequences withmarkers. The markers were first segmentedusing color
Fig.9.3D-CLM(CLM-Z)facefeaturedetection:testsubject:Trevor.From lefttoright,showingCLM-Ztrackingresultsforaneutralandsurprisefacial expressions.
segmentation inHSV colorspace,thena basicblobtracking algorithmwasusedtotracktheblobsacrosssubsequentframes.
Onalltestsequences,themarker’sbluecolorwasclearlydis- tinctfromfacialfeaturesallowingittobetrackedacrossframes easilyandwithhighaccuracy.Markerswereonlylostwhenthe subject’sheadwasrotatedsuchthatthemarkerwashiddenfrom thecamera,orwhenthesubject’sheadmovesoutofthefieldof viewofthecamera.Incontrasttothebest-performingCLM-Z, only3.8% ofthe 3000images of ourdatabase failedtohave all 13markers trackedacross bothcameras,with allbut two sequenceshaving 100%perfect extraction.Overall, ofall the markerlessapproachestried,CLM-Zhadaslowstart-uptime butperformed thebest,sinceit waseasytoseethe dominat- ingexpression.AAMperformedtheworstduetopoortraining.
Thepointsextractedinallthreemodelswerefar fromperfect withthe most accuratefiducial pointsextractionachievedby ourmarker-basedapproach.Facialhair,lightingandskintone aswellaspronouncedopenmouthexpressionsheavilyaffected facefeaturetrackingformarkerlesstechniques.Ofnoteisthat our markers were createdfrom scratch using bluepaper and skin-friendlyglueataclosetozerocost,incontrastwithmore professional marker requirements. Most failed images origi- nated from blue colorsaturation around some markersagain consideringafastsetupusingoff-the-shelftripod,lighting,cam- erasettings,backgroundandsubjectspositioning.Followingthe frameworkdevelopedinWoodwardetal.(2012)andourlatest resultsinstereomatchingpublishedinGimel’farbetal.(2012), the stereo cameras were calibrated to remove lens distortion andrectify theimages. Thecameras wereautomatically syn- chronized over the firewire bus. Thisensured stable tracking ofmarkersandnoclock driftoccurred.Stereomatching(1-D BeliefPropagation)was thenappliedtoproducedepthmaps.
Asmentionedearly,thiswasusedforCLM-Z,toobtainthe3-D coordinatesofanypointonthefaceandthe3Dcoordinatesof our13markersmodel.
9. Experimentalresultsandsystemanalysis
AselectionofresultsareshowninFig.10;threetestsubjects performed the six universally recognizable facialexpressions (Ekman&Friesen,1971)infrontofthesystem,apersonalized 3-Dmodelwascreatedfromtheirfacesandtrackingdatadrove thefacialanimationsystem.
Experiments show that the proposed system is capable of reproducing facial expressionsfrom marker motion.From
Fig.10.Expressioncreationforarangeofsubjectsand.Foreachsubfigure,fromlefttoright,showingtheoriginalleftimagefromthestereopair,thevirtualresult, andtheassociatedmeshandmotionvectorsofthemarkerpointsfromaneutralstate.
running the system it was found that different test subjects articulatedthesameexpressionusingdifferentmuscles.Also, itwasdifficultforatestsubjecttoperformanexpressionwhen noemotional tiewas involved.Thinkingabout whatmuscles tomoveandgeneratinganexpressionsometimesrequiredself experimentationandlookingatone-selfinthemirror.Therefore itiseasytounderstandwhythereisaneedfordirectorsinmovie performancecapturesituations.
Detailsinimportantareasofthefacethatarenotcurrently modeledincludetheeyelids,lips,teethandinnermouth.This
impacts onthe visualaccuracyof thevirtual expressionsand futureworkwouldlooktoimprovethefacemodelintheseareas.
ThislossofdetailcanbeseeninFig.9,Matthew,whereimpor- tant cuesfor the expressionsare conveyed through the eyes.
The surpriseexpressionfortestsubjectMatthew(Fig.10(i)), clearlyshowsthelossofinformationinthemouthregionand howimportantthisregioncanbeforconveyingexpressions.
For Matthew’s disgust expression, inFig. 10 (j), shadow- ing cuesare lostaroundthe flangesof thenose inthevirtual expression,despitethemotionvectorsbeingcorrectlycaptured.
Duringaperformance,somepeoplehadbettercontrolover certainmusclesof their facethan others.An example of this wasthecontroloftheinnerandouterfrontalismusclesofthe foreheadthatinfluencetheeyebrows,ortheindividualcontrol ofthelabiinasiontheflangesofthenose.FortestsubjectAli therewaslittlemovementinthemotionvectorsforhisexpres- sions, reflecting intheir poorerrepresentations in the system whencomparedtoothertestsubjects.Conversely,testsubject Alex’sexpressionimagesandmotionvectorsshowedlargescale movement,reflectingthevisualcorrectnessinexpressionrepro- ductionshowninFig.10 (c)–(f).
Finetuningof the presetmuscle locations andparameters when mapping a new face model was sometimes needed to improveresultsortocorrectmuscleswithtoogreatanactivity onthemesh.Anexampleofthisistheasymmetryinfacemove- mentfortestsubjectAli’seyebrowswheretherighteyewould oftenappearslightlylargerafteranexpressionwassimulated(as seeninFig.10(a)).Incorrectasymmetryintheeyebrowsandin thesadnessexpressionwasalsonoticeableforAlex(Fig.10(d)).
Thisasymmetrycanbeunderstoodbyconsideringthemapping procedure.Inrealityeachperson’smuscleshaveslightlydiffer- entlocationsandstrengths,thusmoveinslightlydifferentways -asmallamountofpositionalandmorphologicaldifferenceisto beexpectedbetweenthesamemusclesindifferentpeople.One waytoimprovethiswouldbetosetplausiblelimitsofmuscle contractionrangesandletthesystemgothroughatrainingphase whereitwouldrescaleitsactivitybasedonasubjectperforming arangeofexpressions.
Oneofthemainissueswiththeexpressionsystemisthatthere arepotentiallymultiplesolutionsforavertexpositionwhenitis undertheinfluenceofmorethantwomuscles,aneffectthatis amplifiedwithnoisydata.Inthissituation,equalcontributions ofmuscle contraction couldbeassumed, butthismaynotbe the case in reality if some muscles are stronger than others.
Thisambiguitydoesnotexistwhenhandcraftedexpressionsare createdfor animation.Two opportunitiesfor improvingupon thisissuearethecreationofamoreadvancedanimationsystem thatmayincorporatepriorlearning,e.g.theuseofarestricted setofbasisexpressions,andinvestigatingtheeffectofadding moremarkers.
Amarkerbasedsystemcanonlypickupthesparsemovement ofthefacesincetrackedpointsarenotdenseenoughtoaccount
forfinemovement.Thiswasdifficultassomepeoplenaturally hadminimalfacemotionforsomeexpressions.
Cameranoisewasmoreapparentindarkregionsandaround edges.ColorpredicatedetectionusedtheYUVcolorspace,so lowintensity regionsweremoreambiguousandwerethresh- olded out of consideration. Noise in the detection of 2-D marker positions affected the quality of 3-D triangulation, giving fluctuations in the location of marker centroids. This was dealt with through temporal smoothing as described in Section 5. The majority of face motion can be described in 2-D andtheerrorfrom thetwo camerasinfluencesthedeter- mination of marker location, however a 2-D system would find it much harder to account for rigid 3-D head motion andany slight shiftingof perspective in theface as the head moves.
The RBF approach provided an effective means of map- ping markersto aface model. There isno restriction on the choice of face model and it does not have to represent the testsubject.RBFisinexpensiveandthemappingcoefficients for aparticular setupneed only be estimated once based on a neutral face expression. However, asymmetries in the 3- D mapping process may affect expressions. This is due to positionaldifferencesintheplacementofmarkersonatestsub- ject’sfacewithrespecttothe selectedvertices ofthe generic model.
10. Expressionestimation
Facial expressionscan be handcrafted using the face ani- mationsystemandsettingmusclecontractionparameters. An experimentwasperformedtotestifthemarkertrackingsystem could be used toestimateagiven expressionfroma prebuilt set of expressions. The setof six universalexpressions were modeledwiththeanimationsystemandsaved.Thetestsubject thenaimed togeneratetheseexpressionsas best as possible.
Fig.11displaysthesixuniversalexpressionsthatwerecreated withtheanimationsystem.
A simple method of expression estimation from marker motioncapturecanbeperformedbytakingtheEuclideandis- tancebetweenthecurrentmusclecontractionvectora,derived fromthecurrentmarkerconfiguration,andacollectionofpre- built expressionsbi, i∈ [1, 6], described in the same form.
Fig.11.Thesixuniversalexpressionsprebuiltintothesystem:(a)happiness,(b)sadness,(c)surprise,(d)fear,(e)disgust,(f)anger.
0 0.5 1 1.5 2 2.5
2.66 2.31 1.99 1.66 1.31 0.99 0.66 0.31 0.00
Time (s)
Distance
a. Happiness
0 0.5 1 1.5 2 2.5
2.66 2.33 1.99 1.66 1.33 0.99 0.66 0.33 0.00
Time (s)
Distance
b. Sadness
0 0.5 1 1.5 2 2.5
2.66 2.33 2.00 1.66 1.33 1.00 0.66 0.33 0.00
Time (s)
Distance
c. Surprise
0 0.5 1 1.5 2 2.5
2.66 2.33 2.00 1.66 1.33 1.00 0.66 0.33 0.00
Time (s)
Distance
d. Fear
0 0.5 1 1.5 2 2.5
2.67 2.34 2.00 1.67 1.34 1.00 0.67 0.34 0.00
Time (s)
Distance
Happiness Sadness Surprise Fear Disgust Anger e. Disgust
0 0.5 1 1.5 2 2.5
2.67 2.33 2.00 1.67 1.34 1.00 0.67 0.34 0.00
Time (s)
Distance
f. Anger
Fig.12.Euclideandistancesofmusclecontractionvectorfrompresetexpressions;testsubjectAlex.
It should be noted that because each individual expresses themselvesdifferently,futureworkshouldcreateanextensive databaseofprebuiltexpressionsthatcanbetterrepresentvaria- tionswithineachclass.
Fig.12displayscurvesofeuclideandistanceovertimefrom eachpresetexpressionforasetofcapturedsequencesinwhich
thetestsubject,Alex,performedthehappiness,sadness,surprise, fear,disgust,andangerexpressions.Thelesserthedistancethe closeracapturedmarkersequenceistoapresetexpression.
Thetestsequencesbeganwiththesubjectinaneutralface stateandthenmovedtowardanexpressionatmaximalmagni- tude.Itshouldbenotedthatthetailofthecurvesinthesequences