Availableonlineatwww.sciencedirect.com
Journal of Applied Research and Technology
www.jart.ccadet.unam.mx JournalofAppliedResearchandTechnology14(2016)300–310
Original
Humanoid robot path planning with fuzzy Markov decision processes
Mahdi Fakoor
∗, Amirreza Kosari, Mohsen Jafarzadeh
FacultyofNewSciences&Technologies,UniversityofTehran,Tehran,Iran Received31October2015;accepted4June2016
Availableonline21August2016
Abstract
In contrasttothe caseofknownenvironments,pathplanninginunknownenvironments,mostlyforhumanoidrobots,isyetto beopened forfurtherdevelopment.Thisismainlyattributedtothefactthatobtainingthoroughsensoryinformationaboutanunknownenvironmentisnot functionallyoreconomicallyapplicable.Thisstudyalleviatesthelatterproblembyresortingtoanovelapproachthroughwhichthedecisionis madeaccordingtofuzzyMarkovdecisionprocesses(FMDP),withregardtothepace.Theexperimentalresultsshowtheefficiencyoftheproposed method.
©2016UniversidadNacionalAutónomadeMéxico,CentrodeCienciasAplicadasyDesarrolloTecnológico.Thisisanopenaccessarticleunder theCCBY-NC-NDlicense(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Keywords: FuzzyMarkovdecisionprocesses;Visionbasedpathplanning;Pathplanninginunknownenvironments;Humanoidrobots
1. Introduction
Theword“robot”originatesfromrobotainCzechlanguage, robotameans work.So,weexpectrobots toworklikeexpert labors.Intheotherwords,wewantrobotstoworkinindustry insteadofhumanlaborforces.Sincehumanshaveappropriately builttheirenvironmentswiththeirergonomic,robotsthathave samephysicalbodyashumanbodiesaremoreusefulthanother types.
In ordertodesignhumanoidrobots whicharesuitable for working efficiently in industrial applications, several control andnavigationproblemssuchasstability,mapping,interacting, computing,grasping and pathplanning must be investigated, mostespeciallypathplanningwhichisanopenprobleminthis researchfield.
Thereisneed forusersandindustries tohavearobotthat canworkintherealworld.Sincetherealworldisdynamic;it isimpossibletosaveallenvironmentsinrobotmemory.This means that we must prepare the robots towork inunknown environment.
∗Correspondingauthorat:FacultyofNewSciences&Technologies,Univer- sityofTehran,NorthKargarStreet,1439955941Tehran,Iran.
E-mailaddress:[email protected](M.Fakoor).
PeerReviewundertheresponsibilityofUniversidadNacionalAutónomade México.
Path planning has been studied in several research fields.
Althoughprogrammingofamobilerobottomovefromanini- tial positiontoatarget positioninaknown environmentisa well-knownprobleminrobotics,buttherearefewmethodsin pathplanninginanunknownenvironment.
Medina-Santiago, Camas-Anzueto, Vazquez-Feijoo, Hernández-de León, and Mota-Grajales (2014) implemented neural controlsystemsinmobilerobotsinobstacleavoidance usingultrasonicsensorswithcomplexstrategies.
Fuzzylogicisoneofthesemethodsthathavebeenusedin somerelatedprojects.Afuzzy-basednavigator hasbeenpro- posed by Zavlangas, Tzafestas, andAlthoefer (2000)for the obstacleavoidanceandnavigationprobleminomni-directional mobile robots. Their proposed navigator consideredonly the nearestobstacleinordertodecideupontherobot’snextmoving step.Thismethodutilizedthreeparametersforpathplanningas follows:
• thedistancebetweentherobotandthenearestobstacle,
• theanglebetweentherobotandthenearestobstacle,and
• the angle betweenthe robotdirection andthe straight line connectingthecurrentpositiontothetargetpoint.
Althoughthismethodwasinrealtimeandtrulyworks,these threeparametersarenotneededforacamerahumanoidrobot.
http://dx.doi.org/10.1016/j.jart.2016.06.006
1665-6423/©2016UniversidadNacionalAutónomadeMéxico,CentrodeCienciasAplicadasyDesarrolloTecnológico.Thisisanopenaccessarticleunderthe CCBY-NC-NDlicense(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Inotherwords,thismethodcanbeutilizedforanyrobotthat hasomni-directionalrangesensors.
Fatmi,Yahmadi, Khriji,andMasmoudi(2006) proposeda usefulwayofimplementingthenavigationtaskinordertodeal withtheproblemofwheeledmobilerobotnavigation.Intheir work,issuesofindividualbehaviordesignandactioncoordina- tionofthebehaviorswereaddressedutilizingfuzzylogic.The coordinationtechniqueusedinthisworkcomprisestwolayers, i.e.,layerofprimitivebasicbehaviorsandthesupervisionlayer.
Theyused14rangesensorstoachievethepositionofanyobsta- clesurroundingthemobilerobot.Thus,thismethodcannotbe employedwithmosthumanoidrobots.
Medina-Santiago, Anzueto, Pérez-Patricio, and Valdez- Alemán (2013) presented a real-time programming for a prototyperobottocontrol itsmovementfromonemomentto anotheronewithoutshowingresponsedelays.
Iancu,Colhon,andDupac(2010)presentedafuzzyreasoning methodofaTakagi–Sugenotypecontrollerwhichwasapplied intwowheelsautonomousrobotnavigation.Thismobilerobot isequipped with asensorial system. The robot’ssensor area isdividedintosevenradialsectorslabeled:largeleft,medium leftandsmall leftfor the leftareas, EZ for thestraight area, andlargeright,mediumrightandsmallrightfortherightarea, respectively.Eachradialsectorwasfurtherdividedinto other threeregions:small,mediumandlarge.Thesensor’srangecould recognizeupto30m,andtherobotcouldidentifyanobstacle anywhereinsidethe interval[−90◦, 90◦].Undoubtedly,most humanoidrobots are notequipped withthis largeamount of sensors,hence,applyingthismethodonhumanoidrobotsseems tobeimpossible.
Thesimplestpathplanningalgorithmsforanunknownenvi- ronment are called Bug algorithms (Lumelsky & Stepanov, 1984).Bugalgorithms solve the navigationproblem bystor- ingonlyaminimalnumberofwaypoints,withoutgenerating afullmapoftheenvironment.Traditionalbugalgorithmswork withonlytactilesensors.Newbugalgorithms,suchasDistbug (Kamon&Rivlin,1997),Visbug(Lumelsky&Skewis,1990), Tangentbug(Kamon, Rimon, andRivlin, 1998) andSensbug (Kim,Russell,andKoo,2003),workwithonlyrangesensors.
Bugalgorithmsneedtocontinuouslyupdatetheirpositiondata, whileasweknow,itisimpossibletoachieveacontinuousupdate inpractice.Moreover,thebugmodelmakessomesimplifying assumptionsabouttherobot,i.e.,therobotisapointobject,ithas perfectlocalizationabilityandithasperfectsensors.Thesethree assumptionsareunrealisticforrobots,andsobugalgorithmsare notusuallyappliedforpracticalnavigationtasksdirectly.
Michel,Chestnutt,Kuffner,andKanade(2005)proposeda path planning algorithm for humanoid robots. They used an externalcamerathatshowedthetopviewoftherobotworking region,hence,theycouldextractinformationabouttheposition oftherobotsandtheobstacles.Theirmethodisnotapplicable formostsituationsbecauseitisimpossibletouseacamerawith aglobalviewoftherobotworksites.
Furthermore,NakhaeiandLamiraux(2008)utilizedacom- binationofonline3Dmappingandpathplanning.Theyutilized a3D occupancygrid that is updated incrementally by stereo visionfor constructingthemodel of theenvironment.Aroad
map-basedmethodwasemployedformotionplanningbecause thedimensionoftheconfigurationspacewashighforhumanoid robots. Indeed,it wasnecessarytoupdatethe roadmapafter receivingnewvisualinformationbecausetheenvironmentwas dynamic. Thisalgorithm was tested on HRP2. As a conclu- sion,theirmethodwasnotefficientbecauseitneedsexactstereo visionandalotoftimetofindapathineachstep.
Inaddition,Sabeetal.(2004)presentedamethodforpath planning and obstacle avoidance for QRIO humanoid robot, allowing it to walk autonomously around the home environ- ment.They utilizedanA* algorithmintheirmethod; thus,it needsalotoftimetoprocess.Furthermore,theyutilizedonline mappingandstereovision. Theirmethodseemseffective,but itneedsstereovisionandhighcomputationalprocesses.Asa result,itcannotbeappliedinmostconditions.
OtherpathplanningprojectonHRP-2humanoidrobothas been carried out by Michel et al. (2006). This method uses severalcamerasintherobotenvironmentinordertoproduce asuitablemap. Asmentionedbefore,we cannotutilize many cameraswhereverwewanttousehumanoidrobotsandsothis methodcannotbeappliedeither.
Moreover,inanotherproject,Chestnutt,Kuffner,Nishiwaki, andKagami(2003)usedtheBest-firstsearchandA*algorithms forfootsteppathplanningonaH7humanoidrobot.theydemon- stratedthatA*ismoreeffectivethanBest-firstsearch.Butboth ofthemneedstereovisionandhighcomputationalprocesses.
Inaddition,Okada,Inaba,andInoue(2003)followedadiffer- entrouteforhumanoidrobotpathplanning:robotandobstacle wereconsideredcylindricalshapes,thefloorwasextractedbased onvisionandthentherobotmadeadecision.Thismethodmay encounteraconflictingproblemwhentherobotconfrontsabig obstacleatitsstartpoint.Inthissituation,therobotcouldnotbe abletodetectthefloorwhichinturnleadstomissingthepath.
In addition, Gay, Dégallier, Pattacini, Ijspeert, andVictor (2010)usedartificialpotentialfieldalgorithmsinarecentpath planningprojectoniCub(ahumanoidrobot).Atfirst,intheir proposedalgorithm,iCubcalculated3Dpositionofeachobsta- cleandtransformeditin2D,andthentheartificialpotentialfield wascalculated. Theirmethodneedsperfect knowledgeof the extractedimagestofindthepositionoftheobstacles;therefore, itmaynotbeutilizedinsomehumanoidrobotsapplications.
Asseenfromtheabovestudy,anefficientandsuitablemethod for pathplanning for humanoid robots in dynamic unknown environmentshasnotyetbeenproposed.Consideringtheiden- tifiedresearchgapinthispaper, anewprocedure, combining theeffectsoffuzzyinferencesystemandMarkovdecisionpro- cesses,isproposed.
TheapplicationofMarkovdecisionprocessesresultsinfaster executionoftheprocedurewhencomparedtotheonesproposed instudiessuchasthosebySabeetal.(2004).Moreover,inthe proposed method, using a fuzzy inference system leadsto a smootheroptimalpaththantheotherpreviouspastmethods.
Moreover, resorting to fuzzy Markov decision processes (FMDP)obviatesthenecessityofhavingknowledgeofthepre- ciseshape,positionandorientationofthesurroundingobstacles, as wellas theneedforrelativelyenormousvolumes ofmem- oryforstockinginformationgatheredin2Dand3Dmaps.The
Fig.1.Aldebaranhumanoidrobotics–NAOH25V4.
extendedexperimentalresultsdemonstratetheefficiencyofthe proposedmethod.
2. Functionalblockdiagram
AnAldebaranhumanoidrobot–NAOH25V4(Fig.1)–was selectedfordescriptionandverificationofourpresentedmethod.
Thishumanoidrobothasacamera,whichistheonlysourceof environmentalsensoryinformationusedfortheanalysisinthis approach.Aftertaking animage,itpassesthroughalow-pass filterinordertoobviatetheeffectof theconcomitantnoises.
Thereafter,theimageissegmentedandsubsequently,inorderto clearuptheeffectofthenoisecausedbysegmentation,theimage passesthroughamodefilter.Inaddition,thedilationprocessis appliedsothatthefinalimagecanbeproduced.
Moreover,aftercomputingtherewardsassociatedwitheach partof the image,the Markovdecisionprocessesserve asan inputfor the fuzzy inferencesystem, so as to feedthe robot withtheinformationneededfordecidingonthedirectionandits movement.ThefunctionalblockdiagramofFMDPisillustrated inFig.2.
The relatedflowchart of the visionsystemis presented in Fig.3.
Camera took an image
Low-pass filter
Segmentation
Median filter
Computed probability of presence of obstacle, free space
and target in each square Mesh Dilation
Fig.3.Flowchartofvisionsystem.
3. Theoreticalbackground 3.1. Markovdecisionprocesses
Markov decision processes (MDP) can be considered as anextensionof Markovchainswithsomedifferencessuch as allowingchoiceandgivingmotivation.MDP, isamathemati- caldecisionmakingtoolwhichmaybeapplicableinsituations whereseriesarepartlyrandomandpartlyunderthecontrolofa decisionmaker.
Specifically, aMarkovdecisionprocessmaybe definedas a discrete timestochasticcontrol process. Ateach timestep, the process isinstate s,andthe decision makermay choose anyadmissibleactionawhichisachievableinthestates.The process proceedsatthe nexttimestepbyrandomly selecting andmovingintoanewstates,andgivingthedecisionmakera correspondingrewardR(s,a,s).Theprobabilitythattheprocess departs towarditsnewstates maybeaffectedbythe chosen action. Specifically,itisgivenbythestate transitionfunction P(s,a,s).Thus,thenextstatesmaydependonthecurrentstate sandthedecisionmaker’sactiona.Butgivensanda,itiscon- ditionallyindependentofallpreviouschosenstatesandactions;
The approximate coordinates of the target
Vision system
Localization system
Calculator
Markov Fuzzy
inference system
Motion Systems Optimal
policy extractor
VF∗ a∗
Decision Processes
Value function calculator
Environment Reward
R
Fuzzy markov decision processes
Fig.2.FunctionalblockdiagramofFMDP.
inotherwords,thestatetransitions ofanMDPcouldpossess theMarkovproperty(Puterman,2014).
3.1.1. Policy
Findinga“policy”forthedecisionmakeristhemainprob- leminMDPs.“Policy”canbeinterpretedasafunctionπthat specifiestheactionπ(s)whichthedecisionmakerwillchoose whenisinstates.
The goal is to select a policy π which could maximize somecumulativefunctionoftherandomrewards,typicallythe expecteddiscountedsumoverapotentiallyinfinitehorizon.
3.1.2. Valuefunction
ValuefunctionVπ(s)istheexpectedvalueoftotalrewardif thesystemstartsfromstate sandacts accordingtopolicyπ.
Soeach policyhasitsvaluefunction.Itcanbeformulatedas follows:
Vπ(s)=E
R(s0)+γR(s1)+γ2R(s2)+··· |π
(1)
Vπ(s)=E
N
t=0
γtR(st)|π
(2)
Eq.(1)couldberewrittenasfollows:
Vπ(s)=E
R(s0)+γ(R(s1)+γR(s2)+···) |π
(3) Vπ(s)=E
R(s0)+γV(s1)|π
(4) TheBellmanequationscanbegainedbythesimplification ofEq.(4)asfollows:
Vπ(s)=R(s0)+γ
s
P(s,a,s)Vπ(s) (5)
3.1.3. Optimalpolicyandoptimalvaluefunction Intheoptimalcase,wewillhave:
V∗(s)=R(s)+max
a γ
s
P(s,a,s)V∗(s) (6)
π∗(s)=argmax
a
γ
s
P(s,a,s)V∗(s) (7)
Iftherearenstates,thentherearenBellmanequations,one foreachstate.Inaddition,therearenunknownvalues.There- fore, by simultaneously solving these equations, the optimal policyandoptimalvaluefunctioncanbeachieved.
3.1.4. Valueiterationalgorithm
As we can see, the Bellman equations (6) are nonlinear, hence they are difficult to solve. In this case, we utilize an algorithmtoextractthe optimalpolicyoroptimalvaluefunc- tionwithoutdirectlysolvingtheBellmanequationsasshownin Algorithm1.
3.2. Fuzzylogic
Zadeh(1965)explainedfuzzysettheoryandfuzzylogic.In contrasttotraditionallogicthatemploysfixedandexactvalues, fuzzylogicusesapproximatevalues.Inotherwords,traditional logichastwo-valuedlogic,falseortrue(0or1)butfuzzylogic hasinfinite-valued,intervalbetweencompletelyfalseandcom- pletelytrue(interval[0,1]).Itissimilartolinguisticvariablesthat usemulti-valued.Therefore,wecanutilizelinguisticinference in fuzzylogic. Fuzzyrules usefuzzy sets and everylinguis- tictermhasamembershipfunctionthatdefines thedegreeof membershipof aspecificvariableforthe fuzzyset.Member- shipfunctionsareusuallyshownwithμ(x).Afuzzyinference systemisasystemthathasafuzzyinferenceunitasillustrated inFig.4.
In thereal world, aspecifiedsystemgains itsneeded data viaitssensors.Asweknow,thegathereddatathrowsensorsare crispandthereforeforfuzzyinferencesystems,afuzzifierunit (preprocessingunit)shouldbeimprovisedtochangeacquired dataintofuzzydata.Also,actuatorsneedcrispdata;becausethe inferenceunitgivesfuzzy data,fuzzyinferencesystemshave adefuzzifier unit (postprocessing unit).There are fewtypes of fuzzyinferencesystems,MamdaniandTakagi–Sugenoare morewidelyknown thanothertypes(Kaur andKaur,2012).
Mamdanirulesareinterpretedasfollows:
Ri: If x1=Ai1 and xm=Aim Theny=Bi
AndforTakagi–Sugenoruleswehave:
Ri: If x1=Ai1 and ... and xm=Aim
Then y=fi(x1,x2,...,xm)
These rules could be defined by experts. But there is no standard method for explaining database in fuzzy Inference systems. Also, there is no standard method for defining the membershipfunctiontominimizeerrorormaximizetheoutput accuracy.
Output Defuzzifier
Inference Fuzzifier
Rule
Input
Fig.4.Fuzzyinferencesystem(Stieler,Yan,Lohr,Wenz,&Yin,2009).
4. Visionsystem
4.1. Preprocessing(low-passfilter)
Mosthumanoidrobotshaveatleastonecameratoseetheir environment.Robotsneedsomeprocesstopercepttheirenvi- ronmentthroughrawdatagainedbytheirsensors.Camerasgive a3Dmatrixandeacharrayofthecamerahasavaluebetween0 and255andasweknow,mostofthesedataarenotnecessary.
Also,thesedataaremixedwithnoise.Imagenoiseisarandom variationof brightness or colorinformationinimages and is usuallyanaspectofelectronicnoise.Therefore,thesedatamust firstpassthroughalow-passfilter(s).
Therearemanylow-passfiltersfor imagenoisecanceling.
MostroboticsprojectsemploytheGaussianfilterthatproduces asmoothblurimage.Themedianfilterisanotherfilterwidely usedfor imagenoisecanceling; however,allsmoothing tech- niquesareeffectiveatremovingnoise,buttheyadverselyaffect edges.In otherwords, atthe sametime, as noiseis reduced, smoothedgeswillbecreated.Forsmalltomoderatelevelsof Gaussiannoise,themedianfilterissignificantlyabetterchoice thanGaussianbluratremovingnoisewhilstretainingtheedges foragivenfixedwindowsize(Arias-Castro&Donoho,2009).
Therefore,inthismethod,therobotutilizedamedianfilterfor noisecancelingasdescribedinAlgorithm2.
4.2. Imagesegmentation
Inthepathplanningprocess,robotsneedtounderstandthe locationof obstaclesandfreespaces,buttheydonotneedto understandthecolorofeachpixel.Inthisway,therobotmust segmentimages.Themainpurposeofthesegmentationprocess maybetosimplifyandchangetherepresentationofanimage into something that is moresignificant andeasier toanalyze (Shapiro&Stockman,2001,Chap.12).Inthiscase,obstacles wererevealedbyredpixels,freespacewasrevealedbygreen pixels,andundefinedpixelsrevealothercolorsthatarecloser toit.
There are various image segmentation algorithms such as thresholding, clustering, histogram-based, edge detection, region-growing andgraph partitioningmethod. Theselection of an image segmentation algorithm is related to where the robotworks.Forexample,inourcasestudy,weexaminedthe
proposed methodologyinour Robotic Lab,thusthe simplest imagesegmentationalgorithmworksverywell.
4.3. Improvementsegmentation(modefilter)
Segmentationmayproducesomenoises;therefore,afterseg- mentation,the producedimages must passthrough the mode filterwiththeAlgorithm3defined.
Theseprocessesmadeasimpledataofanyimage(Fig.5).
4.4. Imagedilations
Dilationmeansexpandingobstaclestoobtainconfiguration space(Chosetetal.,2005).Inotherwords,dilationincreasesthe effectsofanobstaclebydevelopingthevolumeoftheobstacle.
Fig.5.Originalimagesincomparisonwiththesegmentedandfilteredimage andthefinalones.
Fig.6.Exampleofeffectofcameraangle;topandperspectiveview.
4.5. Mesh
Inthesituationsinwhichtheangleoftherobotheadiscon- stantandthegroundisflat,eachpixelshowsthesamedistance.
Weused thispropertytoapproximatethe distancesfromele- mentsofobstacles.Sincetherobothasaspecificphysicalsize andpredefinedintervalfootstep,theexactpositionofeachpixel isnotthat important.Therefore,we shouldmeshtheimages.
Noweverysquareoftheimageshowsinformationonthepres- enceoftheobstacleinrelativeposition.
Becausethecameraislocatedintheheadofthehumanoid robotandittakesaspecifiedanglewiththeground,theimage picturebytherobotisaperspectiveview.Fig.6illustratesthe effectoftheperspectiveviewonacheckerboard.Itrevealsthat themeshesarenotsquare.Whileitisexpectedthattheresultof therobotvisionmustbeatsomedistancefromtheobstacles,but forcomputeddistancefromtherobots’simage,themeshmustbe non-homogeneous.Inotherwords,thenumberofpixelsineach boxatthebottomoftheimagemustbegreaterthanthenumber ofpixels ineach boxatthetopof theimage;inaddition,the numberofpixelsineachboxatthemiddleoftheimagemust begreaterthanthenumberofpixelsineachboxonthesidesof theimage.
4.6. Probabilitycalculation
TheMarkovdecisionprocessworksondiscretestates.There- fore,Markovdecisionprocessesneedtounderstandthepresence oftheobstacleineachstate.Thebestwaytoshowtheexistence ofobstaclesinastateisbydeterminingtheprobabilityofthe existence.Inthisway,therobotdividesthenumberofobstacle pixelsineachsquarebythenumberofallpixelsinthesquare.
Furthermore,Markovdecisionprocessesneedtounderstandthe presenceof the obstacleineach state, whichcouldbe calcu- latedinasimilarway.The followingrelations fully interpret thisconcept:
Pobstacle(i,j)= numberofobstaclepixelsinsquare(i,j) numberofallpixelsinsquare(i,j) (8)
Ptarget(i,j)= numberoftargetpixelsinsquare(i,j)
numberofallpixelsinsquare(i,j) (9)
Pfreespace(i,j)=numberoftargetpixelsinsquare(i,j)
numberofallpixelsinsquare(i,j) (10)
5. Rewardcalculation
Reward calculationis related toobservation of the target.
Iftherobotcanseethetarget, itcanuseonlytheinformation extractedfromtheimage;however,iftherobotcannotseethe target(becauseoflongdistance),inadditiontotheseinforma- tion,itneedsextrainformationonthecoordinationofthetarget anditselftocreateasub-goal.
5.1. Sub-goal
Asub-goalisdefinedasavirtualgoalinvisionspacesuch thatachievingitcouldguidetherobottowardtheoriginaltarget.
Fig.7illustrateshowthesub-goalstatecouldbecalculated.As wecansee,ifthedarkgreencircleisconsideredtobethetarget, thenthestateinlightgreenisdefinedasthesub-goal;inasimilar way,ifthebluecircleisconsideredtobethetarget,thenthestate incyanisknownasthesub-goalstate.
5.2. Rewardwhentherobotcannotseetheobstacle
Inthiscondition,thefreespacehas−w1point,theobstacle has −w2 point, andthe sub-goal has +1point;therefore, the rewardfunctioncouldbecalculatedasfollows:
R(i,j)=Psub-goal(i,j)+w1Pfreespace(i,j)
+w2Pobstacale(i,j) (11)
5.3. Rewardwhentherobotcanseetheobstacle
Inthiscondition,thetargethas+1,thefreespacehas−w1
point,andtheobstaclehas−w2point;thus,therewardfunction couldbecalculatedasfollows:
R(i,j)=Ptarget(i,j)+w1Pfreespace(i,j)+w2Pobstacale(i,j) (12)
6. DesignedFuzzyMarkovDecisionProcess
AFuzzyMarkovDecisionProcess(FMDP)representsthe uncertainknowledgeabouttheenvironmentintheformofoffer- responsepatternssuchastriangularandGaussianmembership functions.
IntheFuzzyMarkovDecisionProcess(FMDP)designed,the robotbeginsfromitsstartstate,itmustchooseasuitableaction ateachtimestep.Inthisproblem,itissupposedthattheactions whichleadtherobottogooutofitsstatedonotwork,inother words,wecouldsaythattheadjacenthasasimilarreward.The solutionthatiscalledpolicy(π(s))isinfrequencefromthevalue function.ThedesignedpolicyisillustratedinFig.8,whichisa fuzzymappingfromstatetoaction.Indeed,anoptimalpolicy isapolicythatmaximizestheexpectedvalueofaccumulated rewardsthroughouttime.
Fig.7.Exampleofdeterminingsub-goalstate.Darkgreen:target1;blue:target 2;red:target3;lightgreen:sub-goal1;cyan:sub-goal2;pink:sub-goal3.(For interpretationofreferencetocolorinthisfigurelegend,thereaderisreferredto thewebversionofthisarticle.)
6.1. Valuefunction
Unfortunately,inthe realworld, humanoidrobot’sactions are unreliable for some recognizedreasons such as noise of inputdata,ineffectivenesscontrolsystems,un-modeledsystem dynamicandenvironmentchangesthroughouttime;therefore, theprobabilityofgoingfromstatestostatesbychoosingaction awithP(s,a,s)willbe:
s
P(s,a,s)=1, 0≤P(s,a,s)≤1 (13)
Asaresult,therobotmustdecidetoact forincreasingthe probabilityofsuccessthroughouttime.Anexampleof proba- bilityintherobotactionwhentherobotwantstogotothenext statedecidedisillustratedinFig.9.
0.2 0.4 0.2
0.1 0.1
Fig.9.Exampleofprobabilityinactionwhenrobotwantstogotonextstate decide.
At first, it is supposed that the optimalpolicy is equal to thetraditionaloptimalpolicy;hence,thevaluefunctionwillbe achievedbysolvingEqs.(6)and(7).Asmentionedbefore,this nonlinearequationcanbesolvedwithavalueiteration(Section 3.1.4).
Inthenextstep,forgettingthegainedoptimalpolicy,thevalue functionwillbedeterminedbasedondecisionmaking.Inother words, weutilizethe valuefunctionasaninputfor thefuzzy inferencesystem.
6.2. Fuzzification
Asweknow,theextremumofthevaluefunctionisrelated totherewardfunction,whiletherewardfunctionisrelatedto theenvironmentanduserdefinition,thereforethevaluefunction cantakeanyinterval.Butasweknow,thefuzzyinputmustbea vectorwherethearrayisarealnumberbetween0and1.Thus, thefuzzificationisimportantforcontinuation.
Image
processing Image
understamding
Taken image State
Environment
Action
Fuzzy inference Fuzzy
action Value
iteration Fuzzified
value function
Fuzzification Value function Rewards Processed
image Vision system
Fuzzy markov decision processes Fig.8.Dataflow.
r≤√2
r≤ √5 r≤ √10
r≤2
Fig.10.TraditionalMarkovdecision.
Thefirstroutetofuzzificationislineartransformation.This wayisnotlogicalbecausethedifferencebetweenthevalueon theleftsideandrightsideofrobotisnotsignificanttoguide the robotinthe true path. In otherwords, linear transformer fuzzificationforthisapproachisoverfuzzified.
Althoughotherprevalentfuzzificationsuchastriangularand Gaussianmethodscouldresult inasignificantincreaseinthe numberof rules applied inthe fuzzy inferencepart, it isnot advisabletousethem.Thefuzzificationproposedinthisstudy isadrastictransformation.Ifthemaximumandminimumofthe valuefunctionisshownbyaandb,respectively,theproposed fuzzificationcanbewrittenas:
μ(Ai)=
xi−b a−b
n
(14)
wherea and b are the maximum andminimum of the value functionamonginputs,respectively,andxiisthevaluefunction oftheithinput.
Table1 Squaredistance.
Table2 Fuzzyinput.
6.3. Fuzzyinference
AsillustratedinFig.2,thetraditionalMarkovdecisionpro- cess considers onlythe adjacentstates tochoose the optimal policy. Accordingto the traditional approach,the robotmust chooseoneofthecandidatestatesandcontinuethere.Although thispolicyisknownasoptimalpolicy,butitistrueonlywhen discrete states are considered. Becausein traditional Markov decisionprocesstherobotcanchooseactionfromfiniteexist- enceactions,itseemslogicaltoconsideronlyneighborstates.
Nevertheless,intheFMDP,therobotisabletochooseinfinite actionsbecauseitisassumedtobecontinuous.Nevertheless,the
Shoulder offsety
Neck offsetz
Shoulder offsetz
Upper arm length
Lower arm length
Hip offsety Hip offsetz
Thigh length
Tibia length
Foot height
Fig.11.DetailedkinematicsofNAO.Wristjointnotrepresented(Gouaillier etal.,2009).
traditionalprocessisyetapplicablebyconsideringonlyadjacent states;inthiscase,itisnotlogicaltoonlylookatnextstates;in otherwords,itiswisefortherobottoconsidernearerstates,as illustratedinFig.10.Thesquaredistanceofeachstatefromthe robot’sstateispresentedinTable1.
Asaresult,therearemanychoicesfortheselectionofthe numberofinputs.Theexplanationforthisproblemwillcontinue withr=√
10thatresultsin13inputs(Table2).
Iftherobotdirectionisconsideredbyangleϕwhichisdefined astheclockwiseanglefromthefrontsideoftherobot,thefuzzy ruleforFig.10(r≤√
10)canbeeasilywrittenas:
IfA1,then ˆϕ isaverysmallpositive.
IfA2,then ˆϕiszero.
IfA3,then ˆϕ isaverysmallnegative.
IfA4,then ˆϕisamediumpositive IfA5,then ˆϕisasmallpositive IfA6,then ˆϕiszero
G
B
R Fig.12.RGBcolorspace.
Fig.13.NAOrobotpathplanningscenario.
IfA7,then ˆϕisasmallnegative IfA8,then ˆϕisamediumnegative IfA9,then ˆϕisabigpositive IfA10,then ˆϕ isamediumpositive IfA11,then ˆϕiszero
IfA12,then ˆϕ isamediumnegative IfA13,then ˆϕ isabignegative
Therearesomewaystodefuzzifythe ˆϕinordertogainϕ.
Amongtheseways,weightedaverageislogical.
7. Experiments
WeappliedourpresentedmethodonaNAOH25V4thatis producedbytheFrenchcompanyAldebaranRobotics(Fig.1).
The NAO has 25degree of freedom. There are five DOF in eachleg;twointhe ankle,two inthehipandoneatitsknee.
Anadditionaldegreeof freedomexistsatthepelvisforyaw;
nevertheless,itissharedbetweenbothlegs,thatistosay,both legsare rotated outwardor inward, together,using thisjoint.
Moreover,NAOhas6DOFsineachhandand2DOFsonitshead (Fig.11).TheNAOhastwocameras,aninertialmeasuringunit, sonarsensorsinitschest,andforce-sensitiveresistorsunderits feet.NAOwasdesignedtoperformsmoothwalkinggaits,even whenchangingspeedanddirection.Thewalkingspeedmustbe similartothewalkingspeedofa2-year-oldchildofthesame size,whichisabout0.6km/h(Gouaillieretal.,2009).
ThesoftwarearchitecturewasdevelopedusingAldebaran’s NaoQias a frameworkandan extendedcode in C++.NaoQi givesaccesstoallthefeaturesoftherobot,likesendingcom- mandstothe actuators,retrievinginformationfromthe robot memory,andmanagingWi-Ficonnections.Inthisway,weuse Kubuntu12.0.4andOpenCV2.3.1writingprograminC++in Qtcreators.Inthestudyexperiment,therobottooka160×120 pixelimage.Resultsrevealedthattherobotwasabletoachieve itsgoalwithoutcollidingwithanyobstacle.Fig.12illustrates howtherobottrulyworks.Thisfigureillustratesthefactthatthe robotcanworkinnoisyenvironments(Fig.5).
7.1. Euclideandistance-basedimagesegmentation
Pathplanningwastestedinlaboratoryconditions;therefore, a simple image segmentation algorithm could work without any problem. The simplest image segmentation algorithm is Euclideandistance.AssumingthatallpixelsareinRGBcolor space(Fig.12),theneachpixelwillhavethreeCartesiancoor- dinates.DistancebetweeneachcolorcanbecalculatedbyEq.
(15).
D(P,T)=
(Tred−Pred)2+(Tgreen−Pgreen)2+(Tblue−Pblue)2 (15)
In thisalgorithm, the distancesbetweenthe color of each pixelandthecolorofthetargethavebeencalculated.Thenthe colorofthe targetpertainingtothe shorterdistance couldbe treatedasthenewcolorofthepixels.
Manyexperimentshavebeencarriedoutinordertoprovethe effectivenessandcorrectnessofourpresentedalgorithm.Fig.13 illustratesapathplanningscenarioasanexample.
8. Conclusions
In this study, a new successful and effective algorithm hasbeen proposedfor thereal-time optimalpathplanningof autonomous humanoid robots in unknown complex environ- ments. Thismethodusesonlyvisiondatatoobtain necessary knowledge on its surrounding. It was developed by mixing Markovdecisionprocessesandfuzzyinferencesystems.This methodimprovesthetraditionalMarkovdecisionprocesses.The rewardfunctionhasbeencalculatedwithoutexactestimationof thedistanceandshapeoftheobstacles.Wealsousevalueiter- ationtosolvetheBellmanequationinreal time.Unlikeother exitingalgorithms,ourmethodcanworkwithnoisydata.The wholelocomotion,vision,pathplanningandmotionplanningis thusfullyautonomous.Thesefeaturesdemonstratethattherobot canworkinarealsituation.Moreover,thismethodrequiresonly onecameraanddoes notneedrange computing.Themethod discussed ensurescollision avoidanceandconvergence tothe optimalgoal.Thismethodhasbeendevelopedandsuccessfully testedonanexperimentalhumanoidsrobot(NAOH25V4).
Conflictofinterest
Theauthorshavenoconflictsofinteresttodeclare.
References
Arias-Castro,E.,&Donoho,D.L.(2009).Doesmedianfilteringtrulypreserve edgesbetterthanlinearfiltering?TheAnnalsofStatistics,1172–1206.
Chestnutt,J.,Kuffner,J.,Nishiwaki,K.,&Kagami,S.(2003).Planningbiped navigationstrategiesincomplexenvironments.InIEEEint.conf.hum.rob.
Choset,H.,Lynch,K.M.,Hutchinson,S.,Kantor,G.,Burgard,W.,Kavraki, L.E.,etal.(2005).Principlesofrobotmotion:Theory,algorithms,and implementations.
Fatmi,A.,Yahmadi,A.A.,Khriji,L.,&Masmoudi,N.(2006).Afuzzylogic basednavigationofamobilerobot.WorldAcademyofScience.Engineering andTechnology,22,169–174.
Gay,S.,Dégallier,S.,Pattacini,U.,Ijspeert,A.,&Victor,J.S.(2010).Integration ofvisionandcentralpatterngeneratorbasedlocomotionforpathplanningof anon-holonomiccrawlinghumanoidrobot.In2010IEEE/RSJinternational conferenceonintelligentrobotsandsystems(IROS)(pp.183–189).IEEE.
Gouaillier,D.,Hugel,V.,Blazevic,P.,Kilner,C.,Monceaux,J.O.,Lafourcade, P.,...,&Maisonnier,B.(2009).MechatronicdesignofNAOhumanoid.In ICRA’09.IEEEinternationalconferenceonroboticsandautomation(pp.
769–774).IEEE.
Iancu,I.,Colhon,M.,&Dupac,M.(2010).ATakagi–Sugenotypecontroller formobilerobotnavigation.InProc.ofWSEASint.conf.oncomputational intelligence(pp.29–34).Bucharest,Romania:WSEASPress.
Kamon,I.,&Rivlin,E.(1997).Sensory-basedmotionplanningwithglobal proofs.IEEETransactionsonRoboticsandAutomation,13(6),814–822.
Kamon,I.,Rimon,E.,&Rivlin,E.(1998).Tangentbug:Arange-sensor-based navigationalgorithm.TheInternationalJournalofRoboticsResearch,17(9), 934–953.
Kaur,A.,&Kaur,A.(2012).Comparisonofmamdani-typeandsugeno-type fuzzyinferencesystemsforairconditioningsystem.InternationalJournal ofSoftComputingandEngineering,2(2),2231323–3252307.
Kim,S.K.,Russell,J.S.,&Koo,K.J.(2003).Constructionrobotpath-planning forearthworkoperations.JournalofComputinginCivilEngineering,17(2), 97–104.
Lumelsky, V. J., & Stepanov, A. A.(1984). Navigation strategies for an autonomous vehicle with incomplete information on the environment.
General Electric Company, Corporate Research Center, Tech. Report 84CRD070.
Lumelsky,V.,&Skewis,T.(1990).Incorporatingrangesensingintherobot navigationfunction.IEEETransactionsonSystems,ManandCybernetics, 20(5),1058–1069.
Medina-Santiago,A.,Anzueto,J.C.,Pérez-Patricio,M.,&Valdez-Alemán,E.
(2013).Programmingreal-timemotioncontrolrobotprototype.Journalof AppliedResearchandTechnology,11(6),927–931.
Medina-Santiago,A.,Camas-Anzueto,J.L.,Vazquez-Feijoo,J.A.,Hernández- deLeón,H.R.,& Mota-Grajales,R. (2014).Neuralcontrolsystemin obstacleavoidanceinmobilerobotsusingultrasonicsensors.Journalof AppliedResearchandTechnology,12(1),104–110.
Michel,P.,Chestnutt,J.,Kagami,S.,Nishiwaki,K.,Kuffner,J.,&Kanade, T. (2006). Onlineenvironment reconstruction for biped navigation. In Proceedings2006IEEEinternationalconferenceonroboticsandautoma- tion,2006.ICRA2006.pp.3089–3094.IEEE.
Michel, P., Chestnutt,J., Kuffner,J., & Kanade, T. (2005). Vision-guided humanoidfootstepplanningfordynamicenvironments.In20055thIEEE- RASinternationalconferenceonhumanoidrobots(pp.13–18).IEEE.
Nakhaei,A.,&Lamiraux,F.(2008).Motionplanningforhumanoidrobots inenvironmentsmodeledbyvision.InHumanoids2008.8thIEEE-RAS internationalconferenceonhumanoidrobots,2008(pp.197–204).IEEE.
Okada, K.,Inaba,M., & Inoue, H. (2003).Walking navigation system of humanoid robot using stereo vision based floor recognition and path planningwithmulti-layeredbodyimage.pp.2155–2160.2003IEEE/RSJ international conference onintelligent robotsandsystems,2003 (IROS 2003).Proceedings(Vol.3)IEEE.
Puterman, M. L. (2014). Markov decision processes: Discrete stochastic dynamicprogramming.JohnWiley&Sons.
Sabe,K.,Fukuchi,M.,Gutmann,J.S.,Ohashi,T.,Kawamoto,K.,&Yoshi- gahara,T.(2004).Obstacleavoidanceandpathplanningforhumanoid robotsusingstereovision.pp.592–597.ICRA’04.2004IEEEinternational conferenceonroboticsandautomation,2004.Proceedings(Vol.1)IEEE.
Shapiro,L.G.,&Stockman,G.C.(2001).Computervision.NewJersey:Prentice Hall.
Stieler,F.,Yan,H.,Lohr,F.,Wenz,F.,&Yin,F.F.(2009).Developmentof aneuro-fuzzytechniqueforautomatedparameteroptimizationofinverse treatmentplanning.RadiationOncology,4(1),39.
Zadeh,L.A.(1965).Fuzzysets.InformationandControl,8(3),338–353.
Zavlangas, P.G.,Tzafestas,S.G.,& Althoefer,K.(2000).Fuzzyobstacle avoidanceandnavigationforomnidirectionalmobilerobots.pp.375–382.
Aachen,Germany:EuropeanSymposiumonIntelligentTechniques.