robot Humanoid path planning with fuzzy Markov decision processes Technology Journal of Applied Researchand

(1)

Availableonlineatwww.sciencedirect.com

Journal of Applied Research and Technology

www.jart.ccadet.unam.mx JournalofAppliedResearchandTechnology14(2016)300–310

Original

Humanoid robot path planning with fuzzy Markov decision processes

Mahdi Fakoor

^∗

, Amirreza Kosari, Mohsen Jafarzadeh

FacultyofNewSciences&Technologies,UniversityofTehran,Tehran,Iran Received31October2015;accepted4June2016

Availableonline21August2016

Abstract

In contrasttothe caseofknownenvironments,pathplanninginunknownenvironments,mostlyforhumanoidrobots,isyetto beopened forfurtherdevelopment.Thisismainlyattributedtothefactthatobtainingthoroughsensoryinformationaboutanunknownenvironmentisnot functionallyoreconomicallyapplicable.Thisstudyalleviatesthelatterproblembyresortingtoanovelapproachthroughwhichthedecisionis madeaccordingtofuzzyMarkovdecisionprocesses(FMDP),withregardtothepace.Theexperimentalresultsshowtheefficiencyoftheproposed method.

Keywords: FuzzyMarkovdecisionprocesses;Visionbasedpathplanning;Pathplanninginunknownenvironments;Humanoidrobots

1. Introduction

Theword“robot”originatesfromrobotainCzechlanguage, robotameans work.So,weexpectrobots toworklikeexpert labors.Intheotherwords,wewantrobotstoworkinindustry insteadofhumanlaborforces.Sincehumanshaveappropriately builttheirenvironmentswiththeirergonomic,robotsthathave samephysicalbodyashumanbodiesaremoreusefulthanother types.

In ordertodesignhumanoidrobots whicharesuitable for working efficiently in industrial applications, several control andnavigationproblemssuchasstability,mapping,interacting, computing,grasping and pathplanning must be investigated, mostespeciallypathplanningwhichisanopenprobleminthis researchfield.

Thereisneed forusersandindustries tohavearobotthat canworkintherealworld.Sincetherealworldisdynamic;it isimpossibletosaveallenvironmentsinrobotmemory.This means that we must prepare the robots towork inunknown environment.

∗Correspondingauthorat:FacultyofNewSciences&Technologies,Univer- sityofTehran,NorthKargarStreet,1439955941Tehran,Iran.

E-mailaddress:[email protected](M.Fakoor).

PeerReviewundertheresponsibilityofUniversidadNacionalAutónomade México.

Path planning has been studied in several research fields.

Althoughprogrammingofamobilerobottomovefromanini- tial positiontoatarget positioninaknown environmentisa well-knownprobleminrobotics,buttherearefewmethodsin pathplanninginanunknownenvironment.

Medina-Santiago, Camas-Anzueto, Vazquez-Feijoo, Hernández-de León, and Mota-Grajales (2014) implemented neural controlsystemsinmobilerobotsinobstacleavoidance usingultrasonicsensorswithcomplexstrategies.

Fuzzylogicisoneofthesemethodsthathavebeenusedin somerelatedprojects.Afuzzy-basednavigator hasbeenpro- posed by Zavlangas, Tzafestas, andAlthoefer (2000)for the obstacleavoidanceandnavigationprobleminomni-directional mobile robots. Their proposed navigator consideredonly the nearestobstacleinordertodecideupontherobot’snextmoving step.Thismethodutilizedthreeparametersforpathplanningas follows:

• thedistancebetweentherobotandthenearestobstacle,

• theanglebetweentherobotandthenearestobstacle,and

• the angle betweenthe robotdirection andthe straight line connectingthecurrentpositiontothetargetpoint.

Althoughthismethodwasinrealtimeandtrulyworks,these threeparametersarenotneededforacamerahumanoidrobot.

http://dx.doi.org/10.1016/j.jart.2016.06.006

1665-6423/©2016UniversidadNacionalAutónomadeMéxico,CentrodeCienciasAplicadasyDesarrolloTecnológico.Thisisanopenaccessarticleunderthe CCBY-NC-NDlicense(http://creativecommons.org/licenses/by-nc-nd/4.0/).

(2)

Inotherwords,thismethodcanbeutilizedforanyrobotthat hasomni-directionalrangesensors.

Fatmi,Yahmadi, Khriji,andMasmoudi(2006) proposeda usefulwayofimplementingthenavigationtaskinordertodeal withtheproblemofwheeledmobilerobotnavigation.Intheir work,issuesofindividualbehaviordesignandactioncoordina- tionofthebehaviorswereaddressedutilizingfuzzylogic.The coordinationtechniqueusedinthisworkcomprisestwolayers, i.e.,layerofprimitivebasicbehaviorsandthesupervisionlayer.

Theyused14rangesensorstoachievethepositionofanyobsta- clesurroundingthemobilerobot.Thus,thismethodcannotbe employedwithmosthumanoidrobots.

Medina-Santiago, Anzueto, Pérez-Patricio, and Valdez- Alemán (2013) presented a real-time programming for a prototyperobottocontrol itsmovementfromonemomentto anotheronewithoutshowingresponsedelays.

Iancu,Colhon,andDupac(2010)presentedafuzzyreasoning methodofaTakagi–Sugenotypecontrollerwhichwasapplied intwowheelsautonomousrobotnavigation.Thismobilerobot isequipped with asensorial system. The robot’ssensor area isdividedintosevenradialsectorslabeled:largeleft,medium leftandsmall leftfor the leftareas, EZ for thestraight area, andlargeright,mediumrightandsmallrightfortherightarea, respectively.Eachradialsectorwasfurtherdividedinto other threeregions:small,mediumandlarge.Thesensor’srangecould recognizeupto30m,andtherobotcouldidentifyanobstacle anywhereinsidethe interval[−90^◦, 90^◦].Undoubtedly,most humanoidrobots are notequipped withthis largeamount of sensors,hence,applyingthismethodonhumanoidrobotsseems tobeimpossible.

Thesimplestpathplanningalgorithmsforanunknownenvi- ronment are called Bug algorithms (Lumelsky & Stepanov, 1984).Bugalgorithms solve the navigationproblem bystor- ingonlyaminimalnumberofwaypoints,withoutgenerating afullmapoftheenvironment.Traditionalbugalgorithmswork withonlytactilesensors.Newbugalgorithms,suchasDistbug (Kamon&Rivlin,1997),Visbug(Lumelsky&Skewis,1990), Tangentbug(Kamon, Rimon, andRivlin, 1998) andSensbug (Kim,Russell,andKoo,2003),workwithonlyrangesensors.

Bugalgorithmsneedtocontinuouslyupdatetheirpositiondata, whileasweknow,itisimpossibletoachieveacontinuousupdate inpractice.Moreover,thebugmodelmakessomesimplifying assumptionsabouttherobot,i.e.,therobotisapointobject,ithas perfectlocalizationabilityandithasperfectsensors.Thesethree assumptionsareunrealisticforrobots,andsobugalgorithmsare notusuallyappliedforpracticalnavigationtasksdirectly.

Michel,Chestnutt,Kuffner,andKanade(2005)proposeda path planning algorithm for humanoid robots. They used an externalcamerathatshowedthetopviewoftherobotworking region,hence,theycouldextractinformationabouttheposition oftherobotsandtheobstacles.Theirmethodisnotapplicable formostsituationsbecauseitisimpossibletouseacamerawith aglobalviewoftherobotworksites.

Furthermore,NakhaeiandLamiraux(2008)utilizedacom- binationofonline3Dmappingandpathplanning.Theyutilized a3D occupancygrid that is updated incrementally by stereo visionfor constructingthemodel of theenvironment.Aroad

map-basedmethodwasemployedformotionplanningbecause thedimensionoftheconfigurationspacewashighforhumanoid robots. Indeed,it wasnecessarytoupdatethe roadmapafter receivingnewvisualinformationbecausetheenvironmentwas dynamic. Thisalgorithm was tested on HRP2. As a conclu- sion,theirmethodwasnotefficientbecauseitneedsexactstereo visionandalotoftimetofindapathineachstep.

Inaddition,Sabeetal.(2004)presentedamethodforpath planning and obstacle avoidance for QRIO humanoid robot, allowing it to walk autonomously around the home environment.They utilizedanA* algorithmintheirmethod; thus,it needsalotoftimetoprocess.Furthermore,theyutilizedonline mappingandstereovision. Theirmethodseemseffective,but itneedsstereovisionandhighcomputationalprocesses.Asa result,itcannotbeappliedinmostconditions.

OtherpathplanningprojectonHRP-2humanoidrobothas been carried out by Michel et al. (2006). This method uses severalcamerasintherobotenvironmentinordertoproduce asuitablemap. Asmentionedbefore,we cannotutilize many cameraswhereverwewanttousehumanoidrobotsandsothis methodcannotbeappliedeither.

Moreover,inanotherproject,Chestnutt,Kuffner,Nishiwaki, andKagami(2003)usedtheBest-firstsearchandA*algorithms forfootsteppathplanningonaH7humanoidrobot.theydemon- stratedthatA*ismoreeffectivethanBest-firstsearch.Butboth ofthemneedstereovisionandhighcomputationalprocesses.

Inaddition,Okada,Inaba,andInoue(2003)followedadiffer- entrouteforhumanoidrobotpathplanning:robotandobstacle wereconsideredcylindricalshapes,thefloorwasextractedbased onvisionandthentherobotmadeadecision.Thismethodmay encounteraconflictingproblemwhentherobotconfrontsabig obstacleatitsstartpoint.Inthissituation,therobotcouldnotbe abletodetectthefloorwhichinturnleadstomissingthepath.

In addition, Gay, Dégallier, Pattacini, Ijspeert, andVictor (2010)usedartificialpotentialfieldalgorithmsinarecentpath planningprojectoniCub(ahumanoidrobot).Atfirst,intheir proposedalgorithm,iCubcalculated3Dpositionofeachobsta- cleandtransformeditin2D,andthentheartificialpotentialfield wascalculated. Theirmethodneedsperfect knowledgeof the extractedimagestofindthepositionoftheobstacles;therefore, itmaynotbeutilizedinsomehumanoidrobotsapplications.

Asseenfromtheabovestudy,anefficientandsuitablemethod for pathplanning for humanoid robots in dynamic unknown environmentshasnotyetbeenproposed.Consideringtheiden- tifiedresearchgapinthispaper, anewprocedure, combining theeffectsoffuzzyinferencesystemandMarkovdecisionpro- cesses,isproposed.

TheapplicationofMarkovdecisionprocessesresultsinfaster executionoftheprocedurewhencomparedtotheonesproposed instudiessuchasthosebySabeetal.(2004).Moreover,inthe proposed method, using a fuzzy inference system leadsto a smootheroptimalpaththantheotherpreviouspastmethods.

Moreover, resorting to fuzzy Markov decision processes (FMDP)obviatesthenecessityofhavingknowledgeofthepre- ciseshape,positionandorientationofthesurroundingobstacles, as wellas theneedforrelativelyenormousvolumes ofmem- oryforstockinginformationgatheredin2Dand3Dmaps.The

(3)

Fig.1.Aldebaranhumanoidrobotics–NAOH25V4.

extendedexperimentalresultsdemonstratetheefficiencyofthe proposedmethod.

2. Functionalblockdiagram

AnAldebaranhumanoidrobot–NAOH25V4(Fig.1)–was selectedfordescriptionandverificationofourpresentedmethod.

Thishumanoidrobothasacamera,whichistheonlysourceof environmentalsensoryinformationusedfortheanalysisinthis approach.Aftertaking animage,itpassesthroughalow-pass filterinordertoobviatetheeffectof theconcomitantnoises.

Thereafter,theimageissegmentedandsubsequently,inorderto clearuptheeffectofthenoisecausedbysegmentation,theimage passesthroughamodefilter.Inaddition,thedilationprocessis appliedsothatthefinalimagecanbeproduced.

Moreover,aftercomputingtherewardsassociatedwitheach partof the image,the Markovdecisionprocessesserve asan inputfor the fuzzy inferencesystem, so as to feedthe robot withtheinformationneededfordecidingonthedirectionandits movement.ThefunctionalblockdiagramofFMDPisillustrated inFig.2.

The relatedflowchart of the visionsystemis presented in Fig.3.

Camera took an image

Low-pass filter

Segmentation

Median filter

Computed probability of presence of obstacle, free space

and target in each square Mesh Dilation

Fig.3.Flowchartofvisionsystem.

3. Theoreticalbackground 3.1. Markovdecisionprocesses

Markov decision processes (MDP) can be considered as anextensionof Markovchainswithsomedifferencessuch as allowingchoiceandgivingmotivation.MDP, isamathemati- caldecisionmakingtoolwhichmaybeapplicableinsituations whereseriesarepartlyrandomandpartlyunderthecontrolofa decisionmaker.

Specifically, aMarkovdecisionprocessmaybe definedas a discrete timestochasticcontrol process. Ateach timestep, the process isinstate s,andthe decision makermay choose anyadmissibleactionawhichisachievableinthestates.The process proceedsatthe nexttimestepbyrandomly selecting andmovingintoanewstates,andgivingthedecisionmakera correspondingrewardR(s,a,s).Theprobabilitythattheprocess departs towarditsnewstates maybeaffectedbythe chosen action. Specifically,itisgivenbythestate transitionfunction P(s,a,s).Thus,thenextstatesmaydependonthecurrentstate sandthedecisionmaker’sactiona.Butgivensanda,itiscon- ditionallyindependentofallpreviouschosenstatesandactions;

The approximate coordinates of the target

Vision system

Localization system

Calculator

Markov Fuzzy

inference system

Motion Systems Optimal

policy extractor

VF^∗ a^∗

Decision Processes

Value function calculator

Environment Reward

R

Fuzzy markov decision processes

Fig.2.FunctionalblockdiagramofFMDP.

(4)

inotherwords,thestatetransitions ofanMDPcouldpossess theMarkovproperty(Puterman,2014).

3.1.1. Policy

Findinga“policy”forthedecisionmakeristhemainprob- leminMDPs.“Policy”canbeinterpretedasafunctionπthat specifiestheactionπ(s)whichthedecisionmakerwillchoose whenisinstates.

The goal is to select a policy π which could maximize somecumulativefunctionoftherandomrewards,typicallythe expecteddiscountedsumoverapotentiallyinfinitehorizon.

3.1.2. Valuefunction

ValuefunctionV^π(s)istheexpectedvalueoftotalrewardif thesystemstartsfromstate sandacts accordingtopolicyπ.

Soeach policyhasitsvaluefunction.Itcanbeformulatedas follows:

V^π(s)=E

R(s0)+γR(s1)+γ²R(s2)+··· |π

(1)

V^π(s)=E

_N

t=0

γ^tR(st)|π

(2)

Eq.(1)couldberewrittenasfollows:

V^π(s)=E

R(s0)+γ(R(s1)+γR(s2)+···) |π

(3) V^π(s)=E

R(s0)+γV(s1)|π

(4) TheBellmanequationscanbegainedbythesimplification ofEq.(4)asfollows:

V^π(s)=R(s0)+γ

s

P(s,a,s)V^π(s) (5)

3.1.3. Optimalpolicyandoptimalvaluefunction Intheoptimalcase,wewillhave:

V^∗(s)=R(s)+max

a γ

s

P(s,a,s)V^∗(s) (6)

π^∗(s)=argmax

a

γ

s

P(s,a,s)V^∗(s) (7)

Iftherearenstates,thentherearenBellmanequations,one foreachstate.Inaddition,therearenunknownvalues.There- fore, by simultaneously solving these equations, the optimal policyandoptimalvaluefunctioncanbeachieved.

3.1.4. Valueiterationalgorithm

As we can see, the Bellman equations (6) are nonlinear, hence they are difficult to solve. In this case, we utilize an algorithmtoextractthe optimalpolicyoroptimalvaluefunc- tionwithoutdirectlysolvingtheBellmanequationsasshownin Algorithm1.

3.2. Fuzzylogic

Zadeh(1965)explainedfuzzysettheoryandfuzzylogic.In contrasttotraditionallogicthatemploysfixedandexactvalues, fuzzylogicusesapproximatevalues.Inotherwords,traditional logichastwo-valuedlogic,falseortrue(0or1)butfuzzylogic hasinfinite-valued,intervalbetweencompletelyfalseandcom- pletelytrue(interval[0,1]).Itissimilartolinguisticvariablesthat usemulti-valued.Therefore,wecanutilizelinguisticinference in fuzzylogic. Fuzzyrules usefuzzy sets and everylinguis- tictermhasamembershipfunctionthatdefines thedegreeof membershipof aspecificvariableforthe fuzzyset.Member- shipfunctionsareusuallyshownwithμ(x).Afuzzyinference systemisasystemthathasafuzzyinferenceunitasillustrated inFig.4.

In thereal world, aspecifiedsystemgains itsneeded data viaitssensors.Asweknow,thegathereddatathrowsensorsare crispandthereforeforfuzzyinferencesystems,afuzzifierunit (preprocessingunit)shouldbeimprovisedtochangeacquired dataintofuzzydata.Also,actuatorsneedcrispdata;becausethe inferenceunitgivesfuzzy data,fuzzyinferencesystemshave adefuzzifier unit (postprocessing unit).There are fewtypes of fuzzyinferencesystems,MamdaniandTakagi–Sugenoare morewidelyknown thanothertypes(Kaur andKaur,2012).

Mamdanirulesareinterpretedasfollows:

Ri: If x1=Ai1 and xm=Aim Theny=Bi

AndforTakagi–Sugenoruleswehave:

Ri: If x1=Ai1 and ... and xm=Aim

Then y=f_i(x1,x₂,...,x_m)

These rules could be defined by experts. But there is no standard method for explaining database in fuzzy Inference systems. Also, there is no standard method for defining the membershipfunctiontominimizeerrorormaximizetheoutput accuracy.

Output Defuzzifier

Inference Fuzzifier

Rule

Input

Fig.4.Fuzzyinferencesystem(Stieler,Yan,Lohr,Wenz,&Yin,2009).

(5)

4. Visionsystem

4.1. Preprocessing(low-passﬁlter)

Mosthumanoidrobotshaveatleastonecameratoseetheir environment.Robotsneedsomeprocesstopercepttheirenvi- ronmentthroughrawdatagainedbytheirsensors.Camerasgive a3Dmatrixandeacharrayofthecamerahasavaluebetween0 and255andasweknow,mostofthesedataarenotnecessary.

Also,thesedataaremixedwithnoise.Imagenoiseisarandom variationof brightness or colorinformationinimages and is usuallyanaspectofelectronicnoise.Therefore,thesedatamust firstpassthroughalow-passfilter(s).

Therearemanylow-passfiltersfor imagenoisecanceling.

MostroboticsprojectsemploytheGaussianfilterthatproduces asmoothblurimage.Themedianfilterisanotherfilterwidely usedfor imagenoisecanceling; however,allsmoothing tech- niquesareeffectiveatremovingnoise,buttheyadverselyaffect edges.In otherwords, atthe sametime, as noiseis reduced, smoothedgeswillbecreated.Forsmalltomoderatelevelsof Gaussiannoise,themedianfilterissignificantlyabetterchoice thanGaussianbluratremovingnoisewhilstretainingtheedges foragivenfixedwindowsize(Arias-Castro&Donoho,2009).

Therefore,inthismethod,therobotutilizedamedianfilterfor noisecancelingasdescribedinAlgorithm2.

4.2. Imagesegmentation

Inthepathplanningprocess,robotsneedtounderstandthe locationof obstaclesandfreespaces,buttheydonotneedto understandthecolorofeachpixel.Inthisway,therobotmust segmentimages.Themainpurposeofthesegmentationprocess maybetosimplifyandchangetherepresentationofanimage into something that is moresignificant andeasier toanalyze (Shapiro&Stockman,2001,Chap.12).Inthiscase,obstacles wererevealedbyredpixels,freespacewasrevealedbygreen pixels,andundefinedpixelsrevealothercolorsthatarecloser toit.

There are various image segmentation algorithms such as thresholding, clustering, histogram-based, edge detection, region-growing andgraph partitioningmethod. Theselection of an image segmentation algorithm is related to where the robotworks.Forexample,inourcasestudy,weexaminedthe

proposed methodologyinour Robotic Lab,thusthe simplest imagesegmentationalgorithmworksverywell.

4.3. Improvementsegmentation(modeﬁlter)

Segmentationmayproducesomenoises;therefore,afterseg- mentation,the producedimages must passthrough the mode filterwiththeAlgorithm3defined.

Theseprocessesmadeasimpledataofanyimage(Fig.5).

4.4. Imagedilations

Dilationmeansexpandingobstaclestoobtainconfiguration space(Chosetetal.,2005).Inotherwords,dilationincreasesthe effectsofanobstaclebydevelopingthevolumeoftheobstacle.

Fig.5.Originalimagesincomparisonwiththesegmentedandfilteredimage andthefinalones.

(6)

Fig.6.Exampleofeffectofcameraangle;topandperspectiveview.

4.5. Mesh

Inthesituationsinwhichtheangleoftherobotheadiscon- stantandthegroundisflat,eachpixelshowsthesamedistance.

Weused thispropertytoapproximatethe distancesfromele- mentsofobstacles.Sincetherobothasaspecificphysicalsize andpredefinedintervalfootstep,theexactpositionofeachpixel isnotthat important.Therefore,we shouldmeshtheimages.

Noweverysquareoftheimageshowsinformationonthepres- enceoftheobstacleinrelativeposition.

Becausethecameraislocatedintheheadofthehumanoid robotandittakesaspecifiedanglewiththeground,theimage picturebytherobotisaperspectiveview.Fig.6illustratesthe effectoftheperspectiveviewonacheckerboard.Itrevealsthat themeshesarenotsquare.Whileitisexpectedthattheresultof therobotvisionmustbeatsomedistancefromtheobstacles,but forcomputeddistancefromtherobots’simage,themeshmustbe non-homogeneous.Inotherwords,thenumberofpixelsineach boxatthebottomoftheimagemustbegreaterthanthenumber ofpixels ineach boxatthetopof theimage;inaddition,the numberofpixelsineachboxatthemiddleoftheimagemust begreaterthanthenumberofpixelsineachboxonthesidesof theimage.

4.6. Probabilitycalculation

TheMarkovdecisionprocessworksondiscretestates.There- fore,Markovdecisionprocessesneedtounderstandthepresence oftheobstacleineachstate.Thebestwaytoshowtheexistence ofobstaclesinastateisbydeterminingtheprobabilityofthe existence.Inthisway,therobotdividesthenumberofobstacle pixelsineachsquarebythenumberofallpixelsinthesquare.

Furthermore,Markovdecisionprocessesneedtounderstandthe presenceof the obstacleineach state, whichcouldbe calcu- latedinasimilarway.The followingrelations fully interpret thisconcept:

Pobstacle(i,j)= numberofobstaclepixelsinsquare(i,j) numberofallpixelsinsquare(i,j) (8)

Ptarget(i,j)= numberoftargetpixelsinsquare(i,j)

numberofallpixelsinsquare(i,j) (9)

Pfreespace(i,j)=numberoftargetpixelsinsquare(i,j)

numberofallpixelsinsquare(i,j) (10)

5. Rewardcalculation

Reward calculationis related toobservation of the target.

Iftherobotcanseethetarget, itcanuseonlytheinformation extractedfromtheimage;however,iftherobotcannotseethe target(becauseoflongdistance),inadditiontotheseinforma- tion,itneedsextrainformationonthecoordinationofthetarget anditselftocreateasub-goal.

5.1. Sub-goal

Asub-goalisdefinedasavirtualgoalinvisionspacesuch thatachievingitcouldguidetherobottowardtheoriginaltarget.

Fig.7illustrateshowthesub-goalstatecouldbecalculated.As wecansee,ifthedarkgreencircleisconsideredtobethetarget, thenthestateinlightgreenisdefinedasthesub-goal;inasimilar way,ifthebluecircleisconsideredtobethetarget,thenthestate incyanisknownasthesub-goalstate.

5.2. Rewardwhentherobotcannotseetheobstacle

Inthiscondition,thefreespacehas−w1point,theobstacle has −w2 point, andthe sub-goal has +1point;therefore, the rewardfunctioncouldbecalculatedasfollows:

R(i,j)=P_sub-goal(i,j)+w₁P_free_space(i,j)

+w2P_obstacale(i,j) (11)

5.3. Rewardwhentherobotcanseetheobstacle

Inthiscondition,thetargethas+1,thefreespacehas−w1

point,andtheobstaclehas−w2point;thus,therewardfunction couldbecalculatedasfollows:

R(i,j)=P_target(i,j)+w₁P_free_space(i,j)+w₂P_obstacale(i,j) (12)

6. DesignedFuzzyMarkovDecisionProcess

AFuzzyMarkovDecisionProcess(FMDP)representsthe uncertainknowledgeabouttheenvironmentintheformofoffer- responsepatternssuchastriangularandGaussianmembership functions.

IntheFuzzyMarkovDecisionProcess(FMDP)designed,the robotbeginsfromitsstartstate,itmustchooseasuitableaction ateachtimestep.Inthisproblem,itissupposedthattheactions whichleadtherobottogooutofitsstatedonotwork,inother words,wecouldsaythattheadjacenthasasimilarreward.The solutionthatiscalledpolicy(π(s))isinfrequencefromthevalue function.ThedesignedpolicyisillustratedinFig.8,whichisa fuzzymappingfromstatetoaction.Indeed,anoptimalpolicy isapolicythatmaximizestheexpectedvalueofaccumulated rewardsthroughouttime.

(7)

Fig.7.Exampleofdeterminingsub-goalstate.Darkgreen:target1;blue:target 2;red:target3;lightgreen:sub-goal1;cyan:sub-goal2;pink:sub-goal3.(For interpretationofreferencetocolorinthisfigurelegend,thereaderisreferredto thewebversionofthisarticle.)

6.1. Valuefunction

Unfortunately,inthe realworld, humanoidrobot’sactions are unreliable for some recognizedreasons such as noise of inputdata,ineffectivenesscontrolsystems,un-modeledsystem dynamicandenvironmentchangesthroughouttime;therefore, theprobabilityofgoingfromstatestostatesbychoosingaction awithP(s,a,s)willbe:

s

P(s,a,s)=1, 0≤P(s,a,s)≤1 (13)

Asaresult,therobotmustdecidetoact forincreasingthe probabilityofsuccessthroughouttime.Anexampleof proba- bilityintherobotactionwhentherobotwantstogotothenext statedecidedisillustratedinFig.9.

0.2 0.4 0.2

0.1 0.1

Fig.9.Exampleofprobabilityinactionwhenrobotwantstogotonextstate decide.

At first, it is supposed that the optimalpolicy is equal to thetraditionaloptimalpolicy;hence,thevaluefunctionwillbe achievedbysolvingEqs.(6)and(7).Asmentionedbefore,this nonlinearequationcanbesolvedwithavalueiteration(Section 3.1.4).

Inthenextstep,forgettingthegainedoptimalpolicy,thevalue functionwillbedeterminedbasedondecisionmaking.Inother words, weutilizethe valuefunctionasaninputfor thefuzzy inferencesystem.

6.2. Fuzziﬁcation

Asweknow,theextremumofthevaluefunctionisrelated totherewardfunction,whiletherewardfunctionisrelatedto theenvironmentanduserdefinition,thereforethevaluefunction cantakeanyinterval.Butasweknow,thefuzzyinputmustbea vectorwherethearrayisarealnumberbetween0and1.Thus, thefuzzificationisimportantforcontinuation.

Image

processing Image

understamding

Taken image State

Environment

Action

Fuzzy inference Fuzzy

action Value

iteration Fuzzified

value function

Fuzzification Value function Rewards Processed

image Vision system

Fuzzy markov decision processes Fig.8.Dataflow.

(8)

r≤√²

r≤ √⁵ ^r^≤√¹⁰

r≤2

Fig.10.TraditionalMarkovdecision.

Thefirstroutetofuzzificationislineartransformation.This wayisnotlogicalbecausethedifferencebetweenthevalueon theleftsideandrightsideofrobotisnotsignificanttoguide the robotinthe true path. In otherwords, linear transformer fuzzificationforthisapproachisoverfuzzified.

Althoughotherprevalentfuzzificationsuchastriangularand Gaussianmethodscouldresult inasignificantincreaseinthe numberof rules applied inthe fuzzy inferencepart, it isnot advisabletousethem.Thefuzzificationproposedinthisstudy isadrastictransformation.Ifthemaximumandminimumofthe valuefunctionisshownbyaandb,respectively,theproposed fuzzificationcanbewrittenas:

μ(Ai)=

x_i−b a−b

n

(14)

wherea and b are the maximum andminimum of the value functionamonginputs,respectively,andxiisthevaluefunction oftheithinput.

Table1 Squaredistance.

Table2 Fuzzyinput.

6.3. Fuzzyinference

AsillustratedinFig.2,thetraditionalMarkovdecisionpro- cess considers onlythe adjacentstates tochoose the optimal policy. Accordingto the traditional approach,the robotmust chooseoneofthecandidatestatesandcontinuethere.Although thispolicyisknownasoptimalpolicy,butitistrueonlywhen discrete states are considered. Becausein traditional Markov decisionprocesstherobotcanchooseactionfromfiniteexist- enceactions,itseemslogicaltoconsideronlyneighborstates.

Nevertheless,intheFMDP,therobotisabletochooseinfinite actionsbecauseitisassumedtobecontinuous.Nevertheless,the

Shoulder offsety

Neck offsetz

Shoulder offsetz

Upper arm length

Lower arm length

Hip offsety Hip offsetz

Thigh length

Tibia length

Foot height

Fig.11.DetailedkinematicsofNAO.Wristjointnotrepresented(Gouaillier etal.,2009).

(9)

traditionalprocessisyetapplicablebyconsideringonlyadjacent states;inthiscase,itisnotlogicaltoonlylookatnextstates;in otherwords,itiswisefortherobottoconsidernearerstates,as illustratedinFig.10.Thesquaredistanceofeachstatefromthe robot’sstateispresentedinTable1.

Asaresult,therearemanychoicesfortheselectionofthe numberofinputs.Theexplanationforthisproblemwillcontinue withr=√

10thatresultsin13inputs(Table2).

Iftherobotdirectionisconsideredbyangleϕwhichisdefined astheclockwiseanglefromthefrontsideoftherobot,thefuzzy ruleforFig.10(r≤√

10)canbeeasilywrittenas:

IfA1,then ˆϕ isaverysmallpositive.

IfA2,then ˆϕiszero.

IfA3,then ˆϕ isaverysmallnegative.

IfA4,then ˆϕisamediumpositive IfA5,then ˆϕisasmallpositive IfA6,then ˆϕiszero

G

B

R Fig.12.RGBcolorspace.

Fig.13.NAOrobotpathplanningscenario.

(10)

IfA7,then ˆϕisasmallnegative IfA8,then ˆϕisamediumnegative IfA9,then ˆϕisabigpositive IfA10,then ˆϕ isamediumpositive IfA11,then ˆϕiszero

IfA12,then ˆϕ isamediumnegative IfA13,then ˆϕ isabignegative

Therearesomewaystodefuzzifythe ˆϕinordertogainϕ.

Amongtheseways,weightedaverageislogical.

7. Experiments

WeappliedourpresentedmethodonaNAOH25V4thatis producedbytheFrenchcompanyAldebaranRobotics(Fig.1).

The NAO has 25degree of freedom. There are five DOF in eachleg;twointhe ankle,two inthehipandoneatitsknee.

Anadditionaldegreeof freedomexistsatthepelvisforyaw;

nevertheless,itissharedbetweenbothlegs,thatistosay,both legsare rotated outwardor inward, together,using thisjoint.

Moreover,NAOhas6DOFsineachhandand2DOFsonitshead (Fig.11).TheNAOhastwocameras,aninertialmeasuringunit, sonarsensorsinitschest,andforce-sensitiveresistorsunderits feet.NAOwasdesignedtoperformsmoothwalkinggaits,even whenchangingspeedanddirection.Thewalkingspeedmustbe similartothewalkingspeedofa2-year-oldchildofthesame size,whichisabout0.6km/h(Gouaillieretal.,2009).

ThesoftwarearchitecturewasdevelopedusingAldebaran’s NaoQias a frameworkandan extendedcode in C⁺⁺.NaoQi givesaccesstoallthefeaturesoftherobot,likesendingcom- mandstothe actuators,retrievinginformationfromthe robot memory,andmanagingWi-Ficonnections.Inthisway,weuse Kubuntu12.0.4andOpenCV2.3.1writingprograminC⁺⁺in Qtcreators.Inthestudyexperiment,therobottooka160×120 pixelimage.Resultsrevealedthattherobotwasabletoachieve itsgoalwithoutcollidingwithanyobstacle.Fig.12illustrates howtherobottrulyworks.Thisfigureillustratesthefactthatthe robotcanworkinnoisyenvironments(Fig.5).

7.1. Euclideandistance-basedimagesegmentation

Pathplanningwastestedinlaboratoryconditions;therefore, a simple image segmentation algorithm could work without any problem. The simplest image segmentation algorithm is Euclideandistance.AssumingthatallpixelsareinRGBcolor space(Fig.12),theneachpixelwillhavethreeCartesiancoor- dinates.DistancebetweeneachcolorcanbecalculatedbyEq.

(15).

D(P,T)=

(Tred−Pred)²+(Tgreen−Pgreen)²+(Tblue−Pblue)² (15)

In thisalgorithm, the distancesbetweenthe color of each pixelandthecolorofthetargethavebeencalculated.Thenthe colorofthe targetpertainingtothe shorterdistance couldbe treatedasthenewcolorofthepixels.

Manyexperimentshavebeencarriedoutinordertoprovethe effectivenessandcorrectnessofourpresentedalgorithm.Fig.13 illustratesapathplanningscenarioasanexample.

8. Conclusions

In this study, a new successful and effective algorithm hasbeen proposedfor thereal-time optimalpathplanningof autonomous humanoid robots in unknown complex environ- ments. Thismethodusesonlyvisiondatatoobtain necessary knowledge on its surrounding. It was developed by mixing Markovdecisionprocessesandfuzzyinferencesystems.This methodimprovesthetraditionalMarkovdecisionprocesses.The rewardfunctionhasbeencalculatedwithoutexactestimationof thedistanceandshapeoftheobstacles.Wealsousevalueiter- ationtosolvetheBellmanequationinreal time.Unlikeother exitingalgorithms,ourmethodcanworkwithnoisydata.The wholelocomotion,vision,pathplanningandmotionplanningis thusfullyautonomous.Thesefeaturesdemonstratethattherobot canworkinarealsituation.Moreover,thismethodrequiresonly onecameraanddoes notneedrange computing.Themethod discussed ensurescollision avoidanceandconvergence tothe optimalgoal.Thismethodhasbeendevelopedandsuccessfully testedonanexperimentalhumanoidsrobot(NAOH25V4).

Conﬂictofinterest

Theauthorshavenoconflictsofinteresttodeclare.

References

Arias-Castro,E.,&Donoho,D.L.(2009).Doesmedianfilteringtrulypreserve edgesbetterthanlinearfiltering?TheAnnalsofStatistics,1172–1206.

Chestnutt,J.,Kuffner,J.,Nishiwaki,K.,&Kagami,S.(2003).Planningbiped navigationstrategiesincomplexenvironments.InIEEEint.conf.hum.rob.

Choset,H.,Lynch,K.M.,Hutchinson,S.,Kantor,G.,Burgard,W.,Kavraki, L.E.,etal.(2005).Principlesofrobotmotion:Theory,algorithms,and implementations.

Fatmi,A.,Yahmadi,A.A.,Khriji,L.,&Masmoudi,N.(2006).Afuzzylogic basednavigationofamobilerobot.WorldAcademyofScience.Engineering andTechnology,22,169–174.

Gay,S.,Dégallier,S.,Pattacini,U.,Ijspeert,A.,&Victor,J.S.(2010).Integration ofvisionandcentralpatterngeneratorbasedlocomotionforpathplanningof anon-holonomiccrawlinghumanoidrobot.In2010IEEE/RSJinternational conferenceonintelligentrobotsandsystems(IROS)(pp.183–189).IEEE.

Gouaillier,D.,Hugel,V.,Blazevic,P.,Kilner,C.,Monceaux,J.O.,Lafourcade, P.,...,&Maisonnier,B.(2009).MechatronicdesignofNAOhumanoid.In ICRA’09.IEEEinternationalconferenceonroboticsandautomation(pp.

769–774).IEEE.

Iancu,I.,Colhon,M.,&Dupac,M.(2010).ATakagi–Sugenotypecontroller formobilerobotnavigation.InProc.ofWSEASint.conf.oncomputational intelligence(pp.29–34).Bucharest,Romania:WSEASPress.

Kamon,I.,&Rivlin,E.(1997).Sensory-basedmotionplanningwithglobal proofs.IEEETransactionsonRoboticsandAutomation,13(6),814–822.

Kamon,I.,Rimon,E.,&Rivlin,E.(1998).Tangentbug:Arange-sensor-based navigationalgorithm.TheInternationalJournalofRoboticsResearch,17(9), 934–953.

Kaur,A.,&Kaur,A.(2012).Comparisonofmamdani-typeandsugeno-type fuzzyinferencesystemsforairconditioningsystem.InternationalJournal ofSoftComputingandEngineering,2(2),2231323–3252307.

(11)

Kim,S.K.,Russell,J.S.,&Koo,K.J.(2003).Constructionrobotpath-planning forearthworkoperations.JournalofComputinginCivilEngineering,17(2), 97–104.

Lumelsky, V. J., & Stepanov, A. A.(1984). Navigation strategies for an autonomous vehicle with incomplete information on the environment.

General Electric Company, Corporate Research Center, Tech. Report 84CRD070.

Lumelsky,V.,&Skewis,T.(1990).Incorporatingrangesensingintherobot navigationfunction.IEEETransactionsonSystems,ManandCybernetics, 20(5),1058–1069.

Medina-Santiago,A.,Anzueto,J.C.,Pérez-Patricio,M.,&Valdez-Alemán,E.

(2013).Programmingreal-timemotioncontrolrobotprototype.Journalof AppliedResearchandTechnology,11(6),927–931.

Medina-Santiago,A.,Camas-Anzueto,J.L.,Vazquez-Feijoo,J.A.,Hernández- deLeón,H.R.,& Mota-Grajales,R. (2014).Neuralcontrolsystemin obstacleavoidanceinmobilerobotsusingultrasonicsensors.Journalof AppliedResearchandTechnology,12(1),104–110.

Michel,P.,Chestnutt,J.,Kagami,S.,Nishiwaki,K.,Kuffner,J.,&Kanade, T. (2006). Onlineenvironment reconstruction for biped navigation. In Proceedings2006IEEEinternationalconferenceonroboticsandautoma- tion,2006.ICRA2006.pp.3089–3094.IEEE.

Michel, P., Chestnutt,J., Kuffner,J., & Kanade, T. (2005). Vision-guided humanoidfootstepplanningfordynamicenvironments.In20055thIEEE- RASinternationalconferenceonhumanoidrobots(pp.13–18).IEEE.

Nakhaei,A.,&Lamiraux,F.(2008).Motionplanningforhumanoidrobots inenvironmentsmodeledbyvision.InHumanoids2008.8thIEEE-RAS internationalconferenceonhumanoidrobots,2008(pp.197–204).IEEE.

Okada, K.,Inaba,M., & Inoue, H. (2003).Walking navigation system of humanoid robot using stereo vision based ﬂoor recognition and path planningwithmulti-layeredbodyimage.pp.2155–2160.2003IEEE/RSJ international conference onintelligent robotsandsystems,2003 (IROS 2003).Proceedings(Vol.3)IEEE.

Puterman, M. L. (2014). Markov decision processes: Discrete stochastic dynamicprogramming.JohnWiley&Sons.

Sabe,K.,Fukuchi,M.,Gutmann,J.S.,Ohashi,T.,Kawamoto,K.,&Yoshi- gahara,T.(2004).Obstacleavoidanceandpathplanningforhumanoid robotsusingstereovision.pp.592–597.ICRA’04.2004IEEEinternational conferenceonroboticsandautomation,2004.Proceedings(Vol.1)IEEE.

Shapiro,L.G.,&Stockman,G.C.(2001).Computervision.NewJersey:Prentice Hall.

Stieler,F.,Yan,H.,Lohr,F.,Wenz,F.,&Yin,F.F.(2009).Developmentof aneuro-fuzzytechniqueforautomatedparameteroptimizationofinverse treatmentplanning.RadiationOncology,4(1),39.

Zadeh,L.A.(1965).Fuzzysets.InformationandControl,8(3),338–353.

Zavlangas, P.G.,Tzafestas,S.G.,& Althoefer,K.(2000).Fuzzyobstacle avoidanceandnavigationforomnidirectionalmobilerobots.pp.375–382.

Aachen,Germany:EuropeanSymposiumonIntelligentTechniques.