Neurocomputing
journal homepage: www.elsevier.com/locate/neucom
Data-driven prognostics using a combination of constrained K-means clustering, fuzzy modeling and LOF-based score
Alberto Diez-Olivan a,∗, Jose A. Pagan b, Ricardo Sanz c, Basilio Sierra d
a Tecnalia Research & Innovation, Industrial Systems Unit, Donostia-San Sebastián, Spain
b Diesel Engine Factory, Diagnose Engineering and Product Development, Cartagena, Spain
c Autonomous Systems Laboratory, Universidad Politécnica de Madrid, Spain
d Department of Computer Sciences and Artificial Intelligence, UPV/EHU, Spain
Article info
Article history: Received 26 July 2016; Revised 10 December 2016; Accepted 4 February 2017; Available online 22 February 2017
Communicated by Shen Yin
Keywords: Behavior characterization; Condition monitoring; Constrained k-means clustering; Fuzzy modeling; Local outlier factor; Machine learning
Abstract
Today, the characterization and early detection of failure modes are key issues for complex assets. This is due to the negative impact of corrective operations and to the conservative strategies usually put in practice, which focus on preventive maintenance. In this paper, anomaly detection in new monitoring sensor data is addressed by characterizing and modeling operational behaviors. The learning framework relies on a machine learning approach that combines constrained K-means clustering for outlier detection and fuzzy modeling of distances to normality. A final score is also calculated over time, considering the membership degree to the resulting fuzzy sets and a local outlier factor. The proposed solution is deployed in a CBM+ platform for online monitoring of the assets. In order to show the validity of the approach, experiments have been conducted on real operational faults in an auxiliary marine diesel engine. Experimental results show a fully comprehensive yet accurate prognostics approach, improving detection capabilities and knowledge management. The performance achieved is quite high (precision, sensitivity and specificity above 93% and κ = 0.93), even more so given that a very small percentage of real faults are present in the data.
© 2017 Elsevier B.V. All rights reserved.
1. Introduction
Unexpected incidental failures have an important impact in terms of risks, costs, resources and service loss that should be minimized [1]. The growing complexity of industrial equipment, systems and installations results in an ever-increasing amount of health monitoring information, which eventually exceeds the capacity of most fault detection systems and makes the design of successful maintenance methodologies more challenging. Moreover, it is important to provide a better understanding of monitored systems and to efficiently characterize the normal behaviors from a huge amount of historical data. The lack of knowledge about the behavior of complex assets makes the problem of maintenance very difficult.
∗ Corresponding author.
E-mail addresses: [email protected], [email protected] (A. Diez-Olivan), [email protected] (J.A. Pagan), [email protected] (R. Sanz), [email protected] (B. Sierra).
The naval sector, for instance, is traditionally focused on preventive strategies, usually divided into 3–4 maintenance difficulty level groups: vessel crew, vessel base, shipyard, and manufacturer [2]. Crew and base level maintenance tasks are planned and carried out during vessel operating time; however, shipyard and manufacturer tasks are done in programmed dock periods. The whole vessel life cycle is divided into long operating periods separated by short maintenance periods, some of them in dry-dock. When an important unexpected breakdown occurs during operation (at either shipyard or manufacturer level), vessel activity stops and planned missions have to be cancelled. Additionally, important repair costs must be envisaged. In such cases it is often necessary to open dismantling routes in order to remove defective parts. Sometimes a cesarean in the vessel hull or even a dry-dock has to be performed to extract the involved equipment. Therefore, total costs may include: defective parts, repair manpower, dismantling route procedures and a cesarean or a dry-dock. Indirect tasks are usually more expensive than normal component repair processes. Moreover, when a failure arises while a mission is in progress, objectives may not be fulfilled or the mission might be cancelled and the vessel returned to base. But the worst-case scenario is that in which vessel or crew safety is threatened.
This work is an effort to implement a novel machine learning based approach that aims to minimise the negative effects of unexpected breakdowns, providing a reliable fault detection and prediction strategy. This approach has been applied to real operational data acquired from an auxiliary diesel engine during real vessel operation, since it is one of the most critical vessel components: it supplies propulsion and energy to the vessel and its behavior is complex, as it is a reciprocating engine matched with a turbocharger. An in-depth study was carried out into the possibility of improvement through the use of data-driven machine learning techniques to statistically model the normal behavior of the engine, in a fully automated unsupervised fashion. To do so, behavior characterization and fuzzy modeling are applied to monitoring sensor data. Moreover, the knowledge models generated are comprehensive, yet accurate methods to anticipate potential critical faults. Resulting models and all available information are integrated in a specific CBM+ (Condition Based Monitoring) system, which combines CBM, RCM (Reliability Centered Maintenance) and AI (Artificial Intelligence) capabilities. Although the study is focused on the exploitation of operational parameters, it must be mentioned that the proposed approach can also be applied to other types of operational parameters that can be used as failure indicators, such as vibrations, fluids analysis, thermography information, in-cylinder pressure or ultrasonic information.
The rest of the article is organized as follows. Section 2 presents a review of related and previous works on the use of condition based maintenance methods and strategies for improving asset reliability, and the position of the present work in the context of previous ones. Sections 3 and 4 explain the machine learning based approaches used to characterize asset behavior and detect anomalies, based on constrained clustering and fuzzy modeling, respectively. In Section 5 the test scenario is presented and the experimental results obtained are discussed. Finally, the conclusions achieved in this study are given in the last section.
2. Related work
Condition based maintenance aims to anticipate a maintenance operation based on evidence of degradation and deviations from normal asset behavior. When observing the condition of a particular system, a set of monitoring devices and sensors must be considered. Intelligent monitoring of equipment by means of sensors is essential in order to acquire relevant data, containing the characterization of operational faults in physical signals: acoustic and ultrasonic sensors, accelerometers, current measurements or thermocouples are usually employed [3,4]. In addition to this data, environmental conditions and contextual information, such as temperature, pressure or humidity, also provide very useful additional information to enrich the modeling process [5]. From such information, specific Key Performance Indicators (KPIs) are calculated and analysed to discover trends and knowledge of interest that can point to a potential critical fault.
A traditional preventive strategy may obtain high reliability levels if it is well designed [6]. However, it sometimes implies over-maintaining the assets. This is due to the fact that equipment manufacturers are always conservative in their maintenance policies, so that reliability is achieved at the expense of high maintenance costs. It is well known that the failure probability of many components is high at the beginning and end of their operational life, following the bathtub failure pattern [7]. Therefore, unnecessary maintenance tasks increase the failure rate when a defective item is installed or when a human mistake is made. Moreover, preventive strategies do not take into account operational context such as load profiles, number of starts or environmental parameters, which strongly affect component lifetime. Finally, preventive maintenance is erroneously based on the idea that the probability of occurrence of operational faults increases exponentially at a certain time. In preventive strategies components are replaced or repaired before that moment occurs. This assumption is not true in many cases, since there are several failure patterns in which failure probability never increases [8]. In such cases failure probability is constant in time. Thus, a component could fail at any time. Especially relevant examples of this phenomenon are failure patterns of electrical and electronic components, in which repair and substitution tasks at planned periods of time do not imply an improvement in terms of reliability. For all these reasons there still exist important margins for reliability increase and cost reduction.
In order to improve reliability and to reduce costs, an optimal maintenance strategy should provide a set of predictive, preventive and corrective procedures as a result of a technical and economic analysis of every failure mode, taking into consideration the related consequences [9]. Reliability Centered Maintenance (RCM) strategies include a Failure Mode, Effects and Criticality Analysis (FMECA). Once failure modes are identified and their criticality classified, the maintenance tasks to be undertaken are established to avoid the consequences of faults [10]. When predictive maintenance is technically possible and economically worthwhile, compared to preventive and corrective maintenance, it is applied. Maximum reliability is obtained when a robust and trustworthy failure indicator parameter is monitored. When it is not technically possible or it is not affordable, preventive maintenance strategies are adopted. Corrective maintenance is only envisaged in case predictive and preventive maintenance strategies are not feasible. If that is the case, the consequences of a fault are critical and only corrective maintenance is feasible. In those situations, the use of safety devices to apply appropriate troubleshooting tasks or the redesign of the affected asset (e.g. installing a standby component) is required.
When the aim is to achieve maximum reliability, an appropriate CBM+ system with monitoring capabilities must be adopted, gathering and combining all kinds of useful sources of information simultaneously and providing the prognostics needed to assure the correct operation of the assets (e.g. the critical components of a paediatric emergency department) [11]. Thus, the resulting CBM+ system must include data acquisition and processing, diagnostics and prognostics, and decision making functionalities [12]. Generated data-driven models for diagnostics and prognostics must be deployed in a monitoring platform with online data acquisition and inspection capabilities [13]. Several commercial CBM+ systems are already available, most of them using a wide variety of potential failure indicators, separately [14,15].
Data-driven prognostics models are the core of the whole process since they apply the behavioral and statistical methods for fault prediction and classification [16]. Artificial neural networks [17] and support vector machines [18,19] are usually applied to analyze data and infer such models. The use of projection methods (e.g. linear, nonlinear and orthogonal projections to latent structures, kernel methods or PCA) for dimensionality reduction and regression can highly support the prediction process [20–22]. Nevertheless, depending on the application and whenever it is possible, it may be beneficial to instead incorporate specific knowledge directly into whichever algorithm is applied.
3. Behavior characterization
Data-driven behavior characterization consists of grouping similar data into data sets, which physically represent the same operational condition. Within the groups formed, there exist data points far from the pattern, which corresponds to the mean point. Such a pattern could be very significant in order to classify or to identify behaviors linked to the data, or in order to detect or to infer possible faults or anomalous operational conditions. Big groups, or groups that are close together, usually imply normal behaviors, whereas small groups or events that are far from the pattern (of the same group or regarding a big group) imply anomalies or outliers (e.g. noise and transient data). It must be noted that the learner is only provided with unlabelled examples. When labels are available, supervised learning and reinforcement learning can be applied to train a classifier. However, it is often very difficult to obtain a large data set containing real faults.
3.1. Constrained K-means clustering
In the proposed constrained K-means approach, the value of k is automatically provided based on cluster distribution and cluster data variance as established by He et al. and Wu et al. [25,26]. It is computed as a previous step of the clustering process. The variance explained by the resulting classification model, or clusters compactness, Cmp, is calculated until $Cmp_k - Cmp_{k-1} \leq 0.5$. As Euclidean distance is applied [27], it becomes coherent to average the cluster scattering index [28]. The members of each cluster should be as close to each other as possible. The clusters compactness is computed as it can be seen in Eq. (1).
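$$Cmp = \frac{1}{k}\sum_{i=1}^{k}\frac{\lVert\sigma(C_i)\rVert}{\lVert\sigma(X)\rVert}, \qquad \sigma(C_i) = \begin{bmatrix}\sigma^{1}_{C_i}\\ \vdots\\ \sigma^{m}_{C_i}\end{bmatrix}, \qquad \sigma(X) = \begin{bmatrix}\sigma^{1}_{X}\\ \vdots\\ \sigma^{m}_{X}\end{bmatrix} \tag{1}$$
where k is the number of clusters; $\sigma(C_i)$ is the variance of cluster $C_i$ with $\sigma^{p}_{C_i} = \frac{1}{|C_i|}\sum_{j=1}^{n}\left(x_j^{p} - C_i^{p}\right)^2$ and $\sigma(X)$ is the data variance with $\sigma^{p}_{X} = \frac{1}{n}\sum_{i,j=1}^{n}\left(x_i^{p} - x_j^{p}\right)^2$, $i \neq j$.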
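The compactness measure of Eq. (1) can be illustrated with a short sketch. It is not the authors' implementation: for simplicity it assumes that both $\sigma(C_i)$ and $\sigma(X)$ are vectors of ordinary per-feature variances (mean squared deviation from the mean), which simplifies the pairwise form given for $\sigma^{p}_{X}$. The function names and the toy data are hypothetical.

```python
import math

def feature_variances(points):
    """Per-feature mean squared deviation from the mean of `points`."""
    n, m = len(points), len(points[0])
    center = [sum(p[f] for p in points) / n for f in range(m)]
    return [sum((p[f] - center[f]) ** 2 for p in points) / n for f in range(m)]

def norm(vec):
    """Euclidean norm of a vector of variances."""
    return math.sqrt(sum(v * v for v in vec))

def compactness(points, labels):
    """Average of ||sigma(C_i)|| / ||sigma(X)|| over all clusters (Eq. 1)."""
    total = norm(feature_variances(points))
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    ratios = [norm(feature_variances(c)) / total for c in clusters.values()]
    return sum(ratios) / len(ratios)

# Hypothetical toy data: two well-separated groups along the first feature.
data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
one_cluster = compactness(data, [0, 0, 0, 0])
two_clusters = compactness(data, [0, 0, 1, 1])
```

Under this simplified reading, a single cluster reproduces the data variance (ratio 1), and the ratio shrinks as clusters become tighter.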
When knowledge regarding the system behavior to be modeled is available in addition to the data instances themselves, the algorithm can be modified to make use of this knowledge [29]. A constraint that consists of specifying an asset main input feature, $X_f$, is thus considered so that instances are grouped on the basis of the non-parametric distribution of $X_f$. Given the number of clusters, k, their width in terms of $X_f$ values is estimated as $w = \frac{\max(X_f) - \min(X_f)}{k}$. Then, for each cluster $C_i$, $i = (1,\dots,k)$, $u_i = \min(X_f) + i * w$ and $l_i = u_{i-1}$, with $u_0 = \min(X_f)$, are set as upper and lower limits on $X_f$ values, respectively. Then, a local distance-based outlier detection can be accurately performed considering the asset status, determined by its main input feature values.
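The equal-width constraint on the main input feature can be sketched as follows. This is a minimal illustration under stated assumptions: clusters are indexed from zero (as in the cluster numbering 0 to 7 used in the result tables), the maximum value of the feature is assigned to the last range, and `feature_bins`, `assign_cluster` and the load values are hypothetical.

```python
def feature_bins(xf_values, k):
    """Return the (lower, upper) limits of k equal-width ranges over X_f."""
    lo, hi = min(xf_values), max(xf_values)
    w = (hi - lo) / k
    return [(lo + i * w, lo + (i + 1) * w) for i in range(k)]

def assign_cluster(xf, bins):
    """Zero-based index of the range that contains the value xf."""
    lo = bins[0][0]
    w = bins[0][1] - bins[0][0]
    if w == 0:
        return 0  # degenerate case: X_f is constant
    # The maximum of X_f would fall just outside the last range, so clamp it.
    return min(int((xf - lo) / w), len(bins) - 1)

# Hypothetical engine load readings in percent, used as main input feature.
load = [0.0, 12.5, 30.0, 55.0, 80.0, 100.0]
bins = feature_bins(load, 4)
labels = [assign_cluster(v, bins) for v in load]
```

Grouping by ranges of a single feature, rather than by iterative centroid updates, is what ties each cluster to an operating regime of the asset.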
Given a set of m features, $X = \{X_1,\dots,X_m\}$, where each feature $X_i$ can take a value from its own set of possible values $\chi_i$, and n feature vectors or instances, $x_i = (x_1,\dots,x_m) \in \chi = (\chi_1,\dots,\chi_m)$, with $i = 1,\dots,n$, L2 normalization is calculated for each instance $x_i$ in the data set in order to minimise the impact of having different ranges of values of raw data in the resulting classification model. It is computed as the root of the sum of its squared elements, as it can be seen in Eq. (2).
$$L2(x_i) = \sqrt{\sum_{l=1}^{m} \left|x_{il}^{2}\right|} \tag{2}$$
Once the normalization of data samples is performed using L2, and in order to calculate the distance between instances, $x_i$ and $x_j$, the Euclidean metric is computed (see Eq. 3).
$$D(x_i, x_j) = \lVert \hat{x}_i - \hat{x}_j \rVert = \sqrt{\sum_{l=1}^{m} \left(\hat{x}_{il} - \hat{x}_{jl}\right)^2} \tag{3}$$
where $\hat{x}_i = \frac{x_i}{L2(x_i)}$ and $\hat{x}_j = \frac{x_j}{L2(x_j)}$ are the L2-normalized instances $x_i$ and $x_j$, respectively.
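Eqs. (2) and (3) translate directly into a few lines of code. The sketch below uses plain Python lists; the function names and sample vectors are hypothetical.

```python
import math

def l2_norm(x):
    """Eq. (2): root of the sum of squared elements of an instance."""
    return math.sqrt(sum(v * v for v in x))

def normalize(x):
    """Scale an instance to unit L2 norm."""
    n = l2_norm(x)
    return [v / n for v in x]

def distance(xi, xj):
    """Eq. (3): Euclidean distance between the L2-normalized instances."""
    a, b = normalize(xi), normalize(xj)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Two instances with the same proportions but different magnitudes.
d_same_direction = distance([3.0, 4.0], [6.0, 8.0])
# Two instances pointing along different features.
d_orthogonal = distance([1.0, 0.0], [0.0, 5.0])
```

Because each instance is scaled to unit length, instances that differ only by a common scale factor end up at distance zero, so the metric compares the relative profile of the features rather than their absolute magnitude.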
The convergence criteria are established as a maximum number of iterations and a stability threshold, which checks the clusters compactness variation of the resulting classification model from the i-th iteration to iteration i+1 (see Eq. 1).
An iterative outlier detection loop is then performed, so that from the groups of instances formed during the constrained-learning stage, outliers are detected and isolated and behaviors of interest given certain asset main input feature values are characterized. Outliers are represented by instances that belong to a certain group but differ notably from the pattern. For each iteration in the outlier detection process and for each cluster, $C_i$, an anomaly threshold is calculated based on the distances of the instances grouped in $C_i$ to the centroid, as it can be seen in Eq. (4).
$$Th(C_i) = \frac{1}{|C_i|}\sum_{x_j \in C_i} D(x_j, c_i) + 3\sqrt{\frac{\sum_{x_j \in C_i} D(x_j, c_i)^2}{|C_i| - 1}} \tag{4}$$
where $c_i = \frac{1}{|C_i|}\sum_{x_j \in C_i} x_j$ is the centroid of cluster $C_i$ and $x_j$, $j = \{1,\dots,|C_i|\}$, is the j-th instance grouped in cluster $C_i$. Events whose distance to the centroid is over the anomaly threshold are set as outliers. Consequently, for each iteration during the outlier detection process and for every cluster, centroids are recalculated filtering out detected outliers. The process stops when no more outliers are detected. Outliers detected within a cluster and small clusters could imply abnormal asset behaviors and operational faults.
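The threshold of Eq. (4) and the iterative isolation of outliers within one cluster can be sketched as below. This is an illustration rather than the authors' code: it assumes the square root in the second term of Eq. (4), works on raw points with a plain Euclidean distance, and uses hypothetical function names and data.

```python
import math

def centroid(points):
    """Mean point of a cluster."""
    n = len(points)
    return [sum(p[f] for p in points) / n for f in range(len(points[0]))]

def dist(a, b):
    """Plain Euclidean distance between two points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def threshold(points, c):
    """Eq. (4): mean distance plus 3 * sqrt(sum(D^2) / (|C| - 1))."""
    d = [dist(p, c) for p in points]
    spread = math.sqrt(sum(v * v for v in d) / (len(d) - 1))
    return sum(d) / len(d) + 3 * spread

def isolate_outliers(points):
    """Repeat until no instance exceeds the threshold of its cluster."""
    kept, outliers = list(points), []
    while len(kept) > 1:
        c = centroid(kept)          # centroid recalculated without outliers
        th = threshold(kept, c)
        flagged = [p for p in kept if dist(p, c) > th]
        if not flagged:
            break
        outliers.extend(flagged)
        kept = [p for p in kept if dist(p, c) <= th]
    return kept, outliers

# Hypothetical 1-D cluster: 100 regular events and one extreme event.
cluster = [[-1.0]] * 50 + [[1.0]] * 50 + [[100.0]]
kept, outliers = isolate_outliers(cluster)
```

In this toy case the extreme event is removed in the first pass, the centroid moves back to the bulk of the data, and the second pass finds nothing further to isolate.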
The overall algorithm can be seen in Algorithm 1.
Algorithm 1 Constrained K-means clustering for outlier detection.
Input: a set of m features, X = {X_1, ..., X_m}
Compute k using Equation 1
Select an asset main input feature X_f ∈ X = {X_1, ..., X_m}
Compute w = (max(X_f) − min(X_f)) / k
for all C_i, i = (1, ..., k) do
    Find u_i = min(X_f) + i ∗ w and l_i = u_{i−1}, with u_0 = min(X_f)
    for all x_j whose X_f value lies in [l_i, u_i] do
        x_j → C_i
    end for
end for
outliers = {}
while outliers found do
    for all C_i, i = (1, ..., k) do
        Compute Th(C_i) using Equation 4
        for all x_j ∈ C_i do
            if D(x_j, c_i) > Th(C_i) then
                x_j → outliers
                Remove x_j from C_i
            end if
        end for
    end for
end while
4. Anomaly detection
In this section the anomaly detection process is presented. It is performed on the basis of behaviors characterized from data by applying constrained K-means clustering and on outliers found. The most important behavior to be considered is normality.
4.1. Fuzzy partition
The fuzzy rules generation process and inference engine are based on work proposed by Cingolani et al. [32]. Fuzzy controllers are currently considered to be one of the most important applications of the fuzzy set theory proposed by Zadeh [33]. This theory is based on the notion of the fuzzy set as a generalization of the ordinary set characterized by a membership function μ that takes values from the interval [0,1] representing degrees of membership in the set. Fuzzy controllers typically define a non-linear mapping from the system's state space to the control space. Thus, it is possible to consider the output of a fuzzy controller as a non-linear control surface reflecting the process of the operator's prior knowledge. A fuzzy controller is a kind of fuzzy rule-based system that is composed of the following parts:
• a knowledge base that comprises the information used by the expert operator in the form of linguistic control rules,
• a fuzzification interface, which transforms the crisp values of the input variables into fuzzy sets that will be used in the fuzzy inference process,
• an inference system, which uses the fuzzy values from the fuzzification interface and the information from the knowledge base to perform the reasoning process, and
• the defuzzification interface, which takes the fuzzy action from the inference process and translates it into crisp values for the control variables.
The knowledge base encodes the expert knowledge by means of a set of fuzzy control rules.
In order to consider each group of similar events, normal and outliers, in a relevant way, a fuzzy partition in two fuzzy sets is defined over the universe $U_i$ of the Euclidean distances to the centroid in cluster $C_i$. Let $d_i$ be the Euclidean distance of event $x_i$ to the centroid of cluster $C_i$, computed as it can be seen in Eq. (3). The membership functions of these fuzzy sets, respectively denoted as $\mu_n$ and $\mu_o$, are defined in Eq. (5) and Eq. (6).
$$\mu_n(d_i) = \begin{cases} \dfrac{Th(C_i) - d_i}{Th(C_i) - min_n} & \text{if } d_i \in [min_n, max_n] \\ 0 & \text{otherwise} \end{cases} \tag{5}$$
$$\mu_o(d_i) = \begin{cases} \dfrac{d_i - Th(C_i)}{max_o - Th(C_i)} & \text{if } d_i \in [min_o, max_o] \\ 0 & \text{otherwise} \end{cases} \tag{6}$$
where $min_n$ and $max_n$, and $min_o$ and $max_o$, are the minimum and maximum values in the fuzzy sets normal and outlier, respectively, $\mu_n(d_i): d_i \to [0, 1]$ quantifies the degree of membership of $d_i$ to normal and $\mu_o(d_i): d_i \to [0, 1]$ quantifies the degree of membership of $d_i$ to outlier. The obtained fuzzy partition is described in Fig. 1. Note that considering a membership degree makes it possible to provide experts with more interpretable information about the real status of the asset.
Fig. 1. Proposed fuzzy partition.
4.2. Event score
In order to distinguish between outliers and real faults, a local outlier factor is computed for each event distance, as it can be seen in Eq. (7). It is based on the work proposed by Breunig et al. [34] and it measures the degree of isolation of a point with respect to its neighbors. Thus, the local density is also considered when determining if an outlier is an actual anomaly.
$$LOF(d_i) = \frac{\sum_{d_j \in N(d_i)} \frac{LRD(d_j)}{LRD(d_i)}}{|N(d_i)|} \tag{7}$$
where $LRD(d_i)$ is the local reachability density of $d_i$, computed for the closest subset of distances $N(d_i)$ of size $\max\left(\frac{|C_i|}{10}, 1\right)$, as it is shown in Eq. (8). $LRD(d_j)$ is calculated likewise.
$$LRD(d_i) = \left(\frac{\sum_{d_j \in N(d_i)} reachDist(d_i, d_j)}{|N(d_i)|}\right)^{-1} \tag{8}$$
where $reachDist(d_i, d_j) = \max\{K\text{-}dist(d_i), d_j\}$ and $K\text{-}dist(d_i)$ is the K-distance neighborhood of $d_i$, with $K = |\{outliers\}|$ for each cluster, being $K = \max\left(\frac{|C_i|}{100}, 1\right)$ in the case that no outliers are found in cluster $C_i$.
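A local outlier factor over one-dimensional distances can be sketched as follows. Note the assumptions: the sketch uses the standard reachability distance of Breunig et al., $\max\{k\text{-distance}(d_j), |d_i - d_j|\}$, and a single neighborhood size k for every step, so it may differ in detail from the variant of Eqs. (7) and (8). Function names and sample distances are hypothetical.

```python
def k_neighbors(values, i, k):
    """Indices of the k values closest to values[i], excluding i itself."""
    others = [j for j in range(len(values)) if j != i]
    others.sort(key=lambda j: abs(values[j] - values[i]))
    return others[:k]

def lof_scores(values, k):
    """Local outlier factor of every value, with neighborhoods of size k."""
    n = len(values)
    neigh = [k_neighbors(values, i, k) for i in range(n)]
    # k-distance: distance from each value to its k-th nearest neighbor.
    k_dist = [abs(values[nb[-1]] - values[i]) for i, nb in enumerate(neigh)]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_dist[j], abs(values[i] - values[j])) for j in neigh[i]]
        return len(reach) / sum(reach)

    dens = [lrd(i) for i in range(n)]
    return [sum(dens[j] for j in neigh[i]) / (dens[i] * len(neigh[i]))
            for i in range(n)]

# Hypothetical distances to a centroid: five regular events, one isolated.
distances = [1.0, 1.1, 1.2, 1.3, 1.4, 5.0]
scores = lof_scores(distances, k=2)
```

Values inside the dense group obtain a factor close to 1, whereas the isolated distance obtains a much larger factor because its neighbors are far denser than it is.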
The score of event $x_i$ is then defined as a combination of the membership function to the fuzzy sets normal and outlier and the local density of distances to the normal behavior.
$$score(x_i) = \begin{cases} \dfrac{\mu_n(d_i) * LOF(d_i)}{\max\{LOF\}} & \text{if } d_i \in [min_n, max_n] \\ -\dfrac{\mu_o(d_i) * LOF(d_i)}{\max\{LOF\}} & \text{if } d_i \in [min_o, max_o] \end{cases} \tag{9}$$
where $LOF = (LOF_1,\dots,LOF_n)$, calculated for the whole set of distances $d = (d_1,\dots,d_n)$. An event will be considered an anomaly if its score falls below −0.5. Fig. 2 shows an example of the evolution of the resulting event score calculated over time. The proposed event score is an effective means of reducing false positives in the anomaly detection process. It also allows users to access results quickly and efficiently.
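The score of Eq. (9) and the −0.5 decision rule can be combined in a short sketch. It assumes, as a simplification, that the normal set covers distances up to the threshold and the outlier set covers distances above it; all numeric values and function names are hypothetical.

```python
def event_score(d, lof, max_lof, th, min_n, max_o):
    """Eq. (9), assuming normal = [min_n, th] and outlier = (th, max_o]."""
    weight = lof / max_lof            # LOF scaled by the largest LOF observed
    if d <= th:
        mu_n = (th - d) / (th - min_n)
        return mu_n * weight          # non-negative score for normal events
    mu_o = (d - th) / (max_o - th)
    return -mu_o * weight             # negative score for outlier events

def is_anomaly(score, cutoff=-0.5):
    """An event is reported as an anomaly when its score falls below -0.5."""
    return score < cutoff

# Hypothetical cluster: threshold 4, distances in [0, 10], largest LOF 20.
isolated_outlier = event_score(9.0, 18.0, 20.0, th=4.0, min_n=0.0, max_o=10.0)
mild_outlier = event_score(5.0, 4.0, 20.0, th=4.0, min_n=0.0, max_o=10.0)
normal_event = event_score(1.0, 1.0, 20.0, th=4.0, min_n=0.0, max_o=10.0)
```

Only an outlier that is both far beyond the threshold and locally isolated reaches a strongly negative score; an outlier that lies just past the threshold in a dense region stays close to zero and is not reported.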
5. Test scenario
The main goal of the present work is to improve anomaly detection and prediction capabilities of existing condition monitoring strategies. In this sense, the proposed approach should be able to improve current mathematical models and preventive strategies put in practice. Once the knowledge models are learned and validated, they are deployed in the CBM+ system to automatically anticipate real-time faults in an online fashion.
5.1. Experimental setup
Fig. 2. Event score example.
Fig. 3. Bar chart of resulting clusters distribution.
A set of operational features were monitored, collected and analysed. From these parameters the engine behavior and its health status can be established. They are shown in Table 2. Then, the data processing techniques proposed are employed to learn statistical models from historical data, which will allow identifying and modeling normal behaviors, isolating outliers and detecting operational faults.
An event is collected every minute and it is composed of the values of all monitored parameters at a specific time instant. The CBM+ system automatically transfers asset sensors' readings in buffer mode to an on-board database. These data are sent daily in streaming mode to a ground control database for further analysis. Prognostics models learnt from historical data are then applied in real-time for the intelligent monitoring of the asset, checking new events on-board.

Table 1
Auxiliary marine diesel engine main specifications.
| Parameter | Value |
| --- | --- |
| Number of cylinders | 12 V |
| Rated maximum power | 1200 kW |
| Rated operating speed | 1800 rpm (constant) |
| Bore | 165 mm |
| Stroke | 185 mm |
| Compression ratio | 15.5 |
Table 2
Monitored auxiliary diesel engine parameters.
| Type of variable | Parameters |
| --- | --- |
| Contextual variables | Environmental pressure and temperature; Relative humidity; Circulation sea water temperature and pressure |
| Engine input variables | Inlet and return fuel flow; Fuel pressure; Engine speed; Cooling water pressure and temperature; Oil pressure and temperature; Cooling water temperature; Intake manifold charge air pressure and temperature A; Starting air pressure |
| Engine output variables | Exhaust temperature of cylinders 1A to 6A and 1B to 6B; Inlet exhaust temperature of turbochargers A and B; Outlet exhaust temperature of turbochargers A and B; A and B turbo temperature drop |
| Generator input variables | Alternator cooling air temperature |
| Generator output variables | Alternator frequency; Alternator reactive and active power; Alternator voltage and intensity; Alternator winding temperature in phases R, S and T; Alternator bearing temperatures |
Table 3
Percentage of variance explained for each number of clusters.
| Number of clusters | % of explained variance |
| --- | --- |
| 2 | 76.03 |
| 3 | 88.23 |
| 4 | 93.46 |
| 5 | 95.32 |
| 6 | 95.91 |
| 7 | 96.52 |
| 8 | 96.96 |
so that time regions where the engine is not working are not considered, since they have no significance regarding operational failure modes.
Domain experts' contribution takes place when identifying the asset input feature in the constrained K-means clustering step. Then, the proposed approach is automatically applied and the anomalies found at the end of the process are checked and validated by the same experts. To do so, meetings and interviews with experts must be held.
The software platform used is Java Platform Enterprise Edition from Oracle. It is based on the Java programming language. Experiments were conducted in the Java-based Eclipse RCP (Rich Client Platform, Kepler version) environment on Ubuntu-Linux 14.04 (64 bits), on a Core i5 desktop PC with 4 GB of RAM.
In the following subsection the analysis process followed by the proposed methodology is shown over a real data set. Moreover, a cylinder leaking problem characterised by low exhaust gas temperature in cylinders, and an alternator problem characterised by high intensity and reactive power, are detected and discussed.
5.2. Experimental results
The proposed methodology has been tested on two months of operational data, from January to February 2015, of an auxiliary diesel engine. A total of 17,377 events are analyzed. The number of clusters, k, is set to 8 given the percentage of explained variance, as it can be seen in Table 3. Although other complementary tests were performed with different k values (e.g. k = 6 and k = 10), the results obtained were less accurate in terms of the number of false negatives and false positives.
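The choice of k = 8 can be checked against the explained variance figures of Table 3. The sketch below assumes that the stopping rule "$Cmp_k - Cmp_{k-1} \leq 0.5$" is applied to these percentages, in percentage points, and that the first k meeting the rule is selected; `select_k` is a hypothetical helper.

```python
# Percentage of explained variance per number of clusters (Table 3).
explained = {2: 76.03, 3: 88.23, 4: 93.46, 5: 95.32, 6: 95.91, 7: 96.52, 8: 96.96}

def select_k(explained, tol=0.5):
    """First k whose gain over k-1 is at most `tol` percentage points."""
    ks = sorted(explained)
    for prev, cur in zip(ks, ks[1:]):
        if explained[cur] - explained[prev] <= tol:
            return cur
    return ks[-1]  # fall back to the largest k tested

gains = {k: round(explained[k] - explained[k - 1], 2) for k in range(3, 9)}
chosen_k = select_k(explained)
```

The gains from one k to the next stay above 0.5 points up to k = 7 and drop to 0.44 points at k = 8, which agrees with the value of k reported in the text.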
As a result of the constrained K-means clustering-based outlier detection process, a total of 78 events are isolated from normal patterns. The resulting cluster distribution can be seen in Table 4 and in Fig. 3.
In more detail, the events grouped in each cluster can be seen in Fig. 4. The dashed line represents the cluster centroid and operational parameter values are represented by dots, each event being a set of dots (one value per operational parameter) at a specific time instant. Operational parameter values are normalized between 0 and 1. The result can be seen as a simplified Kohonen map or network with a small number of nodes, from lower to higher engine load, and no neighborhood function when updating the BMU (best matching unit) at each iteration [35].
As can be expected, clusters containing stable engine load conditions grouped the majority of events. That is the case, for instance, of Clusters 3, 4 and 5. Outliers detected in such clusters are more likely to imply real system faults. However, depending on the nature and type of system fault, a problem can also occur when the engine is not in stable operating conditions. That behavior can be observed in relation to Cluster 0, when the engine is starting up. In Fig. 5 a typical normal engine behavior under stable operation is shown. It corresponds to normal events grouped within Cluster 3.
Among the outliers found, some events correspond to abnormally low exhaust temperatures in cylinders, probably due to a scavenge fire and/or a defective fuel valve, both of which are caused by a fuel system fault. This fault was present in a total of 13 events distributed in different clusters. In Fig. 6 an example of such behavior in three events of one of the clusters formed can be appreciated. An alternator system symptom was also detected in 2 events in two different clusters. It is characterized by extremely high alternator intensity and reactive power at a normal engine load, as it can be seen in Fig. 7. It is usually produced during docking manoeuvres and could lead to critical system failure or compromised security.
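Table 4
Clusters distribution.
| Cluster | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Total number of events | 330 | 13 | 751 | 2098 | 10,550 | 3265 | 286 | 6 |
| Outliers found | 9 | 0 | 13 | 7 | 33 | 15 | 1 | 0 |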
Fig. 5. Normal engine behavior.
Fig. 7. Alternator system fault detected at a normal engine load.
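Table 5
Results obtained per cluster.
| Cluster | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Real normal | 326 | 13 | 750 | 2094 | 10,547 | 3263 | 285 | 6 |
| Predicted normal | 328 | 13 | 750 | 2094 | 10,547 | 3263 | 285 | 6 |
| Real fault: fuel system | 4 | 0 | 1 | 3 | 3 | 2 | 0 | 0 |
| Real fault: alternator system | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| Predicted fault | 2 | 0 | 1 | 4 | 3 | 2 | 1 | 0 |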
Table 6
Global confusion matrix.
| 78 outliers found | Predicted normal | Predicted fault |
| --- | --- | --- |
| Real normal | TN = 17,362 | FP = 0 |
| Real fault | FN = 2 | TP = 13 |
To quantify the number of anomalous events in each cluster, an event is considered as a real fault if its score is below −0.5. Given that the test case scenarios showing the fuel system and alternator faults were deliberately chosen to be difficult to detect, it is still encouraging that the classification of faulty events rises above the false positive rate, accurately distinguishing real faults among the outliers found.
Then the approach for anomaly detection was tested on a 10-fold cross-validation basis for each cluster, by segmenting the total set of cluster events into 10 equal parts. Thus, the confusion matrix presented in Table 6 is obtained, containing the average results of the 10 folds. Note that the constrained K-means clustering step is performed on the whole data set only once, in order to establish the different engine load operational ranges that will be used to isolate the outliers and predict the real faults.
In order to evaluate the results, the precision, sensitivity and specificity of the detection process are calculated. They are three widely used quality measures in this kind of process. As it is shown in Table 7, precision, sensitivity and specificity are globally above 93%, so the approach accurately limits false anomalies and undetected faults. The inter-rater agreement statistic coefficient (Cohen's kappa, κ) is also computed in order to evaluate the agreement between normal and fault events [36]. The resulting κ coefficient is 0.93, therefore a high strength of agreement is achieved.

Table 7
Global precision, sensitivity, specificity and κ coefficient.
| | Real normal | Real fault | Global results |
| --- | --- | --- | --- |
| Precision | 99.99% | 100% | 99.98% |
| Sensitivity | 100% | 86.67% | 93.34% |
| Specificity | | | 100% |
| κ | | | 0.93 |

By taking into account the estimations made by this approach and the next asset maintenance periods, usage planning, spare parts availability and manpower resources, optimal maintenance strategies can be suggested.
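The per-class figures of Table 7 and the κ coefficient can be re-derived from the confusion matrix of Table 6 with the standard definitions. The sketch below is only a consistency check; the function name and the dictionary keys are hypothetical.

```python
def metrics(tn, fp, fn, tp):
    """Standard quality measures for a binary normal/fault confusion matrix."""
    n = tn + fp + fn + tp
    p_o = (tn + tp) / n                                   # observed agreement
    p_e = ((tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)) / n ** 2  # by chance
    return {
        "precision_normal": tn / (tn + fn),
        "precision_fault": tp / (tp + fp),
        "sensitivity_normal": tn / (tn + fp),
        "sensitivity_fault": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "kappa": (p_o - p_e) / (1 - p_e),                 # Cohen's kappa
    }

# Values from Table 6.
m = metrics(tn=17362, fp=0, fn=2, tp=13)
summary = {name: round(100 * value, 2) for name, value in m.items() if name != "kappa"}
kappa = round(m["kappa"], 2)
```

With these counts the fault sensitivity is 13/15, i.e. 86.67%, the precision for the normal class is 99.99%, and κ rounds to 0.93, matching the values reported in Table 7.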
6. Conclusions
Results show the potential of the proposed approach to successfully address the fault modes characterization problem, providing a finer and easier to interpret technique for experts to anticipate potential failures on the basis of the health status of the asset from monitoring sensor data. The complexity of such assets (e.g. marine engines and industrial machinery) makes their behavior difficult to understand and interpret. Under such circumstances it is particularly challenging to anticipate abnormal behaviors that often foreshadow significant and costly faults.
It was found that machine learning methods can highly support the traditional diagnostics and prognostics condition-based maintenance strategies. The combination of constrained K-means clustering for outlier detection and behavior characterization, the fuzzy modeling of distances to patterns found and the final LOF-based event score provide a fully comprehensive yet accurate prognostics approach. The discussed test scenario showed an accurate detection of critical faults in an auxiliary diesel engine, characterized by abnormal exhaust temperatures in cylinders regarding the fuel system and by extremely high alternator intensity and reactive power in the alternator system. The performance achieved is quite high (global precision, sensitivity and specificity above 93% and κ = 0.93), even more so given that a very small percentage of real faults are present in the data.
The deployment of the discussed data-driven models in a CBM+ system results in important benefits in terms of fault detection and prediction. The monitoring and analysis of additional operational parameters (e.g. in-cylinder pressure, torque, turbocharger speed, cylinder heads vibration, main bearings vibration and fluids analysis), and environmental conditions (e.g. weather variability from Arctic Sea to Red Sea waters), will allow increasing prediction accuracy and system capabilities as more failure modes are considered. This will allow maximizing reliability and minimising risks and unnecessary costs derived from unexpected breakdowns. Moreover, predictions and diagnosis obtained by adopting this approach can serve as input for optimal decision making support and planning of maintenance operations, which will lead to important benefits in the reliability, safety and performance of the assets.
targeting important sectors such as maritime, renewable energy, railway, agro-food, civil structures and machine-tool.
Jose Antonio Pagán Rubio obtained an Industrial Engineering degree at Universidad Politécnica de Cartagena (ETSII-UPCT) in 2001. That year he joined NAVANTIA, where he worked as a reliability engineer in Integrated Logistic Support (ILS) for defense programs until 2007. He is a Reliability-Centered Maintenance (RCM) facilitator. Since then, he has been working as an experienced diagnosis engineer at the NAVANTIA Engine Factory, and he is also responsible for the NAVANTIA Smart Maintenance R&D project, started in 2012. He is trained in predictive maintenance techniques such as vibration, thermography and ultrasound. He is currently a PhD student at Universidad Politécnica de Cartagena (ETSII-UPCT) in the Department of Fluids Mechanics and Combustion Engines; his thesis is related to diesel engine failure diagnosis and prediction.
Ricardo Sanz is a Full Professor in Automatic Control and Systems Engineering at the Universidad Politécnica de Madrid. He received a degree in electrical engineering (1987) and a Ph.D. in robotics and artificial intelligence (1990). Since 1991, he has been a member of the Department of Automatic Control at the Universidad Politécnica de Madrid, Spain. His main research interests focus on control architectures for intelligent systems, and he is involved in research lines on autonomous control, software technologies for complex, distributed controllers, real-time artificial intelligence, cognitive systems, bio-inspired controllers and philosophical aspects of intelligence. He has been associate editor of the IEEE Control Systems Magazine and chairman of the OMG Control Systems Working Group and the IFAC Technical Committee on Computers and Control. He is also associate editor of the International Journal of Machine Consciousness and the Biologically Inspired Cognitive Systems Journal. Since 2004 he has been the coordinator of the UPM Autonomous Systems Laboratory.