Learning Large Scale Common Sense Models of Everyday Life


William Pentney
Department of Computer Science & Engineering
University of Washington
bill@cs.washington.edu

Jeff Bilmes
Department of Electrical Engineering
University of Washington
bilmes@ee.washington.edu

Abstract

Recent work has shown promise in using large, publicly available, hand-contributed commonsense databases as joint models that can be used to infer human state from day-to-day sensor data. The parameters of these models are mined from the web. We show in this paper that learning these parameters using sensor data (with the mined parameters as priors) can improve performance of the models significantly. The primary challenge in learning is scale. Since the model comprises roughly 50,000 irregularly connected nodes in each time slice, it is intractable either to completely label observed data manually or to compute the expected likelihood of even a single time slice. We show how to solve the resulting semi-supervised learning problem by combining a variety of conventional approximation techniques and a novel technique for simplifying the model called context-based pruning. We show empirically that the learned model is substantially better at interpreting sensor data, and we provide a detailed analysis of how various techniques contribute to this performance.

Introduction

Sensor-based methods for inferring the state of people and their environment have a variety of applications such as eldercare management (Wilson et al. 2005) and institutional workflow management. A system that could tell whether an elderly person living alone has a cold, has taken medication or is depressed, for instance, could substantially reduce the financial burden of care. A key challenge in building such systems is the need for models that relate low-level sensor signals (e.g., vision) to high-level concepts (e.g., depression). The conventional approach to acquiring such models is to apply machine learning techniques to labeled sensor data, given a "structure" prior on the dependencies between variables. The structure itself is often provided by application developers in a manual fashion. However, if many of the tens of thousands of aspects of daily life are to be tracked, the variables and their relationships amount to a "commonsense" encoding of daily life, and manual specification becomes impractical for a single individual.

Matthai Philipose
Intel Research Seattle
matthai.philipose@intel.com

Henry Kautz
Department of Computer Science
University of Rochester
kautz@cs.rochester.edu

graph represents a wide variety of random variables about the state of the world at each step (from people's emotions to the state of their roof), it is realistic to expect manual labeling of no more than a very small fraction of the nodes at each time step. The end result is a very large semi-supervised learning problem (Zhu 2005) over a mixed directed/undirected probabilistic graphical model.

Although general approaches to semi-supervised learning of graphical models do exist, solving large problems requires considerable sophistication in exploiting problem structure in practice. In this light, this paper makes three main contributions. First, it formulates the commonsense model learning problem as maximum likelihood estimation on a graphical model with mined priors, and applies a variety of existing techniques to make learning tractable. Second, it presents a simple novel technique called context-based pruning to speed up inference substantially. Context-based pruning captures the intuition that we may be able to associate with sets of observations a subset of variables (called a context) that are likely to be affected by these observations. By identifying, and avoiding reasoning about, irrelevant contexts, it may be possible to substantially reduce the number of variables considered during inference. Third, it provides a detailed evaluation that shows that learning is both reasonably fast and improves the SRCS model significantly, and that many of our design choices contribute significantly to this result. To our knowledge, this work is the first to show how to improve a large commonsense database by observing sensor data of human activity.

Preliminaries

The model on which we perform learning is a slight variation of the SRCS model (Pentney et al. 2006). We are given a set of observations $o = o_1, o_2, \ldots, o_r$; in our context, these observations will be Boolean random variables. We wish to infer the state of a set of query random variables $s = s_1, s_2, \ldots, s_{n-r}$, also Boolean, for a given time slice $t$. Collectively, we will speak of the state of the observations and queries as the set of variables $x = x_1, x_2, \ldots, x_n$, where $x_1 \ldots x_r = o_1, \ldots, o_r$ and $x_{r+1} \ldots x_n = s_1, \ldots, s_{n-r}$, and the set of the observations and queries at some specific point in time $t$ as the state vector $x_t = x_{1t}, x_{2t}, x_{3t}, \ldots, x_{nt}$. We assume that we have a set of weighted propositional Horn clauses $(c_1, q_1), (c_2, q_2), \ldots, (c_m, q_m)$, where each clause $c_i$ is a relation over Boolean variables about the state of the environment, weighted by quality score $q_i$ in $[0, 1]$. For instance, the clause used(bowl) → action(makecereal) may have weight 0.9, and represent the proposition "if the bowl is being used, the user is making cereal."

Given a set of Horn clauses, we produce a probabilistic graphical model designed to model the state of the environment over time. As in SRCS, time slice $t$ may be modeled with a Markov random field, in which $x_t$ will represent the state of the world at $t$. Some of the variables $x_t$ can be grounded in actual observations; if we can observe the use of a bowl at time $t$, for instance, the variable representing used(bowl) may be provided as evidence. In a slight deviation from SRCS (which produces one feature per Horn clause), we represent each Horn clause $c_i$ by two binary-valued feature functions $\phi_i(x_{c_i})$ and $\phi_{m+i}(x_{c_i})$, where $x_{c_i}$ is the set of all variables $x_j$ which appear in $c_i$. The function $\phi_i(x_{c_i})$ is a positive feature, and set to 1 if all variables in $x_{c_i}$ are true (i.e. both the left and right sides of the Horn clause are true, and thus $c_i$ is satisfied), and 0 otherwise. The function $\phi_{m+i}(x_{c_i})$, conversely, is a negative feature, and set to 1 if the assignment to $x_{c_i}$ violates the Horn clause and 0 otherwise. We assign weight $\lambda_i = q_i$ to positive feature $\phi_i(x_{c_i})$, and $\lambda_{m+i} = -q_i$ to negative feature $\phi_{m+i}(x_{c_i})$. For simplicity, we may sometimes refer to $\phi_i(\hat{x})$, where $\hat{x}$ is an entire state vector; this should be read as $\phi_i(\hat{x}_{c_i})$, where $\hat{x}_{c_i}$ is the assignment imposed on $x_{c_i}$ by $\hat{x}$. The full set of all such feature functions $\phi = \phi_1, \phi_2, \ldots, \phi_{2m}$ will represent the relationships between different variables at time $t$.
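The positive/negative clause features described above can be sketched in a few lines of Python. The dict-based state representation and helper names here are our own illustration, not the paper's implementation.

```python
# Sketch of the clause features described above: for a weighted Horn clause
# body -> head with quality score q, phi_pos fires when the whole clause holds
# with all variables true, and phi_neg fires when the clause is violated.

def make_clause_features(body, head, q):
    clause_vars = body + [head]

    def phi_pos(x):
        # 1 iff every variable in the clause is true (left and right sides hold)
        return 1 if all(x[v] for v in clause_vars) else 0

    def phi_neg(x):
        # 1 iff the assignment violates the clause: body true, head false
        return 1 if all(x[v] for v in body) and not x[head] else 0

    # weights: lambda_i = q for the positive feature, lambda_{m+i} = -q for the negative
    return phi_pos, phi_neg, q, -q

phi_p, phi_n, w_pos, w_neg = make_clause_features(
    ["used(bowl)"], "action(makecereal)", 0.9)

state = {"used(bowl)": True, "action(makecereal)": False}
print(phi_p(state), phi_n(state), w_pos, w_neg)  # 0 1 0.9 -0.9
```

Note that exactly one of the two features can fire for a clause-violating assignment, which is what lets the $\pm q_i$ weights push the model toward satisfying high-quality clauses.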

We follow SRCS in modeling how variables $x_t$ change over time. For simplicity, we will assume that, given that $x_{i,t-1} = p$, $x_{it} = p$ with some fixed probability $\sigma_p$. This is represented with another set of feature functions $\psi = \{\psi_1, \psi_2, \ldots, \psi_n\}$, where $\psi_i$ is a function of $(x_{i,t-1}, x_{it})$, $\psi_i(p, p) = \sigma_p$, and $\psi_i(p, q) = 1 - \sigma_p$ where $p \in \{0, 1\}$, $p \neq q$. Intuitively, this simplifying model means that, given no other evidence, variables will remain in the state they are in, but with declining probability over time.
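The persistence features $\psi_i$ can likewise be sketched directly; the $\sigma_p$ values below are invented for illustration and are not from the paper.

```python
# Sketch of the temporal features psi_i described above: a variable in state p
# stays in state p across a slice boundary with fixed probability sigma_p,
# and flips with probability 1 - sigma_p. The sigma values are illustrative.

def make_temporal_feature(sigma):
    def psi(p_prev, p_cur):
        return sigma[p_prev] if p_prev == p_cur else 1.0 - sigma[p_prev]
    return psi

psi = make_temporal_feature({0: 0.95, 1: 0.80})
print(psi(1, 1), round(psi(1, 0), 2))  # 0.8 0.2
```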

The probability of the state $x_t$ given the state $x_{t-1}$ as evidence may be calculated in this model as

$$P(x_t \mid x_{t-1}) = \frac{1}{Z_{\hat{x}_{t-1}}} \exp\Big(\sum_i \lambda_i \phi_i(x_t) + \sum_i \mu_i \psi_i(x_t, x_{t-1})\Big)$$

where $Z_{\hat{x}_{t-1}}$ is the partition function. Given a fully labeled trace $\hat{x} = \hat{x}_1 \ldots \hat{x}_d$, we wish to maximize the log-likelihood $LL = \log P(\hat{x} \mid \lambda, \mu)$. The model has the Markov property, so $\log P(\hat{x} \mid \lambda, \mu) = \sum_{t=1}^{d} \log P(\hat{x}_t \mid \hat{x}_{t-1}, \lambda, \mu)$. The gradient of $LL$ with respect to each $\lambda_j$ is

$$\frac{\partial LL}{\partial \lambda_j} = \sum_t \Big[\phi_j(\hat{x}_t) - \sum_{x_t} P(x_t \mid \hat{x}_{t-1})\, \phi_j(x_t)\Big]$$

and with respect to each $\mu_j$ is

$$\frac{\partial LL}{\partial \mu_j} = \sum_t \Big[\psi_j(\hat{x}_t, \hat{x}_{t-1}) - \sum_{x_{it}, x_{i,t-1}} P(x_{it}, x_{i,t-1})\, \psi_j(x_{it}, x_{i,t-1})\Big]$$

The second terms of these equations are equal to $E(\phi_j(\hat{x}))$ and $E(\psi_j(\hat{x}))$, respectively, and may be calculated efficiently by computing the marginal probabilities of each $\hat{x}_{c_i}$ using the well-known belief propagation (BP) algorithm (Pearl 1988). We may then optimize $LL$ by finding the model parameters $(\lambda, \mu)$ for which the gradient is zero using stochastic gradient descent (SGD) methods. SGD optimization has been shown to be effective for optimization of conditional random fields in other contexts (Vishwanathan et al. 2006). It should be noted that while some other researchers have found the L-BFGS algorithm to be effective for optimization (Sutton, Rohanimanesh, & McCallum 2004), we found SGD to provide much faster convergence in our experiments.

Let $\nabla\theta_k(x)$ be the gradient of the likelihood function on training instance $x$ under model $\theta_k$. The standard SGD algorithm begins with an initial model $\theta_0 = (\lambda, \mu)$ and iteratively updates the model by selecting a single training example $x$, calculating the gradient $\nabla\theta_{k-1}(x)$, and updating the model with the equation

$$\theta_k = \theta_{k-1} - \eta_k \nabla\theta_{k-1}(x)$$

where $\eta_k$ is the gain factor for iteration $k$. We select training instances for each iteration by choosing a random permutation of the entire set of traces and proceeding through this permutation, one iteration per trace. The gain factor controls the rate of convergence of SGD, and much research has been concerned with effective selection and iterative updating of $\eta_k$. We begin with $\eta_1 = 0.1$ and follow a technique similar to that of (LeCun et al. 1998) for gain selection. Our algorithm typically ran for 500-550 iterations.
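As a concrete toy illustration of the update rule and trace-permutation schedule above: the objective, the particular gain-decay schedule, and the data here are our own stand-ins, not the paper's model or its LeCun-style gain selection.

```python
# Toy SGD in the style described above: each epoch walks a random permutation
# of the traces, applying one update per trace with a decaying gain eta_k.
import random

def sgd(theta, traces, grad, eta1=0.1, epochs=200, seed=0):
    rng = random.Random(seed)
    k = 1
    for _ in range(epochs):
        order = list(range(len(traces)))
        rng.shuffle(order)                   # random permutation of the traces
        for i in order:
            eta_k = eta1 / (1.0 + 0.01 * k)  # illustrative gain decay, not the paper's
            g = grad(theta, traces[i])
            theta = [t - eta_k * gj for t, gj in zip(theta, g)]
            k += 1
    return theta

# Toy objective: minimize sum over "traces" x of (theta - x)^2; optimum is 2.
grad = lambda th, x: [2.0 * (th[0] - x)]
theta = sgd([0.0], [1.0, 2.0, 3.0], grad)
print(round(theta[0], 1))
```

The one-update-per-trace structure mirrors the schedule in the text; only the gain schedule differs, since the paper's exact schedule is not reproduced here.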

In practice, we will probably not have access to a fully labeled data trace $\hat{x}$; providing correct labels for every query about the state of the world at every point in a period of time can be quite expensive, if not impossible. We will thus assume that each slice $\hat{x}_t$ may be only partially labeled (perhaps very sparsely so), with some set of variables $y_t$ unlabeled at each $t$. The computation of each term in $LL$ is then

$$\log P(\hat{x}_t \mid \hat{x}_{t-1}) = \log\Big[\sum_{\hat{y}_t} \exp\Big(\sum_i \lambda_i \phi_i(\hat{x}_t|\hat{y}_t) + \sum_i \mu_i \psi_i(\hat{x}_t|\hat{y}_t, \hat{x}_{t-1}|\hat{y}_{t-1})\Big)\Big] - \log Z_{\hat{x}_{t-1}} = \log Z_{\hat{x}_{t-1}, \hat{x}_t \setminus y_t} - \log Z_{\hat{x}_{t-1}}$$

where $\hat{x}_t|\hat{y}_t$ is the partial assignment $\hat{x}_t$ with the unlabeled variables set to $\hat{y}_t$. Unfortunately, both $\log Z_{\hat{x}_{t-1}, \hat{x}_t \setminus y_t}$ and $\log Z_{\hat{x}_{t-1}}$ can be prohibitively expensive to calculate, even with many standard approximation techniques such as MCMC.

To efficiently calculate an approximation to both $\log Z_{\hat{x}_{t-1}, \hat{x}_t \setminus y_t}$ and $\log Z_{\hat{x}_{t-1}}$, we use the Bethe free energy approximation technique, described in (Yedidia, Freeman, & Weiss 2004). This technique approximates the value of $\log Z_{\hat{x}_{t-1}}$ using the marginal probabilities of the variables $x_{it}$ and the feature functions $\phi_j, \psi_j$, and the approximation takes the form

$$\log Z_{\hat{x}_{t-1}} = -\Big[\sum_{\hat{x}_t}\Big(\sum_{\phi_j} P(\phi_j) \log \lambda_j \phi_j(\hat{x}_t) + \sum_{\psi_j} P(\psi_j) \log \mu_j \psi_j(\hat{x}_t, \hat{x}_{t-1}) - \sum_{\phi_j} P(\phi_j) \log P(\phi_j) - \sum_{\psi_j} P(\psi_j) \log P(\psi_j)\Big)\Big] + \sum_k (d_k - 1) \sum_{\hat{x}_{kt}} P(\hat{x}_{kt}) \log P(\hat{x}_{kt})$$

where $d_k$ is the degree of variable $x_k$, i.e. the number of features $\phi_j$ and $\psi_j$ dependent upon $x_k$. Since the marginals can generally be calculated correctly and efficiently using BP on the entire model with no variables fixed, this quantity can be computed efficiently for learning, and we have generally found it to be a good approximation in practice. Similarly, the Bethe approximation of $\log Z_{\hat{x}_{t-1}, \hat{x}_t \setminus y_t}$ may be calculated by performing BP with the variables $x_t$ and $x_{t-1}$ clamped to the appropriate values fixed by $\hat{x}$.

To improve our optimization procedure, we make two more changes. First, it should be noted that our optimization procedure may result in overfitting. To limit this, we actually optimize the function $RLL(\lambda, \mu) = LL(\lambda, \mu) - N(\lambda, \mu)$, where $N(\lambda, \mu)$ is a Gaussian function of the vector $(\lambda, \mu)$ with mean zero and variance $\eta$ (set to 5 in practice). This term penalizes excessively large weights in the vectors $\lambda$ and $\mu$. We then have $\frac{\partial RLL}{\partial \lambda_j} = \frac{\partial LL}{\partial \lambda_j} - \frac{\lambda_j}{\eta}$ (and similarly for each $\mu_j$), and may optimize as described.

Second, in practice, because true instances of variables are relatively rare, the optimization procedure may favor a model which is disproportionately likely to label variables as false. We thus give more weight to certain slices containing true instances of queries that are less frequently true in the training data. For the labeled query variables $q_1 \ldots q_l$, let $t_{q_i}$ be the number of true appearances of $q_i$ in a time slice in the training data $\hat{x} = \hat{x}_1 \ldots \hat{x}_d$, and $\alpha(q_i) = \frac{d - t_{q_i}}{\cdots}$

Figure 1: Example of our dynamic graphical model with use of context for inference. Shaded nodes represent observations, and are true when black; the dashed regions are the observations' contexts. When observations are true, features in those observations' contexts are applied to that time slice. The directed arrows represent the temporal features between slices.

Figure 2: Distribution of sizes of feature sets over time slices with context-based pruning (number of slices vs. percentage of total features in the feature set).

breadth-first traversals to a depth $d$ on each fact $p_i$ (in the paper, $d = 2$); this eliminates a large number of the nodes and features in the graph. While this is an effective technique, it requires foreknowledge of what predicates one wishes to query over before constructing the graph.
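The depth-limited traversal just described can be sketched as a standard bounded BFS; the toy predicate graph below is our own example, not the paper's model.

```python
# Sketch of depth-d subgraph extraction: keep only nodes within d hops
# (breadth-first) of the seed facts/predicates one wishes to query over.
from collections import deque

def depth_limited_subgraph(adj, seeds, d=2):
    """Return the set of all nodes within d hops of any seed node."""
    dist = {s: 0 for s in seeds}
    q = deque(seeds)
    while q:
        u = q.popleft()
        if dist[u] == d:        # do not expand past the depth bound
            continue
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return set(dist)

# toy predicate graph (our illustration)
adj = {"used(bowl)": ["action(makecereal)"],
       "action(makecereal)": ["used(bowl)", "location(kitchen)"],
       "location(kitchen)": ["action(makecereal)", "location(house)"],
       "location(house)": ["location(kitchen)"]}
print(sorted(depth_limited_subgraph(adj, ["used(bowl)"], d=2)))
# ['action(makecereal)', 'location(kitchen)', 'used(bowl)']
```

With $d = 2$, location(house) is pruned; as the text notes, the seed predicates must be known before the graph is built.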

To improve efficiency in another manner, we take advantage of intuition about the structure of our graph. Our collection of everyday commonsense predicates contains many predicates whose respective states are likely to be mostly unrelated. For example, if we observe that a kitchen implement has been used, this is unlikely to affect the state of the bathroom sink, whose intuitive "context" is different. Thus, if observations relating to the bathroom sink have not changed at this time as well, it is unlikely that the bathroom sink will be affected by the new observations this time slice. Given this intuition, we seek to partition the time slice graph into subgraphs based on the features defined upon the graph. We refer to this as context-based pruning.

Most variables, at any given point in time, are false. Given an observation $o$ represented by random variable $x_o$, it is our goal to find an appropriate set of nodes $C_o$ which are most likely to become true as a result of $o$ being true. To find these nodes, we calculate the point-wise mutual information between each $x_i$ and $o$ for true values of $x_i$ and $o$ in a training set $\hat{x}$. The point-wise mutual information between values $\hat{x}_i$ and $\hat{o}$ in $\hat{x}$ is calculated as

$$PMI(\hat{x}_i, \hat{o}) = \log \frac{P(x_i = \hat{x}_i, o = \hat{o})}{P(x_i = \hat{x}_i)P(o = \hat{o})}$$

However, $x_i$ might not be labeled in the training data. In this case, we may estimate $PMI(\hat{x}_i, \hat{o})$ by estimating $P(x_i = \hat{x}_i, o = \hat{o})$, $P(x_i = \hat{x}_i)$ and $P(o = \hat{o})$ by performing BP on our existing model with all labeled $x_i$ fixed appropriately. To find the context of an observation $o$, we then select a threshold $\epsilon_o$ and let the context of $o$ be $C_o = \{x_i : PMI(x_i, o) > \epsilon_o\}$. In our experimental environment, which contained 24 distinct observations, the size of the resulting contexts ranges from a couple of nodes to several thousand. This technique permits the pruning of many features from the graph at each time slice that are unlikely to carry meaningful information about queries based on observations, and many nodes which are not sufficiently relevant to any possible observation may be pruned from the graph. We produce a set of contexts $C = (C_{o_1}, C_{o_2}, \ldots, C_{o_r})$ for the full set of observations by calculating each $C_{o_i}$ based on the training data as described. Let $\zeta(C_o) = \{\phi_i : x_{c_i} \subset C_o\}$, $O_{x_t}$ be the set of all observations $o_i$ set to true in assignment $x_t$, and $\zeta_{x_t} = \zeta(\bigcup_{o_i \in O_{x_t}} C_{o_i})$. To perform inference, we then compute the probability

$$P(x_t \mid x_{t-1}) = \frac{1}{Z_{\hat{x}_{t-1}}} \exp\Big(\sum_{\phi_i \in \zeta_{x_t}} \lambda_i \phi_i(x_t) + \sum_i \mu_i \psi_i(x_t, x_{t-1})\Big)$$
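The PMI scoring and thresholding above can be sketched as follows; the probability estimates and the threshold $\epsilon_o$ are toy values standing in for the counts (or BP marginals) one would obtain from real training data.

```python
# Sketch of context construction: score each candidate variable by point-wise
# mutual information with an observation o, and keep those above threshold eps.
import math

def pmi(p_joint, p_x, p_o):
    # PMI(x_i, o) = log P(x_i, o) / (P(x_i) P(o))
    return math.log(p_joint / (p_x * p_o))

def context(stats, eps):
    """stats[v] = (P(v=1, o=1), P(v=1), P(o=1)); keep v with PMI > eps."""
    return {v for v, s in stats.items() if pmi(*s) > eps}

# toy estimates for observation o = used(bowl) (our illustration)
stats = {"action(makecereal)": (0.08, 0.10, 0.10),  # strongly associated with o
         "stateof(sink,wet)": (0.01, 0.10, 0.10)}   # roughly independent of o
print(context(stats, eps=0.5))  # {'action(makecereal)'}
```

Variables that co-occur with the observation far more often than independence predicts land in the context; near-independent ones, like the sink state here, are pruned.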

[Table 1: recall, F-measure, and inference time for the model with and without FL/SGD learning and with and without context-based pruning, at several subgraph depths d.]

[Table 2: accuracy with and without learning on example queries such as location(shower), action(addmilkto), and stateof(window,dirty).]

[Table 3: average inference time per slice for (d=2)-based and PMI-based contexts.]
