
Interleaving Learning, Problem Solving, and Execution in the Icarus Architecture



Pat Langley (langley@csli.stanford.edu), Dongkyu Choi (dongkyuc@stanford.edu), Seth Rogers (srogers@csli.stanford.edu)
Computational Learning Laboratory, Center for the Study of Language and Information, Stanford University, Stanford, CA 94305 USA

Abstract

In this paper, we review Icarus, a cognitive architecture that utilizes hierarchical skills and concepts for reactive execution in physical environments. In addition, we present two extensions to the framework. The first involves the incorporation of means-ends analysis, which lets the system compose known skills to solve novel problems. The second involves the storage of new skills that are based on successful means-ends traces. We report experimental studies of these mechanisms on three distinct domains. Our results suggest that the two methods interact to acquire useful skill hierarchies that generalize well and that reduce the effort required to handle new tasks. We conclude with a discussion of related work on learning and prospects for additional research.

Keywords: incremental learning, cognitive architecture, reactive control, problem solving, hierarchical skills

1. Introduction and Motivation

Research on cognitive architectures (Newell, 1990) attempts to understand the computational infrastructures that support intelligent behavior. A specific architecture characterizes the aspects of a cognitive agent that remain the same across time and over different domains, and typically makes strong commitments about the representation of knowledge structures and the processes that operate on them. Learning has been a central concern in most architectural research, with a variety of mechanisms having been proposed to model the acquisition of knowledge from experience. The learning methods embedded in most cognitive architectures are incremental, reflecting evidence that humans acquire knowledge in this manner, but there have been few accounts of the origin of hierarchical structures that appear crucial to complex cognition.

In this paper we review Icarus, a candidate architecture that diverges from its predecessors on a number of dimensions. One important difference is that typical architectures handle conceptual knowledge in a procedural manner, typically using production rules, whereas our framework contains separate memories for concepts and skills. Another distinctive feature is that most architectures are based on production systems, which encode knowledge as a ‘flat’ set of condition-action rules, whereas Icarus makes an architectural commitment to the hierarchical organization of knowledge. One can certainly encode hierarchical structures in frameworks like ACT-R (Anderson, 1993) and Soar (Laird, Rosenbloom, & Newell, 1986), but this remains the modeler’s choice rather than a strong theoretical claim. In addition, most cognitive architectures evolved from theories of human problem solving, which has led to subordinate roles for perception and action even in those frameworks that support them. In contrast, Icarus is primarily an execution architecture that perceives and reacts to external environments, which we view as more basic than problem solving.

However, Icarus’ reliance on hierarchical structures raises key questions about their origin. Moreover, the architecture’s emphasis on execution does not mean that mental activities like problem solving are unimportant, since they can let an agent handle novel tasks for which stored knowledge is unavailable. The central hypothesis of this paper is that hierarchical skills arise, at least in many cases, from problem-solving behavior, and that, once learned, the agent can use these structures to support reactive execution in the environment. Moreover, this acquisition occurs in an incremental manner, with new skills being learned gradually as the agent encounters new problems it cannot handle without resorting to problem solving.

We refer to Icarus as a ‘cognitive architecture’ in the same sense that the Soar community uses that expression. Both frameworks aim for consistency with general knowledge about human cognition and hope to support the same broad range of abilities that people demonstrate. However, our current research does not attempt to match at a fine-grained level the results of psychological experiments, as done with architectures like ACT-R. We may address such issues in future research, but for now we are concerned with coarse regularities that demand explanation, such as the apparent hierarchical nature of human skills and their incremental acquisition from experience.

In the sections that follow, we review Icarus’ representation and organization of concepts and skills, along with the inference and execution processes that utilize them. After this, we present a new module that interleaves means-ends problem solving with execution when known skills are insufficient to solve a task. Next we describe a mechanism for creating generalized skills from traces of successful problem solving that supports incremental, hierarchical learning. We report experiments with this learning mechanism that demonstrate its ability to generalize to novel situations and reduce effort on new problems. In closing, we discuss earlier research on learning for problem solving and execution, along with some directions for future work.

2. Representation and Organization

Like other cognitive architectures, Icarus makes commitments to its representation of knowledge, the manner in which that knowledge is organized, and the memories in which it resides. Following most theories of human cognition, the framework distinguishes between long-term memories, which change only gradually due to learning, and short-term memories, which change rapidly as the agent revises its beliefs and goals. In this section, we discuss Icarus’ memories and the formalisms used to encode their contents. We will take our examples from the Blocks World, since many readers should find this domain familiar. We have described these aspects of the framework in more detail elsewhere, including their use in other domains like in-city driving (Choi et al., 2004) and multi-column subtraction (Langley, Cummings, & Shapiro, 2004).

2.1 Long-Term Conceptual Memory

One of Icarus’ long-term memories stores concepts that describe generalized situations in the environment. These may involve isolated objects, such as individual blocks, but they can also characterize physical relations among objects, such as the relative positions of blocks. Long-term conceptual memory contains the definitions of these logical categories. Each element specifies the concept’s name and arguments, along with fields which describe perceptual entities that must be present, lower-level concepts that must match, lower-level concepts that must not match, and numeric relations that must be satisfied.

Table 1 presents some concepts from the Blocks World. For example, the relation on describes a perceived situation in which two blocks have the same x position and the bottom of one has the same y position as the top of the other. The concept clear instead refers to a single block, but one that cannot hold the relation on to any other.
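As a rough illustration of the two concepts just described, the following Python sketch tests on and clear against block percepts. It assumes, hypothetically, that each percept is a dictionary with `xpos`, `ypos`, and `height` fields, that `ypos` marks the bottom edge of a block, and that y grows upward. None of these conventions come from Table 1, and Icarus states concepts declaratively rather than as functions.

```python
# Hypothetical percepts: ypos is ASSUMED to be the bottom edge of each block.
blocks = {
    "A": {"xpos": 10, "ypos": 0, "height": 2},
    "B": {"xpos": 10, "ypos": 2, "height": 2},
    "C": {"xpos": 10, "ypos": 4, "height": 2},
}

def on(upper, lower, percepts):
    """True if upper shares lower's x position and rests on its top edge."""
    u, l = percepts[upper], percepts[lower]
    return u["xpos"] == l["xpos"] and u["ypos"] == l["ypos"] + l["height"]

def clear(block, percepts):
    """True if no other block is on the given block."""
    return not any(on(other, block, percepts)
                   for other in percepts if other != block)

# Collect every instance of the on relation implied by the percepts.
beliefs = [("on", u, l) for u in blocks for l in blocks
           if u != l and on(u, l, blocks)]
```

Here clear is written in terms of on, which mirrors the way one concept definition can refer to, and negate, a lower-level concept.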

Definitions of this sort organize Icarus categories into a conceptual hierarchy. Primitive concepts are defined entirely in terms of perceptual conditions and numeric tests, but many incorporate other concepts in their definitions. This imposes a lattice structure on the memory, with more basic concepts at the bottom and more complex concepts at higher levels. The resulting hierarchy is similar in spirit to discrimination network models of human memory like Epam (Richman, Staszewski, & Simon, 1995), as well as to frameworks like description logics (Nardi & Brachman, 2002). Structurally, this lattice bears a close resemblance to the Rete networks (Forgy, 1982) used for matching in production-system architectures.


Table 1. Some Icarus concepts for the Blocks World, with variables indicated by question marks. Percepts refer only to attribute values used elsewhere in the concept definition.

2.2 Long-Term Skill Memory

Icarus also incorporates a second long-term memory that stores knowledge about skills it can execute in the environment, including their conditions for application and their expected effects. Each skill clause includes a head (a name and zero or more arguments) and a body that specifies the concepts that must hold to initiate the skill and one or more components. A primitive skill clause indicates one or more ordered, executable actions, along with those concepts that, taken together, describe the situation the skill produces when done. A primitive skill may also state conditions that must hold throughout its execution, which may require multiple cycles to complete. For example, Table 2 shows the skill pickup, which must satisfy the start condition, (pickupable ?block ?from), defined in Table 1, and invokes *grasp, which grasps a block, and *vertical-move, which moves the hand in the vertical direction. The skill’s only stated effect is to make (holding ?block) true.

In contrast, a nonprimitive skill clause specifies how to decompose that activity further. For instance, Table 3 includes two clauses for the nonprimitive skill clear. Each indicates that executing the clause will achieve that goal, but they differ in their start conditions and in their subskills. Nonprimitive skill clauses do not specify either required conditions or effects, but their heads always correspond to a concept that the skill will achieve upon successful completion. This representational assumption figures centrally in the learning mechanism we describe later.

Because Icarus concepts and skills utilize a syntax similar to that found in the programming language Prolog, we have referred elsewhere to sets of these long-term memory structures as teleoreactive logic programs (Choi & Langley, 2005). This phrase conveys both their structural similarity to traditional logic programs and their ability to behave reactively in a goal-driven manner, following Nilsson’s (1994) notion of a teleoreactive system.
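One way to picture the difference between primitive and nonprimitive clauses is as a single record type with optional fields. The Python sketch below is only an illustration: the class name `SkillClause`, its field names, and the argument list of the pickup head are assumptions, since the contents of Tables 2 and 3 are not reproduced here. The clear clause uses the start conditions and subskills that the text gives for that skill in Section 5.

```python
from dataclasses import dataclass, field

@dataclass
class SkillClause:
    head: tuple                                     # name plus arguments
    start: list = field(default_factory=list)       # concepts needed to initiate
    requires: list = field(default_factory=list)    # must hold throughout (primitive)
    effects: list = field(default_factory=list)     # stated effects (primitive)
    actions: list = field(default_factory=list)     # executable actions (primitive)
    subskills: list = field(default_factory=list)   # ordered subskills (nonprimitive)

    @property
    def primitive(self):
        """A clause is treated as primitive when it names executable actions."""
        return bool(self.actions)

# Primitive skill; the head's arguments are assumed from the start condition.
pickup = SkillClause(
    head=("pickup", "?block", "?from"),
    start=[("pickupable", "?block", "?from")],
    effects=[("holding", "?block")],
    actions=["*grasp", "*vertical-move"],
)

# Nonprimitive skill: no effects or requirements, and the head names the goal.
clear_clause = SkillClause(
    head=("clear", "?A"),
    start=[("on", "?B", "?A"), ("hand-empty",)],
    subskills=[("unstackable", "?B", "?A"), ("unstack", "?B", "?A")],
)
```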


Table 2. Primitive skills for the Blocks World. Each clause has a head that specifies the skill’s name and arguments, a set of typed percepts, a single start condition, a set of effects, and a set of executable actions (marked by asterisks).

2.3 Short-Term Memories

In addition to long-term memories, which encode relatively stable knowledge about a domain, Icarus follows standard psychological theory by incorporating short-term stores that change more rapidly. These contain the agent’s temporary perceptions and beliefs about the environment, as well as its goals and intended activities. They include:

• a perceptual buffer that holds descriptions of physical entities which correspond to the output of sensors; for the Blocks World, this includes literals like (block B xpos 10 ypos 2 width 2 height 2), which specify the position and size of individual blocks.
• a short-term conceptual memory that contains beliefs about the environment which the agent infers from items present in its perceptual buffer and long-term concept memory; for instance, this might contain the belief (on B C), which is an instance of the on concept in Table 1.
• a short-term skill memory that contains the agent’s goals and associated skill instances it intends to execute; each goal literal specifies a concept’s name and arguments, as in (clear A), whereas each associated intention gives a skill’s name and its arguments, as in (stack B C), which is an instance of the skill stack in Table 2.


Table 3. Some nonprimitive skills for the Blocks World that involve recursion. Each skill clause has a head that specifies the goal it achieves, a set of typed percepts, one or more start conditions, and a set of ordered subskills. Numbers after the head distinguish different clauses that achieve the same goal.

Unlike in most cognitive architectures, every element in the short-term conceptual and skill memories must be an instance of some generalized structure in the long-term conceptual and skill memory, respectively; they cannot be arbitrary symbolic structures. We have discussed this strong correspondence assumption at more length elsewhere (Langley & Rogers, 2005).
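The correspondence assumption can be pictured as a simple membership check. The sketch below is a minimal illustration in Python: the tuple and dictionary encodings, the set names, and the function `correspondence_violations` are all hypothetical, and the long-term names listed are only those mentioned in the text rather than the full contents of Tables 1 through 3.

```python
# Illustrative long-term names only; the real memories hold full definitions.
LONG_TERM_CONCEPTS = {"on", "clear", "hand-empty", "unstackable",
                      "pickupable", "holding"}
LONG_TERM_SKILLS = {"pickup", "putdown", "stack", "unstack",
                    "clear", "unstackable"}

# Short-term contents, using the example literals from the list above.
perceptual_buffer = [("block", "B", {"xpos": 10, "ypos": 2, "width": 2, "height": 2})]
beliefs = [("on", "B", "C")]
goals = [("clear", "A")]
intentions = [("stack", "B", "C")]

def correspondence_violations(beliefs, goals, intentions):
    """Return short-term elements that name no long-term structure."""
    bad = [x for x in beliefs + goals if x[0] not in LONG_TERM_CONCEPTS]
    bad += [x for x in intentions if x[0] not in LONG_TERM_SKILLS]
    return bad
```

Under this reading, beliefs and goals are checked against concept names and intentions against skill names, so an arbitrary symbolic structure would be flagged.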

3. Conceptual Inference and Skill Execution

Like most cognitive architectures, Icarus operates in distinct cycles. On each such iteration, the system updates its perceptual buffer by sensing objects in its field of view, with the specific sensors depending on the particular environment in which the agent is operating. This process produces perceptual elements, which are deposited in the perceptual buffer and which initiate matching against long-term concepts. The matcher checks to see which primitive concepts (i.e., those defined entirely in terms of percepts) are satisfied, adds each matched instance to conceptual short-term memory, and repeats the process on nonprimitive concepts to infer higher-level beliefs.

In this way, Icarus infers all instances of concepts that are implied by its conceptual definitions and the contents of the perceptual buffer. For example, a Blocks World agent would first update its descriptions of the blocks and the table, then infer primitive concepts like on, and finally infer complex concepts like unstackable. This bottom-up procedure operates in much the same way as the Rete networks (Forgy, 1982) used in many production-system architectures and the logical inference methods used in many truth-maintenance systems (e.g., Doyle, 1979). The default process is exhaustive, but elsewhere we have reported an alternative mechanism that makes inferences more selectively (Asgharbeygi et al., 2005).
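The exhaustive, bottom-up process can be sketched as a loop that first applies primitive concept rules to the percepts and then reapplies higher-level rules until nothing new is inferred. The Python below is a simplified illustration, not the Icarus matcher: the percept layout, the rule functions, and the `infer` routine are assumptions, and the definition of unstackable follows the three subconcepts named later in the text.

```python
def on_rule(percepts):
    """Primitive concept: same x position, bottom of one at top of the other."""
    b = percepts["blocks"]
    return {("on", u, l) for u in b for l in b
            if u != l and b[u]["xpos"] == b[l]["xpos"]
            and b[u]["ypos"] == b[l]["ypos"] + b[l]["height"]}

def hand_rule(percepts):
    """Primitive concept: the hand percept reports that it is empty."""
    return {("hand-empty",)} if percepts["hand"] == "empty" else set()

def make_clear_rule(names):
    def clear_rule(beliefs):
        covered = {b[2] for b in beliefs if b[0] == "on"}
        return {("clear", n) for n in names if n not in covered}
    return clear_rule

def unstackable_rule(beliefs):
    """Complex concept: (on x y), (clear x), and (hand-empty) all hold."""
    if ("hand-empty",) not in beliefs:
        return set()
    return {("unstackable", b[1], b[2]) for b in beliefs
            if b[0] == "on" and ("clear", b[1]) in beliefs}

def infer(percepts, primitive_rules, complex_rules):
    beliefs = set()
    for rule in primitive_rules:          # percepts -> primitive beliefs
        beliefs |= rule(percepts)
    changed = True
    while changed:                        # repeat until no new beliefs appear
        new = set()
        for rule in complex_rules:
            new |= rule(beliefs)
        changed = not new <= beliefs
        beliefs |= new
    return beliefs

percepts = {"hand": "empty", "blocks": {
    "A": {"xpos": 10, "ypos": 0, "height": 2},
    "B": {"xpos": 10, "ypos": 2, "height": 2},
    "C": {"xpos": 10, "ypos": 4, "height": 2}}}
beliefs = infer(percepts, [on_rule, hand_rule],
                [unstackable_rule, make_clear_rule(percepts["blocks"])])
```

Listing `unstackable_rule` before the clear rule is deliberate: the first pass cannot yet infer unstackable, so the example shows why the loop repeats.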

On each cycle, the architecture also examines the agent’s goals and their associated intentions in short-term skill memory to determine which, if any, apply to the current situation. For each intended skill instance, Icarus accesses all clauses of the general skill to see if they are applicable.


Since variables can be bound within a skill’s body, this set may include multiple variants of each skill clause stored in long-term memory. A primitive skill clause is applicable if, for its current variable bindings, its effects do not yet hold, its requirements are satisfied, and, if the system has not yet started executing it, the start conditions match the current situation. A higher-level skill clause is applicable if its head is not satisfied, the start conditions are satisfied if it has not been initiated, and at least one subskill is applicable. Because this latter test is recursive, a skill is applicable only when Icarus can find at least one acceptable path downward to executable actions, which the architecture returns for invocation.
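The recursive applicability test can be sketched as a depth-first search for a path from a goal down to a primitive skill. The Python below is a deliberately simplified illustration: it assumes ground (already instantiated) clauses stored in a hypothetical `clauses` dictionary, ignores variable binding and whether a skill has already been started, treats requirements as part of the start conditions, and uses a placeholder action name. Because it only follows clauses whose heads or effects are unsatisfied, the path it returns may list fewer elements than a full Icarus trace.

```python
def applicable_path(skill, clauses, beliefs):
    """Return a list of skill instances ending in a primitive one, or None."""
    for clause in clauses.get(skill, []):
        # A primitive clause is finished when its effects hold; a nonprimitive
        # clause is finished when its head (a concept instance) holds.
        achieved = clause.get("effects", [skill])
        if all(e in beliefs for e in achieved):
            continue
        if not all(c in beliefs for c in clause["start"]):
            continue
        if "actions" in clause:               # reached executable actions
            return [skill]
        for sub in clause["subskills"]:       # try ordered subskills in turn
            rest = applicable_path(sub, clauses, beliefs)
            if rest:
                return [skill] + rest
    return None

HAND = ("hand-empty",)
clauses = {
    ("clear", "A"): [{"start": [("on", "B", "A"), HAND],
                      "subskills": [("unstackable", "B", "A"), ("unstack", "B", "A")]}],
    ("unstackable", "B", "A"): [{"start": [("on", "B", "A"), HAND],
                                 "subskills": [("clear", "B"), HAND]}],
    ("clear", "B"): [{"start": [("on", "C", "B"), HAND],
                      "subskills": [("unstackable", "C", "B"), ("unstack", "C", "B")]}],
    ("unstack", "C", "B"): [{"start": [("unstackable", "C", "B")],
                             "effects": [("clear", "B")],
                             "actions": ["*placeholder-action"]}],
}
beliefs = {("on", "B", "A"), ("on", "C", "B"), HAND,
           ("clear", "C"), ("unstackable", "C", "B")}
path = applicable_path(("clear", "A"), clauses, beliefs)
```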

For example, suppose an Icarus agent has the goal (clear A) in a situation where block A is on the table, block B is on A, block C is on B, and the hand is empty. Suppose further that the agent has access to the primitive skills in Table 2 and the nonprimitive ones in Table 3. In this case, the system would find an applicable path through the skill hierarchy that is relevant to its goal: [(clear A), (unstackable B A), (clear B), (unstackable C B), (clear C), (unstack C B)]. This holds because the instantiated start conditions of each skill along the path (e.g., (on B A) and (hand-empty) for the topmost skill) are present in conceptual short-term memory. If selected, (unstack C B) would alter the environment, making the path [(clear A), (unstackable B A), (clear B), (unstackable C B), (hand-empty), (putdown C T)] acceptable on the next cycle. This would produce a belief state that enables the next step in the procedure, which would continue until the agent had satisfied its top-level goal, (clear A).

During skill selection, Icarus incorporates two preferences that provide a balance between reactivity and persistence. When confronted with a choice between two or more subskills, it selects the first alternative for which the head is not satisfied. This supports reactive control, since the system reconsiders previously completed subskills and, if their effects no longer hold for some reason, reexecutes them to remedy the problem. On the other hand, when encountering two or more applicable skill paths, Icarus selects the one that shares the most elements from the start of the path executed on the previous cycle. This encourages the system to continue executing a high-level skill it has already started until that skill achieves its associated goal or until it becomes inapplicable.
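The persistence preference amounts to comparing each candidate path with the previous one and keeping the candidate with the longest shared prefix. The following Python sketch illustrates that idea; the function names and the example paths are hypothetical, and ties are broken here by list order, which is an assumption rather than something the text specifies.

```python
def shared_prefix(path, previous):
    """Count leading elements that two skill paths have in common."""
    count = 0
    for a, b in zip(path, previous):
        if a != b:
            break
        count += 1
    return count

def select_path(candidates, previous):
    """Prefer the candidate sharing the most elements from the start of
    the path executed on the previous cycle (first such candidate on ties)."""
    return max(candidates, key=lambda p: shared_prefix(p, previous))

previous = [("clear", "A"), ("unstackable", "B", "A"),
            ("clear", "B"), ("unstack", "C", "B")]
continuing = [("clear", "A"), ("unstackable", "B", "A"),
              ("hand-empty",), ("putdown", "C", "T")]
unrelated = [("clear", "D"), ("unstack", "E", "D")]   # hypothetical second goal
chosen = select_path([unrelated, continuing], previous)
```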

4. Means-Ends Problem Solving

As just explained, Icarus can execute complex hierarchical skills in a reactive manner, but our initial studies (e.g., Choi et al., 2004; Langley et al., 2004) assumed that these skills are already present in long-term memory. Although much human behavior appears to involve the application of such routine skills, people can also solve novel tasks that require the dynamic combination of existing knowledge elements through some form of heuristic problem solving.

To model this capability in Icarus, we have introduced a variant of means-ends analysis (Newell, Shaw, & Simon, 1960) that operates over the architecture’s knowledge structures, including both long-term concepts and skills provided by the programmer and short-term beliefs and goals produced by the architecture. Traditional means-ends problem solving selects some unsatisfied aspect of the goal state to achieve, then selects an operator that would achieve it. If that operator’s preconditions match the current state, it is applied; otherwise, the method selects an unsatisfied precondition to achieve, selects an operator that would achieve it, and so on. Once a condition is met, the process is repeated until the original goal description is satisfied. This may require search, which is often pursued in a depth-first manner. Means-ends analysis has been implicated repeatedly in human problem solving on novel tasks.

To support this mechanism, our extended version of Icarus augments the short-term skill memory with a goal stack. Each element in this stack specifies a goal (a desired concept instance), whether the agent intends to achieve it by backward chaining off a concept definition or a skill clause, and, in the latter case, the skill instance that, if executed, should achieve it. Each goal element also specifies subgoals that have already been achieved, along with skill and/or concept instances that it has tried in reaching this goal but that have failed. The first are needed to keep the system from considering skills that would undo its previous accomplishments, whereas the second ensures it does not repeat earlier mistakes. We also assume that both the start conditions of primitive skills and top-level goals must be cast as single relational literals, which causes no loss in generality, since either may be defined concepts.

We have also extended the Icarus interpreter to take advantage of these new memory structures. On each cycle, the system takes one of five distinct, ordered steps:

1. If the current goal G of the goal stack GS is satisfied, then pop G from GS and store information about the success with G’s parent.
2. If the goal stack GS does not exceed the depth limit and there are applicable skill paths that start from a skill instance with the current goal G as its head, then select one such path and execute it.
3. If there is a nonempty set of primitive skill instances that have the current goal G as an effect and that have not already failed, then select a skill instance from this set and push its start condition (which we assume subsumes any required conditions) onto the goal stack GS.
4. If the current goal G is an instance of a complex concept with unsatisfied subconcepts H and with satisfied subconcepts F, then if there is a subconcept I in H that has not yet failed, push I onto the goal stack GS.
5. Otherwise pop the current goal G from the goal stack GS and store information about the failure with G’s parent.

We assume that each of these activities takes a single cycle of the architecture, with the initial situation being a special case of the third item that triggers the process. Because reasoning about how to achieve an objective can require many manipulations of the goal stack, it takes more cycles than executing a stored hierarchical skill for that goal, even when the agent finds a solution on its first attempt and does not have to backtrack.
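The five steps above can be sketched as a small interpreter loop over a goal stack. The Python below is a toy illustration for a ground Blocks World, not the Icarus implementation. It makes several assumptions that are not in the text: facts are tuples in a set, `derive` hard-codes the clear and unstackable concepts, `skills_for` knows only unstack and putdown, putdown is given (holding x) as its start condition, step 2 executes only the primitive skill stored with the current goal, the depth limit is omitted, and the first option is always taken instead of a heuristic or random choice.

```python
def derive(facts, blocks):
    """Infer clear and unstackable beliefs from perceived facts."""
    beliefs = set(facts)
    for b in blocks:
        if not any(f[0] == "on" and f[2] == b for f in facts):
            beliefs.add(("clear", b))
    if ("hand-empty",) in facts:
        for f in facts:
            if f[0] == "on" and ("clear", f[1]) in beliefs:
                beliefs.add(("unstackable", f[1], f[2]))
    return beliefs

def subconcepts(goal):
    """Concept definition used for backward chaining (only unstackable here)."""
    if goal[0] == "unstackable":
        _, x, y = goal
        return [("clear", x), ("on", x, y), ("hand-empty",)]
    return []

def skills_for(goal, beliefs):
    """Ground primitive skills that have the goal among their effects."""
    found = []
    for f in sorted(beliefs):
        if goal[0] == "clear" and f[0] == "on" and f[2] == goal[1]:
            found.append({"name": ("unstack", f[1], f[2]),
                          "start": ("unstackable", f[1], f[2]),
                          "add": {("holding", f[1])},
                          "delete": {f, ("hand-empty",)}})
        if goal == ("hand-empty",) and f[0] == "holding":
            found.append({"name": ("putdown", f[1], "T"),
                          "start": f,                      # assumed start condition
                          "add": {("ontable", f[1]), ("hand-empty",)},
                          "delete": {f}})
    return found

def entry(goal):
    return {"goal": goal, "skill": None, "failed": set()}

def solve(goal, facts, blocks, max_cycles=50):
    stack, executed = [entry(goal)], []
    for _ in range(max_cycles):
        if not stack:
            return executed
        beliefs = derive(facts, blocks)
        top = stack[-1]
        if top["goal"] in beliefs:                       # step 1: goal satisfied
            stack.pop()
            continue
        skill = top["skill"]
        if skill and skill["start"] in beliefs:          # step 2: execute (simplified)
            facts = (facts - skill["delete"]) | skill["add"]
            executed.append(skill["name"])
            continue
        options = [s for s in skills_for(top["goal"], beliefs)
                   if s["start"] not in top["failed"]]
        if options:                                      # step 3: chain off a skill
            top["skill"] = options[0]
            stack.append(entry(options[0]["start"]))
            continue
        pending = [c for c in subconcepts(top["goal"])
                   if c not in beliefs and c not in top["failed"]]
        if pending:                                      # step 4: chain off a concept
            stack.append(entry(pending[0]))
            continue
        failed = stack.pop()["goal"]                     # step 5: record failure
        if not stack:
            return None                                  # top-level goal failed
        stack[-1]["failed"].add(failed)
        stack[-1]["skill"] = None
    return None                                          # cycle limit reached

facts = {("on", "B", "A"), ("on", "C", "B"), ("ontable", "A"), ("hand-empty",)}
plan = solve(("clear", "A"), facts, ["A", "B", "C"])
```

Each pass through the loop corresponds to one cycle, so the number of iterations needed for even this three-step solution gives a feel for why problem solving takes more cycles than executing a stored skill.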

Figure 1 shows a successful trace of the problem solver’s behavior on a simple Blocks World task in which the goal is (clear A) and block A is on the table, block B is on A, block C is on B, and the hand is empty. In this situation, the system looks for executable skills with this goal as its head but, when this fails, it considers skills that have the goal as one of their effects. In this case, invoking the primitive skill instance (unstack B A) would produce the intended result, but it cannot be applied because its instantiated start condition, (unstackable B A), does not hold. In response, the problem solver stores the skill instance with the initial goal and pushes the subgoal onto the goal stack.

Figure 1. A trace of successful problem solving in the Blocks World, with ellipses indicating goals and rectangles denoting primitive skills.

Next, the problem solver attempts to retrieve skills that would achieve (unstackable B A). However, because it has no such skills in memory, it resorts to chaining off the definition of unstackable. This involves three instantiated subconcepts – (clear B), (on B A), and (hand-empty) – but only the first of these is unsatisfied, so the system pushes it onto the goal stack. This in turn leads it to consider skills that would produce this literal as an effect and to retrieve the skill instance (unstack C B), which it stores with the current goal. In this case, the start condition of the selected skill, (unstackable C B), already holds, so the system executes (unstack C B). The associated actions alter the environment and cause the agent to infer (clear B) from its percepts. In response, it pops this goal from the stack and reconsiders its parent, (unstackable B A). However, this has not yet been achieved because executing the skill has caused the third of its component concept instances, (hand-empty), to become false. Thus, the system pushes this onto the stack and, upon inspecting memory, retrieves the skill instance (putdown C T), which it can and does execute.

This second step achieves the subgoal (hand-empty), which in turn lets the agent infer (unstackable B A). Thus, the problem solver pops this element from the goal stack and executes the skill instance it had originally selected, (unstack B A), in the new situation. Upon completion, the system perceives that the altered environment satisfies the top-level goal, (clear A), which leads it to halt, since it has solved the problem.


For the sake of clarity, both our description and Figure 1 present the trace of successful problem solving, but finding such a solution may involve search. When backward chaining off skills that would achieve the objective of the current stack entry, Icarus considers only skill instances that have not yet failed. The system also prefers candidates that have the fewest expanded start conditions that are unmet by the current environmental state, with fully matched conditions being most desirable. If candidates tie on this criterion, it selects an alternative at random. When backward chaining off the unmatched elements of a concept definition, the system selects subgoals at random after eliminating those which have failed in the past.
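The preference among candidate skills can be sketched as a filter followed by a minimum with random tie-breaking. The Python below is an illustration only: the function `choose_skill`, the pairing of each candidate with a list of expanded start conditions, and the example expansions for putdown and stack are hypothetical, since the relevant concept definitions are not reproduced in this section.

```python
import random

def choose_skill(candidates, beliefs, failed, rng=random):
    """candidates: list of (skill_instance, expanded_start_conditions) pairs.
    Drop failed instances, keep those with the fewest unmet conditions,
    and break ties at random."""
    viable = [(s, conds) for s, conds in candidates if s not in failed]
    if not viable:
        return None

    def unmet(conds):
        return sum(1 for c in conds if c not in beliefs)

    best = min(unmet(conds) for _, conds in viable)
    tied = [s for s, conds in viable if unmet(conds) == best]
    return rng.choice(tied)

# Hypothetical situation: the hand holds C, and block D sits on B.
beliefs = {("holding", "C"), ("on", "D", "B"), ("clear", "D")}
candidates = [
    (("stack", "C", "B"), [("holding", "C"), ("clear", "B")]),   # one unmet
    (("putdown", "C", "T"), [("holding", "C")]),                 # fully matched
]
choice = choose_skill(candidates, beliefs, failed=set())
```

Passing a seeded `random.Random` instance as `rng` makes the tie-breaking repeatable, which can help when comparing runs.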

Taken together, these biases produce a heuristic version of means-ends analysis. However, this problem-solving method is tightly integrated with the execution process. Icarus backward chains off concept or skill definitions when necessary, but it executes the skill associated with the top stack entry as soon as it becomes applicable. Moreover, because the architecture can chain over hierarchical reactive skills, their execution may continue for many cycles before problem solving is resumed. In contrast, most models of human problem solving and most AI planning systems focus on the generation or the execution of plans, rather than interleaving the two processes.

Of course, executing a component skill before constructing a complete plan can lead an agent into difficulties, since one cannot always backtrack in the physical world. This strategy may well lead to suboptimal behaviors, but human intelligence is more about satisficing than optimizing, and interleaving problem solving with execution requires far less memory than constructing a full plan before executing it. However, it can produce situations from which the agent cannot recover without starting the problem over.

In such cases, Icarus stores the goal element for which its executed skill caused a problem, along with everything below it in the stack. The system begins the problem again, this time avoiding the skill and selecting another option. If it makes a different execution error this time, it again stores the problematic skill and its context, then starts over once more. Icarus also starts over if it has not achieved the top-level objective within a specified number of cycles. Such repeated attempts at solving a task, with selected memory about previous passes, seem a better model of human problem solving than systems that construct a complete plan before execution. Jones and Langley’s (in press) model of means-ends problem solving, Eureka, used a similar restart strategy, but it kept no explicit record of previous failed paths.

5. Learning Hierarchical Skills from Problem Solving

In the previous pages, we described two facets of Icarus: its execution of hierarchical skills on familiar tasks and its use of problem solving to handle novel ones. The first lets the system operate efficiently, but skills are tedious to construct manually, whereas the second gives the system flexibility but requires reasoning and means-ends search. We believe that humans also have both capabilities, but that they use learning to transform the results of successful problem solving into hierarchical skills. We would like to incorporate a similar capability into Icarus.

However, we want our learning mechanisms to reflect certain properties that appear to hold for human skill acquisition. One is that learning should take advantage of existing knowledge, such as the definitions of current skills and concepts. In addition, acquisition should be incremental, in that it learns from each new experience, and interleaved with the problem-solving process. The recent literature on computational learning contains few cases of such knowledge acquisition, although in Section 7 we discuss some older work that has this character.

Our extension of Icarus achieves this effect through a form of impasse-driven learning that is tied closely to its problem-solving and execution processes. For this reason, the learning mechanisms require no additional inputs beyond those required for these basic performance processes. As in Soar (Laird et al., 1986), the purpose of skill learning is to avoid such impasses in the future. Thus, whenever the architecture achieves an objective that is associated with an entry in the goal stack, this success provides an opportunity for learning. The system acquires two distinct forms of skill that are tied to different aspects of problem solving.

The first class of skills results from situations in which the problem solver cannot find a skill to achieve a goal G, and thus pursues subgoals based on the unsatisfied conditions of G’s conceptual definition. If the agent achieves these subgoals in the order {G1, G2, ..., Gn}, thus satisfying the parent goal G, Icarus constructs a new skill clause that has G as its head and that has {G1, G2, ..., Gn} as its ordered subskills. The start conditions of the new clause are simply those subconcepts of G that were satisfied when it was pushed onto the goal stack. The head, conditions, and subskills have their arguments replaced by variables in a consistent manner, ensuring applicability to analogous situations that involve different objects.
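This first mechanism can be sketched as a function that assembles a clause from the goal, the subgoals in the order achieved, and the subconcepts that held when the goal was pushed, then replaces constants with variables through a shared mapping. The Python below is a minimal illustration: the function names and the dictionary layout are hypothetical, and the :percepts field is omitted. The example values anticipate the (unstackable B A) subgoal from Figure 1.

```python
def variablize(literal, mapping):
    """Replace each constant with a variable, reusing one mapping so that
    the same object always becomes the same variable."""
    name, *args = literal
    return (name, *[mapping.setdefault(a, "?" + a) for a in args])

def learn_from_concept_chaining(goal, achieved_subgoals, satisfied_at_push):
    """Build a new skill clause with the goal as head, the subconcepts
    already satisfied at push time as start conditions, and the achieved
    subgoals, in order, as subskills."""
    mapping = {}
    return {
        "head": variablize(goal, mapping),
        "start": [variablize(c, mapping) for c in satisfied_at_push],
        "subskills": [variablize(g, mapping) for g in achieved_subgoals],
    }

clause = learn_from_concept_chaining(
    goal=("unstackable", "B", "A"),
    achieved_subgoals=[("clear", "B"), ("hand-empty",)],
    satisfied_at_push=[("on", "B", "A"), ("hand-empty",)],
)
```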

For example, upon achieving the subgoal (unstackable B A) in Figure 1, the system constructs the unstackable skill clause labeled 3 in Table 3. The head (unstackable ?B ?A) is a generalized version of the goal (unstackable B A), whereas the ordered subskills (clear ?B) and (hand-empty) are generalized versions of its two subgoals (clear B) and (hand-empty). The start conditions are (on ?B ?A) and (hand-empty), which are generalized versions of the subconcepts that held when the goal was established. Finally, the :percepts field specifies the types for objects that serve as the head’s arguments. This mechanism constructs different variants of a skill, with separate start conditions and distinct subskills, from subproblems that involve different initial conditions.

The second category results from situations in which Icarus has selected a primitive skill instance S2 in order to achieve a goal G, but found its single start condition G2 unsatisfied and selected another skill instance, S1, to achieve it. Once the agent has executed both skills successfully and it has reached the goal, the system constructs a new skill clause that has G as its head and that has G2 (rather than the specific clause S1) and S2 as ordered subskills. The start conditions are simply the start conditions of the S1 clause used in the subproblem solution, which are sufficient because the problem solver selected S1 to achieve the start condition of S2, which in turn achieves the goal G. Again, specific arguments are replaced consistently by variables.

For instance, upon achieving the top-level goal (clear A) in Figure 1, Icarus creates the clear skill clause labeled 4 in Table 3. This incorporates a generalized version of (clear A) as its head, along with variableized versions of (unstackable B A) and (unstack B A) as its two ordered subskills. The start conditions, (on ?B ?A) and (hand-empty), are the same as those for unstackable clause 3 just discussed, since the latter was created to achieve the start condition of unstack under those same conditions, which in turn satisfies the goal clear.
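The second mechanism, learning from chaining off a skill, can be sketched in the same style. In the Python below, the function and parameter names are hypothetical: `s2` is the primitive skill instance chosen for the goal, `s2_start` is its single start condition G2, and `s1_clause_start` holds the start conditions of the clause used to achieve G2, given here in instantiated form as a simplifying assumption. The example reproduces the (clear A) case from Figure 1.

```python
def variablize(literal, mapping):
    """Replace constants with variables using one shared mapping."""
    name, *args = literal
    return (name, *[mapping.setdefault(a, "?" + a) for a in args])

def learn_from_skill_chaining(goal, s2, s2_start, s1_clause_start):
    """Build a clause with the goal as head, S1's start conditions as
    start conditions, and G2 followed by S2 as ordered subskills."""
    mapping = {}
    return {
        "head": variablize(goal, mapping),
        "start": [variablize(c, mapping) for c in s1_clause_start],
        "subskills": [variablize(s2_start, mapping), variablize(s2, mapping)],
    }

clause = learn_from_skill_chaining(
    goal=("clear", "A"),
    s2=("unstack", "B", "A"),
    s2_start=("unstackable", "B", "A"),
    s1_clause_start=[("on", "B", "A"), ("hand-empty",)],
)
```

Note that the second subskill is always the primitive skill S2, which is consistent with a left-branching structure for clauses acquired this way.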

Both learning mechanisms are fully incremental, in that each learning event draws on a single problem-solving experience and thus requires no memory of previous ones. They support within-trial learning, since skills acquired on one subproblem may be used to handle later subproblems. The processes also build on existing knowledge, since the construction of new skill clauses involves the composition of those used in a training problem’s solution. Taken together, these support a form of cumulative learning, in which Icarus learns skills on one problem, uses them to solve a later problem, and incorporates them into still higher-level structures.

As suggested by our examples, these learning methods can acquire both disjunctive and recursive skills. The key to this ability lies in the assumption that acquired skill clauses which achieve the same goal should be given the same head. By indexing skills in this manner, Icarus knows when two or more clauses should be stored together, which leads in turn to the creation of skills that call on themselves, either directly or through intermediate skills. This makes the architecture’s learned skills considerably more flexible and general than traditional ‘macro-operators’ (e.g., Iba, 1988) or composed production rules (e.g., Neves & Anderson, 1981).

Of course, the creation of disjunctive and recursive structures has potential for overgeneralization, as demonstrated by research on the induction of context-free grammars (e.g., Langley & Stromsten, 2000). Our technique for determining the start conditions on new skill clauses is much simpler than standard techniques for analytical learning or rule induction. In fact, at first glance, the learned clauses in Table 3 appear highly overgeneral, but this ignores the fact that Icarus does not interpret skills in isolation. Recall that the architecture must find an entire path through the skill hierarchy before it can execute the primitive skill at its terminus. This means the system collects conditions dynamically, as it descends the hierarchy, guarding against overgeneralization by carrying out limited analysis at performance time rather than doing it all at learning time.

Unlike some approaches to incremental learning, Icarus’ methods require no additional mechanisms for skill refinement. Each skill clause is generalized when the architecture constructs it, and its start conditions are assumed to be accurate. The knowledge it acquires from solving a given problem may well be incomplete, but this will simply lead to further impasses that produce additional learning. Skill clauses acquired later complement, but do not compete with, those learned earlier because they cover different situations; otherwise, the older clauses would have avoided the impasse. Thus, learning is purely monotonic, as in frameworks like Soar.

We should note that our current implementation restricts the use of learned skills in future problem solving. In particular, we have adopted Mooney’s (1989) idea that one should not chain off the preconditions of learned skills. This does not restrict their use by the execution module, but it does mean that the problem solver considers a learned skill only when its start conditions are already satisfied. As a result, clauses acquired from chaining off skills always have a left-branching structure in which the second subskill is primitive. This assumption may seem restrictive, but, like Mooney, we believe it provides an effective guard against the utility problem (Minton, 1990), in which the creation and use of complex structures reduces search but actually slows performance.


6. Experimental Studies of Skill Learning

Although the new methods for learning hierarchical skills seem plausible, whether they improve an Icarus agent’s performance is an empirical question. In this section, we report the results of basic tests of these mechanisms on three distinct domains: in-city driving, the Blocks World, and FreeCell solitaire. After this, we report more systematic experiments with the domains that examine the effects of learning in more detail. As one measure of performance, we used the number of recognize-act cycles required to solve the problem in the simulated environment, including both problem solving and execution steps. However, we also measured the CPU time required to solve each problem, to determine whether Icarus suffers from the utility problem.

6.1 Domains and Basic Demonstrations

To ensure that our approach to learning hierarchical skills operated as intended, we developed Icarus programs for the three domains. In each case, we provided a set of primitive skills sufficient for solving problems with means-ends analysis and a set of hierarchical concepts sufficient for recognizing situations that were relevant to executing those skills. For example, we devised some 41 concepts and 19 skills for the in-city driving domain, 11 concepts and four skills for the Blocks World, and 24 concepts and 12 skills for FreeCell solitaire. The Appendix gives the names of the primitive concepts, nonprimitive concepts, and primitive skills provided for each domain, which should also suggest their function. In addition, we also provided the architecture with a set of sensors and effectors for each simulated environment.

We have already discussed the Blocks World, but both in-city driving and FreeCell merit some explanation. The first domain involves a dynamic simulation of a downtown driving environment. The city contains objects represented as rectangles of different sizes, including buildings and sidewalks organized into square blocks that are divided by street segments and intersections. Each segment includes a yellow center line and white dotted lane lines, and it has a marked street name and speed limit. Each building has a unique street address to help the agent navigate through the city and to support tasks like package delivery. The city configuration used in our experiments has nine blocks with four vertical streets and four horizontal streets. The Icarus agent must operate under physical laws and follow the rules of driving, such as staying on the right side of the street and turning from the proper lane. We provide the agent with specific goals to achieve, such as getting onto another street segment or delivering a package to a certain address.

FreeCell solitaire involves eight stacks of cards, the first four of which contain seven cards and the last four of which contain six cards. All 52 cards are dealt face up, making them visible to the player. In addition, there are four free cells, which can serve as temporary holding spots for one card each during the game, and one foundation cell for each suit. The goal in FreeCell is to get all cards on the foundation cells in ascending order (where the ace is one and the king is thirteen) grouped by suit. Once on its foundation cell, a card cannot be removed. Only fully-exposed cards at the top of each stack and cards that are in the free cells are in play. The agent can move one card at a time to an available free cell, to the appropriate foundation cell, to an empty stack, or to a stack in which the top card has a different color and value one higher than the moved card.
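The move rules just listed can be written down as a few small predicates. The Python below is an illustrative sketch, not the Icarus concept definitions for this domain: the card encoding as `(value, suit)` pairs, the function names, and the representation of foundations as a dictionary from suit to highest value placed are all assumptions.

```python
RED_SUITS = {"hearts", "diamonds"}

def color(card):
    """Cards are (value, suit) pairs, with the ace as 1 and the king as 13."""
    return "red" if card[1] in RED_SUITS else "black"

def can_move_to_stack(card, stack):
    """Legal if the stack is empty, or its top card (last element) has a
    different color and a value one higher than the moved card."""
    if not stack:
        return True
    top = stack[-1]
    return color(top) != color(card) and top[0] == card[0] + 1

def can_move_to_free_cell(free_cells, capacity=4):
    """Legal if one of the four free cells is still available."""
    return len(free_cells) < capacity

def can_move_to_foundation(card, foundations):
    """foundations maps each suit to the highest value placed so far;
    cards must arrive in ascending order starting from the ace."""
    value, suit = card
    return foundations.get(suit, 0) + 1 == value

# Initial layout check: four stacks of seven and four of six cover the deck.
deal_sizes = [7, 7, 7, 7, 6, 6, 6, 6]
```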


Sample runs with the in-city driving domain, the Blocks World, and a reduced version of FreeCell indicated that the extended version of Icarus was able to solve problems in their respective domains with some search and, from their respective traces, learn hierarchical skills in the manner described earlier. We found that, when given the same task to solve a second time, the system utilized this knowledge to handle it without problem solving. Moreover, because the system generalizes its learned structures beyond the specific instances on which they are based, they transfer fully to any tasks that are isomorphic to those it has already solved. The only constraint is that this isomorphism must involve the same goal and have the same concepts satisfied or unsatisfied in the initial environment.

However, we should note that this ability does not mean that the system can complete a familiar problem in a single cycle. Recall that, unlike traditional work on cognitive architectures, Icarus resorts to problem solving only to enable action, and it must still execute its acquired skills to achieve a goal. Thus, for a problem that requires four primitive steps, the system takes six cycles on the second encounter, with one to retrieve the hierarchical skill and one to realize it has finished. However, the agent requires neither search nor backward chaining over skills or concepts to complete any problem it has solved previously.

6.2 Experiment with In-City Driving

Although these initial runs were encouraging, we desired more than anecdotal demonstrations that the new mechanisms supported incremental learning of hierarchical skills. We also wanted evidence from systematic experiments that this learned knowledge produces more effective behavior. Our first study along these lines focused on in-city driving, which is the most dynamic of the three settings and thus the one most appropriate for evaluating our methods for learning skills that support reactive execution.

As noted above, we provided Icarus with 41 concepts and 19 primitive skills relevant to this environment. With this basic knowledge, the agent can characterize its situation at multiple levels of abstraction and perform actions for accelerating, decelerating, and steering left or right at realistic angles. Thus, it can operate a vehicle, but this is not sufficient to drive safely in a city environment. The agent must still learn skills for staying aligned and centered within lane lines, changing lanes, increasing or decreasing speed for turns, and stopping for parking.

To encourage such learning, we presented the agent with the goal of driving on a different street segment than its current one. To achieve this objective, it resorted to problem solving, which found a solution path that involved changing to the rightmost lane, staying aligned and centered until the intersection, steering right into the target segment, turning the corner, and finally aligning and centering in the new lane. We let the Icarus agent practice this task for five trials to examine its improvement with experience. We repeated this procedure ten different times with slightly different starting positions, collected performance measures for each run, and averaged the results. Figure 2 shows the total number of cycles as a function of the number of trials, along with the number of planning and execution cycles required to achieve the goal. As the agent accumulates knowledge about this task, problem solving disappears almost entirely, which causes the reduced number of total cycles. However, this problem is dominated by execution time, since the agent must actually drive the vehicle to its destination. Execution cycles appear to increase, which occurs not because the learned skills are inefficient but rather because time progresses even during problem solving. Thus, the vehicle moves in the right direction during this period in the early trials, reducing the distance that remains to travel. As problem solving becomes unnecessary, the agent drives this extra distance under conscious control rather than accidentally. CPU time remained approximately the same with increased experience, presumably for the same reasons.

Figure 2. The total number of cycles required to solve a particular right-turn task along with the planning and execution times, as a function of the number of trials. Each learning curve shows the mean over ten sets of trials and 95 percent confidence intervals. [Plot omitted: number of cycles required (total, planning, execution) versus number of trials, 1 to 5.]
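The text does not state how the 95 percent confidence intervals in Figure 2 were computed. One common choice for a mean over ten runs is a Student's t interval, sketched below in Python as an assumption rather than a description of the authors' procedure; the function name `mean_and_ci95` is hypothetical.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 9 degrees of freedom
# (ten runs minus one). Assumption: a t-interval is used; the paper does
# not say which interval construction it relied on.
T_CRIT_DF9 = 2.262


def mean_and_ci95(samples):
    """Return (mean, half_width) for exactly ten per-run measurements,
    so that the interval is mean +/- half_width."""
    if len(samples) != 10:
        raise ValueError("this sketch hard-codes the critical value for ten runs")
    mean = statistics.mean(samples)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, T_CRIT_DF9 * sem
```

Applied to the cycle counts from the ten runs of a given trial, this yields one point and one error bar on a learning curve.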

Table 4 shows the five skill clauses acquired during one of these runs. The two clauses for driving-in-segment specify different decompositions for achieving this top-level goal under alternative start conditions. The second of these refers to the clause for in-segment, which refers to the learned sub-skill for in-intersection-for-right-turn and the primitive skill steer-for-right-turn. The former refers to in-rightmost-lane, which invokes the primitive skill clause driving-in-segment, but it also calls on itself recursively with distinct arguments. For clarification, the table also presents the primitive clause for in-intersection-for-right-turn, which the system was given as background knowledge. Figure 3 shows a trace of the agent’s behavior on the task during learning, in a situation that involves a street with two lanes, and afterwards, in a setting that instead involves three lanes. The trace of the vehicle’s movement demonstrates that the learned skills generalize to cases that involve more lanes than were present during training. This ability follows directly from the recursive structure of the learned in-intersection-for-right-turn clause. Behavior after learning is also smoother, presumably because the agent need not engage in problem solving when it overshoots slightly after getting into the target lane in preparation for the right turn.

Table 4. Five skill clauses learned for in-city driving, along with a primitive skill for the same domain.

Figure 3. A trace of the Icarus driving agent’s behavior, during and after learning, on a task that required changing to the rightmost lane and turning at the intersection. The trace demonstrates generalization to a new setting with a different number of lanes.

6.3 Experiment with the Blocks World

Although the Blocks World is far less dynamic than in-city driving, it lends itself to scaling studies that involve generalization to tasks with varying numbers of objects. For this domain, we provided Icarus with the four primitive skills in Table 2 and 11 concepts that were sufficient, in principle, to solve any problem. We then presented the agent with the problems in sequence, using each task as a training problem but also recording the number of cycles and CPU time required to complete it. Because misguided search combined with execution can lead the problem solver into undesirable physical states, we told it to halt if it had not finished a run within 100 cycles and to start over from the initial state. However, the agent could attempt a given problem only five times, and thus spend at most 500 cycles before giving up entirely. We also limited the stack depth to ten goal elements. We enforced these constraints for reasons of practicality and because we think they reflect the manner in which humans tackle novel problems.
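The halting and restart policy can be sketched as a small control loop. The Python below is a hedged illustration, not the Icarus implementation: `run_with_restarts` and its three callback parameters are hypothetical names, and a "cycle" is modeled simply as one call to `step`.

```python
MAX_CYCLES_PER_ATTEMPT = 100  # halt an attempt after this many cycles
MAX_ATTEMPTS = 5              # give up entirely after this many attempts


def run_with_restarts(make_initial_state, step, is_solved):
    """Run up to MAX_ATTEMPTS attempts, each restarted from the initial state.

    Returns (solved, total_cycles, attempts_used). The total can never
    exceed MAX_ATTEMPTS * MAX_CYCLES_PER_ATTEMPT, which is 500 cycles.
    """
    total_cycles = 0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        state = make_initial_state()  # start over from the initial state
        for _ in range(MAX_CYCLES_PER_ATTEMPT):
            state = step(state)
            total_cycles += 1
            if is_solved(state):
                return True, total_cycles, attempt
    return False, total_cycles, MAX_ATTEMPTS
```

Cycles spent on abandoned attempts still count toward the total, which matches the bound of 500 cycles stated above.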

We randomly generated a set of Blocks World tasks that involved settings with 5, 10, 15, 20, 25, and 30 blocks. Each complexity class had 67 to 69 distinct problems, which we ordered by difficulty class (five-block tasks first and 30-block tasks last). The intuition was that the system would learn more effectively if we presented it first with simpler problems, which it could then use in solving more difficult ones. To this end, Icarus retained skills acquired on successful runs for use in later tasks. We provided the system with some 400 randomly generated problem orders and recorded the number of cycles and CPU times needed for each task. As a control, we also ran the system with its learning mechanisms off for another 400 problem sets that were ordered randomly within difficulty classes. Because the problems require different amounts of effort, traditional learning curves are not very informative. Instead, following Minton (1990), we report cumulative cycles and CPU times as a function of the number of training problems.
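As a rough illustration of this protocol, the Python sketch below builds one problem order (difficulty classes in ascending order, problems shuffled within each class) and turns per-problem costs into a cumulative curve. The function names and the dictionary layout are assumptions made for this example; they do not come from the experimental code.

```python
import random
from itertools import accumulate


def curriculum_order(problems_by_class, rng):
    """Order problems by difficulty class, shuffling randomly within each class.

    problems_by_class maps a class size (e.g., number of blocks) to a list
    of problems; rng is a random.Random instance used for the shuffling.
    """
    ordered = []
    for size in sorted(problems_by_class):
        batch = list(problems_by_class[size])  # copy so the input is untouched
        rng.shuffle(batch)
        ordered.extend(batch)
    return ordered


def cumulative(costs):
    """Running total of per-problem costs (cycles or CPU time)."""
    return list(accumulate(costs))
```

Repeating `curriculum_order` with different seeds gives the many problem orders over which the cumulative curves would then be averaged.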

Figure 4 shows the resulting curves, including 95 percent confidence intervals around each mean. As expected, the curves mainly take the form of superlinear functions whose slopes increase with problem difficulty. Although the large scale of the plots makes the learning and non-learning curves look similar for early parts of the curves, there was some benefit for learning even from the beginning, and the difference grows substantially as the systems encounter harder problems. Clearly, prior experience reduces search substantially when the system reaches problems with many blocks, and there is no evidence that learning produces a utility problem. Remember that we have made the transfer of learned knowledge challenging in that none of the problems are isomorphic, although they may involve isomorphic subtasks. The results indicate that Icarus can take advantage of this similar substructure to reduce its effort on later problems.

Figure 4. Cumulative number of cycles and CPU times required by Icarus to solve a Blocks World task as a function of the number of problems encountered, averaged over 400 runs and with problems ordered by difficulty, with the goal stack of size ten. Tick marks on the horizontal axis indicate shifts in problem complexity. [Plots omitted: number of cycles required and CPU time required versus number of problems encountered, each with learning on and learning off.]

6.4 Experiment with FreeCell Solitaire

To ensure that our conclusions held for more than the Blocks World, we carried out a similar experiment with FreeCell solitaire, which we described earlier in this section. We gave the Icarus agent only the 12 basic skills needed to move cards and the top-level goal of getting all cards into foundation cells, along with 24 concepts for describing situations. Unlike the Blocks World, this domain has only one goal condition, but it still has many possible starting states.

For this study, we randomly generated 20 problems each that involved 8, 12, 16, 20, and 24 cards.⁵ We ran the system on 300 different sequences of tasks, with simpler problems being presented earlier but ordered randomly within each of the five difficulty classes. As before, we expected that the agent would learn skills from the easier problems that would assist on the harder ones, thus reducing problem-solving effort. For comparison, we presented another 300 random sequences to a non-learning system with the same initial skills and concepts.

Figure 5 presents the cumulative results for this experiment, with error bars that indicate the 95 percent confidence intervals. As in the Blocks World, the difference between the learning and non-learning conditions is substantial. However, problems with 20 cards or more require a different class of skills that involve column-to-column moves, which caused the lessened gap between the two conditions around the 80th problem. Once they have been acquired, though, these skills provide some advantage, as evidenced by the downturn in the curve for the learning system on the far right of the graphs. Again, we detected no sign of a utility problem as the agent accumulates knowledge in this domain.

Figure 5. Cumulative number of cycles and CPU times required by Icarus to solve a FreeCell task as a function of the number of problems encountered, averaged over 300 runs and with problems ordered by difficulty. [Plots omitted: number of cycles required and CPU time required versus number of problems encountered, each with learning on and learning off.]

7. Related Research

Research on learning cognitive skills from problem solving has a long history within both AI and cognitive science. For example, work on explanation-based learning often aimed to improve efficiency on problem-solving tasks and combined experience with a domain theory to create new cognitive structures. Some techniques focused on the acquisition of search-control rules to guide problem solving, but other efforts dealt instead with the construction of macro-operators from primitive operators (e.g., Iba, 1989; Mooney, 1989; Shavlik, 1989). Our approach to skill learning comes closer to the second paradigm, since both involve composing knowledge elements into larger structures. However, Icarus adapts this idea to the creation of disjunctive and even recursive skill hierarchies, whereas traditional methods emphasized the creation of ‘fixed-sequence’ macro-operators that were far less flexible.

Icarus also bears some similarity to other cognitive architectures that incorporate varieties of analytical or explanation-based learning. For example, Laird, Rosenbloom, and Newell’s (1986) Soar revolves around a problem solver that proceeds until the system encounters an impasse, in which case it creates a subgoal to resolve it. This resolution may require search and take some time to produce the information necessary. Once the impasse has been handled, Soar creates a chunk that encodes a generalized explanation of the result in terms of the original goal context. Intermediate steps from the solution are lost, but the acquired chunk lets the system sidestep similar impasses in the future.


Anderson’s (1993) ACT-R employs a related mechanism, called compilation, which creates new production rules from ones that are involved in the same reasoning chain. This scheme produces very specific rules that replace variables with the declarative elements against which they matched, rather than forming generalized structures, as do Icarus and most other systems that learn macro-operators or search-control rules. In fact, our approach is more akin to the composition process that played a role in earlier versions of ACT (Neves & Anderson, 1981), though this mechanism produced fixed behavioral sequences rather than flexible skill hierarchies.

Icarus’ closest architectural relative is Prodigy (Minton, 1990), which invokes means-ends analysis to solve problems and uses an analytical method to learn either search-control rules or macro-operators from problem-solving traces. Veloso and Carbonell (1993) also describe an extension that records these traces in memory and solves new problems by derivational analogy with earlier ones. None of these mechanisms generates explicit hierarchical structures, but Veloso and Carbonell’s approach provides flexibility similar to that found in Icarus, and the two systems record and utilize very similar information in their goal stacks.

Some other systems support learning in problem-solving domains without making strong architectural commitments. Ruby and Kibler’s (1991) SteppingStone learns generalized rules for decomposing complex problems into simpler ones, which it obtains through mixed use of existing problem-reduction rules and forward-chaining exhaustive search when it reaches an impasse. Marsella and Schmidt’s (1993) system also acquires task-decomposition rules that incorporate partial orderings among components. Their system combines forward and backward search to identify candidate state pairs, which in turn produce hypothesized problem-reduction rules that are revised based on further experience.⁶

Perhaps the closest relative to our approach is Reddy and Tadepalli’s (1997) X-Learn, which acquires goal-decomposition rules from a sequence of training problems. Their system does not include an execution engine, but it generates recursive hierarchical plans in a manner that also identifies declarative goals with the heads of learned clauses. However, because it invokes forward-chaining rather than backward-chaining search to solve new problems, it relies on the trainer to determine hierarchical structure. X-Learn also uses a quite sophisticated mixture of analytical and inductive techniques to determine conditions on skills, rather than the much simpler method that Icarus incorporates.

Another key difference from X-Learn, PRL, and SteppingStone is that Icarus learns skills for use in reactive execution rather than for use in planning. There has been other work on this topic, but it has emphasized the acquisition of flat controllers rather than hierarchical structures. For instance, Benson’s (1995) TRAIL learns teleoreactive controllers for physical agents, but it invokes inductive logic programming to determine rules for individual actions. Fern et al. (2004) report an approach to learning reactive controllers that trains itself on increasingly complex problems, but that also acquires decision lists for action selection. Khardon (1999) considers the related task of learning hierarchical controllers, but his formal analysis assumes the agent is provided with annotated sample solutions rather than generating them through problem solving.


Other researchers have built systems that support cumulative learning outside the context of problem-solving tasks. One early example was Sammut and Banerji’s (1986) Marvin, which learns increasingly complex logical concepts that are composed of ones it has mastered previously. Stone and Veloso (2000) take a similar approach to learning concepts and controllers for playing robotic soccer, although their system acquires quite different types of structure at each level of description. Utgoff and Stracuzzi’s (2002) STL algorithm receives training cases about many concepts in parallel, but it learns complex ones only when it has acquired simpler structures that let it master them with little effort. Pfleger (2004) describes another system that acquires hierarchical patterns in an on-line setting, in this case from unsupervised data. Like Marvin and STL, it learns conceptual structures from the bottom up, so that more complex patterns are apparent after simpler ones have been acquired.

8. Concluding Remarks

In the preceding pages, we presented Icarus, a cognitive architecture for physical agents that uses stored concepts and skills, both organized in hierarchies, to recognize familiar situations and control behavior. We described a new module that supports means-ends problem solving on novel tasks, along with a learning mechanism that produces new skills from traces of problem solutions. This method operates in an incremental manner, creating hierarchical structures that refer to others learned earlier. In addition, we reported experiments with in-city driving, the Blocks World, and FreeCell that showed such learning enables more effective behavior on unfamiliar problems than solving them with only basic knowledge about the domain.

Despite these advances, our work on skill learning in Icarus is still in its early stages. For instance, we should demonstrate its ability to learn hierarchical structures both on traditional cognitive tasks like multi-column subtraction and on other dynamic domains that, like in-city driving, require the integration of problem solving with reactive control. Future work on driving should show that our methods are sufficient to acquire more complicated skills that involve extended tasks like package delivery and complex settings that include other vehicles. Another promising class of domains for studying skill learning involves two-person games like chess, which seem certain to introduce new challenges because of their extended duration.

In addition, Icarus’ methods for problem solving and hierarchical learning would benefit from new capabilities. We noted earlier that the current system does not chain backward from the start conditions of learned skill clauses. Extending the problem solver to support this ability would mean defining new concepts that characterize the situations in which learned skills are applicable. This addition would also remedy another limitation of the current system, namely its inability to account for the origin of concept hierarchies, which it assumes are given. Such an extension would be straightforward for some tasks, but others will require the ability to acquire recursive concepts. Augmenting the system in this manner may also lead to a utility problem, not during execution of learned skills but during the problem solving used for their acquisition, which we would then need to overcome.


Another drawback is the architecture’s reliance on purely deductive inference, which differs markedly from the probabilistic approach taken by its earliest ancestor (Langley et al., 1991). Future versions of the framework should extend the representation of concepts and skills to incorporate probabilities, replace deductive processes with abductive methods that make plausible default inferences, and augment problem solving to operate over skills with uncertain outcomes. We hypothesize that the current mechanisms for learning the structure of skills can be adapted easily to this setting, but we should also introduce methods for estimating the probabilities that annotate the symbolic structures.

We should also note that, although our approach learns skills that generalize to situations with different numbers of objects, its treatment of goals is less flexible. For example, Icarus can acquire a general procedure for clearing a block that does not depend on the number of blocks above it, but it cannot learn a procedure for constructing a tower with arbitrarily specified components. Extending the method’s ability to learn about such recursive goal structures is another important direction for future research that will bring the architecture into closer alignment with the abilities observed in complex human learning.
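To make the first half of this contrast concrete, the following Python sketch shows why a recursive procedure for clearing a block is independent of tower height. It is an illustration under stated assumptions, not Icarus skill syntax: the function `clear`, the `on_top_of` dictionary, and the action labels "unstack" and "putdown" are hypothetical names chosen for the example.

```python
def clear(block, on_top_of, actions):
    """Make `block` clear by first clearing whatever rests directly on it.

    on_top_of maps a block to the block sitting directly on it; a block
    with no entry is already clear. Planned actions are appended to
    `actions`, and on_top_of is updated as blocks are removed.
    """
    above = on_top_of.get(block)
    if above is None:
        return  # base case: nothing on this block
    clear(above, on_top_of, actions)  # recursive case: clear the block above first
    actions.append(("unstack", above, block))
    actions.append(("putdown", above))
    del on_top_of[block]  # `block` is now clear
```

The same two-case definition handles a tower of any height, whereas building a tower to an arbitrary specification would require recursion over the goal description itself.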

Acknowledgements

This research was funded in part by Grant HR0011-04-1-0008 from DARPA IPTO and by Grant IIS-0335353 from the National Science Foundation. Discussions with Glenn Iba, David Nicholas, Stephanie Sage, Dan Shapiro, and Jude Shavlik contributed to many ideas presented here.

References

Anderson, J. R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum.

Asgharbeygi, N., Nejati, N., Langley, P., & Arai, S. (2005). Guiding inference through relational reinforcement learning. Proceedings of the Fifteenth International Conference on Inductive Logic Programming. Bonn, Germany: Springer.

Benson, S. (1995). Inductive learning of reactive action models. Proceedings of the Twelfth International Conference on Machine Learning (pp. 47–54). San Francisco: Morgan Kaufmann.

Choi, D., Kaufman, M., Langley, P., Nejati, N., & Shapiro, D. (2004). An architecture for persistent reactive behavior. Proceedings of the Third International Joint Conference on Autonomous Agents and Multi Agent Systems (pp. 988–995). New York: ACM Press.

Choi, D., & Langley, P. (2005). Learning teleoreactive logic programs from problem solving. Proceedings of the Fifteenth International Conference on Inductive Logic Programming. Bonn, Germany: Springer.

Doyle, J. (1979). A truth maintenance system. Artificial Intelligence, 12, 231–272.

Fern, A., Yoon, S. W., & Givan, R. (2004). Learning domain-specific control knowledge from random walks. Proceedings of the Fourteenth International Conference on Automated Planning and Scheduling (pp. 191–199). Whistler, BC: AAAI Press.


Forgy, C. L. (1982). Rete: A fast algorithm for the many pattern/many object pattern match problem. Artificial Intelligence, 19, 17–37.

Iba, G. A. (1989). A heuristic approach to the discovery of macro-operators. Machine Learning, 3, 285–317.

Ilghami, O., Nau, D. S., Muñoz-Avila, H., & Aha, D. W. (2002). CaMeL: Learning method preconditions for HTN planning. Proceedings of the Sixth International Conference on AI Planning and Scheduling (pp. 131–14). Toulouse, France.

Jones, R. M., & Langley, P. (in press). A constrained architecture for learning and problem solving. Computational Intelligence.

Khardon, R. (1999). Learning to take actions. Machine Learning, 35, 57–90.

Laird, J. E., Rosenbloom, P. S., & Newell, A. (1986). Chunking in Soar: The anatomy of a general learning mechanism. Machine Learning, 1, 11–46.

Langley, P., Cummings, K., & Shapiro, D. (2004). Hierarchical skills and cognitive architectures. Proceedings of the Twenty-Sixth Annual Conference of the Cognitive Science Society (pp. 779–784). Chicago, IL.

Langley, P., McKusick, K. B., Allen, J. A., Iba, W. F., & Thompson, K. (1991). A design for the Icarus architecture. SIGART Bulletin, 2, 104–109.

Langley, P., & Stromsten, S. (2000). Learning context-free grammars with a simplicity bias. Proceedings of the Eleventh European Conference on Machine Learning (pp. 220–228). Barcelona: Springer-Verlag.

Marsella, S., & Schmidt, C. F. (1993). A method for biasing the learning of nonterminal reduction rules. In S. Minton (Ed.), Machine learning methods for planning. San Mateo, CA: Morgan Kaufmann.

Minton, S. N. (1990). Quantitative results concerning the utility of explanation-based learning. Artificial Intelligence, 42, 363–391.

Mooney, R. J. (1989). The effect of rule use on the utility of explanation-based learning. Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (pp. 725–730). Detroit: Morgan Kaufmann.

Nardi, D., & Brachman, R. J. (2002). An introduction to description logics. In F. Baader et al. (Eds.), Description logic handbook. Cambridge: Cambridge University Press.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Newell, A., Shaw, J. C., & Simon, H. A. (1960). Report on a general problem-solving program for a computer. Information Processing: Proceedings of the International Conference on Information Processing (pp. 256–264). UNESCO House, Paris.

Nilsson, N. (1994). Teleoreactive programs for agent control. Journal of Artificial Intelligence Research, 1, 139–158.

Pfleger, K. (2004). On-line cumulative learning of hierarchical sparse n-grams. Proceedings of the Third International Conference on Development and Learning. San Diego, CA: IEEE Press.

Reddy, C., & Tadepalli, P. (1997). Learning goal-decomposition rules using exercises. Proceedings of the Fourteenth International Conference on Machine Learning (pp. 278–286). San Francisco: Morgan Kaufmann.


Richman, H. B., Staszewski, J. J., & Simon, H. A. (1995). Simulation of expert memory using EPAM IV. Psychological Review, 102, 305–330.

Ruby, D., & Kibler, D. (1991). SteppingStone: An empirical and analytical evaluation. Proceedings of the Tenth National Conference on Artificial Intelligence (pp. 527–532). Menlo Park, CA: AAAI Press.

Sammut, C., & Banerji, R. B. (1986). Learning concepts by asking questions. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), Machine learning: An artificial intelligence approach (Vol. 2). Los Altos, CA: Morgan Kaufmann.

Shapiro, D., Langley, P., & Shachter, R. (2001). Using background knowledge to speed reinforcement learning in physical agents. Proceedings of the Fifth International Conference on Autonomous Agents (pp. 254–261). Montreal: ACM Press.

Shavlik, J. W. (1989). Acquiring recursive concepts with explanation-based learning. Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (pp. 688–693). Detroit, MI: Morgan Kaufmann.

Stone, P., & Veloso, M. M. (2000). Layered learning. Proceedings of the Eleventh European Conference on Machine Learning (pp. 369–381). Barcelona: Springer-Verlag.

Utgoff, P., & Stracuzzi, D. (2002). Many-layered learning. Proceedings of the Second International Conference on Development and Learning (pp. 141–146).

Veloso, M. M., & Carbonell, J. G. (1993). Derivational analogy in Prodigy: Automating case acquisition, storage, and utilization. Machine Learning, 10, 249–278.


Appendix: Concepts and Skills Provided in Experiments

Table 5. Concepts and skills provided to Icarus for the in-city driving domain, with italics denoting the goal concept and parentheses indicating the number of clauses for disjunctive skills.

primitive concepts (15)

parked
aligned-with-lane-in-segment
centered-in-lane
steering-wheel-not-straight
driving-in-segment
at-speed-for-right-turn
ready-for-right-turn
in-leftmost-lane
lane-to-right
lane-to-left
in-rightmost-lane
in-right-turn-lane
off-centered-to-right-in-segment
off-centered-to-left-in-segment
building-on-right
building-on-left
current-building

start-aligned-with-lane-in-segment
start-centered-in-lane-1
start-centered-in-lane-2
start-adjust-speed-for-cruise
start-cruise-within-segment
start-change-lane-to-right
start-change-lane-to-left
start-in-lane-1
start-in-lane-2

primitive skills (19)


Table 6. Concepts and skills provided to Icarus for the Blocks World, with goal concepts in italics.

primitive concepts (4)

clear
three-tower
two-tower-one-on-table
unstackable
pickupable
stackable
putdownable

primitive skills (4)

nonprimitive concepts (14)

starthomesuccessorcolcolpairavailable-cellavailable-columnclearon

bottomincellhome

column-to-home
column-to-newhome
column-to-freecell
lastcolumn-to-home
lastcolumn-to-newhome
lastcolumn-to-freecell
freecell-to-home
freecell-to-newhome
freecell-to-column
column-to-column
freecell-to-new-column
column-to-new-column
