Interleaving Learning, Problem Solving, and Execution in the Icarus Architecture
ExecutionintheIcarusArchitecture
PatLangley(langley@csli.stanford.edu)DongkyuChoi(dongkyuc@stanford.edu)SethRogers(srogers@csli.stanford.edu)ComputationalLearningLaboratory
CenterfortheStudyofLanguageandInformationStanfordUniversity,Stanford,CA94305USA
Abstract
Inthispaper,wereviewIcarus,acognitivearchitecturethatutilizeshierarchicalskillsandcon-ceptsforreactiveexecutioninphysicalenvironments.Inaddition,wepresenttwoextensionstotheframework.Thefirstinvolvestheincorporationofmeans-endsanalysis,whichletsthesystemcomposeknownskillstosolvenovelproblems.Thesecondinvolvesthestorageofnewskillsthatarebasedonsuccessfulmeans-endstraces.Wereportexperimentalstudiesofthesemechanismsonthreedistinctdomains.Ourresultssuggestthatthetwomethodsinteracttoacquireusefulskillhierarchiesthatgeneralizewellandthatreducetheeffortrequiredtohandlenewtasks.Weconcludewithadiscussionofrelatedworkonlearningandprospectsforadditionalresearch.
Keywords:incrementallearning,cognitivearchitecture,reactivecontrol,problemsolving,
hierarchicalskills
1.IntroductionandMotivation
Researchoncognitivearchitectures(Newell,1990)attemptstounderstandthecomputationalin-frastructuresthatsupportintelligentbehavior.Aspecificarchitecturecharacterizestheaspectsofacognitiveagentthatremainthesameacrosstimeandoverdifferentdomains,andtypicallymakesstrongcommitmentsabouttherepresentationofknowledgestructuresandtheprocessesthatop-erateonthem.Learninghasbeenacentralconcerninmostarchitecturalresearch,withavarietyofmechanismshavingbeenproposedtomodeltheacquisitionofknowledgefromexperience.Thelearningmethodsembeddedinmostcognitivearchitecturesareincremental,reflectingevidencethathumansacquireknowledgeinthismanner,buttherehavebeenfewaccountsoftheoriginofhierarchicalstructuresthatappearcrucialtocomplexcognition.
InthispaperwereviewIcarus,acandidatearchitecturethatdivergesfromitspredecessorsonanumberofdimensions.Oneimportantdifferenceisthattypicalarchitectureshandleconcep-tualknowledgeinaproceduralmanner,typicallyusingproductionrules,whereasourframeworkcontainsseparatememoriesforconceptsandskills.Anotherdistinctivefeatureisthatmostar-chitecturesarebasedonproductionsystems,whichencodeknowledgeasa‘flat’setofcondition-actionrules,whereasIcarusmakesanarchitecturalcommitmenttothehierarchicalorganizationofknowledge.OnecancertainlyencodehierarchicalstructuresinframeworkslikeACT-R(Anderson,1993)andSoar(Laird,Rosenbloom,&Newell,1986),butthisremainsthemodeler’schoiceratherthanastrongtheoreticalclaim.Inaddition,mostcognitivearchitecturesevolvedfromtheoriesofhumanproblemsolving,whichhasledtosubordinaterolesforperceptionandactioneveninthoseframeworksthatsupportthem.1Incontrast,Icarusisprimarilyanexecutionarchitecturethatperceivesandreactstoexternalenvironments,whichweviewasmorebasicthanproblemsolving.However,Icarus’relianceonhierarchicalstructuresraiseskeyquestionsabouttheirorigin.Moreover,thearchitecture’semphasisonexecutiondoesnotmeanthatmentalactivitieslikeprob-lemsolvingareunimportant,sincetheycanletanagenthandlenoveltasksforwhichstoredknowledgeisunavailable.Thecentralhypothesisofthispaperisthathierarchicalskillsarise,atleastinmanycases,fromproblem-solvingbehavior,andthat,oncelearned,theagentcanusethesestructurestosupportreactiveexecutionintheenvironment.Moreover,thisacquisitionoccursinanincrementalmanner,withnewskillsbeinglearnedgraduallyastheagentencountersnewproblemsitcannothandlewithoutresortingtoproblemsolving.
WerefertoIcarusasa‘cognitivearchitecture’inthesamesensethattheSoarcommunityusesthatexpression.Bothframeworksaimforconsistencywithgeneralknowledgeabouthumancognitionandhopetosupportthesamebroadrangeofabilitiesthatpeopledemonstrate.However,ourcurrentresearchdoesnotattempttomatchatafine-grainedleveltheresultsofpsychologicalexperiments,asdonewitharchitectureslikeACT-R.Wemayaddresssuchissuesinfutureresearch,butfornowweareconcernedwithcoarseregularitiesthatdemandexplanation,suchastheapparenthierarchicalnatureofhumanskillsandtheirincrementalacquisitionfromexperience.
Inthesectionsthatfollow,wereviewIcarus’representationandorganizationofconceptsandskills,alongwiththeinferenceandexecutionprocessesthatutilizethem.Afterthis,wepresent
Page2LearningHierarchicalSkills
anewmodulethatinterleavesmeans-endsproblemsolvingwithexecutionwhenknownskillsareinsufficienttosolveatask.Nextwedescribeamechanismforcreatinggeneralizedskillsfromtracesofsuccessfulproblemsolvingthatsupportsincremental,hierarchicallearning.Wereportexperi-mentswiththislearningmechanismthatdemonstrateitsabilitytogeneralizetonovelsituationsandreduceeffortonnewproblems.Inclosing,wediscussearlierresearchonlearningforproblemsolvingandexecution,alongwithsomedirectionsforfuturework.
2.RepresentationandOrganization
Likeothercognitivearchitectures,Icarusmakescommitmentstoitsrepresentationofknowledge,themannerinwhichthatknowledgeisorganized,andthememoriesinwhichitresides.Followingmosttheoriesofhumancognition,theframeworkdistinguishesbetweenlong-termmemories,whichchangeonlygraduallyduetolearning,andshort-termmemories,whichchangerapidlyastheagentrevisesitsbeliefsandgoals.Inthissection,wediscussIcarus’memoriesandtheformalismsusedtoencodetheircontents.2WewilltakeourexamplesfromtheBlocksWorld,sincemanyreadersshouldfindthisdomainfamiliar.Wehavedescribedtheseaspectsoftheframeworkinmoredetailelsewhere,includingtheiruseinotherdomainslikein-citydriving(Choietal.,2004)andmulti-columnsubtraction(Langley,Cummings,&Shapiro,2004).2.1Long-TermConceptualMemory
OneofIcarus’long-termmemoriesstoresconceptsthatdescribegeneralizedsituationsintheenvironment.Thesemayinvolveisolatedobjects,suchasindividualblocks,buttheycanalsocharacterizephysicalrelationsamongobjects,suchastherelativepositionsofblocks.Long-termconceptualmemorycontainsthedefinitionsoftheselogicalcategories.Eachelementspecifiestheconcept’snameandarguments,alongwithfieldswhichdescribeperceptualentitiesthatmustbepresent,lower-levelconceptsthatmustmatch,lower-levelconceptsthatmustnotmatch,andnumericrelationsthatmustbesatisfied.Table1presentssomeconceptsfromtheBlocksWorld.Forexample,therelationondescribesaperceivedsituationinwhichtwoblockshavethesamexpositionandthebottomofonehasthesameypositionasthetopoftheother.Theconceptclearinsteadreferstoasingleblock,butonethatcannotholdtherelationontoanyother.
DefinitionsofthissortorganizeIcaruscategoriesintoaconceptualhierarchy.Primitivecon-ceptsaredefinedentirelyintermsofperceptualconditionsandnumerictests,butmanyincorporateotherconceptsintheirdefinitions.Thisimposesalatticestructureonthememory,withmorebasicconceptsatthebottomandmorecomplexconceptsathigherlevels.Theresultinghierarchyissim-ilarinspirittodiscriminationnetworkmodelsofhumanmemorylikeEpam(Richman,Staszewski,&Simon,1995),aswellastoframeworkslikedescriptionlogics(Nardi&Brachman,2002).Struc-turally,thislatticebearsacloseresemblancetotheRetenetworks(Forgy,1982)usedformatchinginproduction-systemarchitectures.
LearningHierarchicalSkillsPage3
Table1.SomeIcarusconceptsfortheBlocksWorld,withvariablesindicatedbyquestionmarks.Perceptsrefer
onlytoattributevaluesusedelsewhereintheconceptdefinition.
2.2Long-TermSkillMemory
Icarusalsoincorporatesasecondlong-termmemorythatstoresknowledgeaboutskillsitcanexecuteintheenvironment,includingtheirconditionsforapplicationandtheirexpectedeffects.Eachskillclauseincludesahead(anameandzeroormorearguments)andabodythatspecifiestheconceptsthatmustholdtoinitiatetheskillandoneormorecomponents.Aprimitiveskillclauseindicatesoneormoreordered,executableactions,alongwiththoseconceptsthat,takentogether,describethesituationtheskillproduceswhendone.Aprimitiveskillmayalsostateconditionsthatmustholdthroughoutitsexecution,whichmayrequiremultiplecyclestocomplete.Forexample,Table2showstheskillpickup,whichmustsatisfythestartcondition,(pickupable?block?from),definedinTable1,andinvokes*grasp,whichgraspsablock,and*vertical-move,whichmovesthehandintheverticaldirection.Theskill’sonlystatedeffectistomake(holding?block)true.Incontrast,anonprimitiveskillclausespecifieshowtodecomposethatactivityfurther.Forinstance,Table3includestwoclausesforthenonprimitiveskillclear.Eachindicatesthatexecut-ingtheclausewillachievethatgoal,buttheydifferintheirstartconditionsandintheirsubskills.Nonprimitiveskillclausesdonotspecifyeitherrequiredconditionsoreffects,buttheirheadsalwayscorrespondstoaconceptthattheskillwillachieveuponsuccessfulcompletion.Thisrepresenta-tionalassumptionfigurescentrallyinthelearningmechanismwedescribelater.BecauseIcarusconceptsandskillsutilizeasyntaxsimilartothatfoundintheprogramminglanguageProlog,wehavereferredelsewheretosetsoftheselong-termmemorystructuresasteleoreactivelogicprograms(Choi&Langley,2005).Thisphraseconveysboththeirstructuralsimilaritytotraditionallogicprogramsandtheirabilitytobehavereactivelyinagoal-drivenmanner,followingNilsson’s(1994)notionofateleoreactivesystem.
Page4LearningHierarchicalSkills
Table2.PrimitiveskillsfortheBlocksWorld.Eachclausehasaheadthatspecifiestheskill’snameandarguments,
asetoftypedpercepts,asinglestartcondition,asetofeffects,andasetofexecutableactions(markedbyasterisks).
2.3Short-TermMemories
Inadditiontolong-termmemories,whichencoderelativelystableknowledgeaboutadomain,Icarusfollowsstandardpsychologicaltheorybyincorporatingshort-termstoresthatchangemorerapidly.Thesecontaintheagent’stemporaryperceptionsandbeliefsabouttheenvironment,aswellasitsgoalsandintendedactivities.Theyinclude:
•aperceptualbufferthatholdsdescriptionsofphysicalentitieswhichcorrespondtotheoutputofsensors;fortheblocksworld,thisincludesliteralslike(blockBxpos10ypos2width2height2),whichspecifythepositionandsizeofindividualblocks.•ashort-termconceptualmemorythatcontainsbeliefsabouttheenvironmentwhichtheagentinfersfromitemspresentinitsperceptualbufferandlong-termconceptmemory;forinstance,thismightcontaintheinstance(onBC),whichisaninstanceoftheonconceptinTable1.•ashort-termskillmemorythatcontainstheagent’sgoalsandassociatedskillinstancesitintendstoexecute;eachgoalliteralspecifiesaconcept’snameandargumets,asin(clearA),whereaseachassociatedintentiongivesaskill’snameanditsarguments,asin(stackBC),whichisaninstanceoftheskillstackinTable2.
LearningHierarchicalSkillsPage5
Table3.SomenonprimitiveskillsfortheBlocksWorldthatinvolverecursion.Eachskillclausehasaheadthat
specifiesthegoalitachieves,asetoftypedpercepts,oneormorestartconditions,andasetoforderedsubskills.Numbersaftertheheaddistinguishdifferentclausesthatachievethesamegoal.
Unlikemostcognitivearchitectures,everyelementintheshort-termconceptualandskillmemoriesmustbeaninstanceofsomegeneralizedstructureinthelong-termconceptualandskillmem-ory,respectively;theycannotbearbitrarysymbolicstructures.Wehavediscussedthisstrongcorrespondenceassumptionatmorelengthelsewhere(Langley&Rogers,2005).
3.ConceptualInferenceandSkillExecution
Likemostcognitivearchitectures,Icarusoperatesindistinctcycles.Oneachsuchiteration,thesystemupdatesitsperceptualbufferbysensingobjectsinitsfieldofview,withthespecificsensorsdependingontheparticularenvironmentinwhichtheagentisoperating.Thisprocessproducesperceptualelements,whicharearedepositedintheperceptualbufferandwhichinitiatematchingagainstlong-termconcepts.Thematchercheckstoseewhichprimitiveconcepts(i.e.,thosedefinedentirelyintermsofpercepts)aresatisfied,addseachmatchedinstancetoconceptualshort-termmemory,andrepeatstheprocessonnonprimitiveconceptstoinferhigher-levelbeliefs.
Inthisway,Icarusinfersallinstancesofconceptsthatareimpliedbyitsconceptualdefinitionsandthecontentsoftheperceptualbuffer.Forexample,aBlocksWorldagentwouldfirstupdateitsdescriptionsoftheblocksandthetable,theninferprimitiveconceptslikeon,andfinallyinfercomplexconceptslikeunstackable.Thisbottom-upprocedureoperatesinmuchthesamewayastheRetenetworks(Forgy,1982)usedinmanyproduction-systemarchitecturesandthelogicalinferencemethodsusedinmanytruth-maintenancesystems(e.g.,Doyle,1979).Thedefaultprocessisexhaustive,butelsewherewehavereportedanalternativemechanismthatmakesinferencesmoreselectively(Asgharbeygietal.,2005).
Oneachcycle,thearchitecturealsoexaminestheagent’sgoalsandtheirassociatedintentionsinshort-termskillmemorytodeterminewhich,ifany,applytothecurrentsituation.3Foreachintendedskillinstance,Icarusaccessesallclausesofthegeneralskilltoseeiftheyareapplicable.
Page6LearningHierarchicalSkills
Sincevariablescanbeboundwithinaskill’sbody,thissetmayincludemultiplevariantsofeachskillclausestoredinlong-termmemory.Aprimitiveskillclauseisapplicableif,foritscurrentvariablebindings,itseffectsdonotyethold,itsrequirementsaresatisfied,and,ifthesystemhasnotyetstartedexecutingit,thestartconditionsmatchthecurrentsituation.Ahigher-levelskillclauseisapplicableifitsheadisnotsatisfied,thestartconditionsaresatisfiedifithasnotbeeninitiated,andatleastonesubskillisapplicable.Becausethislattertestisrecursive,askillisapplicableonlywhenIcaruscanfindatleastoneacceptablepathdownwardtoexecutableactions,whichthearchitecturereturnsforinvocation.
Forexample,supposeanIcarusagenthasthegoal(clearA)inasituationwhereblockAisonthetable,blockBisonA,blockCisonB,andthehandisempty.SupposefurtherthattheagenthasaccesstotheprimitiveskillsinTable2andthenonprimitiveonesinTable3.Inthiscase,thesystemwouldfindanapplicablepaththroughtheskillhierarchythatisrelevanttoitsgoal:[(clearA),(unstackableBA),(clearB),(unstackableCB),(clearC),(unstackCB)].Thisholdsbecausetheinstantiatedstartconditionsofeachskillalongthepath(e.g.,(onBA)and(hand-empty)forthetopmostskill)arepresentinconceptualshort-termmemory.Ifselected,(unstackCB)wouldaltertheenvironment,makingthepath[(clearA),(unstackableBA),(clearB),(unstackableCB),(hand-empty),(putdownCT)]acceptableonthenextcycle.Thiswouldproduceabeliefstatethatenablesthenextstepintheprocedure,whichwouldcontinueuntiltheagenthadsatisfieditstop-levelgoal,(clearA).
Duringskillselection,Icarusincorporatestwopreferencesthatprovideabalancebetweenreac-tivityandpersistence.Whenconfrontedwithachoicebetweentwoormoresubskills,itselectsthefirstalternativeforwhichtheheadisnotsatisfied.Thissupportsreactivecontrol,sincethesystemreconsiderspreviouslycompletedsubskillsand,iftheireffectsnolongerholdforsomereason,reex-ecutesthemtoremedytheproblem.Ontheotherhand,whenencounteringtwoormoreapplicableskillpaths,Icarusselectstheonethatsharesthemostelementsfromthestartofthepathexecutedonthepreviouscycle.Thisencouragesthesystemtocontinuingexecutingahigh-levelskillithasalreadystarteduntilthatskillachievesitsassociatedgoaloruntilitbecomesinapplicable.
4.Means-EndsProblemSolving
Asjustexplained,Icaruscanexecutecomplexhierarchicalskillsinareactivemanner,butourinitialstudies(e.g.,Choietal.,2004;Langleyetal.,2004)assumedthattheseskillsarealreadypresentinlong-termmemory.Althoughmuchhumanbehaviorappearstoinvolvetheapplicationofsuchroutineskills,peoplecanalsosolvenoveltasksthatrequirethedynamiccombinationofexistingknowledgeelementsthroughsomeformofheuristicproblemsolving.
TomodelthiscapabilityinIcarus,wehaveintroducedavariantofmeans-endsanalysis(Newell,Shaw,&Simon,1960)thatoperatesoverthearchitecture’sknowledgestructures,includingbothlong-termconceptsandskillsprovidedbytheprogrammerandshort-termbeliefsandgoalspro-ducedbythearchitecture.Traditionalmeans-endsproblemsolvingselectssomeunsatisfiedaspectofthegoalstatetoachieve,thenselectsanoperatorthatwouldachieveit.Ifthatoperator’spreconditionsmatchthecurrentstate,itisapplied;otherwise,themethodselectsanunsatisfied
LearningHierarchicalSkillsPage7
preconditiontoachieve,selectsanoperatorthatwouldachieveit,andsoon.Onceaconditionismet,theprocessisrepeateduntiltheoriginalgoaldescriptionissatisfied.Thismayrequiresearch,whichisoftenpursuedinadepth-firstmanner.Means-endsanalysishasbeenimplicatedrepeatedlyinhumanproblemsolvingonnoveltasks.
Tosupportthismechanism,ourextendedversionofIcarusaugmentstheshort-termskillmem-orywithagoalstack.Eachelementinthisstackspecifiesagoal(adesiredconceptinstance),whethertheagentintendstoachieveitbybackwardchainingoffaconceptdefinitionoraskillclause,and,inthelattercase,theskillinstancethat,ifexecuted,shouldachieveit.Eachgoalelementalsospecifiessubgoalsthathavealreadybeenachieved,alongwithskilland/orconceptinstancesthatithastriedinreachingthisgoalbutthathavefailed.Thefirstareneededtokeepthesystemfromconsideringskillsthatwouldundoitspreviousaccomplishments,whereasthesec-ondensuresitdoesnotrepeatearliermistakes.Wealsoassumethatboththestartconditionsofprimitiveskillsandtop-levelgoalsmustbecastassinglerelationalliterals,whichcausesnolossingenerality,sinceeithermaybedefinedconcepts.
WehavealsoextendedtheIcarusinterpretertotakeadvantageofthesenewmemorystructures.Oneachcycle,thesystemtakesoneoffivedistinct,orderedsteps:
1.IfthecurrentgoalGofthegoalstackGSissatisfied,thenpopGfromGSandstoreinformation
aboutthesuccesswithG’sparent.2.IfthegoalstackGSdoesnotexceedthedepthlimitandthereareapplicableskillpathsthat
startfromaskillinstancewiththecurrentgoalGasitshead,thenselectonesuchpathandexecuteit.3.IfthereisanonemptysetofprimitiveskillinstancesinwhichthecurrentgoalGisaneffectthat
havenotalreadyfailed,thenselectaskillinstancefromthissetandpushitsstartcondition(whichweassumesubsumesanyrequiredconditions)ontothegoalstackGS.4.IfthecurrentgoalGisaninstanceofacomplexconceptwithunsatisfiedsubconceptsHand
withsatisfiedsubconceptsF,thenifthereisasubconceptIinHthathasnotyetfailed,pushIontothegoalstackGS.5.OtherwisepopthecurrentgoalGfromthegoalstackGSandstoreinformationaboutthe
failurewithG’sparent.Weassumethateachoftheseactivitiestakesasinglecycleofthearchitecture,withtheinitialsituationbeingaspecialcaseofthethirditemthattriggerstheprocess.Becausereasoningabouthowtoachieveanobjectivecanrequiremanymanipulationsofthegoalstack,ittakesmorecyclesthanexecutingastoredhierarchicalskillforthatgoal,evenwhentheagentfindsasolutiononitsfirstattemptanddoesnothavetobacktrack.
Figure1showsasuccessfultraceoftheproblemsolver’sbehavioronasimpleBlocksWorldtaskinwhichwhenthegoalis(clearA)andwhenblockAisonthetable,blockBisonA,blockCisonB,andthehandisempty.Inthissituation,thesystemlooksforexecutableskillswiththisgoalasitsheadbut,whenthisfails,itconsidersskillsthathavethegoalasoneofitseffects.Inthiscase,invokingtheprimitiveskillinstance(unstackBA)wouldproducetheintendedresult,butit
Page8LearningHierarchicalSkills
Figure1.AtraceofsuccessfulproblemsolvingintheBlocksWorld,whichellipsesindicatinggoalsandrectangles
denotingprimitiveskills.
cannotbeappliedbecauseitsinstantiatedstartcondition,(unstackableBA),doesnothold.Inresponse,theproblemsolverstorestheskillinstancewiththeinitialgoalandpushesthesubgoalontothegoalstack.
Next,theproblemsolverattemptstoretrieveskillsthatwouldachieve(unstackableBA).How-ever,becauseithasnosuchskillsinmemory,itresortstochainingoffthedefinitionofunstackable.Thisinvolvesthreeinstantiatedsubconcepts–(clear),(onBA),and(hand-empty)–butonlythefirstoftheseisunsatisfied,sothesystempushesitontothegoalstack.Thisinturnleadsittoconsiderskillsthatwouldproducethisliteralasaneffectandretrievestheskillinstance(unstackCB),whichitstoreswiththecurrentgoal.Inthiscase,thestartconditionoftheselectedskill,(unstackableCB)alreadyholds,sothesystemexecutes(unstackCB).Theassociatedactionsaltertheenvironmentandcausetheagenttoinfer(clearB)fromitspercepts.Inresponse,itpopsthisgoalfromthestackandreconsidersitsparent,(unstackableBA).However,thishasnotyetbeenachievedbecauseexecutingtheskillhascausedthethirdofitscomponentconceptinstances,(hand-empty),tobecomefalse.Thus,thesystempushesthisontothestackand,uponinspectingmemory,retrievestheskillinstance(putdownCT),whichitcananddoesexecute.
Thissecondstepachievesthesubgoal(hand-empty),whichinturnletstheagentinfer(unstack-ableBA).Thus,theproblemsolverpopsthiselementfromthegoalstackandexecutestheskillinstanceithadoriginallyselected,(unstackBA),inthenewsituation.Uponcompletion,thesystemperceivesthatthealteredenvironmentsatisfiesthetop-levelgoal,(clearA),whichleadsittohalt,sinceithassolvedtheproblem.
LearningHierarchicalSkillsPage9
Forthesakeofclarity,bothourdescriptionandFigure1presentthetraceofsuccessfulproblemsolving,butfindingsuchasolutionmayinvolvesearch.Whenbackwardchainingoffskillsthatwouldachievetheobjectiveofthecurrentstackentry,Icarusconsidersonlyskillinstancesthathavenotyetfailed.Thesystemalsopreferscandidatesthathavethefewestexpandedstartcondi-tionsthatareunmetbythecurrentenvironmentalstate,withfullymatchedconditionsbeingmostdesirable.Ifcandidatestieonthiscriterion,itselectsanalternativeatrandom.Whenbackwardchainingofftheunmatchedelementsofaconceptdefinition,thesystemselectssubgoalsatrandomaftereliminatingthosewhichhavefailedinthepast.
Takentogether,thesebiasesproduceaheuristicversionofmeans-endsanalysis.However,thisproblem-solvingmethodistightlyintegratedwiththeexecutionprocess.Icarusbackwardchainsoffconceptorskilldefinitionswhennecessary,butitexecutestheskillassociatedwiththetopstackentryassoonasitbecomesapplicable.Moreover,becausethearchitecturecanchainoverhierarchicalreactiveskills,theirexecutionmaycontinueformanycyclesbeforeproblemsolvingisresumed.Incontrast,mostmodelsofhumanproblemsolvingandmostAIplanningsystemsfocusonthegenerationortheexecutionofplans,ratherthaninterleavingthetwoprocesses.
Ofcourse,executingacomponentskillbeforeconstructingacompleteplancanleadanagentintodifficulties,sinceonecannotalwaysbacktrackinthephysicalworld.Thisstrategymaywellleadtosuboptimalbehaviors,buthumanintelligenceismoreaboutsatisficingthanoptimizing,andinterleavingproblemsolvingwithexecutionrequiresfarlessmemorythanconstructingafullplanbeforeexecutingit.However,itcanproducesituationsfromwhichtheagentcannotrecoverwithoutstartingtheproblemover.
Insuchcases,Icarusstoresthegoalelementforwhichitsexecutedskillcausedaproblem,alongwitheverythingbelowitinthestack.Thesystembeginstheproblemagain,thistimeavoidingtheskillandselectinganotheroption.Ifitmakesadifferentexecutionerrorthistime,itagainstorestheproblematicskillanditscontext,thenstartsoveroncemore.Icarusalsostartsoverifithasnotachievedthetop-levelobjectivewithinaspecifiednumbercycles.Suchrepeatedattemptsatsolvingatask,withselectedmemoryaboutpreviouspasses,seemsabettermodelofhumanproblemsolvingthansystemsthatconstructacompleteplanbeforeexecution.JonesandLangley’s(inpress)modelofmeans-endsproblemsolving,Eureka,usedasimilarrestartstrategy,butitkeptnoexplicitrecordofpreviousfailedpaths.
5.LearningHierarchicalSkillsfromProblemSolving
Inthepreviouspages,wedescribedtwofacetsofIcarus:itsexecutionofhierarchicalskillsonfamiliartasksanditsuseofproblemsolvingtohandlenovelones.Thefirstletsthesystemoperateefficiently,butskillsaretedioustoconstructmanually,whereasthesecondgivesthesystemflexibilitybutrequiresreasoningandmeans-endssearch.Webelievethathumansalsohavebothcapabilities,butthattheyuselearningtotransformtheresultsofsuccessfulproblemsolvingintohierarchicalskills.WewouldliketoincorporateasimilarcapabilityintoIcarus.
However,wewantourlearningmechanismstoreflectcertainpropertiesthatappeartoholdforhumanskillacquisition.Oneisthatlearningshouldtakeadvantageofexistingknowledge,suchas
Page10LearningHierarchicalSkills
thedefinitionsofcurrentskillsandconcepts.Inaddition,acquisitionshouldbeincremental,inthatitlearnsfromeachnewexperience,andinterleavedwiththeproblem-solvingprocess.Therecentliteratureoncomputationallearningcontainsfewcasesofsuchknowledgeacquisition,althoughinSection7wediscusssomeolderworkthathasthischaracter.
OurextensionofIcarusachievesthiseffectthroughaformofimpasse-drivenlearningthatistiedcloselytoitsproblem-solvingandexecutionprocesses.Forthisreason,thelearningmechanismsrequirenoadditionalinputsbeyondthoserequiredforthesebasicperformanceprocesses.AsinSoar(Lairdetal.,1986),thepurposeofskilllearningistoavoidsuchimpassesinthefuture.Thus,wheneverthearchitectureachievesanobjectivethatisassociatedwithanentryinthegoalstack,thissuccessprovidesanopportunityforlearning.Thesystemacquirestwodistinctformsofskillthataretiedtodifferentaspectsofproblemsolving.
ThefirstclassofskillsresultfromsituationsinwhichtheproblemsolvercannotfindaskilltoachieveagoalG,andthuspursuessubgoalsbasedontheunsatisfiedconditionsofG’sconceptualdefinition.Iftheagentachievesthesesubgoalsintheorder{G1,G2,...,Gn},thussatisfyingtheparentgoalG,IcarusconstructsanewskillclausethathasGasitsheadandthathas{G1,G2,...,Gn}asitsorderedsubskills.4ThestartconditionsofthenewclausearesimplythosesubconceptsofGthatweresatisfiedwhenitwaspushedontothegoalstack.Thehead,conditions,andsubskillshavetheirargumentsreplacedbyvariablesinaconsistentmanner,ensuringapplicabilitytoanalogoussituationsthatinvolvedifferentobjects.
Forexample,uponachievingthesubgoal(unstackableBA)inFigure1,thesystemconstructstheunstackableskillclauselabeled3inTable3.Thehead(unstackable?B?A)isageneralizedversionofthegoal(unstackableBA),whereastheorderedsubskills(clear?B)and(hand-empty)aregeneralizedversionsofitstwosubgoals(clearB)and(hand-empty).Thestartconditionsare(on?B?A)and(hand-empty),whicharegeneralizedversionsofthesubconceptsthatheldwhenthegoalwasestablished.Finally,the:perceptsfieldspecifiesthetypesforobjectsthatserveasthehead’sarguments.Thismechanismconstructsdifferentvariantsofaskill,withseparatestartconditionsanddistinctsubskills,fromsubproblemsthatinvolvedifferentinitialconditions.ThesecondcategoryresultsfromsituationsinwhichIcarushasselectedaprimitiveskillinstanceS2inordertoachieveagoalG,butfounditssinglestartconditionG2unsatisfiedandselectedanotherskillinstance,S1,toachieveit.Oncetheagenthasexecutedbothskillssuccessfullyandithasreachedthegoal,thesystemconstructsanewskillclausethathasGasitsheadandthathasG2(ratherthanthespecificclauseS1)andS2asorderedsubskills.ThestartconditionsaresimplythestartconditionsoftheS1clauseusedinthesubproblemsolution,whicharesufficientbecausetheproblemsolverS1selectedittoachievesthestartconditionofS2,whichinturnachievesthegoalG.Again,specificargumentsarereplacedconsistentlybyvariables.
Forinstance,uponachievingthetop-levelgoal(clearA)inFigure1,Icaruscreatestheclearskillclauselabeled4inTable3.Thisincorporatesageneralizedversionof(clearA)asitshead,alongwithvariableizedversionsof(unstackableBA)and(unstackBA)asitstwoorderedsubskills.Thestartconditions,(on?B?A)and(hand-empty),arethesameasthoseforunstackableclause
LearningHierarchicalSkillsPage11
3justdiscussed,sincethelatterwascreatedtoachievethestartconditionofunstackunderthosesameconditions,whichinturnsatisfiesthegoalclear.
Bothlearningmechanismsarefullyincremental,inthateachlearningeventdrawsonasingleproblem-solvingexperienceandthusrequiresnomemoryofpreviousones.Theysupportwithin-triallearning,sinceskillsacquiredononesubproblemmaybeusedtohandlelatersubproblems.Theprocessesalsobuildonexistingknowledge,sincetheconstructionofnewskillclausesinvolvesthecompositionofthoseusedinatrainingproblem’ssolution.Takentogether,thesesupportaformofcumulativelearning,inwhichIcaruslearnsskillsononeproblem,usesthemtosolvealaterproblem,andincorporatesthemintostillhigher-levelstructures.
Assuggestedbyourexamples,theselearningmethodscanacquirebothdisjunctiveandrecursiveskills.Thekeytothisabilityliesintheassumptionthatacquiredskillclauseswhichachievethesamegoalshouldbegiventhesamehead.Byindexingskillsinthismanner,Icarusknowswhentwoormoreclausesshouldbestoredtogether,whichleadsinturntothecreationofskillsthatcallonthemselves,eitherdirectlyorthroughintermediateskills.Thismakesthearchitecture’slearnedskillsconsiderablymoreflexibleandgeneralthantraditional‘macro-operators’(e.g.,Iba,1988)orcomposedproductionrules(e.g.,Neves&Anderson,1981).
Ofcourse,thecreationofdisjunctiveandrecursivestructureshaspotentialforovergeneralization,asdemonstratedbyresearchontheinductionofcontext-freegrammars(e.g.,Langley&Stromsten,2000).Ourtechniquefordeterminingthestartconditionsonnewskillclausesismuchsimplerthanstandardtechniquesforanalyticallearningorruleinduction.Infact,atfirstglance,thelearnedclausesinTable3appearhighlyovergeneral,butthisignoresthefactthatIcarusdoesnotinterpretskillsinisolation.Recallthatthearchitecturemustfindanentirepaththroughtheskillhierarchybeforeitcanexecutetheprimitiveskillatitsterminus.Thismeansthesystemcollectsconditionsdynamically,asitdescendsthehierarchy,guardingagainstovergeneralizationbycarryingoutlimitedanalysisatperformancetimeratherthandoingitallatlearningtime.Unlikesomeapproachestoincrementallearning,Icarus’methodsrequirenoadditionalmecha-nismsforskillrefinement.Eachskillclauseisgeneralizedwhenthearchitectureconstructsit,anditsstartconditionsareassumedtobeaccurate.Theknowledgeitacquiresfromsolvingagivenproblemmaywellbeincomplete,butthiswillsimplyleadtofurtherimpassesthatproduceaddi-tionallearning.Skillclausesacquiredlatercomplement,butdonotcompetewith,thoselearnedearlierbecausetheycoverdifferentsituationsortheolderclauseswouldhaveavoidedtheimpasse.Thus,learningispurelymonotonic,asinframeworkslikeSoar.
Weshouldnotethatourcurrentimplementationrestrictstheuseoflearnedskillsinfutureproblemsolving.Inparticular,wehaveadoptedMooney’s(1989)ideathatoneshouldnotchainoffthepreconditionsoflearnedskills.Thisdoesnotrestricttheirusebytheexecutionmodule,butitdoesmeanthattheproblemsolverconsidersalearnedskillonlywhenitsstartconditionsarealreadysatisfied.Asaresult,clausesacquiredfromchainingoffskillsalwayshavealeft-branchingstructureinwhichthesecondsubskillisprimitive.Thisassumptionmayseemrestrictive,but,likeMooney,webelieveitprovidesaneffectiveguardagainsttheutilityproblem(Minton,1990),inwhichthecreationanduseofcomplexstructuresreducessearchbutactuallyslowsperformance.
Page12LearningHierarchicalSkills
6.ExperimentalStudiesofSkillLearning
Althoughthenewmethodsforlearninghierarchicalskillsseemplausible,whethertheyimproveanIcarusagent’sperformanceisanempiricalquestion.Inthissection,wereporttheresultsofbasictestsofthesemechanismsonthreedistinctdomains:in-citydriving,theBlocksWorld,andFreeCellsolitaire.Afterthis,wereportmoresystematicexperimentswiththedomainsthatexaminetheeffectsoflearninginmoredetail.Asonemeasureofperformance,weusedthenumberofrecognize-actcyclesrequiredtosolvetheprobleminthesimulatedenvironment,includingbothproblemsolvingandexecutionsteps.However,wealsomeasuredtheCPUtimerequiredtosolveeachproblem,todeterminewhetherIcarussuffersfromtheutilityproblem.6.1DomainsandBasicDemonstrations
Toensurethatourapproachtolearninghierarchicalskillsoperatedasintended,wedevelopedIcarusprogramsforthethreedomains.Ineachcase,weprovidedasetofprimitiveskillssufficientforsolvingproblemswithmeans-endsanalysisandasetofhierarchicalconceptssufficientforrecognizingsituationsthatwererelevanttoexecutingthoseskills.Forexample,wedevisedsome41conceptsand19skillsforthein-citydrivingdomain,11conceptsandfourskillsfortheBlocksWorld,and24conceptsand12skillsforFreeCellsolitaire.TheAppendixgivesthenamesoftheprimitiveconcepts,nonprimitiveconcepts,andprimitiveskillsprovidedforeachdomain,whichshouldalsosuggesttheirfunction.Inaddition,wealsoprovidedthearchitecturewithasetofsensorsandeffectorsforeachsimulatedenvironment.
WehavealreadydiscussedtheBlocksWorld,butbothin-citydrivingandFreeCellmeritsomeexplanation.Thefirstdomaininvolvesadynamicsimulationofadowntowndrivingenvironment.Thecitycontainsobjectsrepresentedasrectanglesofdifferentsizes,includingbuildingsandside-walksorganizedintosquareblocksthataredividedbystreetsegmentsandintersections.Eachsegmentincludesayellowcenterlineandwhitedottedlanelines,andithasamarkedstreetnameandspeedlimit.Eachbuildingshasauniquestreetaddresstohelptheagentnavigatethroughthecityandtosupporttaskslikepackagedelivery.Thecityconfigurationusedinourexperimentshasnineblockswithfourverticalstreetsandfourhorizontalstreets.TheIcarusagentmustoperateunderphysicallawsandfollowtherulesofdriving,suchasstayingontherightsideofthestreetandturningfromtheproperlane.Weprovidetheagentspecificwithgoalstoachieve,suchasgettingontoanotherstreetsegmentordeliveringapackagetoacertainaddress.
FreeCellsolitaireinvolveseightstacksofcards,thefirstfourofwhichcontainsevencardsandthelastfourcontainsixcards.All52cardsaredealtfaceup,makingthemvisibletotheplayer.Inaddition,therearefourfreecells,whichcanserveastemporaryholdingspotsforonecardeachduringthegame,andonefoundationcellforeachsuit.ThegoalinFreeCellistogetallcardsonthefoundationcellsinascendingorder(wheretheaceisoneandthekingisthirteen)groupedbysuit.Onceonitsfoundationcell,acardcannotberemoved.Onlyfully-exposedcardsatthetopofeachstackandcardsthatinthefreecellsareinplay.Theagentcanmoveonecardatatimetoanavailablefreecell,totheappropriatefoundationcell,toanemptystack,ortoastackinwhichthetopcardhasadifferentcolorandvalueonehigherthanthemovedcard.
LearningHierarchicalSkillsPage13
Samplerunswiththein-citydrivingdomain,theBlocksWorld,andareducedversionofFreeCellindicatedthattheextendedversionofIcaruswasabletosolveproblemsintheirrespectivedomainswithsomesearchand,fromtheirrespectivetraces,learnhierarchicalskillsinthemannerdescribedearlier.Wefoundthat,whengiventhesametasktosolveasecondtime,thesystemutilizedthisknowledgetohandleitwithoutproblemsolving.Moreover,becausethesystemgeneralizesitslearnedstructuresbeyondthespecificinstancesonwhichtheyarebased,theytransferfullytoanytasksthatareisomorphictothoseithasalreadysolved.Theonlyconstraintisthatthisisomorphismmustinvolvethesamegoalandhavethesameconceptssatisfiedorunsatisfiedintheinitialenvironment.
However,weshouldnotethisabilitydoesnotmeanthatthesystemcancompleteafamiliarprobleminasinglecycle.Recallthat,traditionalworkoncognitivearchitectures,Icarusresortstoproblemsolvingonlytoenableaction,anditmuststillexecuteitsacquiredskillstoachieveagoal.Thus,foraproblemthatrequiresfourprimitivesteps,thesystemtakessixcyclesonthesecondencounter,withonetoretrievethehierarchicalskillandonetorealizeithasfinished.However,theagentrequiresneithersearchorbackwardchainingoverskillsorconceptstocompleteanyproblemithassolvedpreviously.6.2ExperimentwithIn-CityDriving
Althoughtheseinitialrunswereencouraging,wedesiredmorethananecdotaldemonstrationsthatthenewmechanismssupportedincrementallearningofhierarchicalskills.Wealsowantedevidencefromsystematicexperimentsthatthislearnedknowledgeproducesmoreeffectivebehavior.Ourfirststudyalongtheselinesfocusedonin-citydriving,whichisthemostdynamicofthethreesettingsandthustheonemostappropriateforevaluatingourmethodsforlearningskillsthatsupportreactiveexecution.
Asnotedabove,weprovidedIcaruswith41conceptsand19primitiveskillsrelevanttothisenvironment.Withthebasicknowledge,theagentcancharacterizeitssituationatmultiplelevelsofabstractionandperformactionsforaccelerating,decelerating,andsteeringleftorrightatrealisticangles.Thus,itcanoperateavehicle,butthisisnotsufficienttodrivesafelyinacityenvironment.Theagentmuststilllearnskillsforstayingalignedandcenteredwithinlanelines,changelanes,increaseordecreasespeedforturns,andstopforparking.
Toencouragesuchlearning,wepresentedtheagentwiththegoalofdrivingonadifferentstreetsegmentthanitscurrentone.Toachievethisobjective,itresortedtoproblemsolving,whichfoundasolutionpaththatinvolvedchangingtotherightmostlane,stayingalignedandcentereduntiltheintersection,steeringrightintothetargetsegment,turningthecorner,andfinallyaligningandcenteringinthenewlane.WelettheIcarusagentpracticethistaskforfivetrialstoexamineitsimprovementwithexperience.Werepeatedthisproceduretendifferenttimeswithslightlydifferentstartingpositions,collectedperformancemeasuresforeachrun,andaveragedtheresults.Figure2showsthetotalnumberofcyclesasafunctionofthenumberoftrials,alongwiththenumberofplanningandexecutioncyclesrequiredtoachievethegoal.Astheagentaccumulatesknowledgeaboutthistask,problemsolvingdisappearsalmostentirely,whichcausesthereduced
Page14LearningHierarchicalSkills
Number of cycles required200250300150totalplanningexecution
00
5010012345
Number of trials
Figure2.Thetotalnumberofcyclesrequiredtosolveaparticularright-turntaskalongwiththeplanningand
executiontimes,asafunctionofthenumberoftrials.Eachlearningcurveshowsthemeanovertensetsoftrialsand95percentconfidenceintervals.
numberoftotalcycles.However,thisproblemisdominatedbyexecutiontime,sincetheagentmustactuallydrivethevehicletoitsdestination.Executioncyclesappeartoincrease,whichoccursnotbecausethelearnedskillsareinefficientbutratherbecausetimeprogressesevenduringproblemsolving.Thus,thevehiclemovesintherightdirectionduringthisperiodintheearlytrials,reducingthedistancethatremainstotravel.Asproblemsolvingbecomesunnecessary,theagentdrivesthisextradistanceunderconsciouscontrolratherthanaccidentally.CPUtimeremainedapproximatelythesamewithincreasedexperience,presumablyforthesamereasons.
Table4showsthefiveskillclausesacquiredduringoneoftheseruns.Thetwoclausesfordriving-in-segmentspecifydifferentdecompositionsforachievingthistop-levelgoalunderalternativestartconditions.Thesecondofthesereferstotheclauseforin-segment,whichreferstothelearnedsub-skillforin-intersection-for-right-turnandtheprimitiveskillsteer-for-right-turn.Theformerreferstoin-rightmost-lane,whichinvokestheprimitiveskillclausedriving-in-segment,butitalsocallsonitselfrecursivelywithdistinctarguments.Forclarification,thetablealsopresentstheprimitiveclauseforin-intersection-for-right-turn,whichthesystemwasgivenasbackgroundknowledge.Figure3showsatraceoftheagent’sbehavioronthetaskduringlearning,inasituationthatinvolvesastreetwithtwolanes,andafterwards,inasettingthatinsteadinvolvesthreelanes.Thetraceofthevehicle’smovementdemonstratesthatthelearnedskillsgeneralizetocasesthatinvolvemorelanesthanwerepresentduringtraining.Thisabilityfollowsdirectlyfromtherecursivestruc-tureofthelearnedin-intersection-for-right-turnclause.Behaviorafterlearningisalsosmoother,presumablybecausetheagentneednotengageinproblemsolvingwhenitovershootsslightlyaftergettingintothetargetlaneinpreparationfortherightturn.6.3ExperimentwiththeBlocksWorld
AlthoughtheBlocksWorldisfarlessdynamicthanin-citydriving,itlendsitselftoscalingstudiesthatinvolvegeneralizationtotaskswithvaryingnumbersofobjects.Forthisdomain,weprovidedIcaruswiththefourprimitiveskillsinTable2and11conceptsthatweresufficient,inprinciple,
LearningHierarchicalSkillsPage15
Table4.Fiveskillclauseslearnedforin-citydriving,alongwithaprimitiveskillforthesamedomain.
Page16LearningHierarchicalSkills
Figure3.AtraceoftheIcarusdrivingagent’sbehavior,duringandafterlearning,onataskthatrequiredchanging
totherightmostlaneandturningattheintersection.Thetracedemonstratesgeneralizationtoanewsettingwithadifferentnumberoflanes.
tosolveanyproblem.Wethenpresentedtheagentwiththeproblemsinsequence,usingeachtaskasatrainingproblembutalsorecordingthenumberofcyclesandCPUtimerequiredtocompleteit.Becausemisguidedsearchcombinedwithexecutioncanleadtheproblemsolverintoundesirablephysicalstates,wetoldittohaltifithadnotfinishedarunwithin100cyclesandtostartoverfromtheinitialstate.However,theagentcouldattemptagivenproblemonlyfivetimes,andthusspendatmost500cyclesbeforegivingupentirely.Wealsolimitedthestackdepthtotengoalelements.Weenforcedtheseconstraintsforreasonsofpracticalityandbecausewethinktheyreflectthemannerinwhichhumanstacklenovelproblems.
WegeneratedrandomlyasetofrandomBlocksWorldtasksthatinvolvedsettingswith5,10,15,20,25,and30blocks.Eachcomplexityclasshad67to69distinctproblems,whichweorderedbydifficultyclass(five-blocktasksfirstand30-blocktaskslast).Theintuitionwasthatthesystemwouldlearnmoreeffectivelyifwepresenteditfirstwithsimplerproblems,whichitcouldthenuseinsolvingmoredifficultones.Tothisend,Icarusretainedskillsacquiredonsuccessfulrunsforuseinlatertasks.Weprovidedthesystemsome400randomlygeneratedproblemordersandrecordedthenumberofcyclesandCPUtimesneededforeachtask.Asacontrol,wealsoranthesystemwithitslearningmechanismsoffforanother400problemsetsthatwereorderedrandomlywithindifficultyclasses.Becausetheproblemsrequiredifferentamountsofeffort,traditionallearningcurvesarenotveryinformative.Instead,followingMinton(1990),wereportcumulativecyclesandCPUtimesasafunctionofthenumberoftrainingproblems.
Figure4showstheresultingcurves,including95percentconfidenceintervalsaroundeachmean.Asexpected,thecurvesmainlytaketheformofsuperlinearfunctionswhoseslopesincreasewithproblemdifficulty.Althoughthelargescaleofplotsmadethelearningandnon-learningcurveslook
LearningHierarchicalSkillsPage17
26300Number of cycles requiredlearning onlearning off
CPU time required41000learning onlearning off
21040157801052052600067134201
268335402
Number of problems encountered
00
820016400246003280067134201
268335402
Number of problems encountered
Figure4.CumulativenumberofcyclesandCPUtimesrequiredbyIcarustosolveaBlocksWorldtaskasafunction
ofthenumberofproblemsencountered,averagedover400runsandwithproblemsorderedbydifficulty,withthegoalstackofsizeten.Tickmarksonthehorizontalaxisindicateshiftsinproblemcomplexity.
similarforearlypartsofthecurves,therewassomebenefitforlearningevenfromthebeginning,butthedifferencegrowssubstantiallyasthesystemsencounterharderproblems.Clearly,priorexperiencereducessearchsubstantiallywhenitreachesproblemswithmanyblocks,andthereisnoevidencethatlearningproducesautilityproblem.Rememberthatwehavemadethetransferoflearnedknowledgechallenginginthatnoneoftheproblemsareisomorphic,althoughtheymayinvolveisomorphicsubtasks.TheresultsindicatethatIcaruscantakeadvantageofthissimilarsubstructuretoreduceitseffortonlaterproblems.6.4ExperimentwithFreeCellSolitaire
ToensurethatourconclusionsheldformorethantheBlocksWorld,wecarriedoutasimilarexperimentwithFreeCellsolitaire,whichwedescribedearlierinthissection.WegavetheIcarusagentonlythe12basicskillsneededtomovecardsandthetop-levelgoalofgettingallcardsintofoundationcells,alongwith24conceptsfordescribingsituations.UnliketheBlocksWorld,thisdomainhasonlyonegoalcondition,butitstillhasmanypossiblestartingstates.
Forthisstudy,werandomlygenerated20problemseachthatinvolved8,12,16,20,and24cards.5Weranthesystemon300differentsequencesoftasks,withsimplerproblemsbeingpresentedearlierbutorderedrandomlywithineachofthefivedifficultyclasses.Asbefore,weexpectedthattheagentwouldlearnskillsfromtheeasierproblemsthatwouldassistontheharderones,thusreducingproblem-solvingeffort.Forcomparison,wepresentedanother300randomsequencestoanon-learningsystemwiththesameinitialskillsandconcepts.
Figure5presentsthecumulativeresultsforthisexperiment,witherrorbarsthatindicatethe95%confidenceintervals.AsintheBlocksWorld,thedifferencebetweenthelearningandnon-learningconditionsissubstantial.However,problemswith20cardsormorerequireadifferent
Page18LearningHierarchicalSkills
23000Number of cycles requiredCPU time required96000learning onlearning off
learning on
18400learning off
13800002040
6080100Number of problems encountered
00
19200460038400920057600768002040
6080100Number of problems encountered
Figure5.CumulativenumberofcyclesandCPUtimesrequiredbyIcarustosolveaFreeCelltaskasafunctionof
thenumberofproblemsencountered,averagedover300runsandwithproblemsorderedbydifficulty.
classofskillsthatinvolvecolumn-to-columnmoves,whichcausedthelessenedgapbetweenthetwoconditionsaroundthe80thproblem.However,oncetheyhavebeenacquired,theseskillsprovidesomeadvantage,asevidencedbythedownturninthecurveforthelearningsystemonthefarrightofthegraphs.Again,wedetectednosignofautilityproblemastheagentaccumulatesknowledgeinthisdomain.
7.RelatedResearch
ResearchonlearningcognitiveskillsfromproblemsolvinghasalonghistorywithinbothAIandcognitivescience.Forexample,workonexplanation-basedlearningoftenaimedtoimproveef-ficiencyonproblem-solvingtasksandcombinedexperiencewithadomaintheorytocreatenewcognitivestructures.Sometechniquesfocusedontheacquisitionofsearch-controlrulestoguideproblemsolving,butothereffortsdealtinsteadwiththeconstructionofmacro-operatorsfromprimitiveoperators(e.g.,Iba,1988;Mooney,1989;Shavlik,1989).Ourapproachtoskilllearn-ingcomesclosertothesecondparadigm,sincebothinvolvecomposingknowledgeelementsintolargerstructures.However,Icarusadaptsthisideatothecreationofdisjunctiveandevenre-cursiveskillhierarchies,whereastraditionalmethodsemphasizedthecreationof‘fixed-sequence’macro-operatorsthatwerefarlessflexible.
Icarusalsobearssomesimilaritytoothercognitivearchitecturesthatincorporatevarietiesofanalyticalorexplanation-basedlearning.Forexample,Laird,Rosenbloom,andNewell’s(1986)Soarrevolvesaroundaproblemsolverthatproceedsuntilthesystemencountersanimpasse,inwhichcaseitcreatesasubgoaltoresolveit.Thisresolutionmayrequiresearchandtakesometimetoproducetheinformationnecessary.Oncetheimpassehasbeenhandled,Soarcreatesachunkthatencodesageneralizedexplanationoftheresultintermsoftheoriginalgoalcontext.Intermediatestepsfromthesolutionarelost,buttheacquiredchunkletsthesystemsidestepsimilarimpassesinthefuture.
LearningHierarchicalSkillsPage19
Anderson’s(1993)ACT-Remploysarelatedmechanism,calledcompilation,whichcreatesnewproductionrulesfromonesthatareinvolvedinthesamereasoningchain.Thisschemeproducesveryspecificrulesthatreplacevariableswiththedeclarativeelementsagainstwhichtheymatched,ratherthanforminggeneralizedstructures,asdoIcarusandmostothersystemsthatlearnmacro-operatorsorsearch-controlrules.Infact,ourapproachismoreakintothecompositionprocessthatplayedaroleinearlierversionsofACT(Neves&Anderson,1981),thoughthismechanismproducedfixedbehavioralsequencesratherthanflexibleskillhierarchies.
Icarus’closestarchitecturalrelativeisProdigy(Minton,1990),whichinvokesmeans-endsanalysistosolveproblemsandusesananalyticalmethodtolearneithersearch-controlrolesormacro-operatorsfromproblem-solvingtraces.VelosoandCarbonell(1993)alsodescribeanexten-sionthatrecordsthesetracesinmemoryandsolvesnewproblemsbyderivationalanalogywithearlierones.Noneofthesemechanismsgeneratesexplicithierarchicalstructures,butVelosoandCarbonell’sapproachprovidesflexibilitysimilartothatfoundinIcarus,andthetwosystemsrecordandutilizeverysimilarinformationintheirgoalstacks.
Someothersystemssupportlearninginproblem-solvingdomainswithoutmakingstrongar-chitecturalcommitments.RubyandKibler’s(1991)SteppingStonelearnsgeneralizedrulesfordecomposingcomplexproblemsintosimplerones,whichitobtainsthroughmixeduseofexist-ingproblem-reductionrulesandforward-chainingexhaustivesearchwhenitreachesanimpasse.MarsellaandSchmidt’s(1993)systemalsoacquirestask-decompositionrulesthatincorporatepar-tialorderingsamongcomponents.Theirsystemcombinesforwardandbackwardsearchtoidentifycandidatestatepairs,whichinturnproducehypothesizedproblem-reductionrulesthatarerevisedbasedonfurtherexperience.6
PerhapstheclosestrelativetoourapproachisReddyandTadepalli’s(1997)X-Learn,whichacquiresgoal-decompositionrulesfromasequenceoftrainingproblems.Theirsystemdoesnotincludeanexecutionengine,butitgeneratesrecursivehierarchicalplansinamannerthatalsoidentifiesdeclarativegoalswiththeheadsoflearnedclauses.However,becauseitinvokesforward-chainingratherthanbackward-chainingsearchtosolvenewproblems,itreliesonthetrainertodeterminehierarchicalstructure.X-Learnalsousesaquitesophisticatedmixtureofanalyticalandinductivetechniquestodetermineconditionsonskills,ratherthanthemuchsimplermethodthatIcarusincorporates.
AnotherkeydifferencefromX-Learn,PRL,andSteppingstoneisthatIcaruslearnsskillsforuseinreactiveexecutionratherthanforuseinplanning.Therehasbeenotherworkonthistopic,butithasemphasizedtheacquisitionofflatcontrollersratherthanhierarchicalstructures.Forinstance,Benson’s(1995)TRAILlearnsteleoreactivecontrollersforphysicalagents,butitinvokesinductivelogicprogrammingtodeterminerulesforindividualactions.Fernetal.(2004)reportanapproachtolearningreactivecontrollersthattrainsitselfonincreasinglycomplexproblems,butthatalsoacquiresdecisionlistsforactionselection.Khardon(1999)considerstherelatedtaskoflearninghierarchicalcontrollers,buthisformalanalysisassumestheagentisprovidedwithannotatedsamplesolutionsratherthanbeinggeneratedthroughproblemsolving.
Page20LearningHierarchicalSkills
Otherresearchershavebuiltsystemsthatsupportcumulativelearningoutsidethecontextofproblem-solvingtasks.OneearlyexamplewasSammutandBanerji’s(1986)Marvin,whichlearnsincreasinglycomplexlogicalconceptsthatarecomposedofonesithasmasteredpreviously.StoneandVeloso(2000)takeasimilarapproachtolearningconceptsandcontrollersforplayingroboticsoccer,althoughtheirsystemacquiresquitedifferenttypesofstructureateachlevelofdescription.StracuzziandUtgoff’s(2002)STLalgorithmreceivestrainingcasesaboutmanyconceptsinparallel,butitlearnscomplexonesonlywhenithasacquiredsimplerstructuresthatletitmasterthemwithlittleeffort.Pfleger(2004)describesanothersystemthatacquireshierarchicalpatternsinanon-linesetting,inthiscasefromunsuperviseddata.LikeMarvinandSTL,itlearnsconceptualstructuresfromthebottomup,sothatmorecomplexpatternsareapparentaftersimpleroneshavebeenacquired.
8.ConcludingRemarks
Intheprecedingpages,wepresentedIcarus,acognitivearchitectureforphysicalagentsthatusesstoredconceptsandskills,bothorganizedinhierarchies,torecognizefamiliarsituationsandcontrolbehavior.Wedescribedanewmodulethatsupportsmeans-endsproblemsolvingonnoveltasks,alongwithalearningmechanismthatproducesnewskillsfromtracesofproblemsolutions.Thismethodoperatesinanincrementalmanner,creatinghierarchicalstructuresthatrefertootherslearnedearlier.Inaddition,wereportedexperimentswithin-citydriving,theBlocksWorld,andFreeCellthatshowedsuchlearningenablesmoreeffectivebehavioronunfamiliarproblemsthansolvingthemwithonlybasicknowledgeaboutthedomain.
Despitetheseadvances,ourworkonskilllearninginIcarusisstillinitsearlystages.Forinstance,weshoulddemonstrateitsabilitytolearnhierarchicalstructuresbothontraditionalcognitivetaskslikemulti-columnsubtractionandonotherdynamicdomainsthat,likein-citydriving,requiretheintegrationofproblemsolvingwithreactivecontrol.Futureworkondrivingshouldshowthatourmethodsaresufficienttoacquiremorecomplicatedskillsthatinvolveextendedtaskslikepackagedeliveryandcomplexsettingsthatincludeothervehicles.Anotherpromisingclassofdomainsforstudyingskilllearninginvolvestwo-persongameslikechess,whichseemcertaintointroducenewchallengesbecauseoftheirextendedduration.
Inaddition,Icarus’methodsforproblemsolvingandhierarchicallearningwouldbenefitfromnewcapabilities.Wenotedearlierthatthecurrentsystemdoesnotchainbackwardfromthestartconditionsoflearnedskillclauses.Extendingtheproblemsolvertosupportthisabilitywouldmeandefiningnewconceptsthatcharacterizethesituationsinwhichlearnedskillsareapplicable.Thisadditionwouldalsoremedyanotherlimitationofthecurrentsystem,namelyitsinabilitytoaccountfortheoriginofconcepthierarchies,whichitassumesaregiven.Suchanextensionwouldbestraightforwardforsometasks,butotherswillrequiretheabilitytoacquirerecursiveconcepts.Augmentingthesysteminthismannermayalsoleadtoautilityproblem,notduringexecutionoflearnedskillsbutduringtheproblemsolvingusedfortheiracquisition,whichwewouldthenneedtoovercome.
LearningHierarchicalSkillsPage21
Anotherdrawbackisthearchitecture’srelianceonpurelydeductiveinference,whichdiffersmarkedlyfromtheprobabilisticapproachtakenbyitsearliestancestor(Langleyetal.,1991).Futureversionsoftheframeworkshouldextendtherepresentationofconceptsandskillstoin-corporateprobabilities,replacedeductiveprocesseswithabductivemethodsthatmakeplausibledefaultinferences,andaugmentproblemsolvingtooperateoverskillswithuncertainoutcomes.Wehypothesizethatthecurrentmechanismsforlearningthestructureofskillscanbeadaptedeasilytothissetting,butweshouldalsointroducemethodsforestimatingtheprobabilitiesthatannotatethesymbolicstructures.
Weshouldalsonotethat,althoughourapproachlearnsskillsthatgeneralizetosituationswithdifferentnumbersofobjects,itstreatmentofgoalsislessflexible.Forexample,Icaruscanacquireageneralprocedureforclearingablockthatdoesnotdependonthenumberofblocksaboveit,butitcannotlearnaprocedureforconstructingatowerwitharbitrarilyspecifiedcomponents.Extendingthemethod’sabilitytolearnaboutsuchrecursivegoalstructuresisanotherimportantdirectionforfutureresearchthatwillbringthearchitectureintocloseralignmentwiththeabilitiesobservedincomplexhumanlearning.
Acknowledgements
ThisresearchwasfundedinpartbyGrantHR0011-04-1-0008fromDARPAIPTOandbyGrantIIS-0335353fromtheNationalScienceFoundation.DiscussionswithGlennIba,DavidNicholas,StephanieSage,DanShapiro,andJudeShavlikcontributedtomanyideaspresentedhere.
References
Anderson,J.R.(1993).Rulesofthemind.Hillsdale,NJ:LawrenceErlbaum.
Asgharbeygi,N.,Nejati,N.,Langley,P.,&Arai,S.(2005).Guidinginferencethroughrelationalreinforcementlearning.ProceedingsoftheFifteenthInternationalConferenceonInductiveLogicProgramming.Bonn,Germany:Springer.
Benson,S.(1995).Inductionlearningofreactiveactionmodels.ProceedingsoftheTwelfthInter-nationalConferenceonMachineLearning(pp.47–54).SanFrancisco:MorganKaufmann.
Choi,D.,Kaufman,M.,Langley,P.,Nejati,N.,&Shapiro,D.(2004).Anarchitectureforpersis-tentreactivebehavior.ProceedingsoftheThirdInternationalJointConferenceonAutonomousAgentsandMultiAgentSystems(pp.988–995).NewYork:ACMPress.
Choi,D.,&Langley,P.(2005).Learningteleoreactivelogicprogramsfromproblemsolving.Pro-ceedingsoftheFifteenthInternationalConferenceonInductiveLogicProgramming.Bonn,Ger-many:Springer.
Doyle,J.(1979).Atruthmaintenancesystem.ArtificialIntelligence,12,231–272.
Fern,A.,Yoon,S.W.,&Givan,R.(2004).Learningdomain-specificcontrolknowledgefromrandomwalks.ProceedingsoftheFourteenthInternationalConferenceonAutomatedPlanningandScheduling(pp.191–199).Whistler,BC:AAAIPress.
Page22LearningHierarchicalSkills
Forgy,C.L.(1982).Rete:Afastalgorithmforthemanypattern/manyobjectpatternmatchproblem.ArtificialIntelligence,19,17–37.
Jones,R.M.,&Langley,P.(inpress).Aconstrainedarchitectureforlearningandproblemsolving.ComputationalIntelligence.
Iba,G.A.(1989).Aheuristicapproachtothediscoveryofmacro-operators.MachineLearning,3,285–317.
Ilghami,O.,Nau,D.S.,Mu˜noz-Avila,H.,&Aha,D.W.(2002).CaMeL:Learningmethodpre-conditionsforHTNplanning.ProceedingsoftheSixthInternationalConferenceonAIPlanningandScheduling(pp.131–14).Toulouse,France.
Khardon,R.(1999).Learningtotakeactions.MachineLearning,35,57–90.
Laird,J.E.,Rosenbloom,P.S.,&Newell,A.(1986).ChunkinginSoar:Theanatomyofagenerallearningmechanism.MachineLearning,1,11–46.
Langley,P.,Cummings,K.,&Shapiro,D.(2004).Hierarchicalskillsandcognitivearchitectures.ProceedingsoftheTwenty-SixthAnnualConferenceoftheCognitiveScienceSociety(pp.779–784).Chicago,IL.
Langley,P.,McKusick,K.B.,Allen,J.A.,Iba,W.F.,&Thompson,K.(1991).AdesignfortheIcarusarchitecture.SIGARTBulletin,2,104–109.
Langley,P.,&Stromsten,S.(2000).Learningcontext-freegrammarswithasimplicitybias.Pro-ceedingsoftheEleventhEuropeanConferenceonMachineLearning(pp.220–228).Barcelona:Springer-Verlag.
Marsella,S.,&Schmidt,C.F.(1993).Amethodforbiasingthelearningofnonterminalreductionrules.InS.Minton(Ed.),Machinelearningmethodsforplanning.SanMateo,CA:MorganKaufmann.
Minton,S.N.(1990).Quantitativeresultsconcerningtheutilityofexplanation-basedlearning.ArtificialIntelligence,42,363–391.
Mooney,R.J.(1989).Theeffectofruleuseontheutilityofexplanation-basedlearning.ProceedingsoftheEleventhInternationalJointConferenceonArtificialIntelligence(pp.725–730).Detroit:MorganKaufmann.
Nardi,D.,&Brachman,R.J.(2002).Anintroductiontodescriptionlogics.InF.Baaderetal.(Eds.),Descriptionlogichandbook.Cambridge:CambridgeUniversityPress.
Newell,A.(1990).Unifiedtheoriesofcognition.Cambridge,MA:HarvardUniversityPress.
Newell,A.,Shaw,J.C.,&Simon,H.A.(1960).Reportonageneralproblem-solvingprogramforacomputer.InformationProcessing:ProceedingsoftheInternationalConferenceonInformationProcessing(pp.256–264).UNESCOHouse,Paris.
Nilsson,N.(1994).Teleoreactiveprogramsforagentcontrol.JournalofArtificialIntelligenceResearch,1,139–158.
Pfleger,K.(2004).On-linecumulativelearningofhierarchicalsparsen-grams.ProceedingsoftheThirdInternationalConferenceonDevelopmentandLearning.SanDiego,CA:IEEEPress.Reddy,C.,&Tadepalli,P.(1997).Learninggoal-decompositionrulesusingexercises.ProceedingsoftheFourteenthInternationalConferenceonMachineLearning(pp.278–286).SanFrancisco:MorganKaufmann.
LearningHierarchicalSkillsPage23
Richman,H.B.,Staszewski,J.J.,&Simon,H.A.(1995).SimulationofexpertmemoryusingEPAMIV.PsychologicalReview,102,305–330.
Ruby,D.,&Kibler,D.(1991).SteppingStone:Anempiricalandanalyticalevaluation.ProceedingsoftheTenthNationalConferenceonArtificialIntelligence(pp.527–532).MenloPark,CA:AAAIPress.
Sammut,C.,&Banerji,R.B.(1986).Learningconceptsbyaskingquestions.InR.S.Michalski,J.G.Carbonell,&T.M.Mitchell(Eds.),Machinelearning:Anartificialintelligenceapproach(Vol.2).LosAltos,CA:MorganKaufmann.
Shapiro,D.,Langley,P.,&Shachter,R.(2001).Usingbackgroundknowledgetospeedrein-forcementlearninginphysicalagents.ProceedingsoftheFifthInternationalConferenceonAu-tonomousAgents(pp.254–261).Montreal:ACMPress.
Shavlik,J.W.(1989).Acquiringrecursiveconceptswithexplanation-basedlearning.ProceedingsoftheEleventhInternationalJointConferenceonArtificialIntelligence(pp.688–693).Detroit,MI:MorganKaufmann.
Stone,P.,&Veloso,M.M.(2000).Layeredlearning.ProceedingsoftheEleventhEuropeanCon-ferenceonMachineLearning(pp.369–381).Barcelona:Springer-Verlag.
Utgoff,P.,&Stracuzzi,D.(2002).Many-layeredlearning.ProceedingsoftheSecondInternationalConferenceonDevelopmentandLearning(pp.141–146).
Veloso,M.M.,&Carbonell,J.G.(1993).DerivationalanalogyinProdigy:Automatingcaseacquisition,storage,andutilization.MachineLearning,10,249–278.
Page24LearningHierarchicalSkills
Appendix:ConceptsandSkillsProvidedinExperiments
Table5.ConceptsandskillsprovidedtoIcarusforthein-citydrivingdomain,withitalicsdenotingthegoalconcept
andparenthesesindicatingthenumberofclausesfordisjunctiveskills.
primitiveconcepts(15)
parked
aligned-with-lane-in-segmentcentered-in-lane
steering-wheel-not-straightdriving-in-segmentat-speed-for-right-turnready-for-right-turnin-leftmost-lanelane-to-rightlane-to-left
in-rightmost-lanein-right-turn-lane
off-centered-to-right-in-segmentoff-centered-to-left-in-segmentbuilding-on-rightbuilding-on-leftcurrent-building
start-aligned-with-lane-in-segmentstart-centered-in-lane-1start-centered-in-lane-2
start-adjust-speed-for-cruisestart-cruise-within-segmentstart-change-lane-to-rightstart-change-lane-to-leftstart-in-lane-1start-in-lane-2
primitiveskills(19)
LearningHierarchicalSkillsPage25
Table6.ConceptsandskillsprovidedtoIcarusfortheBlocksWorld,withgoalconceptsinitalics.
primitiveconcepts(4)
clear
three-tower
two-tower-one-on-tableunstackablepickupablestackableputdownable
primitiveskills(4)
nonprimitiveconcepts(14)
starthomesuccessorcolcolpairavailable-cellavailable-columnclearon
bottomincellhome
column-to-homecolumn-to-newhomecolumn-to-freecelllastcolumn-to-homelastcolumn-to-newhomelastcolumn-to-freecellfreecell-to-homefreecell-to-newhomefreecell-to-columncolumn-to-columnfreecell-to-new-columncolumn-to-new-column
因篇幅问题不能全部显示,请点此查看更多更全内容