您的当前位置:首页正文

Interleaving Learning, Problem Solving, and Execution in the Icarus Architecture

来源:华佗健康网
InterleavingLearning,ProblemSolving,and

ExecutionintheIcarusArchitecture

PatLangley(langley@csli.stanford.edu)DongkyuChoi(dongkyuc@stanford.edu)SethRogers(srogers@csli.stanford.edu)ComputationalLearningLaboratory

CenterfortheStudyofLanguageandInformationStanfordUniversity,Stanford,CA94305USA

Abstract

Inthispaper,wereviewIcarus,acognitivearchitecturethatutilizeshierarchicalskillsandcon-ceptsforreactiveexecutioninphysicalenvironments.Inaddition,wepresenttwoextensionstotheframework.Thefirstinvolvestheincorporationofmeans-endsanalysis,whichletsthesystemcomposeknownskillstosolvenovelproblems.Thesecondinvolvesthestorageofnewskillsthatarebasedonsuccessfulmeans-endstraces.Wereportexperimentalstudiesofthesemechanismsonthreedistinctdomains.Ourresultssuggestthatthetwomethodsinteracttoacquireusefulskillhierarchiesthatgeneralizewellandthatreducetheeffortrequiredtohandlenewtasks.Weconcludewithadiscussionofrelatedworkonlearningandprospectsforadditionalresearch.

Keywords:incrementallearning,cognitivearchitecture,reactivecontrol,problemsolving,

hierarchicalskills

1.IntroductionandMotivation

Researchoncognitivearchitectures(Newell,1990)attemptstounderstandthecomputationalin-frastructuresthatsupportintelligentbehavior.Aspecificarchitecturecharacterizestheaspectsofacognitiveagentthatremainthesameacrosstimeandoverdifferentdomains,andtypicallymakesstrongcommitmentsabouttherepresentationofknowledgestructuresandtheprocessesthatop-erateonthem.Learninghasbeenacentralconcerninmostarchitecturalresearch,withavarietyofmechanismshavingbeenproposedtomodeltheacquisitionofknowledgefromexperience.Thelearningmethodsembeddedinmostcognitivearchitecturesareincremental,reflectingevidencethathumansacquireknowledgeinthismanner,buttherehavebeenfewaccountsoftheoriginofhierarchicalstructuresthatappearcrucialtocomplexcognition.

InthispaperwereviewIcarus,acandidatearchitecturethatdivergesfromitspredecessorsonanumberofdimensions.Oneimportantdifferenceisthattypicalarchitectureshandleconcep-tualknowledgeinaproceduralmanner,typicallyusingproductionrules,whereasourframeworkcontainsseparatememoriesforconceptsandskills.Anotherdistinctivefeatureisthatmostar-chitecturesarebasedonproductionsystems,whichencodeknowledgeasa‘flat’setofcondition-actionrules,whereasIcarusmakesanarchitecturalcommitmenttothehierarchicalorganizationofknowledge.OnecancertainlyencodehierarchicalstructuresinframeworkslikeACT-R(Anderson,1993)andSoar(Laird,Rosenbloom,&Newell,1986),butthisremainsthemodeler’schoiceratherthanastrongtheoreticalclaim.Inaddition,mostcognitivearchitecturesevolvedfromtheoriesofhumanproblemsolving,whichhasledtosubordinaterolesforperceptionandactioneveninthoseframeworksthatsupportthem.1Incontrast,Icarusisprimarilyanexecutionarchitecturethatperceivesandreactstoexternalenvironments,whichweviewasmorebasicthanproblemsolving.However,Icarus’relianceonhierarchicalstructuresraiseskeyquestionsabouttheirorigin.Moreover,thearchitecture’semphasisonexecutiondoesnotmeanthatmentalactivitieslikeprob-lemsolvingareunimportant,sincetheycanletanagenthandlenoveltasksforwhichstoredknowledgeisunavailable.Thecentralhypothesisofthispaperisthathierarchicalskillsarise,atleastinmanycases,fromproblem-solvingbehavior,andthat,oncelearned,theagentcanusethesestructurestosupportreactiveexecutionintheenvironment.Moreover,thisacquisitionoccursinanincrementalmanner,withnewskillsbeinglearnedgraduallyastheagentencountersnewproblemsitcannothandlewithoutresortingtoproblemsolving.

WerefertoIcarusasa‘cognitivearchitecture’inthesamesensethattheSoarcommunityusesthatexpression.Bothframeworksaimforconsistencywithgeneralknowledgeabouthumancognitionandhopetosupportthesamebroadrangeofabilitiesthatpeopledemonstrate.However,ourcurrentresearchdoesnotattempttomatchatafine-grainedleveltheresultsofpsychologicalexperiments,asdonewitharchitectureslikeACT-R.Wemayaddresssuchissuesinfutureresearch,butfornowweareconcernedwithcoarseregularitiesthatdemandexplanation,suchastheapparenthierarchicalnatureofhumanskillsandtheirincrementalacquisitionfromexperience.

Inthesectionsthatfollow,wereviewIcarus’representationandorganizationofconceptsandskills,alongwiththeinferenceandexecutionprocessesthatutilizethem.Afterthis,wepresent

Page2LearningHierarchicalSkills

anewmodulethatinterleavesmeans-endsproblemsolvingwithexecutionwhenknownskillsareinsufficienttosolveatask.Nextwedescribeamechanismforcreatinggeneralizedskillsfromtracesofsuccessfulproblemsolvingthatsupportsincremental,hierarchicallearning.Wereportexperi-mentswiththislearningmechanismthatdemonstrateitsabilitytogeneralizetonovelsituationsandreduceeffortonnewproblems.Inclosing,wediscussearlierresearchonlearningforproblemsolvingandexecution,alongwithsomedirectionsforfuturework.

2.RepresentationandOrganization

Likeothercognitivearchitectures,Icarusmakescommitmentstoitsrepresentationofknowledge,themannerinwhichthatknowledgeisorganized,andthememoriesinwhichitresides.Followingmosttheoriesofhumancognition,theframeworkdistinguishesbetweenlong-termmemories,whichchangeonlygraduallyduetolearning,andshort-termmemories,whichchangerapidlyastheagentrevisesitsbeliefsandgoals.Inthissection,wediscussIcarus’memoriesandtheformalismsusedtoencodetheircontents.2WewilltakeourexamplesfromtheBlocksWorld,sincemanyreadersshouldfindthisdomainfamiliar.Wehavedescribedtheseaspectsoftheframeworkinmoredetailelsewhere,includingtheiruseinotherdomainslikein-citydriving(Choietal.,2004)andmulti-columnsubtraction(Langley,Cummings,&Shapiro,2004).2.1Long-TermConceptualMemory

OneofIcarus’long-termmemoriesstoresconceptsthatdescribegeneralizedsituationsintheenvironment.Thesemayinvolveisolatedobjects,suchasindividualblocks,buttheycanalsocharacterizephysicalrelationsamongobjects,suchastherelativepositionsofblocks.Long-termconceptualmemorycontainsthedefinitionsoftheselogicalcategories.Eachelementspecifiestheconcept’snameandarguments,alongwithfieldswhichdescribeperceptualentitiesthatmustbepresent,lower-levelconceptsthatmustmatch,lower-levelconceptsthatmustnotmatch,andnumericrelationsthatmustbesatisfied.Table1presentssomeconceptsfromtheBlocksWorld.Forexample,therelationondescribesaperceivedsituationinwhichtwoblockshavethesamexpositionandthebottomofonehasthesameypositionasthetopoftheother.Theconceptclearinsteadreferstoasingleblock,butonethatcannotholdtherelationontoanyother.

DefinitionsofthissortorganizeIcaruscategoriesintoaconceptualhierarchy.Primitivecon-ceptsaredefinedentirelyintermsofperceptualconditionsandnumerictests,butmanyincorporateotherconceptsintheirdefinitions.Thisimposesalatticestructureonthememory,withmorebasicconceptsatthebottomandmorecomplexconceptsathigherlevels.Theresultinghierarchyissim-ilarinspirittodiscriminationnetworkmodelsofhumanmemorylikeEpam(Richman,Staszewski,&Simon,1995),aswellastoframeworkslikedescriptionlogics(Nardi&Brachman,2002).Struc-turally,thislatticebearsacloseresemblancetotheRetenetworks(Forgy,1982)usedformatchinginproduction-systemarchitectures.

LearningHierarchicalSkillsPage3

Table1.SomeIcarusconceptsfortheBlocksWorld,withvariablesindicatedbyquestionmarks.Perceptsrefer

onlytoattributevaluesusedelsewhereintheconceptdefinition.

2.2Long-TermSkillMemory

Icarusalsoincorporatesasecondlong-termmemorythatstoresknowledgeaboutskillsitcanexecuteintheenvironment,includingtheirconditionsforapplicationandtheirexpectedeffects.Eachskillclauseincludesahead(anameandzeroormorearguments)andabodythatspecifiestheconceptsthatmustholdtoinitiatetheskillandoneormorecomponents.Aprimitiveskillclauseindicatesoneormoreordered,executableactions,alongwiththoseconceptsthat,takentogether,describethesituationtheskillproduceswhendone.Aprimitiveskillmayalsostateconditionsthatmustholdthroughoutitsexecution,whichmayrequiremultiplecyclestocomplete.Forexample,Table2showstheskillpickup,whichmustsatisfythestartcondition,(pickupable?block?from),definedinTable1,andinvokes*grasp,whichgraspsablock,and*vertical-move,whichmovesthehandintheverticaldirection.Theskill’sonlystatedeffectistomake(holding?block)true.Incontrast,anonprimitiveskillclausespecifieshowtodecomposethatactivityfurther.Forinstance,Table3includestwoclausesforthenonprimitiveskillclear.Eachindicatesthatexecut-ingtheclausewillachievethatgoal,buttheydifferintheirstartconditionsandintheirsubskills.Nonprimitiveskillclausesdonotspecifyeitherrequiredconditionsoreffects,buttheirheadsalwayscorrespondstoaconceptthattheskillwillachieveuponsuccessfulcompletion.Thisrepresenta-tionalassumptionfigurescentrallyinthelearningmechanismwedescribelater.BecauseIcarusconceptsandskillsutilizeasyntaxsimilartothatfoundintheprogramminglanguageProlog,wehavereferredelsewheretosetsoftheselong-termmemorystructuresasteleoreactivelogicprograms(Choi&Langley,2005).Thisphraseconveysboththeirstructuralsimilaritytotraditionallogicprogramsandtheirabilitytobehavereactivelyinagoal-drivenmanner,followingNilsson’s(1994)notionofateleoreactivesystem.

Page4LearningHierarchicalSkills

Table2.PrimitiveskillsfortheBlocksWorld.Eachclausehasaheadthatspecifiestheskill’snameandarguments,

asetoftypedpercepts,asinglestartcondition,asetofeffects,andasetofexecutableactions(markedbyasterisks).

2.3Short-TermMemories

Inadditiontolong-termmemories,whichencoderelativelystableknowledgeaboutadomain,Icarusfollowsstandardpsychologicaltheorybyincorporatingshort-termstoresthatchangemorerapidly.Thesecontaintheagent’stemporaryperceptionsandbeliefsabouttheenvironment,aswellasitsgoalsandintendedactivities.Theyinclude:

•aperceptualbufferthatholdsdescriptionsofphysicalentitieswhichcorrespondtotheoutputofsensors;fortheblocksworld,thisincludesliteralslike(blockBxpos10ypos2width2height2),whichspecifythepositionandsizeofindividualblocks.•ashort-termconceptualmemorythatcontainsbeliefsabouttheenvironmentwhichtheagentinfersfromitemspresentinitsperceptualbufferandlong-termconceptmemory;forinstance,thismightcontaintheinstance(onBC),whichisaninstanceoftheonconceptinTable1.•ashort-termskillmemorythatcontainstheagent’sgoalsandassociatedskillinstancesitintendstoexecute;eachgoalliteralspecifiesaconcept’snameandargumets,asin(clearA),whereaseachassociatedintentiongivesaskill’snameanditsarguments,asin(stackBC),whichisaninstanceoftheskillstackinTable2.

LearningHierarchicalSkillsPage5

Table3.SomenonprimitiveskillsfortheBlocksWorldthatinvolverecursion.Eachskillclausehasaheadthat

specifiesthegoalitachieves,asetoftypedpercepts,oneormorestartconditions,andasetoforderedsubskills.Numbersaftertheheaddistinguishdifferentclausesthatachievethesamegoal.

Unlikemostcognitivearchitectures,everyelementintheshort-termconceptualandskillmemoriesmustbeaninstanceofsomegeneralizedstructureinthelong-termconceptualandskillmem-ory,respectively;theycannotbearbitrarysymbolicstructures.Wehavediscussedthisstrongcorrespondenceassumptionatmorelengthelsewhere(Langley&Rogers,2005).

3.ConceptualInferenceandSkillExecution

Likemostcognitivearchitectures,Icarusoperatesindistinctcycles.Oneachsuchiteration,thesystemupdatesitsperceptualbufferbysensingobjectsinitsfieldofview,withthespecificsensorsdependingontheparticularenvironmentinwhichtheagentisoperating.Thisprocessproducesperceptualelements,whicharearedepositedintheperceptualbufferandwhichinitiatematchingagainstlong-termconcepts.Thematchercheckstoseewhichprimitiveconcepts(i.e.,thosedefinedentirelyintermsofpercepts)aresatisfied,addseachmatchedinstancetoconceptualshort-termmemory,andrepeatstheprocessonnonprimitiveconceptstoinferhigher-levelbeliefs.

Inthisway,Icarusinfersallinstancesofconceptsthatareimpliedbyitsconceptualdefinitionsandthecontentsoftheperceptualbuffer.Forexample,aBlocksWorldagentwouldfirstupdateitsdescriptionsoftheblocksandthetable,theninferprimitiveconceptslikeon,andfinallyinfercomplexconceptslikeunstackable.Thisbottom-upprocedureoperatesinmuchthesamewayastheRetenetworks(Forgy,1982)usedinmanyproduction-systemarchitecturesandthelogicalinferencemethodsusedinmanytruth-maintenancesystems(e.g.,Doyle,1979).Thedefaultprocessisexhaustive,butelsewherewehavereportedanalternativemechanismthatmakesinferencesmoreselectively(Asgharbeygietal.,2005).

Oneachcycle,thearchitecturealsoexaminestheagent’sgoalsandtheirassociatedintentionsinshort-termskillmemorytodeterminewhich,ifany,applytothecurrentsituation.3Foreachintendedskillinstance,Icarusaccessesallclausesofthegeneralskilltoseeiftheyareapplicable.

Page6LearningHierarchicalSkills

Sincevariablescanbeboundwithinaskill’sbody,thissetmayincludemultiplevariantsofeachskillclausestoredinlong-termmemory.Aprimitiveskillclauseisapplicableif,foritscurrentvariablebindings,itseffectsdonotyethold,itsrequirementsaresatisfied,and,ifthesystemhasnotyetstartedexecutingit,thestartconditionsmatchthecurrentsituation.Ahigher-levelskillclauseisapplicableifitsheadisnotsatisfied,thestartconditionsaresatisfiedifithasnotbeeninitiated,andatleastonesubskillisapplicable.Becausethislattertestisrecursive,askillisapplicableonlywhenIcaruscanfindatleastoneacceptablepathdownwardtoexecutableactions,whichthearchitecturereturnsforinvocation.

Forexample,supposeanIcarusagenthasthegoal(clearA)inasituationwhereblockAisonthetable,blockBisonA,blockCisonB,andthehandisempty.SupposefurtherthattheagenthasaccesstotheprimitiveskillsinTable2andthenonprimitiveonesinTable3.Inthiscase,thesystemwouldfindanapplicablepaththroughtheskillhierarchythatisrelevanttoitsgoal:[(clearA),(unstackableBA),(clearB),(unstackableCB),(clearC),(unstackCB)].Thisholdsbecausetheinstantiatedstartconditionsofeachskillalongthepath(e.g.,(onBA)and(hand-empty)forthetopmostskill)arepresentinconceptualshort-termmemory.Ifselected,(unstackCB)wouldaltertheenvironment,makingthepath[(clearA),(unstackableBA),(clearB),(unstackableCB),(hand-empty),(putdownCT)]acceptableonthenextcycle.Thiswouldproduceabeliefstatethatenablesthenextstepintheprocedure,whichwouldcontinueuntiltheagenthadsatisfieditstop-levelgoal,(clearA).

Duringskillselection,Icarusincorporatestwopreferencesthatprovideabalancebetweenreac-tivityandpersistence.Whenconfrontedwithachoicebetweentwoormoresubskills,itselectsthefirstalternativeforwhichtheheadisnotsatisfied.Thissupportsreactivecontrol,sincethesystemreconsiderspreviouslycompletedsubskillsand,iftheireffectsnolongerholdforsomereason,reex-ecutesthemtoremedytheproblem.Ontheotherhand,whenencounteringtwoormoreapplicableskillpaths,Icarusselectstheonethatsharesthemostelementsfromthestartofthepathexecutedonthepreviouscycle.Thisencouragesthesystemtocontinuingexecutingahigh-levelskillithasalreadystarteduntilthatskillachievesitsassociatedgoaloruntilitbecomesinapplicable.

4.Means-EndsProblemSolving

Asjustexplained,Icaruscanexecutecomplexhierarchicalskillsinareactivemanner,butourinitialstudies(e.g.,Choietal.,2004;Langleyetal.,2004)assumedthattheseskillsarealreadypresentinlong-termmemory.Althoughmuchhumanbehaviorappearstoinvolvetheapplicationofsuchroutineskills,peoplecanalsosolvenoveltasksthatrequirethedynamiccombinationofexistingknowledgeelementsthroughsomeformofheuristicproblemsolving.

TomodelthiscapabilityinIcarus,wehaveintroducedavariantofmeans-endsanalysis(Newell,Shaw,&Simon,1960)thatoperatesoverthearchitecture’sknowledgestructures,includingbothlong-termconceptsandskillsprovidedbytheprogrammerandshort-termbeliefsandgoalspro-ducedbythearchitecture.Traditionalmeans-endsproblemsolvingselectssomeunsatisfiedaspectofthegoalstatetoachieve,thenselectsanoperatorthatwouldachieveit.Ifthatoperator’spreconditionsmatchthecurrentstate,itisapplied;otherwise,themethodselectsanunsatisfied

LearningHierarchicalSkillsPage7

preconditiontoachieve,selectsanoperatorthatwouldachieveit,andsoon.Onceaconditionismet,theprocessisrepeateduntiltheoriginalgoaldescriptionissatisfied.Thismayrequiresearch,whichisoftenpursuedinadepth-firstmanner.Means-endsanalysishasbeenimplicatedrepeatedlyinhumanproblemsolvingonnoveltasks.

Tosupportthismechanism,ourextendedversionofIcarusaugmentstheshort-termskillmem-orywithagoalstack.Eachelementinthisstackspecifiesagoal(adesiredconceptinstance),whethertheagentintendstoachieveitbybackwardchainingoffaconceptdefinitionoraskillclause,and,inthelattercase,theskillinstancethat,ifexecuted,shouldachieveit.Eachgoalelementalsospecifiessubgoalsthathavealreadybeenachieved,alongwithskilland/orconceptinstancesthatithastriedinreachingthisgoalbutthathavefailed.Thefirstareneededtokeepthesystemfromconsideringskillsthatwouldundoitspreviousaccomplishments,whereasthesec-ondensuresitdoesnotrepeatearliermistakes.Wealsoassumethatboththestartconditionsofprimitiveskillsandtop-levelgoalsmustbecastassinglerelationalliterals,whichcausesnolossingenerality,sinceeithermaybedefinedconcepts.

WehavealsoextendedtheIcarusinterpretertotakeadvantageofthesenewmemorystructures.Oneachcycle,thesystemtakesoneoffivedistinct,orderedsteps:

1.IfthecurrentgoalGofthegoalstackGSissatisfied,thenpopGfromGSandstoreinformation

aboutthesuccesswithG’sparent.2.IfthegoalstackGSdoesnotexceedthedepthlimitandthereareapplicableskillpathsthat

startfromaskillinstancewiththecurrentgoalGasitshead,thenselectonesuchpathandexecuteit.3.IfthereisanonemptysetofprimitiveskillinstancesinwhichthecurrentgoalGisaneffectthat

havenotalreadyfailed,thenselectaskillinstancefromthissetandpushitsstartcondition(whichweassumesubsumesanyrequiredconditions)ontothegoalstackGS.4.IfthecurrentgoalGisaninstanceofacomplexconceptwithunsatisfiedsubconceptsHand

withsatisfiedsubconceptsF,thenifthereisasubconceptIinHthathasnotyetfailed,pushIontothegoalstackGS.5.OtherwisepopthecurrentgoalGfromthegoalstackGSandstoreinformationaboutthe

failurewithG’sparent.Weassumethateachoftheseactivitiestakesasinglecycleofthearchitecture,withtheinitialsituationbeingaspecialcaseofthethirditemthattriggerstheprocess.Becausereasoningabouthowtoachieveanobjectivecanrequiremanymanipulationsofthegoalstack,ittakesmorecyclesthanexecutingastoredhierarchicalskillforthatgoal,evenwhentheagentfindsasolutiononitsfirstattemptanddoesnothavetobacktrack.

Figure1showsasuccessfultraceoftheproblemsolver’sbehavioronasimpleBlocksWorldtaskinwhichwhenthegoalis(clearA)andwhenblockAisonthetable,blockBisonA,blockCisonB,andthehandisempty.Inthissituation,thesystemlooksforexecutableskillswiththisgoalasitsheadbut,whenthisfails,itconsidersskillsthathavethegoalasoneofitseffects.Inthiscase,invokingtheprimitiveskillinstance(unstackBA)wouldproducetheintendedresult,butit

Page8LearningHierarchicalSkills

Figure1.AtraceofsuccessfulproblemsolvingintheBlocksWorld,whichellipsesindicatinggoalsandrectangles

denotingprimitiveskills.

cannotbeappliedbecauseitsinstantiatedstartcondition,(unstackableBA),doesnothold.Inresponse,theproblemsolverstorestheskillinstancewiththeinitialgoalandpushesthesubgoalontothegoalstack.

Next,theproblemsolverattemptstoretrieveskillsthatwouldachieve(unstackableBA).How-ever,becauseithasnosuchskillsinmemory,itresortstochainingoffthedefinitionofunstackable.Thisinvolvesthreeinstantiatedsubconcepts–(clear),(onBA),and(hand-empty)–butonlythefirstoftheseisunsatisfied,sothesystempushesitontothegoalstack.Thisinturnleadsittoconsiderskillsthatwouldproducethisliteralasaneffectandretrievestheskillinstance(unstackCB),whichitstoreswiththecurrentgoal.Inthiscase,thestartconditionoftheselectedskill,(unstackableCB)alreadyholds,sothesystemexecutes(unstackCB).Theassociatedactionsaltertheenvironmentandcausetheagenttoinfer(clearB)fromitspercepts.Inresponse,itpopsthisgoalfromthestackandreconsidersitsparent,(unstackableBA).However,thishasnotyetbeenachievedbecauseexecutingtheskillhascausedthethirdofitscomponentconceptinstances,(hand-empty),tobecomefalse.Thus,thesystempushesthisontothestackand,uponinspectingmemory,retrievestheskillinstance(putdownCT),whichitcananddoesexecute.

Thissecondstepachievesthesubgoal(hand-empty),whichinturnletstheagentinfer(unstack-ableBA).Thus,theproblemsolverpopsthiselementfromthegoalstackandexecutestheskillinstanceithadoriginallyselected,(unstackBA),inthenewsituation.Uponcompletion,thesystemperceivesthatthealteredenvironmentsatisfiesthetop-levelgoal,(clearA),whichleadsittohalt,sinceithassolvedtheproblem.

LearningHierarchicalSkillsPage9

Forthesakeofclarity,bothourdescriptionandFigure1presentthetraceofsuccessfulproblemsolving,butfindingsuchasolutionmayinvolvesearch.Whenbackwardchainingoffskillsthatwouldachievetheobjectiveofthecurrentstackentry,Icarusconsidersonlyskillinstancesthathavenotyetfailed.Thesystemalsopreferscandidatesthathavethefewestexpandedstartcondi-tionsthatareunmetbythecurrentenvironmentalstate,withfullymatchedconditionsbeingmostdesirable.Ifcandidatestieonthiscriterion,itselectsanalternativeatrandom.Whenbackwardchainingofftheunmatchedelementsofaconceptdefinition,thesystemselectssubgoalsatrandomaftereliminatingthosewhichhavefailedinthepast.

Takentogether,thesebiasesproduceaheuristicversionofmeans-endsanalysis.However,thisproblem-solvingmethodistightlyintegratedwiththeexecutionprocess.Icarusbackwardchainsoffconceptorskilldefinitionswhennecessary,butitexecutestheskillassociatedwiththetopstackentryassoonasitbecomesapplicable.Moreover,becausethearchitecturecanchainoverhierarchicalreactiveskills,theirexecutionmaycontinueformanycyclesbeforeproblemsolvingisresumed.Incontrast,mostmodelsofhumanproblemsolvingandmostAIplanningsystemsfocusonthegenerationortheexecutionofplans,ratherthaninterleavingthetwoprocesses.

Ofcourse,executingacomponentskillbeforeconstructingacompleteplancanleadanagentintodifficulties,sinceonecannotalwaysbacktrackinthephysicalworld.Thisstrategymaywellleadtosuboptimalbehaviors,buthumanintelligenceismoreaboutsatisficingthanoptimizing,andinterleavingproblemsolvingwithexecutionrequiresfarlessmemorythanconstructingafullplanbeforeexecutingit.However,itcanproducesituationsfromwhichtheagentcannotrecoverwithoutstartingtheproblemover.

Insuchcases,Icarusstoresthegoalelementforwhichitsexecutedskillcausedaproblem,alongwitheverythingbelowitinthestack.Thesystembeginstheproblemagain,thistimeavoidingtheskillandselectinganotheroption.Ifitmakesadifferentexecutionerrorthistime,itagainstorestheproblematicskillanditscontext,thenstartsoveroncemore.Icarusalsostartsoverifithasnotachievedthetop-levelobjectivewithinaspecifiednumbercycles.Suchrepeatedattemptsatsolvingatask,withselectedmemoryaboutpreviouspasses,seemsabettermodelofhumanproblemsolvingthansystemsthatconstructacompleteplanbeforeexecution.JonesandLangley’s(inpress)modelofmeans-endsproblemsolving,Eureka,usedasimilarrestartstrategy,butitkeptnoexplicitrecordofpreviousfailedpaths.

5.LearningHierarchicalSkillsfromProblemSolving

Inthepreviouspages,wedescribedtwofacetsofIcarus:itsexecutionofhierarchicalskillsonfamiliartasksanditsuseofproblemsolvingtohandlenovelones.Thefirstletsthesystemoperateefficiently,butskillsaretedioustoconstructmanually,whereasthesecondgivesthesystemflexibilitybutrequiresreasoningandmeans-endssearch.Webelievethathumansalsohavebothcapabilities,butthattheyuselearningtotransformtheresultsofsuccessfulproblemsolvingintohierarchicalskills.WewouldliketoincorporateasimilarcapabilityintoIcarus.

However,wewantourlearningmechanismstoreflectcertainpropertiesthatappeartoholdforhumanskillacquisition.Oneisthatlearningshouldtakeadvantageofexistingknowledge,suchas

Page10LearningHierarchicalSkills

thedefinitionsofcurrentskillsandconcepts.Inaddition,acquisitionshouldbeincremental,inthatitlearnsfromeachnewexperience,andinterleavedwiththeproblem-solvingprocess.Therecentliteratureoncomputationallearningcontainsfewcasesofsuchknowledgeacquisition,althoughinSection7wediscusssomeolderworkthathasthischaracter.

OurextensionofIcarusachievesthiseffectthroughaformofimpasse-drivenlearningthatistiedcloselytoitsproblem-solvingandexecutionprocesses.Forthisreason,thelearningmechanismsrequirenoadditionalinputsbeyondthoserequiredforthesebasicperformanceprocesses.AsinSoar(Lairdetal.,1986),thepurposeofskilllearningistoavoidsuchimpassesinthefuture.Thus,wheneverthearchitectureachievesanobjectivethatisassociatedwithanentryinthegoalstack,thissuccessprovidesanopportunityforlearning.Thesystemacquirestwodistinctformsofskillthataretiedtodifferentaspectsofproblemsolving.

ThefirstclassofskillsresultfromsituationsinwhichtheproblemsolvercannotfindaskilltoachieveagoalG,andthuspursuessubgoalsbasedontheunsatisfiedconditionsofG’sconceptualdefinition.Iftheagentachievesthesesubgoalsintheorder{G1,G2,...,Gn},thussatisfyingtheparentgoalG,IcarusconstructsanewskillclausethathasGasitsheadandthathas{G1,G2,...,Gn}asitsorderedsubskills.4ThestartconditionsofthenewclausearesimplythosesubconceptsofGthatweresatisfiedwhenitwaspushedontothegoalstack.Thehead,conditions,andsubskillshavetheirargumentsreplacedbyvariablesinaconsistentmanner,ensuringapplicabilitytoanalogoussituationsthatinvolvedifferentobjects.

Forexample,uponachievingthesubgoal(unstackableBA)inFigure1,thesystemconstructstheunstackableskillclauselabeled3inTable3.Thehead(unstackable?B?A)isageneralizedversionofthegoal(unstackableBA),whereastheorderedsubskills(clear?B)and(hand-empty)aregeneralizedversionsofitstwosubgoals(clearB)and(hand-empty).Thestartconditionsare(on?B?A)and(hand-empty),whicharegeneralizedversionsofthesubconceptsthatheldwhenthegoalwasestablished.Finally,the:perceptsfieldspecifiesthetypesforobjectsthatserveasthehead’sarguments.Thismechanismconstructsdifferentvariantsofaskill,withseparatestartconditionsanddistinctsubskills,fromsubproblemsthatinvolvedifferentinitialconditions.ThesecondcategoryresultsfromsituationsinwhichIcarushasselectedaprimitiveskillinstanceS2inordertoachieveagoalG,butfounditssinglestartconditionG2unsatisfiedandselectedanotherskillinstance,S1,toachieveit.Oncetheagenthasexecutedbothskillssuccessfullyandithasreachedthegoal,thesystemconstructsanewskillclausethathasGasitsheadandthathasG2(ratherthanthespecificclauseS1)andS2asorderedsubskills.ThestartconditionsaresimplythestartconditionsoftheS1clauseusedinthesubproblemsolution,whicharesufficientbecausetheproblemsolverS1selectedittoachievesthestartconditionofS2,whichinturnachievesthegoalG.Again,specificargumentsarereplacedconsistentlybyvariables.

Forinstance,uponachievingthetop-levelgoal(clearA)inFigure1,Icaruscreatestheclearskillclauselabeled4inTable3.Thisincorporatesageneralizedversionof(clearA)asitshead,alongwithvariableizedversionsof(unstackableBA)and(unstackBA)asitstwoorderedsubskills.Thestartconditions,(on?B?A)and(hand-empty),arethesameasthoseforunstackableclause

LearningHierarchicalSkillsPage11

3justdiscussed,sincethelatterwascreatedtoachievethestartconditionofunstackunderthosesameconditions,whichinturnsatisfiesthegoalclear.

Bothlearningmechanismsarefullyincremental,inthateachlearningeventdrawsonasingleproblem-solvingexperienceandthusrequiresnomemoryofpreviousones.Theysupportwithin-triallearning,sinceskillsacquiredononesubproblemmaybeusedtohandlelatersubproblems.Theprocessesalsobuildonexistingknowledge,sincetheconstructionofnewskillclausesinvolvesthecompositionofthoseusedinatrainingproblem’ssolution.Takentogether,thesesupportaformofcumulativelearning,inwhichIcaruslearnsskillsononeproblem,usesthemtosolvealaterproblem,andincorporatesthemintostillhigher-levelstructures.

Assuggestedbyourexamples,theselearningmethodscanacquirebothdisjunctiveandrecursiveskills.Thekeytothisabilityliesintheassumptionthatacquiredskillclauseswhichachievethesamegoalshouldbegiventhesamehead.Byindexingskillsinthismanner,Icarusknowswhentwoormoreclausesshouldbestoredtogether,whichleadsinturntothecreationofskillsthatcallonthemselves,eitherdirectlyorthroughintermediateskills.Thismakesthearchitecture’slearnedskillsconsiderablymoreflexibleandgeneralthantraditional‘macro-operators’(e.g.,Iba,1988)orcomposedproductionrules(e.g.,Neves&Anderson,1981).

Ofcourse,thecreationofdisjunctiveandrecursivestructureshaspotentialforovergeneralization,asdemonstratedbyresearchontheinductionofcontext-freegrammars(e.g.,Langley&Stromsten,2000).Ourtechniquefordeterminingthestartconditionsonnewskillclausesismuchsimplerthanstandardtechniquesforanalyticallearningorruleinduction.Infact,atfirstglance,thelearnedclausesinTable3appearhighlyovergeneral,butthisignoresthefactthatIcarusdoesnotinterpretskillsinisolation.Recallthatthearchitecturemustfindanentirepaththroughtheskillhierarchybeforeitcanexecutetheprimitiveskillatitsterminus.Thismeansthesystemcollectsconditionsdynamically,asitdescendsthehierarchy,guardingagainstovergeneralizationbycarryingoutlimitedanalysisatperformancetimeratherthandoingitallatlearningtime.Unlikesomeapproachestoincrementallearning,Icarus’methodsrequirenoadditionalmecha-nismsforskillrefinement.Eachskillclauseisgeneralizedwhenthearchitectureconstructsit,anditsstartconditionsareassumedtobeaccurate.Theknowledgeitacquiresfromsolvingagivenproblemmaywellbeincomplete,butthiswillsimplyleadtofurtherimpassesthatproduceaddi-tionallearning.Skillclausesacquiredlatercomplement,butdonotcompetewith,thoselearnedearlierbecausetheycoverdifferentsituationsortheolderclauseswouldhaveavoidedtheimpasse.Thus,learningispurelymonotonic,asinframeworkslikeSoar.

Weshouldnotethatourcurrentimplementationrestrictstheuseoflearnedskillsinfutureproblemsolving.Inparticular,wehaveadoptedMooney’s(1989)ideathatoneshouldnotchainoffthepreconditionsoflearnedskills.Thisdoesnotrestricttheirusebytheexecutionmodule,butitdoesmeanthattheproblemsolverconsidersalearnedskillonlywhenitsstartconditionsarealreadysatisfied.Asaresult,clausesacquiredfromchainingoffskillsalwayshavealeft-branchingstructureinwhichthesecondsubskillisprimitive.Thisassumptionmayseemrestrictive,but,likeMooney,webelieveitprovidesaneffectiveguardagainsttheutilityproblem(Minton,1990),inwhichthecreationanduseofcomplexstructuresreducessearchbutactuallyslowsperformance.

Page12LearningHierarchicalSkills

6.ExperimentalStudiesofSkillLearning

Althoughthenewmethodsforlearninghierarchicalskillsseemplausible,whethertheyimproveanIcarusagent’sperformanceisanempiricalquestion.Inthissection,wereporttheresultsofbasictestsofthesemechanismsonthreedistinctdomains:in-citydriving,theBlocksWorld,andFreeCellsolitaire.Afterthis,wereportmoresystematicexperimentswiththedomainsthatexaminetheeffectsoflearninginmoredetail.Asonemeasureofperformance,weusedthenumberofrecognize-actcyclesrequiredtosolvetheprobleminthesimulatedenvironment,includingbothproblemsolvingandexecutionsteps.However,wealsomeasuredtheCPUtimerequiredtosolveeachproblem,todeterminewhetherIcarussuffersfromtheutilityproblem.6.1DomainsandBasicDemonstrations

Toensurethatourapproachtolearninghierarchicalskillsoperatedasintended,wedevelopedIcarusprogramsforthethreedomains.Ineachcase,weprovidedasetofprimitiveskillssufficientforsolvingproblemswithmeans-endsanalysisandasetofhierarchicalconceptssufficientforrecognizingsituationsthatwererelevanttoexecutingthoseskills.Forexample,wedevisedsome41conceptsand19skillsforthein-citydrivingdomain,11conceptsandfourskillsfortheBlocksWorld,and24conceptsand12skillsforFreeCellsolitaire.TheAppendixgivesthenamesoftheprimitiveconcepts,nonprimitiveconcepts,andprimitiveskillsprovidedforeachdomain,whichshouldalsosuggesttheirfunction.Inaddition,wealsoprovidedthearchitecturewithasetofsensorsandeffectorsforeachsimulatedenvironment.

WehavealreadydiscussedtheBlocksWorld,butbothin-citydrivingandFreeCellmeritsomeexplanation.Thefirstdomaininvolvesadynamicsimulationofadowntowndrivingenvironment.Thecitycontainsobjectsrepresentedasrectanglesofdifferentsizes,includingbuildingsandside-walksorganizedintosquareblocksthataredividedbystreetsegmentsandintersections.Eachsegmentincludesayellowcenterlineandwhitedottedlanelines,andithasamarkedstreetnameandspeedlimit.Eachbuildingshasauniquestreetaddresstohelptheagentnavigatethroughthecityandtosupporttaskslikepackagedelivery.Thecityconfigurationusedinourexperimentshasnineblockswithfourverticalstreetsandfourhorizontalstreets.TheIcarusagentmustoperateunderphysicallawsandfollowtherulesofdriving,suchasstayingontherightsideofthestreetandturningfromtheproperlane.Weprovidetheagentspecificwithgoalstoachieve,suchasgettingontoanotherstreetsegmentordeliveringapackagetoacertainaddress.

FreeCellsolitaireinvolveseightstacksofcards,thefirstfourofwhichcontainsevencardsandthelastfourcontainsixcards.All52cardsaredealtfaceup,makingthemvisibletotheplayer.Inaddition,therearefourfreecells,whichcanserveastemporaryholdingspotsforonecardeachduringthegame,andonefoundationcellforeachsuit.ThegoalinFreeCellistogetallcardsonthefoundationcellsinascendingorder(wheretheaceisoneandthekingisthirteen)groupedbysuit.Onceonitsfoundationcell,acardcannotberemoved.Onlyfully-exposedcardsatthetopofeachstackandcardsthatinthefreecellsareinplay.Theagentcanmoveonecardatatimetoanavailablefreecell,totheappropriatefoundationcell,toanemptystack,ortoastackinwhichthetopcardhasadifferentcolorandvalueonehigherthanthemovedcard.

LearningHierarchicalSkillsPage13

Samplerunswiththein-citydrivingdomain,theBlocksWorld,andareducedversionofFreeCellindicatedthattheextendedversionofIcaruswasabletosolveproblemsintheirrespectivedomainswithsomesearchand,fromtheirrespectivetraces,learnhierarchicalskillsinthemannerdescribedearlier.Wefoundthat,whengiventhesametasktosolveasecondtime,thesystemutilizedthisknowledgetohandleitwithoutproblemsolving.Moreover,becausethesystemgeneralizesitslearnedstructuresbeyondthespecificinstancesonwhichtheyarebased,theytransferfullytoanytasksthatareisomorphictothoseithasalreadysolved.Theonlyconstraintisthatthisisomorphismmustinvolvethesamegoalandhavethesameconceptssatisfiedorunsatisfiedintheinitialenvironment.

However,weshouldnotethisabilitydoesnotmeanthatthesystemcancompleteafamiliarprobleminasinglecycle.Recallthat,traditionalworkoncognitivearchitectures,Icarusresortstoproblemsolvingonlytoenableaction,anditmuststillexecuteitsacquiredskillstoachieveagoal.Thus,foraproblemthatrequiresfourprimitivesteps,thesystemtakessixcyclesonthesecondencounter,withonetoretrievethehierarchicalskillandonetorealizeithasfinished.However,theagentrequiresneithersearchorbackwardchainingoverskillsorconceptstocompleteanyproblemithassolvedpreviously.6.2ExperimentwithIn-CityDriving

Althoughtheseinitialrunswereencouraging,wedesiredmorethananecdotaldemonstrationsthatthenewmechanismssupportedincrementallearningofhierarchicalskills.Wealsowantedevidencefromsystematicexperimentsthatthislearnedknowledgeproducesmoreeffectivebehavior.Ourfirststudyalongtheselinesfocusedonin-citydriving,whichisthemostdynamicofthethreesettingsandthustheonemostappropriateforevaluatingourmethodsforlearningskillsthatsupportreactiveexecution.

Asnotedabove,weprovidedIcaruswith41conceptsand19primitiveskillsrelevanttothisenvironment.Withthebasicknowledge,theagentcancharacterizeitssituationatmultiplelevelsofabstractionandperformactionsforaccelerating,decelerating,andsteeringleftorrightatrealisticangles.Thus,itcanoperateavehicle,butthisisnotsufficienttodrivesafelyinacityenvironment.Theagentmuststilllearnskillsforstayingalignedandcenteredwithinlanelines,changelanes,increaseordecreasespeedforturns,andstopforparking.

Toencouragesuchlearning,wepresentedtheagentwiththegoalofdrivingonadifferentstreetsegmentthanitscurrentone.Toachievethisobjective,itresortedtoproblemsolving,whichfoundasolutionpaththatinvolvedchangingtotherightmostlane,stayingalignedandcentereduntiltheintersection,steeringrightintothetargetsegment,turningthecorner,andfinallyaligningandcenteringinthenewlane.WelettheIcarusagentpracticethistaskforfivetrialstoexamineitsimprovementwithexperience.Werepeatedthisproceduretendifferenttimeswithslightlydifferentstartingpositions,collectedperformancemeasuresforeachrun,andaveragedtheresults.Figure2showsthetotalnumberofcyclesasafunctionofthenumberoftrials,alongwiththenumberofplanningandexecutioncyclesrequiredtoachievethegoal.Astheagentaccumulatesknowledgeaboutthistask,problemsolvingdisappearsalmostentirely,whichcausesthereduced

Page14LearningHierarchicalSkills

Number of cycles required200250300150totalplanningexecution

00

5010012345

Number of trials

Figure2.Thetotalnumberofcyclesrequiredtosolveaparticularright-turntaskalongwiththeplanningand

executiontimes,asafunctionofthenumberoftrials.Eachlearningcurveshowsthemeanovertensetsoftrialsand95percentconfidenceintervals.

numberoftotalcycles.However,thisproblemisdominatedbyexecutiontime,sincetheagentmustactuallydrivethevehicletoitsdestination.Executioncyclesappeartoincrease,whichoccursnotbecausethelearnedskillsareinefficientbutratherbecausetimeprogressesevenduringproblemsolving.Thus,thevehiclemovesintherightdirectionduringthisperiodintheearlytrials,reducingthedistancethatremainstotravel.Asproblemsolvingbecomesunnecessary,theagentdrivesthisextradistanceunderconsciouscontrolratherthanaccidentally.CPUtimeremainedapproximatelythesamewithincreasedexperience,presumablyforthesamereasons.

Table4showsthefiveskillclausesacquiredduringoneoftheseruns.Thetwoclausesfordriving-in-segmentspecifydifferentdecompositionsforachievingthistop-levelgoalunderalternativestartconditions.Thesecondofthesereferstotheclauseforin-segment,whichreferstothelearnedsub-skillforin-intersection-for-right-turnandtheprimitiveskillsteer-for-right-turn.Theformerreferstoin-rightmost-lane,whichinvokestheprimitiveskillclausedriving-in-segment,butitalsocallsonitselfrecursivelywithdistinctarguments.Forclarification,thetablealsopresentstheprimitiveclauseforin-intersection-for-right-turn,whichthesystemwasgivenasbackgroundknowledge.Figure3showsatraceoftheagent’sbehavioronthetaskduringlearning,inasituationthatinvolvesastreetwithtwolanes,andafterwards,inasettingthatinsteadinvolvesthreelanes.Thetraceofthevehicle’smovementdemonstratesthatthelearnedskillsgeneralizetocasesthatinvolvemorelanesthanwerepresentduringtraining.Thisabilityfollowsdirectlyfromtherecursivestruc-tureofthelearnedin-intersection-for-right-turnclause.Behaviorafterlearningisalsosmoother,presumablybecausetheagentneednotengageinproblemsolvingwhenitovershootsslightlyaftergettingintothetargetlaneinpreparationfortherightturn.6.3ExperimentwiththeBlocksWorld

AlthoughtheBlocksWorldisfarlessdynamicthanin-citydriving,itlendsitselftoscalingstudiesthatinvolvegeneralizationtotaskswithvaryingnumbersofobjects.Forthisdomain,weprovidedIcaruswiththefourprimitiveskillsinTable2and11conceptsthatweresufficient,inprinciple,

LearningHierarchicalSkillsPage15

Table4.Fiveskillclauseslearnedforin-citydriving,alongwithaprimitiveskillforthesamedomain.

Page16LearningHierarchicalSkills

Figure3.AtraceoftheIcarusdrivingagent’sbehavior,duringandafterlearning,onataskthatrequiredchanging

totherightmostlaneandturningattheintersection.Thetracedemonstratesgeneralizationtoanewsettingwithadifferentnumberoflanes.

tosolveanyproblem.Wethenpresentedtheagentwiththeproblemsinsequence,usingeachtaskasatrainingproblembutalsorecordingthenumberofcyclesandCPUtimerequiredtocompleteit.Becausemisguidedsearchcombinedwithexecutioncanleadtheproblemsolverintoundesirablephysicalstates,wetoldittohaltifithadnotfinishedarunwithin100cyclesandtostartoverfromtheinitialstate.However,theagentcouldattemptagivenproblemonlyfivetimes,andthusspendatmost500cyclesbeforegivingupentirely.Wealsolimitedthestackdepthtotengoalelements.Weenforcedtheseconstraintsforreasonsofpracticalityandbecausewethinktheyreflectthemannerinwhichhumanstacklenovelproblems.

WegeneratedrandomlyasetofrandomBlocksWorldtasksthatinvolvedsettingswith5,10,15,20,25,and30blocks.Eachcomplexityclasshad67to69distinctproblems,whichweorderedbydifficultyclass(five-blocktasksfirstand30-blocktaskslast).Theintuitionwasthatthesystemwouldlearnmoreeffectivelyifwepresenteditfirstwithsimplerproblems,whichitcouldthenuseinsolvingmoredifficultones.Tothisend,Icarusretainedskillsacquiredonsuccessfulrunsforuseinlatertasks.Weprovidedthesystemsome400randomlygeneratedproblemordersandrecordedthenumberofcyclesandCPUtimesneededforeachtask.Asacontrol,wealsoranthesystemwithitslearningmechanismsoffforanother400problemsetsthatwereorderedrandomlywithindifficultyclasses.Becausetheproblemsrequiredifferentamountsofeffort,traditionallearningcurvesarenotveryinformative.Instead,followingMinton(1990),wereportcumulativecyclesandCPUtimesasafunctionofthenumberoftrainingproblems.

Figure4showstheresultingcurves,including95percentconfidenceintervalsaroundeachmean.Asexpected,thecurvesmainlytaketheformofsuperlinearfunctionswhoseslopesincreasewithproblemdifficulty.Althoughthelargescaleofplotsmadethelearningandnon-learningcurveslook

LearningHierarchicalSkillsPage17

26300Number of cycles requiredlearning onlearning off

CPU time required41000learning onlearning off

21040157801052052600067134201

268335402

Number of problems encountered

00

820016400246003280067134201

268335402

Number of problems encountered

Figure4.CumulativenumberofcyclesandCPUtimesrequiredbyIcarustosolveaBlocksWorldtaskasafunction

ofthenumberofproblemsencountered,averagedover400runsandwithproblemsorderedbydifficulty,withthegoalstackofsizeten.Tickmarksonthehorizontalaxisindicateshiftsinproblemcomplexity.

similarforearlypartsofthecurves,therewassomebenefitforlearningevenfromthebeginning,butthedifferencegrowssubstantiallyasthesystemsencounterharderproblems.Clearly,priorexperiencereducessearchsubstantiallywhenitreachesproblemswithmanyblocks,andthereisnoevidencethatlearningproducesautilityproblem.Rememberthatwehavemadethetransferoflearnedknowledgechallenginginthatnoneoftheproblemsareisomorphic,althoughtheymayinvolveisomorphicsubtasks.TheresultsindicatethatIcaruscantakeadvantageofthissimilarsubstructuretoreduceitseffortonlaterproblems.6.4ExperimentwithFreeCellSolitaire

ToensurethatourconclusionsheldformorethantheBlocksWorld,wecarriedoutasimilarexperimentwithFreeCellsolitaire,whichwedescribedearlierinthissection.WegavetheIcarusagentonlythe12basicskillsneededtomovecardsandthetop-levelgoalofgettingallcardsintofoundationcells,alongwith24conceptsfordescribingsituations.UnliketheBlocksWorld,thisdomainhasonlyonegoalcondition,butitstillhasmanypossiblestartingstates.

Forthisstudy,werandomlygenerated20problemseachthatinvolved8,12,16,20,and24cards.5Weranthesystemon300differentsequencesoftasks,withsimplerproblemsbeingpresentedearlierbutorderedrandomlywithineachofthefivedifficultyclasses.Asbefore,weexpectedthattheagentwouldlearnskillsfromtheeasierproblemsthatwouldassistontheharderones,thusreducingproblem-solvingeffort.Forcomparison,wepresentedanother300randomsequencestoanon-learningsystemwiththesameinitialskillsandconcepts.

Figure5presentsthecumulativeresultsforthisexperiment,witherrorbarsthatindicatethe95%confidenceintervals.AsintheBlocksWorld,thedifferencebetweenthelearningandnon-learningconditionsissubstantial.However,problemswith20cardsormorerequireadifferent

Page18LearningHierarchicalSkills

23000Number of cycles requiredCPU time required96000learning onlearning off

learning on

18400learning off

13800002040

6080100Number of problems encountered

00

19200460038400920057600768002040

6080100Number of problems encountered

Figure5.CumulativenumberofcyclesandCPUtimesrequiredbyIcarustosolveaFreeCelltaskasafunctionof

thenumberofproblemsencountered,averagedover300runsandwithproblemsorderedbydifficulty.

classofskillsthatinvolvecolumn-to-columnmoves,whichcausedthelessenedgapbetweenthetwoconditionsaroundthe80thproblem.However,oncetheyhavebeenacquired,theseskillsprovidesomeadvantage,asevidencedbythedownturninthecurveforthelearningsystemonthefarrightofthegraphs.Again,wedetectednosignofautilityproblemastheagentaccumulatesknowledgeinthisdomain.

7.RelatedResearch

ResearchonlearningcognitiveskillsfromproblemsolvinghasalonghistorywithinbothAIandcognitivescience.Forexample,workonexplanation-basedlearningoftenaimedtoimproveef-ficiencyonproblem-solvingtasksandcombinedexperiencewithadomaintheorytocreatenewcognitivestructures.Sometechniquesfocusedontheacquisitionofsearch-controlrulestoguideproblemsolving,butothereffortsdealtinsteadwiththeconstructionofmacro-operatorsfromprimitiveoperators(e.g.,Iba,1988;Mooney,1989;Shavlik,1989).Ourapproachtoskilllearn-ingcomesclosertothesecondparadigm,sincebothinvolvecomposingknowledgeelementsintolargerstructures.However,Icarusadaptsthisideatothecreationofdisjunctiveandevenre-cursiveskillhierarchies,whereastraditionalmethodsemphasizedthecreationof‘fixed-sequence’macro-operatorsthatwerefarlessflexible.

Icarusalsobearssomesimilaritytoothercognitivearchitecturesthatincorporatevarietiesofanalyticalorexplanation-basedlearning.Forexample,Laird,Rosenbloom,andNewell’s(1986)Soarrevolvesaroundaproblemsolverthatproceedsuntilthesystemencountersanimpasse,inwhichcaseitcreatesasubgoaltoresolveit.Thisresolutionmayrequiresearchandtakesometimetoproducetheinformationnecessary.Oncetheimpassehasbeenhandled,Soarcreatesachunkthatencodesageneralizedexplanationoftheresultintermsoftheoriginalgoalcontext.Intermediatestepsfromthesolutionarelost,buttheacquiredchunkletsthesystemsidestepsimilarimpassesinthefuture.

LearningHierarchicalSkillsPage19

Anderson’s(1993)ACT-Remploysarelatedmechanism,calledcompilation,whichcreatesnewproductionrulesfromonesthatareinvolvedinthesamereasoningchain.Thisschemeproducesveryspecificrulesthatreplacevariableswiththedeclarativeelementsagainstwhichtheymatched,ratherthanforminggeneralizedstructures,asdoIcarusandmostothersystemsthatlearnmacro-operatorsorsearch-controlrules.Infact,ourapproachismoreakintothecompositionprocessthatplayedaroleinearlierversionsofACT(Neves&Anderson,1981),thoughthismechanismproducedfixedbehavioralsequencesratherthanflexibleskillhierarchies.

Icarus’closestarchitecturalrelativeisProdigy(Minton,1990),whichinvokesmeans-endsanalysistosolveproblemsandusesananalyticalmethodtolearneithersearch-controlrolesormacro-operatorsfromproblem-solvingtraces.VelosoandCarbonell(1993)alsodescribeanexten-sionthatrecordsthesetracesinmemoryandsolvesnewproblemsbyderivationalanalogywithearlierones.Noneofthesemechanismsgeneratesexplicithierarchicalstructures,butVelosoandCarbonell’sapproachprovidesflexibilitysimilartothatfoundinIcarus,andthetwosystemsrecordandutilizeverysimilarinformationintheirgoalstacks.

Someothersystemssupportlearninginproblem-solvingdomainswithoutmakingstrongar-chitecturalcommitments.RubyandKibler’s(1991)SteppingStonelearnsgeneralizedrulesfordecomposingcomplexproblemsintosimplerones,whichitobtainsthroughmixeduseofexist-ingproblem-reductionrulesandforward-chainingexhaustivesearchwhenitreachesanimpasse.MarsellaandSchmidt’s(1993)systemalsoacquirestask-decompositionrulesthatincorporatepar-tialorderingsamongcomponents.Theirsystemcombinesforwardandbackwardsearchtoidentifycandidatestatepairs,whichinturnproducehypothesizedproblem-reductionrulesthatarerevisedbasedonfurtherexperience.6

PerhapstheclosestrelativetoourapproachisReddyandTadepalli’s(1997)X-Learn,whichacquiresgoal-decompositionrulesfromasequenceoftrainingproblems.Theirsystemdoesnotincludeanexecutionengine,butitgeneratesrecursivehierarchicalplansinamannerthatalsoidentifiesdeclarativegoalswiththeheadsoflearnedclauses.However,becauseitinvokesforward-chainingratherthanbackward-chainingsearchtosolvenewproblems,itreliesonthetrainertodeterminehierarchicalstructure.X-Learnalsousesaquitesophisticatedmixtureofanalyticalandinductivetechniquestodetermineconditionsonskills,ratherthanthemuchsimplermethodthatIcarusincorporates.

AnotherkeydifferencefromX-Learn,PRL,andSteppingstoneisthatIcaruslearnsskillsforuseinreactiveexecutionratherthanforuseinplanning.Therehasbeenotherworkonthistopic,butithasemphasizedtheacquisitionofflatcontrollersratherthanhierarchicalstructures.Forinstance,Benson’s(1995)TRAILlearnsteleoreactivecontrollersforphysicalagents,butitinvokesinductivelogicprogrammingtodeterminerulesforindividualactions.Fernetal.(2004)reportanapproachtolearningreactivecontrollersthattrainsitselfonincreasinglycomplexproblems,butthatalsoacquiresdecisionlistsforactionselection.Khardon(1999)considerstherelatedtaskoflearninghierarchicalcontrollers,buthisformalanalysisassumestheagentisprovidedwithannotatedsamplesolutionsratherthanbeinggeneratedthroughproblemsolving.

Page20LearningHierarchicalSkills

Otherresearchershavebuiltsystemsthatsupportcumulativelearningoutsidethecontextofproblem-solvingtasks.OneearlyexamplewasSammutandBanerji’s(1986)Marvin,whichlearnsincreasinglycomplexlogicalconceptsthatarecomposedofonesithasmasteredpreviously.StoneandVeloso(2000)takeasimilarapproachtolearningconceptsandcontrollersforplayingroboticsoccer,althoughtheirsystemacquiresquitedifferenttypesofstructureateachlevelofdescription.StracuzziandUtgoff’s(2002)STLalgorithmreceivestrainingcasesaboutmanyconceptsinparallel,butitlearnscomplexonesonlywhenithasacquiredsimplerstructuresthatletitmasterthemwithlittleeffort.Pfleger(2004)describesanothersystemthatacquireshierarchicalpatternsinanon-linesetting,inthiscasefromunsuperviseddata.LikeMarvinandSTL,itlearnsconceptualstructuresfromthebottomup,sothatmorecomplexpatternsareapparentaftersimpleroneshavebeenacquired.

8.ConcludingRemarks

Intheprecedingpages,wepresentedIcarus,acognitivearchitectureforphysicalagentsthatusesstoredconceptsandskills,bothorganizedinhierarchies,torecognizefamiliarsituationsandcontrolbehavior.Wedescribedanewmodulethatsupportsmeans-endsproblemsolvingonnoveltasks,alongwithalearningmechanismthatproducesnewskillsfromtracesofproblemsolutions.Thismethodoperatesinanincrementalmanner,creatinghierarchicalstructuresthatrefertootherslearnedearlier.Inaddition,wereportedexperimentswithin-citydriving,theBlocksWorld,andFreeCellthatshowedsuchlearningenablesmoreeffectivebehavioronunfamiliarproblemsthansolvingthemwithonlybasicknowledgeaboutthedomain.

Despitetheseadvances,ourworkonskilllearninginIcarusisstillinitsearlystages.Forinstance,weshoulddemonstrateitsabilitytolearnhierarchicalstructuresbothontraditionalcognitivetaskslikemulti-columnsubtractionandonotherdynamicdomainsthat,likein-citydriving,requiretheintegrationofproblemsolvingwithreactivecontrol.Futureworkondrivingshouldshowthatourmethodsaresufficienttoacquiremorecomplicatedskillsthatinvolveextendedtaskslikepackagedeliveryandcomplexsettingsthatincludeothervehicles.Anotherpromisingclassofdomainsforstudyingskilllearninginvolvestwo-persongameslikechess,whichseemcertaintointroducenewchallengesbecauseoftheirextendedduration.

Inaddition,Icarus’methodsforproblemsolvingandhierarchicallearningwouldbenefitfromnewcapabilities.Wenotedearlierthatthecurrentsystemdoesnotchainbackwardfromthestartconditionsoflearnedskillclauses.Extendingtheproblemsolvertosupportthisabilitywouldmeandefiningnewconceptsthatcharacterizethesituationsinwhichlearnedskillsareapplicable.Thisadditionwouldalsoremedyanotherlimitationofthecurrentsystem,namelyitsinabilitytoaccountfortheoriginofconcepthierarchies,whichitassumesaregiven.Suchanextensionwouldbestraightforwardforsometasks,butotherswillrequiretheabilitytoacquirerecursiveconcepts.Augmentingthesysteminthismannermayalsoleadtoautilityproblem,notduringexecutionoflearnedskillsbutduringtheproblemsolvingusedfortheiracquisition,whichwewouldthenneedtoovercome.

LearningHierarchicalSkillsPage21

Anotherdrawbackisthearchitecture’srelianceonpurelydeductiveinference,whichdiffersmarkedlyfromtheprobabilisticapproachtakenbyitsearliestancestor(Langleyetal.,1991).Futureversionsoftheframeworkshouldextendtherepresentationofconceptsandskillstoin-corporateprobabilities,replacedeductiveprocesseswithabductivemethodsthatmakeplausibledefaultinferences,andaugmentproblemsolvingtooperateoverskillswithuncertainoutcomes.Wehypothesizethatthecurrentmechanismsforlearningthestructureofskillscanbeadaptedeasilytothissetting,butweshouldalsointroducemethodsforestimatingtheprobabilitiesthatannotatethesymbolicstructures.

Weshouldalsonotethat,althoughourapproachlearnsskillsthatgeneralizetosituationswithdifferentnumbersofobjects,itstreatmentofgoalsislessflexible.Forexample,Icaruscanacquireageneralprocedureforclearingablockthatdoesnotdependonthenumberofblocksaboveit,butitcannotlearnaprocedureforconstructingatowerwitharbitrarilyspecifiedcomponents.Extendingthemethod’sabilitytolearnaboutsuchrecursivegoalstructuresisanotherimportantdirectionforfutureresearchthatwillbringthearchitectureintocloseralignmentwiththeabilitiesobservedincomplexhumanlearning.

Acknowledgements

ThisresearchwasfundedinpartbyGrantHR0011-04-1-0008fromDARPAIPTOandbyGrantIIS-0335353fromtheNationalScienceFoundation.DiscussionswithGlennIba,DavidNicholas,StephanieSage,DanShapiro,andJudeShavlikcontributedtomanyideaspresentedhere.

References

Anderson,J.R.(1993).Rulesofthemind.Hillsdale,NJ:LawrenceErlbaum.

Asgharbeygi,N.,Nejati,N.,Langley,P.,&Arai,S.(2005).Guidinginferencethroughrelationalreinforcementlearning.ProceedingsoftheFifteenthInternationalConferenceonInductiveLogicProgramming.Bonn,Germany:Springer.

Benson,S.(1995).Inductionlearningofreactiveactionmodels.ProceedingsoftheTwelfthInter-nationalConferenceonMachineLearning(pp.47–54).SanFrancisco:MorganKaufmann.

Choi,D.,Kaufman,M.,Langley,P.,Nejati,N.,&Shapiro,D.(2004).Anarchitectureforpersis-tentreactivebehavior.ProceedingsoftheThirdInternationalJointConferenceonAutonomousAgentsandMultiAgentSystems(pp.988–995).NewYork:ACMPress.

Choi,D.,&Langley,P.(2005).Learningteleoreactivelogicprogramsfromproblemsolving.Pro-ceedingsoftheFifteenthInternationalConferenceonInductiveLogicProgramming.Bonn,Ger-many:Springer.

Doyle,J.(1979).Atruthmaintenancesystem.ArtificialIntelligence,12,231–272.

Fern,A.,Yoon,S.W.,&Givan,R.(2004).Learningdomain-specificcontrolknowledgefromrandomwalks.ProceedingsoftheFourteenthInternationalConferenceonAutomatedPlanningandScheduling(pp.191–199).Whistler,BC:AAAIPress.

Page22LearningHierarchicalSkills

Forgy,C.L.(1982).Rete:Afastalgorithmforthemanypattern/manyobjectpatternmatchproblem.ArtificialIntelligence,19,17–37.

Jones,R.M.,&Langley,P.(inpress).Aconstrainedarchitectureforlearningandproblemsolving.ComputationalIntelligence.

Iba,G.A.(1989).Aheuristicapproachtothediscoveryofmacro-operators.MachineLearning,3,285–317.

Ilghami,O.,Nau,D.S.,Mu˜noz-Avila,H.,&Aha,D.W.(2002).CaMeL:Learningmethodpre-conditionsforHTNplanning.ProceedingsoftheSixthInternationalConferenceonAIPlanningandScheduling(pp.131–14).Toulouse,France.

Khardon,R.(1999).Learningtotakeactions.MachineLearning,35,57–90.

Laird,J.E.,Rosenbloom,P.S.,&Newell,A.(1986).ChunkinginSoar:Theanatomyofagenerallearningmechanism.MachineLearning,1,11–46.

Langley,P.,Cummings,K.,&Shapiro,D.(2004).Hierarchicalskillsandcognitivearchitectures.ProceedingsoftheTwenty-SixthAnnualConferenceoftheCognitiveScienceSociety(pp.779–784).Chicago,IL.

Langley,P.,McKusick,K.B.,Allen,J.A.,Iba,W.F.,&Thompson,K.(1991).AdesignfortheIcarusarchitecture.SIGARTBulletin,2,104–109.

Langley,P.,&Stromsten,S.(2000).Learningcontext-freegrammarswithasimplicitybias.Pro-ceedingsoftheEleventhEuropeanConferenceonMachineLearning(pp.220–228).Barcelona:Springer-Verlag.

Marsella,S.,&Schmidt,C.F.(1993).Amethodforbiasingthelearningofnonterminalreductionrules.InS.Minton(Ed.),Machinelearningmethodsforplanning.SanMateo,CA:MorganKaufmann.

Minton,S.N.(1990).Quantitativeresultsconcerningtheutilityofexplanation-basedlearning.ArtificialIntelligence,42,363–391.

Mooney,R.J.(1989).Theeffectofruleuseontheutilityofexplanation-basedlearning.ProceedingsoftheEleventhInternationalJointConferenceonArtificialIntelligence(pp.725–730).Detroit:MorganKaufmann.

Nardi,D.,&Brachman,R.J.(2002).Anintroductiontodescriptionlogics.InF.Baaderetal.(Eds.),Descriptionlogichandbook.Cambridge:CambridgeUniversityPress.

Newell,A.(1990).Unifiedtheoriesofcognition.Cambridge,MA:HarvardUniversityPress.

Newell,A.,Shaw,J.C.,&Simon,H.A.(1960).Reportonageneralproblem-solvingprogramforacomputer.InformationProcessing:ProceedingsoftheInternationalConferenceonInformationProcessing(pp.256–264).UNESCOHouse,Paris.

Nilsson,N.(1994).Teleoreactiveprogramsforagentcontrol.JournalofArtificialIntelligenceResearch,1,139–158.

Pfleger,K.(2004).On-linecumulativelearningofhierarchicalsparsen-grams.ProceedingsoftheThirdInternationalConferenceonDevelopmentandLearning.SanDiego,CA:IEEEPress.Reddy,C.,&Tadepalli,P.(1997).Learninggoal-decompositionrulesusingexercises.ProceedingsoftheFourteenthInternationalConferenceonMachineLearning(pp.278–286).SanFrancisco:MorganKaufmann.

LearningHierarchicalSkillsPage23

Richman,H.B.,Staszewski,J.J.,&Simon,H.A.(1995).SimulationofexpertmemoryusingEPAMIV.PsychologicalReview,102,305–330.

Ruby,D.,&Kibler,D.(1991).SteppingStone:Anempiricalandanalyticalevaluation.ProceedingsoftheTenthNationalConferenceonArtificialIntelligence(pp.527–532).MenloPark,CA:AAAIPress.

Sammut,C.,&Banerji,R.B.(1986).Learningconceptsbyaskingquestions.InR.S.Michalski,J.G.Carbonell,&T.M.Mitchell(Eds.),Machinelearning:Anartificialintelligenceapproach(Vol.2).LosAltos,CA:MorganKaufmann.

Shapiro,D.,Langley,P.,&Shachter,R.(2001).Usingbackgroundknowledgetospeedrein-forcementlearninginphysicalagents.ProceedingsoftheFifthInternationalConferenceonAu-tonomousAgents(pp.254–261).Montreal:ACMPress.

Shavlik,J.W.(1989).Acquiringrecursiveconceptswithexplanation-basedlearning.ProceedingsoftheEleventhInternationalJointConferenceonArtificialIntelligence(pp.688–693).Detroit,MI:MorganKaufmann.

Stone,P.,&Veloso,M.M.(2000).Layeredlearning.ProceedingsoftheEleventhEuropeanCon-ferenceonMachineLearning(pp.369–381).Barcelona:Springer-Verlag.

Utgoff,P.,&Stracuzzi,D.(2002).Many-layeredlearning.ProceedingsoftheSecondInternationalConferenceonDevelopmentandLearning(pp.141–146).

Veloso,M.M.,&Carbonell,J.G.(1993).DerivationalanalogyinProdigy:Automatingcaseacquisition,storage,andutilization.MachineLearning,10,249–278.

Page24LearningHierarchicalSkills

Appendix:ConceptsandSkillsProvidedinExperiments

Table5.ConceptsandskillsprovidedtoIcarusforthein-citydrivingdomain,withitalicsdenotingthegoalconcept

andparenthesesindicatingthenumberofclausesfordisjunctiveskills.

primitiveconcepts(15)

parked

aligned-with-lane-in-segmentcentered-in-lane

steering-wheel-not-straightdriving-in-segmentat-speed-for-right-turnready-for-right-turnin-leftmost-lanelane-to-rightlane-to-left

in-rightmost-lanein-right-turn-lane

off-centered-to-right-in-segmentoff-centered-to-left-in-segmentbuilding-on-rightbuilding-on-leftcurrent-building

start-aligned-with-lane-in-segmentstart-centered-in-lane-1start-centered-in-lane-2

start-adjust-speed-for-cruisestart-cruise-within-segmentstart-change-lane-to-rightstart-change-lane-to-leftstart-in-lane-1start-in-lane-2

primitiveskills(19)

LearningHierarchicalSkillsPage25

Table6.ConceptsandskillsprovidedtoIcarusfortheBlocksWorld,withgoalconceptsinitalics.

primitiveconcepts(4)

clear

three-tower

two-tower-one-on-tableunstackablepickupablestackableputdownable

primitiveskills(4)

nonprimitiveconcepts(14)

starthomesuccessorcolcolpairavailable-cellavailable-columnclearon

bottomincellhome

column-to-homecolumn-to-newhomecolumn-to-freecelllastcolumn-to-homelastcolumn-to-newhomelastcolumn-to-freecellfreecell-to-homefreecell-to-newhomefreecell-to-columncolumn-to-columnfreecell-to-new-columncolumn-to-new-column

因篇幅问题不能全部显示,请点此查看更多更全内容