USRE46186E1 - Information processing apparatus, information processing method, and computer program for controlling state transition - Google Patents
- Publication number
- USRE46186E1 (reissue; application US 13/927,708)
- Authority
- US
- United States
- Prior art keywords
- state
- state transition
- unit
- learning
- hmm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related, expires
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B19/00—Programme-control systems
- G05B19/02—Programme-control systems electric
- G05B19/04—Programme control other than numerical control, i.e. in sequence controllers or logic controllers
- G05B19/045—Programme control other than numerical control, i.e. in sequence controllers or logic controllers using logic state machines, consisting only of a memory or a programmable logic device containing the logic for the controlled machine and in which the state of its outputs is dependent on the state of its inputs or part of its own output states, e.g. binary decision controllers, finite state controllers
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G06N99/005—
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B2219/00—Program-control systems
- G05B2219/20—Pc systems
- G05B2219/23—Pc programming
- G05B2219/23288—Adaptive states; learning transitions
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
Definitions
- the present invention relates to an information processing apparatus, an information processing method, and a computer program, and, more particularly to an information processing apparatus, an information processing method, and a computer program that can self-organize an internal state to create an environment model.
- Reinforcement learning means a method of machine learning for autonomously acquiring an optimum behavior on the basis of actual experiences and returns.
- In a broad sense, machine learning that learns, by trial and error and relying only on returns from an environment, a control method for attaining the returns is referred to as reinforcement learning (see, for example, “Reinforcement Learning”, Richard S. Sutton and Andrew G. Barto, translated by Sadayoshi Mikami and Masaaki Minakawa, Morikita Publishing).
- Reinforcement learning has been applied to various Markov decision problems having finite numbers of states and finite numbers of behaviors, such as acquisition of strategies in games, and has achieved successes.
- an information processing apparatus including: model learning means for self-organizing, on the basis of a state transition model having a state and a state transition to be learned by using time series data as data in time series, an internal state from an observation signal obtained by a sensor; and controller learning means for performing learning for allocating a controller, which outputs an action, to each of transitions of a state or each of transition destination states in the state transition model indicating the internal state self-organized by the model learning means.
- the information processing apparatus further includes: planning means for planning a path for attaining a target as a transition sequence of a state on the state transition model indicating the internal state self-organized by the model learning means; and execution managing means for invoking, for each of transitions included in the path planned by the planning means, the controller allocated by the controller learning means to manage execution of an action along the path.
- the model learning means self-organizes, independently for each of plural modals, an internal state from an observation signal obtained by a sensor of a modal corresponding thereto on the basis of state transition models.
- the information processing apparatus further includes causality means for estimating causality of transition in one state transition model and a state of another state transition model among the state transition models for each of the plural modals respectively indicating the internal state self-organized by the model learning means.
- the execution managing means causes, when it is difficult to directly control an internal state of a predetermined modal among the plural modals respectively indicating the internal state self-organized by the model learning means, the planning means to recursively execute planning to control the internal state on the basis of the causality estimated by the causality means.
- the information processing apparatus further includes setting means for spontaneously setting a target from the internal state self-organized by the model learning means.
- the controller learning means, the planning means, and the execution managing means execute respective kinds of processing to realize the target spontaneously set by the setting means.
- an internal state is self-organized from an observation signal obtained by a sensor on the basis of a Markov model.
- learning for allocating a controller, which outputs an action, to each of transitions of a state is performed.
- the information processing apparatus and the like can self-organize an internal state to create an environment model.
- FIGS. 1A and 1B are diagrams for explaining an overview of processing according to an embodiment of the present invention.
- FIGS. 2A and 2B are diagrams for explaining the overview of the processing according to the embodiment.
- FIG. 3 is a functional block diagram of an information processing system according to the embodiment.
- FIG. 4 is a diagram for explaining a simple pendulum task
- FIG. 5 is a flowchart for explaining an example of processing for controlling the simple pendulum task
- FIG. 6 is a diagram of an example of a time series observation signal
- FIG. 7 is a diagram of an example of an HMM
- FIG. 8 is a diagram of an example of the HMM
- FIGS. 9A and 9B are diagrams of an example of an HMM
- FIGS. 10A to 10C are diagrams of examples of HMMs
- FIG. 11 is a diagram of an example of a learning result of an HMM in the simple pendulum task
- FIG. 12 is a flowchart for explaining a detailed example of recognition processing shown in FIG. 5 ;
- FIG. 13 is a flowchart for explaining a detailed example of the recognition processing shown in FIG. 5 ;
- FIG. 14 is a flowchart for explaining a detailed example of the recognition processing shown in FIG. 5 ;
- FIG. 15 is a functional block diagram of an information processing system according to an embodiment of the present invention.
- FIG. 16 is a diagram of an example of display of a simulator applicable to a multi-modal task
- FIG. 17 is a diagram of an example of an observation signal of a multi-modal sensor
- FIGS. 18A to 18C are diagrams of examples of learning results of HMMs in respective modals in the multi-modal task
- FIG. 19 is a diagram for explaining an example of a path and control of an HMM for distance
- FIG. 20 is a diagram for explaining an overview of causality estimation
- FIG. 21 is a diagram for explaining an overview of causality estimation
- FIG. 22 is a diagram for explaining an example of multi-stage behavior control in the multi-modal task
- FIG. 23 is a diagram for explaining an example of a path and control of an HMM for light
- FIG. 24 is a diagram for explaining an example of multi-stage behavior control for causality in the multi-modal task
- FIG. 25 is a block diagram of a configuration example of a personal computer as an information processing apparatus according to an embodiment of the present invention.
- FIG. 26 is a diagram for explaining an overview of a configuration example of a data processing apparatus according to an embodiment of the present invention.
- FIG. 27 is a diagram of an example of an Ergodic HMM
- FIG. 28 is a diagram of an HMM of a left-to-right type
- FIG. 29 is a block diagram of a detailed configuration example of the data processing apparatus.
- FIGS. 30A and 30B are diagrams of an example of an initial structure of an HMM set by an initial-structure setting unit 116 ;
- FIGS. 31A and 31B are diagrams for explaining division of a state
- FIGS. 32A and 32B are diagrams for explaining merging of a state
- FIGS. 33A and 33B are diagrams for explaining addition of a state
- FIGS. 34A and 34B are diagrams for explaining addition of state transition
- FIGS. 35A and 35B are diagrams for explaining deletion of a state
- FIG. 36 is a flowchart for explaining learning processing by the data processing apparatus
- FIG. 37 is a flowchart for explaining processing by a structure adjusting unit 117 ;
- FIGS. 38A and 38B are diagrams of moving loci used in simulation
- FIGS. 39A to 39C are diagrams of HMMs obtained as a result of learning
- FIG. 40 is a graph of logarithmic likelihood calculated from an HMM obtained as a result of learning
- FIG. 41 is a block diagram of a configuration example of a computer according to an embodiment of the present invention.
- FIG. 42 is a diagram of a functional configuration example of an information processing apparatus
- FIG. 43 is a flowchart for explaining processing concerning causality perception of the information processing apparatus.
- FIG. 44 is a diagram of an example of modals
- FIG. 45 is a diagram of a specific example of the modals.
- FIG. 46 is a diagram of an example of a change with time in a state of a system
- FIGS. 47A to 47D are diagrams of examples of event occurrence counters
- FIGS. 48A to 48C are diagrams of examples of transition occurrence counters prepared in association with respective state transitions of a modal 1;
- FIG. 49 is a diagram of an example of state transitions of the modal 1;
- FIGS. 50A to 50C are diagrams of examples of transition occurrence counters prepared in association with respective state transitions of a modal 2;
- FIG. 51 is a diagram of an example of state transitions of the modal 2;
- FIGS. 52A to 52C are diagrams of examples of transition occurrence counters prepared in association with respective state transitions of a modal 3;
- FIG. 53 is a diagram of an example of state transitions of the modal 3;
- FIGS. 54A and 54B are diagrams of examples of an event occurrence counter and a transition occurrence counter that perform count-up;
- FIG. 55 is a diagram of an example of the event occurrence counter that performs count-up
- FIGS. 56A and 56B are diagrams of another example of the event occurrence counter and the transition occurrence counter that perform count-up;
- FIGS. 57A and 57B are diagrams of still another example of the event occurrence counter and the transition occurrence counter that perform count-up;
- FIGS. 58A and 58B are diagrams of still another example of the event occurrence counter and the transition occurrence counter that perform count-up;
- FIGS. 59A to 59AC are diagrams of examples of state vector patterns
- FIGS. 60A to 60AC are diagrams of examples of state vectors
- FIGS. 61A and 61B are diagrams of other examples of the state vectors.
- FIG. 62 is a flowchart for explaining behavior determination processing by the information processing apparatus
- FIG. 63 is a diagram of an example of a behavior based on a causal relation
- FIG. 64 is a diagram of another example of the behavior based on the causal relation
- FIG. 65 is a graph of an example of measurement results.
- FIG. 66 is a diagram of a configuration example of a computer.
- First, an overview of processing according to an embodiment of the present invention is explained with reference to FIGS. 1A, 1B, 2A and 2B.
- a target system or agent automatically constructs a model of an external environment on the basis of a sensor signal for observation (hereinafter referred to as observation signal) and an action signal of an action taken by the system or the agent.
- the system or the agent freely generates an intellectual behavior for realizing the automatic construction of the model and realizing an arbitrary state on an internally-perceived model.
- the “agent” indicates an autonomous entity that can perceive (e.g., sense) a state of an environment and select a behavior on the basis of the perceived content.
- the system is used as an operation entity.
- HMM: Hidden Markov Model.
- a model for obtaining an action signal from an observation signal is created.
- the system constructs an HMM only from an observation signal.
- the system analyzes a relation between respective state transitions (hereinafter abbreviated as transitions as appropriate) of the constructed HMM and a behavior performed by the system (an action signal). Consequently, relations between sensor signals necessary for the respective transitions and action signals are learned as controllers.
- When a target state (in the example in FIG. 2B, a state F) is given, the system calculates a transition sequence from a present state (in the example in FIG. 2B, a state A) to the target state (in the example in FIG. 2B, the transition sequence indicated by a bold line arrow).
- a transition sequence is hereinafter referred to as path as appropriate.
- the calculation of such a path is hereinafter referred to as planning.
- the system can realize an arbitrary state by invoking controllers necessary for respective transitions included in the path.
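The planning described above, finding a transition sequence from the present state to a target state over the learned HMM, can be sketched as a shortest-path search. The following is a minimal, hypothetical Python sketch (the function name `plan_path` and the edge-dictionary representation are illustrative assumptions, not taken from the patent): maximizing the product of transition probabilities along a path is equivalent to minimizing the sum of their negative logarithms, so Dijkstra's algorithm applies directly.

```python
import heapq
import math

def plan_path(trans_prob, start, goal, min_prob=0.01):
    """Plan a state sequence from start to goal on a learned HMM.

    trans_prob: dict mapping (i, j) -> transition probability a_ij.
    Edges below min_prob are ignored, mirroring the 0.01 display
    threshold mentioned elsewhere in this description.  Maximizing
    the product of transition probabilities equals minimizing the
    sum of -log(a_ij), so ordinary Dijkstra applies.
    """
    # Build an adjacency list with -log probability edge weights.
    adj = {}
    for (i, j), p in trans_prob.items():
        if p >= min_prob and i != j:
            adj.setdefault(i, []).append((j, -math.log(p)))
    dist, prev = {start: 0.0}, {}
    heap = [(0.0, start)]
    while heap:
        d, i = heapq.heappop(heap)
        if i == goal:
            break
        if d > dist.get(i, math.inf):
            continue  # stale queue entry
        for j, w in adj.get(i, []):
            if d + w < dist.get(j, math.inf):
                dist[j] = d + w
                prev[j] = i
                heapq.heappush(heap, (d + w, j))
    if goal not in dist:
        return None  # target state unreachable on the model
    # Backtrace from the goal to recover the path.
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]
```

A returned path would then be handed to the execution manager, which invokes the controller allocated to each transition in turn.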
- FIG. 3 is a functional block diagram of a functional configuration example of the information processing system according to this embodiment (hereinafter simply referred to as system shown in FIG. 3 ).
- the system shown in FIG. 3 includes a sensor unit 21 , a modeling unit 22 , an innateness controller 23 , a behavior control unit 24 , and an action unit 25 .
- the sensor unit 21 observes a predetermined physical amount of an environment in which an agent is placed and provides the modeling unit 22 with a result of the observation as an observation signal.
- the modeling unit 22 includes a learning unit 31 , an HMM storing unit 32 , a recognizing unit 33 , and a planning unit 34 .
- the learning unit 31 constructs an HMM using the observation signal of the sensor unit 21 (see FIG. 1B ) and stores the HMM in the HMM storing unit 32 .
- the recognizing unit 33 estimates, when the action unit 25 explained later behaves, respective transitions up to a present state (a present situation) using the HMM stored in the HMM storing unit 32 and an observation signal sequence of the sensor unit 21 . An estimation result of the recognizing unit 33 is provided to the behavior control unit 24 .
- the planning unit 34 plans (calculates) an optimum path from the present state toward a target state using the HMM stored in the HMM storing unit 32 and provides the behavior control unit 24 with the optimum path (see FIG. 2B ).
- the target state means a state given to the behavior control unit 24 as a target.
- the target state is provided from the behavior control unit 24 to the modeling unit 22 .
- the innateness controller 23 issues, on the basis of a predetermined innateness rule, various commands for learning of a learning unit 41 of the behavior control unit 24 explained later and provides the learning unit 41 and the action unit 25 with the commands.
- the behavior control unit 24 includes the learning unit 41 , a controller-table storing unit 42 , a controller storing unit 43 , and an execution managing unit 44 .
- the learning unit 41 learns a controller for each of transitions using respective transitions, which are recognized by the recognizing unit 33 on the basis of a behavior result of the action unit 25 conforming to a command from the innateness controller 23 , and the command from the innateness controller 23 (see FIG. 2A ).
- the learning unit 41 stores respective controllers in the controller storing unit 43 .
- the learning unit 41 stores relations between the respective controllers and the transitions in the controller-table storing unit 42 . Details of the controller are explained later.
- the execution managing unit 44 generates a command for the action unit 25 such that the action unit 25 behaves along a path provided from the planning unit 34 , i.e., realizes respective transitions in the path.
- the execution managing unit 44 provides the action unit 25 with the command. This command is inversely generated on the basis of information stored in the controller-table storing unit 42 and the controller storing unit 43 . Details of processing by the execution managing unit 44 are explained later.
- The angular velocity ω is given in addition to the angle θ (i.e., two variables are given) as an observation signal.
- As a target, a target that the simple pendulum 51 swings up, i.e., that the angle θ reaches 180°, is given.
- As a target function for attaining the target, for example, a target function is designed and given such that a return is given when the angle θ reaches 180°, or such that a higher value is output as the angle θ approaches 180°.
- one of targets of the system shown in FIG. 3 is to realize an agent that can autonomously solve various tasks regardless of this simple pendulum task. Therefore, a limitation that only the angle ⁇ , which is a part of a state, can be observed is applied to the system shown in FIG. 3 .
- Another one of the targets of the system shown in FIG. 3 is to realize an arbitrary internal state rather than giving a target function. Therefore, the system shown in FIG. 3 does not need a target function dependent on a task of swing-up.
- FIG. 5 is a flowchart for explaining an example of processing performed by the system shown in FIG. 3 to attain the simple pendulum task (hereinafter referred to as control processing for the simple pendulum task as appropriate).
- In step S 1, the system shown in FIG. 3 executes learning processing for an HMM.
- In step S 2, the system shown in FIG. 3 executes recognition processing.
- In step S 3, the system shown in FIG. 3 executes controller learning processing.
- In step S 4, the system shown in FIG. 3 executes planning processing.
- In step S 5, the system shown in FIG. 3 executes behavior control processing.
- First, the learning processing for an HMM in step S 1 is explained.
- In an initial state, the action unit 25 outputs a control signal τ generated at random or a control signal τ obtained by adding a proper perturbation to a pattern innately embedded in advance. Such a control signal τ is generated by the action unit 25 on the basis of, for example, a command given by the innateness controller 23.
- A time series of the observation signal θ outputted from the sensor unit 21 during this period (hereinafter referred to as a time series observation signal) is stored on a not-shown memory of the learning unit 31.
- a signal 52 shown in FIG. 6 is an example of the time series observation signal.
- the learning unit 31 learns these time series observation signals to construct an HMM and stores the HMM in the HMM storing unit 32 .
- HMMs In the learning processing for an HMM, in general, a Baum-Welch algorithm is used. Examples of HMMs applicable to such an algorithm are shown in FIG. 7 to FIGS. 10A to 10C .
- When the system is caused to learn the totally-connected HMM shown in FIG. 7 without any restriction, the HMM converges to a local minimum depending on an initial value of a parameter. This makes it difficult to learn the HMM.
- Therefore, FIGS. 9A and 9B and FIGS. 10A to 10C show examples of sparsely connected HMMs.
- the HMM shown in FIGS. 9A and 9B is a two-dimensional neighborhood restricted HMM.
- the HMM shown in FIG. 10A is an HMM by three-dimensional grid restriction.
- the HMM shown in FIG. 10B is an HMM by two-dimensional random arrangement restriction.
- the HMM shown in FIG. 10C is an HMM by a small world network.
- a display example of a result obtained by giving, in the simple pendulum task, a two-dimensional neighborhood restricted HMM with 484 nodes to the system as an initial structure and causing the system to learn a time series observation signal is shown in FIG. 11 .
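The sparse initial structures above, such as the two-dimensional neighborhood restricted HMM with 484 nodes, can be generated programmatically. Below is a hedged sketch assuming a 22×22 grid of states in which each state may stay put or move to one of its grid neighbours; the function name and the random initialization scheme are illustrative choices, not taken from the patent.

```python
import numpy as np

def grid_hmm_transitions(rows, cols):
    """Initial transition matrix for a 2-D neighborhood restricted HMM.

    Nodes are laid out on a rows x cols grid; each state may remain in
    place or transit to one of its (up to 8) grid neighbours.  Allowed
    entries get a small random value and each row is normalized to sum
    to 1, giving a sparse initial structure for Baum-Welch learning.
    """
    n = rows * cols
    rng = np.random.default_rng(0)
    A = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            # Allow self-transition and transitions to grid neighbours.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        A[i, rr * cols + cc] = rng.uniform(0.1, 1.0)
    # Normalize each row into a proper probability distribution.
    return A / A.sum(axis=1, keepdims=True)
```

With `rows = cols = 22` this yields the 484-node structure mentioned in the text; corner states have 4 allowed transitions, interior states 9, and all other entries stay exactly zero throughout Baum-Welch re-estimation.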
- The abscissa of FIG. 11 indicates the angle θ of the simple pendulum 51 as an observation signal.
- The ordinate of FIG. 11 indicates the angular velocity ω of the simple pendulum 51.
- circles indicate nodes (states).
- a solid line between two circles indicates connection (transition) between two nodes.
- The respective nodes are plotted as circles in a (θ, ω) space on the basis of an average of the true states (θ, ω) of the environment at the times when the respective nodes are perceived by the system (the agent) shown in FIG. 3.
- Among the connections between the nodes, only connections having transition probabilities equal to or larger than 0.01 are displayed as solid lines.
- The learning processing for an HMM in step S 1 is explained above. Subsequently, the recognition processing in step S 2 is explained below.
- the recognition processing is processing for estimating a present state of the system shown in FIG. 3 using the HMM constructed by the learning processing for an HMM in step S 1 .
- the recognition processing is executed by the recognizing unit 33 .
- a result of the recognition processing is used for the controller learning processing in step S 3 explained later.
- the recognition processing is executed as one kind of processing of the behavior control processing in step S 5 explained later separately from the processing in step S 2 (see step S 61 in FIG. 14 ).
- The Viterbi Algorithm is widely used for state estimation for an HMM. Therefore, in this embodiment, it is assumed that the recognition processing is executed as follows: the state fifty steps before the present is set undefined, i.e., the probabilities of the respective nodes are set equal, and that state is set as the initial state; observation results for fifty steps are given; and the states in the respective steps are decided by the Viterbi Algorithm to estimate the state at the last, fiftieth step, i.e., the present state.
- the recognition processing is executed according to a flowchart of FIG. 12 .
- a transition probability from a node “i” to a node “j” is described as aij or Aij.
- An initial state probability is described as πi.
- An observation value (a level of an observation signal) at time t is described as o(t).
- Likelihood of the observation value o(t) at the node “i” is referred to as observation likelihood and described as bi(o(t)).
- Present time is described as T.
- In step S 21, the recognizing unit 33 sets the time t to 0.
- In step S 22, the recognizing unit 33 multiplies the initial state probability πi by the observation likelihood bi(o(0)) and sets the result as the state probability of each of the nodes.
- In step S 23, the recognizing unit 33 multiplies the state probability at time t by the transition probability Aij and the observation likelihood bj(o(t+1)) and updates the state probability of the node “j” at the transition destination to the maximum of these products.
- In step S 24, the recognizing unit 33 stores the node “i” at the transition source at that point in a storage table.
- The construction location of the storage table is not specifically limited. In this embodiment, for example, it is assumed that the storage table is constructed in the inside of the recognizing unit 33.
- In step S 26, the recognizing unit 33 determines whether the time t has reached the present time T.
- When the recognizing unit 33 determines in step S 26 that the time t has not reached the present time T (NO in step S 26), the processing returns to step S 23 and the processing in step S 23 and subsequent steps is repeated.
- When the recognizing unit 33 determines in step S 26 that the time t has reached the present time T (YES in step S 26), the processing proceeds to step S 27.
- In step S 27, the recognizing unit 33 selects the node having the maximum state probability among the state probabilities at time t and sets the node as the decided node at time t.
- In the processing in step S 27 performed immediately after YES is determined in step S 26, the time t is the present time T, so that the decided node at the present time T is obtained.
- In step S 28, the recognizing unit 33 extracts the node “i” at the transition source of the node “j” selected in the processing in step S 27 from the storage table and sets the node “i” as the node at time t−1.
- In step S 30, the recognizing unit 33 determines whether the time t is 0.
- When the recognizing unit 33 determines in step S 30 that the time t is not 0 (NO in step S 30), the processing returns to step S 27 and the processing in step S 27 and subsequent steps is repeated.
- When the recognizing unit 33 determines in step S 30 that the time t is 0 (YES in step S 30), the recognition processing is finished.
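The forward pass (steps S 21 to S 26) and the backtrace (steps S 27 to S 30) described above together form the standard Viterbi Algorithm. A minimal Python sketch follows, assuming state probabilities are kept in log space to avoid underflow over the fifty steps; the function name and array layout are illustrative assumptions, not the patent's own implementation.

```python
import numpy as np

def viterbi(pi, A, B):
    """Most likely state sequence for an HMM.

    pi: initial state probabilities, shape (n,).  When the state fifty
        steps before the present is unknown, pi is simply uniform.
    A:  transition matrix, shape (n, n), A[i, j] = a_ij.
    B:  observation likelihoods, shape (T, n), B[t, i] = b_i(o(t)).
    Returns the decided nodes for t = 0 .. T-1.
    """
    T, n = B.shape
    logA = np.log(A + 1e-300)
    # Step S22: combine initial probabilities with the first observation.
    delta = np.log(pi + 1e-300) + np.log(B[0] + 1e-300)
    psi = np.zeros((T, n), dtype=int)  # the "storage table" of sources
    for t in range(1, T):              # steps S23-S26 (forward pass)
        cand = delta[:, None] + logA   # state prob * transition prob
        psi[t] = cand.argmax(axis=0)   # remember the best source node i
        delta = cand.max(axis=0) + np.log(B[t] + 1e-300)
    states = np.empty(T, dtype=int)
    states[-1] = delta.argmax()        # step S27: decided present node
    for t in range(T - 1, 0, -1):      # steps S28-S30 (backtrace)
        states[t - 1] = psi[t, states[t]]
    return states
```

For the recognition processing described in the text, `T` would be 50 and `pi` uniform; the last entry of the returned sequence is the estimated present state.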
- The recognition processing in step S 2 is explained above. Subsequently, the controller learning processing in step S 3 is explained below.
- the node “i” indicating a state at every time is determined.
- the transition probability Aij from the node “i” to the node “j” indicating a state at the next time is also determined.
- the transition probability Aij is referred to as transition edge Aij as appropriate.
- the upper-case letter “A” is used in such a manner as transition probability Aij (transition edge Aij) in the explanation of the controller learning processing. This is for the purpose of preventing confusion with the lower-case letter “a” in an action a(t) explained later.
- the system shown in FIG. 3 performs some random or innate behavior as explained above.
- a behavior performed by the system shown in FIG. 3 in a state “i” during the innate behavior is referred to as action a(t).
- the action a(t) is abbreviated as action “a” as appropriate.
- a causality model in which the transition edge Aij is caused by the action “a” holds.
- the learning unit 41 of the behavior control unit 24 samples, with respect to each of caused transition edges Aij, the observation value o(t) (hereinafter abbreviated as observation value “o”) and the action “a” at the point when the transition edge Aij is caused.
- As a learning method for this function mapping Fij( ) from the observation value “o” to the action “a”, for example, a method like a neural network can be adopted.
- Alternatively, as a simpler learning method, the function mapping Fij( ) can be learned so as to output an average of the sampled actions “a” regardless of the observation value “o”.
- Such a function mapping Fij( ) is stored in the controller storing unit 43 as a controller to be executed by the action unit 25 .
- A learning result of the controller, i.e., information indicating, for each of the transition edges Aij, which controller (the function mapping Fij( )) corresponds to the transition edge Aij, is stored in the controller-table storing unit 42 in a table format.
- Such a table is referred to as controller table.
- an identifier (ID) for uniquely specifying each of the controllers (function mappings Fij( )) is given to the controller.
- As the information stored in the controller table, an ID of the controller can be adopted. Therefore, in this embodiment, for each of the transition edges Aij, an ID of the controller (the function mapping Fij( )) corresponding to the transition edge Aij is stored in the controller-table storing unit 42 .
- In the controller storing unit 43 , each of the controllers (the function mappings Fij( )) is stored to be tied to an ID thereof. An example of a method of using the ID is described in the explanation of step S 70 in FIG. 14 .
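- The relation among the controller table, the controller IDs, and the function mappings Fij( ) can be sketched as follows, using the simple average-action learning method mentioned above (class and variable names are assumptions for illustration):

```python
from collections import defaultdict

class ControllerLearner:
    """Learns one controller per transition edge Aij by the simple
    method of averaging the actions sampled when the edge fired."""
    def __init__(self):
        self.samples = defaultdict(list)   # (i, j) -> sampled actions "a"
        self.controller_table = {}         # (i, j) -> controller ID
        self.controllers = {}              # ID -> function mapping Fij()

    def observe(self, i, j, o, a):
        # sample the action "a" at the point the edge (i, j) is caused;
        # this average-action controller ignores the observation "o"
        self.samples[(i, j)].append(a)

    def learn(self):
        for cid, (edge, acts) in enumerate(sorted(self.samples.items())):
            mean = sum(acts) / len(acts)
            self.controllers[cid] = lambda o, m=mean: m   # Fij(): o -> a
            self.controller_table[edge] = cid

    def act(self, i, j, o):
        # look up the controller ID for the edge, then invoke Fij(o)
        return self.controllers[self.controller_table[(i, j)]](o)
```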
- In step S 3 , as an example, processing for performing learning that allocates, to each transition of a state, a controller for outputting an action is explained.
- As the controller learning processing according to this embodiment, besides the example explained above, for example, processing for performing learning that allocates, to each transition destination state, a controller for outputting an action can also be adopted.
- The planning processing in step S 4 is explained below.
- When the system shown in FIG. 3 finishes the learning explained above, the system can set an arbitrary target in an internal state formed by the system itself using an HMM and perform behavior for realizing the attainment of the target.
- the planning unit 34 sets up a plan (planning) for realizing the attainment of the target. Processing for setting up such a plan is the planning processing in step S 4 .
- the planning unit 34 sets, as a goal, a target designated from the outside or endogenously obtained in the system.
- the target is provided from the execution managing unit 44 .
- a node indicating a state of the goal is referred to as goal node “g”.
- the planning unit 34 searches for a path connecting these two nodes on an HMM. Processing for searching for such a path from the present state node “i” to the goal node “g” is the planning processing in step S 4 .
- FIG. 13 is a flowchart for explaining an example of the planning processing.
- step S 41 the planning unit 34 sets a state probability of the present state node “i” to 1.0 and sets a state probability of the other nodes to 0.
- the planning unit 34 sets the time t to 0.
- step S 42 the planning unit 34 sets the transition probabilities Aij equal to or higher than a threshold (0.01) to 0.9 and sets the other transition probabilities Aij to 0.
- step S 43 the planning unit 34 multiplies the state probability at time t with the transition probability Aij and updates a maximum probability in the node “j” at the transition destination to a state probability of the node “j”.
- step S 44 the planning unit 34 stores the node “i” at the transition source at that point in the storage table.
- a constructing location for the storage table is not specifically limited. In this embodiment, for example, it is assumed that the storage table is constructed in the planning unit 34 .
- step S 45 the planning unit 34 determines whether a state probability of the goal node “g” as the target has exceeded 0.
- When the planning unit 34 determines in step S 45 that the state probability of the goal node “g” has not exceeded 0 (NO in step S 45 ), the processing proceeds to step S 46 .
- step S 46 the planning unit 34 determines whether loop processing from step S 43 to step S 47 has been repeated N times.
- the N times repetition means that the state probability has not reached the target yet even if the steps are repeated N times. Therefore, in such a case, i.e., when the planning unit 34 determines in step S 46 that the loop processing has been repeated N times (YES in step S 46 ), the planning processing is finished on the assumption that the planning unit 34 has given up the planning.
- When the planning unit 34 determines in step S 46 that the loop processing has not been repeated N times (NO in step S 46 ), the processing proceeds to step S 47 .
- The loop processing from step S 43 to step S 47 is repeated several times in this way.
- the planning unit 34 determines in step S 45 that the state probability of the goal node “g” has exceeded 0 (YES in step S 45 ).
- the processing proceeds to step S 48 .
- step S 48 the planning unit 34 selects the goal node “g”.
- step S 49 the planning unit 34 sets the goal node “g” equal to the node “j”.
- step S 50 the planning unit 34 extracts the node “i” at the transition source of the selected node “j” from the storage table and sets the node “i” as a node at time t ⁇ 1.
- step S 51 the planning unit 34 decrements the time t by 1.
- step S 52 the planning unit 34 determines whether the time t is 0.
- When the planning unit 34 determines in step S 52 that the time t is not 0 (NO in step S 52 ), the processing proceeds to step S 53 .
- step S 53 the planning unit 34 sets the node “j” equal to the node “i”. Thereafter, the processing is returned to step S 50 and the processing in step S 50 and subsequent steps is repeated.
- The loop processing from step S 50 to step S 53 is repeated until the time t reaches 0.
- the planning unit 34 determines in step S 52 that the time t is 0 (YES in step S 52 ).
- the planning processing is finished.
- A node sequence formed at this point, i.e., a node sequence from the present state node “i” to the goal node “g”, is decided as the path.
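- The planning processing of FIG. 13 can be sketched as follows: transition probabilities at or above the threshold are binarized to 0.9 (step S 42 ), state probabilities are propagated from the present state node while transition sources are recorded (steps S 43 to S 47 ), and the path is recovered by backtracing from the goal node (steps S 48 to S 53 ). Variable names and the loop bound are illustrative:

```python
import numpy as np

def plan(A, start, goal, thresh=0.01, max_repeat=100):
    """Sketch of the planning processing of FIG. 13."""
    Ab = np.where(A >= thresh, 0.9, 0.0)   # binarize transitions (step S42)
    prob = np.zeros(A.shape[0])
    prob[start] = 1.0                      # present state node (step S41)
    back = []                              # storage tables (step S44)
    for _ in range(max_repeat):            # give up after N repeats (step S46)
        cand = prob[:, None] * Ab
        prob = cand.max(axis=0)            # maximum probability at node j (S43)
        back.append(cand.argmax(axis=0))
        if prob[goal] > 0:                 # goal reached (step S45)
            break
    else:
        return None                        # planning given up
    path = [goal]                          # backtrace (steps S48 to S53)
    for table in reversed(back):
        path.append(int(table[path[-1]]))
    return path[::-1]
```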
- The planning processing in step S 4 is explained above. Subsequently, the behavior control processing in step S 5 is explained below.
- FIG. 14 is a flowchart for explaining an example of the behavior control processing by the behavior control unit 24 , i.e., processing of behavior control by the behavior control unit 24 on the basis of the path (the node sequence) calculated in the processing in step S 4 .
- In step S 61 , the execution managing unit 44 of the behavior control unit 24 performs recognition processing for an HMM and selects a node having a highest state probability among all the nodes as a node i_max.
- processing conforming to the flowchart of the example shown in FIG. 12 is executed as the recognition processing for an HMM.
- In FIG. 14 , the operation entity of the recognition processing for an HMM is described as the execution managing unit 44 ; in practice, however, the operation entity is the recognizing unit 33 .
- In other words, the recognizing unit 33 performs the recognition processing for an HMM, and the execution managing unit 44 selects the node i_max on the basis of a result of the processing.
- In step S 62 , the execution managing unit 44 selects, as the present node i_pathmax, a node having a highest state probability among the nodes on the path between the last node i_pathmax and the goal node.
- step S 63 the execution managing unit 44 determines whether a ratio of state probabilities P(i_max) and P(i_pathmax) is equal to or smaller than a threshold (e.g., equal to or smaller than 0.7).
- The state probability P(i_max) indicates a state probability of the node i_max, and the state probability P(i_pathmax) indicates a state probability of the node i_pathmax.
- When the execution managing unit 44 determines in step S 63 that the ratio is equal to or smaller than the threshold (YES in step S 63 ), the behavior control processing is finished.
- When the execution managing unit 44 determines in step S 63 that the ratio is not equal to or smaller than the threshold (NO in step S 63 ), the processing proceeds to step S 64 .
- In step S 64 , the execution managing unit 44 determines whether the system stays in the same node i_pathmax, i.e., whether the node i_pathmax selected in the present processing in step S 62 and the node i_pathmax selected in the last processing in step S 62 are the same.
- When the execution managing unit 44 determines in step S 64 that the system does not stay in the same node i_pathmax (NO in step S 64 ), the processing proceeds to step S 68 . Processing in step S 68 and subsequent steps is explained later.
- When the execution managing unit 44 determines in step S 64 that the system stays in the same node i_pathmax (YES in step S 64 ), the processing proceeds to step S 65 .
- In step S 65 , the execution managing unit 44 determines whether a state probability of the next node i_next on the path rises to be higher than the last state probability.
- When the state probability of the next node i_next does not rise, the system is regarded as not transitioning along the path. In other words, when the execution managing unit 44 determines in step S 65 that the state probability of the next node i_next does not rise to be higher than the last state probability (NO in step S 65 ), in step S 66 the execution managing unit 44 sets the node i_pathmax as the node i_next. Thereafter, the processing proceeds to step S 68 . Processing in step S 68 and subsequent steps is explained later.
- On the other hand, when the execution managing unit 44 determines in step S 65 that the state probability of the next node i_next rises to be higher than the last state probability (YES in step S 65 ), the processing proceeds to step S 67 .
- In step S 67 , the execution managing unit 44 determines whether the system stays in the same node the number of times equal to or larger than N (e.g., fifty).
- When the execution managing unit 44 determines in step S 67 that the system does not stay in the same node the number of times equal to or larger than N (NO in step S 67 ), the processing proceeds to step S 68 . Processing in step S 68 and subsequent steps is explained later.
- When the execution managing unit 44 determines in step S 67 that the system stays in the same node the number of times equal to or larger than N (YES in step S 67 ), the processing proceeds to step S 66 .
- In step S 66 , the execution managing unit 44 sets the node i_pathmax as the node i_next. In other words, when the system stays in the same node the number of times equal to or larger than N, the execution managing unit 44 regards that the path is to be advanced by force. Thereafter, the processing proceeds to step S 68 .
- step S 68 the execution managing unit 44 determines whether the system is already on the goal node.
- When the execution managing unit 44 determines in step S 68 that the system is already on the goal node (YES in step S 68 ), assuming that the system has reached the target, the execution managing unit 44 finishes the behavior control processing.
- When the execution managing unit 44 determines in step S 68 that the system is not on the goal node (NO in step S 68 ), the processing proceeds to step S 69 .
- In step S 69 , the execution managing unit 44 decides the transition edge Aij for transitioning to the next node on the path.
- In step S 70 , the execution managing unit 44 invokes the controller (the function mapping Fij( )) allocated to the transition edge Aij.
- The action unit 25 gives the present observation value “o” to the controller to calculate the action “a” that should be performed.
- Specifically, an ID of the controller (the function mapping Fij( )) allocated to the transition edge Aij is read out from the controller-table storing unit 42 .
- the controller (the function mapping Fij( )) specified by the ID is read out from the controller storing unit 43 .
- An output obtained as a result of inputting the present observation value “o” to the function mapping Fij( ) as the controller is the action “a”.
- the action “a” is provided to the action unit 25 as a command. Therefore, in step S 71 , the action unit 25 executes the command “a”.
- Thereafter, the processing in step S 61 and subsequent steps is repeated.
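- The loop of FIG. 14 can be sketched as follows. Here recognize() stands for step S 61 (returning the node i_max and all state probabilities), controllers maps each transition edge Aij to its function mapping Fij( ), and execute() performs the command; these callables and the bookkeeping details are illustrative assumptions:

```python
def behavior_control(recognize, path, controllers, execute, get_obs,
                     ratio_thresh=0.7, stay_limit=50):
    """Sketch of the behavior control processing of FIG. 14."""
    idx, last_idx, stay = 0, -1, 0           # idx: position of i_pathmax
    while True:
        i_max, p = recognize()               # step S61
        # step S62: highest-probability node on the path from i_pathmax on
        idx = max(range(idx, len(path)), key=lambda k: p[path[k]])
        if p[path[idx]] <= ratio_thresh * p[i_max]:
            return False                     # strayed off the path (step S63)
        if path[idx] == path[-1]:
            return True                      # already on the goal node (step S68)
        stay = stay + 1 if idx == last_idx else 0   # step S64: same node?
        last_idx = idx
        if stay >= stay_limit:               # step S67: advance by force
            idx, stay = idx + 1, 0
            continue
        i, j = path[idx], path[idx + 1]      # step S69: edge Aij to next node
        execute(controllers[(i, j)](get_obs()))   # steps S70 and S71
```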
- When the execution managing unit 44 determines in step S 68 that the system is already on the goal node (YES in step S 68 ) and the behavior control processing ends, the execution managing unit 44 may determine again whether the node i_max at that point is truly the goal node. When a result of the re-determination indicates that the node i_max is the goal node, the entire control processing for the simple pendulum shown in FIG. 5 is finished. On the other hand, when the result of the re-determination indicates that the node i_max is not the goal node, the system shown in FIG. 3 returns the processing to step S 4 . The system performs the behavior control processing in step S 5 again after executing the planning processing again for the same goal node and creating a new path.
- FIG. 15 is a functional block diagram of an information processing system according to an embodiment of the present invention (hereinafter simply referred to as system shown in FIG. 15 ) having a functional configuration example different from that in the system shown in FIG. 3 .
- The system shown in FIG. 15 includes a sensor unit 61 , three modeling units 62 A to 62 C, a causality unit 63 , a behavior control unit 64 , and an action unit 65 .
- the sensor unit 61 is configured as a so-called multi-modal sensor.
- the multi-modal sensor is briefly explained below.
- As one of the concepts obtained by expanding human interfaces in the past, there is the concept of the multi-modal interface.
- As a term similar to the multi-modal interface, for example, there is a term called multi-media interface.
- The multi-media interface represents an interface that simply uses plural media (sound, video, tactile sense, etc.).
- In contrast, such an interface is referred to as multi-modal interface when the respective media are used in various forms and perform information transmission.
- As the multi-modal interface, there is, for example, an interface that sets events such as utterance, action, and line of sight as modals, causes these modals to cooperate with one another, simultaneously uses the modals, and combines plural kinds of messages to cause a human to understand a message that the human originally intends to communicate or that is naturally transmitted.
- the multi-modal sensor is a sensor for realizing such a multi-modal interface and is a sensor that can detect a physical amount corresponding to each of the plural modals (events).
- The sensor unit 61 observes, for each of three modals, a predetermined physical amount of an environment in which the agent is placed, i.e., a physical amount corresponding to the modal, and provides the modeling units 62 A, 62 B, and 62 C with a result of the observation as an observation signal.
- Each of the modeling units 62 A, 62 B, and 62 C has a function and a configuration basically same as those of the modeling unit 22 shown in FIG. 3 .
- Concerning the modeling unit 62 A, each of a learning unit 71 A, an HMM storing unit 72 A, a recognizing unit 73 A, and a planning unit 74 A has a function basically same as that of each of the learning unit 31 , the HMM storing unit 32 , the recognizing unit 33 , and the planning unit 34 shown in FIG. 3 .
- the modeling unit 62 B includes a learning unit 71 B, an HMM storing unit 72 B, a recognizing unit 73 B, and a planning unit 74 B having functions and configurations basically same as those of each of the learning unit 31 , the HMM storing unit 32 , the recognizing unit 33 , and the planning unit 34 shown in FIG. 3 .
- the modeling unit 62 C includes a learning unit 71 C, an HMM storing unit 72 C, a recognizing unit 73 C, and a planning unit 74 C having functions and configurations basically same as those of the learning unit 31 , the HMM storing unit 32 , the recognizing unit 33 , and the planning unit 34 shown in FIG. 3 .
- Respective HMMs constructed as a result of learning performed by using observation signals for the respective three modals of the sensor unit 61 , i.e., HMMs of the three modals, are stored in the HMM storing units 72 A to 72 C.
- the modals to be modeled by modeling units 62 A to 62 C are referred to as modals A to C.
- respective HMMs of the modals A to C are stored in the HMM storing units 72 A to 72 C.
- The number of modals is not limited to three and only has to be equal to or larger than two. However, in that case, as many modeling units corresponding to the modeling unit 62 A as the number of modals are present.
- the causality unit 63 includes a causality learning unit 75 , a causality-table storing unit 76 , and a causality estimating unit 77 .
- The causality learning unit 75 learns a relation between node transition, which is recognized by a recognizing unit 73 K on the basis of the structure of an HMM of a modal K (K is any one of A to C), and a state of an HMM of another modal L (L is any one of A to C other than K). A result of the learning is stored in the causality-table storing unit 76 . Details of processing by the causality learning unit 75 are explained later.
- the behavior control unit 64 includes an execution managing unit 78 and a controller unit 79 .
- the controller unit 79 includes a controller-table storing unit 80 and a controller storing unit 81 .
- the controller-table storing unit 80 and the controller storing unit 81 have functions and configurations basically same as those of the controller-table storing unit 42 and the controller storing unit 43 shown in FIG. 3 .
- The execution managing unit 78 determines the modal K corresponding to the target and provides the modeling unit 62 K with the target.
- a planning unit 74 K of the modeling unit 62 K plans a path according to the target and provides the execution managing unit 78 with the path.
- the execution managing unit 78 controls the action unit 65 such that the system (the agent) shown in FIG. 15 behaves along the path.
- In order to realize the path, the execution managing unit 78 inquires of the causality estimating unit 77 about a cause node that is a cause of transition.
- the causality estimating unit 77 estimates the cause node and a cause modal and provides the execution managing unit 78 with the cause node and the cause modal.
- the cause node and the cause modal are explained later.
- The execution managing unit 78 then inquires the controller unit 79 and outputs a command corresponding to the controller. If the cause node is a node on an HMM of another modal L, the execution managing unit 78 recursively inquires the planning unit 74 L about a path with the node set as a target. Details of the series of processing by the execution managing unit 78 are explained later.
- the action unit 65 performs a predetermined behavior according to a command from the behavior control unit 64 .
- The system shown in FIG. 15 is explained more in detail below with reference to an example in which a multi-modal task is given as a task.
- the multi-modal task has a purpose of allowing a round mobile robot 85 to freely move within an area surrounded by a wall 86 as shown in FIG. 16 .
- a point 87 indicates that a light source is present there.
- FIG. 16 is a diagram of an external appearance of the simulator.
- A prototype of the simulator shown in FIG. 16 adopted in this experiment is disclosed in a document “Olivier Michel. Khepera Simulator Package version 2.0: Freeware mobile robot simulator written at the University of Nice Sophia-Antipolis by Olivier Michel. Downloadable from the World Wide Web at http://wwwi3s.unice.fr/⁓orn/khep-sim.html” (Document A).
- the prototype is referred to above because the simulator adopted this time is not the simulator per se disclosed in the document but a simulator incorporating observation signals and actions shown in FIG. 17 .
- the robot 85 is mounted with, as the sensor unit 61 , an energy sensor 61 C in addition to a distance sensor 61 A that detects a distance to the wall 86 and an optical sensor 61 B that detects brightness of light.
- the robot 85 can move by driving left and right wheels.
- On the assumption that the distance sensor 61 A is attached in twenty-four directions around the robot 85 , the distance sensor 61 A outputs values corresponding to distances to the wall 86 in the respective twenty-four directions as an observation signal.
- In FIG. 17 , bar graphs with numbers 1 to 24 respectively represent signal intensities (instantaneous values) of the observation signals in the twenty-four directions.
- the optical sensor 61 B On the assumption that the optical sensor 61 B is attached in twenty-four directions (same as the directions of the distance sensor 61 A) around the robot 85 , the optical sensor 61 B outputs values corresponding to brightness of light in the respective twenty-four directions as an observation signal.
- the values of the observation signal are adapted to be not only values in one direction but also values affected by the sensors around the robot 85 .
- Bar graphs with numbers 25 to 48 respectively represent signal intensities (instantaneous values) of the observation signal in the twenty-four directions.
- the energy sensor 61 C observes energy defined as explained below and outputs an observation value of the energy as an observation signal.
- the energy is consumed in proportion to a movement amount and supplied in proportion to an amount of irradiated light.
- a bar graph with a number 49 represents signal intensity (instantaneous value) of the observation signal.
- As an action of the robot 85 , a command of a movement amount is adopted. Specifically, a command (Δx, Δy) for movement along the abscissa and the ordinate on the simulator shown in FIG. 16 (hereinafter referred to as movement command) is adopted.
- Δx is a movement command in an x direction (a horizontal direction in FIG. 16 ) and Δy is a movement command in a y direction (a vertical direction in FIG. 16 ).
- As explained above, the robot 85 has a detection function realized by using the twenty-four-dimensional distance sensor 61 A, the twenty-four-dimensional optical sensor 61 B, and the one-dimensional energy sensor 61 C.
- the robot 85 also has input and output functions for two-dimensional movement command.
- The robot 85 is an agent controlled by the system shown in FIG. 15 . Therefore, the robot 85 exercises these various functions to make it possible to self-organize an internal state and arbitrarily control the internal state.
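- The observation and action model of the robot 85 just described can be sketched as follows. The proportionality constants and names are assumptions; only the structure (a movement command (Δx, Δy), energy consumed in proportion to the movement amount and supplied in proportion to the amount of irradiated light) follows the text:

```python
import math

class SimRobot:
    """Illustrative sketch of the simulated robot's action and energy
    dynamics (constants and names are assumptions, not from the text)."""
    def __init__(self, x=0.0, y=0.0, energy=1.0,
                 consume_rate=0.01, supply_rate=0.05):
        self.x, self.y, self.energy = x, y, energy
        self.consume_rate, self.supply_rate = consume_rate, supply_rate

    def step(self, dx, dy, light_amount):
        # apply the movement command (dx, dy)
        self.x += dx
        self.y += dy
        # energy is consumed in proportion to the movement amount
        # and supplied in proportion to the amount of irradiated light
        movement = math.hypot(dx, dy)
        self.energy += (self.supply_rate * light_amount
                        - self.consume_rate * movement)
        return self.energy
```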
- the system shown in FIG. 15 executes the learning processing for an HMM in the same manner as the processing in step S 1 of the control processing for the simple pendulum task shown in FIG. 5 .
- However, the learning processing for an HMM executed by the system shown in FIG. 15 is different from the learning processing shown in FIG. 5 in the respects explained below.
- The system shown in FIG. 15 (the agent as the robot 85 ) performs a behavior on the basis of a random or simple innateness rule (e.g., when the system moves in a certain direction and bumps against the wall 86 , the system changes the direction). It is assumed that, when the behavior based on the innateness rule is performed, the innateness controller 23 shown in FIG. 3 is provided in the system shown in FIG. 15 as well.
- In the learning processing shown in FIG. 5 , learning for an HMM is performed by using only one time series observation signal (the time series signal of the angle θ) as observation information.
- On the other hand, the robot 85 has a detection function realized by the twenty-four-dimensional distance sensor 61 A, the twenty-four-dimensional optical sensor 61 B, and the one-dimensional energy sensor 61 C. Therefore, the learning processing for an HMM is performed for each of three kinds of observation signals, i.e., an observation signal (distance) of the distance sensor 61 A, an observation signal (light) of the optical sensor 61 B, and an observation signal (energy) of the energy sensor 61 C.
- The learning processing for an HMM concerning one observation signal is basically the same as the learning processing for an HMM in the control processing for the simple pendulum task shown in FIG. 5 .
- the modeling unit 62 A constructs an HMM for distance and stores the HMM in the HMM storing unit 72 A.
- the modeling unit 62 B constructs an HMM for light and stores the HMM in the HMM storing unit 72 B.
- the modeling unit 62 C constructs an HMM for energy and stores the HMM in the HMM storing unit 72 C.
- A display example of a learning processing result for an HMM by the modeling unit 62 A, i.e., a display example of a result obtained by giving, as an initial structure, a two-dimensional neighborhood structure HMM with 400 nodes to the system and causing the system to learn a time series of the observation signal (distance) of the distance sensor 61 A, is shown in FIG. 18A .
- A display example of a learning processing result for an HMM by the modeling unit 62 B, i.e., a display example of a result obtained by giving, as an initial structure, a two-dimensional neighborhood structure HMM with 100 nodes to the system and causing the system to learn a time series of the observation signal (light) of the optical sensor 61 B, is shown in FIG. 18B .
- A display example of a learning processing result for an HMM by the modeling unit 62 C, i.e., a display example of a result obtained by giving, as an initial structure, a two-dimensional neighborhood structure HMM with 100 nodes to the system and causing the system to learn a time series of the observation signal (energy) of the energy sensor 61 C, is shown in FIG. 18C .
- In FIG. 18A , nodes (white hollow circles) are plotted in average positions where the robot 85 is present when the respective nodes are recognized.
- the abscissa indicates a distance in the horizontal direction x and the ordinate indicates a distance in the vertical direction y.
- In FIG. 18B , nodes are plotted in average positions where the robot 85 is present when the respective nodes are recognized.
- the abscissa indicates a distance in the horizontal direction x and the ordinate indicates a distance in the vertical direction y.
- a center position i.e., a coordinate (0,0) indicates a position of the point 87 as a light source.
- the coordinate (0,0) does not mean a position of specific one point 87 but means a position of any one of three points 87 shown in FIG. 16 .
- In FIG. 18C , nodes are plotted on a space of a value of energy (the ordinate) and a distance (the abscissa) to the light (the point 87 as the light source) closest to an average position where the robot 85 is present.
- the HMM for distance shown in FIG. 18A is represented as a topological network of a maze configuration.
- The method of plotting shown in FIG. 18C is a method of plotting with a distance to the light (a distance to the point 87 ) set on the abscissa. Therefore, it is seen that a network is formed in which, when close to the light, state transition is formed in a direction in which energy rises and, on the other hand, when distant from the light, a direction of state transition is determined in a direction in which energy falls, i.e., a so-called ladder-type network.
- When the multi-modal task as the target is considered only in terms of an HMM for distance and an action (a command) and controlled to be in an arbitrary state, a state shown in FIG. 19 is obtained.
- the behavior control processing can be realized by an idea same as that of the simple pendulum task. In other words, in this case, the system shown in FIG. 15 only has to execute steps S 2 to S 5 in FIG. 5 .
- Transition of the HMM for energy shown in FIG. 18C depends on a distance relation between the light sources (the points 87 ) on the simulator shown in FIG. 16 and the robot 85 . Therefore, the transition of the HMM for energy shown in FIG. 18C has no relation with a moving action indicating in which direction the robot 85 moves at a certain instant. However, for an internal state like a position in the maze represented by the HMM for distance shown in FIG. 18A , a high relation is present between a moving action of the robot 85 and a transitioning node.
- the causality unit 63 is provided in order to realize a function with which the agent (the robot 85 ) can autonomously find and control a relation between an internal state and a behavior even in such a case.
- the causality unit 63 can execute processing explained below instead of steps S 2 and S 3 in FIG. 5 in order to attain the target of the multi-modal task.
- One presently-recognized node is decided according to recognition results in the respective HMMs shown in FIGS. 18A to 18C .
- As a recognition result in a unit HMM, for example, a result of the recognition processing shown in FIG. 12 concerning the unit HMM can be adopted.
- an action performed at that time can be treated as one modal by being discretized.
- Hereinafter, such a modal is referred to as action modal.
- A state of the action modal is referred to as action state.
- A state of the HMM at time t including the action state is described as s k,i (t).
- Here, “k” indicates a modal number.
- “i” indicates an index representing a state in the modal.
- P(s k,j (t+1)) = Σ i,l P(s k,j | s k,i , s m,l ) P(s k,i (t)) P(s m,l (t)) (1)
- Formula (1) indicates that the next state of a certain modal depends on the present state and a state S m,l of certain another modal.
- This “certain modal” is referred to as cause modal.
- A present state node in the cause modal is referred to as cause node.
- Formula (1) indicates a simple behavior result model in which a node transitioned from the present state node (the cause node) changes according to a behavior (an action) performed at time t.
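- The dependence expressed by formula (1), in which the next state of a modal k depends on its present state s k,i and a state s m,l of the cause modal m, can be evaluated as follows (the conditional probability table T and all names are illustrative assumptions):

```python
import numpy as np

def next_state_dist(T, p_k, p_m):
    """T[i, l, j] = P(s_k,j | s_k,i, s_m,l): probability of the next
    state j of modal k given its present state i and the cause state l
    of modal m. p_k[i] = P(s_k,i(t)), p_m[l] = P(s_m,l(t))."""
    # marginalize over the present state i and the cause state l
    return np.einsum('ilj,i,l->j', T, np.asarray(p_k), np.asarray(p_m))
```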
- Finding a cause modal and a cause node concerning node transition of the respective modals is referred to as causality estimation. Since the causality estimation is explained in detail later, only an overview of the causality estimation is explained below.
- The causality estimation means, when a transition occurs in a certain modal, counting states of the other modals recognized at that point and deducing a state that occurs simultaneously with the transition at a high frequency. This makes it possible to find cause modals and cause nodes corresponding to respective transitions.
- the causality learning unit 75 finds a cause modal and a cause node corresponding to each of the transitions by performing such causality estimation for each of the transitions.
- the cause modal and the cause node for each of the transitions are stored in the causality-table storing unit 76 as a table. In the following explanation, such a table is referred to as causality table.
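- The counting described above can be sketched as follows; the causality table then maps each transition to its deduced cause modal and cause node (class and variable names are assumptions):

```python
from collections import Counter, defaultdict

class CausalityLearner:
    """Sketch of causality estimation: whenever a transition (i, j)
    occurs in modal k, count the states of the other modals recognized
    at that point; the most frequent co-occurring state is deduced as
    the cause node and its modal as the cause modal."""
    def __init__(self):
        # (modal k, i, j) -> Counter over (modal m, state l)
        self.counts = defaultdict(Counter)
        self.causality_table = {}

    def observe(self, k, i, j, other_states):
        # other_states: {modal m: recognized state l} for modals m != k
        for m, l in other_states.items():
            self.counts[(k, i, j)][(m, l)] += 1

    def estimate(self):
        # fill the causality table with the highest-frequency cause
        for trans, counter in self.counts.items():
            self.causality_table[trans] = counter.most_common(1)[0][0]
        return self.causality_table
```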
- In FIG. 20 , movement of the mobile robot 85 in the simulator shown in FIG. 16 is assumed as a task. Processing of the system shown in FIG. 15 in the case of one modal of only distance is shown in the figure.
- In FIG. 20 and in FIGS. 21 and 24 referred to later, as actions (behaviors) of the robot 85 (the agent), only moving actions in the four directions E (east), W (west), S (south), and N (north) are adopted.
- Step S 81 A is processing of self-organization of an internal state by structure learning of an HMM for distance.
- Step S 82 A is processing of estimating, i.e., counting actions that cause respective state transitions.
- Step S 83 A is processing of generating a path.
- Step S 84 A is execution processing for an action.
- the following indicates processing of the system shown in FIG. 15 at the time when an energy modal is also present in addition to distance.
- step S 81 B the system shown in FIG. 15 acquires HMMs independently for respective modals.
- an HMM for distance and an HMM for energy are acquired.
- step S 82 B the system shown in FIG. 15 generates an “(extended) cause state—result transition model” shown in FIG. 21 .
- assuming that an action is one of the states (an action state), the system counts, as a cause state, a state that causes transitions of the respective modals. For example, as shown in FIG. 21 , when specific transition on the HMM for distance typically occurs while the system is in the action state for moving north (N), that action state is counted. For example, in the HMM for energy, when energy typically increases in a location where food is present, the state of the food in the HMM for distance is counted.
- the system shown in FIG. 15 can finish the learning at that stage, set an arbitrary target in the internal state formed by the system itself, and perform a behavior for realizing attainment of the target.
- the system shown in FIG. 15 sets up a plan (planning) for realizing the attainment of the target.
- Such processing for setting up a plan is planning processing.
- this planning processing is different from the planning processing executed in step S 4 of the control processing for the simple pendulum task shown in FIG. 5 . Therefore, in the following explanation, planning processing performed in a multi-modal task is specifically referred to as multi-stage planning processing.
- the system shown in FIG. 15 executes behavior control processing according to a result of the multi-stage planning processing.
- this behavior control processing is different from the behavior control processing executed in step S 5 of the control processing for the simple pendulum task shown in FIG. 5 . Therefore, behavior control processing performed in a multi-modal task is specifically referred to as multi-stage behavior control processing.
- the planning unit 74 K of the modeling unit 62 K sets, as a goal, a target designated from the outside or endogenously obtained in the system.
- a predetermined state (node) in a predetermined modal is set as the goal.
- a goal modal and a goal state are set.
- the modeling unit 62 K executes, for example, planning processing conforming to the flowchart of FIG. 13 . Consequently, a path from a present state node (a start node) to a goal node in the modal K is generated.
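As an illustration of what the planning processing produces (a sketch, not the patent's own algorithm of FIG. 13), a path from a start node to a goal node over the valid state transitions of one modal can be found with a breadth-first search; all names here are hypothetical:

```python
from collections import deque

def plan_path(transitions, start, goal):
    """Breadth-first search over valid state transitions, returning the
    shortest sequence of states from start to goal (None if unreachable)."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in transitions.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

# Hypothetical sparse transition structure on the HMM of one modal.
transitions = {"s1": ["s2", "s4"], "s2": ["s3"], "s4": ["s5"],
               "s5": ["s6"], "s3": ["s6"]}
print(plan_path(transitions, "s1", "s6"))  # ['s1', 's2', 's3', 's6']
```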
- the planning unit 74 C executes the planning processing for the modal C for energy, a path shown on the right side of FIG. 22 is set.
- the behavior control unit 64 can execute multi-stage behavior control processing explained below.
- the execution managing unit 78 of the behavior control unit 64 acquires, from the causality estimating unit 77 of the causality unit 63 , cause modals and cause nodes allocated to respective transitions on the path from the start node to the goal node.
- the causality estimating unit 77 receives notification of predetermined transition from the execution managing unit 78
- the causality estimating unit 77 finds and extracts a cause modal and a cause node allocated to the predetermined transition from the causality-table storing unit 76 and provides the execution managing unit 78 with the cause modal and the cause node.
- the execution managing unit 78 can acquire a command corresponding to the cause node from the controller unit 79 and provide the action unit 65 with the command. Therefore, in this case, the execution managing unit 78 only has to execute behavior control processing conforming to the flowchart of FIG. 14 .
- when the cause modal is not an action modal, it is necessary to transition the present state of the cause modal to the cause node.
- the cause modal is the modal B for light.
- the execution managing unit 78 requests a modeling unit 62 L for the cause modal L to perform planning processing.
- the planning unit 74 L of the modeling unit 62 L executes planning processing from the present state node to the cause node and notifies the execution managing unit 78 of a result of the execution, i.e., a path.
- for example, the planning unit 74 B of the modeling unit 62 B for the modal B for light executes the planning processing from the present state node to the cause node as shown on the left side of FIG. 22 and notifies the execution managing unit 78 of a result of the execution, i.e., a path.
- the execution managing unit 78 acquires, from the causality estimating unit 77 of the causality unit 63 , cause modals and cause nodes allocated to respective transitions on the notified path.
- the execution managing unit 78 recursively invokes the cause modals and the cause nodes in this way. At a stage when the execution managing unit 78 reaches an action modal that the agent can directly output, the execution managing unit 78 determines an action (a command) at that time and provides the action unit 65 with the action.
- the execution managing unit 78 returns to the original modal and executes the behavior control processing in the modal.
- the execution managing unit 78 returns to the modal C for energy on the right side of FIG. 22 . Transition of the present state node on the HMM occurs.
- when the execution managing unit 78 finally reaches the goal node given first (in the example shown in FIG. 22 , a goal node on the HMM for energy on the right), the target is attained.
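The recursive descent through cause modals described above can be sketched as follows; this is a hypothetical illustration (the function names, the depth guard, and the toy causality table are all assumptions, not taken from the patent):

```python
def resolve_action(transition, causality_table, plan, is_action_modal, depth=0):
    """Recursively descend through cause modals until an action modal is
    reached; returns the action (command) that should be executed to cause
    `transition`.

    causality_table: transition -> (cause_modal, cause_node)
    plan: function (modal, goal_node) -> first transition on a path to goal
    """
    if depth > 10:  # guard against cyclic causality
        raise RuntimeError("causality recursion too deep")
    cause_modal, cause_node = causality_table[transition]
    if is_action_modal(cause_modal):
        return cause_node  # directly executable command
    # Otherwise plan within the cause modal toward the cause node and recurse.
    sub_transition = plan(cause_modal, cause_node)
    return resolve_action(sub_transition, causality_table, plan,
                          is_action_modal, depth + 1)

# Toy setup: an energy transition is caused by a node on the light modal,
# which in turn is caused by a "move north" action.
causality = {
    ("energy", "e1", "e2"): ("light", "l_near"),
    ("light", "l_far", "l_near"): ("action", "N"),
}
plans = {("light", "l_near"): ("light", "l_far", "l_near")}
cmd = resolve_action(("energy", "e1", "e2"), causality,
                     lambda m, g: plans[(m, g)], lambda m: m == "action")
print(cmd)  # N
```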
- in a large number of problems in the real world, plural cause modals and cause nodes are present.
- any one of the light sources may be a cause. Since energy can be sufficiently obtained around light, any one of nodes near the light may be a cause.
- the system shown in FIG. 15 selects, in setting up a plan for a cause node, the path that it reaches first. In this way, the system can select an appropriate cause node and an appropriate path. Specifically, first, the system selects one of the cause modals.
- the system shown in FIG. 15 sets all cause nodes to be candidates in the cause modal as goal nodes and plans paths from the present state node to the goal nodes.
- This planning is basically realized by execution of the planning processing shown in FIG. 13 .
- this reaching determination is applied to all the goal nodes at once. With this method, it is possible to select the goal node reached first and the path to that goal node.
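The "first goal reached wins" selection can be sketched as a single breadth-first search whose reaching condition is checked against every candidate goal node; this is an illustrative reconstruction with made-up names and data:

```python
from collections import deque

def plan_to_nearest(transitions, start, goals):
    """Single breadth-first search that checks the reaching condition against
    all candidate goal nodes at once, so the goal node reached first (and the
    path to it) is selected."""
    goals = set(goals)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] in goals:  # first goal reached wins
            return path[-1], path
        for nxt in transitions.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None, None

transitions = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "e": ["f"]}
goal, path = plan_to_nearest(transitions, "a", ["f", "d"])
print(goal, path)  # d ['a', 'b', 'd']
```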
- a state near a certain light source (in the example shown in FIG. 16 , the point 87 ) on the HMM for light is set as a goal.
- the system shown in FIG. 15 performs path search (planning processing) on the HMM for light.
- when the present state node is near light and brightness is sensed, it is known in which direction the brightness of the light changes. Therefore, transition to nodes around the light is associated with an action as in the case of the HMM for distance.
- when the present position is in a state in which light is unseen, it is unknown to the robot 85 as the agent in which direction the robot 85 should move to see the light.
- such transition (in the example shown in FIG. 16 , transition toward any one of the three points 87 ) has a high correlation with each of the nodes at the south end of the three light sources (in the example shown in FIG. 16 , the three points 87 ) on the HMM for distance. Therefore, the system shown in FIG. 15 sets the nodes on the south end side as cause nodes, respectively, as explained above, executes the planning processing for the HMM for distance, and executes the control processing. In this way, transition occurs in the HMM for light. In executing the planning processing when plural goal nodes are present in this way, in step S 45 of FIG. 13 , the system shown in FIG. 15 can calculate, simply by checking the reaching conditions for the plural goal nodes, a path that reaches any one of the plural goal nodes first.
- the robot 85 as the agent can calculate a path to an outer edge of light on the HMM for distance and move to a nearest light source (in the example shown in FIG. 16 , the point 87 ).
- the robot 85 can move to a position relative to the light as a target of transition in the HMM for light.
- the HMM for energy is not directly related to an action concerning any transition. If the causality estimation is performed well, transition in a direction in which energy rises has high causality with nodes near the light sources (in the example shown in FIG. 16 , the points 87 ) represented by the HMM for light and nodes near the positions of the light sources (in the example shown in FIG. 16 , the points 87 ) represented in the HMM for distance. Since there are three light sources, causality is divided into three on the HMM for distance, whereas the representation is the same for all lights in the HMM for light. Therefore, the transition has higher causality to nodes on the HMM for light.
- step S 81 C a target for increasing energy is given.
- step S 82 C the system shown in FIG. 15 creates a path through which energy sequentially rises in the HMM for energy.
- the system shown in FIG. 15 generates, on the basis of causality of transition on the HMM for energy, a path that approaches the light source (in the example shown in FIG. 16 , the points 87 ) on the HMM for light. If necessary, the system shown in FIG. 15 creates a path approaching a nearest light source (in the example shown in FIG. 16 , the point 87 ) using the representation of the HMM for distance even in a place where the light sources (in the example shown in FIG. 16 , the points 87 ) are unseen for the robot 85 as the agent. In other words, processing in steps S 83 C and S 84 C explained below is executed. Step S 83 C is processing for realizing a cause state (in order to cause transition). Step S 84 C is processing for creating a path.
- the system shown in FIG. 15 can perform, on the basis of this path, a behavior for approaching light from a distance and staying in a place near the light until energy reaches a target state.
- processing in steps S 85 C and S 86 C explained below is executed.
- Step S 85 C is processing for realizing a cause state.
- Step S 86 C is preparation processing that can be immediately executed.
- the system only has to perform a behavior for moving away from the light sources (in the example shown in FIG. 16 , the points 87 ) and staying in a place away from the light sources.
- the system shown in FIG. 15 can reduce various problems to the problems of the state transition for each of independent modals (events) and the path control for the state transition, deduce causality among the modals, and recursively control the causality. As a result, it is possible to treat a complicated behavior control problem without relying on prior knowledge of the task.
- a personal computer shown in FIG. 25 may be used as at least a part of the system explained above.
- a CPU (Central Processing Unit) 91 executes various kinds of processing according to programs recorded in a ROM (Read Only Memory) 92 or programs loaded from a storing unit 98 to a RAM (Random Access Memory) 93 . Data and the like necessary for the CPU 91 to execute the various kinds of processing are also stored in the RAM 93 as appropriate.
- the CPU 91 , the ROM 92 , and the RAM 93 are connected to one another via a bus 94 .
- An input and output interface 95 is also connected to the bus 94 .
- An input unit 96 including a keyboard and a mouse, an output unit 97 including a display, the storing unit 98 including a hard disk, and a communication unit 99 including a modem and a terminal adapter are connected to the input and output interface 95 .
- the communication unit 99 controls communication performed with other apparatuses (not shown) via a network including the Internet.
- a drive 100 is also connected to the input and output interface 95 according to necessity.
- a removable medium 101 including a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is inserted in the drive 100 as appropriate.
- a computer program read out from the removable medium 101 is installed in the storing unit 98 according to necessity.
- a program forming the software is installed, from a network or a recording medium, into a computer incorporated in dedicated hardware, a general-purpose personal computer that can execute various functions by installing various programs, or the like.
- the storing medium including such programs is configured by a removable medium 101 distributed to provide a user with the programs separately from an apparatus main body.
- the removable medium 101 includes a magnetic disk (including a floppy disk), an optical disk (including a CD-ROM (Compact Disc-Read Only Memory) and a DVD (Digital Versatile Disc)) in which the programs are recorded, a semiconductor memory, or the like.
- the storing medium is configured by the ROM 92 in which the programs are recorded, a hard disk included in the storing unit 98 , and the like, which are provided to the user in a state in which the devices are incorporated in the apparatus main body in advance.
- FIG. 26 is a diagram for explaining an overview of a configuration example of a data processing apparatus according to an embodiment of the present invention.
- the data processing apparatus stores a state transition model having a state and state transition.
- the data processing apparatus is a type of a learning apparatus that performs learning for modeling a modeling target according to the state transition model, i.e., a learning apparatus that learns, on the basis of a sensor signal observed from the modeling target, a state transition model for giving probabilistic and statistical dynamic characteristics.
- the data processing apparatus can be applied to the learning unit 31 .
- a sensor signal obtained by sensing the modeling target is observed from the modeling target, for example, in time series.
- the data processing apparatus performs, using a sensor signal observed from the modeling target, learning of a state transition model, i.e., estimation of parameters of the state transition model and determination of structure of the state transition model.
- as the state transition model, for example, an HMM, a Bayesian network, a POMDP (Partially Observable Markov Decision Process), and the like can be adopted.
- an HMM is adopted as the state transition model.
- FIG. 27 is a diagram of an example of the HMM.
- the HMM is a state transition model having states and inter-state transitions.
- FIG. 27 an example of an HMM in three states is shown.
- a ij represents a state transition probability of state transition from a state s i to a state s j . b j (x) represents an output probability density function with which an observation value x is observed during state transition to the state s j .
- π i represents an initial probability of the state s i being in an initial state.
- as the output probability density function b j (x), for example, a mixed normal probability distribution is used.
- An HMM (a continuous HMM) is defined by the state transition probability a ij , the output probability density function b j (x), and the initial probability π i .
- N represents the number of states of the HMM.
- the Baum-Welch re-estimation method is a method of estimating parameters based on an EM (Expectation-Maximization) algorithm.
- x t represents a signal (a sample value) observed at time t.
- T represents the length of the time series data (the number of samples).
- the Baum-Welch re-estimation method is a parameter estimating method based on likelihood maximization. However, optimality is not guaranteed; an HMM may converge to a local solution depending on the structure of the HMM or the initial values of the parameters λ. Details of the HMM and the Baum-Welch re-estimation method are described in, for example, Lawrence Rabiner and Biing-Hwang Juang, “Basics of Sound Recognition (two volumes)”, NTT Advanced Technology Corporation (hereinafter also referred to as document A).
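The likelihood that the re-estimation maximizes is itself computed with the forward algorithm. As a hedged sketch (Gaussian output densities and all numbers here are hypothetical, not from the patent), the scaled forward pass looks like this:

```python
import numpy as np

def forward_likelihood(pi, A, means, variances, x):
    """Log-likelihood log P(x | lambda) of a continuous HMM with Gaussian
    output densities b_j(x), computed with the scaled forward algorithm."""
    def b(x_t):
        # output probability density of each state for observation x_t
        return (np.exp(-(x_t - means) ** 2 / (2 * variances))
                / np.sqrt(2 * np.pi * variances))

    alpha = pi * b(x[0])
    scale = alpha.sum()
    log_lik = np.log(scale)
    alpha = alpha / scale
    for x_t in x[1:]:
        alpha = (alpha @ A) * b(x_t)
        scale = alpha.sum()  # per-step scaling avoids underflow for long T
        log_lik += np.log(scale)
        alpha = alpha / scale
    return log_lik

# Hypothetical 2-state HMM (all numbers made up for illustration).
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
means = np.array([0.0, 3.0])
variances = np.array([1.0, 1.0])
x = np.array([0.1, -0.2, 2.9, 3.1])
ll = forward_likelihood(pi, A, means, variances, x)
print(ll)
```

Baum-Welch alternates this forward pass with a backward pass and re-estimates λ = (a ij , b j (x), π i ) until the log-likelihood stops improving.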
- HMMs are widely used in sound recognition. However, in the HMMs used for sound recognition, in general, the number of states, a method of state transition, and the like are determined in advance.
- FIG. 28 is a diagram of an example of an HMM used in sound recognition.
- the HMM shown in FIG. 28 is called left-to-right type.
- State transition is limited to structure for allowing only self-transition (state transition from the state s i to the state s i ) and state transition from the left to a state on the right.
- the HMM without limitation on state transition shown in FIG. 27 i.e., an HMM that allows state transition from an arbitrary state s i to an arbitrary state s j is called Ergodic HMM.
- the Ergodic HMM is an HMM with the highest degree of freedom in terms of structure. However, when the number of states is large, it is difficult to estimate the parameters λ.
- for example, when the number of states of the Ergodic HMM is 1000, the number of state transitions is one million (1000×1000), so one million state transition probabilities have to be estimated.
- Limited state transitions may be sufficient as necessary state transitions depending on a modeling target. However, when it is not known in advance how state transitions should be limited, it is extremely difficult to appropriately estimate such an enormous number of parameters λ. When an appropriate number of states is not known in advance and information for determining the structure of an HMM is not known in advance either, it is even more difficult to calculate appropriate parameters λ.
- the data processing apparatus shown in FIG. 26 determines the structure of an HMM appropriate for a modeling target and performs learning for estimating the parameters λ of the HMM even if limitation is not given in advance concerning the structure of the HMM, i.e., the number of states of the HMM and the state transitions.
- FIG. 29 is a block diagram of a configuration example of the data processing apparatus shown in FIG. 26 .
- the data processing apparatus includes a time-series-data input unit 111 , a data adjusting unit 112 , a parameter estimating unit 113 , an evaluating unit 114 , a model storing unit 115 , an initial-structure setting unit 116 , and a structure adjusting unit 117 .
- a sensor signal observed from a modeling target is inputted to the time-series-data input unit 111 .
- the time-series-data input unit 111 directly supplies, for example, a sensor signal in time series observed from the modeling target to the data adjusting unit 112 as observed time series data x.
- the time-series-data input unit 111 supplies the observed time series data x to the data adjusting unit 112 in response to a request from the evaluating unit 114 .
- the data adjusting unit 112 converts, with the down-sampling processing, the observed time series data x sampled at 1000 Hz into adjusted time series data x′ sampled at 100 Hz.
- the adjusted time series data x′ is changed to time series data including only macro characteristics of the observed time series data x, i.e., low frequency components of the observed time series data x.
- for the data adjusting unit 112 , it is an important problem, in performing learning that appropriately acquires the characteristics of the observed time series data x, to determine how the observed time series data x should be adjusted, i.e., in the present case, to which sampling frequency the observed time series data x should be converted to obtain the adjusted time series data x′.
- the data adjusting unit 112 adjusts the observed time series data x according to progress of learning of an HMM.
- the adjustment is performed such that, as the learning of the HMM progresses, the adjusted time series data x′ changes from time series data including only macro characteristics of the observed time series data x to time series data including micro characteristics, i.e., high-frequency components of the observed time series data x as well.
- the data adjusting unit 112 gradually changes a sampling frequency of the adjusted time series data x′ from a small value to a large value as the learning of the HMM progresses.
- the data adjusting unit 112 sets the sampling frequency of the adjusted time series data x′ to 10 Hz. Thereafter, as the learning progresses, the data adjusting unit 112 sequentially changes the sampling frequency of the adjusted time series data x′ to 50 Hz, 100 Hz, 500 Hz, and 1000 Hz.
- the HMM acquires the macro characteristics of the observed time series data x in the initial period of the learning and acquires the micro characteristics of the observed time series data x as the learning progresses.
- Progress state information indicating a state of the progress of the learning is supplied to the data adjusting unit 112 from the evaluating unit 114 .
- the data adjusting unit 112 recognizes the state of the progress of the learning on the basis of the progress state information from the evaluating unit 114 and changes the sampling frequency of the adjusted time series data x′.
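The coarse-to-fine adjustment above can be sketched as follows; the block-averaging down-sampler and the stage schedule (switching every 100 learnings) are illustrative assumptions, not values given in the patent:

```python
import numpy as np

def downsample(x, src_hz, dst_hz):
    """Crude down-sampling by averaging blocks of src_hz/dst_hz samples,
    keeping only the low-frequency (macro) characteristics of x."""
    step = src_hz // dst_hz
    usable = len(x) - len(x) % step
    return x[:usable].reshape(-1, step).mean(axis=1)

def schedule_frequency(n_learnings):
    """Hypothetical schedule: raise the sampling frequency of the adjusted
    time series x' as the learning progresses."""
    stages = [10, 50, 100, 500, 1000]
    return stages[min(n_learnings // 100, len(stages) - 1)]

x = np.sin(np.linspace(0, 2 * np.pi, 1000))        # 1 second at 1000 Hz
x_adj = downsample(x, 1000, schedule_frequency(0))  # early learning: 10 Hz
print(len(x_adj))  # 10
```

Early in the learning the HMM only sees the 10 Hz macro shape; as the number of learnings grows, the schedule returns higher frequencies and micro characteristics are exposed.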
- as the adjustment processing performed by the data adjusting unit 112 , filter bank processing can be adopted besides the down-sampling processing (processing for thinning out the observed time series data x in the time direction).
- the observed time series data x is filtered by using a predetermined division number of filter banks. Consequently, the observed time series data x is divided into the predetermined division number of frequency components. The predetermined division number of frequency components are outputted as the adjusted time series data x′.
- the number of divisions of the filter banks is gradually changed to a larger number as the learning progresses.
- T′ represents the length of the adjusted time series data x′.
- the adjusted time series data x′ outputted by the data adjusting unit 112 is supplied to the parameter estimating unit 113 and the structure adjusting unit 117 .
- the parameter estimating unit 113 estimates the parameters λ of the HMM stored in the model storing unit 115 using the adjusted time series data x′ supplied from the data adjusting unit 112 .
- the parameter estimating unit 113 estimates, for example, with the Baum-Welch re-estimation method, the parameters λ of the HMM stored in the model storing unit 115 using the adjusted time series data x′ from the data adjusting unit 112 .
- the parameter estimating unit 113 supplies new parameters λ obtained by the estimation of the parameters λ of the HMM to the model storing unit 115 and causes the model storing unit 115 to store the new parameters λ in a form of overwriting.
- the parameter estimating unit 113 uses values stored in the model storing unit 115 as initial values of the parameters λ.
- in the parameter estimating unit 113 , when the processing for estimating the new parameters λ is performed once, the number of times of learning is counted as one.
- the parameter estimating unit 113 increments the number of times of learning by 1 every time the processing for estimating the new parameters λ is performed and supplies the number of times of learning to the evaluating unit 114 .
- the parameter estimating unit 113 calculates, from the HMM defined by the new parameters λ, the likelihood that the adjusted time series data x′ supplied from the data adjusting unit 112 is observed and supplies the likelihood to the evaluating unit 114 .
- the likelihood supplied to the evaluating unit 114 by the parameter estimating unit 113 can be calculated by using the observed time series data x rather than the adjusted time series data x′.
- the evaluating unit 114 evaluates, on the basis of the likelihood and the number of times of learning supplied from the parameter estimating unit 113 , the learned HMM, i.e., the HMM, the parameters ⁇ of which are estimated by the parameter estimating unit 113 .
- the evaluating unit 114 determines, on the basis of a result of the evaluation of the HMM, whether the learning of the HMM should be finished.
- the evaluating unit 114 evaluates that the acquisition of characteristics (time series patterns) of the observed time series data x by the HMM is insufficient and determines to continue the learning of the HMM.
- the evaluating unit 114 evaluates that the acquisition of characteristics of the observed time series data x by the HMM is sufficient and determines to finish the learning of the HMM.
- the evaluating unit 114 determines to continue the learning of the HMM, the evaluating unit 114 requests the time-series-data input unit 111 , the data adjusting unit 112 , and the structure adjusting unit 117 to perform predetermined processing.
- the evaluating unit 114 requests the time-series-data input unit 111 to supply observed time series data.
- the evaluating unit 114 supplies the number of times of learning and the likelihood to the data adjusting unit 112 as progress state information representing a state of progress of the learning to request the data adjusting unit 112 to perform down-sampling processing corresponding to the progress of the learning.
- the evaluating unit 114 requests, according to the progress of the learning, the structure adjusting unit 117 to adjust the structure of the HMM stored in the model storing unit 115 .
- the model storing unit 115 stores, for example, the HMM as the state transition model.
- the model storing unit 115 updates (overwrites) stored values (stored parameters of the HMM) with the new parameters.
- the model storing unit 115 stores the structure of the HMM initialized by the initial-structure setting unit 116 (initial structure), i.e., initial values of parameters of the HMM determined on the basis of limitation concerning the number of states and state transitions of the HMM.
- the parameters of the HMM by the parameter estimating unit 113 are estimated from the initial values determined by the initial-structure setting unit 116 .
- the structure of the HMM stored in the model storing unit 115 is adjusted by the structure adjusting unit 117 according to the progress of the learning.
- the update of the stored values in the model storing unit 115 is also performed according to parameters of the HMM obtained by the adjustment of the structure of the HMM by the structure adjusting unit 117 .
- the initial-structure setting unit 116 initializes the structure of the HMM before the learning of the HMM is started and sets parameters of the HMM having the initialized structure (initial structure) (initial parameters).
- the initial-structure setting unit 116 sets the initial structure of the HMM, i.e., the number of states and state transitions of the HMM.
- Predetermined limitation can be applied to the number of states and the state transitions of the HMM as the initial structure.
- the initial-structure setting unit 116 sets the number of states of the HMM to be equal to or smaller than a predetermined number set as the predetermined limitation.
- the initial-structure setting unit 116 sets the number of states of the HMM to a relatively small number such as sixteen or one hundred.
- the initial-structure setting unit 116 appropriately arranges states in the number of states set as the initial structure in an L-dimensional space (L is a positive integer) equal to or larger than one dimension.
- the initial-structure setting unit 116 arranges the sixteen states in the two-dimensional space in, for example, a lattice shape.
- the initial-structure setting unit 116 sets, with respect to the sixteen states arranged in the two-dimensional space, state transitions, i.e., self-transition and state transition to other states.
- Predetermined limitations such as limitation that the structure should be sparse structure can be applied to the state transitions set with respect to the sixteen states.
- the sparse structure is, rather than dense state transition structure like the Ergodic HMM in which state transition from an arbitrary state to an arbitrary state is possible, structure in which states to which state transition is possible from a certain state are extremely limited.
- the initial-structure setting unit 116 obtains the initial structure by, for example, as explained above, applying the predetermined limitation to initialize the structure of the HMM into the sparse structure. Then, the initial-structure setting unit 116 sets the initial parameters, i.e., initial values of the state transition probability a ij , the output probability density function b j (x), and the initial probability π i in the HMM having the initial structure.
- the initial-structure setting unit 116 sets, for example, with respect to each of the states, the state transition probability a ij of the (valid) state transitions possible from the state to a uniform value (when the number of possible state transitions is M, 1/M).
- the initial-structure setting unit 116 sets the state transition probability a ij of invalid state transition, i.e., state transition other than the state transitions set as the sparse structure, to 0.
- the initial-structure setting unit 116 sets, as the output probability density function b j (x) of the respective states s j , a normal distribution defined by an average μ and a variance σ 2 .
- Σ means summation with time t changed from 1 to the length T of the observed time series data x.
- the initial-structure setting unit 116 sets the initial probabilities π i of the respective states s i to a uniform value.
- that is, the initial-structure setting unit 116 sets the initial probabilities π i of the respective N states s i to 1/N.
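The sparse initialization described above (lattice arrangement, uniform 1/M over valid transitions, 0 elsewhere, uniform initial probabilities 1/N) can be sketched as follows; the 4-neighborhood choice is an assumption for illustration:

```python
import numpy as np

def init_sparse_hmm(grid=4):
    """Initialize a grid x grid lattice HMM: each state allows self-transition
    and transitions to its 4-neighborhood; all other transitions get
    probability 0; valid transitions share a uniform probability 1/M; the
    initial probabilities are uniform 1/N."""
    n = grid * grid
    A = np.zeros((n, n))
    for r in range(grid):
        for c in range(grid):
            i = r * grid + c
            neighbors = [(r, c), (r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            valid = [rr * grid + cc for rr, cc in neighbors
                     if 0 <= rr < grid and 0 <= cc < grid]
            A[i, valid] = 1.0 / len(valid)  # uniform over the M valid moves
    pi = np.full(n, 1.0 / n)                # uniform initial probability 1/N
    return A, pi

A, pi = init_sparse_hmm()
print(A.shape, np.allclose(A.sum(axis=1), 1.0))  # (16, 16) True
```

A corner state has M = 3 valid transitions (self plus two neighbors), so each receives probability 1/3; every row of A still sums to 1.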
- the (initial) structure and the (initial) parameters stored in the model storing unit 115 are updated by learning.
- the structure-adjusting unit 117 adjusts, in response to a request from the evaluating unit 114 , the structure of the HMM stored in the model storing unit 115 using the adjusted time series data x′ supplied from the data adjusting unit 112 .
- the adjustment of the structure of the HMM performed by the structure adjusting unit 117 includes adjustment of parameters of the HMM necessary for the adjustment of the structure.
- as types of the adjustment of the structure of the HMM performed by the structure adjusting unit 117 , there are six types, i.e., division of a state, merging of states, addition of a state, addition of state transition, deletion of a state, and deletion of state transition.
- the processing by the initial-structure setting unit 116 shown in FIG. 29 is further explained with reference to FIGS. 30A and 30B .
- the initial-structure setting unit 116 can set Ergodic structure as the initial structure of the HMM or can set sparse structure by applying predetermined limitation to the initial structure.
- FIGS. 30A and 30B are diagrams of an HMM having sparse initial structure (state transitions).
- in FIGS. 30A and 30B , circles represent states and arrows represent state transitions (the same holds true in the following figures). Further, in FIGS. 30A and 30B , a bidirectional arrow connecting two states represents state transition from one to the other of the two states and vice versa (the same holds true in the following figures). In FIGS. 30A and 30B , self-transition can be performed in each of the respective states, although an arrow representing the self-transition is not shown in the figures (the same holds true in the following figures).
- sixteen states are arranged in a lattice shape on a two-dimensional space.
- that is, four states are arranged in the horizontal direction and four states are also arranged in the vertical direction.
- a method of setting the sparse initial structure is not limited to a method of applying limitation to states arranged on the L-dimensional space to allow only state transitions (including self-transition) to states located in the neighborhood according to a distance between states.
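As an illustrative sketch (not part of the patent text), the lattice-type sparse initial structure described above can be built as a valid-transition matrix in which each state has only self-transition and bidirectional transitions to its lattice neighbours. The function name `lattice_sparse_structure` and the 0-based indexing are assumptions for illustration.

```python
def lattice_sparse_structure(width, height):
    """Valid-transition matrix for states arranged on a width x height
    lattice: self-transition plus bidirectional transitions to the
    4-neighbourhood (states adjacent in the horizontal/vertical direction)."""
    n = width * height
    valid = [[False] * n for _ in range(n)]
    for i in range(n):
        valid[i][i] = True  # self-transition is always possible
        x, y = i % width, i // width
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                valid[i][ny * width + nx] = True
    return valid

# Sixteen states in a 4 x 4 lattice, as in FIG. 30A.
valid = lattice_sparse_structure(4, 4)
```

State 0 (a corner of the lattice) is then connected only to itself, its horizontal neighbour (state 1), and its vertical neighbour (state 4), which makes the structure sparse rather than ergodic.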
- the HMM shown in FIG. 10A is an HMM with three-dimensional grid limitation.
- the HMM shown in FIG. 10B is an HMM with two-dimensional random arrangement limitation.
- the HMM shown in FIG. 10C is an HMM formed by a small world network.
- circles represent states.
- circles affixed with a number “i” are described as state s i .
- FIG. 31A is a diagram of an HMM before the division of a state is performed.
- the HMM has six states s 1 , s 2 , s 3 , s 4 , s 5 , and s 6 .
- Bidirectional state transitions between the states s 1 and s 2 , between the states s 1 and s 4 , between the states s 2 and s 3 , between the states s 2 and s 5 , between the states s 3 and s 6 , between the states s 4 and s 5 , and between the states s 5 and s 6 and self-transition are possible.
- FIG. 31B is a diagram of an HMM after the division of a state is performed with the HMM shown in FIG. 31A set as a target.
- the division of a state is performed in order to increase the size of the HMM.
- for example, the state s 5 among the states s 1 to s 6 of the HMM shown in FIG. 31A is divided.
- the division of the state S 5 is performed by adding a new state s 7 , in which state transitions same as those for the state s 5 as a division target can be performed and bidirectional state transition to and from the state s 5 can be performed.
- the structure adjusting unit 117 sets, concerning the new state s 7 , as in the state s 5 , state transitions between the state s 7 and the states s 2 , s 4 , and s 6 and self-transition as valid (possible) state transitions.
- the structure adjusting unit 117 sets, concerning the new state s 7 , state transition between the state s 7 and the state s 5 as valid state transition as well.
- the structure adjusting unit 117 sets parameters of the new state s 7 to, so to speak, succeed parameters of the division target state s 5 .
- the structure adjusting unit 117 applies normalization processing to necessary parameters of the HMM after the division of a state and finishes processing for dividing a state.
- the structure adjusting unit 117 applies normalization processing, which satisfies Σ i π i =1 and Σ j a ij =1, to the initial probability π i and the state transition probability a ij of the HMM after the division of a state.
- in the formula, Σ means summation with the variable “j” representing a state changed from 1 to the number of states N of the HMM after the division of a state.
- the number of states N of the HMM after the division of a state is seven.
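The normalization above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the helper name `normalize_hmm` is hypothetical. It renormalizes the initial probabilities so they sum to 1 and each row of the transition matrix so that the summation over the destination state j, from 1 to the number of states N, equals 1.

```python
def normalize_hmm(pi, A):
    """Renormalize HMM parameters after a structural change so that
    sum(pi) == 1 and each row of the transition matrix A sums to 1."""
    total = sum(pi)
    pi = [p / total for p in pi]
    A = [[a / sum(row) for a in row] for row in A]
    return pi, A

# After a division that added a seventh state, N = 7:
pi, A = normalize_hmm([1.0] * 7, [[1.0] * 7 for _ in range(7)])
```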
- a state as a division target is not limited to one state.
- an arbitrary number “n” of states (n is equal to or larger than 1 and equal to or smaller than N) can be selected, for example, at random, out of the N states s 1 to s N of the HMM before the division of a state.
- alternatively, higher order “n” states having large dispersion σ 2 defining the output probability density function b j (x), i.e., higher order “n” states with relatively large fluctuation in an observation value observed from the states, can be selected out of the N states s 1 to s N of the HMM before the division of a state.
- the number “n” of states as division targets can be set at random or can be set to a fixed value. In both cases, by the division of a state, the structure of the HMM is updated to a structure in which the number of states increases by “n” from the number of states before the division.
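The division procedure described above can be sketched over a boolean valid-transition matrix. This is an illustrative sketch under stated assumptions, not the patent's code: `divide_state` is a hypothetical name and states are indexed from 0, so the division target s5 of FIG. 31A is index 4.

```python
def divide_state(valid, target):
    """Divide state `target`: append a new state that succeeds the valid
    transitions of the target (including self-transition), and add
    bidirectional state transition between the target and the new state.
    `valid` is an N x N boolean matrix of valid state transitions."""
    n = len(valid)
    new_row = list(valid[target])          # succeed outgoing transitions of the target
    for i in range(n):
        valid[i].append(valid[i][target])  # succeed incoming transitions of the target
    new_row.append(True)                   # self-transition of the new state
    valid.append(new_row)
    valid[target][n] = True                # target <-> new state (bidirectional)
    valid[n][target] = True
    return valid

# The six states of FIG. 31A (0-based); bidirectional pairs plus self-transitions.
pairs = [(0, 1), (0, 3), (1, 2), (1, 4), (2, 5), (3, 4), (4, 5)]
valid = [[i == j for j in range(6)] for i in range(6)]
for i, j in pairs:
    valid[i][j] = valid[j][i] = True
valid = divide_state(valid, 4)  # divide s5
```

After the division, the new state s7 (index 6) is connected to s2, s4, s6, and s5, matching FIG. 31B.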
- FIG. 32A is a diagram of an HMM before the merging of a state is performed.
- the HMM is the same as the HMM shown in FIG. 31A .
- FIG. 32B is a diagram of an HMM after the merging of a state is performed with the HMM shown in FIG. 32A set as a target.
- the merging of a state is performed in order to degenerate redundantly-allocated states.
- the state s 5 is set as a merging target and the merging target state s 5 is merged into the state s 6 as a merged target.
- the merging of the state s 5 into the state s 6 is performed by deleting state transition between the merging target state s 5 and the merged target state s 6 and deleting the merging target state s 5 such that the merged target state s 6 , so to speak, succeeds state transitions (hereinafter also referred to as peculiar state transitions) between the merging target state s 5 and other states excluding the merging target state s 5 and the merged target state s 6 .
- the structure adjusting unit 117 deletes (invalidates) state transition between the merging target state s 5 and the merged target state s 6 .
- the peculiar state transitions of the state s 5 are the state transitions between the state s 5 and the states s 2 and s 4 . Therefore, the structure adjusting unit 117 adds (sets) the state transitions between the merged target state s 6 and the states s 2 and s 4 as valid state transitions.
- the structure adjusting unit 117 deletes the merging target state s 5 .
- the structure adjusting unit 117 sets state transition probabilities a i6 and a 6j to succeed the state transition probabilities a i5 and a 5j of the merging target state s 5 .
- the structure adjusting unit 117 applies normalization processing to necessary parameters of the HMM after the merging of a state and finishes the merging of a state.
- the structure adjusting unit 117 applies normalization processing same as that in the case of the division of a state to the initial probability π i and the state transition probability a ij of the HMM after the merging of a state.
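A minimal sketch of the merging procedure over the same kind of valid-transition matrix (names and 0-based indices are assumptions, not the patent's code): the transition between the merging target and the merged target is deleted, the merged target succeeds the merging target's peculiar transitions, and the merging target is removed.

```python
def merge_state(valid, src, dst):
    """Merge state `src` (merging target) into `dst` (merged target):
    let dst succeed src's transitions to/from the other states (its
    "peculiar" transitions), then delete src, which also removes the
    src <-> dst transition and src's self-transition."""
    n = len(valid)
    for k in range(n):
        if k in (src, dst):
            continue
        valid[dst][k] = valid[dst][k] or valid[src][k]
        valid[k][dst] = valid[k][dst] or valid[k][src]
    valid.pop(src)            # delete src's row ...
    for row in valid:
        row.pop(src)          # ... and src's column
    return valid

# The six states of FIG. 32A (0-based); merge s5 (index 4) into s6 (index 5).
pairs = [(0, 1), (0, 3), (1, 2), (1, 4), (2, 5), (3, 4), (4, 5)]
valid = [[i == j for j in range(6)] for i in range(6)]
for i, j in pairs:
    valid[i][j] = valid[j][i] = True
valid = merge_state(valid, 4, 5)
```

After the merge five states remain, and the merged s6 (now index 4) is connected to s2 and s4 as in FIG. 32B.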
- a set of a state to be set as a merging target and a state to be set as a merged target (hereinafter also referred to as merge set) is not limited to one set.
- as pairs of states to be set as the merge set, for example, higher order “n” (n is a value equal to or larger than 1) pairs of states with larger correlation between the states can be selected out of pairs of states, in which bidirectional state transitions can be performed, among the N states s 1 to s N of the HMM before the merging of a state.
- the number “n” of pairs of states to be set as the merge set can be set at random or can be set to a fixed value. In both cases, by the merging of a state, the structure of the HMM is updated to a structure in which the number of states decreases by “n” from the number of states before the merging.
- the correlation among states represents a degree of similarity of state transitions (including self-transition) to the other states, state transitions from the other states, and observation values observed from states. For example, the correlation among states is calculated as explained below.
- the adjusted time series data x′ used for estimation of parameters of the HMM in the parameter estimating unit 113 is supplied to the structure adjusting unit 117 from the data adjusting unit 112 .
- the structure adjusting unit 117 calculates a correlation among states of the HMM stored in the model storing unit 115 using the adjusted time series data x′ supplied from the data adjusting unit 112 .
- the forward-backward algorithm is an algorithm for calculating a probability value as an integrated value of a forward probability ⁇ i (t) calculated by propagating a probability of reaching the respective states s i forward in a time direction and a backward probability ⁇ i (t) calculated by propagating the probability of reaching the respective states s i backward in the time direction.
- the structure adjusting unit 117 calculates, concerning the HMM stored in the model storing unit 115 , the forward probability ⁇ i (t) of observing the data x 1 ′, x 2 ′, . . . , x t ′ of the adjusted time series data x′ and being present in the state s i at time t. Further, the structure adjusting unit 117 calculates, concerning the HMM stored in the model storing unit 115 , the backward probability ⁇ i (t) of being present in the state s i at time t and thereafter observing data x t ′, x t+1 ′, . . .
- the structure adjusting unit 117 calculates the forward-backward probability p i (t) of being present in the state s i at time t using the forward probability ⁇ i (t) and the backward probability ⁇ i (t).
- a correlation between the certain state s i and the other states s j is represented as p i *p j .
- p i *p j =Σp i (t)p j (t)
- Σ means summation with the time t changed from 1 to the length T′ of the adjusted time series data x′.
- the correlation p i *p j between the states s i and s j is high when time change patterns of the forward-backward probability p i of the state s i and the forward-backward probability p j of the state s j are similar, i.e., when, in addition to one of the states s i and s j , the other is redundantly present.
- a pair of the states s i and s j is selected as a merge set.
- a state of the merging target is merged into a state of the merged target.
- the structure adjusting unit 117 can also calculate a correlation between states of the HMM stored in the model storing unit 115 using the observed time series data x rather than the adjusted time series data x′.
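The correlation computation above can be sketched as follows. This is an illustrative sketch, not the patent's implementation: a discrete emission table `B[i][o]` stands in for the continuous output probability density function b_j(x), and the function names are assumptions. The forward-backward probability p_i(t) is the normalized product of the forward probability alpha_i(t) and the backward probability beta_i(t); the correlation is then the sum over time of p_i(t) p_j(t).

```python
def forward_backward(pi, A, B, obs):
    """Forward-backward probabilities p_i(t) of being in state i at time t,
    computed from forward probabilities alpha and backward probabilities beta."""
    N, T = len(pi), len(obs)
    alpha = [[0.0] * N for _ in range(T)]
    beta = [[1.0] * N for _ in range(T)]
    for i in range(N):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):  # propagate forward in the time direction
        for j in range(N):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
    for t in range(T - 2, -1, -1):  # propagate backward in the time direction
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
    p = []
    for t in range(T):
        g = [alpha[t][i] * beta[t][i] for i in range(N)]
        s = sum(g)
        p.append([x / s for x in g])
    return p

def correlation(p, i, j):
    """Correlation p_i * p_j = sum over t of p_i(t) p_j(t)."""
    return sum(row[i] * row[j] for row in p)

# A small two-state example (all numbers are illustrative assumptions).
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.1, 0.9]]
B = [[0.8, 0.2], [0.2, 0.8]]
p = forward_backward(pi, A, B, [0, 0, 1, 1])
```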
- FIG. 33A is a diagram of an HMM before the addition of a state is performed.
- the HMM is the same as that shown in FIG. 31A .
- FIG. 33B is a diagram of an HMM after the addition of a state is performed with the HMM shown in FIG. 33A set as a target.
- a state s 5 among states s 1 to s 6 of the HMM shown in FIG. 33A is set as a target to which a state is added.
- a new state s 7 is added to the state s 5 .
- the addition of a state is performed by adding the new state s 7 , in which bidirectional state transition to and from the state s 5 as the target to which a state is added can be performed.
- the structure adjusting unit 117 sets, concerning the new state s 7 , the self-transition and the state transition to and from the state s 5 as valid state transitions.
- the structure adjusting unit 117 sets, in the addition of a state, parameters of the new state s 7 to, so to speak, succeed parameters of the state s 5 as the target to which a state is added.
- the structure adjusting unit 117 applies normalization processing to necessary parameters of the HMM after the addition of a state and finishes processing for adding a state.
- the structure adjusting unit 117 applies normalization processing same as that in the case of the division of a state to the initial probability π i and the state transition probability a ij of the HMM after the addition of a state.
- a state as the target to which a state is added is not limited to one state.
- an arbitrary number “n” of states (n is equal to or larger than 1 and equal to or smaller than N) can be selected, for example, at random, out of the N states s 1 to s N of the HMM before the addition of a state.
- alternatively, higher order “n” states having large dispersion σ 2 defining the output probability density function b j (x), i.e., higher order “n” states with relatively large fluctuation in an observation value observed from the states, can be selected out of the N states s 1 to s N of the HMM before the addition of a state.
- the number “n” of states as targets to which a state is added can be set at random or can be set to a fixed value. In both cases, by the addition of a state, the structure of the HMM is updated to a structure in which the number of states increases by “n” from the number of states before the addition.
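The addition of a state can be sketched in the same valid-transition-matrix style as before (hypothetical names, 0-based indices). Unlike division, the new state gets only self-transition and a bidirectional transition to the target state; it does not succeed the target's other transitions.

```python
def add_state(valid, target):
    """Add a new state with only self-transition and bidirectional state
    transition to/from `target` (the new state does NOT succeed the
    target's other transitions, unlike division of a state)."""
    n = len(valid)
    for row in valid:
        row.append(False)
    new_row = [False] * (n + 1)
    new_row[n] = True        # self-transition of the new state
    new_row[target] = True   # new state -> target
    valid.append(new_row)
    valid[target][n] = True  # target -> new state
    return valid

# The six states of FIG. 33A (0-based); add a new state s7 to s5 (index 4).
pairs = [(0, 1), (0, 3), (1, 2), (1, 4), (2, 5), (3, 4), (4, 5)]
valid = [[i == j for j in range(6)] for i in range(6)]
for i, j in pairs:
    valid[i][j] = valid[j][i] = True
valid = add_state(valid, 4)
```

The new state s7 (index 6) is connected only to s5 and itself, matching FIG. 33B.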
- the addition of a state and the division of a state explained with reference to FIGS. 31A and 31B are the same in that the number of states of the HMM increases.
- the addition of a state is different from the division of a state in that, whereas, in the addition of a state, a new state does not succeed state transitions of a state as a target to which a state is added, in the division of a state, a new state succeeds state transitions of a state as a division target.
- in the division of a state, the new state is directly affected by, besides state transition to and from the state as the division target, the other state transitions of the state as the division target.
- in the addition of a state, on the other hand, the new state is directly affected by only state transition to and from the state as the target to which a state is added.
- FIG. 34A is a diagram of an HMM before the addition of state transition is performed.
- the HMM is the same as that shown in FIG. 31A .
- FIG. 34B is a diagram of an HMM after the addition of state transition is performed with the HMM shown in FIG. 34A set as a target.
- the addition of state transition is performed to solve a problem in that state transitions are insufficient for appropriately representing a modeling target in the structure of the HMM stored in the model storing unit 115 .
- when sparse state transition is set as the initial structure of the HMM by the initial-structure setting unit 116 , it is important to add state transition necessary for appropriate representation of the modeling target.
- the states s 4 and s 6 are set as targets of addition of state transition.
- Bidirectional state transition is added between the states s 4 and s 6 as the targets of addition of state transition.
- the structure adjusting unit 117 applies normalization processing to necessary parameters of the HMM after the addition of state transition and finishes processing for the addition of state transition.
- the structure adjusting unit 117 applies normalization processing same as that in the case of the division of a state to a state transition probability aij after the addition of state transition.
- a set of two states as targets of addition of state transition (hereinafter also referred to as an addition target set) is not limited to one set.
- as pairs of states to be set as the addition target set, for example, higher order “n” (n is a value equal to or larger than 1) pairs of states having a large correlation between the states can be selected out of pairs of states, in which bidirectional state transition is not possible, among the N states s 1 to s N of the HMM before the addition of state transition.
- when a pair of states having a large correlation is selected as the addition target set among the states in which bidirectional state transition is not possible, the two states are mechanically connected by state transition.
- the number “n” of pairs of states to be set as the addition target set can be set at random or can be set to a fixed value. In both cases, the structure of the HMM is updated to a slightly more complicated structure in which the number of states does not change but the number of state transitions increases by “n”.
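The selection of addition target sets can be sketched as follows (an illustration under stated assumptions; `add_transitions` and the correlation matrix values are hypothetical). Among pairs of states without bidirectional state transition, the n pairs with the largest correlation are connected.

```python
def add_transitions(valid, corr, n_add):
    """Among pairs of states without bidirectional state transition,
    connect the n_add pairs with the largest correlation corr[i][j]."""
    n = len(valid)
    candidates = [(corr[i][j], i, j)
                  for i in range(n) for j in range(i + 1, n)
                  if not (valid[i][j] and valid[j][i])]
    for _, i, j in sorted(candidates, reverse=True)[:n_add]:
        valid[i][j] = valid[j][i] = True
    return valid

# The six states of FIG. 34A (0-based). A hypothetical correlation matrix in
# which the unconnected pair (s4, s6) = indices (3, 5) has the largest value.
pairs = [(0, 1), (0, 3), (1, 2), (1, 4), (2, 5), (3, 4), (4, 5)]
valid = [[i == j for j in range(6)] for i in range(6)]
for i, j in pairs:
    valid[i][j] = valid[j][i] = True
corr = [[0.0] * 6 for _ in range(6)]
corr[3][5] = corr[5][3] = 0.9
valid = add_transitions(valid, corr, 1)
```

The result connects s4 and s6 by bidirectional state transition, as in FIG. 34B.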
- FIG. 35A is a diagram of an HMM before the deletion of a state is performed.
- the HMM has nine states s 1 , s 2 , s 3 , s 4 , s 5 , s 6 , s 7 , s 8 , and s 9 .
- FIG. 35B is a diagram of an HMM after the deletion of a state is performed with the HMM shown in FIG. 35A set as a target.
- the deletion of a state is performed to delete a state unnecessary for appropriately representing a modeling target.
- in FIG. 35B , for example, the state s 5 among the states s 1 to s 9 of the HMM shown in FIG. 35A is deleted.
- the deletion of a state is performed by deleting the state s 5 as the target of deletion and state transitions possible from the state s 5 (including state transitions to the state s 5 ).
- the structure adjusting unit 117 deletes, concerning the state s 5 as the target of deletion, the state s 5 , state transitions between the state s 5 and the states s 2 , s 4 , s 6 , and s 8 and the self-transition of the state s 5 .
- the structure adjusting unit 117 applies normalization processing to necessary parameters of the HMM after the deletion of a state and finishes processing for the deletion of a state.
- the structure adjusting unit 117 applies normalization processing same as that in the case of the division of a state to the initial probability π i and the state transition probability a ij after the deletion of a state.
- the structure adjusting unit 117 selects a state to be set as a target of deletion, for example, as explained below.
- the adjusted time series data x′ used for estimation of parameters of the HMM in the parameter estimating unit 113 is supplied to the structure adjusting unit 117 from the data adjusting unit 112 .
- the structure adjusting unit 117 determines, concerning the adjusted time series data x′, a sequence of states s 1 ′, s 2 ′, . . . , s T′ ′ as a maximum likelihood path and then detects a state not forming the maximum likelihood path (a state not included in the maximum likelihood path) among the states of the HMM.
- the structure adjusting unit 117 selects the state s 5 not forming the maximum likelihood path among the states s 1 to s 9 forming the HMM as a state to be set as a target of deletion.
- the structure adjusting unit 117 deletes the state s 5 selected as the target of deletion. Consequently, adjustment of the structure for changing the HMM shown in FIG. 35A to the HMM shown in FIG. 35B is performed.
- the structure adjusting unit 117 performs, as the adjustment of the structure of the HMM, deletion of state transition besides the division of a state, the merging of a state, the addition of a state, the addition of state transition, and the deletion of a state explained with reference to FIGS. 31A to 35B .
- the deletion of state transition is performed in the same manner as the deletion of a state.
- the structure adjusting unit 117 determines, concerning the adjusted time series data x′, a sequence of states s 1 ′, s 2 ′, . . . , s T′ ′ as a maximum likelihood path and selects state transition not forming the maximum likelihood path as state transition to be set as a target of deletion.
- the structure adjusting unit 117 deletes the state transition selected as the state transition to be set as a target of deletion, applies normalization processing same as that in the case of the division of a state to the state transition probability a ij of the HMM after the deletion of state transition, and finishes processing for the deletion of state transition.
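The detection of deletion targets above can be sketched as follows (an illustrative sketch, not the patent's code): the maximum likelihood path is computed with the standard Viterbi recursion, here over a discrete emission table `B[i][o]` standing in for b_j(x), and states not included in that path are selected as deletion targets. Function names, indices, and the toy numbers are assumptions.

```python
def viterbi_path(pi, A, B, obs):
    """Maximum likelihood state sequence for a discrete-output HMM."""
    N, T = len(pi), len(obs)
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]
    back = []
    for t in range(1, T):
        back.append([max(range(N), key=lambda i: delta[i] * A[i][j])
                     for j in range(N)])
        delta = [delta[back[-1][j]] * A[back[-1][j]][j] * B[j][obs[t]]
                 for j in range(N)]
    path = [max(range(N), key=lambda j: delta[j])]
    for ptr in reversed(back):       # backtrack through the stored pointers
        path.append(ptr[path[-1]])
    return path[::-1]

def states_not_on_path(n_states, path):
    """States to be set as deletion targets: those not on the path."""
    return sorted(set(range(n_states)) - set(path))

# Toy 3-state HMM in which state 2 is effectively unreachable.
pi = [0.9, 0.05, 0.05]
A = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.3, 0.3, 0.4]]
B = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
path = viterbi_path(pi, A, B, [0, 1, 0, 1])
```

Here state 2 never appears on the maximum likelihood path, so it would be selected as a deletion target.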
- FIG. 36 is a flowchart for explaining processing (learning processing) by the data processing apparatus shown in FIG. 29 .
- a sensor signal from a modeling target is supplied to the time-series-data input unit 111 .
- the time-series-data input unit 111 directly sets, for example, the sensor signal observed from the modeling target as observed time series data x.
- the observed time series data x is supplied from the time-series-data input unit 111 to the data adjusting unit 112 . Besides, the observed time series data x is supplied to the initial-structure setting unit 116 as well and, as explained above, used for setting of the output probability density function b j (x) in the initial-structure setting unit 116 .
- in step S 111 , the initial-structure setting unit 116 performs initialization of an HMM.
- the initial-structure setting unit 116 initializes the structure of the HMM to an initial structure and sets parameters of the HMM having the initial structure (initial parameters).
- the initial-structure setting unit 116 sets, as the initial structure of the HMM, the number of states of the HMM and sets sparse state transition in the HMM having the number of states.
- the initial-structure setting unit 116 sets, in the HMM having the initial structure, initial values of the state transition probability a ij , the output probability density function b j (x), and the initial probability π i as the initial parameters.
- in step S 112 , the time-series-data input unit 111 supplies the observed time series data x to the data adjusting unit 112 .
- the processing proceeds to step S 113 .
- in step S 113 , the data adjusting unit 112 performs, as explained with reference to FIG. 29 , adjustment of the observed time series data x supplied from the time-series-data input unit 111 to obtain the adjusted time series data x′ and supplies the adjusted time series data x′ to the parameter estimating unit 113 .
- the processing proceeds to step S 114 .
- the adjusted time series data x′ is supplied to the structure adjusting unit 117 as well.
- in step S 114 , the parameter estimating unit 113 estimates, with the parameters of the HMM stored in the model storing unit 115 set as initial values, new parameters of the HMM with the Baum-Welch re-estimation method using the adjusted time series data x′ supplied from the data adjusting unit 112 .
- the parameter estimating unit 113 supplies the new parameters of the HMM to the model storing unit 115 and causes the model storing unit 115 to store the new parameters in a form of overwriting.
- the parameter estimating unit 113 increments the number of times of learning, which is reset to 0 at the start of the learning processing in FIG. 36 , by 1 and supplies the number of times of learning to the evaluating unit 114 .
- the parameter estimating unit 113 calculates likelihood of observation of the adjusted time series data x′ from the HMM defined by the new parameters λ and supplies the likelihood to the evaluating unit 114 .
- the processing proceeds from step S 114 to step S 115 .
- in step S 115 , the evaluating unit 114 evaluates, on the basis of the likelihood and the number of times of learning supplied from the parameter estimating unit 113 , the HMM for which learning is performed, i.e., the HMM for which the parameters λ are estimated by the parameter estimating unit 113 , and determines, on the basis of a result of the evaluation of the HMM, whether the learning of the HMM should be finished.
- when it is determined in step S 115 that the learning of the HMM is not finished, the evaluating unit 114 requests the time-series-data input unit 111 , the data adjusting unit 112 , and the structure adjusting unit 117 to perform predetermined processing. The processing proceeds to step S 116 .
- in step S 116 , the structure adjusting unit 117 performs, in response to a request from the evaluating unit 114 , processing for adjusting the structure of the HMM stored in the model storing unit 115 using the adjusted time series data x′ supplied from the data adjusting unit 112 .
- the processing returns to step S 112 .
- in step S 112 , the time-series-data input unit 111 supplies the observed time series data x to the data adjusting unit 112 in response to a request from the evaluating unit 114 .
- the processing proceeds to step S 113 .
- in step S 113 , the data adjusting unit 112 performs, in response to a request from the evaluating unit 114 , adjustment of the observed time series data x supplied from the time-series-data input unit 111 as explained with reference to FIG. 29 to obtain the adjusted time series data x′. Thereafter, the processing explained above is repeated.
- the parameter estimating unit 113 estimates parameters of the HMM.
- the structure adjusting unit 117 adjusts the structure of the HMM defined by the parameters after the estimation. This processing is repeated.
- the data adjusting unit 112 performs down-sampling processing with the observed time series data x set as a target, for example, as explained with reference to FIG. 29 , to obtain the adjusted time series data x′.
- a sampling frequency for the adjusted time series data x′ is gradually changed from a small value to a large value as learning of the HMM proceeds.
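The down-sampling schedule described above can be sketched as follows (an illustration, not the patent's implementation; the function names and the concrete factors, taken from the experiment described later in the text, are assumptions). The factor starts at 1/10 of the original sampling frequency and rises toward 1/1 as learning proceeds.

```python
def downsample(x, factor):
    """Keep every factor-th sample of the observed time series data."""
    return x[::factor]

def schedule(num_learning, step=3, start=10):
    """Down-sampling factor for the current number of times of learning:
    1/start of the original sampling frequency at first, then 1/9, 1/8, ...
    down to 1/1 every `step` learning iterations."""
    return max(1, start - num_learning // step)

# 10000 observed samples down-sampled to 1000 at the start of learning.
x = list(range(10000))
x_adj = downsample(x, schedule(0))
```

With this schedule, the adjusted time series data equals the observed time series data itself once the factor reaches 1/1.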
- when it is determined in step S 115 that the learning of the HMM is finished, the learning processing is finished.
- the structure of the HMM is initialized to sparse structure. Thereafter, the observed time series data x used for learning is adjusted according to the progress of learning, the adjusted time series data x′ is outputted, parameters of the HMM are estimated by using the adjusted time series data x′, and the structure of the HMM is adjusted. This processing is repeated.
- an HMM having a large number of states and a large number of state transitions is necessary for modeling of a complicated modeling target.
- the structure of the HMM is initialized to sparse structure, the observed time series data x is adjusted according to progress of learning, and the structure of the HMM is adjusted. Consequently, even if an HMM that appropriately represents a complicated modeling target is a large HMM, it is possible to correctly estimate parameters of the large HMM (estimate parameters estimated as correct).
- an HMM that appropriately represents the modeling target can be calculated.
- FIG. 37 is a flowchart for explaining details of processing performed by the structure adjusting unit 117 in step S 116 in FIG. 36 .
- step S 121 the structure adjusting unit 117 applies the division of a state explained with reference to FIGS. 31A and 31B to the HMM stored in the model storing unit 115 .
- the processing proceeds to step S 122 .
- step S 122 the structure adjusting unit 117 calculates a correlation among states forming the HMM after the division of a state using the adjusted time series data x′ supplied from the data adjusting unit 112 .
- the processing proceeds to step S 123 .
- step S 123 the structure adjusting unit 117 applies the merging of a state explained with reference to FIGS. 32A and 32B to the HMM after the division of a state on the basis of the correlation calculated in step S 122 .
- the processing proceeds to step S 124 .
- step S 124 the structure adjusting unit 117 applies the addition of state transition explained with reference to FIGS. 34A and 34B to the HMM after the merging of a state on the basis of the correlation calculated in step S 122 .
- the processing proceeds to step S 125 .
- step S 125 the structure adjusting unit 117 applies the addition of a state explained with reference to FIGS. 33A and 33B to the HMM after the addition of state transition.
- the processing proceeds to step S 126 .
- step S 126 the structure adjusting unit 117 calculates, concerning the adjusted time series data x′ supplied from the data adjusting unit 112 , a maximum likelihood path using the HMM after the addition of a state. The processing proceeds to step S 127 .
- step S 127 the structure adjusting unit 117 detects a state and state transition not forming the maximum likelihood path. Further, in step S 127 , the structure adjusting unit 117 deletes the state and the state transition not forming the maximum likelihood path as explained with reference to FIGS. 35A and 35B .
- the structure adjusting unit 117 updates the stored values in the model storing unit 115 with parameters of the HMM after the deletion of the state and the state transition.
- the processing returns to step S 121 .
- the structure adjusting unit 117 performs, concerning the HMM stored in the model storing unit 115 , six kinds of adjustment of the structure, i.e., the division of a state, the merging of a state, the addition of a state, the addition of state transition, the deletion of a state, and the deletion of state transition.
- the evaluating unit 114 requests the structure adjusting unit 117 to adjust the structure every time the number of times of learning increases by one.
- the structure adjusting unit 117 performs the adjustment of the structure of the HMM every time the number of times of learning increases by one.
- the adjustment of the structure of the HMM can also be performed according to the progress of the learning in ways other than every time the number of times of learning increases by one.
- the evaluating unit 114 supplies the number of times of learning and the likelihood to the data adjusting unit 112 as progress state information representing a state of progress of the learning.
- the progress state information can be supplied to the structure adjusting unit 117 as well.
- the structure adjusting unit 117 performs the adjustment of the structure of the HMM according to the progress state information supplied from the evaluating unit 114 .
- for example, it is possible to cause the structure adjusting unit 117 to perform the adjustment of the structure when the number of times of learning as the progress state information increases by a predetermined number of times from the number of times at the time of the last adjustment of the structure.
- the structure adjusting unit 117 can perform the adjustment of the structure, for example, when the likelihood as the progress state information falls from a value at the time of the last adjustment of the structure or when a ratio of an increase in the likelihood falls to be equal to or lower than a predetermined value.
- the adjustment of the structure of the HMM by the structure adjusting unit 117 does not guarantee that the structure of the HMM converges to an optimum structure that represents the modeling target.
- a state and state transition estimated as being appropriate for representing the modeling target are added and, on the other hand, a state and state transition estimated as being unnecessary for representing the modeling target are deleted. Therefore, even if a modeling target is a complicated modeling target, it is possible to obtain a large HMM that appropriately models the modeling target.
- the adjustment of the structure is performed in order of the division of a state, the merging of a state, the addition of state transition, the addition of a state, the deletion of a state, and the deletion of state transition.
- order of the adjustment of the structure is not limited to this.
- a range of the coordinates (x,y) of the two-dimensional space in which the robot could move was set in a range excluding areas of four blocks # 1 , # 2 , # 3 , and # 4 indicated by the following formulas in a range represented by −100≤x≤+100 and −100≤y≤+100.
- Block # 1 : −70≤x≤−20, −70≤y≤−20
- Block # 2 : −70≤x≤−20, +20≤y≤+70
- Block # 3 : +20≤x≤+70, −70≤y≤−20
- Block # 4 : +20≤x≤+70, +20≤y≤+70
- the robot was moved 10000 steps (times) in the movable range with the origin (0,0) set as a start position while a very small moving amount (Δx, Δy) was sequentially determined at random.
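The experimental setup can be sketched as follows (an illustrative sketch under stated assumptions, not the patent's code): the block coordinates are the four obstacle areas given above, and the rejection rule for moves leaving the movable range, the maximum moving amount, and the function names are assumptions.

```python
import random

def in_movable_range(x, y):
    """True if (x, y) is inside −100..+100 on both axes and outside
    the four obstacle blocks #1 to #4."""
    if not (-100 <= x <= 100 and -100 <= y <= 100):
        return False
    blocks = [(-70, -20, -70, -20), (-70, -20, 20, 70),
              (20, 70, -70, -20), (20, 70, 20, 70)]
    return not any(x0 <= x <= x1 and y0 <= y <= y1 for x0, x1, y0, y1 in blocks)

def random_walk(steps, max_move=2.0, seed=0):
    """Random walk from the origin with very small moving amounts
    (dx, dy); a candidate move outside the movable range is redrawn."""
    rng = random.Random(seed)
    x = y = 0.0
    locus = [(x, y)]
    for _ in range(steps):
        while True:
            dx = rng.uniform(-max_move, max_move)
            dy = rng.uniform(-max_move, max_move)
            if in_movable_range(x + dx, y + dy):
                break
        x, y = x + dx, y + dy
        locus.append((x, y))
    return locus

# 200 steps, as in the moving locus of FIG. 38A.
locus = random_walk(200)
```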
- FIGS. 38A and 38B are diagrams of a moving locus of the robot.
- FIG. 38A is a diagram of a moving locus until the robot moved 200 steps from the start position (the origin).
- FIG. 38B is a diagram of a moving locus until the robot moved 10000 steps from the start position.
- black circles represent coordinates after the robot moved by the very small moving amount (Δx, Δy).
- a moving locus is shown by connecting the black circles with straight lines in time order.
- from FIGS. 38A and 38B , it is seen that the robot moved at random in the entire movable range.
- the sequence of the coordinates (x,y) for 10000 steps was used as the observed time series data x.
- the modeling target was the movable range of the robot, and the observed time series data x was the coordinates (x,y) in the two-dimensional space.
- the HMM having the sixteen states shown in FIG. 30A was adopted as the HMM having the initial structure.
- a normal distribution was adopted as the output probability density function b j (x) of the respective states s j of the HMM.
- the learning of the HMM was finished at a stage when the number of times of learning reached thirty-six.
- the observed time series data as the sequence of the coordinates (x,y) for 10000 steps, i.e., the observed time series data including 10000 samples, was used for the learning.
- the down-sampling processing was applied to the observed time series data including 10000 samples such that a sampling frequency fell to 1/10 of an original sampling frequency.
- Adjusted time series data including 1000 samples obtained as a result of the down-sampling processing was used for, for example, estimation of parameters of the HMM.
- the sampling frequency of the adjusted time series data was gradually increased to 1/9, 1/8, 1/7, . . . , and 1/1 of the original sampling frequency every time the number of times of learning increased by three.
- when the sampling frequency reached 1/1 of the original sampling frequency, the adjusted time series data was the observed time series data itself.
- FIGS. 39A to 39C are diagrams of HMM obtained as a result of the learning.
- FIG. 39A is a diagram of an HMM at a point right after the learning is started (a learning initial period).
- FIG. 39B is a diagram of an HMM at a point when the learning progresses to some extent (a learning intermediate period).
- FIG. 39C is a diagram of an HMM after the learning is performed a sufficient number of times (after the learning ends).
- black circles represent coordinates (x,y) indicated by average vectors of the output probability density functions b j (x) of the states s j of the HMM and correspond to the states s j .
- in FIGS. 39A to 39C, an arrow representing the direction of state transition is not shown.
- states are arranged all over a movable range. State transition is present between states corresponding to two positions (coordinates), between which the system can move in a single (a constant) method of movement. Therefore, it is seen that an HMM that appropriately represents properties (characteristics) of a moving method of moving in a movable range of a two-dimensional space can be obtained.
- FIG. 40 is a graph of logarithmic likelihood (a logarithmic value of likelihood) calculated for the adjusted time series data from the HMM obtained as a result of the learning.
- learning is started from a rough HMM formed by sparse state transition given by the initial-structure setting unit 116 and the HMM is gradually detailed by the structure adjusting unit 117 according to progress of the learning.
- learning is started from macro characteristics of observed time series data and adjustment is performed by the data adjusting unit 112 to adjust the learning to gradually include micro characteristics according to the progress of the learning.
- the data processing apparatus shown in FIG. 29 can be applied to identification and control of a system (the system is one apparatus or a logical set of plural apparatuses; apparatuses of respective configurations do not always have to be present in the same housing) and learning of a state transition model used for artificial intelligence and the like.
- the data processing apparatus can be applied to, for example, learning for an autonomous agent or the like such as an autonomous robot to cognize (recognize) an environment and a state of the agent and perform a behavior corresponding to a result of the cognition.
- the data processing apparatus shown in FIG. 29 can be applied to learning of networks for a social system such as transportation, finance, and information, a physical system and a chemical system for physical phenomena and chemical reactions, a biological system related to living beings, and the like.
- the initial-structure setting unit 116 initializes the structure of the HMM to the sparse structure.
- the initial-structure setting unit 116 can initialize the structure of the HMM to, for example, Ergodic structure.
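As a sketch of the difference between a sparse initial structure and an ergodic one, the following builds a grid-constrained transition matrix in the spirit of the sixteen-state example (a hypothetical illustration; the exact structure of FIG. 30A may differ in detail):

```python
import numpy as np

def grid_initial_transitions(rows=4, cols=4):
    """Sparse initial HMM transition matrix on a rows x cols grid:
    each state connects only to itself and its 4-neighbors, uniformly
    normalized. An Ergodic initialization would instead make every
    entry of the matrix nonzero."""
    n = rows * cols
    a = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    a[i, rr * cols + cc] = 1.0
    return a / a.sum(axis=1, keepdims=True)

A = grid_initial_transitions()
# every row sums to 1; a corner state allows only 3 transitions
```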
- the data adjusting unit 112 adjusts the observed time series data according to the progress of the learning. However, the adjustment of the observed time series data does not have to be performed. In this case, in the data processing apparatus shown in FIG. 29 , it is unnecessary to provide the data adjusting unit 112 .
- the series of processing explained above can be performed by hardware or can be performed by software.
- a program configuring the software is installed in a general-purpose computer or the like.
- FIG. 41 is a diagram of a configuration example of a computer according to an embodiment of the present invention in which the program for executing the series of processing is installed.
- the program can be recorded in advance in a hard disk 155 or a ROM 153 as a recording medium incorporated in the computer.
- the program can be temporarily or permanently stored (recorded) in a removable recording medium 161 such as a flexible disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto Optical) disk, a DVD (Digital Versatile Disc), a magnetic disk, or a semiconductor memory.
- a removable recording medium 161 can be provided as so-called package software.
- the program can be transferred by radio from a download site to the computer via an artificial satellite for digital satellite broadcast or transferred by wire from the download site to the computer via a network such as a LAN (Local Area Network) or the Internet.
- the computer can receive the program transferred in that way in a communication unit 158 and install the program in the hard disk 155 incorporated therein.
- the computer incorporates a CPU (Central Processing Unit) 152 .
- An input and output interface 160 is connected to the CPU 152 via a bus 151 .
- a command is inputted to the CPU 152 by a user via the input and output interface 160 according to, for example, operation of an input unit 157 including a keyboard, a mouse, and a microphone.
- the CPU 152 executes the program stored in the ROM (Read Only Memory) 153 according to the command.
- the CPU 152 loads the program stored in the hard disk 155 , the program transferred from the satellite or the network, received by the communication unit 158 , and installed in the hard disk 155 , or the program read out from the removable recording medium 161 inserted in a drive 159 and installed in the hard disk 155 onto a RAM (Random Access Memory) 154 and executes the program. Consequently, the CPU 152 performs processing conforming to the flowcharts explained above or processing performed by the configurations shown in the block diagrams explained above.
- the CPU 152 outputs a result of the processing from an output unit 156 including an LCD (Liquid Crystal Display) and a speaker, transmits the processing result from the communication unit 158 , or records the processing result in the hard disk 155 via the input and output interface 160 according to necessity.
- FIG. 42 is a diagram of a functional configuration example of an information processing apparatus.
- the information processing apparatus shown in FIG. 42 includes a configuration concerning causality perception and a configuration for determining a behavior of a robot (an agent) on the basis of causality.
- a configuration concerning causality perception corresponds to the configuration of the causality unit 63 shown in FIG. 15 .
- the configuration for determining a behavior of a robot on the basis of causality corresponds to the configuration of the behavior control unit 64 shown in FIG. 15 .
- the information processing apparatus includes a causality-learning processing unit 201 , a causality-estimation processing unit 202 , a causality-candidate-list storing unit 203 , a causality-candidate-list arrangement processing unit 204 , and a behavior determining unit 205 .
- the causality-learning processing unit 201 acquires HMMs of plural modals such as the distance HMM, the light HMM, and the energy HMM generated as explained above and performs causality learning.
- the causality-learning processing unit 201 outputs data obtained by performing the causality learning to the causality-estimation processing unit 202 .
- the causality-estimation processing unit 202 performs causality estimation using data supplied from the causality-learning processing unit 201 .
- the causality-estimation processing unit 202 causes the causality-candidate-list storing unit 203 to store a list representing causality obtained by performing the causality estimation.
- the causality among the events is represented by a conditional probability as explained later.
- Acquiring data used for calculating the conditional probability is referred to as causality learning.
- Calculating a conditional probability using the data acquired by the causality learning and estimating causality is referred to as causality estimation.
- Causality perception represents a state in which the causality among the events is perceived by the causality estimation.
- the causality-candidate-list arrangement processing unit 204 appropriately arranges a causality candidate list stored in the causality-candidate-list storing unit 203 .
- the behavior determining unit 205 determines a behavior with reference to the causality candidate list stored in the causality-candidate-list storing unit 203 .
- a behavior of the robot is controlled on the basis of a command representing the behavior determined by the behavior determining unit 205 .
- in the causality-estimation processing unit 202 , causality estimation is basically performed as explained below. Details of the causality estimation are explained later.
- events that could occur are classified, at least in a range of experiences of the robot, into a set A including events a1, a2, a3, and the like that are mutually exclusive and collectively exhaustive (MECE), and a set B as a set of the other events.
- the conditional probability P(T|ak,b) of a transition T: ak→al is represented by the following formula
- a conditional probability is calculated from values of N(T,ak,b) and N(ak,b).
- N(T,ak,b) represents the number of times the event ak and the event b simultaneously occur and the event al occurs at the next time.
- N(ak,b) represents the number of times the event ak and the event b simultaneously occur.
- the conditional probability calculated in this way has an error.
- the magnitude of the error is expected to be inversely proportional to √N(T,ak,b). Therefore, if the event b is controlled by, for example, changing granularity to keep N(T,ak,b) at a value in an appropriate range, the error can be reduced.
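The relation between the counters and the conditional probability, together with the 1/√N error heuristic, can be sketched as follows (names are illustrative, and this is not the patent's exact evaluation formula):

```python
import math

def causal_probability(n_t_ak_b, n_ak_b):
    """P(T | ak, b) = N(T, ak, b) / N(ak, b), with a rough error scale.

    n_t_ak_b -- N(T, ak, b): times (ak, b) occurred and T: ak -> al followed
    n_ak_b   -- N(ak, b):    times (ak, b) occurred
    The error scale is expected to shrink like 1 / sqrt(N(T, ak, b)).
    """
    if n_ak_b == 0:
        return 0.0, float("inf")
    p = n_t_ak_b / n_ak_b
    err = 1.0 / math.sqrt(n_t_ak_b) if n_t_ak_b > 0 else float("inf")
    return p, err
```

For example, 25 observed transitions out of 100 co-occurrences give P = 0.25 with an error scale of 0.2.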
- the conditional probability is calculated by multiplying the numerator and the denominator by an attenuation ratio and evaluated; the attenuation ratio is set to, for example, 0.1.
- P(T|ak,b) is a formula that gives likelihood of a model that "if (ak,b), T: ak→al occurs" under the observation T: ak→al (transition T from the event ak to the event al).
- an expected value of an error is calculated by using the number of trials in the past and the conditional probability at the present point, and the behavior determination is performed optimistically by that amount, i.e., the conditional probability increased by the expected value is used for the behavior determination.
- accuracy of the behavior determination is improved.
- step S 201 the causality-learning processing unit 201 acquires HMMs of plural modals and performs causality learning.
- the causality-learning processing unit 201 outputs data obtained by performing the causality learning to the causality-estimation processing unit 202 .
- step S 202 the causality-estimation processing unit 202 performs causality estimation using the data supplied from the causality-learning processing unit 201 .
- the causality-estimation processing unit 202 causes the causality-candidate list storing unit 203 to store a causality candidate list representing causality obtained by performing the causality estimation.
- step S 203 the causality-candidate-list arrangement processing unit 204 arranges the causality candidate list stored in the causality-candidate-list storing unit 203 and finishes the processing.
- S 2 5 represents that a second modal is in a state 5.
- a state of the system is represented by a state vector having the state numbers as elements.
- FIG. 44 is a diagram of an example of modals.
- modals 1 to 3 are shown.
- a value of M is 3.
- the modal 1 corresponds to the energy HMM
- the modal 2 corresponds to the light HMM
- the modal 3 corresponds to the distance HMM
- S i j corresponds to nodes of the HMMs.
- causality learning performed by the causality-learning processing unit 201 is explained.
- all counters are initialized by setting their values to 0.
- two kinds of counters, i.e., event occurrence counters and transition occurrence counters, are used.
- t is equal to or larger than 1 (t ≥ 1).
- a state S t of the entire system at time t and a state S t−1 at the immediately preceding time are compared, and modals whose states change are listed.
- MaxCombi is a parameter for specifying complication of combinations of modals to be taken into account.
- An arbitrary natural number can be set as MaxCombi. min(M−1, MaxCombi) represents the smaller of the values M−1 and MaxCombi.
- Arbitrary one combination among M C L+1 combinations of modals at the time when L+1 modals are selected out of M modals is represented by cM(L+1;).
- State vectors representing states of respective modals of the arbitrary one combination at time t−1 are represented by S cM(L+1;) t−1 .
- the event occurrence counter is a counter for counting the number of times of occurrence of an event represented by a state vector corresponding thereto.
- Arbitrary one combination among M−1 C L combinations of modals at the time when L modals are selected out of the M−1 modals other than the modal "i" is represented by cM(L;i).
- State vectors representing states of respective modals of the arbitrary one combination at time t−1 are represented by S cM(L;i) t−1 .
- the transition occurrence counter corresponding to (S cM(L;i) t−1 |T i ), which is a pair of S cM(L;i) t−1 and the state transition T i t−1 (S i k(t−1) →S i k(t) ) of the modal "i", is counted up by 1.
- the transition occurrence counter is a counter for counting the number of times of occurrence of an event represented by a state vector corresponding thereto at timing immediately before occurrence of state transition for calculating causality.
- states of the modal 1, i.e., the states 1 and 2 (S 1 1 , S 1 2 ).
- a state of the system changes with time as shown in FIG. 46 .
- a number 1 in the middle represents that a state of the modal 2 is the state 1.
- a number 1 at the bottom represents that a state of the modal 3 is the state 1.
- FIGS. 47A to 47D are diagrams of examples of event occurrence counters.
- M C L+1 combinations of modals at the time when L+1 modals are selected out of the three modals are {1, 2}, {1, 3}, {2, 3}, and {1, 2, 3} as shown on the left side.
- the combinations of modals {1, 2}, {1, 3}, {2, 3}, and {1, 2, 3} correspond to cM(L+1;) explained above.
- the number of state vectors that can be taken is the product of the numbers of states of the modals included in the combination of attention. Therefore, when attention is directed to the combination of the modals {1, 3}, the number of state vectors that can be taken is six as shown in FIG. 47B . When attention is directed to the combination of the modals {2, 3}, the number of state vectors that can be taken is twelve as shown in FIG. 47C . When attention is directed to the combination of the modals {1, 2, 3}, the number of state vectors that can be taken is twenty-four as shown in FIG. 47D .
- Event occurrence counters are prepared in association with the respective state vectors. Therefore, in the case of this example, fifty event occurrence counters are prepared in total.
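The total of fifty event occurrence counters follows from multiplying together the state counts of each modal combination; a quick check using the state counts of the running example (modal 1: two states, modal 2: four, modal 3: three):

```python
from itertools import combinations
from math import prod

n_states = {1: 2, 2: 4, 3: 3}  # modal -> number of states in the example

total = sum(
    prod(n_states[m] for m in combo)
    for size in (2, 3)  # L + 1 for L = 1, 2 (MaxCombi = 2)
    for combo in combinations(sorted(n_states), size)
)
# {1,2}: 8, {1,3}: 6, {2,3}: 12, {1,2,3}: 24 -> 50 in total
```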
- FIGS. 48A to 48C are diagrams of examples of transition occurrence counters prepared in association with respective state transitions of the modal 1.
- the transition occurrence counters shown in FIGS. 48A to 48C are prepared in association with, for example, bidirectional state transition between the states 1 and 2 of the modal 1.
- FIGS. 50A to 50C are diagrams of examples of transition occurrence counters prepared in association with respective state transitions of the modal 2.
- the transition occurrence counters shown in FIGS. 50A to 50C are prepared in association with bidirectional state transitions between the states 1 and 2, between the states 2 and 3, between the states 3 and 4, between the states 4 and 1, between the states 1 and 3, and between the states 2 and 4 of the modal 2.
- a value of L is set to 1 or 2
- arbitrary one combination among M−1 C L combinations of modals at the time when L modals are selected out of modals other than the modal 2 is each of {1}, {3}, and {1, 3} as shown on the left side of FIGS. 50A to 50C .
- FIGS. 52A to 52C are diagrams of examples of transition occurrence counters prepared in association with respective state transitions of the modal 3.
- the transition occurrence counters shown in FIGS. 52A to 52C are prepared in association with bidirectional state transitions between the states 1 and 2, between the states 2 and 3, and between the states 3 and 1 of the modal 3.
- fourteen transition occurrence counters are prepared in total in association with respective state transitions of the modal 3.
- the transition occurrence counter is prepared to associate respective state transitions of a certain modal and all combinations of states of the other modals.
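The number of state-vector slots kept for one state transition of a given modal is likewise the sum over combinations of the other modals; with the example's state counts this reproduces the fourteen counters mentioned for the modal 3 (and the eleven state vectors that appear later for the modal 2):

```python
from itertools import combinations
from math import prod

def counters_per_transition(n_states, i, max_combi=2):
    """State-vector slots associated with one state transition of modal i:
    all combinations of L modals (L = 1..max_combi) chosen from the modals
    other than i. A sketch using the example's state counts; names are
    illustrative."""
    others = [m for m in n_states if m != i]
    return sum(
        prod(n_states[m] for m in combo)
        for size in range(1, min(len(others), max_combi) + 1)
        for combo in combinations(others, size)
    )

n_states = {1: 2, 2: 4, 3: 3}
# modal 3: {1} -> 2, {2} -> 4, {1,2} -> 8, i.e., 14 counters per transition
```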
- the event occurrence counters are counted up.
- Attention is directed to each of {1, 2}, {1, 3}, {2, 3}, and {1, 2, 3}, which is arbitrary one combination, among M C L+1 combinations of modals. Event occurrence counters corresponding to state vectors representing states at the immediately preceding time of modals included in the combination of attention are counted up by 1.
- the state vectors representing the states at the immediately preceding time of the modals included in the combination of attention correspond to S cM(L+1;) t−1 explained above.
- transition occurrence counters are counted up.
- the calculated pairs represent state vectors representing states at the immediately preceding time of the modals included in the combination of attention associated with the state transition (1→2) of the modal 2.
- state vectors are associated with the respective state transitions of the modal 2.
- State vectors representing states at the immediately preceding time of the modals included in the combination of attention correspond to S cM(L;i) t−1 and the pairs correspond to (S cM(L;i) t−1 |T i ) explained above.
- transition occurrence counters ( FIGS. 50A to 50C ) associated with the state vectors representing the states at the immediately preceding time of the modals included in the combination of attention associated with the state transition (1→2) of the modal 2 are counted up by 1.
- it is determined that no modal whose state changes is present.
- the event occurrence counters are counted up.
- Attention is directed to each of {1, 2}, {1, 3}, {2, 3}, and {1, 2, 3}, which is arbitrary one combination, among M C L+1 combinations of modals at the time when L+1 modals are selected out of the three modals.
- the event occurrence counters corresponding to state vectors representing states at the immediately preceding time of the modals included in the combination of attention are counted up by 1.
- the event occurrence counters are counted up.
- Attention is directed to each of {1, 2}, {1, 3}, {2, 3}, and {1, 2, 3}, which is arbitrary one combination, among M C L+1 combinations of modals at the time when L+1 modals are selected out of the three modals. Event occurrence counters corresponding to state vectors representing states at the immediately preceding time of modals included in the combination of attention are counted up by 1.
- transition occurrence counters are counted up.
- transition occurrence counters ( FIGS. 48A to 48C ) associated with the state vectors representing the states at the immediately preceding time of the modals included in the combination of attention associated with the state transition (1→2) of the modal 1 are counted up by 1.
- the event occurrence counters are counted up.
- Attention is directed to each of {1, 2}, {1, 3}, {2, 3}, and {1, 2, 3}, which is arbitrary one combination, among M C L+1 combinations of modals at the time when L+1 modals are selected out of the three modals. Event occurrence counters corresponding to state vectors representing states at the immediately preceding time of modals included in the combination of attention are counted up by 1.
- transition occurrence counters are counted up. When two modals are listed, the same processing is repeated for the respective modals.
- transition occurrence counters ( FIGS. 50A to 50C ) associated with the state vectors representing the states at the immediately preceding time of the modals included in the combination of attention associated with the state transition (2→4) of the modal 2 are counted up by 1.
- transition occurrence counters ( FIGS. 52A to 52C ) associated with the state vectors representing the states at the immediately preceding time of the modals included in the combination of attention associated with the state transition (1→3) of the modal 3 are counted up by 1.
- the event occurrence counters are counted up.
- Attention is directed to each of {1, 2}, {1, 3}, {2, 3}, and {1, 2, 3}, which is arbitrary one combination, among M C L+1 combinations of modals at the time when L+1 modals are selected out of the three modals. Event occurrence counters corresponding to state vectors representing states at the immediately preceding time of modals included in the combination of attention are counted up by 1.
- transition occurrence counters are counted up.
- transition occurrence counters ( FIGS. 48A to 48C ) associated with the state vectors representing the states at the immediately preceding time of the modals included in the combination of attention associated with the state transition (2→3) of the modal 1 are counted up by 1.
- the causality learning is advanced by repeating the processing explained above.
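The whole counting loop described above can be sketched compactly as follows (a hedged illustration: the data layout and counter-key format are my own, not the patent's):

```python
from collections import Counter
from itertools import combinations

def causality_learning(states, max_combi=2):
    """states[t] is a tuple giving the state of each modal at time t.

    Returns (event_counter, transition_counter). Event occurrence
    counters count co-occurring states of L+1 modals at time t-1;
    transition occurrence counters count those states paired with a
    state transition of one further modal between t-1 and t."""
    n_modals = len(states[0])
    event_counter, transition_counter = Counter(), Counter()
    for t in range(1, len(states)):
        prev, cur = states[t - 1], states[t]
        # event occurrence: combinations of L+1 modals, L = 1..min(M-1, MaxCombi)
        for size in range(2, min(n_modals - 1, max_combi) + 2):
            for combo in combinations(range(n_modals), size):
                event_counter[tuple((m, prev[m]) for m in combo)] += 1
        # transition occurrence: for each modal whose state changed
        for i in range(n_modals):
            if prev[i] == cur[i]:
                continue  # no state transition of modal i at this step
            transition = (i, prev[i], cur[i])
            others = [m for m in range(n_modals) if m != i]
            for size in range(1, min(len(others), max_combi) + 1):
                for combo in combinations(others, size):
                    key = tuple((m, prev[m]) for m in combo)
                    transition_counter[(key, transition)] += 1
    return event_counter, transition_counter
```

With three modals and MaxCombi = 2, one time step updates the four event counters for {1, 2}, {1, 3}, {2, 3}, and {1, 2, 3} and, for each changed modal, three transition counters over the other modals.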
- Information representing values of the event occurrence counters and values of the transition occurrence counters obtained by the causality learning is supplied from the causality-learning processing unit 201 to the causality-estimation processing unit 202 .
- the causality estimation performed by the causality-estimation processing unit 202 is explained.
- Arbitrary one combination among M−1 C L combinations of modals at the time when L modals are selected out of the M−1 modals other than the modal "i" is represented as cM(L;i).
- a state vector pattern corresponding to cM(L;i) is represented as S cM(L;i) .
- S i k as a state of a transition source of the modal “i” is added as an element of the state vector S cM(L;i) j , whereby a state vector (S i k ,S cM(L;i) j ) is generated and a value N S of an event occurrence counter corresponding to the generated state vector (S i k ,S cM(L;i) j ) is acquired.
- a provisional probability value p 0 of the conditional probability P(T i |S cM(L;i) j ) of a state transition T i corresponding to the state vector S cM(L;i) j is set as ρ 0 .
- ρ 0 is a fixed value equal to or larger than 0 and equal to or smaller than 1 that gives a minimum probability.
- Adding the value α to the provisional probability value p 0 represents setting a value obtained by optimistically considering an estimation error of a probability based on experiences as a final conditional probability P.
- An event of the state transition as the target of estimation of causality is an event having two values, i.e., whether the event occurs or not. Therefore, the event can be modeled by a Bernoulli trial with the occurrence probability p.
- the value α as an estimation error is calculated by using an appropriate parameter larger than 0.
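Because the transition event is modeled as a Bernoulli trial, the expected error α shrinks with the number of past trials. The patent's exact formula for α is not reproduced here; a Hoeffding-style confidence radius is one standard way to obtain such a term (the parameter `delta` is an assumption):

```python
import math

def optimistic_probability(p_hat, n_trials, delta=0.05):
    """Empirical probability increased by an expected-error term alpha.

    Hedged sketch: alpha here is a Hoeffding-style bound standing in
    for the patent's estimation error; with no experience at all the
    value is maximally optimistic."""
    if n_trials == 0:
        return 1.0
    alpha = math.sqrt(math.log(1.0 / delta) / (2.0 * n_trials))
    return min(1.0, p_hat + alpha)
```

As the number of trials grows, α shrinks toward 0 and the optimistic value approaches the empirical conditional probability.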
- the causality candidate list is a list of the state vectors S cM(L;i) j having causality with the state transitions T i .
- the state vectors S cM(L;i) j are associated with each of the state transitions T i in order from one having the highest conditional probability P(T i |S cM(L;i) j ).
- a state represented by the state vector S cM(L;i) j is a causality candidate of the state transition T i .
- the state vector having the causality with the state transition (1→2) of the modal 2 is a state vector representing states of both the modal 1 and the modal 3 or a state vector representing a state of one of the modal 1 and the modal 3. Therefore, when a value of L is set to 1 or 2 and arbitrary one combination of M−1 C L combinations of modals at the time when L modals are selected out of the modals other than the modal 2 is considered, the combinations are {1}, {3}, and {1, 3}.
- the respective combinations of the modals correspond to cM (L;i).
- patterns of two state vectors corresponding to {1}, patterns of three state vectors corresponding to {3}, and patterns of six state vectors corresponding to {1, 3} correspond to the state vector pattern S cM(L;i) corresponding to cM(L;i).
- the state vectors shown in FIGS. 59A to 59C are the same as those shown in FIGS. 50A to 50C .
- [1 * ⁇ ] or [2 * ⁇ ] of patterns of two state vectors [1 * ⁇ ] and [2 * ⁇ ] corresponding to ⁇ 1 ⁇ corresponds to the state vector S cM(L;i) j .
- the following processing is performed with respective eleven state vectors S cM(L;i) j shown in FIGS. 59A to 59C set as targets.
- a conditional probability representing causality with the state transition (1→2) of the modal 2 is calculated.
- a conditional probability of the state transition (1→2) of the modal 2 with respect to the state vectors [1 * ⁇ ] and [1 * 1] shown in FIG. 60A among the eleven state vectors shown in FIGS. 59A to 59C is calculated.
- When attention is directed to [1 * ⁇ ], as shown on the left side of FIG. 60B , 1 indicating a state of a transition source of the modal 2 is added as an element of [1 * ⁇ ], whereby [1 1 ⁇ ] is generated. [1 1 ⁇ ] corresponds to (S i k ,S cM(L;i) j ).
- the value N S ( FIGS. 47A to 47D ) of an event occurrence counter corresponding to [1 1 ⁇ ] is acquired.
- the value N S represents the number of times the state 1 of the modal 1 and the state 1 of the modal 2 simultaneously occur.
- the value N S is acquired by causality learning.
- the value N T (the left side of FIG. 60C and FIG. 50A ) of a transition occurrence counter corresponding to [1 * ⁇ ] prepared in association with the state transition (1→2) of the modal 2 is acquired.
- the value N T of the transition occurrence counter represents the number of times the state 1 of the modal 1 and the state 1 of the modal 2 simultaneously occur immediately before the time when the state transition (1→2) of the modal 2 occurs.
- the value N T is acquired by causality learning.
- a conditional probability of the state transition (1→2) of the modal 2 with respect to [1 * ⁇ ] is calculated on the basis of the value N S of the event occurrence counter and the value N T of the transition occurrence counter.
- ρ 0 is calculated as the conditional probability.
- the conditional probability is calculated according to Formula (3).
- the value N S of an event occurrence counter corresponding to [1 1 1] and the value N T of a transition occurrence counter corresponding to [1 * 1] (the right side of FIG. 60C and FIG. 50C ) prepared in association with the state transition (1→2) of the modal 2 are acquired.
- a conditional probability of the state transition (1→2) of the modal 2 with respect to [1 * 1] is calculated on the basis of the value N S of the event occurrence counter and the value N T of the transition occurrence counter.
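The per-state-vector estimation step can be sketched as follows (ρ0 is the fixed minimum probability from the text; treating it as a floor, and the zero-count fallback, are assumptions about Formula (3)):

```python
def estimate_conditional_probability(n_t, n_s, rho0=0.01):
    """Conditional probability of a state transition given a state vector.

    n_t  -- value N_T of the transition occurrence counter
    n_s  -- value N_S of the event occurrence counter
    rho0 -- fixed minimum probability in [0, 1] (assumed to act as a floor)
    """
    if n_s == 0:
        return rho0  # no experience of this state vector yet
    return max(rho0, n_t / n_s)
```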
- the conditional probability calculated as explained above is registered in the causality candidate list in association with the state vectors as appropriate and stored in the causality-candidate-list storing unit 203 .
- the arrangement of the causality candidate list is merging of the state vectors registered in the causality candidate list.
- the arrangement of the causality candidate list corresponds to controlling the event b by, for example, changing granularity to keep N(T,ak,b) at a value in an appropriate range.
- the arrangement of the causality candidate list is performed at predetermined timing.
- a state vector S cM(L;) k defined as a pair of specific states in L modals is discussed below.
- Possibility of merging is determined between the state vector S cM(L;) k and a state vector (S cM(L;) k ,S i j ) obtained by adding a specific state S i j of the modal “i”, which is one modal not included in the L modals, to the state vector S cM(L;) k .
- the state vector S cM(L;) k and the state vector (S cM(L;) k ,S i j ) are state vectors registered in the causality candidate list in association with a conditional probability of the same state transition.
- the state vector (S cM(L;) k ,S i j ) is a state vector obtained by adding S i j to the state vector S cM(L;) k . Therefore, it can be said that, conceptually, the state vector S cM(L;) k is a state vector higher in order than the state vector (S cM(L;) k ,S i j ).
- the determination of possibility of merging is determination concerning whether the low order state vector is included in the high order state vector and considered the same.
- a conditional probability P of target state transition with respect to the state vector S cM(L;) k is represented by Formula (5).
- a conditional probability P′ of the same state transition with respect to the state vector (S cM(L;) k ,S i j ) is represented by Formula (6).
- Such determination of possibility of merging is performed between the state vector S cM(L;) k and all n i state vectors (S cM(L;) k ,S i j ) of the modal "i" obtained by adding the specific state S i j to the state vector S cM(L;) k .
- the state vector S cM(L;) k is deleted from the causality candidate list. State vectors conceptually low in order remain in the causality candidate list.
- [1 * ⁇ ] is a state vector registered in the causality candidate list as a state vector representing a state of a causality candidate of certain state transition of the modal 2 .
- [1 * ⁇ ] corresponds to S cM(L;) k .
- a conditional probability of certain state transition of the modal 2 with respect to [1 * ⁇ ] is calculated according to Formula (5).
- a conditional probability of the same state transition of the modal 2 with respect to each of [1 * 1], [1 * 2], and [1 * 3] is calculated according to Formula (6).
- the method of arranging causality (arrangement of the state vectors of the causality candidate list) is applied.
- a merging coefficient ⁇ is set to 1.
- when P(curing|male) and P(curing|female) differ, causality effective at present is only "difference between male and female → presence or absence of curing". It is difficult to conclude whether the curing is effective or has a side effect.
- Behavior determination processing performed by the behavior determining unit 205 by using the causality candidate list appropriately arranged as explained above and stored in the causality-candidate-list storing unit 203 is explained with reference to a flowchart of FIG. 62 .
- step S 211 the behavior determining unit 205 acquires a target value.
- the target value is, for example, a value representing one state of a certain modal to be set as a target.
- the behavior determining unit 205 reads out the causality candidate list stored in the causality-candidate-list storing unit 203 and determines a behavior for transitioning a state of the modal to the state represented by the target value. For example, the behavior determining unit 205 determines transitions from a present state of the modal to the state of the target value and acquires a predetermined number of causality candidates of the respective transitions out of causality candidates registered in the causality candidate list in order from one with a highest conditional probability.
- the behavior determining unit 205 causes a robot to perform a behavior for transitioning states of other modals to a state represented by a state vector that is one causality candidate having the highest conditional probability or a conditional probability equal to or higher than a fixed level selected out of the acquired causality candidates.
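A minimal sketch of this selection step (the candidate-list format and the threshold value are illustrative, not the patent's):

```python
def select_candidate(candidates, min_probability=0.5):
    """Pick the causality candidate to act on.

    candidates -- list of (state_vector, conditional_probability) pairs
    for the desired state transition, e.g. drawn from the causality
    candidate list. Returns the state vector with the highest
    conditional probability, provided it reaches the fixed level."""
    if not candidates:
        return None
    vector, p = max(candidates, key=lambda c: c[1])
    return vector if p >= min_probability else None
```

The robot is then made to perform a behavior that brings the other modals into the states represented by the returned state vector.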
- as shown in FIG. 63 , it is possible to transition a state of energy of the robot from a present state S 1 to a state S 2 and increase the energy by transitioning a state of the optical sensor and a state of the distance sensor to predetermined states, respectively.
- the energy can be increased by bringing the state of the optical sensor to a state at the time when the robot is present around light.
- the abscissa indicates the energy.
- as shown in FIG. 64, it is possible to transition the state of energy of the robot from a present state S 11 to a state S 12 and reduce the energy by transitioning the state of the optical sensor and the state of the distance sensor to predetermined states, respectively.
- the energy can be reduced by bringing the state of the optical sensor to a state at the time when the robot is present in a position to which light does not reach.
- instead of the conditional probability P(T|ak,b) itself, a conditional probability with a probability increased by the expected value σ of estimation error is used, such that behavior determination is performed by taking into account the number of times of simultaneous occurrence of the events and the expected value σ of estimation error estimated from the conditional probability.
- FIG. 65 is a diagram of results obtained by calculating optimality of a behavior of a robot by adopting, as methods for solving the tradeoff between the use and the search of causality, the method of using a conditional probability with a probability increased by the expected value σ and the methods in the past, i.e., the random method, the ε-greedy method, and the Soft-max method.
- the abscissa represents the number of experiences and the ordinate represents optimality of a behavior.
- a curve L 1 represents a result obtained by using the method of using the conditional probability with a probability increased by the expected value σ and a curve L 2 represents a result obtained by using the Soft-max method.
- a curve L 3 represents a result obtained by using the ε-greedy method and a curve L 4 represents a result obtained by using a variant of the ε-greedy method that reduces a parameter ε as time elapses.
- a curve L 5 represents a result obtained by using the random method. As shown in FIG. 65, according to the method of using the conditional probability with a probability increased by the expected value σ, a result better than the results of the other methods can be obtained.
- whereas parameter tuning is necessary in the other methods in the past, it is unnecessary in the method of using the conditional probability with a probability increased by the expected value σ. Therefore, it can be said that this method is practical.
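For comparison, the ε-greedy baseline discussed above can be sketched in a few lines; the point is that it requires the hand-tuned parameter ε, which the σ-based method avoids. The function name and signature are illustrative, not from the patent.

```python
import random

def epsilon_greedy(values, epsilon, rng=random):
    """With probability epsilon explore a random action index;
    otherwise exploit the action with the highest estimated value."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=lambda i: values[i])

# with epsilon = 0 the choice is purely greedy
best_action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)  # index 1
```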
- the series of processing explained above can be performed by hardware or can be performed by software.
- a program configuring the software is installed, from a program recording medium, in a computer incorporated in dedicated hardware, a general-purpose personal computer that can execute various functions by installing various programs, or the like.
- FIG. 66 is a block diagram of a configuration example of hardware of a computer that executes the series of processing according to a program.
- a CPU (Central Processing Unit) 211, a ROM (Read Only Memory) 212, and a RAM (Random Access Memory) 213 are connected to one another by a bus 214.
- An input and output interface 215 is connected to the bus 214 .
- An input unit 216 including a keyboard, a mouse, and a microphone, an output unit 217 including a display and a speaker, a storing unit 218 including a hard disk and a non-volatile memory, a communication unit 219 including a network interface, and a drive 220 that drives a removable medium 221 such as an optical disk or a semiconductor memory are connected to the input and output interface 215 .
- the CPU 211 loads, for example, a program stored in the storing unit 218 onto the RAM 213 via the input and output interface 215 and the bus 214 and executes the program, whereby the series of processing is performed.
- the program executed by the CPU 211 is provided by, for example, being recorded in the removable medium 221 or transmitted via a wired or wireless transmission medium such as a local area network, the Internet, or a digital broadcast and is installed in the storing unit 218 .
- the program executed by the computer may be a program for performing processing in time series according to the order explained in this specification or may be a program for performing processing in parallel or at necessary timing such as when the program is invoked.
- Embodiments of the present invention are not limited to the embodiment explained above. Various modifications are possible without departing from the spirit of the present invention.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Automation & Control Theory (AREA)
- Medical Informatics (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
Abstract
Description
μ=(1/T)Σxt
σ2=(1/T)Σ(xt−μ)2
Σπj=1
Σaij=1(i=1, 2, . . . N)
pi*pj=Σpi(t)pj(t)
−100<x<+100 and −100<y<+100.
P(T|ak,b)=P(T,ak,b)/P(ak,b)≈N(T,ak,b)/N(ak,b)
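The count-based estimate above, together with an expected estimation error σ of the kind used later when merging conditional probabilities, can be sketched as follows. Using the binomial standard error for σ is an assumption on my part; this excerpt only implies that σ shrinks as more samples are observed.

```python
import math

def estimate(n_joint, n_cond):
    """Estimate P(T|ak, b) as N(T, ak, b) / N(ak, b) from event
    counts, plus an expected estimation error sigma
    (binomial standard error, assumed)."""
    p = n_joint / n_cond
    sigma = math.sqrt(p * (1.0 - p) / n_cond)
    return p, sigma

p, sigma = estimate(5, 10)  # p = 0.5; sigma shrinks as n_cond grows
```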
p=max(0.5,p−σ) . . . p>0.5
p=min(0.5,p+σ) . . . otherwise (4)
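Equation (4) above pulls an empirical probability toward 0.5 by σ, so that poorly sampled (high-σ) probabilities are treated less confidently. A minimal sketch, assuming σ is supplied by the caller:

```python
def adjusted_probability(p, sigma):
    """Equation (4): move p toward 0.5 by the expected estimation
    error sigma, never crossing 0.5."""
    if p > 0.5:
        return max(0.5, p - sigma)
    return min(0.5, p + sigma)

adjusted_probability(0.9, 0.03)   # approximately 0.87
adjusted_probability(0.52, 0.1)   # clamped at 0.5
```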
|p0−p′0|>α(σ+σ′) . . . merging is difficult
otherwise . . . merging is possible (7)
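The merging test of equation (7) can be sketched directly. With the merging coefficient α = 1 it reproduces the curing example: with the larger σ values, P(curing|treat, male) is close enough to P(curing|treat) to merge, while with σ ten times smaller the same gap becomes significant. The function name is illustrative.

```python
def merging_possible(p0, sigma0, p1, sigma1, alpha=1.0):
    """Equation (7): merging is judged difficult when the gap between
    two conditional probabilities exceeds alpha * (sigma0 + sigma1)."""
    return abs(p0 - p1) <= alpha * (sigma0 + sigma1)

# larger sigma: |0.5 - 0.6| = 0.1 <= 0.079 + 0.089, merging possible
merging_possible(0.5, 0.079, 0.6, 0.089)    # True
# sigma ten times smaller: 0.1 > 0.0079 + 0.0089, merging difficult
merging_possible(0.5, 0.0079, 0.6, 0.0089)  # False
```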
P(curing|treat) = 0.5, σ = 0.079
P(curing|not treat) = 0.4, σ = 0.078
P(curing|treat, male) = 0.6, σ = 0.089
P(curing|treat, female) = 0.2, σ = 0.13
P(curing|not treat, male) = 0.7, σ = 0.14
P(curing|not treat, female) = 0.3, σ = 0.084
P(curing|male) = 0.63, σ = 0.077
P(curing|female) = 0.28, σ = 0.071
|P(curing|treat)−P(curing|treat,male)|=0.1<(0.079+0.089)=0.17
|P(curing|treat)−P(curing|treat,female)|=0.3>(0.079+0.13)=0.21
|P(curing|not treat)−P(curing|not treat,male)|=0.3>(0.078+0.14)=0.22
|P(curing|not treat)−P(curing|not treat,female)|=0.1<(0.078+0.084)=0.16
Therefore, it is difficult to merge P(curing|not treat, male) and P(curing|not treat, female). P(curing|treat) is deleted.
|P(curing|male)−P(curing|treat,male)|=0.03<(0.077+0.089)=0.17
|P(curing|male)−P(curing|not treat,male)|=0.07<(0.077+0.14)=0.22
|P(curing|female)−P(curing|treat,female)|=0.08<(0.071+0.13)=0.20
|P(curing|female)−P(curing|not treat,female)|=0.02<(0.071+0.084)=0.15
|P(curing|treat)−P(curing|treat,male)|=0.1>(0.0079+0.0089)=0.017
|P(curing|treat)−P(curing|treat,female)|=0.3>(0.0079+0.013)=0.021
|P(curing|not treat)−P(curing|not treat,male)|=0.3>(0.0078+0.014)=0.022
|P(curing|not treat)−P(curing|not treat,female)|=0.1>(0.0078+0.0084)=0.016
Therefore, it is difficult to merge any of the pairs; P(curing|treat) and P(curing|not treat) are deleted.
|P(curing|male)−P(curing|treat,male)|=0.03>(0.0077+0.0089)=0.017
|P(curing|male)−P(curing|not treat,male)|=0.07>(0.0077+0.014)=0.022
|P(curing|female)−P(curing|treat,female)|=0.08>(0.0071+0.013)=0.020
|P(curing|female)−P(curing|not treat,female)|=0.02>(0.0071+0.0084)=0.015
P(curing|treat, male) = 0.6, σ = 0.0089
P(curing|treat, female) = 0.2, σ = 0.013
P(curing|not treat, male) = 0.7, σ = 0.014
P(curing|not treat, female) = 0.3, σ = 0.0084
Claims (30)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/927,708 USRE46186E1 (en) | 2008-03-13 | 2013-06-26 | Information processing apparatus, information processing method, and computer program for controlling state transition |
Applications Claiming Priority (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2008064994A JP4596024B2 (en) | 2008-03-13 | 2008-03-13 | Information processing apparatus and method, and program |
JPP2008-064995 | 2008-03-13 | ||
JPP2008-064994 | 2008-03-13 | ||
JPP2008-064993 | 2008-03-13 | ||
JP2008064993A JP4683308B2 (en) | 2008-03-13 | 2008-03-13 | Learning device, learning method, and program |
JP2008064995A JP4687732B2 (en) | 2008-03-13 | 2008-03-13 | Information processing apparatus, information processing method, and program |
US12/381,499 US8290885B2 (en) | 2008-03-13 | 2009-03-12 | Information processing apparatus, information processing method, and computer program |
US13/927,708 USRE46186E1 (en) | 2008-03-13 | 2013-06-26 | Information processing apparatus, information processing method, and computer program for controlling state transition |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/381,499 Reissue US8290885B2 (en) | 2008-03-13 | 2009-03-12 | Information processing apparatus, information processing method, and computer program |
Publications (1)
Publication Number | Publication Date |
---|---|
USRE46186E1 true USRE46186E1 (en) | 2016-10-25 |
Family
ID=41063910
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/381,499 Ceased US8290885B2 (en) | 2008-03-13 | 2009-03-12 | Information processing apparatus, information processing method, and computer program |
US13/927,708 Expired - Fee Related USRE46186E1 (en) | 2008-03-13 | 2013-06-26 | Information processing apparatus, information processing method, and computer program for controlling state transition |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/381,499 Ceased US8290885B2 (en) | 2008-03-13 | 2009-03-12 | Information processing apparatus, information processing method, and computer program |
Country Status (1)
Country | Link |
---|---|
US (2) | US8290885B2 (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5633734B2 (en) * | 2009-11-11 | 2014-12-03 | ソニー株式会社 | Information processing apparatus, information processing method, and program |
JP2011243088A (en) * | 2010-05-20 | 2011-12-01 | Sony Corp | Data processor, data processing method and program |
JP2012003494A (en) * | 2010-06-16 | 2012-01-05 | Sony Corp | Information processing device, information processing method and program |
JP2013058059A (en) * | 2011-09-08 | 2013-03-28 | Sony Corp | Information processing apparatus, information processing method and program |
JP5951802B2 (en) * | 2012-02-02 | 2016-07-13 | タタ コンサルタンシー サービシズ リミテッドTATA Consultancy Services Limited | System and method for identifying and analyzing a user's personal context |
JP5908350B2 (en) | 2012-06-21 | 2016-04-26 | 本田技研工業株式会社 | Behavior control system |
US9542644B2 (en) | 2013-08-13 | 2017-01-10 | Qualcomm Incorporated | Methods and apparatus for modulating the training of a neural device |
JP6225927B2 (en) | 2015-02-02 | 2017-11-08 | トヨタ自動車株式会社 | Vehicle state prediction system |
US11562287B2 (en) | 2017-10-27 | 2023-01-24 | Salesforce.Com, Inc. | Hierarchical and interpretable skill acquisition in multi-task reinforcement learning |
US10573295B2 (en) * | 2017-10-27 | 2020-02-25 | Salesforce.Com, Inc. | End-to-end speech recognition with policy learning |
JP6962964B2 (en) * | 2019-04-15 | 2021-11-05 | ファナック株式会社 | Machine learning device, screen prediction device, and control device |
US11415975B2 (en) * | 2019-09-09 | 2022-08-16 | General Electric Company | Deep causality learning for event diagnosis on industrial time-series data |
JP7332425B2 (en) * | 2019-10-17 | 2023-08-23 | 株式会社日立製作所 | computer system |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0554068A (en) | 1991-08-29 | 1993-03-05 | Toshiba Corp | Speech recognizing system |
US5995963A (en) * | 1996-06-27 | 1999-11-30 | Fujitsu Limited | Apparatus and method of multi-string matching based on sparse state transition list |
US6212510B1 (en) * | 1998-01-30 | 2001-04-03 | Mitsubishi Electric Research Laboratories, Inc. | Method for minimizing entropy in hidden Markov models of physical signals |
US20020165717A1 (en) * | 2001-04-06 | 2002-11-07 | Solmer Robert P. | Efficient method for information extraction |
US6801656B1 (en) * | 2000-11-06 | 2004-10-05 | Koninklijke Philips Electronics N.V. | Method and apparatus for determining a number of states for a hidden Markov model in a signal processing system |
US20050004786A1 (en) * | 2002-11-16 | 2005-01-06 | Koninklijke Philips Electronics N.V. | State machine modelling |
US20050256817A1 (en) * | 2004-05-12 | 2005-11-17 | Wren Christopher R | Determining temporal patterns in sensed data sequences by hierarchical decomposition of hidden Markov models |
US7076102B2 (en) * | 2001-09-27 | 2006-07-11 | Koninklijke Philips Electronics N.V. | Video monitoring system employing hierarchical hidden markov model (HMM) event learning and classification |
US20060241927A1 (en) * | 2005-04-25 | 2006-10-26 | Shubha Kadambe | System and method for signal prediction |
US20060248026A1 (en) * | 2005-04-05 | 2006-11-02 | Kazumi Aoyama | Method and apparatus for learning data, method and apparatus for generating data, and computer program |
US7203635B2 (en) * | 2002-06-27 | 2007-04-10 | Microsoft Corporation | Layered models for context awareness |
US7260558B1 (en) * | 2004-10-25 | 2007-08-21 | Hi/Fn, Inc. | Simultaneously searching for a plurality of patterns definable by complex expressions, and efficiently generating data for such searching |
US20080300879A1 (en) * | 2007-06-01 | 2008-12-04 | Xerox Corporation | Factorial hidden markov model with discrete observations |
US20090018877A1 (en) * | 2007-07-10 | 2009-01-15 | Openconnect Systems Incorporated | System and Method for Modeling Business Processes |
JP5054068B2 (en) | 2008-06-04 | 2012-10-24 | ツィンファ ユニバーシティ | Method for producing carbon nanotube film |
- 2009-03-12: US application Ser. No. 12/381,499, published as US8290885B2 (status: Ceased)
- 2013-06-26: US application Ser. No. 13/927,708, published as USRE46186E1 (status: Expired - Fee Related)
Non-Patent Citations (6)
Title |
---|
B. Fritzke, "Growing Grid-a self-organizing network with constant neighborhood range and adaptation strength", Neural Processing Letters, vol. 2, No. 5, pp. 9-13 (1995). |
Brants, T., Estimating Markov model structures, Fourth International Conference on Spoken Language, 1996, ICSLP 96, Proceedings, Oct. 3, 1996 No. 2, pp. 893-896. |
English translation of Japanese Office Action issued on Dec. 8, 2009, issued in Japanese Patent Application No. 2008-064993. |
Ikeda, Shiro, Generation of the phoneme model by the structure search of HMM, Institute of Electronics, Information and Communication Engineers article magazine, Japan, Corporate judicial person Institute of Electronics, Information and Communication Engineers, Jan. 25, 1995, vol. J78-D-II, No. 1, pp. 10-18. |
Itsuki, Noda, "Hidden Markov Modeling for Multi-agent Systems," PRICAI 2002, LNAI 2417, pp. 128-137, 2002. Springer-Verlag Berlin Heidelberg 2002. |
Tatsuya Akutsu,Mathematical Algorithms of Bioinformatics, publication dated Feb. 15, 2007, pp. 62-71. (Hidden Markov Model). |
Also Published As
Publication number | Publication date |
---|---|
US20090234467A1 (en) | 2009-09-17 |
US8290885B2 (en) | 2012-10-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
USRE46186E1 (en) | Information processing apparatus, information processing method, and computer program for controlling state transition | |
US11586925B2 (en) | Neural network recogntion and training method and apparatus | |
KR102410820B1 (en) | Method and apparatus for recognizing based on neural network and for training the neural network | |
US8326780B2 (en) | Smoothed sarsa: reinforcement learning for robot delivery tasks | |
JP6483667B2 (en) | System and method for performing Bayesian optimization | |
KR20190028531A (en) | Training machine learning models for multiple machine learning tasks | |
KR102355489B1 (en) | Method for predicting drug-target protein interactions and device thereof | |
US20110288835A1 (en) | Data processing device, data processing method and program | |
US10783452B2 (en) | Learning apparatus and method for learning a model corresponding to a function changing in time series | |
JP4683308B2 (en) | Learning device, learning method, and program | |
KR102293791B1 (en) | Electronic device, method, and computer readable medium for simulation of semiconductor device | |
US20210182545A1 (en) | Apparatus and method for controlling electronic device | |
KR20180046172A (en) | System and method for searching optimal solution based on multi-level statistical machine learning | |
JP4596024B2 (en) | Information processing apparatus and method, and program | |
JP4687732B2 (en) | Information processing apparatus, information processing method, and program | |
Rottmann et al. | Adaptive autonomous control using online value iteration with gaussian processes | |
Werner et al. | Topological map induction using neighbourhood information of places | |
Infantes et al. | Learning the behavior model of a robot | |
Harithas et al. | Cco-voxel: Chance constrained optimization over uncertain voxel-grid representation for safe trajectory planning | |
KR20190031786A (en) | Electronic device and method of obtaining feedback information thereof | |
US20210248442A1 (en) | Computing device and method using a neural network to predict values of an input variable of a software | |
Alyoubi et al. | Connotation of fuzzy logic system in Underwater communication systems for navy applications with data indulgence route | |
Bai et al. | Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation | |
US20240176311A1 (en) | Method and apparatus for performing optimal control based on dynamic model | |
US20240241486A1 (en) | Method and apparatus for performing optimal control |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: SONY CORPORATION, JAPAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SABE, KOHTARO;MINAMINO, KATSUKI;KAWAMOTO, KENTA;AND OTHERS;SIGNING DATES FROM 20090121 TO 20090122;REEL/FRAME:034856/0021 |
|
FEPP | Fee payment procedure |
Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |
|
FEPP | Fee payment procedure |
Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |