WO2006073025A1

WO2006073025A1 - Information processing system, information processing method, and program

Info

Publication number: WO2006073025A1
Application number: PCT/JP2005/021062
Authority: WO
Inventors: Shigeki Sugano; Chyon Hae Kim; Tetsuya Ogata
Original assignee: Waseda University
Priority date: 2004-12-15
Filing date: 2005-11-16
Publication date: 2006-07-13
Also published as: JP2006172141A; JP4472506B2

Abstract

There are provided an information processing system, an information processing method, and a program capable of performing effective autonomous control in a short time. The information processing system (10) generates a reinforcement signal to be given to a network (20) according to the result of evaluating the state of a control target such as a robot (30). It propagates the reinforcement signal from constituent elements of the network (20) (nodes (21, 22, 23) formed by logic circuits, and a link (24)) to other constituent elements. The reinforcement signal given to a propagation-destination constituent element is generated according to the I/O state of the propagation-source and/or propagation-destination constituent element. Each constituent element is generated or deleted using the cumulative value of the reinforcement signals given to it, so the structure of the network (20) changes autonomously.

Description

Specification

Information processing system, information processing method, and program

Technical field

[0001] The present invention relates to an information processing system, an information processing method, and a program. They use a network whose constituent elements are a plurality of nodes that perform information processing and links that connect these nodes and transmit information between them. The invention can be used, for example, for robot motion control, motion control of game characters on a display screen, air-conditioning management, and the like.

Background art

[0002] In current machine control and information processing, including the development of intelligent robots, creating learners for autonomous control has become a major issue. Necessary conditions for such a learner include (1) autonomous search for diverse outputs, (2) applicability to arbitrary tasks, (3) low computational cost, and (4) learning that reuses existing knowledge. (5) Handling of time series may also be considered. However, no learner that satisfies all of these conditions has yet been created.

[0003] A typical method for creating a learner for autonomous control uses the reinforcement signal from the field of reinforcement learning. In this method, input from the outside world is given to the learner. As an evaluation of the output the learner produced, the outside world then returns a reinforcement signal, which corresponds to a reward if positive and a punishment if negative. Giving the learner this signal improves its behavior. Among the learners created this way, a learner based on the method called neurogenetic learning is known to satisfy conditions (1), (2), and (5) above simultaneously. A neurogenetic learner is built on an artificial neural network modeled on biological neural networks. Virtual genes are used to construct the network, and the genes are selected according to the reinforcement signal to drive the evolution of the network and improve its I/O processing performance.

[0004] There is also an autonomous evolution system that includes a reconfigurable circuit and evaluates the circuit's adaptability to the environment. It then changes and evolves the circuit configuration based on the evaluation result, so the hardware configuration changes autonomously in response to environmental changes (see Patent Document 1).

[0005] Furthermore, there is a signal processing device that uses a neural network learning method that optimizes the coupling coefficient between neuron units (see Patent Documents 2 and 3).

Patent Document 1: Japanese Patent Laid-Open No. 10-307805 (Claim 1, FIG. 1, Abstract)

Patent Document 2: Japanese Patent Laid-Open No. 5-73705 (Claim 1, FIG. 1, Abstract)

Patent Document 3: Japanese Patent Laid-Open No. 4-336656 (Claim 1, FIG. 1, Abstract)

Disclosure of the invention

Problems to be solved by the invention

[0007] However, the neurogenetic learner described above evaluates the network as a whole to drive its evolution, which increases the computational cost. It is also unclear whether, when the environment or task changes, it performs learning that reuses previous learning results as existing knowledge.

[0008] In the autonomous evolution system of Patent Document 1, evolution likewise proceeds by evaluating the entire network and generating entire networks. A change of circuit configuration based on an evaluation result therefore amounts to replacing the whole circuit configuration with another one. Even when the result is only a partial change, it derives from evaluating the entire circuit configuration, not from evaluating the part that changed. As a result, the evaluation period becomes long. In the present invention, by contrast, evaluation, generation, and selection are performed per network constituent element rather than for the entire network, as described later. The evaluation period is therefore extremely short.

[0009] The signal processing devices of Patent Documents 2 and 3 use neural network learning methods that optimize the coupling coefficients between neuron units. In such optimization methods, the builder normally fixes the network structure in advance, based on prior knowledge of the environment and tasks in which the network will be used. Optimization is then performed within that fixed structure. In other words, the coupling coefficients are optimized without changing the network structure. The resulting learner therefore performs well in a specific environment and task but is difficult to use in arbitrary environments and tasks. The present invention differs in this respect: it autonomously changes and optimizes the network structure itself, rather than optimizing coupling coefficients within a predetermined structure.

An object of the present invention is to provide an information processing system, an information processing method, and a program capable of performing effective autonomous control in a short time.

Means for solving the problem

[0011] The present invention is an information processing system using a network whose constituent elements are a plurality of nodes that perform information processing and links that connect these nodes and transmit information between them. The system comprises the following means:

- Network structure storage means, for storing the structure of the network, including the connection relationships between the constituent elements.
- Input/output state storage means, for storing the input/output states of the constituent elements produced in the network's output generation process.
- Reinforcement signal generation means, for generating a reinforcement signal to be given to the network as a reward or punishment. The signal depends on the result of evaluating the state of a control target, that state being produced on the basis of the network's output.
- Learning means, which assigns the reinforcement signal generated by the reinforcement signal generation means to at least one constituent element. It then propagates the signal from that element to other constituent elements along the chain of connections between them. At each step, based on the reinforcement signal given to the propagation-source constituent element, it generates a reinforcement signal to be given as a reward or punishment to the propagation-destination constituent element. This signal depends on the input/output state of the propagation source and/or the propagation destination stored in the input/output state storage means. The learning means also generates or deletes constituent elements individually, using the reinforcement signal given to each constituent element or its history, or the cumulative value of the reinforcement signal or its history. This changes the network structure, and the learning means stores the changed network structure in the network structure storage means.
- Reinforcement signal storage means, for storing, for each constituent element, the reinforcement signal generated by the learning means or its history, or the cumulative value of the reinforcement signal or its history.
- Output generation means, for generating the output of the network. It refers to the network structure stored in the network structure storage means and uses the network whose structure has been changed by the learning means.

Here, the "control target" is, for example, a robot, a game character shown on a display screen, or the environment of a space under air-conditioning management. The robot may be a physical robot or a virtual one, such as a robot shown on a display screen or by holography. The same applies to the following inventions.
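The storage means named above can be pictured as plain containers. The following Python sketch uses a hypothetical layout, not the data model of the patent:

- node ids map to gate names;
- link ids map to (input-side node, output-side node) pairs;
- reinforcement is kept as one cumulative value per constituent element.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class NetworkStructure:
    """Hypothetical network structure storage: nodes plus their connection relationships."""
    nodes: Dict[str, str] = field(default_factory=dict)              # node id -> gate name
    links: Dict[str, Tuple[str, str]] = field(default_factory=dict)  # link id -> (input-side node, output-side node)

    def input_links(self, node: str) -> List[str]:
        """Ids of the links whose output side is the given node."""
        return [lid for lid, (_, dst) in self.links.items() if dst == node]


@dataclass
class IOStateStore:
    """Hypothetical input/output state storage: node outputs from the latest output generation."""
    outputs: Dict[str, int] = field(default_factory=dict)


@dataclass
class ReinforcementStore:
    """Hypothetical reinforcement signal storage: one cumulative value per constituent element."""
    cumulative: Dict[str, float] = field(default_factory=dict)

    def add(self, element: str, signal: float) -> float:
        """Accumulate a reward (positive) or punishment (negative) and return the new total."""
        self.cumulative[element] = self.cumulative.get(element, 0.0) + signal
        return self.cumulative[element]
```

For example, `NetworkStructure({"g": "AND"}, {"l0": ("x0", "g"), "l1": ("x1", "g")})` describes one AND node fed by two links.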

[0013] The "state of the control target" is, for example, one of the following, each brought about on the basis of the network's output:

- the state of a robot (the result of its behavior) produced by robot actions;
- the state of a game character produced by character actions (in a fighting game, for example, the damage the character has received, the damage it has dealt to the enemy, and whether it won or lost);
- the environmental state (comfort, safety, etc.) of a target space produced by air-conditioning management.

The same applies to the following inventions.

[0014] The "input/output state storage means" need not store both the input and the output of every constituent element. For example, it may store only the output of each constituent element; the input and output of each element can then be obtained by referring to the network structure. The "input/output states of the constituent elements" stored there may also include past states (earlier steps) in addition to the current one (latest step). Accordingly, when the learning means generates a reinforcement signal "according to the input/output state of a constituent element", it may refer to past input/output states as well as the current one. This may include a history covering several past points in time. The same applies to the following inventions.
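The sketch below shows the "store only outputs" option. It assumes links are kept as (source, destination) pairs, which is a hypothetical representation. A node's inputs are then read off the outputs of its input-side nodes at any stored step.

```python
from typing import Dict, List, Tuple


def node_inputs(outputs: Dict[str, int], links: List[Tuple[str, str]], node: str) -> List[int]:
    """Recover a node's inputs from stored node outputs and the (source, destination) link list."""
    return [outputs[src] for src, dst in links if dst == node]


# Hypothetical output history: one snapshot per step, oldest first,
# so past input/output states remain available to the learning step.
history: List[Dict[str, int]] = [
    {"x0": 1, "x1": 0, "g": 0},  # previous step
    {"x0": 1, "x1": 1, "g": 1},  # latest step
]
links = [("x0", "g"), ("x1", "g")]
```

With this layout, the inputs of `g` at the latest step are the outputs of `x0` and `x1` in `history[-1]`.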

[0015] Likewise, the "reinforcement signal" in "successively, on the basis of the reinforcement signal given to the propagation-source constituent element" may include past reinforcement signals (earlier steps) as well as the current one (latest step). Accordingly, when the learning means generates the reinforcement signal for a propagation-destination constituent element, it may refer to past signals given to the propagation-source element, not only the current one. This may include a history covering several past points in time. The new signal may then be generated from the result of computations on these signals.
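One way to use past signals is to blend the source element's signal history before propagating it. The sketch below assumes an exponential decay weighting, which the text does not prescribe.

```python
from typing import Sequence


def source_signal_from_history(history: Sequence[float], decay: float = 0.5) -> float:
    """Blend a source element's reinforcement history into one value.

    history[-1] is the current step's signal; a signal that is k steps old is
    weighted by decay ** k. The exponential weighting is a hypothetical choice.
    """
    return sum(r * decay ** age for age, r in enumerate(reversed(history)))
```

With `decay=0` only the current signal counts. With `decay=1` the result is the plain cumulative sum of the history.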

[0016] "Generating or deleting a constituent element using the reinforcement signal given to it or its history, or the cumulative value of the reinforcement signal or its history" covers, for example, the following cases:

- performing the generation or deletion judgment directly on the reinforcement signal or its cumulative value;
- performing the judgment on values obtained by applying various operations, linear or nonlinear, to the history of reinforcement signals (for example, the simple sum, simple average, weighted sum, weighted average, weighted variance, or standard deviation of the signals);
- performing the judgment on values obtained by applying various operations, linear or nonlinear, to the history of cumulative values (for example, the rate of change of the cumulative value, or its variance or standard deviation).

The same applies to the following inventions.
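The sketch below computes several of the quantities listed above from one element's reinforcement history. The per-step weights are hypothetical inputs supplied by the caller, and which quantity drives the judgment is left open, as in the text.

```python
import statistics
from typing import Dict, Sequence


def judgement_values(signals: Sequence[float], weights: Sequence[float]) -> Dict[str, float]:
    """Candidate quantities for a generation/deletion judgment, computed from one
    element's reinforcement history (oldest first) and hypothetical per-step weights."""
    total_w = sum(weights)
    weighted_sum = sum(w * s for w, s in zip(weights, signals))
    weighted_mean = weighted_sum / total_w
    # Running cumulative value after each step.
    cumulative = [sum(signals[: i + 1]) for i in range(len(signals))]
    return {
        "sum": sum(signals),
        "mean": statistics.fmean(signals),
        "weighted_sum": weighted_sum,
        "weighted_mean": weighted_mean,
        "weighted_variance": sum(w * (s - weighted_mean) ** 2
                                 for w, s in zip(weights, signals)) / total_w,
        "stdev": statistics.pstdev(signals),
        "cumulative_stdev": statistics.pstdev(cumulative),
    }
```

A deletion rule could then be, for instance, `judgement_values(h, w)["sum"] < threshold`. Swapping in another key changes the criterion without touching the rest of the learner.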

[0017] The information processing performed by a "node" usually obtains one output from a plurality of inputs. A special node, however, may obtain one output from one input, or one output with no input. Examples are a node located at the edge of the network and a dummy node. The same applies to the following inventions.

[0018] In the information processing system of the present invention configured as above, a reinforcement signal to be given to the network is generated according to the result of evaluating the state of the control target. This signal is propagated from constituent elements of the network to other constituent elements. The propagated signal, that is, the reinforcement signal given to a propagation-destination constituent element, is generated according to the input/output state of the propagation-source and/or propagation-destination constituent element. For each constituent element, the system then decides whether to generate (add) it or delete (eliminate) it. The decision uses the reinforcement signal given to that element individually or its history, or the cumulative value of the reinforcement signal or its history. In this way the network structure changes autonomously.

[0019] Therefore, unlike the neurogenetic learner described above, the network structure is changed by evaluating, generating, and deleting individual constituent elements rather than by evaluating the entire network. This shortens the time required for evaluation, so the network can be constructed autonomously in a short time, and it reduces the accompanying computational cost.

[0020] In Patent Documents 2 and 3, the network structure is fixed in advance according to the environment and tasks in which the network will be used, and optimization is performed within that structure. The present invention instead autonomously changes and optimizes the network structure itself. This avoids the restrictions on environment and task that a predetermined structure imposes. Consequently, even when the network's environment or task changes, learning is likely to be able to reuse previous learning results as existing knowledge, which achieves the above object.

[0021] The information processing system described above may further include state evaluation signal acquisition means. This means acquires a state evaluation signal, used to evaluate the state of the control target, either from state detection means that detects that state or from the control target itself. The reinforcement signal generation means may then evaluate the state of the control target based on the acquired state evaluation signal and generate a reinforcement signal according to the evaluation result.

Here, the "state detection means" is, for example, any of various sensors that detect position, velocity, acceleration, distance, rotation angle, rotational angular velocity, rotational angular acceleration, temperature, humidity, pressure, odor, light, sound, vibration, touch, and the like.
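The following sketch evaluates the sensed state of an air-conditioning target. The target values, the tolerances, and the +1/-1 scoring are assumptions for illustration only.

```python
from typing import Dict


def reinforcement_from_sensors(readings: Dict[str, float],
                               targets: Dict[str, float],
                               tolerance: Dict[str, float]) -> float:
    """Hypothetical state evaluation: +1 for each sensed quantity within tolerance of
    its target, -1 otherwise, averaged into one reward (> 0) or punishment (< 0)."""
    scores = [
        1.0 if abs(readings[name] - targets[name]) <= tolerance[name] else -1.0
        for name in targets
    ]
    return sum(scores) / len(scores)
```

Because the score comes straight from sensor readings, no human judgment sits in the loop. This is the property the following paragraph relies on.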

[0023] Evaluating the state of the control target from the acquired state evaluation signal removes human judgment from the evaluation. This speeds up autonomous construction of the network and makes it easy to learn consistently in line with the intended purpose.

[0024] The information processing system described above may further include evaluation result input receiving means for receiving a user's input of an evaluation result of the state of the control target. The reinforcement signal generation means may then generate the reinforcement signal according to the evaluation result received by the evaluation result input receiving means.

[0025] With the evaluation result input receiving means, a reinforcement signal is generated according to the user's evaluation result and propagated from constituent element to constituent element. Autonomous construction of the network can therefore be steered so that the control target is controlled as the user intends.

[0026] There may be one "user" or several. In the latter case, when multiple users use or refer to the same control target, evaluation results from all of them can be accepted; these may evaluate the same or different states of that control target. For example, if the control target is a search engine on a network, its search algorithm can be changed according to evaluation results sent from multiple user terminal devices connected to the network.
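When several users evaluate the same control target, their scores must be merged into one reinforcement signal. This sketch assumes each user terminal sends a numeric score. Clipping each score to [-1, 1] and averaging is a hypothetical aggregation rule.

```python
from statistics import fmean
from typing import Mapping


def reinforcement_from_users(evaluations: Mapping[str, float]) -> float:
    """Hypothetical aggregation of user evaluations (user id -> score) into one
    reinforcement signal: clip each score to [-1, 1], then average."""
    if not evaluations:
        return 0.0  # no feedback: neither reward nor punishment
    return fmean([max(-1.0, min(1.0, score)) for score in evaluations.values()])
```

Clipping keeps one enthusiastic or hostile user from dominating the signal given to the network.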

[0027] In the information processing system described above, it is desirable that the learning means assign the reinforcement signal generated by the reinforcement signal generation means equally to all output nodes constituting the output layer of the network. The learning means then takes a node as the propagation-source constituent element and an input-side link of that node as the propagation-destination constituent element. Based on the reinforcement signal given to the source node, it generates a reinforcement signal to be given as a reward or punishment to the destination input-side link. This signal depends on that link's degree of contribution to the node output, which is determined by the input/output state of the source node.
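The text does not fix how the "degree of contribution" is measured. The sketch below assumes one plausible measure for logic-gate nodes: an input-side link contributes if flipping its bit alone would flip the node's output. The node's signal is then shared equally among the contributing links. Gates are modeled as functions from a list of bits to a bit.

```python
from typing import Callable, Dict, List, Sequence

Gate = Callable[[Sequence[int]], int]


def assign_to_outputs(r: float, output_nodes: Sequence[str]) -> Dict[str, float]:
    """Give every output node the same reinforcement signal r."""
    return {node: r for node in output_nodes}


def contributions(gate: Gate, inputs: Sequence[int]) -> List[float]:
    """Hypothetical contribution degree: 1.0 if flipping that input alone changes the output."""
    base = gate(inputs)
    result = []
    for i in range(len(inputs)):
        flipped = list(inputs)
        flipped[i] ^= 1
        result.append(1.0 if gate(flipped) != base else 0.0)
    return result


def propagate_to_input_links(r_node: float, gate: Gate, inputs: Sequence[int],
                             link_ids: Sequence[str]) -> Dict[str, float]:
    """Split the node's signal over its input-side links in proportion to contribution."""
    c = contributions(gate, inputs)
    total = sum(c)
    if total == 0:
        return {lid: 0.0 for lid in link_ids}  # no single input decided the output
    return {lid: r_node * ci / total for lid, ci in zip(link_ids, c)}
```

Take an AND node that output 0 for inputs (1, 0). Only the link carrying the 0 receives the signal, since that input alone determined the output.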

[0028] Propagating the reinforcement signal from a node to its input-side links in this way lets the signal given to the network be back-propagated from the output nodes. The signal for each input-side link depends on that link's degree of contribution to the output of the propagation-source node. Each link can therefore be evaluated individually and reasonably, which makes it possible to generate or delete constituent elements one at a time.

[0029] Where the reinforcement signal is propagated from a node to its input-side links as described above, the learning means may also take a node as the propagation source and, as the propagation destination, the input-side node connected to the input side of an input-side link of that source node. Based on the reinforcement signal given to the source node, it then generates a reinforcement signal to be given as a reward or punishment to that input-side node. This signal depends on the input-side link's degree of contribution to the node output, determined by the input/output state of the source node.

[0030] With this configuration, back-propagation from a node to the input-side nodes of its input-side links proceeds together with back-propagation from the node to the input-side links themselves. The reinforcement signal can therefore be back-propagated more smoothly.
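Combining node-to-link and node-to-input-node propagation gives a backward pass over the whole network. The sketch below rests on three assumptions:

- contribution is measured by flipping (a link contributes if flipping its bit alone changes the node output);
- each link's share is passed on unchanged to that link's input-side node;
- node ids are supplied in topological order, inputs first.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Gate = Callable[[Sequence[int]], int]


def backpropagate(gates: Dict[str, Gate], links: Dict[str, Tuple[str, str]],
                  outputs: Dict[str, int], order: List[str],
                  output_nodes: List[str], r: float) -> Dict[str, float]:
    """Reinforcement received by every node and link during one backward pass."""
    signal: Dict[str, float] = {n: 0.0 for n in order}
    signal.update({lid: 0.0 for lid in links})
    for n in output_nodes:
        signal[n] += r  # the same signal for every output node
    # Reverse topological order: a node's signal is complete before it is passed on.
    for node in reversed(order):
        if node not in gates:  # primary input: nothing further upstream
            continue
        in_links = [lid for lid, (_, dst) in links.items() if dst == node]
        xs = [outputs[links[lid][0]] for lid in in_links]
        base = gates[node](xs)
        contrib = []
        for i in range(len(xs)):
            flipped = list(xs)
            flipped[i] ^= 1
            contrib.append(1.0 if gates[node](flipped) != base else 0.0)
        total = sum(contrib)
        for lid, c in zip(in_links, contrib):
            share = signal[node] * c / total if total else 0.0
            signal[lid] += share              # to the input-side link
            signal[links[lid][0]] += share    # and on to its input-side node
    return signal
```

A node that feeds several links accumulates a share from each of them, which is why signals are added rather than assigned.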

[0031] The reinforcement signal need not propagate directly from node to node as described above; it may instead propagate via the link connecting the nodes. That is, the signal may be held temporarily on the link connecting the nodes and then passed to the propagation-destination node.

[0032] Where the reinforcement signal is propagated from a node to its input-side links as described above, the reinforcement signal storage means preferably stores, for each link, the history of the reinforcement signals given to the link or their cumulative value. The learning means preferably deletes a link when the cumulative value of the reinforcement signals given to it falls below a threshold.

[0033] To judge whether the cumulative value has fallen below the threshold, the learning means may obtain that value in either of two ways. It may sum the history of reinforcement signals stored in the reinforcement signal storage means, or it may read the cumulative value stored there directly. The same applies to the following inventions.
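A minimal sketch of the link-deletion rule follows. It assumes the reinforcement signal storage keeps a per-link list of signals and obtains the cumulative value by summing that list, the first of the two options just described.

```python
from typing import Dict, List, Tuple


def prune_links(links: Dict[str, Tuple[str, str]],
                history: Dict[str, List[float]],
                threshold: float) -> Dict[str, Tuple[str, str]]:
    """Keep only links whose cumulative reinforcement (sum of the stored history)
    has not fallen below the threshold; links without history count as 0."""
    return {
        lid: ends
        for lid, ends in links.items()
        if sum(history.get(lid, [])) >= threshold
    }
```

Storing the running total instead of the list would make each check O(1), at the cost of losing the history that other judgment criteria may need.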

[0034] Deleting a link when the cumulative value of its reinforcement signals falls below the threshold eliminates links that do not help control the control target as intended, that is, links considered unnecessary. The network structure can thus be changed autonomously.

[0035] Where links are deleted when their cumulative reinforcement falls below the threshold as described above, it is desirable that the learning means delete a node when the number of links connected to its input side becomes one or fewer.

[0036] Deleting a node when the number of its input-side links becomes one or fewer eliminates nodes that are not useful for controlling the control target as intended, that is, nodes considered unnecessary. The network structure can thus be changed autonomously.
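The node-deletion rule can be sketched as below. Two details are assumptions. First, primary input nodes, which have no input-side links by construction, are protected from deletion. Second, deletions are repeated until nothing changes: removing a node also removes its outgoing links, which can leave a downstream node with one input or fewer.

```python
from typing import Dict, Set, Tuple


def prune_nodes(nodes: Set[str], links: Dict[str, Tuple[str, str]],
                protected: Set[str]) -> Tuple[Set[str], Dict[str, Tuple[str, str]]]:
    """Delete unprotected nodes with at most one input-side link, plus every link
    touching a deleted node, until the structure stops changing."""
    nodes, links = set(nodes), dict(links)
    changed = True
    while changed:
        changed = False
        for node in sorted(nodes - protected):
            fan_in = sum(1 for _, dst in links.values() if dst == node)
            if fan_in <= 1:
                nodes.discard(node)
                links = {lid: (s, d) for lid, (s, d) in links.items() if node not in (s, d)}
                changed = True
    return nodes, links
```

In practice the output nodes may also need protection, depending on how the output layer is defined.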

[0037] In the configuration in which the reinforcement signal is propagated from a node to its input-side links, the following arrangement is desirable. The input side of the propagation-source node is provided with a test link that does not contribute to the node output, in addition to the propagation-destination input-side links. The reinforcement signal storage means stores the history of the reinforcement signals given to the test link or their cumulative value. When the cumulative value of the reinforcement signals given to the test link exceeds a threshold, the learning means registers the test link in the network structure storage means as an input-side link of the propagation-source node.

[0038] With test links configured this way, a test link considered useful for controlling the control target as intended is promoted to a real link that contributes to the node output. It is then formally registered as an input-side link. Links can thus be generated autonomously, and the network structure can be changed autonomously.
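A minimal sketch of test-link promotion follows. It assumes each node carries at most one test link, recorded as the id of the node it reads from. Promotion then simply appends that id to the node's real input list. How the test link earns its reinforcement is left to the caller.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GateNode:
    """Hypothetical node record: real input links feed the gate; the test link is only observed."""
    real_inputs: List[str] = field(default_factory=list)
    test_source: str = ""          # input-side node of the test link ("" = none)
    test_cumulative: float = 0.0   # cumulative reinforcement of the test link

    def reinforce_test_link(self, signal: float, threshold: float) -> bool:
        """Accumulate a signal on the test link; promote it once the total exceeds threshold."""
        self.test_cumulative += signal
        if self.test_source and self.test_cumulative > threshold:
            self.real_inputs.append(self.test_source)  # now a registered input-side link
            self.test_source, self.test_cumulative = "", 0.0
            return True
        return False
```

After promotion the node has no test link. The caller would then attach a fresh one if it wants to keep probing for new connections.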

[0039] Where test links are provided as described above, the following is desirable. When the cumulative value of the reinforcement signals given to a test link falls below the threshold, the learning means deletes that test link. It then generates a new test link connected to an arbitrary node and registers it in the network structure storage means.

[0040] Deleting a test link whose cumulative reinforcement falls below the threshold and generating a new one keeps suitable candidates for newly generated links (real links) available in advance. Links can therefore be generated appropriately and smoothly, and the network structure can be changed autonomously.
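Deletion and regeneration of a test link can be sketched as follows. The text only says the new test link is connected to an arbitrary node. Choosing that node uniformly at random from caller-supplied candidates is an assumption.

```python
import random
from typing import List, Tuple


def refresh_test_link(test_source: str, cumulative: float, threshold: float,
                      candidates: List[str], rng: random.Random) -> Tuple[str, float]:
    """If the test link's cumulative reinforcement fell below the threshold, delete it
    and return a fresh test link (new input-side node, zeroed cumulative value)."""
    if cumulative < threshold:
        return rng.choice(candidates), 0.0
    return test_source, cumulative
```

Passing a seeded `random.Random` keeps runs reproducible, which helps when comparing learning curves.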

[0041] In the information processing system described above, it is desirable that a link be provided with an accompanying test node that does not contribute to the network output. The test node is connected as follows:

- to the input-side node of the link, by a first input-side test link;
- to the output-side node of the link, by an output-side test link;
- to an arbitrary node, by a second input-side test link.

The learning means takes the link as the propagation-source constituent element and the test node as the propagation-destination constituent element. Based on the reinforcement signal given to the source link, it generates a reinforcement signal to be given as a reward or punishment to the destination test node. This signal depends on the output of the source link and the output of the destination test node.

[0042] Providing a test node in association with a link in this way prepares candidates for newly generated nodes (real nodes), so the network structure can be changed autonomously.
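The wiring of a test node can be sketched as a small record attached to a link. Two parts are assumptions:

- The second input-side node is drawn at random from nodes other than the link's two ends. A real implementation would also need to avoid creating cycles.
- The test node's reinforcement copies the link's signal when the test node reproduces the value the link carried, with the opposite sign otherwise. The text only says the signal depends on both outputs.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

Gate = Callable[[Sequence[int]], int]


@dataclass
class CandidateNode:
    """A 'test node' accompanying one link; it does not feed the network output."""
    gate: Gate
    first_input: str   # the link's input-side node (first input-side test link)
    second_input: str  # an arbitrary node (second input-side test link)
    output_to: str     # the link's output-side node (output-side test link)

    def output(self, values: Dict[str, int]) -> int:
        return self.gate([values[self.first_input], values[self.second_input]])


def attach_candidate_node(link: Tuple[str, str], gate: Gate,
                          nodes: Sequence[str], rng: random.Random) -> CandidateNode:
    """Create a test node on link (src, dst) with a randomly chosen second input."""
    src, dst = link
    others = [n for n in nodes if n not in (src, dst)]
    return CandidateNode(gate, src, rng.choice(others), dst)


def signal_for_candidate_node(r_link: float, link_output: int, candidate_output: int) -> float:
    """Hypothetical rule: same sign as the link's signal if the outputs agree, opposite otherwise."""
    return r_link if candidate_output == link_output else -r_link
```

Under this rule, a test node that agrees with a rewarded link, or disagrees with a punished one, accumulates reward. That is the behavior one would want from a node that might later be spliced into the link.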

[0043] Where a test node is provided in association with a link as described above, it is desirable that the learning means take the test node as the propagation-source constituent element and the first and second input-side test links as the propagation-destination constituent elements. Based on the reinforcement signal given to the source test node, it generates reinforcement signals to be given as rewards or punishments to the first and second input-side test links. Each signal depends on that link's degree of contribution to the test node's output, determined by the input/output state of the source test node.

[0044] Propagating the reinforcement signal from the test node to the first and second input-side test links in this way prepares candidates for newly generated links (real links), so the network structure can be changed autonomously.
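For a two-input test node, the split between its first and second input-side test links can be sketched as follows. The contribution measure is an assumption: an input contributes if flipping it alone flips the test node's output.

```python
from typing import Callable, Sequence, Tuple

Gate = Callable[[Sequence[int]], int]


def propagate_to_test_inputs(r_test: float, gate: Gate,
                             x_first: int, x_second: int) -> Tuple[float, float]:
    """Share a test node's signal between its first and second input-side test links
    in proportion to each input's (flip-based) contribution to the test node output."""
    base = gate([x_first, x_second])
    c_first = 1.0 if gate([x_first ^ 1, x_second]) != base else 0.0
    c_second = 1.0 if gate([x_first, x_second ^ 1]) != base else 0.0
    total = c_first + c_second
    if total == 0:
        return 0.0, 0.0
    return r_test * c_first / total, r_test * c_second / total
```

An XOR test node always splits its signal evenly, since either input flips its output. An AND node with inputs (1, 0) sends everything to the second link.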

[0045] Where the reinforcement signal is propagated from the test node to the first and second input-side test links as described above, the following is desirable. The reinforcement signal storage means also stores, for each link, the history of the reinforcement signals given to the first and second input-side test links or their cumulative value. When the cumulative value for the first or second input-side test link falls below the threshold, the learning means deletes that input-side test link. It then generates a new input-side test link connected to an arbitrary node and registers it in the network structure storage means.

[0046] Deleting an input-side test link whose cumulative reinforcement falls below the threshold and generating a new one keeps suitable candidates for newly generated links (real links) available. Links can therefore be generated appropriately and smoothly, and the network structure can be changed autonomously.

[0047] It is preferable to give the first input-side test link a sufficiently large reward when it is generated, so that it is not deleted. In that case, only the second input-side test link is effectively subject to deletion.
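Replacing weak input-side test links while protecting the first one can be sketched as below. The size of the initial reward and the random choice of the new input-side node are assumptions. With a large enough initial reward, only the second test link is ever replaced in practice.

```python
import random
from dataclasses import dataclass
from typing import Dict, List

FIRST_LINK_INITIAL_REWARD = 1e6  # hypothetical value for the "sufficiently large" reward


@dataclass
class InputTestLinks:
    sources: Dict[str, str]       # "first"/"second" -> input-side node of each test link
    cumulative: Dict[str, float]  # "first"/"second" -> cumulative reinforcement


def new_input_test_links(link_src: str, arbitrary: str) -> InputTestLinks:
    """The first test link reads the accompanied link's input-side node and starts with a large reward."""
    return InputTestLinks({"first": link_src, "second": arbitrary},
                          {"first": FIRST_LINK_INITIAL_REWARD, "second": 0.0})


def replace_weak_test_links(t: InputTestLinks, threshold: float,
                            nodes: List[str], rng: random.Random) -> List[str]:
    """Delete each input-side test link whose cumulative value fell below the threshold,
    reconnect a fresh one to a randomly chosen node, and return the replaced slots."""
    replaced = []
    for slot in ("first", "second"):
        if t.cumulative[slot] < threshold:
            t.sources[slot] = rng.choice(nodes)
            t.cumulative[slot] = 0.0
            replaced.append(slot)
    return replaced
```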

[0048] Where the reinforcement signal is propagated from the test node to the first and second input-side test links as described above, the following is desirable. The reinforcement signal storage means also stores, for each link, the history of the reinforcement signals given to the first and second input-side test links or their cumulative value. When the cumulative values for both input-side test links exceed the threshold, the learning means promotes the test node to a real node that contributes to the network output. It registers that node in the network structure storage means, putting the test node into practical use.

[0049] Putting the test node into practical use when the cumulative values of both input-side test links exceed the threshold creates (adds) new nodes (real nodes). The network structure can thus be changed autonomously.
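The text says a promoted test node is registered as a real node but does not say what happens to the link it accompanied. The sketch below assumes the node is spliced into that link and the original link is removed. The "in1"/"in2"/"out" link names are also hypothetical.

```python
from typing import Dict, Set, Tuple


def maybe_promote_test_node(name: str, first_src: str, second_src: str, link_id: str,
                            cum_first: float, cum_second: float, threshold: float,
                            nodes: Set[str], links: Dict[str, Tuple[str, str]]) -> bool:
    """If both input-side test links exceed the threshold, register the test node as a
    real node spliced into the accompanied link (the splice is an assumption)."""
    if cum_first > threshold and cum_second > threshold:
        _, dst = links.pop(link_id)                  # remove the accompanied link
        nodes.add(name)
        links[f"{name}.in1"] = (first_src, name)     # from the first input-side test link
        links[f"{name}.in2"] = (second_src, name)    # from the second input-side test link
        links[f"{name}.out"] = (name, dst)           # from the output-side test link
        return True
    return False
```

Keeping the original link alongside the new node would be an equally valid reading. The choice only changes the `links.pop` line.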

[0050] In the information processing system described above, it is desirable that the node be configured to perform information processing using at least one logical circuit.

Here, the "logic circuit" may be, for example, any of the following:

- an AND (logical product) circuit;
- an OR (logical sum) circuit;
- an XOR (exclusive OR) circuit;
- a NOT (negation) circuit;
- a NAND (negative AND: Not AND) circuit;
- a NOR (negative OR: Not OR) circuit;
- an XNOR (negative exclusive OR: Exclusive Not OR) circuit.
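The gates listed above can be modeled on bits as in the following sketch. The two-input gates are extended to more inputs by folding them left to right. This extension is an assumption, consistent with a node usually producing one output from several inputs.

```python
from functools import reduce
from operator import and_, or_, xor
from typing import Callable, Dict, Sequence

Gate = Callable[[Sequence[int]], int]


def _fold(op) -> Gate:
    """Extend a two-input bit operation to any number of inputs."""
    return lambda xs: reduce(op, xs)


GATES: Dict[str, Gate] = {
    "AND": _fold(and_),
    "OR": _fold(or_),
    "XOR": _fold(xor),                        # parity for more than two inputs
    "NOT": lambda xs: 1 - xs[0],              # single-input node
    "NAND": lambda xs: 1 - reduce(and_, xs),
    "NOR": lambda xs: 1 - reduce(or_, xs),
    "XNOR": lambda xs: 1 - reduce(xor, xs),
}
```

For example, `GATES["NAND"]([1, 1])` evaluates to 0.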

[0052] When a node is configured using a logic circuit in this way, an information processing system capable of realizing target control can be constructed with a simple structure.

[0053] Further, as an information processing method realized by the information processing system of the present invention described above, the following information processing method of the present invention can be cited.

That is, the present invention is an information processing method using a network whose constituent elements are a plurality of nodes that perform information processing and links that connect these nodes and transmit information between them. The method comprises the following steps:

- The network structure, including the connection relationships between the constituent elements, is stored in network structure storage means.
- The input/output states of the constituent elements produced in the network's output generation process are stored in input/output state storage means.
- Reinforcement signal generation means generates a reinforcement signal to be given to the network as a reward or punishment. The signal depends on the result of evaluating the state of a control target, that state being produced on the basis of the network's output.
- Learning means assigns the generated reinforcement signal to at least one constituent element and propagates it from that element to other constituent elements along the chain of connections between them. At each step, based on the reinforcement signal given to the propagation-source constituent element, it generates a reinforcement signal to be given as a reward or punishment to the propagation-destination constituent element. This signal depends on the input/output state of the propagation source and/or the propagation destination stored in the input/output state storage means.
- The learning means stores the generated reinforcement signal of each constituent element, or its cumulative value, in reinforcement signal storage means for each constituent element.
- The learning means generates or deletes constituent elements individually, using the reinforcement signal given to each constituent element or its history, or the cumulative value of the reinforcement signal or its history. This changes the network structure, and the learning means stores the changed structure in the network structure storage means.
- Output generation means refers to the network structure stored in the network structure storage means and generates the network's output using the network whose structure has been changed by the learning means.

[0055] Here, "storing the generated reinforcement signal of each constituent element, or its accumulated value, in the reinforcement signal storage means for each constituent element" covers both the case where the reinforcement signal or its accumulated value is overwritten, and the case where each new reinforcement signal or accumulated value is appended so that past reinforcement signals or accumulated values are kept as a history.
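The two storage options in paragraph [0055] can be sketched as follows. This is an illustration only: it assumes that constituent elements are identified by hashable keys and that the accumulated value is a running sum of the signals received; the class `SignalStore` and its methods are hypothetical names.

```python
from typing import Dict, Hashable, List

class SignalStore:
    """Hypothetical per-element reinforcement-signal storage.

    keep_history=False: only the latest accumulated value is kept (overwrite).
    keep_history=True: every accumulated value is also appended to a history list.
    """

    def __init__(self, keep_history: bool = False) -> None:
        self.keep_history = keep_history
        self.accumulated: Dict[Hashable, float] = {}
        self.history: Dict[Hashable, List[float]] = {}

    def record(self, element: Hashable, signal: float) -> float:
        """Add one step's signal to the element's running total and return the total."""
        total = self.accumulated.get(element, 0.0) + signal
        self.accumulated[element] = total  # overwrite with the newest value
        if self.keep_history:
            self.history.setdefault(element, []).append(total)
        return total
```

With `keep_history=True`, recording `1.0` and then `2.0` for the same element leaves the history `[1.0, 3.0]`.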

[0056] This information processing method of the present invention provides the same effects as the information processing system of the present invention described above, and thereby achieves the object.

[0057] Further, the present invention is a program for causing a computer to function as an information processing system using a network whose constituent elements are a plurality of nodes that perform information processing and links that connect these nodes and transmit information between them, the information processing system comprising: network structure storage means for storing the network structure, including the connection relationships between constituent elements; input/output state storage means for storing the input/output states of the constituent elements produced in the network output generation process; reinforcement signal generation means for generating a reinforcement signal to be given to the network as a reward or punishment according to the result of evaluating the state of the control target, which is produced on the basis of the network output; learning means that assigns the reinforcement signal generated by the reinforcement signal generation means to at least one constituent element, propagates it from that element to other constituent elements along the chain of connections between constituent elements, at each step generates the reinforcement signal to be given as a reward or punishment to the propagation-destination element on the basis of the reinforcement signal given to the propagation-source element and of the input/output states of the propagation-source and/or propagation-destination elements stored in the input/output state storage means, changes the network structure by generating or deleting constituent elements one element at a time using the reinforcement signal given to each element or its history, or the accumulated value of the reinforcement signal or its history, and stores the changed network structure in the network structure storage means; output generation means that refers to the network structure stored in the network structure storage means and generates the network output using the network whose structure has been changed by the learning means; and reinforcement signal storage means for storing, for each constituent element, the reinforcement signal generated by the learning means or its history, or the accumulated value of the reinforcement signal or its history.

Note that the above program, or part of it, can be recorded on a storage medium such as a magneto-optical disk (MO), a compact disc read-only memory (CD-ROM), a CD recordable (CD-R), a CD rewritable (CD-RW), a digital versatile disc read-only memory (DVD-ROM), a DVD random access memory (DVD-RAM), a flexible disk (FD), magnetic tape, a hard disk, a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), flash memory, or a random access memory (RAM). It can also be transmitted over a transmission medium such as a wired network (for example, a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), the Internet, an intranet, or an extranet), a wireless communication network, or a combination of these, or carried on a carrier wave. Furthermore, the above program may be part of another program, or may be recorded on a recording medium together with a separate program.

Effects of the Invention

[0059] As described above, according to the present invention, a reinforcement signal to be given to the network is generated according to the result of evaluating the state of the control target, and this reinforcement signal is propagated from one constituent element of the network to other constituent elements. The network structure is then changed by evaluating, generating, or deleting each constituent element individually using the reinforcement signal given to it or its history, or the accumulated value of the reinforcement signal or its history. Because the unit of evaluation is the individual constituent element rather than the entire network, as in conventional approaches, the time required for evaluation is shortened and the network can be constructed autonomously with a low time order.

BEST MODE FOR CARRYING OUT THE INVENTION

An embodiment of the present invention is described below with reference to the drawings. FIG. 1 shows the overall configuration of the information processing system 10 of the present embodiment, and FIG. 2 shows the data structures used in processing by the information processing system 10. FIG. 3 shows the overall flow of operation control of the robot 30, FIG. 4 shows the flow of processing of the network 20, FIG. 5 shows the flow of the learning process for an intermediate OR node (real node), and FIG. 6 shows the flow of the learning process for a non-inverted link. FIG. 7 is an explanatory diagram of the learning process for an intermediate OR node, FIG. 8 shows an example of the distribution of reinforcement signals during learning of an intermediate OR node, FIG. 9 shows an example of the distribution of reinforcement signals during learning of an intermediate AND node, and FIG. 10 is an explanatory diagram of the learning process for a non-inverted link (real link). FIG. 11 shows the configuration of the initialization processes, and FIG. 12 shows the configuration of the deletion processes performed during learning. FIG. 13 is an explanatory diagram of the output node initialization process G4, FIG. 14 of the intermediate OR node initialization process G5, and FIG. 15 of the test intermediate OR node initialization process G7. FIGS. 16 to 18 are explanatory diagrams of the intermediate OR node deletion process E1.

In FIG. 1, the information processing system 10 is an information processing system, consisting of one or more computers, that uses a network 20 to control a control target (in this embodiment, the robot 30 is taken as an example). The network 20 is an information processing network configured in a computer. It comprises a plurality of input nodes 21, a plurality of intermediate nodes 22, and a plurality of output nodes 23, arranged in an input layer, an intermediate layer, and an output layer respectively, each of which performs information processing individually, together with links 24 that connect these nodes 21, 22, and 23 and transmit information between them.

[0062] Each of the nodes 21, 22, and 23 and the links 24 is a self-organizing network element (SONE), that is, an element used to build a learning device. A SONE is a circuit element that can construct the network 20 autonomously when it is given a death condition, a function for generating new elements, and functions for generating and propagating reinforcement signals.

In the present embodiment, the control target is described using, as an example, a robot 30 known as a Khepera robot. However, the control target of the information processing system of the present invention is not limited to the Khepera robot, nor to robots in general.

As shown in FIG. 1, the robot 30 includes a right wheel 31 and a motor 32 that drives it, a left wheel 33 and a motor 34 that drives it, and eight infrared sensors 35: six at the front in the direction of travel and two at the rear. The robot 30 moves forward while avoiding collisions with a wall 36, and the eight infrared sensors 35 are provided to detect the distance D between the robot 30 and the wall 36.

[0065] Each node functions as an information processing element. In this embodiment, each node is configured by a logic circuit (an AND circuit or an OR circuit), and there are six types of nodes: the input node 21; four types of intermediate node 22 (intermediate AND node, intermediate OR node, test intermediate AND node, and test intermediate OR node); and the output node 23. A node is basically a logic circuit that produces one output from a plurality of inputs, but the input node 21 is a dummy node that only produces output. Although this embodiment uses AND and OR circuits, other types of logic circuits such as XOR circuits may be used, or a plurality of logic circuits may be combined to form a single node.
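A minimal sketch of how the six node types could map onto AND/OR evaluation, assuming Boolean inputs. The `NodeKind` enum, the `preset` value for input nodes, and the `output_is_and` switch for output nodes are hypothetical names; OR is the default for output nodes because the output nodes of this embodiment are OR nodes.

```python
from enum import Enum, auto
from typing import Sequence

class NodeKind(Enum):
    INPUT = auto()                  # dummy node: only produces output
    INTERMEDIATE_AND = auto()
    INTERMEDIATE_OR = auto()
    TEST_INTERMEDIATE_AND = auto()
    TEST_INTERMEDIATE_OR = auto()
    OUTPUT = auto()

AND_KINDS = {NodeKind.INTERMEDIATE_AND, NodeKind.TEST_INTERMEDIATE_AND}
OR_KINDS = {NodeKind.INTERMEDIATE_OR, NodeKind.TEST_INTERMEDIATE_OR}

def node_output(kind: NodeKind, inputs: Sequence[bool],
                preset: bool = False, output_is_and: bool = False) -> bool:
    """Return one Boolean output from several Boolean inputs (hypothetical helper)."""
    if kind is NodeKind.INPUT:
        return preset               # value set by input conversion; inputs are ignored
    if kind in AND_KINDS:
        return all(inputs)
    if kind in OR_KINDS:
        return any(inputs)
    # OUTPUT node: OR in this embodiment, with AND optionally mixed in
    return all(inputs) if output_is_and else any(inputs)
```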

The input nodes 21 are provided corresponding to the eight infrared sensors 35. The sensor signal of one infrared sensor 35 is 16 bits, so the eight sensors together give 16 × 8 = 128 bits. Since one input node 21 is assigned to each bit, this embodiment has 128 input nodes 21.
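The bit count above can be checked with a short sketch that turns eight sensor readings into 128 input-node values, one bit per node. Rounding and clamping each reading to an unsigned 16-bit integer, and ordering the bits most-significant first, are assumptions made here for illustration; the specification only states that the readings are converted to binary. The function name is hypothetical.

```python
from typing import List, Sequence

BITS_PER_SENSOR = 16
NUM_SENSORS = 8

def sensors_to_input_bits(readings: Sequence[float]) -> List[bool]:
    """Convert 8 sensor readings into 8 x 16 = 128 Boolean input-node outputs."""
    if len(readings) != NUM_SENSORS:
        raise ValueError(f"expected {NUM_SENSORS} readings")
    max_value = (1 << BITS_PER_SENSOR) - 1
    bits: List[bool] = []
    for reading in readings:
        # Assumed encoding: round, then clamp to the unsigned 16-bit range.
        value = max(0, min(int(round(reading)), max_value))
        for i in range(BITS_PER_SENSOR - 1, -1, -1):  # most-significant bit first
            bits.append(bool((value >> i) & 1))
    return bits
```

The same conversion is what the input conversion means 52 performs when it sets the outputs of the input nodes 21.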

The output nodes 23 are provided corresponding to the two motors 32 and 34. The motor output signal (rotation speed) of one motor is 16 bits, so the left and right motors together give 16 × 2 = 32 bits. Since one output node 23 is assigned to each bit, this embodiment has 32 output nodes 23. In this embodiment all output nodes 23 are OR nodes, but AND nodes may also be mixed in.

[0068] Although the number of input nodes 21 and output nodes 23 is fixed, the number of intermediate nodes 22 varies because the structure of the network 20 changes autonomously.

[0069] In this embodiment, there are four types of links, that is, an inverted link (a link whose output is inverted from the input), a non-inverted link, a test inverted link, and a test non-inverted link.
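The four link types differ only in whether they invert their input and whether they are test links. A minimal sketch, assuming Boolean signals; the enum and function names are hypothetical, and whether a test link's value is actually used is left to the caller, since test links do not contribute to the network output.

```python
from enum import Enum

class LinkKind(Enum):
    # value = (inverts its input?, is a test link?)
    INVERTED = (True, False)
    NON_INVERTED = (False, False)
    TEST_INVERTED = (True, True)
    TEST_NON_INVERTED = (False, True)

    @property
    def inverted(self) -> bool:
        return self.value[0]

    @property
    def is_test(self) -> bool:
        return self.value[1]

def link_output(kind: LinkKind, x: bool) -> bool:
    """An inverted link outputs NOT x; a non-inverted link passes x through."""
    return x != kind.inverted
```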

In FIG. 1, the information processing system 10 includes a sensor signal acquisition means 41, a state evaluation signal acquisition means 42, a reinforcement signal generation means 43, a motor output signal transmission means 44, a network processing means 50, a robot information storage means 60, a network information storage means 61, a node information storage means 62, and a link information storage means 63.

[0071] The sensor signal acquisition means 41 acquires the sensor signals output from the eight infrared sensors 35 and writes them to the robot information storage means 60.

The state evaluation signal acquisition means 42 acquires the state evaluation signals used to evaluate the state (behavior result) of the robot 30, which is the control target. In the present embodiment, the state evaluation signals are the sensor signals from the infrared sensors 35, acquired by the sensor signal acquisition means 41 and stored in the robot information storage means 60 (see FIG. 2), and the motor output signals (rotation speeds) read from the robot information storage means 60 and transmitted to the motors 32 and 34 of the robot 30 by the motor output signal transmission means 44. The infrared sensors 35 therefore function as state detection means that detect the state of the robot 30. In this embodiment the motor output signals are read from the robot information storage means 60, but because the motor output signals stored there are transmitted to the robot 30 unchanged, they may instead be acquired from the robot 30. The actual motor output signals (actual rotation speeds) detected by state detection means may also be used as state evaluation signals in place of the motor output signals sent to the motors 32 and 34 as control signals. When the robot 30 is not a physical robot but a virtual robot displayed on a display screen, the motor output signal used as the control signal and the actual motor output signal (actual rotation speed) are identical. The state evaluation signal acquisition means 42 also acquires, as a state evaluation signal, the state index value A6 of the previous step stored in the robot information storage means 60 (see FIG. 2).

[0073] On the basis of the state evaluation signals acquired by the state evaluation signal acquisition means 42, the reinforcement signal generation means 43 generates a reinforcement signal, given to the network 20 as a reward or punishment, according to the result of evaluating the state (behavior result) of the robot 30, which is produced on the basis of the output of the network 20. Specifically, the reinforcement signal generation means 43 determines the relative distance D between the robot 30 and the wall 36 from the sensor signals of the infrared sensors 35, and gives a reward (positive reinforcement signal) when the robot 30 moves away from the wall 36 and a punishment (negative reinforcement signal) when it moves toward the wall 36. It also determines from the motor output signals whether the robot 30 is moving straight, and gives a reward (positive reinforcement signal) when it moves straight and a punishment (negative reinforcement signal) when it does not.

[0074] More specifically, when at least one of the sensor signals from the infrared sensors 35 exceeds a threshold (for example, zero), the reinforcement signal generation means 43 judges that the robot 30 is near the wall 36. It then sums the eight sensor signals, multiplies the sum by -1 (and, if necessary, by a constant), and writes the result to the current state index value A5 of the robot information storage means 60. Alternatively, the judgment may be that the total of the sensor signals exceeds a threshold (for example, zero), in which case the total is likewise multiplied by -1 and, if necessary, by a constant. The closer the robot 30 is to the wall 36, the larger the magnitude of this negative value. The reinforcement signal generation means 43 then takes the difference between this current state index value and the state index value indicating the state of the robot 30 one step earlier (calculated in the same way, stored in the robot information storage means 60, and acquired by the state evaluation signal acquisition means 42), and uses the result as the reinforcement signal given to the network 20. As a result, the reinforcement signal is positive (reward) when the robot 30 moves away from the wall 36 and negative (punishment) when it approaches the wall 36. The current state index value is then stored in the robot information storage means 60 as the previous-step state index value for use in the next step. When all sensor signals from the infrared sensors 35 are at or below the threshold (for example, zero), the robot 30 is judged not to be near the wall 36, and the reinforcement signal generation means 43 checks whether the motors 32 and 34 have the same rotation speed. If they do, the robot is judged to be moving straight and a reinforcement signal (reward) of "+1" is given; if they do not, it is judged not to be moving straight and a reinforcement signal of "-0.01" (a small punishment) is given.
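The rule in paragraph [0074] can be sketched as a pure function. This is a minimal illustration under stated assumptions: the threshold defaults to 0, the optional constant is a `scale` parameter, the "near the wall" test uses the per-sensor variant (at least one reading above the threshold), and the new state index value is returned on every step so the caller can store it as the previous-step value. The function name is hypothetical.

```python
from typing import Sequence, Tuple

def reinforcement_signal(sensors: Sequence[float], left_speed: float,
                         right_speed: float, prev_index: float,
                         threshold: float = 0.0,
                         scale: float = 1.0) -> Tuple[float, float]:
    """Return (reinforcement signal, current state index value) for one step."""
    # State index: minus the (scaled) sensor sum, more negative closer to the wall.
    current_index = -scale * sum(sensors)
    if any(s > threshold for s in sensors):
        # Near the wall: positive when moving away, negative when approaching.
        signal = current_index - prev_index
    elif left_speed == right_speed:
        signal = 1.0      # moving straight: reward
    else:
        signal = -0.01    # not moving straight: small punishment
    return signal, current_index  # caller stores current_index as the next step's A6
```

For example, readings summing to 5 with a previous index of -10 give a signal of 5.0: the robot has moved away from the wall.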

The motor output signal transmission means 44 transmits the motor output signals, which are written to the robot information storage means 60 on the basis of the output of the network 20, to the motors 32 and 34 of the robot 30.

[0076] The network processing means 50 performs the processing that uses the network 20, and includes a learning means 51, an input conversion means 52, an output generation means 53, and an output conversion means 54.

[0077] The learning means 51 gives the reinforcement signal generated by the reinforcement signal generation means 43 equally to all output nodes 23, and propagates it backward, from the output layer to the intermediate layer and then from the intermediate layer to the input layer, to each link 24, each intermediate node 22, and each input node 21, following the chain of connections between constituent elements (between nodes and links, and between nodes). During this propagation, the learning means 51 generates the reinforcement signal given as a reward or punishment to each propagation-destination element on the basis of the reinforcement signal given to the propagation-source element (node or link) and of the input/output states of the propagation-source and/or propagation-destination elements. The learning means 51 also uses the accumulated value of the reinforcement signal given to each constituent element (node or link) to generate or delete constituent elements one element at a time, thereby changing the structure of the network 20, and registers the changed structure in the network information storage means 61, the node information storage means 62, and the link information storage means 63 (see FIG. 2), which function as network structure storage means. Details of the learning process are described later.

The input conversion means 52 converts the sensor signals stored in the robot information storage means 60 into binary numbers and sets them as the outputs of the respective input nodes 21.
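The backward propagation and structure change performed by the learning means 51 can be sketched as a skeleton. The actual distribution rule depends on the input/output states of the source and destination elements and is described later in the specification; here it is replaced by a hypothetical placeholder that splits a node's signal equally among its input-side links. The deletion criterion (accumulated value below a threshold) and all names are likewise hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def propagate(output_nodes: List[str], in_links: Dict[str, List[str]],
              link_src: Dict[str, str], reverse_order: List[str],
              signal: float) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Propagate one reinforcement signal from the output nodes back toward the inputs.

    reverse_order lists nodes so that each node appears after every node it
    feeds (output layer first, input layer last).
    """
    node_total: Dict[str, float] = defaultdict(float)  # plays the role of C10
    link_signal: Dict[str, float] = {}                 # plays the role of D9
    for node in output_nodes:
        node_total[node] += signal                     # same signal to every output node
    for node in reverse_order:
        links = in_links.get(node, [])
        if not links:
            continue                                   # input nodes: nothing further back
        share = node_total[node] / len(links)          # PLACEHOLDER rule, not the patent's
        for link in links:
            link_signal[link] = share
            node_total[link_src[link]] += share
    return dict(node_total), link_signal

def links_to_delete(accumulated: Dict[str, float], link_signal: Dict[str, float],
                    threshold: float) -> List[str]:
    """Add this step's signals to the accumulated values (like D8) and return
    the links whose accumulated value is below the hypothetical threshold."""
    for link, s in link_signal.items():
        accumulated[link] = accumulated.get(link, 0.0) + s
    return [link for link, total in accumulated.items() if total < threshold]
```

Processing nodes in reverse topological order guarantees that a node has received every contribution from the elements it feeds before its own signal is split further.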

[0079] The output generation means 53 refers to the structure of the network 20 stored in the network information storage means 61, the node information storage means 62, and the link information storage means 63 (see FIG. 2), which function as network structure storage means, and generates the output of the network 20 using the network 20 whose structure has been changed by the learning means 51. The output generation means 53 realizes the function (output generation function) of the logic circuit constituting each intermediate node 22 and each output node 23 by executing a program.
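Output generation can be sketched as a single forward pass in topological order. Assumptions for illustration: Boolean signals, non-input nodes labeled "AND" or "OR", links given as `(source, destination, inverted)` tuples, and an evaluation order supplied by the caller; the function name is hypothetical. The two returned dictionaries correspond to what the specification stores as node outputs C9 and link outputs D7.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str, bool]  # (source node, destination node, inverted?)

def forward(input_bits: Dict[str, bool], gate: Dict[str, str],
            links: List[Link], order: List[str]
            ) -> Tuple[Dict[str, bool], Dict[int, bool]]:
    """Evaluate every non-input node in `order` (sources before destinations)."""
    node_out: Dict[str, bool] = dict(input_bits)   # input nodes are dummies
    link_out: Dict[int, bool] = {}
    incoming: Dict[str, List[int]] = {n: [] for n in gate}
    for idx, (_, dst, _) in enumerate(links):
        incoming[dst].append(idx)
    for node in order:
        values = []
        for idx in incoming[node]:
            src, _, inverted = links[idx]
            value = node_out[src] != inverted      # an inverted link flips the bit
            link_out[idx] = value
            values.append(value)
        node_out[node] = all(values) if gate[node] == "AND" else any(values)
    return node_out, link_out
```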

[0080] The output conversion means 54 converts the outputs (binary numbers) of the output nodes 23 into real numbers and writes them to the robot information storage means 60 as motor output signals (rotation speeds).
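The reverse conversion can be sketched as below. The encoding is an assumption made for illustration, since the specification does not state it: the 32 output bits are read as two unsigned 16-bit integers, most-significant bit first, with bits 0 to 15 giving A2(1) and bits 16 to 31 giving A2(2), and a hypothetical `scale` factor maps the integers to rotation speeds.

```python
from typing import Sequence, Tuple

def output_bits_to_motor_signals(bits: Sequence[bool],
                                 scale: float = 1.0) -> Tuple[float, float]:
    """Convert 32 output-node bits into the two motor output signals (assumed encoding)."""
    if len(bits) != 32:
        raise ValueError("expected 32 output bits")
    speeds = []
    for motor in range(2):
        value = 0
        for bit in bits[16 * motor:16 * (motor + 1)]:
            value = (value << 1) | int(bit)   # most-significant bit first
        speeds.append(scale * value)
    return speeds[0], speeds[1]
```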

In FIG. 2, the robot information storage means 60 stores: an input array A1 (eight real numbers, A1(1) to A1(8), corresponding to the sensor signals of the individual infrared sensors 35), which holds the sensor signals from the eight infrared sensors 35 acquired by the sensor signal acquisition means 41; an output array A2 (two real numbers, A2(1) and A2(2), corresponding to the individual motor output signals), which holds the left and right motor output signals (rotation speeds); a network address A3, which is the address of the network information storage means 61; a reinforcement signal A4 (real number) generated by the reinforcement signal generation means 43 and given to the network 20; a state index value A5 (real number) indicating the current state of the robot 30; and a state index value A6 (real number) indicating the state of the robot 30 one step earlier. Here, "one step" means one pass through the loop of steps S5 to S9 in FIG. 3, not any individual step S5 to S9 within that loop.
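For readers reproducing this layout, a Python dataclass can stand in for the robot information storage means 60. This is a sketch only: a Python object reference replaces the address A3, and the field names and the `end_step` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RobotInfo:
    """Hypothetical stand-in for the robot information storage means 60."""
    sensor_inputs: List[float] = field(default_factory=lambda: [0.0] * 8)  # A1(1)-A1(8)
    motor_outputs: List[float] = field(default_factory=lambda: [0.0] * 2)  # A2(1), A2(2)
    network: Optional[object] = None   # A3: reference instead of an address
    reinforcement: float = 0.0         # A4
    current_index: float = 0.0         # A5: state index value, current step
    previous_index: float = 0.0        # A6: state index value, previous step

    def end_step(self) -> None:
        """Keep the current state index as the previous-step value for the next loop."""
        self.previous_index = self.current_index
```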

[0082] Since the reinforcement signal A4 given to the network 20 is the same as the reinforcement signal B4 stored in the network information storage means 61 (described later), the reinforcement signal generation means 43 may write the generated reinforcement signal directly to B4 of the network information storage means 61 instead of to A4 of the robot information storage means 60, in which case the memory for A4 may be omitted. Similarly, the state index value indicating the current state of the robot 30 is first written to the current state index value A5 of the robot information storage means 60, and the reinforcement signal is then calculated from this A5 and the previous-step state index value A6 stored in the robot information storage means 60. However, the reinforcement signal can also be calculated from the newly computed state index value and A6 without writing that value to A5, after which the new value is stored in A6; in that case the memory for A5 may be omitted.

[0083] The network information storage means 61 stores: input node addresses B1 (a variable-length array B1(1), B1(2), ..., B1(m), ..., one entry per input node 21), which are the addresses of the parts of the node information storage means 62 holding the information of each input node 21; intermediate node addresses B2 (a variable-length array B2(1), B2(2), ..., B2(n), ..., one entry per intermediate node 22), which are the addresses of the parts holding the information of each intermediate node 22; output node addresses B3 (a variable-length array B3(1), B3(2), ..., one entry per output node 23), which are the addresses of the parts holding the information of each output node 23; and the reinforcement signal B4 (real number) generated by the reinforcement signal generation means 43 and given to the network 20.

[0084] The node information storage means 62 stores information on each node individually, for all six node types. For each node it stores: input-side link addresses C1 (a variable-length array C1(1), C1(2), ..., C1(k), ..., one entry per input-side link), which are the addresses of the parts of the link information storage means 63 holding the node's input-side links; output-side link addresses C2 (a variable-length array C2(1), C2(2), ..., C2(h), ..., one entry per output-side link), which are the addresses of the parts of the link information storage means 63 holding the node's output-side links; a network address C3, which is the address of the network information storage means 61; a test link address C4, which is the address of the part of the link information storage means 63 holding the node's input-side test link; an AND-OR node flag C5 (1 bit; "True (or 1)" for an AND node, "False (or 0)" for an OR node); an input node flag C6 (1 bit; "True (or 1)" if the node is an input node 21, otherwise "False (or 0)"); an output node flag C7 (1 bit; "True (or 1)" if the node is an output node 23, otherwise "False (or 0)"); a test node flag C8 (1 bit; "True (or 1)" for a test node, otherwise "False (or 0)"); the node output C9 (1 bit, "True (or 1)" or "False (or 0)"); and the total C10 (real number) of the reinforcement signals given to the node. Here "total" does not mean a value accumulated over steps; it means the sum of the reinforcement signals propagated to the node from each propagation-source constituent element.
The node information storage means 62 dynamically adds or deletes the memory for each node as nodes are added or deleted.

In the node information storage means 62, when the node is a test node, the input-side link addresses C1 consist only of the first and second input-side test link addresses (C1(1) and C1(2)), the output-side link addresses C2 consist only of the output-side test link address (C2(1)), and there is no test link address C4. A test link is a link that does not contribute to the output and has no associated test node, whereas a real link is a working link that contributes to the output and has an associated test node.
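A Python sketch of the network and node records, assuming object references in place of the addresses B1 to B3 and C1 to C4. The class names, field names, and the `add_intermediate` / `delete_intermediate` helpers (illustrating the dynamic addition and deletion of node memory) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)
class NodeInfo:
    """One record of the node information storage means 62 (hypothetical sketch)."""
    is_and: bool = False                 # C5: True = AND node, False = OR node
    is_input: bool = False               # C6
    is_output: bool = False              # C7
    is_test: bool = False                # C8
    in_links: List[object] = field(default_factory=list)   # C1
    out_links: List[object] = field(default_factory=list)  # C2
    network: Optional["NetworkInfo"] = field(default=None, repr=False)  # C3
    test_link: Optional[object] = None   # C4 (absent for test nodes)
    output: bool = False                 # C9
    signal_total: float = 0.0            # C10: sum over propagation sources, this step only

@dataclass
class NetworkInfo:
    """Hypothetical stand-in for the network information storage means 61."""
    input_nodes: List[NodeInfo] = field(default_factory=list)         # B1
    intermediate_nodes: List[NodeInfo] = field(default_factory=list)  # B2
    output_nodes: List[NodeInfo] = field(default_factory=list)        # B3
    reinforcement: float = 0.0                                        # B4

    def add_intermediate(self, is_and: bool, is_test: bool = False) -> NodeInfo:
        """Allocate a new intermediate node record and register it in B2."""
        node = NodeInfo(is_and=is_and, is_test=is_test, network=self)
        self.intermediate_nodes.append(node)
        return node

    def delete_intermediate(self, node: NodeInfo) -> None:
        """Remove an intermediate node record (matched by identity) from B2."""
        self.intermediate_nodes.remove(node)
```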

[0086] The link information storage means 63 stores information on each link individually, for all four link types. For each link it stores: an input-side node address D1, which is the address of the part of the node information storage means 62 holding the link's input-side node; an output-side node address D2, which is the address of the part of the node information storage means 62 holding the link's output-side node; a network address D3, which is the address of the network information storage means 61; a test node address D4, which is the address of the part of the node information storage means 62 holding the test node associated with the link; an inverted/non-inverted flag D5 (1 bit; "True (or 1)" for an inverted link, "False (or 0)" for a non-inverted link); a test link flag D6 (1 bit; "True (or 1)" for a test link, otherwise "False (or 0)"); the link output D7 (1 bit, "True (or 1)" or "False (or 0)"); the accumulated value D8 (real number, accumulated over multiple steps) of the reinforcement signals given to the link; and the reinforcement signal D9 (real number, the value for one step) given to the link. The link information storage means 63 dynamically adds or deletes the memory for each link as links are added or deleted.
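A matching sketch of one link record, again with object references in place of the addresses D1 to D4. The class name, field names, and the `receive` helper are hypothetical; `receive` shows the difference between the one-step value D9 and the multi-step accumulated value D8.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(eq=False)
class LinkInfo:
    """One record of the link information storage means 63 (hypothetical sketch)."""
    src: object                          # D1: input-side node
    dst: object                          # D2: output-side node
    network: Optional[object] = field(default=None, repr=False)  # D3
    test_node: Optional[object] = None   # D4: associated test node (real links only)
    inverted: bool = False               # D5
    is_test: bool = False                # D6
    output: bool = False                 # D7
    accumulated: float = 0.0             # D8: summed over many steps
    signal: float = 0.0                  # D9: this step only

    def receive(self, s: float) -> None:
        """Store this step's reinforcement signal and add it to the running total."""
        self.signal = s
        self.accumulated += s
```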

[0087] The part of the network information storage means 61 storing B1 to B3, the part of the node information storage means 62 storing C1 to C8, and the part of the link information storage means 63 storing D1 to D6 together constitute the network structure storage means, which stores the structure of the network 20 including the connection relationships between constituent elements.

[0088] The part of the node information storage means 62 storing C9 and the part of the link information storage means 63 storing D7 together constitute the input/output state storage means, which stores the input/output states of the constituent elements produced by the output generation process of the network 20.

[0089] The part of the network information storage means 61 storing B4, the part of the node information storage means 62 storing C10, and the part of the link information storage means 63 storing D8 and D9 together constitute the reinforcement signal storage means, which stores, for each constituent element, the reinforcement signal generated by the learning means 51 or its accumulated value.

In the above, the sensor signal acquisition means 41, the state evaluation signal acquisition means 42, the reinforcement signal generation means 43, the motor output signal transmission means 44, and the network processing means 50 are realized by a central processing unit (CPU) provided in the computer main body constituting the information processing system 10 (which includes not only personal computers but also higher-end machines), together with one or more programs that define the operating procedure of the CPU (for example, programs written in C++).

The robot information storage means 60, the network information storage means 61, the node information storage means 62, and the link information storage means 63 are realized by, for example, main memory, cache memory, or local memory. They may also be realized with an external storage device such as a hard disk, MO, DVD-RAM, FD, or magnetic tape, provided that access speed, storage capacity, and so on pose no problem.

In this embodiment, the information processing system 10 performs autonomous control of the operation of the robot 30 as follows.

First, the overall flow of operation control of the robot 30 by the information processing system 10 will be described with reference to FIGS.

[0094] In FIG. 3, the program that realizes the information processing system 10 is started, and operation control of the robot 30 begins (step S1).

Next, the network processing means 50 performs the necessary initialization (step S2). This initialization comprises: initialization of the information stored in the robot information storage means 60 (robot initialization process G1 in FIG. 11, described later); initialization of the information stored in the network information storage means 61 (network initialization process G2); generation of the required number of input nodes 21 (128 in this embodiment) (input node initialization process G3); generation of the required number of output nodes 23 (32 in this embodiment) (output node initialization process G4); generation, as the input-side link of each output node 23, of a real link connecting that output node 23 to a randomly chosen input node 21 (inverted link initialization process G9 or non-inverted link initialization process G10); generation, on the input side of each output node 23, of a test link connecting that output node 23 to a randomly chosen input node 21 (test inverted link initialization process G11 or test non-inverted link initialization process G12); generation of a test node associated with each generated real link (test intermediate OR node initialization process G7 or test intermediate AND node initialization process G8); and generation of the first and second input-side test links of each generated test node (test inverted link initialization process G11 or test non-inverted link initialization process G12).
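The element-generating parts of step S2 can be sketched as follows. Several details are assumptions, not statements of the specification: nodes and links are plain strings and dictionaries, the inversion flag of each link and the AND/OR type of each test node are chosen at random, the two input-side test links of each test node also start from randomly chosen input nodes, G1 and G2 are omitted, and the test node's output-side test link is not modeled. The function name is hypothetical.

```python
import random
from typing import Dict, List, Optional

def initialize_network(n_inputs: int = 128, n_outputs: int = 32,
                       seed: Optional[int] = None) -> Dict[str, list]:
    """Build the initial nodes and links of step S2 (hypothetical sketch)."""
    rng = random.Random(seed)
    inputs = [f"in{i}" for i in range(n_inputs)]          # G3
    outputs = [f"out{j}" for j in range(n_outputs)]       # G4 (OR nodes)
    real_links: List[dict] = []
    test_links: List[dict] = []
    test_nodes: List[dict] = []

    def random_link(dst: str, is_test: bool) -> dict:
        # Source drawn at random from the input nodes; inversion chosen at random (assumed).
        return {"src": rng.choice(inputs), "dst": dst,
                "inverted": rng.random() < 0.5, "test": is_test}

    for j, out in enumerate(outputs):
        real = random_link(out, is_test=False)             # G9 / G10
        real_links.append(real)
        test_links.append(random_link(out, is_test=True))  # G11 / G12
        test_node = {"name": f"test{j}", "is_and": rng.random() < 0.5,
                     "real_link": real}                    # G7 / G8
        test_nodes.append(test_node)
        for _ in range(2):  # first and second input-side test links (G11 / G12)
            test_links.append(random_link(test_node["name"], is_test=True))
    return {"inputs": inputs, "outputs": outputs, "real_links": real_links,
            "test_links": test_links, "test_nodes": test_nodes}
```

With the defaults this yields 32 real links, 32 test nodes, and 32 × 3 = 96 test links.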

Next, the sensor signal acquisition means 41 acquires the sensor signals detected by the eight infrared sensors 35 and writes the eight acquired sensor signals to the input array A1(1) to A1(8) of the robot information storage means 60 (see FIG. 2) (step S3).

Then, the state evaluation signal acquisition means 42 acquires robot information as state evaluation signals: the signals from the eight infrared sensors 35 stored in the input arrays A1(1) to A1(8) of the robot information storage means 60, the motor output signals (rotation speeds) stored in the output arrays A2(1) and A2(2) of the robot information storage means 60, and the state index value A6 of the preceding step, also stored in the robot information storage means 60 (step S4).

Subsequently, the reinforcement signal generation means 43 evaluates the state (behavior result) of the robot 30, which is the control object, based on the state evaluation signals acquired by the state evaluation signal acquisition means 42. According to the evaluation result, it generates a reinforcement signal to be given to the network 20 as a reward or punishment (step S4). In this initial acquisition, however, the output arrays A2(1) and A2(2) do not yet contain motor output signals (rotation speeds) based on the output of a network 20 whose structure has been changed by learning, and the state index value A6 of the preceding step does not contain a value resulting from a state evaluation at a previous step. The initial reinforcement signal is therefore zero. The reinforcement signal generation means 43 writes the generated reinforcement signal to the reinforcement signal A4 of the robot information storage means 60. It also writes the current state index value, obtained by evaluating the state (behavior result) of the robot 30 at the current step, to the state index value A6 (one step before) of the robot information storage means 60, so that it can be used in the state evaluation at the next step. Because the initial reinforcement signal is zero, the initial learning process by the learning means 51 (described later) is effectively not performed, and the structure of the network 20 is not changed.

[0099] Then, the network processing means 50 performs processing of the network 20, that is, learning processing and output generation processing (step S5).

In FIG. 4, in the learning process, the learning means 51 first reads the reinforcement signal A4 from the robot information storage means 60 and writes it to the reinforcement signal B4 of the network information storage means 61. In this way, the network 20 receives the reinforcement signal (step S501).

[0101] Next, the learning means 51 refers to the output node addresses B3 of the network information storage means 61. In the portion of the node information storage means 62 that stores the information of each output node 23 corresponding to these addresses, it stores the same value as the reinforcement signal B4 of the network information storage means 61 as the reinforcement signal total C10. As a result, the reinforcement signal is transmitted uniformly to all the output nodes 23 (step S502).

Subsequently, the learning means 51 performs learning processing for each output node 23 corresponding to the output node address B3 of the network information storage means 61 (step S503). Details of the learning process of the output node 23 will be described later.

[0103] Further, the learning means 51 performs learning processing for each intermediate node 22 corresponding to the intermediate node addresses B2 of the network information storage means 61 (step S504). The learning process of the intermediate node 22 is described in detail later with reference to FIG. 5, which shows the learning process flow for an intermediate OR node (real node).

[0104] Then, for each output node 23 corresponding to the output node addresses B3 of the network information storage means 61, the learning means 51 refers to the input-side link addresses C1 in the node information storage means 62 and performs learning processing on each input-side link of that output node 23 corresponding to those addresses (step S505). The learning process for the input-side links of the output node 23 is described in detail later.

Furthermore, for each intermediate node 22 corresponding to the intermediate node addresses B2 of the network information storage means 61, the learning means 51 refers to the input-side link addresses C1 in the node information storage means 62 and performs learning processing on each input-side link of that intermediate node 22 corresponding to those addresses (step S506). The learning process for the input-side links of the intermediate node 22 is described in detail later with reference to FIG. 6, which shows the flow of the non-inverted link learning process.

[0106] After the learning processing (steps S501 to S506) has been performed as described above, a new output of the network 20 is generated using the network 20 with its changed structure. The structural change made by the learning process is driven by a reinforcement signal generated from an evaluation of the state of the robot 30, and that state was itself produced from the output of the network 20 before the structure was changed. Consequently, the input/output states of the constituent elements used for the various judgments in the learning process (steps S501 to S506) must be the input/output states obtained by the output generation process that produced the robot state on which the reinforcement signal is based. The input/output states actually used in these judgments are those remaining in memory (the input/output state storage means in FIG. 2), that is, the input/output states obtained by the output generation process of the network 20 before its structure was changed. They therefore satisfy this requirement.

In the output generation process, the input conversion means 52 first refers to the input node addresses B1 of the network information storage means 61 to locate the portion of the node information storage means 62 that stores the information of each input node 21. It then converts the eight sensor signals stored in the input arrays A1(1) to A1(8) of the robot information storage means 60 into binary numbers and sets the resulting values as the outputs C9 of the input nodes 21 in the node information storage means 62 (step S507).

[0108] Subsequently, the output generation means 53 refers to the intermediate node addresses B2 of the network information storage means 61 to locate the portion of the node information storage means 62 that stores the information of each intermediate node 22. It calculates the output C9 of each intermediate node 22 according to the function of the logic circuit constituting that node (step S508). A newly generated intermediate node 22 is appended to the end of the intermediate node address array B2, yet it is inserted on the input side of the input/output chain of the network 20. Therefore, to propagate outputs from the input layer to the output layer, the output generation process for the intermediate nodes 22 is performed in the reverse order of the array B2.
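The reverse-order rule can be illustrated with a small Python sketch. The dictionary layout (node id mapped to an AND flag and a list of (source, inverted) pairs) and the node names are hypothetical. The sketch shows only that iterating the address array B2 backwards evaluates a later-appended node, which sits closer to the input layer, before the older node that reads from it.

```python
from typing import Dict, List, Tuple

# node id -> (is_and_node, [(source_id, inverted), ...]); hypothetical layout
Spec = Dict[str, Tuple[bool, List[Tuple[str, bool]]]]


def evaluate_intermediates(b2: List[str], spec: Spec, values: Dict[str, bool]) -> Dict[str, bool]:
    """Compute intermediate-node outputs in reverse order of the array b2 (cf. B2).

    `values` must already hold the outputs of the input nodes; it is updated in place.
    """
    for node_id in reversed(b2):
        is_and, links = spec[node_id]
        # Link output: an inverted link negates the source value (XOR with the flag).
        link_outputs = [values[src] != inverted for src, inverted in links]
        values[node_id] = all(link_outputs) if is_and else any(link_outputs)
    return values


# "n1" was created first; "n2" was appended later and inserted on n1's input side.
spec: Spec = {
    "n1": (False, [("n2", False), ("i1", False)]),  # OR node fed by n2 and i1
    "n2": (True, [("i0", False), ("i1", True)]),    # AND of i0 and NOT i1
}
result = evaluate_intermediates(["n1", "n2"], spec, {"i0": True, "i1": False})
print(result["n2"], result["n1"])  # True True
```

Iterating `b2` forwards would try to read `n2` before it has a value (a `KeyError` in this sketch).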

[0109] Further, the output generation means 53 refers to the output node addresses B3 of the network information storage means 61 to locate the portion of the node information storage means 62 that stores the information of each output node 23. It calculates the output C9 of each output node 23 according to the function of the logic circuit constituting that node (step S509).

[0110] The node calculation performed in steps S508 and S509 is the same as the processing of an ordinary logic circuit. For the node being calculated, the link outputs D7 in the link information storage means 63 are read for all the input-side links corresponding to the node's input-side link addresses C1 in the node information storage means 62, and these outputs D7 are used as the node's inputs. The AND/OR node flag C5 of the node in the node information storage means 62 is then checked to determine whether it is an AND node or an OR node. The output C9 of the node is calculated by the same processing as an AND circuit for an AND node, or as an OR circuit for an OR node.

[0111] For example, if the node being calculated is an intermediate OR node, a test intermediate OR node, or an output node 23 (all output nodes are OR nodes in this embodiment), its output C9 is first overwritten with False (0). Then, if at least one of the outputs D7 of the input-side links corresponding to its input-side link addresses C1 is True (1), the output C9 is overwritten with True (1). Conversely, if the node being calculated is an intermediate AND node or a test intermediate AND node, its output C9 is first overwritten with True (1). Then, if at least one of the outputs D7 of the input-side links corresponding to its input-side link addresses C1 is False (0), the output C9 is overwritten with False (0).

[0112] The link calculation performed together with the node calculation in steps S508 and S509 is likewise the same as ordinary logic circuit processing. If the link being calculated is an inverted link or a test inverted link, the output C9 (in the node information storage means 62) of the input-side node corresponding to the link's input-side node address D1 (in the link information storage means 63) is inverted, and the inverted value is written over the link output D7. If the link being calculated is a non-inverted link or a test non-inverted link, the output C9 of that input-side node is written over the link output D7 unchanged.
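The overwrite-style calculations of paragraphs [0111] and [0112] can be written as two small Python functions. The function names and the Boolean encoding are illustrative assumptions, not the patent's data layout.

```python
from typing import Iterable


def node_output(is_and_node: bool, link_outputs: Iterable[bool]) -> bool:
    """Node calculation in the overwrite style of paragraph [0111]."""
    link_outputs = list(link_outputs)
    if is_and_node:
        # Intermediate AND / test intermediate AND: start True, overwrite with
        # False if at least one input-side link outputs False.
        out = True
        if any(not v for v in link_outputs):
            out = False
    else:
        # Intermediate OR / test intermediate OR / output node: start False,
        # overwrite with True if at least one input-side link outputs True.
        out = False
        if any(link_outputs):
            out = True
    return out


def link_output(source_output: bool, inverted: bool) -> bool:
    """Link calculation of paragraph [0112]: inverted and test inverted links
    negate the input-side node output; non-inverted ones pass it unchanged."""
    return (not source_output) if inverted else source_output


# An OR node fed by one inverted and one non-inverted link from False sources.
print(node_output(False, [link_output(False, True), link_output(False, False)]))  # True
```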

[0113] After that, the output conversion means 54 refers to the output node addresses B3 of the network information storage means 61 to locate the portion of the node information storage means 62 that stores the information of each output node 23. It converts the outputs C9 (binary digits) of the output nodes 23 into real numbers and writes them to the output array A2 of the robot information storage means 60 as motor output signals (rotation speeds) (step S510).
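The text does not specify the binary encoding used in steps S507 and S510. One plausible reading, stated here purely as an assumption, is that each of the 8 sensor signals occupies 128 / 8 = 16 input nodes and each of the two motor signals occupies 32 / 2 = 16 output nodes, with linear fixed-point quantization. The value ranges, bit order, and function names below are hypothetical.

```python
from typing import List, Sequence

BITS = 16  # assumption: 128 input nodes / 8 sensors = 32 output nodes / 2 motors = 16


def to_bits(value: float, lo: float, hi: float, bits: int = BITS) -> List[bool]:
    """Quantize a real value in [lo, hi] to `bits` Booleans, most significant first."""
    clipped = min(max(value, lo), hi)
    level = round((clipped - lo) / (hi - lo) * (2 ** bits - 1))
    return [bool((level >> i) & 1) for i in range(bits - 1, -1, -1)]


def from_bits(bit_values: Sequence[bool], lo: float, hi: float) -> float:
    """Inverse of to_bits: map output-node Booleans back to a real value."""
    level = 0
    for b in bit_values:
        level = (level << 1) | int(b)
    return lo + level / (2 ** len(bit_values) - 1) * (hi - lo)


# Input side: 8 sensor readings -> 128 input-node outputs (C9).
sensor_readings = [0, 100, 250, 1023, 0, 0, 512, 10]  # hypothetical raw values
input_bits = [b for s in sensor_readings for b in to_bits(s, 0, 1023)]
print(len(input_bits))  # 128

# Output side: 32 output-node outputs -> 2 motor speeds (hypothetical range -10..10).
output_bits = to_bits(3.0, -10, 10) + to_bits(-10.0, -10, 10)
left = from_bits(output_bits[:16], -10, 10)
right = from_bits(output_bits[16:], -10, 10)
```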

[0114] In FIG. 3, after the processing by the network processing means 50, the motor output signal transmitting means 44 transmits the motor output signals (rotation speeds), which were written to the output array A2 of the robot information storage means 60 based on the output of the network 20 (the outputs C9 of the output nodes 23), to the motors 32 and 34 of the robot 30. This drives the motors 32 and 34 and operates the robot 30 (step S6).

[0115] Subsequently, the sensor signal acquisition means 41 acquires the sensor signals detected by the eight infrared sensors 35 and writes the eight acquired signals to the input arrays A1(1) to A1(8) of the robot information storage means 60 (step S7).

[0116] Then, the state evaluation signal acquisition means 42 acquires, as state evaluation signals, the signals from the eight infrared sensors 35 stored in the input arrays A1(1) to A1(8) of the robot information storage means 60, the motor output signals (rotation speeds) stored in the output arrays A2(1) and A2(2), and the state index value A6 of the preceding step stored in the robot information storage means 60 (step S8). Unlike the initial acquisition (step S4), the output arrays A2(1) and A2(2) now contain motor output signals (rotation speeds) based on the output of the network 20 whose structure was changed in the learning process of step S5, and the state index value A6 now contains the state index value resulting from the state evaluation at the previous step. The reinforcement signal that the reinforcement signal generation means 43 generates from these state evaluation signals is therefore meaningful, because it reflects a proper evaluation of the robot's state.

[0117] Subsequently, based on the state evaluation signals acquired by the state evaluation signal acquisition means 42, the reinforcement signal generation means 43 evaluates the state (behavior result) of the robot 30, the control object, as produced from the output of the network 20 whose structure was changed in the learning process of step S5. According to the evaluation result, it generates a reinforcement signal to be given to the network 20 as a reward or punishment (step S8). For example, if the robot 30 is traveling straight, a reinforcement signal (reward) of +1 is generated. If the total of the sensor signals is greater than a threshold (for example, 0), a reinforcement signal (reward or punishment) is generated whose value is the increase or decrease of that total (its difference from the total at the previous step) multiplied by 1 and by a constant. Otherwise, a reinforcement signal of, for example, -0.01 (a small punishment) is generated. The reinforcement signal generation means 43 writes the generated reinforcement signal to the reinforcement signal A4 of the robot information storage means 60. It also writes the current state index value, obtained by evaluating the state (behavior result) of the robot 30 at the current step, to the state index value A6 (one step before) of the robot information storage means 60 for use in the state evaluation at the next step.
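The example reward scheme can be sketched as follows. The text gives the +1 and -0.01 values and the example threshold of 0. The order in which the checks are applied, how "traveling straight" is detected, the value of the multiplying constant (`gain`), and the use of the previous sensor total in place of the stored state index value A6 are all assumptions.

```python
def reinforcement_signal(going_straight: bool,
                         sensor_total: float,
                         previous_sensor_total: float,
                         threshold: float = 0.0,
                         gain: float = 1.0) -> float:
    """Example reinforcement signal of paragraph [0117] (hypothetical parameters)."""
    if going_straight:
        return 1.0                      # reward for straight travel
    if sensor_total > threshold:
        # Change in the sensor total since the previous step, scaled by a constant.
        return gain * (sensor_total - previous_sensor_total)
    return -0.01                        # small punishment otherwise
```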

[0118] Thereafter, it is determined whether an instruction to end the operation control of the robot 30 has been issued (step S9). If no end instruction has been issued, the process returns to the processing of the network 20 in step S5, and steps S5 to S9 are repeated until the end instruction is issued. When the end instruction is issued, the operation control of the robot 30 ends (step S10).
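The overall loop of FIG. 3 (steps S3 to S10) can be summarized in a short Python sketch. The callback parameters are hypothetical stand-ins for the means 41 to 44 and the network processing means 50. The sketch only shows the ordering, in particular that learning on the previous reinforcement signal precedes output generation within step S5.

```python
from typing import Callable, List, Optional

Sensors = List[float]
Motors = List[float]


def control_loop(acquire_sensors: Callable[[], Sensors],
                 evaluate: Callable[[Sensors, Optional[Motors]], float],
                 network_step: Callable[[float, Sensors], Motors],
                 drive: Callable[[Motors], None],
                 end_requested: Callable[[], bool]) -> int:
    """Run steps S3-S10 and return the number of S5-S9 iterations performed."""
    sensors = acquire_sensors()                 # S3
    reward = evaluate(sensors, None)            # S4: no network output yet (text: yields 0)
    iterations = 0
    while True:
        motors = network_step(reward, sensors)  # S5: learn (S501-S506), then output (S507-S510)
        drive(motors)                           # S6
        sensors = acquire_sensors()             # S7
        reward = evaluate(sensors, motors)      # S8
        iterations += 1
        if end_requested():                     # S9
            return iterations                   # S10


# Usage with trivial stand-ins: stop after the third pass through S9.
received = []


def fake_network(reward: float, sensors: Sensors) -> Motors:
    received.append(reward)
    return [1.0, 1.0]


stops = iter([False, False, True])
n = control_loop(lambda: [0.0] * 8,
                 lambda s, m: 0.0 if m is None else 1.0,
                 fake_network,
                 lambda m: None,
                 lambda: next(stops))
```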

In the following, the learning process flow of the intermediate node 22 (intermediate OR node, intermediate AND node, test intermediate OR node, test intermediate AND node) and output node 23 by the learning means 51 will be described.

[0120] <Intermediate OR node learning process>

FIG. 7 shows an example of an intermediate OR node (real node) 100 to be learned. In this example, three input-side links 101, 102, and 103 and an output-side link 104 are coupled to the intermediate OR node 100, and a test link 105 is provided on its input side. The input-side links 101, 102, and 103 are coupled to input-side nodes 106, 107, and 108, respectively; the output-side link 104 is coupled to an output-side node 109; and the test link 105 is randomly coupled to an arbitrary node 110.

Here, the inputs to the intermediate OR node 100 via the input-side links 101, 102, and 103 are denoted X(1), X(2), and X(3). More generally, with N input-side links, the inputs are X(1) to X(N); that is, the input via the k-th input-side link is X(k) (k = 1 to N). The output of the intermediate OR node 100 is Y, NumT is the number of True values among X(1) to X(N), and R is the reinforcement signal given to the intermediate OR node 100. Furthermore, the reinforcement signals given to the input-side links 101, 102, and 103 are R1(1), R1(2), and R1(3), and those given to their input-side nodes 106, 107, and 108 are R2(1), R2(2), and R2(3). More generally, the reinforcement signal given to the k-th input-side link is R1(k) (k = 1 to N), and the reinforcement signal given to its input-side node is R2(k) (k = 1 to N).

In FIG. 5, the learning means 51 first uses the reinforcement signal R given to the intermediate OR node 100, together with the input/output state of that node, to calculate the reinforcement signals R1(1), R1(2), and R1(3) to be given to the input-side links 101, 102, and 103. The reinforcement signal is thus distributed (propagated) to each input-side link according to its contribution to the output Y of the intermediate OR node 100 (step S50401). In conjunction with this, the learning means 51 calculates the reinforcement signals R2(1), R2(2), and R2(3) to be given to the input-side nodes 106, 107, and 108 of the input-side links 101, 102, and 103 (step S50402).

[0123] The inputs X(1), X(2), and X(3) to the intermediate OR node 100 are obtained by referring to the input-side link addresses C1 of the intermediate OR node 100 in the node information storage means 62 and reading the outputs D7 of the input-side links 101, 102, and 103 in the link information storage means 63. The output Y of the intermediate OR node 100 is obtained by reading its output C9 in the node information storage means 62. The reinforcement signal R given to the intermediate OR node 100 is obtained by reading its reinforcement signal total C10 in the node information storage means 62.

[0124] The learning means 51 then uses the following rules to calculate the reinforcement signal R1(k) (k = 1 to N) to be given to each input-side link of interest among the N input-side links coupled to the intermediate OR node, and the reinforcement signal R2(k) (k = 1 to N) to be given to the input-side node of that link. That is, it determines which of cases 1 to 5 below applies to the k-th input-side link (k = 1 to N), and calculates the reinforcement signal R1(k) for each input-side link and the reinforcement signal R2(k) for its input-side node.

[0125] Case 1: When $(Y = T) \wedge (X(k) = F)$, $R1(k) = 0$ and $R2(k) = 0$. In this case, the input X(k) via the k-th input-side link does not contribute to the output Y of the intermediate OR node, so the reinforcement signals are set to 0.

Case 2: When $Y = F$, $R1(k) = R/N$ and $R2(k) = R/N$. In this case, because Y = F, all inputs satisfy X(k) = F (k = 1 to N) and contribute equally to the output Y, so the reinforcement signal is distributed evenly.

Case 3: When $(Y = T) \wedge (\mathrm{NumT} = 1)$, $R1(k) = R$ and $R2(k) = R$. In this case, the input via the link of interest is X(k) = T and it is the only True input, so this link contributes strongly to the output Y and is given a reinforcement signal with a large absolute value.

[0128] Case 4: When $(Y = T) \wedge (\mathrm{NumT} \neq 1) \wedge (R \geq 0)$, $R1(k) = -R \times (\mathrm{NumT} - 1)/N$ and $R2(k) = 0$. In this case, the input via the link of interest is X(k) = T, but it is not the only True input. Even if this input were False, the output would still be Y = T because of the inputs via the other input-side links. The contribution of this link to the output Y is therefore low, and a relatively small punishment is given as the reinforcement signal R1(k).

Case 5: When $(Y = T) \wedge (\mathrm{NumT} \neq 1) \wedge (R \leq 0)$, $R1(k) = R \times \mathrm{NumT}/N$ and $R2(k) = 0$. Here too, as in case 4, the input via the link of interest is X(k) = T but is not the only True input. Even if it were False, the output would still be Y = T because of the other input-side links, so the contribution of this link to the output Y is low. Moreover, because the intermediate OR node of the propagation source has itself received a punishment as its reinforcement signal R, a larger punishment than in case 4 is given as R1(k).
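Cases 1 to 5 translate directly into a function that distributes the reinforcement signal R of an intermediate OR node over its N input-side links. This Python sketch uses True/False for T/F, and the function name is hypothetical. Because cases 1 and 2 cover every link with X(k) = F, cases 3 to 5 are reached only for links with X(k) = T. Cases 4 and 5 both yield zero when R = 0, so their overlap at R = 0 is harmless.

```python
from typing import List, Sequence, Tuple


def distribute_or(x: Sequence[bool], y: bool, r: float) -> Tuple[List[float], List[float]]:
    """Return (R1, R2): signals for each input-side link and its input-side node."""
    n = len(x)
    num_t = sum(x)  # NumT: number of True inputs
    r1: List[float] = []
    r2: List[float] = []
    for xk in x:
        if y and not xk:            # case 1: link did not contribute
            a, b = 0.0, 0.0
        elif not y:                 # case 2: all inputs False, share equally
            a, b = r / n, r / n
        elif num_t == 1:            # case 3: the only True input
            a, b = r, r
        elif r >= 0:                # case 4: redundant True input, reward given
            a, b = -r * (num_t - 1) / n, 0.0
        else:                       # case 5: redundant True input, punishment given
            a, b = r * num_t / n, 0.0
        r1.append(a)
        r2.append(b)
    return r1, r2


# Node with inputs (T, F, T), output T, rewarded with R = 1.
print(distribute_or([True, False, True], True, 1.0))
```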

[0130] FIG. 8 shows an example of the distribution of reinforcement signals calculated according to the rules of cases 1 to 5 above when the intermediate OR node of the propagation source is the intermediate OR node 100 of FIG. 7.

[0131] Further, the learning means 51 calculates the reinforcement signal RT to be given to the test link 105, assuming that the test link 105 exists as an input-side link of the intermediate OR node 100 (step S50403). If adding the test link 105 as an input-side link would not change the output Y, the input TX via the test link 105 (obtained by reading the output D7 of the test link 105 in the link information storage means 63) is added to the inputs X(k), and RT is calculated according to cases 1 to 5 above. If adding the test link 105 would change the output Y, the input TX is added to the inputs X(k), the inverted value of the output C9 (actual output) of the intermediate OR node 100 is substituted for Y, and the sign-inverted reinforcement signal total C10 (actual reinforcement signal total) of the intermediate OR node 100 is substituted for R, that is, R = −C10. RT is then calculated by applying the rules of cases 1 to 5.
[0132] The reinforcement signals calculated as above, that is, R1(1), R1(2), and R1(3) for the input-side links 101, 102, and 103 and RT for the test link 105, are each added to the cumulative reinforcement value D8 of the corresponding link in the link information storage means 63 to update that cumulative value, and each is also written over the link's reinforcement signal D9. The reinforcement signals R2(1), R2(2), and R2(3) for the input-side nodes 106, 107, and 108 of the input-side links 101, 102, and 103 are added to the reinforcement signal total C10 of the corresponding node in the node information storage means 62. They are added rather than overwritten because the node may also receive reinforcement signals propagated from other constituent elements (step S50404).
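The test-link rule of paragraph [0131] can be sketched in the same style. For an OR node, adding one more input changes the output only when Y = F and TX = T, which is how this sketch decides whether the output "would change"; that shortcut and all names are assumptions. The case rules are restated compactly so that the block runs on its own.

```python
from typing import Sequence


def case_signal(x: Sequence[bool], y: bool, r: float, k: int) -> float:
    """R1(k) for an OR node under cases 1-5 (compact restatement)."""
    n, num_t = len(x), sum(x)
    if y and not x[k]:
        return 0.0                        # case 1
    if not y:
        return r / n                      # case 2
    if num_t == 1:
        return r                          # case 3
    if r >= 0:
        return -r * (num_t - 1) / n       # case 4
    return r * num_t / n                  # case 5


def rt_for_test_link(x: Sequence[bool], y: bool, c10: float, tx: bool) -> float:
    """RT for a test link with input TX, treated as an extra input-side link."""
    extended = list(x) + [tx]
    output_would_change = (not y) and tx  # OR node: only F -> T is possible
    if output_would_change:
        # Substitute the inverted actual output and R = -C10.
        return case_signal(extended, not y, -c10, len(extended) - 1)
    return case_signal(extended, y, c10, len(extended) - 1)


# A punished node with all-False inputs: a True test link would have flipped Y.
print(rt_for_test_link([False, False], False, -0.5, True))  # 0.5
```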

Subsequently, for each of the input-side links 101, 102, and 103, the learning means 51 determines whether the cumulative reinforcement value D8 of the link in the link information storage means 63 has fallen below a threshold (0 in this embodiment, as an example), and deletes the input-side link if it has (step S50405). This deletion is performed by the inverted link deletion process E5 or the non-inverted link deletion process E6, described later.

Further, for the test link 105, the learning means 51 determines whether the cumulative reinforcement value D8 of the link in the link information storage means 63 has fallen below the threshold (0 in this embodiment, as an example), and deletes the test link 105 if it has (step S50406). This deletion is performed by the test inverted link deletion process E7 or the test non-inverted link deletion process E8 in FIG. 12, described later. A new test link coupled to an arbitrary node is then generated at random and registered in the test link address C4 of the intermediate OR node 100 in the node information storage means 62.

[0135] Further, the learning means 51 determines whether the cumulative reinforcement value D8 of the test link 105 in the link information storage means 63 exceeds the threshold. If it does, the test link 105 is promoted to a real link for practical use: a real link is newly generated using the test link address C4 of the intermediate OR node 100 in the node information storage means 62, the address B2 of the intermediate OR node 100, and the network address C3, and is additionally registered in the input-side link address C1 of the intermediate OR node 100. At this time, if the inversion/non-inversion flag D5 of the test link 105 in the link information storage means 63 is True (meaning an inverted link), a new inverted link is generated; if it is False (meaning a non-inverted link), a new non-inverted link is generated. At the same time, the test link 105 is deleted by the test inverted link deletion process E7 or the test non-inverted link deletion process E8 in FIG. 12, described later. A new test link coupled to an arbitrary node is then generated at random and registered in the test link address C4 of the intermediate OR node 100 in the node information storage means 62 (step S50407).

Then, when the number of input-side links registered in the input-side link address C1 of the intermediate OR node 100 becomes 1 or less, the learning means 51 deletes the intermediate OR node 100 (step S50408). This deletion is performed by the intermediate OR node deletion process E1, described later.

Then, the learning means 51 clears the reinforcement signal total C10 of the intermediate OR node 100 in the node information storage means 62 to 0 (step S50409).
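Steps S50405 to S50409 amount to threshold tests on the cumulative values D8 followed by bookkeeping. The Python sketch below applies them to a single node. The classes, the random regeneration of test links, and the choice to start a promoted real link with a zero cumulative value are assumptions. The threshold of 0 is the example value from the text.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional

THRESHOLD = 0.0  # example threshold from the text


@dataclass
class Link:
    source: int              # input-side node index (hypothetical addressing)
    inverted: bool           # inversion/non-inversion flag (cf. D5)
    cumulative: float = 0.0  # cumulative reinforcement value (cf. D8)


@dataclass
class Node:
    input_links: List[Link] = field(default_factory=list)  # cf. C1
    test_link: Optional[Link] = None                         # cf. C4
    signal_total: float = 0.0                                # cf. C10
    deleted: bool = False


def new_test_link(num_nodes: int, rng: random.Random) -> Link:
    # Hypothetical: random source node and random inversion flag.
    return Link(rng.randrange(num_nodes), rng.random() < 0.5)


def update_structure(node: Node, num_nodes: int, rng: random.Random) -> None:
    # S50405: delete real input-side links whose cumulative value fell below the threshold.
    node.input_links = [link for link in node.input_links if link.cumulative >= THRESHOLD]
    t = node.test_link
    if t is not None and t.cumulative < THRESHOLD:
        node.test_link = new_test_link(num_nodes, rng)       # S50406: discard and regenerate
    elif t is not None and t.cumulative > THRESHOLD:
        node.input_links.append(Link(t.source, t.inverted))  # S50407: promote to real link
        node.test_link = new_test_link(num_nodes, rng)       # ... and regenerate
    if len(node.input_links) <= 1:
        node.deleted = True                                  # S50408: too few inputs
    node.signal_total = 0.0                                  # S50409: clear C10


node = Node([Link(0, False, 0.5), Link(1, True, -0.2), Link(2, False, 0.1)],
            Link(3, True, 0.3), signal_total=0.7)
update_structure(node, num_nodes=10, rng=random.Random(1))
print([link.source for link in node.input_links], node.deleted)  # [0, 2, 3] False
```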

[0138] <Intermediate AND node learning processing>

The intermediate AND node learning process is substantially the same as the intermediate OR node learning process described above. First, from the reinforcement signal R given to the intermediate AND node and according to its input/output state, the learning means 51 calculates the reinforcement signal R1(k) to be given to each input-side link, so that the reinforcement signal is distributed (propagated) to the input-side links according to their contribution to the output Y of the intermediate AND node. At the same time, it calculates the reinforcement signal R2(k) to be given to the input-side node of each input-side link.

[0139] In doing so, when the outputs D7 of the input-side links corresponding to the input-side link addresses C1 of the intermediate AND node and the output C9 of the intermediate AND node are substituted for the inputs X(k) and the output Y in the reinforcement signal calculation described for the intermediate OR node, these values are inverted before substitution. This is because, by De Morgan's law, an AND node with all of its inputs and its output inverted is equivalent to an OR node.
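The De Morgan substitution means that an AND node can reuse the OR-node rules unchanged once its inputs and output are inverted. In this Python sketch the case rules are restated compactly so the block is self-contained; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def distribute_or(x: Sequence[bool], y: bool, r: float) -> Tuple[List[float], List[float]]:
    """Cases 1-5 for an OR node: returns (R1, R2) per input-side link."""
    n, num_t = len(x), sum(x)
    r1: List[float] = []
    r2: List[float] = []
    for xk in x:
        if y and not xk:
            a, b = 0.0, 0.0                      # case 1
        elif not y:
            a, b = r / n, r / n                  # case 2
        elif num_t == 1:
            a, b = r, r                          # case 3
        elif r >= 0:
            a, b = -r * (num_t - 1) / n, 0.0     # case 4
        else:
            a, b = r * num_t / n, 0.0            # case 5
        r1.append(a)
        r2.append(b)
    return r1, r2


def distribute_and(x: Sequence[bool], y: bool, r: float) -> Tuple[List[float], List[float]]:
    """AND node: invert all inputs and the output (De Morgan), then apply the OR rules."""
    return distribute_or([not v for v in x], not y, r)


# AND node with inputs (T, F, T) and output F, punished with R = -1:
# the single False input is the one held responsible.
print(distribute_and([True, False, True], False, -1.0))
```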

[0140] By inverting all inputs and the output according to De Morgan's law in this way, the learning means 51 calculates, under the same rules as the intermediate OR node learning process, the reinforcement signal R1(k) (k = 1 to N) to be given to each input-side link of interest among the N input-side links coupled to the intermediate AND node, and the reinforcement signal R2(k) (k = 1 to N) to be given to the input-side node of that link. That is, it determines which of cases 1 to 5 above applies to the k-th input-side link (k = 1 to N), and calculates R1(k) for each input-side link and R2(k) for its input-side node.

[0141] FIG. 9 shows an example of the distribution of reinforcement signals calculated according to the rules of cases 1 to 5 when the intermediate node of the propagation source is an intermediate AND node that, like the intermediate OR node 100 of FIG. 7, has three input-side links. FIG. 9 corresponds to FIG. 8 with the inputs and output X(1), X(2), X(3), and Y inverted, while the reinforcement signals R, R1(1), R1(2), R1(3), R2(1), R2(2), and R2(3) are left unchanged.

[0142] Further, the learning means 51 calculates the reinforcement signal RT to be given to the test link, assuming that the test link exists as an input-side link of the intermediate AND node. If adding the test link as an input-side link would not change the output Y, the input TX via the test link (obtained by inverting the output D7 of the test link in the link information storage means 63) is added to the inputs X(k), the inverted value of the output C9 of the intermediate AND node is substituted for Y, and RT is calculated according to cases 1 to 5 above. If adding the test link would change the output Y, the input TX (the inverted output D7 of the test link) is added to the inputs X(k), the value of the output C9 (actual output) of the intermediate AND node is substituted for Y without inversion, and the sign-inverted reinforcement signal total C10 (actual reinforcement signal total) of the intermediate AND node is substituted for R, that is, R = −C10. RT is then calculated by applying the rules of cases 1 to 5.

[0143] The reinforcement signals calculated as above, that is, R1(k) (k = 1 to N) for the input-side links and RT for the test link, are each added to the cumulative reinforcement value D8 of the corresponding link in the link information storage means 63 to update that cumulative value, and each is also written over the link's reinforcement signal D9. The reinforcement signals R2(k) (k = 1 to N) for the input-side nodes of the input-side links are added to the reinforcement signal total C10 of the corresponding node in the node information storage means 62; they are added because the node may also receive reinforcement signals propagated from other constituent elements. Subsequently, for each input-side link, the learning means 51 determines whether the cumulative reinforcement value D8 of the link in the link information storage means 63 has fallen below a threshold (0 in this embodiment, as an example), and deletes the input-side link if it has. This deletion is performed by the inverted link deletion process E5 or the non-inverted link deletion process E6, described later.

[0145] In addition, for the test link, the learning unit 51 determines whether or not the cumulative value D8 of the enhancement signal of the link in the link information storage unit 63 is lower than a threshold value (in this embodiment, 0 as an example). If it is below, delete the test link. In this case, the test inversion link deletion process E7 or the test non-inversion link deletion process E8 in FIG. 12 described later is performed. Then, a new test link coupled to an arbitrary node is randomly generated and registered in the test link address C4 of the intermediate AND node of the node information storage means 62.

[0146] Further, the learning means 51 determines whether the cumulative reinforcement value D8 of the test link in the link information storage means 63 exceeds the threshold. If it does, the test link is promoted to a real link for practical use: a real link is newly generated using the test link address C4 of the intermediate AND node in the node information storage means 62, the address B2 of the intermediate AND node, and the network address C3, and is additionally registered in the input-side link address C1 of the intermediate AND node. At this time, if the inversion/non-inversion flag D5 of the test link in the link information storage means 63 is True (meaning an inverted link), a new inverted link is generated; if it is False (meaning a non-inverted link), a new non-inverted link is generated. At the same time, the test link is deleted by the test inverted link deletion process E7 or the test non-inverted link deletion process E8 in FIG. 12, described later. A new test link coupled to an arbitrary node is then generated at random and registered in the test link address C4 of the intermediate AND node in the node information storage means 62.

Then, when the number of input-side links registered at the input-side link address C1 of the intermediate AND node is 1 or less, the learning means 51 deletes the intermediate AND node; in this case, the intermediate AND node deletion process E2 in FIG. 12 is performed.

Finally, the learning means 51 clears the total value C10 of the reinforcement signal of the intermediate AND node in the node information storage means 62 to zero.

[0149] <Test intermediate OR node learning process>

The test intermediate OR node learning process is a simplified version of the intermediate OR node learning process described earlier (see FIG. 7). First, based on the reinforcement signal R given to the test intermediate OR node and according to the node's input/output state, the learning means 51 calculates the reinforcement signals R1(1) and R1(2) to be given to the first and second input-side test links (see FIG. 10), distributing (propagating) the signal according to each link's contribution to the output Y of the test intermediate OR node. Unlike the intermediate OR node learning process, however, only R1(1) and R1(2) are calculated here; the reinforcement signals R2(1) and R2(2) for the input-side nodes of the first and second input-side test links are not calculated.

[0150] In doing so, the learning means 51 applies exactly the same rules as in the intermediate OR node learning process described above: it determines which of Cases 1 to 5 described above applies to each of the first and second input-side test links and calculates R1(1) and R1(2) accordingly. Because a test intermediate OR node has no test link to register at the test link address C4 of the node information storage means 62, the reinforcement signal RT for such a test link is not calculated.

[0151] The reinforcement signals R1(1) and R1(2) calculated as described above for the first and second input-side test links are each added to the cumulative value D8 of the corresponding link's reinforcement signal in the link information storage means 63 to update it, and they also overwrite each link's reinforcement signal D9. Because the reinforcement signals R2(1) and R2(2) for the input-side nodes of the two test links are not calculated, nothing is added to the total value C10 of those nodes' reinforcement signals.

[0152] Subsequently, for each of the first and second input-side test links, the learning means 51 determines whether the cumulative value D8 of the link's reinforcement signal in the link information storage means 63 is below the threshold value (0 in this embodiment, as an example). If it is below the threshold, that input-side test link is deleted; in this case, the test inverted link deletion process E7 or the test non-inverted link deletion process E8 in FIG. 12, described later, is performed. Because the first input-side test link holds a sufficiently large positive value and is therefore never deleted, only the second input-side test link can be deleted in practice. Whenever the number of input-side test links falls to 1 (that is, only the first input-side test link remains), whether because the second input-side test link was deleted in this way or because it was deleted together with its input-side node (a real node), a new second input-side test link coupled to an arbitrary node is randomly generated and registered at the input-side test link address C1 (C1(2)) of the test intermediate OR node in the node information storage means 62.

Further, for the first input-side test link, which corresponds to the first element C1(1) of the input-side test link address C1 of the test intermediate OR node in the node information storage means 62, the learning means 51 sets the cumulative value D8 of the reinforcement signal to a sufficiently large positive value (for example, 10^300). Because D8 always holds such a large positive value, the first input-side test link is never deleted.
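The maintenance of a test node's two input-side test links, described above for the test intermediate OR node and repeated below for the test intermediate AND node, can be sketched as follows. The function `maintain_test_inputs`, its parameters, and the use of `None` to mean "the second test link no longer exists" (for example, because its input-side node was deleted) are hypothetical conventions for this illustration; the pinned value 1e300 follows the 10^300 example in the text.

```python
import random

PINNED = 1e300  # "sufficiently large positive value" kept on the first test link


def maintain_test_inputs(d8_second, r1_second, second_target, num_nodes, rng,
                         threshold=0.0):
    """Update the second input-side test link and regenerate it when needed.

    second_target is the node index of the second test link, or None if the
    link has already disappeared. Returns (D8 of first link, D8 of second
    link, target of second link)."""
    if second_target is not None:
        d8_second += r1_second
        if d8_second < threshold:
            second_target = None          # deleted (process E7 or E8)
    if second_target is None:
        # Only the first input-side test link remains: regenerate the second
        # one on a randomly chosen node, starting from D8 = 0.
        second_target = rng.randrange(num_nodes)
        d8_second = 0.0
    # The first test link is pinned, so adding R1(1) to it is immaterial.
    return PINNED, d8_second, second_target
```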

[0154] <Test intermediate AND node learning process>

The test intermediate AND node learning process is a simplified version of the intermediate AND node learning process described above. First, based on the reinforcement signal R given to the test intermediate AND node and according to the node's input/output state, the learning means 51 calculates the reinforcement signals R1(1) and R1(2) to be given to the first and second input-side test links (see FIG. 10, described later), distributing (propagating) the signal according to each link's contribution to the output Y of the test intermediate AND node. Unlike the intermediate AND node learning process, however, only R1(1) and R1(2) are calculated here; the reinforcement signals R2(1) and R2(2) for the input-side nodes of the first and second input-side test links are not calculated.

[0155] In doing so, the learning means 51 applies exactly the same rules as in the intermediate AND node learning process described above: it determines which of Cases 1 to 5 described above applies to each of the first and second input-side test links and calculates R1(1) and R1(2) accordingly. Because a test intermediate AND node has no test link to register at the test link address C4 of the node information storage means 62, the reinforcement signal RT for such a test link is not calculated.

The reinforcement signals R1(1) and R1(2) calculated as described above for the first and second input-side test links are each added to the cumulative value D8 of the corresponding link's reinforcement signal in the link information storage means 63 to update it, and they also overwrite each link's reinforcement signal D9. Because the reinforcement signals R2(1) and R2(2) for the input-side nodes of the two test links are not calculated, nothing is added to the total value C10 of those nodes' reinforcement signals.

Subsequently, for each of the first and second input-side test links, the learning means 51 determines whether the cumulative value D8 of the link's reinforcement signal in the link information storage means 63 is below the threshold value (0 in this embodiment, as an example). If it is below the threshold, that input-side test link is deleted; in this case, the test inverted link deletion process E7 or the test non-inverted link deletion process E8 in FIG. 12, described later, is performed. Because the first input-side test link holds a sufficiently large positive value and is therefore never deleted, only the second input-side test link can be deleted in practice. Whenever the number of input-side test links falls to 1 (that is, only the first input-side test link remains), whether because the second input-side test link was deleted in this way or because it was deleted together with its input-side node (a real node), a new second input-side test link coupled to an arbitrary node is randomly generated and registered at the input-side test link address C1 (C1(2)) of the test intermediate AND node in the node information storage means 62.

Furthermore, for the first input-side test link, which corresponds to the first element C1(1) of the input-side test link address C1 of the test intermediate AND node in the node information storage means 62, the learning means 51 sets the cumulative value D8 of the reinforcement signal to a sufficiently large positive value (for example, 10^300). Because D8 always holds such a large positive value, the first input-side test link is never deleted.

[0159] <Learning process of output node>

The learning process for the output node is substantially the same as the intermediate OR node learning process described above. First, based on the reinforcement signal R given to the output node and according to the node's input/output state, the learning means 51 calculates the reinforcement signal R1(k) (k = 1 to N) to be given to each input-side link, distributing (propagating) the signal according to each link's contribution to the output Y of the output node. At the same time, it calculates the reinforcement signal R2(k) (k = 1 to N) to be given to the input-side node of each input-side link.

[0160] In doing so, the learning means 51 applies exactly the same rules as in the intermediate OR node learning process described above: taking each of the N input-side links coupled to the output node in turn, it calculates the reinforcement signal R1(k) for that link and the reinforcement signal R2(k) for the link's input-side node. In other words, it determines which of Cases 1 to 5 above applies to the k-th input-side link (k = 1 to N) and calculates R1(k) for each input-side link and R2(k) for each link's input-side node.

[0161] Further, the learning means 51 calculates the reinforcement signal RT to be given to the test link coupled to the input side of the output node (the test link corresponding to the test link address C4 of the output node). To do so, it calculates the signal as if the test link existed as an input-side link of the output node. If adding the test link as an input-side link would not change the output Y, the input TX supplied by the test link, that is, the output of the test link (the output D7 of the test link in the link information storage means 63), is added to the inputs X(k), and RT is calculated according to Cases 1 to 5 described above. If adding the test link as an input-side link would change the output Y, TX (the output D7 of the test link) is added to the inputs X(k), the inverted value of the output node's output C9 (the actual output) is substituted for Y, the total value C10 of the reinforcement signal (the actual reinforcement signal) with its sign inverted is substituted for R, and RT is calculated by applying the rules of Cases 1 to 5.

[0162] The reinforcement signals calculated as described above, that is, R1(k) (k = 1 to N) for each input-side link and RT for the test link, are each added to the cumulative value D8 of the corresponding link's reinforcement signal in the link information storage means 63 to update it, and they also overwrite the link's reinforcement signal D9. The reinforcement signal R2(k) (k = 1 to N) for the input-side node of each input-side link is added to the total value C10 of that node's reinforcement signal in the node information storage means 62 (a node may also receive reinforcement signals propagated from other constituent elements, so these are summed).

[0163] Subsequently, for each input-side link, the learning means 51 determines whether the cumulative value D8 of the link's reinforcement signal in the link information storage means 63 is below a threshold value (0 in this embodiment, as an example). If it is below the threshold, the input-side link is deleted; in this case, the inverted link deletion process E5 or the non-inverted link deletion process E6 in FIG. 12 is performed.

[0164] In addition, for the test link, the learning means 51 determines whether the cumulative value D8 of the link's reinforcement signal in the link information storage means 63 is below the threshold value (0 in this embodiment, as an example). If it is below the threshold, the test link is deleted; in this case, the test inverted link deletion process E7 or the test non-inverted link deletion process E8 in FIG. 12, described later, is performed. A new test link coupled to an arbitrary node is then randomly generated and registered at the test link address C4 of the output node in the node information storage means 62.

[0165] Further, for the test link, the learning means 51 determines whether the cumulative value D8 of the link's reinforcement signal in the link information storage means 63 exceeds the threshold value. If it does, the test link is promoted to a real link: a new real link is generated using the test link address C4 of the output node in the node information storage means 62, the address B3 of this output node, and the network address C3, and it is additionally registered at the input-side link address C1 of the output node. At this time, if the inversion/non-inversion flag D5 of the test link in the link information storage means 63 is True (meaning an inverted link), a new inverted link is generated; if it is False (meaning a non-inverted link), a new non-inverted link is generated. At the same time, the test link is deleted; in this case, the test inverted link deletion process E7 or the test non-inverted link deletion process E8 in FIG. 12, described later, is performed. A new test link coupled to an arbitrary node is then randomly generated and registered at the test link address C4 of the output node in the node information storage means 62.

[0166] Then, when the number of input-side links registered at the input-side link address C1 of the output node becomes 0, the learning means 51 refers to the network information storage means 61 via the network address C3 and randomly selects a node address from the input node address B1, the intermediate node address B2, and the output node address B3. Using the selected node address, the address of the output node, and the network address C3, it generates a new real link, randomly choosing between an inverted link and a non-inverted link, and adds the address of the generated real link to the input-side link address C1 of the output node. In this case, the inverted link initialization process G9 or the non-inverted link initialization process G10 in FIG. 11, described later, is performed.

[0167] Then, the learning means 51 clears the total value C10 of the reinforcement signal of the output node of the node information storage means 62 to zero.

[0168] <Inverted link learning process>

Because the inverted link learning process is the same as the non-inverted link learning process described below, its description is omitted.

[0169] <Non-inverted link learning process>

FIG. 10 shows an example of a non-inverted link (real link) 120 to be learned. An input-side node 121 is coupled to the input side of the non-inverted link 120, and an output-side node 122 is coupled to its output side. A test node 123 (either a test intermediate AND node or a test intermediate OR node) may also be attached to the non-inverted link 120. First and second input-side test links 124 and 125 are coupled to the input side of the test node 123, and an output-side test link 126 is coupled to its output side. Because the output-side test link 126 does not actually transmit information in this embodiment, it is drawn with a two-dot chain line. The first input-side test link 124 is coupled to the input-side node 121 of the non-inverted link 120, the second input-side test link 125 is randomly coupled to an arbitrary node 127, and the output-side test link 126 is coupled to the output-side node 122 of the non-inverted link 120.

[0170] Here, let Y be the output of the non-inverted link 120, TY the output of the test node 123, R1 the reinforcement signal given to the non-inverted link 120, R2 the reinforcement signal given to the input-side node 121 of the non-inverted link 120, and RT the reinforcement signal given to the test node 123. In FIG. 6, based on the reinforcement signal R1 given to the non-inverted link 120 (the propagation source), the learning means 51 first calculates the reinforcement signal RT to be given to the test node 123 (the propagation destination) according to the states of the output Y of the non-inverted link 120 and the output TY of the test node 123 (step S50601).

At this time, the reinforcement signal R1 given to the non-inverted link 120 (the propagation source) is obtained by reading the reinforcement signal D9 of the non-inverted link 120 in the link information storage means 63. The output Y of the non-inverted link 120 is obtained by reading the output D7 of the non-inverted link 120 in the link information storage means 63. The output TY of the test node 123 (the propagation destination) is obtained by referring to the test node address D4 of the non-inverted link 120 in the link information storage means 63 and reading the output C9 of the corresponding node, the test node 123, in the node information storage means 62.

[0173] Then, the learning means 51 calculates the reinforcement signal RT to be given to the propagation destination test node 123 according to the following rules.

Case 1: When (R1 > 0) ∧ (TY = Y), RT = 0. In this case, the test node 123 is unnecessary: since TY = Y, the non-inverted link 120 alone is sufficient.

[0175] Case 2: When (R1 > 0) ∧ (TY ≠ Y), the test node 123 is deleted, and a new test node is generated (the second input-side test link of the new test node is randomly coupled to an arbitrary node) and registered at the test node address D4 of the non-inverted link 120 in the link information storage means 63. At this time, if the AND/OR node flag C5 of the output-side node 122 corresponding to the output-side node address D2 of the non-inverted link 120 is True (meaning an AND node), a test intermediate OR node is generated; if it is False (meaning an OR node), a test intermediate AND node is generated. The test node is replaced because R1 > 0 means the non-inverted link 120 is working well, whereas TY ≠ Y means the test node 123 outputs something different from the link and is therefore considered to be working badly.

[0176] Case 3: When (R1 ≤ 0) ∧ (TY = Y), RT = R1. In this case, R1 ≤ 0 means the non-inverted link 120 is working badly, and TY = Y means the test node 123 outputs the same as the link, so the test node 123, like the non-inverted link 120, is given a punishment as its reinforcement signal.

[0177] Case 4: When (R1 ≤ 0) ∧ (TY ≠ Y), RT = −R1. In this case, R1 ≤ 0 means the non-inverted link 120 is working badly, whereas TY ≠ Y means the test node 123 outputs something different from the link, so the test node 123, unlike the non-inverted link 120, is given a reward as its reinforcement signal.

[0178] The reinforcement signal RT calculated as described above for the test node 123 is then added to the total value C10 of that node's reinforcement signal in the node information storage means 62 (step S50602 in FIG. 6).

[0179] Subsequently, when the learning means 51 has the AND'OR node flag C5 force True (meaning an AND node) of the test node 123 of the node information storage means 62, the above-described test intermediate for the test node 123 is performed. An AND node learning process is performed, and if it is False (which means an OR node), the above-described test intermediate OR node learning process is performed (step S50603).

[0180] Thereafter, the learning means 51 determines whether the cumulative values D8 of the reinforcement signals of both the first and second input-side test links 124 and 125 of the test node 123, stored in the link information storage means 63, exceed the threshold value. If both exceed it, the test node 123 is promoted to a real node and put into practical use: a new real node is generated using the address of the non-inverted link 120 being learned and the network address D3, and it is additionally registered at the intermediate node address B2 of the network information storage means 61, referenced via the network address D3 (step S50604). At this time, if the AND/OR node flag C5 of the test node 123 in the node information storage means 62 is True (meaning an AND node), an intermediate AND node is generated; if it is False (meaning an OR node), an intermediate OR node is generated. At the same time, the test node 123 is deleted, and the non-inverted link 120 being learned is deleted as well.
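The promotion check of step S50604 reduces to a simple predicate over the two input-side test links. In the sketch below, the function name `promote_test_node`, the "AND"/"OR" return labels, and the `threshold` parameter are hypothetical; the text does not give a numeric value for this threshold. Note that because the first input-side test link is pinned at a very large value, in practice the second link's D8 decides the outcome.

```python
from typing import Optional, Sequence


def promote_test_node(d8_inputs: Sequence[float], test_node_is_and: bool,
                      threshold: float) -> Optional[str]:
    """Return the type of real intermediate node to create, or None to keep testing.

    On promotion, the caller is expected to delete both the test node and the
    real link being learned, as described in [0180]."""
    if all(d8 > threshold for d8 in d8_inputs):
        return "AND" if test_node_is_and else "OR"
    return None
```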

[0181] <Test inverted link learning process>

Test inverted links do not learn.

[0182] <Test non-inverted link learning process>

Test non-inverted links do not learn.

FIG. 11 shows the configuration of the initialization processes. In FIG. 11, the robot initialization process G1, network initialization process G2, input node initialization process G3, and output node initialization process G4 are performed only once, immediately after the program starts and operation control of the robot 30 begins. The initialization processes G5 to G12 for the other nodes and links, however, are performed not only immediately after operation control of the robot 30 begins but also every time a node or link is generated by subsequent learning. The initialization method differs by node type and link type, and in some cases one of several initialization methods is chosen depending on the situation. Some initialization processes also require other initialization processes to be performed within them, so the initializations are interrelated; FIG. 11 shows these relationships. In FIG. 11, initializing the element at the root of an arrow requires initializing the element at its tip. A solid line means that the initialization must be used, and a dotted line means that it may be used. A dash-dot line indicates the case in which a test node or test link is changed into a real node or real link by promotion.

[0184] <Robot initialization processing G1>

In the robot initialization process G1, the input array A1 and output array A2 of the robot information storage means 60 do not need to be initialized. In this embodiment, as an example, a network with 128 input nodes and 32 output nodes is initialized, and the resulting network address is registered as the network address A3. A4, A5, and A6 are set to 0.

[0185] <Network initialization processing G2>

In the network initialization process G2, the information stored in the network information storage means 61 is initialized. The network 20 is initialized by specifying the number of input nodes 21 and the number of output nodes 23. For the input node address B1, the input node initialization process G3 is performed the specified number of times using the address of the network 20 being initialized, and the addresses of the resulting input nodes 21 are registered in order. Because an address is registered at the intermediate node address B2 each time an intermediate node 22 is generated, B2 does not need to be initialized. For the output node address B3, the output node initialization process G4 is performed the specified number of times using the address of the network 20 being initialized, and the addresses of the resulting output nodes 23 are registered in order. The reinforcement signal B4 of the network 20 is set to 0.

[0186] <Input node initialization processing G3>

An input node 21 is initialized by specifying the address of the network 20 to which it belongs (the network address C3 stored in the node information storage means 62). Because the input node 21 is a dummy node, its input-side link address C1 does not need to be initialized. Because an address is registered at the output-side link address C2 each time an output-side link coupled to the output side of the input node 21 is generated, C2 does not need to be initialized either. The network address C3 is overwritten with the specified address of the network 20. Because the input node 21 is a dummy node, the test link address C4 and the AND/OR node flag C5 do not need to be initialized. Because this node is an input node 21, the input node flag C6 is set to True, and the output node flag C7 and the test node flag C8 are either left uninitialized or set to False. The node output C9 is set by the input conversion means 52 (see step S507 in FIG. 4), so it does not need to be initialized. The total value C10 of the reinforcement signal is set to 0.
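The network initialization process G2, with its calls to G3 for each input node and G4 for each output node, can be sketched as below. The `NodeRecord` and `Network` classes, their field names, and `init_network` are hypothetical stand-ins for the node and network information storage means; links created by G4 are deliberately omitted here, and the flag defaults follow the settings stated for G3 and G4 (output nodes are OR nodes, so C5 stays False).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodeRecord:
    is_input: bool = False   # input node flag C6
    is_output: bool = False  # output node flag C7
    is_test: bool = False    # test node flag C8
    is_and: bool = False     # AND/OR node flag C5 (False = OR node)
    output: bool = False     # node output C9
    c10: float = 0.0         # total value C10 of the reinforcement signal


@dataclass
class Network:
    inputs: List[NodeRecord] = field(default_factory=list)         # B1
    intermediates: List[NodeRecord] = field(default_factory=list)  # B2, filled later
    outputs: List[NodeRecord] = field(default_factory=list)        # B3
    b4: float = 0.0                                                # reinforcement signal B4


def init_network(n_inputs: int, n_outputs: int) -> Network:
    """G2: create the input nodes (G3) and output nodes (G4) in order."""
    net = Network()
    for _ in range(n_inputs):
        net.inputs.append(NodeRecord(is_input=True))    # G3: dummy input node
    for _ in range(n_outputs):
        net.outputs.append(NodeRecord(is_output=True))  # G4 (links omitted here)
    return net
```

Called as `init_network(128, 32)`, this mirrors the example sizes used in the robot initialization process G1.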

[0187] <Output node initialization processing G4>

[0187] An output node 23 is initialized by specifying the address of the network 20 to which it belongs (the network address C3 stored in the node information storage means 62). For the input-side link address C1, a node address is randomly selected from the input node address B1 and the output node address B3 of the network information storage means 61 referenced via the specified network address C3 (the intermediate node address B2 cannot be selected because it contains no data at this point). Using this node address, the address of the output node 23 being initialized, and the specified network address C3, a real link 141 coupled to the randomly selected node 140 is newly generated, randomly choosing between an inverted link and a non-inverted link (the inverted link initialization process G9 or the non-inverted link initialization process G10 in FIG. 11 is performed), and the address of the generated real link 141 is stored at the input-side link address C1. At this time, a test node 142 attached to the real link 141 is also newly generated (the test intermediate OR node initialization process G7 or the test intermediate AND node initialization process G8 in FIG. 11 is performed), together with a first input-side test link 143 coupled to the node 140, an output-side test link 144 coupled to the output node 23 being initialized, and a second input-side test link 146 randomly coupled to an arbitrary node 145 (the test inverted link initialization process G11 or the test non-inverted link initialization process G12 in FIG. 11 is performed).

[0188] There is no need to initialize the output side link address C2. The network address C3 is overwritten with the specified network 20 address.

[0189] For the test link address C4, a node address is randomly selected from the input node address B1 and the output node address B3 of the network information storage means 61 referenced via the specified network address C3 (the intermediate node address B2 is not a candidate because it contains no data at this point). Using this node address, the address of the output node 23 being initialized, and the specified network address C3, a new test link 148 coupled to the randomly selected node 147 is generated, as shown in FIG. 11, randomly choosing between a test inverted link and a test non-inverted link (the test inverted link initialization process G11 or the test non-inverted link initialization process G12 in FIG. 11 is performed).

[0190] Because the output node 23 is an OR node in this embodiment, the AND/OR node flag C5 is set to False (meaning an OR node). Because the node is an output node 23, the input node flag C6 is set to False, the output node flag C7 to True, and the test node flag C8 to False. The node output C9 is set to False, and the total value C10 of the reinforcement signal is set to 0.
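The random link creation in [0187] and [0189] can be sketched as follows: one real input-side link and one test link, each attached to a node drawn from the input and output nodes only (no intermediate nodes exist yet), each with a randomly chosen inversion flag. The `LinkSpec` class and `init_output_node_links` are hypothetical names; nodes are plain integer indices, and the test node 142 with its own three test links is not modeled.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LinkSpec:
    source: int            # index of the node the link is coupled to
    inverted: bool         # inverted or non-inverted link
    is_test: bool = False  # True for the test link registered at C4


def init_output_node_links(candidates: Sequence[int],
                           rng: random.Random) -> List[LinkSpec]:
    """Create the initial real input-side link (C1) and the test link (C4)
    of one output node; candidates should hold only input and output nodes."""
    real = LinkSpec(source=rng.choice(candidates), inverted=rng.random() < 0.5)
    test = LinkSpec(source=rng.choice(candidates), inverted=rng.random() < 0.5,
                    is_test=True)
    return [real, test]
```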

[0191] <Intermediate OR node initialization processing G5>

[0191] The intermediate OR node initialization process G5 is performed by specifying, and referring to, the real link to be deleted (any link other than a test link). This process is used when one real link is removed from the network 20 and the test node attached to it (the test node corresponding to the test node address D4 of that real link in the link information storage means 63) is promoted to a real node.

[0192] For the input-side link address C1, as shown in FIG. 14, the addresses C1 (C1(1), C1(2)) of the first and second input-side test links 162 and 163 of the test intermediate OR node 161 attached to the real link 160 specified for deletion (the test node corresponding to the test node address D4 of the real link 160 in the link information storage means 63) are used together with the address of the intermediate OR node 180 being generated and initialized (the address of the memory area now being allocated for it). If the first input-side test link 162 is a test inverted link, a new inverted link (real link) is generated; if it is a test non-inverted link, a new non-inverted link (real link) is generated (the inverted link initialization process G9 or the non-inverted link initialization process G10 in FIG. 11 is performed). To register the generated real link as the input-side link 181 of the intermediate OR node 180, its address is registered at the input-side link address C1. Similarly, if the second input-side test link 163 is a test inverted link, a new inverted link (real link) is generated; if it is a test non-inverted link, a new non-inverted link (real link) is generated (process G9 or G10 in FIG. 11 is performed), and its address is registered at the input-side link address C1 as the input-side link 182 of the intermediate OR node 180. In other words, the input-side link 181 has the same inversion/non-inversion as the first input-side test link 162, and the input-side link 182 has the same inversion/non-inversion as the second input-side test link 163.
At this time, the input-side node of the input-side link 181 is the node 164 coupled to the input side of the first input-side test link 162 (that is, the input-side node of the real link 160 being deleted), and the input-side node of the input-side link 182 is the node 165 coupled to the input side of the second input-side test link 163. Although not shown in the figure, each of the newly generated input-side links 181 and 182 is given its own attached test node (the test intermediate OR node initialization process G7 or the test intermediate AND node initialization process G8 in FIG. 11 is performed).

The cumulative value D8 of the reinforcement signal of the first input-side link 181 of the intermediate OR node 180 (the input-side link corresponding to C1(1), the first element of the input-side link address C1 array) is initialized by overwriting it with the cumulative value D8 of the reinforcement signal of the real link 160 specified for deletion. The reason is as follows: in the test intermediate OR node initialization process G7 and the test intermediate AND node initialization process G8, described later, the first input-side test link 162 of the test intermediate OR node 161 is given the same inversion/non-inversion as the real link 160 being deleted. The first input-side link 181 therefore also matches the real link 160 in inversion/non-inversion, so it takes over the reinforcement signal of the real link 160.

[0194] For the output side link address C2, as shown in FIG. 14, a non-inverted link (real link) is newly generated and initialized (non-inverted link initialization process G10 in FIG. 11) using the address of the intermediate OR node 180 being generated, the output side node address D2 of the real link 160 designated as the deletion target, and the network address D3 of that real link 160. The generated real link is registered as the output side link 183 of the intermediate OR node 180 by registering its address in the output side link address C2. The output side node of the output side link 183 is then the node 167 coupled to the output side of the output side test link 166 of the test intermediate OR node 161 (that is, the output side node of the real link 160 to be deleted). Although not shown in the figure, a test node is also generated for the newly generated output side link 183 (test intermediate OR node initialization process G7 or test intermediate AND node initialization process G8 in FIG. 11).

[0195] In addition, the output side link 183, which is the generated non-inverted link (real link), is initialized by overwriting its accumulated value D8 of the reinforcement signal with the accumulated value D8 of the reinforcement signal of the real link 160 to be deleted.

[0196] The network address C3 is overwritten with the network address D3 of the real link 160 specified as the deletion target.

[0197] For the test link address C4, one node address is selected at random from the input node addresses B1, the intermediate node addresses B2, and the output node addresses B3 in the network information storage means 61 referenced by the designated network address C3. As shown in FIG. 14, using this node address, the address of the intermediate OR node 180 being generated, and the network address D3, a test link 185 coupled to the randomly selected node 184 is newly generated, randomly chosen to be either a test inverted link or a test non-inverted link (test inverted link initialization process G11 or test non-inverted link initialization process G12 in FIG. 11). The address of the generated test link 185 is registered in the test link address C4.
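A minimal sketch of how the test link address C4 might be filled, assuming the random choice is uniform over all addresses in B1, B2 and B3 and that inverted and non-inverted test links are equally likely (the text says only "randomly"). `Network`, `LinkRecord` and `make_random_test_link` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Network:
    """Hypothetical view of the network information storage means 61."""
    input_nodes: List[int]          # B1
    intermediate_nodes: List[int]   # B2
    output_nodes: List[int]         # B3


@dataclass
class LinkRecord:
    source: int                 # D1: randomly chosen input side node
    target: int                 # D2: the node whose C4 will hold this link
    inverted: bool              # D5: test inverted or test non-inverted
    is_test: bool = True        # D6
    accumulated: float = 0.0    # D8
    signal: float = 0.0         # D9


def make_random_test_link(net: Network, target: int,
                          rng: random.Random) -> LinkRecord:
    # Assumption: uniform choice over the union of B1, B2 and B3.
    candidates = net.input_nodes + net.intermediate_nodes + net.output_nodes
    source = rng.choice(candidates)
    # Assumption: inverted / non-inverted with equal probability.
    return LinkRecord(source=source, target=target, inverted=rng.random() < 0.5)
```

Passing an explicit `random.Random` instance keeps the sketch reproducible under a fixed seed.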

[0198] Since the initialization target is the intermediate OR node 180, the AND/OR node flag C5 is set to False (meaning an OR node), the input node flag C6 to False, the output node flag C7 to False, and the test node flag C8 to False. Furthermore, the node output C9 is set to False and the total value C10 of the reinforcement signal is set to 0.

[0199] <Intermediate AND node initialization processing G6>

The intermediate AND node initialization process G6 is substantially the same as the intermediate OR node initialization process G5 described above. That is, process G6 is performed by designating and referring to the real link to be deleted (any link other than a test link). In this process, one real link is deleted from the network 20, and the test node attached to that real link (the test node corresponding to the test node address D4 of the real link in the link information storage means 63) is promoted to a real node.

[0200] For the input side link address C1, use is made of the addresses C1 (C1(1), C1(2)) of the first and second input side test links of the test intermediate AND node associated with the real link designated as the deletion target (the test node corresponding to the test node address D4 of the real link in the link information storage means 63), together with the address of the intermediate AND node being generated for initialization (the address of the memory area reserved for it). If the first input side test link is a test inverted link, an inverted link (real link) is newly generated; if it is a test non-inverted link, a non-inverted link (real link) is newly generated (inverted link initialization process G9 or non-inverted link initialization process G10 in FIG. 11). The generated real link is registered as an input side link of the intermediate AND node by registering its address in the input side link address C1. Similarly, for the second input side test link, an inverted link (real link) is newly generated if it is a test inverted link, or a non-inverted link (real link) if it is a test non-inverted link (process G9 or G10 in FIG. 11), and the generated real link is registered as an input side link of the intermediate AND node by registering its address in the input side link address C1.

[0201] After that, the first input side link of the intermediate AND node (the input side link corresponding to the address C1(1) stored at the head of the input side link address array C1) is initialized by overwriting its accumulated value D8 of the reinforcement signal with the accumulated value D8 of the reinforcement signal of the real link designated as the deletion target.

[0202] For the output side link address C2, a non-inverted link (real link) is newly generated and initialized (non-inverted link initialization process G10 in FIG. 11) using the address of the intermediate AND node being generated, the output side node address D2 of the real link designated as the deletion target, and the network address D3 of that real link. The generated real link is registered as the output side link of the intermediate AND node by registering its address in the output side link address C2. The output side link, which is the generated non-inverted link (real link), is also initialized by overwriting its accumulated value D8 of the reinforcement signal with the accumulated value D8 of the reinforcement signal of the real link to be deleted.

[0203] The network address C3 is overwritten with the network address D3 of the real link specified for deletion.

[0204] For the test link address C4, one node address is selected at random from the input node addresses B1, the intermediate node addresses B2, and the output node addresses B3 in the network information storage means 61 referenced by the designated network address C3. Using this node address, the address of the intermediate AND node being generated, and the network address D3, a test link coupled to the randomly selected node is newly generated, randomly chosen to be either a test inverted link or a test non-inverted link (test inverted link initialization process G11 or test non-inverted link initialization process G12 in FIG. 11). The address of the generated test link is registered in the test link address C4.

[0205] Since the initialization target is an intermediate AND node, the AND/OR node flag C5 is set to True (meaning an AND node), the input node flag C6 to False, the output node flag C7 to False, and the test node flag C8 to False. Further, the node output C9 is set to False and the total value C10 of the reinforcement signal is set to 0.

[0206] <Test intermediate OR node initialization processing G7>

The test intermediate OR node initialization process G7 is performed by designating a real link and the network address D3 of that real link. This is because a test node is always attached to exactly one real link (it is registered in the test node address D4 of that real link).

[0207] As shown in FIG. 15, for the input side test link address C1, a new test link is initialized and generated using the input side node address D1 of the designated real link 200, the address of the test intermediate OR node 201 being generated, and the network address D3 of the designated real link 200: a test inverted link if the designated real link 200 is an inverted link, or a test non-inverted link if it is a non-inverted link (test inverted link initialization process G11 or test non-inverted link initialization process G12 in FIG. 11). The generated link becomes the first input side test link 202, and its address is registered as the first input side test link address C1(1). The accumulated value D8 of the reinforcement signal of the first input side test link 202 is then overwritten with a sufficiently large positive value (for example, 10^300). This prevents the first input side test link 202 from being deleted.

[0208] Further, for the input side test link address C1, one node address is selected at random from the input node addresses B1, the intermediate node addresses B2, and the output node addresses B3 in the network information storage means 61 referenced by the network address D3 of the designated real link 200. As shown in FIG. 15, using this node address, the address of the test intermediate OR node 201 being generated, and the network address D3 of the designated real link 200, a second input side test link 204 coupled to the randomly selected node 203 is newly generated, randomly chosen to be either a test inverted link or a test non-inverted link (test inverted link initialization process G11 or test non-inverted link initialization process G12 in FIG. 11). The address of the generated second input side test link 204 is registered as the second input side test link address C1(2).
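The construction of the two input side test links in [0207] and [0208] can be sketched as follows. The abstract states that constituent elements are deleted according to their accumulated reinforcement value, so a huge D8 keeps C1(1) out of reach of deletion; the constant `UNDELETABLE_D8`, the field names, and the 50/50 inversion choice for C1(2) are assumptions for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

# Illustrative "sufficiently large" value; [0207] gives 10^300 as an example.
UNDELETABLE_D8 = 1e300


@dataclass
class LinkRecord:
    source: int         # D1
    target: int         # D2
    inverted: bool      # D5
    is_test: bool       # D6
    accumulated: float  # D8


def make_test_node_inputs(real_link: LinkRecord, test_node: int,
                          all_nodes: List[int],
                          rng: random.Random) -> Tuple[LinkRecord, LinkRecord]:
    """Sketch of filling C1(1) and C1(2) of a test intermediate node."""
    # C1(1): same source and same inversion as the real link it is attached to,
    # pinned with a huge D8 so that it is never deleted.
    first = LinkRecord(source=real_link.source, target=test_node,
                       inverted=real_link.inverted, is_test=True,
                       accumulated=UNDELETABLE_D8)
    # C1(2): a source node drawn from B1, B2 and B3 (passed in as all_nodes)
    # with a randomly chosen inversion flag (assumed 50/50).
    second = LinkRecord(source=rng.choice(all_nodes), target=test_node,
                        inverted=rng.random() < 0.5, is_test=True,
                        accumulated=0.0)
    return first, second
```

Because C1(1) reproduces the real link's input, a test node effectively evaluates "real link's signal combined with one extra candidate input", which is what later lets it replace the real link on promotion.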

[0209] The output side test link address C2 does not need to be initialized, because the output side test link 205 of the test intermediate OR node 201 neither accumulates the reinforcement signal nor transmits information. For this reason, the output side test link 205 is drawn with a dash-dot line in FIG. 15.

[0210] The network address C3 is overwritten with the network address D3 of the designated real link 200. Since a test node has no test link to register in the test link address C4, the test link address C4 does not need to be initialized.

[0211] Since the initialization target is a test intermediate OR node, the AND/OR node flag C5 is set to False (meaning an OR node), the input node flag C6 to False, the output node flag C7 to False, and the test node flag C8 to True. In addition, the node output C9 is set to False and the total value C10 of the reinforcement signal is set to 0.

[0212] <Test intermediate AND node initialization processing G8>

The test intermediate AND node initialization process G8 is substantially the same as the test intermediate OR node initialization process G7 described above. That is, process G8 is performed by designating a real link and the network address D3 of that real link, because during testing a test node is always attached to exactly one real link (it is registered in the test node address D4 of that real link).

[0213] For the input side test link address C1, a new test link is initialized and generated using the input side node address D1 of the designated real link, the address of the test intermediate AND node being generated, and the network address D3 of the designated real link: a test inverted link if the designated real link is an inverted link, or a test non-inverted link if it is a non-inverted link (test inverted link initialization process G11 or test non-inverted link initialization process G12 in FIG. 11). The generated link is registered as the first input side test link address C1(1), and the accumulated value D8 of the reinforcement signal of the first input side test link is overwritten with a sufficiently large positive value (for example, 10^300). This prevents the first input side test link from being deleted.

[0214] Further, for the input side test link address C1, one node address is selected at random from the input node addresses B1, the intermediate node addresses B2, and the output node addresses B3 in the network information storage means 61 referenced by the network address D3 of the designated real link. Using this node address, the address of the test intermediate AND node being generated, and the network address D3 of the designated real link, a second input side test link coupled to the randomly selected node is newly generated, randomly chosen to be either a test inverted link or a test non-inverted link (test inverted link initialization process G11 or test non-inverted link initialization process G12 in FIG. 11). The address of the generated second input side test link is registered as the second input side test link address C1(2).

[0215] The output side test link address C2 does not need to be initialized, because the output side test link of the test intermediate AND node neither accumulates the reinforcement signal nor transmits information.

[0216] The network address C3 is overwritten with the network address D3 of the designated real link. Since a test node has no test link to register in the test link address C4, the test link address C4 does not need to be initialized.

[0217] Since the initialization target is a test intermediate AND node, the AND/OR node flag C5 is set to True (meaning an AND node), the input node flag C6 to False, the output node flag C7 to False, and the test node flag C8 to True. Furthermore, the node output C9 is set to True and the total value C10 of the reinforcement signal is set to 0.
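The initial flag values listed in [0198], [0205], [0211] and [0217] can be collected into one helper. This is a sketch with hypothetical names that simply encodes the listed values for the four intermediate node kinds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeFlags:
    is_and: bool          # C5: True = AND node, False = OR node
    is_input: bool        # C6
    is_output: bool       # C7
    is_test: bool         # C8
    output: bool          # C9: initial node output
    total_signal: float   # C10: total reinforcement signal


def initial_flags(is_and: bool, is_test: bool) -> NodeFlags:
    """Initial C5-C10 for an intermediate node (real or test, AND or OR).

    Only C9 varies beyond C5 and C8: it starts True for a test
    intermediate AND node and False for the other three kinds.
    """
    output = is_test and is_and
    return NodeFlags(is_and=is_and, is_input=False, is_output=False,
                     is_test=is_test, output=output, total_signal=0.0)
```

For example, `initial_flags(True, True)` yields the values of [0217], and `initial_flags(False, False)` those of [0198].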

[0218] <Inverted link initialization processing G9>

The inverted link initialization process G9 covers the following two cases. In one, a test inverted link is promoted; in the other, an inverted link is generated directly, without an originating test inverted link. The latter case occurs immediately after the program starts and operation control of the robot 30 begins, when links are generated between the output node 23 and other nodes (with the output node 23 on the output side), and also when a real link connected to the output node 23 has been deleted and a replacement is generated.

[0219] <Inverted link initialization processing G9: initialization using a test inverted link>

When a test inverted link is used, the originating test inverted link and the output side node address D2 are designated for initialization. Since the generated inverted link results from promotion, its output side node is the same node as the output side node of the originating test inverted link.

[0220] For the input side node address D1, register the input side node address D1 of the originating test inverted link. For the output side node address D2, register the designated output side node address. For the network address D3, register the network address D3 of the originating test inverted link.

[0221] For the test node address D4, a new test node is initialized and generated by designating the generated inverted link and the network address D3 (test intermediate OR node initialization process G7 or test intermediate AND node initialization process G8 in FIG. 11): a test intermediate OR node if the AND/OR node flag C5 of the output side node corresponding to the designated output side node address D2 (the output side node of the generated inverted link) is True (meaning an AND node), or a test intermediate AND node if it is False (meaning an OR node). The generated test node is registered in the test node address D4. In other words, the test node attached to the generated inverted link is of the opposite AND/OR type to the output side node of that link.
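The AND/OR rule in [0221] reduces to a single branch. In this hypothetical sketch the return value names the initializer that would be invoked; in the specification G7 and G8 are processes, not functions.

```python
from enum import Enum


class AttachedNodeInit(Enum):
    G7_TEST_INTERMEDIATE_OR = "G7"
    G8_TEST_INTERMEDIATE_AND = "G8"


def choose_test_node_init(output_node_c5: bool) -> AttachedNodeInit:
    """Pick the initializer for the test node attached to a new real link.

    output_node_c5 is the AND/OR flag C5 of the link's output side node
    (True = AND node). The attached test node is of the opposite kind.
    """
    if output_node_c5:
        return AttachedNodeInit.G7_TEST_INTERMEDIATE_OR
    return AttachedNodeInit.G8_TEST_INTERMEDIATE_AND
```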

[0222] The inversion/non-inversion flag D5 is set to True (meaning an inverted link) and the test link flag D6 to False. The link output D7 is set to False, the accumulated value D8 of the reinforcement signal is overwritten with the accumulated value D8 of the reinforcement signal of the designated test inverted link, and the reinforcement signal D9 is set to 0.
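The field updates of [0220] and [0222] (and their non-inverted counterparts in G10) amount to copying a test link with three fields changed. A minimal sketch, assuming the link is held as an immutable record; `LinkFields` and `promote_test_link` are hypothetical names, and D4 is left out because it is created separately afterwards.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LinkFields:
    d1_input_node: int
    d2_output_node: int
    d3_network: int
    d5_inverted: bool
    d6_test: bool
    d7_output: bool
    d8_accumulated: float
    d9_signal: float


def promote_test_link(test_link: LinkFields) -> LinkFields:
    """Sketch of G9/G10 when a test link is promoted to a real link.

    D1, D2, D3 and the inversion flag D5 carry over, and so does the
    accumulated reinforcement D8; the link stops being a test link (D6),
    its output D7 is cleared and the reinforcement signal D9 is reset.
    """
    if not test_link.d6_test:
        raise ValueError("only a test link can be promoted")
    return replace(test_link, d6_test=False, d7_output=False, d9_signal=0.0)
```

`dataclasses.replace` returns a new instance, so the original test link record is untouched until it is explicitly deleted.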

[0223] <Inverted link initialization processing G9: direct initialization without using a test inverted link>

Direct initialization without using a test inverted link is performed by designating the input side node address D1, the output side node address D2, and the network address D3, and the designated addresses are registered in D1 to D3. Since an inverted link (real link) generated by this initialization is used only for the output node 23, its output side node is the output node 23, while its input side node is determined at random.

[0224] D5 to D9 are initialized before the test node address D4. The inversion/non-inversion flag D5 is set to True (meaning an inverted link) and the test link flag D6 to False. The link output D7 is set to False, the accumulated value D8 of the reinforcement signal to 0, and the reinforcement signal D9 to 0.

[0225] For the test node address D4, a new test node is initialized and generated by designating the generated inverted link and the network address D3 (test intermediate OR node initialization process G7 or test intermediate AND node initialization process G8 in FIG. 11): a test intermediate OR node if the AND/OR node flag C5 of the output side node corresponding to the designated output side node address D2 (the output side node of the generated inverted link) is True (meaning an AND node), or a test intermediate AND node if it is False (meaning an OR node). The generated test node is registered in the test node address D4. In other words, the test node attached to the generated inverted link is of the opposite AND/OR type to the output side node of that link. The test node address D4 is initialized last because the inversion/non-inversion flag D5 of the associated inverted link is referred to when the test node is initialized.

[0226] Finally, in both the initialization using a test inverted link and the direct initialization without one, the address of the generated inverted link is registered in the output side link address C2 of the input side node corresponding to the input side node address D1 and in the input side link address C1 of the output side node corresponding to the output side node address D2, which completes the initialization.
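The direct initialization path of [0223] to [0226] can be sketched end to end: random input side node, D5 to D9 before D4, then registration on both end nodes. Object references stand in for addresses, and `make_test_node` is a hypothetical placeholder for G7/G8 that only applies the opposite AND/OR rule; the `inverted` argument selects G9 (True) or G10 (False).

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison: nodes and links reference each other
class Node:
    is_and: bool                                            # C5
    in_links: List["Link"] = field(default_factory=list)    # C1
    out_links: List["Link"] = field(default_factory=list)   # C2


@dataclass(eq=False)
class Link:
    src: Node                          # D1
    dst: Node                          # D2
    network: int                       # D3
    inverted: bool = False             # D5
    is_test: bool = False              # D6
    output: bool = False               # D7
    accumulated: float = 0.0           # D8
    signal: float = 0.0                # D9
    test_node: Optional[Node] = None   # D4


def make_test_node(link: Link) -> Node:
    """Placeholder for G7/G8: opposite AND/OR type of the link's output node.
    A full G7/G8 would also read link.inverted (D5) to build C1(1)."""
    return Node(is_and=not link.dst.is_and)


def init_real_link_directly(candidates: List[Node], output_node: Node,
                            network: int, inverted: bool,
                            rng: random.Random) -> Link:
    src = rng.choice(candidates)               # input side node chosen at random
    link = Link(src=src, dst=output_node, network=network)
    # D5-D9 first, because building the test node reads D5.
    link.inverted = inverted
    link.is_test = False
    link.output = False
    link.accumulated = 0.0
    link.signal = 0.0
    link.test_node = make_test_node(link)      # D4 last
    src.out_links.append(link)                 # C2 of the D1 node
    output_node.in_links.append(link)          # C1 of the D2 node
    return link
```

`eq=False` is used so that membership tests compare identity; the default dataclass equality would recurse through the node/link cycle.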

[0227] <Non-inverted link initialization processing G10>

The non-inverted link initialization process G10 is substantially the same as the inverted link initialization process G9 described above. That is, process G10 also covers two cases. In one, a test non-inverted link is promoted; in the other, a non-inverted link is generated directly, without an originating test non-inverted link. The latter case occurs immediately after the program starts and operation control of the robot 30 begins, when links are generated between the output node 23 and other nodes (with the output node 23 on the output side), and also when a real link connected to the output node 23 has been deleted and a replacement is generated.

[0228] <Non-inverted link initialization processing G10: Initialization processing using test non-inverted link>

When a test non-inverted link is used, the originating test non-inverted link and the output side node address D2 are designated for initialization. Since the generated non-inverted link results from promotion, its output side node is the same node as the output side node of the originating test non-inverted link.

[0229] For the input side node address D1, register the input side node address D1 of the originating test non-inverted link. For the output side node address D2, register the designated output side node address. For the network address D3, register the network address D3 of the originating test non-inverted link.

[0230] For the test node address D4, a new test node is initialized and generated by designating the generated non-inverted link and the network address D3 (test intermediate OR node initialization process G7 or test intermediate AND node initialization process G8 in FIG. 11): a test intermediate OR node if the AND/OR node flag C5 of the output side node corresponding to the designated output side node address D2 (the output side node of the generated non-inverted link) is True (meaning an AND node), or a test intermediate AND node if it is False (meaning an OR node). The generated test node is registered in the test node address D4. In other words, the test node attached to the generated non-inverted link is of the opposite AND/OR type to the output side node of that link.

[0231] The inversion/non-inversion flag D5 is set to False (meaning a non-inverted link) and the test link flag D6 to False. The link output D7 is set to False, the accumulated value D8 of the reinforcement signal is overwritten with the accumulated value D8 of the reinforcement signal of the designated test non-inverted link, and the reinforcement signal D9 is set to 0.

[0232] <Non-inverted link initialization processing G10: direct initialization without using a test non-inverted link>

Direct initialization without using a test non-inverted link is performed by designating the input side node address D1, the output side node address D2, and the network address D3, and the designated addresses are registered in D1 to D3. Since a non-inverted link (real link) generated by this initialization is used only for the output node 23, its output side node is the output node 23, while its input side node is determined at random.

[0233] D5 to D9 are initialized before the test node address D4. The inversion/non-inversion flag D5 is set to False (meaning a non-inverted link) and the test link flag D6 to False. The link output D7 is set to False, the accumulated value D8 of the reinforcement signal to 0, and the reinforcement signal D9 to 0.

[0234] For the test node address D4, a new test node is initialized and generated by designating the generated non-inverted link and the network address D3 (test intermediate OR node initialization process G7 or test intermediate AND node initialization process G8 in FIG. 11): a test intermediate OR node if the AND/OR node flag C5 of the output side node corresponding to the designated output side node address D2 (the output side node of the generated non-inverted link) is True (meaning an AND node), or a test intermediate AND node if it is False (meaning an OR node). The generated test node is registered in the test node address D4. In other words, the test node attached to the generated non-inverted link is of the opposite AND/OR type to the output side node of that link. The test node address D4 is initialized last because the inversion/non-inversion flag D5 of the associated non-inverted link is referred to when the test node is initialized.

[0235] Finally, in both the initialization using a test non-inverted link and the direct initialization without one, the address of the generated non-inverted link is registered in the output side link address C2 of the input side node corresponding to the input side node address D1 and in the input side link address C1 of the output side node corresponding to the output side node address D2, which completes the initialization.

[0236] <Test inverted link initialization processing G11>

The test inverted link initialization process G11 is performed by designating the input side node address D1, the output side node address D2, and the network address D3. The designated input side node address is registered in D1, the designated output side node address in D2, and the designated network address in D3.

[0237] Since no test node is attached to a test link, the test node address D4 does not need to be initialized. The inversion/non-inversion flag D5 is set to True (meaning an inverted link) and the test link flag D6 to True. The link output D7 is set to False, the accumulated value D8 of the reinforcement signal to 0, and the reinforcement signal D9 to 0.

[0238] Finally, the address of the generated test inversion link is registered in the output side link address C2 of the input side node corresponding to the input side node address D1, and the initialization is completed.
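A minimal sketch of G11 (and, with `inverted=False`, G12). Unlike a real link, a test link is registered only in the output side link address C2 of its input side node; the output side node's C1 is left alone, and the caller stores the link in C4 of a real node or in C1(1)/C1(2) of a test node. Object references stand in for addresses, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Node:
    in_links: List["Link"] = field(default_factory=list)    # C1
    out_links: List["Link"] = field(default_factory=list)   # C2


@dataclass(eq=False)
class Link:
    src: Node                  # D1
    dst: Node                  # D2
    network: int               # D3
    inverted: bool             # D5
    is_test: bool = True       # D6
    output: bool = False       # D7
    accumulated: float = 0.0   # D8
    signal: float = 0.0        # D9
    # No D4 field: a test link never carries a test node.


def init_test_link(src: Node, dst: Node, network: int, inverted: bool) -> Link:
    link = Link(src=src, dst=dst, network=network, inverted=inverted)
    # Only the input side node records the test link (its C2).
    src.out_links.append(link)
    return link
```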

[0239] <Test non-inverted link initialization processing G12>

The test non-inverted link initialization process G12 is substantially the same as the test inverted link initialization process G11 described above. That is, process G12 is performed by designating the input side node address D1, the output side node address D2, and the network address D3. The designated input side node address is registered in D1, the designated output side node address in D2, and the designated network address in D3.

[0240] Since no test node is attached to a test link, the test node address D4 does not need to be initialized. The inversion/non-inversion flag D5 is set to False (meaning a non-inverted link) and the test link flag D6 to True. The link output D7 is set to False, the accumulated value D8 of the reinforcement signal to 0, and the reinforcement signal D9 to 0.

[0241] Finally, the address of the generated test non-inversion link is registered in the output side link address C2 of the input side node corresponding to the input side node address D1, and the initialization is completed.

[0242] FIG. 12 shows the configuration of the deletion processes performed during learning. The termination processes corresponding to the robot initialization process G1, network initialization process G2, input node initialization process G3, and output node initialization process G4 in FIG. 11 are performed only immediately before the program ends; since they are not directly related to structural changes of the network 20, they are not described here. The termination processes for the other nodes and links are performed each time a node or link is deleted, so they are described below as the deletion processes E1 to E8 during learning. The deletion method differs according to the type of node and the type of link. In FIG. 12, deleting the element at the root of an arrow requires deleting the element at its tip; a solid arrow means that the process at the tip must be used, and a dotted arrow means that it may be used.

[0243] <Intermediate OR node deletion processing E1>

In the intermediate OR node deletion process E1, the memory for the test link corresponding to the test link address C4 of the intermediate OR node to be deleted is released first. That is, the information on that test link in the link information storage means 63 is released by the test inverted link deletion process E7 or the test non-inverted link deletion process E8 described later, and the test link is deleted. Next, with reference to the network address C3 of the intermediate OR node to be deleted, the address of that node is searched for in the intermediate node addresses B2 of the network information storage means 61 and removed. Then, depending on the conditions, one of the following three processes (1), (2), and (3) is performed.

[0244] (1) As shown in FIG. 16, when the intermediate OR node 220 to be deleted has exactly one input side link corresponding to its input side link address C1 (referred to as the input side link 221), and the input side node 222 corresponding to the input side node address D1 of the input side link 221 is not the intermediate OR node 220 itself, one of the following three processes (1-A), (1-B), and (1-C) is performed for each output side link corresponding to the output side link addresses C2 of the intermediate OR node 220 (in FIG. 16, three output side links 223, 224, and 225 are shown as an example).

[0245] (1-A) When the test link flag D6 of the output side link 223 is True (meaning a test link) and the test node flag C8 of the output side node 226 corresponding to the output side node address D2 of the output side link 223 is False (meaning a real node), the output side link 223, that is, the test link coupled to the output side node 226 (the test link corresponding to the test link address C4 of the output side node 226), is deleted (test inverted link deletion process E7 or test non-inverted link deletion process E8 in FIG. 12, described later). A test link 241 coupled to a randomly selected node 240 is then generated at random (test inverted link initialization process G11 or test non-inverted link initialization process G12 in FIGS. 11 and 12), and the address of the test link 241 is registered in the test link address C4 of the output side node 226.

[0246] (1-B) When the test link flag D6 of the output side link 224 is True (meaning a test link) and the test node flag C8 of the output side node 227 corresponding to the output side node address D2 of the output side link 224 is True (meaning a test node), the output side link 224 is deleted.

[0247] (1-C) When the test link flag D6 of the output side link 225 is False (meaning a real link), the input side node address D1 of the output side link 225 is overwritten with the input side node address D1 of the input side link 221 (the only input side link) corresponding to the input side link address C1 of the intermediate OR node 220 to be deleted. That is, the output side node 228 of the output side link 225 and the input side node 222 of the input side link 221 become connected by the reconfigured output side link 225. The test node attached to the output side link 225 before the change (the test node corresponding to its test node address D4) is deleted (test intermediate OR node deletion process E3 or test intermediate AND node deletion process E4 in FIG. 12, described later), a new test node 229 is generated (test intermediate OR node initialization process G7 or test intermediate AND node initialization process G8 in FIG. 11, described above), and the address of the generated test node 229 is registered in the test node address D4 of the reconfigured output side link 225. After that, the output side link address C2 of the intermediate OR node 220 to be deleted (the address of the output side link 225) is added to the output side link address C2 of the input side node 222 of the input side link 221, and the input side link 221 is deleted (inverted link deletion process E5 or non-inverted link deletion process E6 in FIG. 12, described later).

[0248] (2) As shown in FIG. 17, when the intermediate OR node 260 to be deleted has exactly one input side link corresponding to its input side link address C1 (the input side link 265), and the input side node corresponding to the input side node address D1 of this input side link 265 is the intermediate OR node 260 itself, one of the following two processes (2-A) and (2-B) is performed for each output side link corresponding to the output side link addresses C2 of the intermediate OR node 260 (in FIG. 17, two output side links 261 and 262 are shown as an example).

[0249] (2-A) When the test link flag D6 of the output side link 261 is True (meaning a test link) and the test node flag C8 of the output side node 263 corresponding to the output side node address D2 of the output side link 261 is False (meaning a real node), the output side link 261, that is, the test link coupled to the output side node 263 (the test link corresponding to the test link address C4 of the output side node 263), is deleted (test inverted link deletion process E7 or test non-inverted link deletion process E8 in FIG. 12, described later). A test link 281 coupled to a randomly selected node 280 is then generated at random (test inverted link initialization process G11 or test non-inverted link initialization process G12 in FIGS. 11 and 12, described above), and the address of the test link 281 is registered in the test link address C4 of the output side node 263.

[0250] (2-B) When the conditions on the output side link 262 and its output side node 264 are other than those in (2-A) above, the output side link 262 is deleted.

[0251] (3) As shown in FIG. 18, when the intermediate OR node 300 to be deleted has no input side link corresponding to its input side link address C1, one of the following two processes (3-A) and (3-B) is performed for each output side link corresponding to the output side link addresses C2 of the intermediate OR node 300 (in FIG. 18, two output side links 301 and 302 are shown as an example).

[0252] (3-A) If the test link flag D6 of output side link 301 is True (meaning a test link) and the test node flag C8 of the output side node 303 corresponding to the output side node address D2 of output side link 301 is False (meaning a real node), then output side link 301, that is, the test link coupled to output side node 303 (the test link corresponding to the test link address C4 of output side node 303), is deleted (by performing the test inverted link deletion processing E7 or the test non-inverted link deletion processing E8 of FIG. 12, described later). A test link 321 coupled to a randomly selected node 320 is then generated (by performing the test inverted link initialization processing G11 or the test non-inverted link initialization processing G12 of FIGS. 11 and 12, described above), and the address of test link 321 is registered in the test link address C4 of output side node 303.

(3-B) If output side link 302 and its output side node 304 do not satisfy the conditions of (3-A) above, output side link 302 is deleted.

[0254] After processes (1) to (3) above are completed, any remaining input side link of the intermediate OR node to be deleted is deleted (by performing the inverted link deletion processing E5 or the non-inverted link deletion processing E6 of FIG. 12, described later). The memory holding C1 to C10 of the intermediate OR node to be deleted is then released, and the intermediate OR node is deleted.
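The per-output-link handling shared by cases (2) and (3), followed by the cleanup of [0254], can be sketched in Python as below. This is a minimal sketch under stated assumptions: the classes `Node` and `Link`, the helpers `detach` and `delete_intermediate_node`, and the use of Python lists for the address fields C1, C2 and C4 are all hypothetical, case (1) is not modeled, and the bookkeeping of E5 to E8 is collapsed into the single helper `detach`.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False, repr=False)
class Node:
    """Hypothetical node record mirroring fields C1, C2, C4 and C8."""
    name: str
    is_test: bool = False                                # test node flag C8
    inputs: List["Link"] = field(default_factory=list)   # input side link addresses C1
    outputs: List["Link"] = field(default_factory=list)  # output side link addresses C2
    test_link: Optional["Link"] = None                   # test link address C4


@dataclass(eq=False, repr=False)
class Link:
    """Hypothetical link record mirroring fields D1, D2 and D6."""
    src: Node              # input side node address D1
    dst: Node              # output side node address D2
    is_test: bool = False  # test link flag D6


def detach(link: Link) -> None:
    """Unregister a link from both end nodes (simplified stand-in for E5 to E8)."""
    if link in link.src.outputs:
        link.src.outputs.remove(link)
    if link.dst.test_link is link:
        link.dst.test_link = None       # clear a real node's test link slot C4
    elif link in link.dst.inputs:
        link.dst.inputs.remove(link)


def delete_intermediate_node(node: Node, network: List[Node], rng: random.Random) -> None:
    """Delete an intermediate node following cases (2)/(3) and [0254]."""
    for out in list(node.outputs):
        target = out.dst
        detach(out)
        if out.is_test and not target.is_test and target is not node:
            # (2-A)/(3-A): give the real node a fresh test link from a random node.
            candidates = [n for n in network if n is not node and n is not target]
            source = rng.choice(candidates)
            fresh = Link(src=source, dst=target, is_test=True)
            source.outputs.append(fresh)
            target.test_link = fresh
        # (2-B)/(3-B): any other output side link is simply deleted by detach().
    for inp in list(node.inputs):
        detach(inp)                     # [0254]: delete remaining input side links
    network.remove(node)                # [0254]: finally delete the node itself
```

Using `eq=False` makes list membership and removal compare records by identity, which matches the address-based bookkeeping of the original and avoids recursive comparisons through the node/link cycle.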

[0255] <Intermediate AND node deletion processing E2>

Intermediate AND node deletion processing E2 is almost the same as the intermediate OR node deletion processing E1 described above; the description of E1 applies with "intermediate OR node" read as "intermediate AND node", so its description is omitted.

[0256] <Test intermediate OR node deletion processing E3>

The first and second input side test links corresponding to the input side test link addresses C1 of the test intermediate OR node to be deleted are deleted (by performing the test inverted link deletion processing E7 or the test non-inverted link deletion processing E8 of FIG. 12, described later). The memory holding C1 to C10 of the test intermediate OR node to be deleted is then released, and the test intermediate OR node is deleted.

[0257] <Test intermediate AND node deletion processing E4>

Test intermediate AND node deletion processing E4 is almost the same as the test intermediate OR node deletion processing E3 described above, with "test intermediate OR node" read as "test intermediate AND node", so its description is omitted.

[0258] <Inverted link deletion processing E5>

Inverted link deletion processing E5 searches the output side link addresses C2 of the input side node corresponding to the input side node address D1 of the inverted link to be deleted for that link's address and deletes it. It then searches the input side link addresses C1 of the output side node corresponding to the output side node address D2 of the inverted link to be deleted for that link's address and deletes it.

[0259] The test node associated with the inverted link to be deleted (the test node corresponding to the test node address D4 of that link) is also deleted. If the AND/OR node flag C5 of this test node is True (meaning an AND node), the test intermediate AND node deletion processing E4 described above is performed; if it is False (meaning an OR node), the test intermediate OR node deletion processing E3 described above is performed.

[0260] Finally, the memory holding D1 to D9 of the inverted link to be deleted is released, which completes the deletion of the inverted link.
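A minimal sketch of inverted/non-inverted link deletion (E5/E6), including the deletion of the associated test node via E3 or E4, is given below. The record layout and function names are hypothetical; following [0256], only the test node's input side test links are removed, and releasing memory (D1 to D9, C1 to C10) is left to Python's garbage collector. The returned log simply records which procedures ran.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False, repr=False)
class Node:
    """Hypothetical node record: AND/OR flag C5 and link lists C1, C2."""
    is_and: bool = False                                  # AND/OR node flag C5
    inputs: List["Link"] = field(default_factory=list)    # input side link addresses C1
    outputs: List["Link"] = field(default_factory=list)   # output side link addresses C2


@dataclass(eq=False, repr=False)
class Link:
    """Hypothetical link record: end nodes D1, D2 and associated test node D4."""
    src: Node                          # input side node address D1
    dst: Node                          # output side node address D2
    inverted: bool = False             # inverted (E5) or non-inverted (E6) link
    test_node: Optional[Node] = None   # associated test node address D4


def delete_test_intermediate_node(test_node: Node, log: List[str]) -> None:
    """E3/E4: delete the test node's first and second input side test links."""
    for t in list(test_node.inputs):
        t.src.outputs.remove(t)        # unregister from the source node's C2
        test_node.inputs.remove(t)
    log.append("E4" if test_node.is_and else "E3")


def delete_link(link: Link, log: List[str]) -> None:
    """E5/E6: unregister the link from both end nodes and drop its test node."""
    link.src.outputs.remove(link)      # search and delete in C2 of the input side node
    link.dst.inputs.remove(link)       # search and delete in C1 of the output side node
    if link.test_node is not None:
        delete_test_intermediate_node(link.test_node, log)
        link.test_node = None
    log.append("E5" if link.inverted else "E6")
```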

[0261] <Non-inverted link deletion processing E6>

Since the non-inverted link deletion process E6 is the same as the above-described inverted link deletion process E5, the description is omitted.

[0262] <Test inverted link deletion processing E7>

Test inverted link deletion processing E7 searches the output side link addresses C2 of the input side node corresponding to the input side node address D1 of the test inverted link to be deleted for that link's address and deletes it.

[0263] If the test node flag C8 of the output side node corresponding to the output side node address D2 of the test inverted link to be deleted is True (meaning a test node), the address of the test inverted link is searched for and deleted from the input side link addresses C1 of that output side node (test node). If the flag is False (meaning a real node), the test link address C4 of that output side node (real node) is deleted.

[0264] Finally, the memory holding D1 to D9 of the test inverted link to be deleted is released, which completes the deletion of the test inverted link.

[0265] <Test non-inverted link deletion processing E8>

Test non-inverted link deletion processing E8 is the same as the test inverted link deletion processing E7 described above, so its description is omitted.

[0266] The present embodiment provides the following effects. Because the information processing system 10 includes the reinforcement signal generation means 43, it can generate a reinforcement signal to be given to the network 20 according to the result of evaluating the state of the robot 30, which is the control target.

[0267] Because the information processing system 10 also includes the learning means 51, the reinforcement signal generated by the reinforcement signal generation means 43 can be propagated from one constituent element of the network 20 to other constituent elements. For each constituent element, the learning means 51 generates the reinforcement signal to be propagated, that is, the reinforcement signal to be given to the propagation destination constituent element, according to the input/output states of the propagation source and/or propagation destination constituent elements. The cumulative value of the reinforcement signal given to each constituent element can therefore be used to decide, element by element, whether to generate (add) or delete (cull) it, so that the structure of the network 20 changes autonomously.

[0268] Therefore, unlike the learning device based on neurogenetic learning described above, the information processing system 10 does not take the entire network 20 as the unit of evaluation when changing the structure of the network 20. Evaluation is performed per constituent element (that is, per node or per link), and generation or deletion is likewise performed per constituent element. This shortens the time required for evaluation, allows the network 20 to be built autonomously with a low order of time complexity, and reduces the calculation cost accordingly.

[0269] In addition, in the techniques of Patent Documents 2 and 3 described above, the structure of the network is determined according to its use environment and task, and only the coupling coefficients between neuron units are optimized within that fixed structure. The information processing system 10 instead autonomously changes and optimizes the structure of the network 20 itself, and thus avoids the limitations imposed by a predetermined structure. Consequently, even if the use environment or task of the network 20 changes, the system can learn in a way that reuses previous learning results as existing knowledge.

[0270] Further, the information processing system 10 includes the state evaluation signal acquisition means 42 and is configured to evaluate the state of the robot 30 to be controlled based on the state evaluation signal acquired by that means, so the state of the robot 30 can be evaluated without human judgment. This increases the speed at which the network 20 is built autonomously and makes it easy to carry out learning that is consistent with the intended purpose.

[0271] The learning means 51 generates the reinforcement signal to be given to each propagation destination input side link based on the reinforcement signal given to the propagation source node, according to that link's contribution to the node output, which is determined by the input/output state of the propagation source node (see FIGS. 8 and 9). The reinforcement signal given to the network 20 can therefore be propagated back from the output nodes 23, each link can be evaluated individually and reasonably, and appropriate generation or deletion can be realized for each constituent element.
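The exact distribution rule is given in FIGS. 8 and 9, which are not reproduced here. As a minimal sketch under an assumed rule, the Python function below splits a node's reinforcement signal equally among the input side links whose value equals the node output: for an OR node with output 1 these are the links carrying 1, for an AND node with output 0 the links carrying 0, and when all inputs agree every link shares the signal. The function name `distribute` and the equal-split rule are hypothetical.

```python
from typing import List


def distribute(signal: float, is_and: bool, link_values: List[int]) -> List[float]:
    """Split a node's reinforcement signal among its input side links.

    link_values are the bits each input side link delivers to the node,
    i.e. after any inversion on an inverted link. Assumed rule: a link
    contributes when its value equals the node output, and contributing
    links share the signal equally; the other links receive nothing.
    """
    if not link_values:
        return []
    output = int(all(link_values)) if is_and else int(any(link_values))
    # At least one link always matches the output, so this is never zero.
    contributors = sum(1 for v in link_values if v == output)
    share = signal / contributors
    return [share if v == output else 0.0 for v in link_values]
```

For example, `distribute(1.0, False, [1, 0, 1])` rewards only the two links that turned the OR node on, giving `[0.5, 0.0, 0.5]`; the share a link receives could then also be passed on to its input side node, as [0272] describes.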

[0272] In addition to back-propagating the reinforcement signal from a node to its input side links as described above, the learning means 51 also back-propagates the reinforcement signal from the node to the input side nodes of those input side links, which makes the back propagation of the reinforcement signal smoother.

[0273] Furthermore, since the learning means 51 is configured to delete a link when the cumulative value of the reinforcement signal given to it falls below a threshold, links considered not useful for controlling the robot 30 as intended, that is, unnecessary links, can be appropriately culled, and the structure of the network 20 can be changed autonomously.

[0274] The learning means 51 is also configured to delete a node when the number of its input side links becomes 1 or less, so nodes considered not useful for control, that is, unnecessary nodes, can be appropriately culled, and the structure of the network 20 can be changed autonomously.
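The two culling rules of [0273] and [0274] can be sketched together as below. This is a minimal sketch under stated assumptions: links and nodes are plain string ids, links are culled before nodes are checked, the hypothetical `protected` set stands in for nodes (such as output nodes) that should not be removed, and the reconnection work of deletion processing E1 to E4 is not modeled.

```python
from typing import Dict, FrozenSet, List, Tuple


def prune(
    inputs: Dict[str, List[str]],
    accumulated: Dict[str, float],
    threshold: float,
    protected: FrozenSet[str] = frozenset(),
) -> Tuple[Dict[str, List[str]], List[str], List[str]]:
    """Cull weak links, then nodes left with 1 or fewer input side links.

    inputs maps each node id to the ids of its input side links;
    accumulated maps each link id to its cumulative reinforcement signal.
    Returns (surviving node -> links, removed link ids, removed node ids).
    """
    removed_links: List[str] = []
    kept: Dict[str, List[str]] = {}
    for node, links in inputs.items():
        kept[node] = [lk for lk in links if accumulated[lk] >= threshold]
        removed_links += [lk for lk in links if accumulated[lk] < threshold]
    removed_nodes = [n for n, links in kept.items()
                     if len(links) <= 1 and n not in protected]
    for n in removed_nodes:
        del kept[n]
    return kept, removed_links, removed_nodes
```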

[0275] In the information processing system 10, each node is provided with a test link. When the test link is considered useful for controlling the robot 30 as intended, it can be promoted to a real link that contributes to the node output and formally registered as an input side link. Autonomous link generation is thus realized, and the structure of the network 20 can be changed autonomously.

[0276] Furthermore, the learning means 51 is configured to delete a test link when the cumulative value of the reinforcement signal given to it falls below a threshold and to generate a new test link coupled to an arbitrary node, so a suitable candidate for a newly generated link (real link) is always available. Links can therefore be generated appropriately and smoothly, and the structure of the network 20 can be changed autonomously.
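The test link lifecycle of [0275] and [0276] reduces to a three-way decision on the test link's cumulative reinforcement signal. In the minimal sketch below, the function name, the separate thresholds `promote_at` and `drop_at`, and the string action labels are hypothetical; the text above only states that a threshold is exceeded or fallen below.

```python
import random
from typing import Sequence, Tuple


def update_test_link(
    accumulated: float,
    source: str,
    candidates: Sequence[str],
    promote_at: float,
    drop_at: float,
    rng: random.Random,
) -> Tuple[str, str]:
    """Decide what happens to a node's test link after learning.

    Returns (action, source node of the resulting link):
      "promote"    -> register the test link as a real input side link
      "regenerate" -> delete it and start a new test link from a random node
      "keep"       -> leave the test link as it is
    """
    if accumulated > promote_at:
        return "promote", source
    if accumulated < drop_at:
        return "regenerate", rng.choice(list(candidates))
    return "keep", source
```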

[0277] In the information processing system 10, each real link is provided with an associated test node, so a candidate for a newly generated node (real node) is always available. Autonomous node generation is thus realized, and the structure of the network 20 can be changed autonomously.

[0278] In addition, since the learning means 51 is configured to propagate the reinforcement signal from the test node to its first and second input side test links, candidates for newly generated links (real links) can be prepared, and the structure of the network 20 can be changed autonomously.

[0279] Further, when the cumulative value of the reinforcement signal given to the first or second input side test link falls below a threshold, the learning means 51 deletes that input side test link and generates a new input side test link, so a suitable candidate for a newly generated link (real link) is always available. Links can therefore be generated appropriately and smoothly, and the structure of the network 20 can be changed autonomously.

[0280] The learning means 51 is also configured to put the test node into practical use, promoting it to a real node, when the cumulative values of the reinforcement signals given to the first and second input side test links exceed a threshold, so a new node (real node) can be generated (added) and the structure of the network 20 can be changed autonomously.
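The test node rules of [0279] and [0280] can be sketched as a decision on the cumulative values of its two input side test links. This is a minimal sketch under assumptions: the function name, the two thresholds, and the convention of returning link indices (1 = first, 2 = second) are hypothetical, and the actual promotion (registering the node and links in the network structure storage means) is not modeled.

```python
from typing import List, Tuple


def update_test_node(
    acc_first: float,
    acc_second: float,
    promote_at: float,
    drop_at: float,
) -> Tuple[bool, List[int]]:
    """Decide the fate of a test node from its two input side test links.

    Returns (promote, relink):
      promote -> True when both cumulative values exceed promote_at, i.e. the
                 test node is promoted to a real node.
      relink  -> which input side test links (1 = first, 2 = second) fell
                 below drop_at and should be replaced by new test links
                 coupled to arbitrary nodes.
    """
    if acc_first > promote_at and acc_second > promote_at:
        return True, []
    relink = [i for i, acc in ((1, acc_first), (2, acc_second)) if acc < drop_at]
    return False, relink
```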

[0281] In addition, since each node in the information processing system 10 is configured using a logic circuit, an information processing system capable of realizing the desired control can be constructed with a simple structure.
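As a minimal sketch of such logic-circuit nodes, assume that an AND node outputs the conjunction and an OR node the disjunction of the bits arriving over its input side links, and that an inverted link passes the negation of its source bit. The names `node_output` and `xor2` are hypothetical, and the XOR wiring shown is only one possible structure, not the one reported in FIG. 20.

```python
from typing import Sequence, Tuple

# Each input is (output bit of the input side node, whether the link is inverted).
Inputs = Sequence[Tuple[int, bool]]


def node_output(is_and: bool, inputs: Inputs) -> int:
    """Output of an AND or OR node fed through inverted/non-inverted links."""
    bits = [1 - bit if inverted else bit for bit, inverted in inputs]
    return int(all(bits)) if is_and else int(any(bits))


def xor2(a: int, b: int) -> int:
    """2-bit XOR as OR(AND(a, NOT b), AND(NOT a, b)) built from such nodes."""
    left = node_output(True, [(a, False), (b, True)])
    right = node_output(True, [(a, True), (b, False)])
    return node_output(False, [(left, False), (right, False)])
```

Only two node types and a per-link inversion flag are needed to express any Boolean function, which is one way to read the "simple structure" claimed above.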

[0282] In order to confirm the effect of the present invention, the following experiment was conducted.

[0283] Ten small circuits of about 2 or 3 bits were prepared as circuits that perform target I/O operations. Generation experiments were performed for all 10 circuits in a setting without history, and generation experiments with a history of about one step were performed for some of the 10 circuits.

[0284] In the initial state, only one OR node was provided in the output layer of the network. Random inputs were applied, and circuits were generated by giving a reward as the reinforcement signal when the target I/O operation was performed correctly and a punishment as the reinforcement signal when it was not.

[0285] FIG. 19 shows the results of this experiment for a 3-bit XOR circuit as the target circuit. It plots the correct answer rate as a 100-step moving average, that is, the fraction of the last 100 steps (100 outputs) in which the correct answer was output.
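The evaluation loop of [0284] and [0285] can be sketched as follows: the target is a 3-bit XOR, each step yields a reward or punishment, and the correct answer rate is a moving average over the last 100 steps. The reward and punishment values of +1 and -1 and all function and class names are assumptions for illustration; the experiment's actual signal magnitudes are not given here.

```python
from collections import deque
from typing import Deque, Sequence


def xor3(bits: Sequence[int]) -> int:
    """Target I/O operation: 3-bit XOR (parity of the three inputs)."""
    return bits[0] ^ bits[1] ^ bits[2]


def reinforcement(output: int, target: int, reward: float = 1.0,
                  punishment: float = -1.0) -> float:
    """Reward on a correct output, punishment otherwise (values assumed)."""
    return reward if output == target else punishment


class CorrectRate:
    """Correct answer rate over the last `window` steps (moving average)."""

    def __init__(self, window: int = 100) -> None:
        self.hits: Deque[int] = deque(maxlen=window)

    def record(self, correct: bool) -> float:
        self.hits.append(1 if correct else 0)
        return sum(self.hits) / len(self.hits)
```

Because the deque has a fixed `maxlen`, old steps drop out automatically; before 100 steps have been recorded, the rate is taken over the steps seen so far.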

[0286] The experiment was repeated 10 times for each target circuit, and every target circuit was constructed correctly. It was also confirmed that the correct circuit was constructed even when noise was added to the correctness judgment, and that a circuit, once it reached the correct answer, remained structurally stable.

[0287] In another experiment, a 2-bit XOR circuit was first learned as in the experiment above, and the problem was then changed so that a 3-bit XOR circuit was learned. Since the two learning tasks are related, this experiment examined whether the earlier learning was reused, and it was confirmed that a new structure making use of the previous learning results was obtained.

[0288] FIG. 20 shows the results of this experiment. In the structure of the 3-bit XOR circuit generated in the experiment, the parts that reuse the 2-bit XOR circuit structure given as prior knowledge at the start of the experiment are shown in bold lines. Whether a structure is actually reused can be checked by tracing the history of structure generation. In FIG. 20, the portion of the 3-bit XOR circuit structure at node A differs from the 2-bit XOR circuit structure; this is because a link in that portion was replaced by node A, so structural reuse is achieved there as well.

[0289] Furthermore, in addition to the two experiments above, an experiment using the Khepera robot simulator showed that the system handles about 10,000 nodes with real-time learning at 64 ms per step, and that, having a backup function, it can be applied to a maze problem with delayed reward. These results clearly demonstrate the effect of the present invention.

[0290] It should be noted that the present invention is not limited to the above-described embodiments, but includes modifications and the like within a range in which the object of the present invention can be achieved.

That is, in the above embodiment, the control target is not limited to the robot 30 and may be, for example, a game character. In a fighting game, for example, the relative position of the opponent's character and the type of move the opponent's character performs may be input to the network, and the action of one's own character, that is, the type of move it performs and its direction of movement, may be determined by the network output.

[0292] Further, in the above embodiment the reinforcement signal is distributed (propagated) as shown in FIGS. 8 and 9, but the distribution method is not limited to this. It is sufficient that, based on the reinforcement signal given to a constituent element, a reinforcement signal to be given to the propagation destination constituent element is generated according to the input/output states of the propagation source and/or propagation destination constituent elements, so that the reinforcement signal is propagated from one constituent element to the others.

[0293] Furthermore, the network 20 used in the information processing system 10 is realized mainly by software in the above embodiment, but it is not limited to this and may be realized at least in part by hardware circuits.

[0294] In the above embodiment, each node is configured as a logic circuit using an AND circuit or an OR circuit, but other logic circuits may be used.

Industrial applicability

[0295] As described above, the information processing system, information processing method, and program of the present invention can be used for I/O learning in general and are suitable for, for example, robot motion control, motion control of game characters on a display screen, and air conditioning management.

Brief Description of Drawings

FIG. 1 is an overall configuration diagram of an information processing system according to an embodiment of the present invention.

FIG. 2 is a diagram showing a data structure used in processing by the information processing system of the embodiment.

FIG. 3 is a flowchart showing an overall flow of robot operation control by the information processing system of the embodiment.

FIG. 4 is a flowchart showing the flow of network processing by the information processing system of the embodiment.

FIG. 5 is a flowchart showing a learning process flow of an intermediate OR node (real node) by the information processing system of the embodiment.

FIG. 6 is a flowchart showing a non-inverted link learning process performed by the information processing system according to the embodiment.

FIG. 7 is an explanatory diagram of intermediate OR node learning processing by the information processing system of the embodiment.

FIG. 8 is a diagram showing an example of distribution of reinforcement signals when learning an intermediate OR node by the information processing system of the embodiment.

FIG. 9 is a diagram showing an example of distribution of reinforcement signals when learning an intermediate AND node by the information processing system of the embodiment.

FIG. 10 is an explanatory diagram of learning processing of a non-inverted link (real link) by the information processing system of the embodiment.

FIG. 11 is an explanatory diagram of a configuration of initialization by the information processing system of the embodiment.

FIG. 12 is an explanatory diagram of the configuration of deletion processing during learning by the information processing system of the embodiment.

FIG. 13 is an explanatory diagram of output node initialization processing by the information processing system of the embodiment.

FIG. 14 is an explanatory diagram of intermediate OR node initialization processing by the information processing system of the embodiment.

FIG. 15 is an explanatory diagram of test intermediate OR node initialization processing by the information processing system of the embodiment.

FIG. 16 is an explanatory diagram of intermediate OR node deletion processing by the information processing system of the embodiment.

FIG. 17 is another explanatory diagram of intermediate OR node deletion processing by the information processing system of the embodiment.

FIG. 18 is still another explanatory diagram of intermediate OR node deletion processing by the information processing system of the embodiment.

FIG. 19 is a diagram showing the results of an effect confirmation experiment of the present invention.

FIG. 20 is a diagram showing the results of another effect confirmation experiment of the present invention.

Explanation of symbols

10 Information processing system

20 network

21 Input nodes, which are constituent elements

22 Intermediate nodes, which are constituent elements

23 Output nodes, which are constituent elements

24 Links, which are constituent elements

42 State evaluation signal acquisition means

43 Reinforcement signal generation means

51 Learning means

53 Output generation means

61 Network information storage means functioning as network structure storage means and reinforcement signal storage means

62 Node information storage means functioning as network structure storage means, input/output state storage means, and reinforcement signal storage means

63 Link information storage means functioning as network structure storage means, input/output state storage means, and reinforcement signal storage means

105, 148, 185 test link

123, 142, 161, 201, 229 test node

124, 143, 162, 202 First input side test link

125, 146, 163, 204 Second input side test link

Claims


An information processing system using a network that includes, as constituent elements, a plurality of nodes that perform information processing and links that couple these nodes and transmit information between them, the system comprising:

Network structure storage means for storing the structure of the network, including the coupling relationships between the constituent elements;

Input/output state storage means for storing the input/output states of the constituent elements produced by the output generation processing of the network;

Reinforcement signal generation means for generating a reinforcement signal to be given to the network as a reward or punishment according to the result of evaluating the state of the control target, which results from the output of the network;

Learning means that gives the reinforcement signal generated by the reinforcement signal generation means to at least one constituent element and, in order to propagate the reinforcement signal from that constituent element to other constituent elements along the chained coupling relationships between the constituent elements, sequentially generates, based on the reinforcement signal given to the propagation source constituent element, a reinforcement signal to be given as a reward or punishment to the propagation destination constituent element according to the input/output states of the propagation source and/or propagation destination constituent elements stored in the input/output state storage means, generates or deletes constituent elements on a per-element basis using the reinforcement signal given to each constituent element or its history, or the cumulative value of the reinforcement signal or its history, thereby changing the structure of the network, and stores the changed network structure in the network structure storage means;

Output generation means for generating the output of the network, referring to the network structure stored in the network structure storage means, using the network whose structure has been changed by the learning means; and

Reinforcement signal storage means for storing, for each constituent element, the reinforcement signal of the constituent element generated by the learning means or its history, or the cumulative value of the reinforcement signal or its history.

[2] In the information processing system according to claim 1,

State evaluation signal acquisition means is provided for acquiring, from state detection means for detecting the state of the control target or from the control target itself, a state evaluation signal for evaluating the state of the control target;

The reinforcement signal generation means is configured to evaluate the state of the control target based on the state evaluation signal acquired by the state evaluation signal acquisition means and to generate the reinforcement signal according to the evaluation result.

An information processing system characterized by this.

[3] In the information processing system according to claim 1,

Evaluation result input receiving means is provided for receiving a user's input of an evaluation result of the state of the control target;

The reinforcement signal generation means is configured to generate the reinforcement signal according to the evaluation result received by the evaluation result input receiving means.

An information processing system characterized by this.

[4] In the information processing system according to any one of claims 1 to 3,

The learning means is configured to give the reinforcement signal generated by the reinforcement signal generation means equally to all output nodes constituting the output layer of the network and, taking a node as the propagation source constituent element and an input side link of that node as the propagation destination constituent element, to generate, based on the reinforcement signal given to the propagation source node, a reinforcement signal to be given as a reward or punishment to the propagation destination input side link according to that link's contribution to the node output, as determined by the input/output state of the propagation source node. An information processing system characterized by this.

[5] In the information processing system according to claim 4,

The learning means is configured, taking a node as the propagation source constituent element and an input side node coupled to the input side of an input side link of the propagation source node as the propagation destination constituent element, to generate, based on the reinforcement signal given to the propagation source node, a reinforcement signal to be given as a reward or punishment to the propagation destination input side node according to the contribution of that input side link to the node output, as determined by the input/output state of the propagation source node. An information processing system characterized by this.

[6] In the information processing system according to claim 4 or 5,

The reinforcement signal storage means is configured to store, for each link, the history of the reinforcement signal given to the link or the cumulative value of the reinforcement signal.

The learning means is configured to delete a link when the cumulative value of the reinforcement signal given to it falls below a threshold.

An information processing system characterized by this.

[7] In the information processing system according to claim 6,

The learning means is configured to delete a node when the number of its input side links becomes 1 or less. An information processing system characterized by this.

[8] In the information processing system according to claim 4,

On the input side of the propagation source node, in addition to the propagation destination input side link, a test link that does not contribute to node output is provided,

The reinforcement signal storage means is configured to store the history of the reinforcement signal given to the test link or the cumulative value of the reinforcement signal,

The learning means is configured to register the test link in the network structure storage means as an input side link of the propagation source node when the cumulative value of the reinforcement signal given to the test link exceeds a threshold,

An information processing system characterized by this.

[9] In the information processing system according to claim 8,

The learning means is configured to delete the test link when the cumulative value of the reinforcement signal given to it falls below a threshold, and to generate a new test link coupled to an arbitrary node and register it in the network structure storage means. An information processing system characterized by this.

[10] In the information processing system according to any one of claims 1 to 3,

The link is provided with an associated test node that does not contribute to the output of the network; the test node is coupled to the input side node of the link via a first input side test link, to the output side node of the link via an output side test link, and to an arbitrary node via a second input side test link,

The learning means is configured, taking the link as the propagation source constituent element and the test node as the propagation destination constituent element, to generate, based on the reinforcement signal given to the propagation source link, a reinforcement signal to be given as a reward or punishment to the propagation destination test node according to the output and output state of the propagation destination test node. An information processing system characterized by this.

[11] In the information processing system according to claim 10,

The learning means is configured, taking the test node as the propagation source constituent element and the first and second input side test links of the test node as the propagation destination constituent elements, to generate, based on the reinforcement signal given to the propagation source test node, reinforcement signals to be given as a reward or punishment to the propagation destination first and second input side test links according to their contributions to the test node output, as determined by the input/output state of the propagation source test node. An information processing system characterized by this.

[12] In the information processing system according to claim 11,

The reinforcement signal storage means is configured to store, for each link, the history of the reinforcement signal given to each of the propagation destination first and second input side test links or the cumulative value of the reinforcement signal,

The learning means is configured, when the cumulative value of the reinforcement signal given to the propagation destination first or second input side test link falls below a threshold, to delete that input side test link, generate a new input side test link coupled to an arbitrary node, and register it in the network structure storage means. An information processing system characterized by this.

[13] In the information processing system according to claim 11,

The reinforcement signal storage means is configured to store, for each link, the history of the reinforcement signal given to each of the propagation destination first and second input side test links or the cumulative value of the reinforcement signal, and the learning means is configured, when the cumulative values of the reinforcement signals given to the propagation destination first and second input side test links exceed a threshold, to put the test node into practical use by promoting it to a real node that contributes to the output of the network and registering it in the network structure storage means.

An information processing system characterized by this.

[14] In the information processing system according to any one of claims 1 to 13,

The information processing system, wherein the node is configured to perform information processing using at least one logic circuit.

[15] An information processing method using a network that includes, as constituent elements, a plurality of nodes that perform information processing and links that couple these nodes and transmit information between them, wherein the structure of the network, including the coupling relationships between the constituent elements, is stored in network structure storage means;

The input / output state of the constituent element formed by the output generation processing of the network is stored in the input / output state storage means,

Reinforcement signal generation means performs processing for generating a reinforcement signal to be given to the network as a reward or punishment according to the result of evaluating the state of the control target, which results from the output of the network;

The learning means gives the enhancement signal generated by the enhancement signal generation means to at least one of the constituent elements and, in order to propagate the enhancement signal from that constituent element to other constituent elements along the chain of connection relationships between the constituent elements, sequentially generates, based on the enhancement signal given to each propagation-source constituent element, an enhancement signal to be given as a reward or punishment to the propagation-destination constituent element according to the input/output state of the propagation-source and/or propagation-destination constituent element stored in the aforementioned input/output state storage means; the generated enhancement signal of each constituent element, or its cumulative value, is stored in enhancement signal storage means for each constituent element; the learning means then changes the structure of the network by generating or deleting constituent elements, for each constituent element, using the enhancement signal given to that constituent element or its history, or the cumulative value of the enhancement signal or its history, and stores the changed network structure in the network structure storage means;

The output generation means refers to the network structure stored in the network structure storage means, and performs processing for generating the network output using the network whose structure has been changed by the learning means.

An information processing method characterized by the above.
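The propagation step of claim 15 can be pictured as a backward traversal from the element that first receives the reward or punishment. The Python sketch below is a minimal illustration under loudly stated assumptions: the function `propagate`, the attenuation factor `decay`, the cutoff `eps`, and the specific I/O-state rule (a link and its upstream node share in the signal only if that upstream node's recorded output was 1) are all hypothetical choices, not the rule defined in the specification. Nodes and links both act as constituent elements and accumulate their received signal.

```python
from collections import defaultdict, deque


def propagate(reward, output_node, links, io_state, decay=0.5, eps=1e-3):
    """Spread an enhancement signal backward from output_node along links.

    links: list of (src, dst) pairs (information flows src -> dst).
    io_state: node -> last output value (0 or 1) recorded during output generation.
    Returns the signal accumulated by each element (nodes and links are both keys).
    A negative reward acts as a punishment and propagates the same way.
    """
    incoming = defaultdict(list)
    for src, dst in links:
        incoming[dst].append((src, dst))
    received = defaultdict(float)
    queue = deque([(output_node, reward)])
    while queue:
        node, signal = queue.popleft()
        received[node] += signal
        for link in incoming[node]:
            src = link[0]
            # Hypothetical I/O-state rule: only links whose source node was active
            # (output 1) share in the credit or blame, attenuated by decay.
            passed = signal * decay if io_state.get(src, 0) == 1 else 0.0
            if abs(passed) < eps:
                continue  # too weak to matter; with decay < 1 this also ends cycles
            received[link] += passed
            queue.append((src, passed))
    return dict(received)
```

Because every hop multiplies the signal by `decay` (< 1), the `eps` cutoff guarantees termination even if the network contains loops.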

[16] A program for causing a computer to function as an information processing system using a network whose constituent elements are a plurality of nodes that perform information processing and links that connect these nodes and transmit information between them, the information processing system comprising:

Learning means for sequentially generating, based on the enhancement signal given to a propagation-source constituent element, an enhancement signal to be given as a reward or punishment to the propagation-destination constituent element according to the input/output state of the propagation-source and/or propagation-destination constituent element stored in input/output state storage means; for changing the structure of the network by generating or deleting constituent elements, for each constituent element, using the enhancement signal given to that constituent element or its history, or the cumulative value of the enhancement signal or its history; and for storing the changed network structure in network structure storage means;

Output generation means for referring to the network structure stored in the network structure storage means and generating the network output using the network whose structure has been changed by the learning means; and

Enhancement signal storage means for storing, for each constituent element, the enhancement signal of the constituent element generated by the learning means or its history, or the cumulative value of the enhancement signal or its history;

A program for causing a computer to function as an information processing system characterized by comprising the above means.
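The output generation means of claim 16 reads the stored network structure and evaluates it. A minimal Python sketch, assuming a hypothetical structure layout in which each internal node maps to a gate name and its two input nodes, could evaluate nodes in dependency order and record every node's output as the input/output state used later by the learning means. The function `generate_output`, the gate table, and the sensor-input convention are illustrative assumptions; the sketch relies on the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical gate table for the internal nodes.
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}


def generate_output(structure, sensor_values, output_node):
    """Evaluate the stored network and record the I/O state of every node.

    structure: node -> (gate_name, (input_a, input_b)) for internal nodes.
    sensor_values: node -> 0/1 for input (sensor) nodes that have no gate.
    Returns (network_output, io_state).
    """
    # TopologicalSorter expects node -> set of predecessors.
    deps = {node: set(inputs) for node, (_, inputs) in structure.items()}
    io_state = dict(sensor_values)
    for node in TopologicalSorter(deps).static_order():
        if node in structure:  # sensor nodes already have values
            gate, (a, b) = structure[node]
            io_state[node] = GATES[gate](io_state[a], io_state[b])
    return io_state[output_node], io_state
```

Because the structure is re-read on every call, a network whose links or nodes were changed by the learning means is picked up automatically on the next evaluation.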