CN116151385A - Robot autonomous learning method based on generation of countermeasure network - Google Patents
Robot autonomous learning method based on generation of countermeasure network
- Publication number
- CN116151385A (application CN202111344484.8A)
- Authority
- CN
- China
- Prior art keywords
- sample
- function
- robot
- samples
- autonomous learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1602—Programme controls characterised by the control system, structure, architecture
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1628—Programme controls characterised by the control loop
- B25J9/163—Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Mechanical Engineering (AREA)
- Robotics (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Automation & Control Theory (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Feedback Control In General (AREA)
Abstract
The invention provides a robot autonomous learning method based on a generative adversarial network (GAN), applied to robot autonomous learning with few or zero samples in industrial scenes. The method comprises the following steps: 1) establishing a chain model of the robot's behaviour based on a Markov chain; 2) acquiring additional samples with the generative adversarial network from the existing samples or expert data; 3) obtaining a reward function and training an optimal decision through inverse reinforcement learning; 4) acquiring an optimal value function and an optimal policy function from the reward function and the optimal policy; 5) completing the construction of the robot autonomous learning model. The method is mainly oriented to industrial scenes in which experience samples are scarce; by combining the generative adversarial network with inverse reinforcement learning it achieves the goal of robot autonomous learning and thereby raises the automation and intelligence level of the robot.
Description
Technical Field
The invention belongs to the fields of intelligent robot control and robot autonomous learning, and particularly relates to a robot autonomous learning method based on a generative adversarial network.
Background
Robot autonomous learning refers mainly to machine learning methods that let a robot accumulate experience data through its own interaction with the environment and thereby make action decisions autonomously. It is one of the important means of robot control and often plays a key role in environment perception, behaviour control, dynamic decision-making, automatic execution and other functions of an intelligent integrated control system. Such a method not only requires the decision-making policy learned by the robot to be highly optimized, but also places extremely high demands on indicators such as learning speed and reaction speed. Continuously improving robot autonomous learning methods is therefore an important topic of current robotics research.
Typically, such learning methods require extensive sample-based training and manually set key parameters to ensure learning efficiency and accuracy. As a result, the robot's learning outcome is often limited by the size of the data set and by human parameter settings. If the data set contains contaminated data, the final degree of optimization is likely to drop sharply and may not even meet the actual requirements. In addition, the method requires the designer to have considerable experience with the actual scene in order to set the parameters accurately; if the designer cannot judge the actual requirements correctly, the learning direction is likely to deviate and the expected decision-making capability will not be reached. These are the problems that currently need to be solved in robot autonomous learning.
Disclosure of Invention
The invention combines generative adversarial network technology with an inverse reinforcement learning method into a single framework and provides a robot autonomous learning method, which aims to reduce the dependence of robot autonomous learning on expert samples, improve the robot's learning efficiency and increase the degree of optimization of its autonomous decisions.
The technical scheme adopted by the invention to achieve this purpose is as follows:
A robot autonomous learning method based on a generative adversarial network, comprising the following steps:
constructing a Markov chain model, acquiring complete action trajectories and decision steps of the robot, sampling them to generate a real sample set representing the actions, and storing the real sample set in a real sample pool;
randomly generating signals and feeding them into a generator, generating samples with the generator, and storing the generated samples in a virtual sample pool;
feeding the generated samples into a discriminator, which compares the generated samples with the real samples; dynamically adjusting the generated samples according to the comparison result and updating the virtual sample pool;
mixing the updated virtual sample pool with the real sample pool to form a mixed sample pool, and randomly extracting data from the mixed sample pool;
randomly generating a policy and executing it;
sampling the executed policy and comparing the sampling result with the data extracted from the mixed sample pool to obtain a reward function and an optimal policy;
training the Markov chain model according to the reward function, taking the robot's state as the model input and obtaining the corresponding action, thereby completing the robot's autonomous learning (an illustrative end-to-end sketch of these steps follows).
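As an illustration only, the following Python sketch shows how the steps above might be glued together; every function name here (train_gan, sample_noise, max_ent_irl, train_rl) is a hypothetical placeholder supplied by the caller, not part of the claimed method.

```python
def autonomous_learning(expert_demos, train_gan, sample_noise, max_ent_irl, train_rl,
                        n_generated_per_real=10):
    """Hypothetical glue code for the claimed pipeline; the callables are user-supplied."""
    real_pool = list(expert_demos)                         # D1: real sample pool
    generator = train_gan(real_pool)                       # adversarial training of G and D
    virtual_pool = [generator(sample_noise())              # D2: virtual sample pool
                    for _ in range(n_generated_per_real * len(real_pool))]
    mixed_pool = real_pool + virtual_pool                  # Dd: mixed sample pool
    reward_fn = max_ent_irl(mixed_pool)                    # inverse RL: recover reward function
    policy = train_rl(reward_fn)                           # forward RL: optimal policy pi*
    return reward_fn, policy
```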
The Markov chain model is constructed as follows: a five-tuple (S, A, P, R, γ) is established according to the Markov chain model, where the set S represents the current state set, the set A represents the set of actions at the next moment, P is the probability of the various actions in A, R is the reward function, and γ ∈ (0, 1) is the discount coefficient.
The discriminator compares the generated samples with the real samples as follows: the generated samples and the real samples are mixed to form training samples, the training samples are fed into the discriminator for discrimination, and the probability D(x) that a training sample comes from the generated samples is output.
The generated samples are dynamically adjusted according to the comparison result: the loss function of the discriminator and the loss function of the generator are calculated from the probability D(x), and the adjustment of the generated samples stops when the two loss functions reach a Nash equilibrium.
The loss function L_discriminator(D) of the discriminator is:
L_discriminator(D) = E_{x~P}[-log D(x)] + E_{x~G}[-log(1 - D(x))]
where E_{x~P}[-log D(x)] represents the loss of classifying a real sample as a generated sample, and E_{x~G}[-log(1 - D(x))] represents the loss of classifying a generated sample as a real sample.
The loss function L_generator(G) of the generator is:
L_generator(G) = E_{x~G}[-log D(x)] + E_{x~G}[log(1 - D(x))]
where E_{x~G}[-log D(x)] represents the loss of the discriminator classifying a generated sample as a generated sample, and E_{x~G}[log(1 - D(x))] represents the loss of the discriminator classifying a generated sample as a real sample.
The policy is evaluated with a value function, which comprises a state value function V^π(s) and an action value function Q^π(s, a), where π(s, a) is the policy for the state-action pair (s, a), R is the reward function, P(s, a, s') is the probability of transitioning from state s to state s', and a' is the action taken in the next state s'.
The executed policy is sampled, and the sampling result is compared with the data extracted from the mixed sample pool to obtain the reward function and the optimal policy. Specifically: the optimal value functions V*(s) and Q*(s, a) are obtained first, and from them the condition under which action a_1 belongs to the optimal policy π*(s) is derived, where P_{s a_1}(s') denotes the probability of transitioning from state s to state s' when action a_1 is executed, V*(s') denotes the optimal state value function of state s', and P_{sa}(s') denotes the probability of transitioning from state s to state s' when an arbitrary action a ∈ A is executed. Combining this with V*(s), the reward function and the optimal policy are finally obtained.
The Markov chain model is trained according to the reward function; specifically, the parameters of the reward function are obtained by calculating the maximum entropy, which in turn determines the Markov chain model.
The maximum-entropy calculation is specifically:
max −p log p
where p is the probability, l_i represents the i-th trajectory in the probabilistic model, f represents the feature expectation, f_E represents the expert feature expectation, τ_i is the i-th element of the expert sample set, and λ_i (i = 0 to n) is the i-th parameter of the reward function R.
The invention has the following beneficial effects and advantages:
1. The invention provides a robot autonomous learning method based on a generative adversarial network. It is mainly oriented to the learning problems that arise in robot application scenarios and, by combining a generative adversarial network model with an inverse reinforcement learning method, realizes robot learning under few-sample conditions, reducing the dependence on the size of the sample data set and effectively improving learning efficiency.
2. The method is learned autonomously by the robot, requiring essentially no human intervention; this reduces interference from human factors and improves the degree of optimization of the robot's decisions.
3. The method adopts inverse reinforcement learning, can obtain a suitable reward function from the environment, and finally trains an optimal policy function, greatly improving the robot's generalization performance.
4. The method adopts a generative adversarial network model that can generate a large number of near-real samples, so that abundant data can be learned even when real samples are few and additional samples of higher quality can be obtained; the final performance is therefore less constrained by the quality of the original samples, effectively improving the robot's intelligence.
Drawings
FIG. 1 is a flow chart of a robot autonomous learning process;
FIG. 2 is a diagram of the relationships between the components.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
As shown in FIG. 1 and FIG. 2, step one: a Markov chain model is built. A five-tuple (S, A, P, R, γ) is established according to the Markov chain model, where the set S represents the current state set, the set A represents the set of actions at the next moment, P is the probability of the various actions in A, R is the reward function, and γ ∈ (0, 1) is the discount coefficient used to compute the accumulated reward value.
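Purely as an illustration of this five-tuple, a minimal Python container might look as follows; the field names are assumptions chosen for readability, not terminology from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple

State = str
Action = str

@dataclass
class MarkovModel:
    """Minimal container for the five-tuple (S, A, P, R, gamma)."""
    states: Set[State]                                     # S: current state set
    actions: Set[Action]                                   # A: actions at the next moment
    transition: Dict[Tuple[State, Action, State], float]   # P(s, a, s')
    reward: Callable[[State, Action], float]               # R(s, a)
    gamma: float                                           # discount coefficient in (0, 1)
```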
Step two: a small number of expert samples or examples are provided and placed into the real sample pool. Expert samples or examples generally refer to complete motion trajectories or decision steps; after systematic sampling, a set of actions D = {τ_1, τ_2, …, τ_n} is generated. The sampled action sets are stored in the real sample pool D_1 for subsequent comparison.
Step three: signals are randomly generated and fed into the generator to produce the corresponding data. The random signal is typically noise and characterizes some random environmental element. The generator G produces a corresponding sample x from the signal, denoted x ~ G.
Step four: the generator passes the data to the discriminator, which compares them and feeds the result back to the generator. The task of the discriminator D is to classify an input sample as either an output of the generator or an actual sample from the underlying data distribution p(x). Similar samples are retrieved from the real sample pool, mixed with the samples x produced by the generator, and fed into the discriminator for discrimination, which outputs the probability D(x) that a training sample comes from the generated samples. From this, the discriminator loss can be calculated. The discriminator loss is the average log-probability it assigns to the correct class, evaluated on a mixed set of actual samples and generator outputs:
L_discriminator(D) = E_{x~P}[-log D(x)] + E_{x~G}[-log(1 - D(x))]
where E_{x~P}[-log D(x)] represents the loss of classifying a real sample as a generated sample, and E_{x~G}[-log(1 - D(x))] represents the loss of classifying a generated sample as a real sample.
Step five: the generator adjusts its samples according to the feedback from the discriminator. The task of the generator is to produce outputs that the discriminator classifies as coming from the underlying data distribution. If the discriminator loss is large, the quality of the samples produced by the generator is high; otherwise, the quality of the discriminator is high. The generator loss is the sum of the average log-probability of the generated samples being classified correctly and the average log-probability of their being classified incorrectly:
L_generator(G) = E_{x~G}[-log D(x)] + E_{x~G}[log(1 - D(x))]
where E_{x~G}[-log D(x)] represents the loss of classifying a generated sample as a generated sample, and E_{x~G}[log(1 - D(x))] represents the loss of classifying a generated sample as a real sample.
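As a concrete reading of these two expressions, the NumPy sketch below evaluates both losses from arrays of discriminator outputs D(x); it is an illustration of the formulas above under assumed toy values, not the patent's implementation.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-8):
    # L_discriminator(D) = E_{x~P}[-log D(x)] + E_{x~G}[-log(1 - D(x))]
    return np.mean(-np.log(d_real + eps)) + np.mean(-np.log(1.0 - d_fake + eps))

def generator_loss(d_fake, eps=1e-8):
    # L_generator(G) = E_{x~G}[-log D(x)] + E_{x~G}[log(1 - D(x))]
    return np.mean(-np.log(d_fake + eps)) + np.mean(np.log(1.0 - d_fake + eps))

# Toy discriminator outputs for one batch of real and one batch of generated samples.
d_real = np.array([0.9, 0.8, 0.95])   # D(x) evaluated on real samples (x ~ P)
d_fake = np.array([0.2, 0.1, 0.3])    # D(x) evaluated on generated samples (x ~ G)
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```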
Step six: the real sample pool and the virtual sample pool are mixed. By solving the two loss functions, the generator model G and the discriminator model D finally reach a Nash equilibrium, and the generated samples, now highly similar to the real data, are stored in a sample pool D_2 called the virtual sample pool. The real sample pool and the virtual sample pool are then fully mixed to form a mixed sample pool D_d, so that when samples are drawn, either real samples or generated samples are drawn at random with a certain probability.
Step seven: data D'_d are randomly drawn from the mixed sample pool. Either generated samples or real samples may be drawn, but because the generative adversarial network is continuously updated, the generated samples have a quality similar to that of the real samples.
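For illustration only, mixing the two pools and drawing from them with a given probability could be sketched as follows; the pool contents and the 0.5 mixing ratio are assumptions, not values from the patent.

```python
import random

def draw_from_mixed_pool(real_pool, virtual_pool, n, p_real=0.5, rng=random):
    """Draw n samples, taking each one from the real pool with probability p_real."""
    drawn = []
    for _ in range(n):
        pool = real_pool if rng.random() < p_real else virtual_pool
        drawn.append(rng.choice(pool))
    return drawn

# Toy trajectories standing in for sampled action sets.
D1 = ["tau_real_1", "tau_real_2"]               # real sample pool
D2 = ["tau_gen_1", "tau_gen_2", "tau_gen_3"]    # virtual sample pool
print(draw_from_mixed_pool(D1, D2, n=4))
```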
Step eight: a policy is randomly generated and executed. A policy q_k can be randomly generated according to the environment; this policy is executed and sampled. The concept of a value function is usually introduced to evaluate policies: V^π(s) denotes the state value function and Q^π(s, a) the action value function, where π(s, a) is the policy for the state-action pair (s, a), R is the reward function, and P(s, a, s') is the probability of transitioning from state s to state s'.
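The value-function formulas are not reproduced in this text (they appear as images in the original document); the standard Bellman expectation equations consistent with the symbols defined here would read as follows (an assumed reconstruction, not necessarily the patent's exact expressions):

V^{\pi}(s) = \sum_{a \in A} \pi(s, a) \, Q^{\pi}(s, a)

Q^{\pi}(s, a) = R(s, a) + \gamma \sum_{s'} P(s, a, s') \, V^{\pi}(s')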
Step nine: the executed policy is compared with the data in the sample pool, and the reward value is updated. The executed policy is sampled and compared with the high-quality samples in the mixed sample pool; from the current policy samples D'_s and the high-quality samples D'_d, the optimal reward function under the current conditions is found. The optimal value functions V*(s) and Q*(s, a) are obtained first; from them the condition for action a_1 to belong to the optimal policy π*(s) is derived, and, combining this with V*(s), a reward function and an optimal policy are finally obtained.
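Here too the equations exist only as images in the filing; under the usual Bellman optimality formulation (an assumed reconstruction consistent with the symbols P_{s a_1}(s'), P_{sa}(s') and V*(s') used in the text) they would be:

V^{*}(s) = \max_{a \in A} Q^{*}(s, a)

Q^{*}(s, a) = R(s, a) + \gamma \sum_{s'} P_{sa}(s') \, V^{*}(s')

a_1 \in \pi^{*}(s) \;\Longleftrightarrow\; \sum_{s'} P_{s a_1}(s') \, V^{*}(s') \;\ge\; \sum_{s'} P_{sa}(s') \, V^{*}(s') \quad \text{for all } a \in A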
Step ten: the policy function is optimized. After the optimal reward function is found, the current reward value can be determined, and the policy function is optimized according to this reward value and the high-quality samples, thereby improving its performance.
Step eleven: the reward function and the optimal decision are obtained. Through continuous optimization, the policy corresponding to the optimal reward function under any condition is finally obtained.
Step twelve: the model is trained according to the reward function, the optimal value function and policy function are obtained, and the model construction is completed. With the reward function, reinforcement learning training can be carried out, finally yielding the optimal value function and policy function. The reward function obtained by inverse reinforcement learning typically has some ambiguity; to avoid this ambiguity it is usually also necessary to find the maximum entropy, i.e. to solve the problem
max −p log p
where p is the probability, l_i represents a trajectory in the probabilistic model, f represents the feature expectation, f_E represents the expert feature expectation, and λ_i (i = 0 to n) is a parameter of the reward function.
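The constraints of this maximization are likewise given only as images; the standard maximum-entropy inverse-reinforcement-learning formulation consistent with these symbols (an assumption, not the patent's exact statement) is:

\max_{p} \; -\sum_{i} p(l_i) \log p(l_i) \quad \text{s.t.} \quad \sum_{i} p(l_i) \, f(l_i) = f_E = \frac{1}{n} \sum_{i=1}^{n} f(\tau_i), \qquad \sum_{i} p(l_i) = 1

whose solution has the exponential-family form p(l_i) \propto \exp\big(\sum_{j} \lambda_j f_j(l_i)\big), in which the λ_i play the role of the reward-function parameters.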
While the invention has been particularly shown and described above, this is not to be construed as limiting the invention itself. Various changes in form and detail may be made without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (10)
1. A robot autonomous learning method based on a generative adversarial network, characterized by comprising the following steps:
constructing a Markov chain model, acquiring complete action trajectories and decision steps of the robot, sampling them to generate a real sample set representing the actions, and storing the real sample set in a real sample pool;
randomly generating signals and feeding them into a generator, generating samples with the generator, and storing the generated samples in a virtual sample pool;
feeding the generated samples into a discriminator, which compares the generated samples with the real samples; dynamically adjusting the generated samples according to the comparison result and updating the virtual sample pool;
mixing the updated virtual sample pool with the real sample pool to form a mixed sample pool, and randomly extracting data from the mixed sample pool;
randomly generating a policy and executing it;
sampling the executed policy and comparing the sampling result with the data extracted from the mixed sample pool to obtain a reward function and an optimal policy;
training the Markov chain model according to the reward function, taking the robot's state as the model input and obtaining the corresponding action, thereby completing the robot's autonomous learning.
2. The robot autonomous learning method based on a generative adversarial network according to claim 1, wherein constructing the Markov chain model specifically comprises: establishing a five-tuple (S, A, P, R, γ) according to the Markov chain model, wherein the set S represents the current state set, the set A represents the set of actions at the next moment, P is the probability of the various actions in A, R is the reward function, and γ ∈ (0, 1) is the discount coefficient.
3. The robot autonomous learning method based on a generative adversarial network according to claim 1, wherein the discriminator compares the generated samples with the real samples as follows: the generated samples and the real samples are mixed to form training samples, the training samples are fed into the discriminator for discrimination, and the probability D(x) that a training sample comes from the generated samples is output.
4. The robot autonomous learning method based on a generative adversarial network according to claim 1 or 3, wherein the generated samples are dynamically adjusted according to the comparison result; specifically, the loss function of the discriminator and the loss function of the generator are calculated from the probability D(x), and the adjustment of the generated samples stops when the two loss functions reach a Nash equilibrium.
5. The robot autonomous learning method based on a generative adversarial network according to claim 4, wherein the loss function L_discriminator(D) of the discriminator is:
L_discriminator(D) = E_{x~P}[-log D(x)] + E_{x~G}[-log(1 - D(x))]
where E_{x~P}[-log D(x)] represents the loss of classifying a real sample as a generated sample, and E_{x~G}[-log(1 - D(x))] represents the loss of classifying a generated sample as a real sample.
6. The robot autonomous learning method based on a generative adversarial network according to claim 4, wherein the loss function L_generator(G) of the generator is:
L_generator(G) = E_{x~G}[-log D(x)] + E_{x~G}[log(1 - D(x))]
where E_{x~G}[-log D(x)] represents the loss of the discriminator classifying a generated sample as a generated sample, and E_{x~G}[log(1 - D(x))] represents the loss of the discriminator classifying a generated sample as a real sample.
7. The robot autonomous learning method based on a generative adversarial network according to claim 1, characterized in that the policy is evaluated with a value function, the value function comprising a state value function V^π(s) and an action value function Q^π(s, a), where π(s, a) is the policy for the state-action pair (s, a), R is the reward function, P(s, a, s') is the probability of transitioning from state s to state s', and a' is the action taken in the next state s'.
8. The robot autonomous learning method based on a generative adversarial network according to claim 1, wherein sampling the executed policy and comparing the sampling result with the data extracted from the mixed sample pool to obtain the reward function and the optimal policy specifically comprises: obtaining the optimal value functions V*(s) and Q*(s, a), and deriving from them the condition under which action a_1 belongs to the optimal policy π*(s), where P_{s a_1}(s') denotes the probability of transitioning from state s to state s' when action a_1 is executed, V*(s') denotes the optimal state value function of state s', and P_{sa}(s') denotes the probability of transitioning from state s to state s' when an arbitrary action a ∈ A is executed; combining this with V*(s), the reward function and the optimal policy are finally obtained.
9. The robot autonomous learning method based on a generative adversarial network according to claim 1, wherein training the Markov chain model according to the reward function specifically comprises obtaining the parameters of the reward function by calculating the maximum entropy, which in turn determines the Markov chain model.
10. The robot autonomous learning method based on a generative adversarial network according to claim 9, wherein calculating the maximum entropy specifically comprises:
max −p log p
where p is the probability, l_i represents the i-th trajectory in the probabilistic model, f represents the feature expectation, f_E represents the expert feature expectation, τ_i is the i-th element of the expert sample set, and λ_i (i = 0 to n) is the i-th parameter of the reward function R.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111344484.8A CN116151385A (en) | 2021-11-15 | 2021-11-15 | Robot autonomous learning method based on generation of countermeasure network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111344484.8A CN116151385A (en) | 2021-11-15 | 2021-11-15 | Robot autonomous learning method based on generation of countermeasure network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116151385A (en) | 2023-05-23
Family
ID=86354821
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111344484.8A (CN116151385A, pending) | Robot autonomous learning method based on generation of countermeasure network | 2021-11-15 | 2021-11-15 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116151385A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117250970A (en) * | 2023-11-13 | 2023-12-19 | 青岛澎湃海洋探索技术有限公司 | Method for realizing AUV fault detection based on model embedding generation countermeasure network |
CN117250970B (en) * | 2023-11-13 | 2024-02-02 | 青岛澎湃海洋探索技术有限公司 | Method for realizing AUV fault detection based on model embedding generation countermeasure network |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN112668235B (en) | Robot control method based on off-line model pre-training learning DDPG algorithm | |
CN113511082B (en) | Hybrid electric vehicle energy management method based on rule and double-depth Q network | |
CN111191769B (en) | Self-adaptive neural network training and reasoning device | |
CN110594317B (en) | Starting control strategy based on double-clutch type automatic transmission | |
WO2022252457A1 (en) | Autonomous driving control method, apparatus and device, and readable storage medium | |
CN113722980A (en) | Ocean wave height prediction method, system, computer equipment, storage medium and terminal | |
CN116151385A (en) | Robot autonomous learning method based on generation of countermeasure network | |
Schuman et al. | Low size, weight, and power neuromorphic computing to improve combustion engine efficiency | |
CN112487933B (en) | Radar waveform identification method and system based on automatic deep learning | |
CN118097228A (en) | Multi-teacher auxiliary instance self-adaptive DNN-based mobile platform multi-target classification method | |
CN112307674B (en) | Low-altitude target knowledge assisted intelligent electromagnetic sensing method, system and storage medium | |
CN118247393A (en) | AIGC-based 3D digital man driving method | |
CN117574776A (en) | Task planning-oriented model self-learning optimization method | |
CN117709712A (en) | Situation prediction method and terminal for power distribution network based on hybrid neural network | |
Gladwin et al. | A controlled migration genetic algorithm operator for hardware-in-the-loop experimentation | |
Lee et al. | A real-time intelligent speed optimization planner using reinforcement learning | |
CN115423149A (en) | Incremental iterative clustering method for energy internet load prediction and noise level estimation | |
Riid et al. | Interpretability of fuzzy systems and its application to process control | |
CN117544508B (en) | Network equipment configuration query method and device, terminal equipment and storage medium | |
CN116755046B (en) | Multifunctional radar interference decision-making method based on imperfect expert strategy | |
CN118343165B (en) | Personification vehicle following method based on driver characteristics | |
CN114691518B (en) | EFSM input sequence generation method based on deep reinforcement learning | |
Chen et al. | Optimizing driving conditions for distributed drive electric vehicles using minimum unit encoded neural networks | |
CN114386601B (en) | HTM efficient anomaly detection method for server load data | |
CN116176606A (en) | Method and device for reinforcement learning of intelligent agent for controlling vehicle driving |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |