CN114527666B - CPS system reinforcement learning control method based on attention mechanism - Google Patents

CPS system reinforcement learning control method based on attention mechanism

Info

Publication number
CN114527666B
CN114527666B (application CN202210221958.8A)
Authority
CN
China
Prior art keywords
sensor
environment
state
strategy
control object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210221958.8A
Other languages
Chinese (zh)
Other versions
CN114527666A (en)
Inventor
卢岩涛 (Lu Yantao)
李青 (Li Qing)
孙仕琦 (Sun Shiqi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202210221958.8A
Publication of CN114527666A
Application granted
Publication of CN114527666B
Legal status: Active

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B 13/00: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B 13/02: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric
    • G05B 13/04: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric, involving the use of models or simulators
    • G05B 13/042: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric, involving the use of models or simulators, in which a parameter or coefficient is automatically adjusted to optimise the performance
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/02: Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention provides a reinforcement learning control method for a CPS (cyber-physical system) based on an attention mechanism, comprising the following steps: the control object selects a suitable policy through a policy network and executes it in the environment; the environment changes and responds under the executed policy and generates a reward; a plurality of preset sensors detect the environment to obtain detection information from the plurality of sensors; the sensor detection information is fed into a self-attention network, the obtained reward and the current state of the sensor information are input into the policy network at the same time, the gradient of the policy network is updated, the policy for the next time period is selected as the network's next input, and these steps are repeated to complete the learning control method. When a reinforcement learning algorithm is used to solve a practical control problem, the method imposes more relaxed and convenient design requirements on the reward; that is, part of the information can be learned through the implicit knowledge carried by the sensors.

Description

CPS system reinforcement learning control method based on attention mechanism
Technical Field
The invention belongs to the technical field of CPS system learning control methods, and particularly relates to a CPS system reinforcement learning control method based on an attention mechanism.
Background
In current CPS (cyber-physical) systems, designing a sound intelligent control algorithm that combines the sensing information of many sensors has become a long-standing problem. Among intelligent algorithms, reinforcement learning, at the forefront of academic research, has received a great deal of attention. Although reinforcement learning, and Q-learning in particular, is a black-box, machine-learning-based model and therefore less interpretable than traditional model-based approaches, it does not need to be redesigned for each plant model, adapts well, trains comparatively easily, and behaves more intelligently; this combination of characteristics has made it widely favored.
However, a significant problem remains: conventional reinforcement learning models are formulated for learning in the abstract, without the modifications required for application to a CPS system, and because a complex CPS system contains a large number of sensors, training the reinforcement learning model becomes considerably harder, which limits the improvement the model can achieve.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a CPS system reinforcement learning control method based on an attention mechanism that addresses the problems identified in the background art.
To solve this technical problem, the invention adopts the following technical scheme: a CPS system reinforcement learning control method based on an attention mechanism, comprising the following steps:
S1, the control object selects a suitable policy through a policy network and executes it in the environment;
S2, the environment changes and responds under the executed policy, and a reward is generated;
S3, a plurality of preset sensors detect the environment to obtain detection information from the plurality of sensors;
S4, the sensor detection information is fed into a self-attention network; at the same time, the self-attention network automatically acquires the control object's last action and computes the needed sensor information using both the sensor detection information and that last action as references;
S5, the reward and the current state of the screened sensor information are input into the policy network simultaneously, the gradient of the policy network is updated, the policy for the next time period is selected as the next input to the policy network, and the steps are repeated to complete the learning control method (see the interface sketch below).
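To make the data flow of S1 to S5 concrete, the following is a minimal interface sketch; the class and method names (Environment, SensorArray, SelfAttentionScreen, PolicyNetwork) are illustrative assumptions and are not specified by the invention.

```python
# Minimal sketch of the S1-S5 data flow; all names here are assumptions.
from typing import Protocol, Sequence

class Environment(Protocol):
    def execute(self, action: int) -> float:
        """S1-S2: apply the selected policy's action; return the generated reward."""

class SensorArray(Protocol):
    def read(self) -> Sequence[float]:
        """S3: the preset sensors detect the environment."""

class SelfAttentionScreen(Protocol):
    def screen(self, sensor_info: Sequence[float], last_action: int) -> Sequence[float]:
        """S4: compute the needed sensor information from the readings and the last action."""

class PolicyNetwork(Protocol):
    def update_and_select(self, screened: Sequence[float], reward: float) -> int:
        """S5: update the network's gradient and select the next period's policy."""
```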
Further, the learning control method is divided into a training mode and an execution mode.
Further, the execution mode includes the steps of:
S101, at time k, the state of the control object is $s_k^{agent} \in S_{agent}$ and the state of the environment is $s_k^{env} \in S_{env}$; take action $u_k \in A$;
S102, under the influence of this action, the state of the environment becomes $s_{k+1}^{env} = F_{env}(s_k^{env}, u_k)$, the state of the control object becomes $s_{k+1}^{agent} = F_{state}(s_k^{agent}, u_k)$, and the reward value is $r_k = F_{reward}(s_k^{agent}, u_k)$;
S103, for the environment state $s_{k+1}^{env}$ at time k+1, the sensors capture information in the environment, obtaining $s_{k+1}^{sensor} = F_{sensor}(s_{k+1}^{env})$;
S104, based on the sensor information $s_{k+1}^{sensor}$ of this time period and the action $u_k$ of the previous time period, the screened sensor information is obtained using the self-attention model: $s_{k+1}^{att\_sensor} = \sigma_{attention}(s_{k+1}^{sensor}, u_k)$;
S105, combining the above information, the control object infers the action to be performed in the next time period: $u_{k+1} = \sigma_{agent}(s_{k+1}^{att\_sensor}, s_{k+1}^{agent})$;
S106, execute action $u_{k+1}$ and return to S101;
wherein $S_{env}$ represents the state space of the environment; $S_{agent}$ represents the state space of the control object; $S_{sensor}$ represents the states of the parameters obtained by the sensors;
$A$ represents the finite set of actions the control object can take; $P$ represents the transition probability, i.e. the probability that, after an action is taken, $s_k^{env}$ transfers to $s_{k+1}^{env}$; $R$ is the reward function; $\gamma$ represents the discount factor;
Sensor reading of the environment: $F_{sensor}: S_{env} \to S_{sensor}$;
Environment change: $F_{env}: S_{env} \times A \to S_{env}$;
Reward function: $F_{reward}: S_{agent} \times A \to R$;
State change function: $F_{state}: S_{agent} \times A \to S_{agent}$;
There are also two end-to-end models that can be obtained by machine learning: the self-attention neural network $\sigma_{attention}: S_{sensor} \times A \to S_{att\_sensor}$, and the neural network through which the control object selects its action policy, $\sigma_{agent}: S_{att\_sensor} \times S_{agent} \to A$;
$S_{sensor}$ represents the information obtained by the sensors sensing the external environment;
$S_{att\_sensor}$ represents the sensor information retained after passing through the self-attention mechanism.
Further, the training mode includes the steps of:
S201, at time k, the state of the control object is $s_k^{agent} \in S_{agent}$ and the state of the environment is $s_k^{env} \in S_{env}$; take action $u_k \in A$;
S202, under the influence of this action, the state of the environment becomes $s_{k+1}^{env} = F_{env}(s_k^{env}, u_k)$, the state of the control object becomes $s_{k+1}^{agent} = F_{state}(s_k^{agent}, u_k)$, and the reward value is $r_k = F_{reward}(s_k^{agent}, u_k)$;
S203, for the environment state $s_{k+1}^{env}$ at time k+1, the sensors capture information in the environment, obtaining $s_{k+1}^{sensor} = F_{sensor}(s_{k+1}^{env})$;
S204, based on the sensor information $s_{k+1}^{sensor}$ of this time period and the action $u_k$ of the previous time period, the screened sensor information is obtained using the self-attention model: $s_{k+1}^{att\_sensor} = \sigma_{attention}(s_{k+1}^{sensor}, u_k)$;
S205, combining the above information, the control object infers the action to be performed in the next time period: $u_{k+1} = \sigma_{agent}(s_{k+1}^{att\_sensor}, s_{k+1}^{agent})$;
S206, execute action $u_{k+1}$ and return to S201, collecting each data pairing along the way;
S207, with the collected data pairings as a data set, perform joint gradient descent on the neural networks $\sigma_{attention}$ and $\sigma_{agent}$, take the updated parameters as the new networks, and return to the first step until convergence.
Compared with the prior art, the invention has the following advantages:
The method introduces a self-attention mechanism into the screening of sensor information and uses the action of the previous time period as part of that screening, so this information is taken into account. As a result, when a reinforcement learning algorithm is used to solve a practical control problem, the design requirements on the reward are more relaxed and convenient; that is, part of the information can be learned through the implicit knowledge carried by the sensors. Meanwhile, because attention-based screening of the sensors is in place, a large number of sensors can be added when building the CPS system, making the method suitable for more application scenarios and widening the applicable range of the CPS system.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely. Obviously, the described embodiments are only some embodiments of the invention, not all of them. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of the invention.
The invention provides the following technical scheme: a CPS system reinforcement learning control method based on an attention mechanism, comprising the following steps:
S1, the control object selects a suitable policy through a policy network and executes it in the environment;
S2, the environment changes and responds under the executed policy, and a reward is generated;
S3, a plurality of preset sensors detect the environment to obtain detection information from the plurality of sensors;
S4, the sensor detection information is fed into a self-attention network; at the same time, the self-attention network automatically acquires the control object's last action and computes the needed sensor information using both the sensor detection information and that last action as references;
S5, the reward and the current state of the screened sensor information are input into the policy network simultaneously, the gradient of the policy network is updated, the policy for the next time period is selected as the next input to the policy network, and the steps are repeated to complete the learning control method.
Specifically, the method involves a main control object, the agent, which contains a policy-selector network for automatically choosing a control policy, and an external environment, which is chiefly the external scene in which the control method is applied;
When the agent executes a policy in the environment, some interactions with the environment occur, such as walking around obstacles or taking away objects in the scene. These changes produce stimuli in the environment and simultaneously affect the states of both the control object and the environment, so that certain changes arise;
After these changes occur, the reward mechanism determines how much of the final target remains to be completed, both for the environment as a whole and between the control object and the target;
By comparing the distance between the control object and the target before and after the policy is executed, one can tell whether the policy played a positive or a negative role for overall control, and the reward is then defined according to that role;
If the effect on overall control is positive, a positive reward is given; if it is negative, a penalty is given.
The reward can be defined as needed for the specific application scenario; for example, in a robot path-planning task it can be defined in terms of the distance between the robot and the target, as in the sketch below;
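As one concrete illustration of such a scenario-specific reward, the sketch below uses the robot path-planning example; the negative-distance form is an assumption chosen for illustration, not a form prescribed by the invention.

```python
import math

def path_planning_reward(robot_xy: tuple, goal_xy: tuple) -> float:
    """Hypothetical reward for the robot path-planning example: moving
    closer to the target raises the reward, moving away lowers it."""
    return -math.dist(robot_xy, goal_xy)

# A policy step that reduces the distance yields a higher reward than the
# previous step (a positive effect); increasing it acts as a penalty.
```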
in the context of algorithm application, a large number of sensors, such as infrared sensors, distance sensors, temperature sensors, pressure sensors, etc., may be included to construct a series of sensors for external environmental conditions in real time, to obtain the state of the environment, so that the environment can be sensitively captured after the environment interacts with the control object to generate a change.
The sensor system is connected to a screening network built on a self-attention mechanism. Its main purpose is to combine the execution policy of the control object in the previous step with the correlation between actions and environment interaction, together with self-attention over the sensing space, to screen the large volume of sensor information; this end-to-end, machine-learning-based screening network directly yields the needed sensor information. One possible layer-level realisation is sketched below.
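The invention does not fix a layer-level architecture for the screening network; the sketch below is one plausible reading in PyTorch, assuming each scalar sensor reading is embedded as a token and the previous action is injected as an additional token before standard self-attention. All layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SensorScreeningNetwork(nn.Module):
    """Assumed realisation of sigma_attention: S_sensor x A -> S_att_sensor.
    Screens many sensor readings with self-attention conditioned on the
    previous action u_k."""
    def __init__(self, n_sensors: int, n_actions: int, d_model: int = 32):
        super().__init__()
        self.sensor_embed = nn.Linear(1, d_model)             # one token per scalar reading
        self.action_embed = nn.Embedding(n_actions, d_model)  # previous action as an extra token
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.out = nn.Linear(d_model, 1)

    def forward(self, sensors: torch.Tensor, last_action: torch.Tensor) -> torch.Tensor:
        # sensors: (batch, n_sensors); last_action: (batch,) action indices
        tokens = self.sensor_embed(sensors.unsqueeze(-1))     # (batch, n_sensors, d_model)
        act = self.action_embed(last_action).unsqueeze(1)     # (batch, 1, d_model)
        seq = torch.cat([act, tokens], dim=1)                 # action token + sensor tokens
        attended, _ = self.attn(seq, seq, seq)                # self-attention screening
        return self.out(attended[:, 1:, :]).squeeze(-1)       # screened info, (batch, n_sensors)
```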
The output of the self-attention network and the reward from the environment are, after data normalization and coupling, input into the agent's policy-selection network to select a suitable policy. The policy space must be designed specifically for each scenario and can be continuous or discrete: a discrete action space contains various discrete actions, such as switching a device on or off or picking up an object, while a continuous action space contains a continuum of actions, for example controlling the speed and angle at which a robot moves. A sketch of such a network follows.
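A matching sketch of the policy-selection network σ_agent, again with assumed layer sizes; the `discrete` flag switches between the discrete and continuous action spaces described above.

```python
import torch
import torch.nn as nn

class PolicySelectionNetwork(nn.Module):
    """Assumed realisation of sigma_agent: S_att_sensor x S_agent -> A."""
    def __init__(self, n_sensors: int, state_dim: int, action_dim: int, discrete: bool = True):
        super().__init__()
        self.discrete = discrete
        self.body = nn.Sequential(
            nn.Linear(n_sensors + state_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim),
        )

    def forward(self, screened: torch.Tensor, agent_state: torch.Tensor):
        out = self.body(torch.cat([screened, agent_state], dim=-1))
        if self.discrete:
            # discrete actions, e.g. switch a device, pick up an object
            return torch.distributions.Categorical(logits=out)
        # continuous actions, e.g. a robot's speed and angle (unit variance assumed)
        return torch.distributions.Normal(out, torch.ones_like(out))
```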
The learning control method is also divided into a training mode and an execution mode.
The execution mode comprises the following steps:
S101, at time k, the state of the control object is $s_k^{agent} \in S_{agent}$ and the state of the environment is $s_k^{env} \in S_{env}$; take action $u_k \in A$;
S102, under the influence of this action, the state of the environment becomes $s_{k+1}^{env} = F_{env}(s_k^{env}, u_k)$, the state of the control object becomes $s_{k+1}^{agent} = F_{state}(s_k^{agent}, u_k)$, and the reward value is $r_k = F_{reward}(s_k^{agent}, u_k)$;
S103, for the environment state $s_{k+1}^{env}$ at time k+1, the sensors capture information in the environment, obtaining $s_{k+1}^{sensor} = F_{sensor}(s_{k+1}^{env})$;
S104, based on the sensor information $s_{k+1}^{sensor}$ of this time period and the action $u_k$ of the previous time period, the screened sensor information is obtained using the self-attention model: $s_{k+1}^{att\_sensor} = \sigma_{attention}(s_{k+1}^{sensor}, u_k)$;
S105, combining the above information, the control object infers the action to be performed in the next time period: $u_{k+1} = \sigma_{agent}(s_{k+1}^{att\_sensor}, s_{k+1}^{agent})$;
S106, execute action $u_{k+1}$ and return to S101;
wherein $S_{env}$ represents the state space of the environment; $S_{agent}$ represents the state space of the control object; $S_{sensor}$ represents the states of the parameters obtained by the sensors;
$A$ represents the finite set of actions the control object can take; $P$ represents the transition probability, i.e. the probability that, after an action is taken, $s_k^{env}$ transfers to $s_{k+1}^{env}$; $R$ is the reward function; $\gamma$ represents the discount factor;
Sensor reading of the environment: $F_{sensor}: S_{env} \to S_{sensor}$;
Environment change: $F_{env}: S_{env} \times A \to S_{env}$;
Reward function: $F_{reward}: S_{agent} \times A \to R$;
State change function: $F_{state}: S_{agent} \times A \to S_{agent}$;
There are also two end-to-end models that can be obtained by machine learning: the self-attention neural network $\sigma_{attention}: S_{sensor} \times A \to S_{att\_sensor}$, and the neural network through which the control object selects its action policy, $\sigma_{agent}: S_{att\_sensor} \times S_{agent} \to A$;
$S_{sensor}$ represents the information obtained by the sensors sensing the external environment;
$S_{att\_sensor}$ represents the sensor information retained after passing through the self-attention mechanism. Composed together, these maps yield the execution loop sketched below.
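Composed, S101 to S106 form the following loop. The `env` object is assumed to expose the maps $F_{env}$, $F_{state}$ and $F_{sensor}$ defined above through `apply`, `agent_state` and `read_sensors`; those method names, like the initial action tensor, are illustrative assumptions.

```python
import torch

def run_execution_mode(env, screen_net, policy_net, u_k, steps: int = 1000):
    """Sketch of S101-S106; env's methods stand in for F_env/F_state/F_sensor.
    u_k: initial action as a (batch,) LongTensor of action indices."""
    for _ in range(steps):
        env.apply(u_k)                        # S102: environment and agent states change
        agent_state = env.agent_state()       # s_{k+1}^agent = F_state(s_k^agent, u_k)
        sensors = env.read_sensors()          # S103: s_{k+1}^sensor = F_sensor(s_{k+1}^env)
        with torch.no_grad():                 # execution mode performs no gradient updates
            screened = screen_net(sensors, u_k)               # S104: sigma_attention
            u_k = policy_net(screened, agent_state).sample()  # S105: sigma_agent
        # S106: loop back to S101 with action u_{k+1}
```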
The training mode comprises the following steps:
S201, at time k, the state of the control object is $s_k^{agent} \in S_{agent}$ and the state of the environment is $s_k^{env} \in S_{env}$; take action $u_k \in A$;
S202, under the influence of this action, the state of the environment becomes $s_{k+1}^{env} = F_{env}(s_k^{env}, u_k)$, the state of the control object becomes $s_{k+1}^{agent} = F_{state}(s_k^{agent}, u_k)$, and the reward value is $r_k = F_{reward}(s_k^{agent}, u_k)$;
S203, for the environment state $s_{k+1}^{env}$ at time k+1, the sensors capture information in the environment, obtaining $s_{k+1}^{sensor} = F_{sensor}(s_{k+1}^{env})$;
S204, based on the sensor information $s_{k+1}^{sensor}$ of this time period and the action $u_k$ of the previous time period, the screened sensor information is obtained using the self-attention model: $s_{k+1}^{att\_sensor} = \sigma_{attention}(s_{k+1}^{sensor}, u_k)$;
S205, combining the above information, the control object infers the action to be performed in the next time period: $u_{k+1} = \sigma_{agent}(s_{k+1}^{att\_sensor}, s_{k+1}^{agent})$;
S206, execute action $u_{k+1}$ and return to S201, collecting each data pairing along the way;
S207, with the collected data pairings as a data set, perform joint gradient descent on the neural networks $\sigma_{attention}$ and $\sigma_{agent}$, take the updated parameters as the new networks, and return to the first step until convergence. One assumed realisation of this joint update is sketched below.
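The invention states only that the collected pairings form a data set on which σ_attention and σ_agent undergo joint gradient descent; the REINFORCE-style surrogate loss below is one common way to realise such an update and is an assumption of this sketch, not a prescription of the patent.

```python
import torch

def joint_update(screen_net, policy_net, batch, optimizer, gamma: float = 0.99) -> float:
    """One joint gradient-descent step over sigma_attention and sigma_agent (S207).
    batch holds tensors (sensors, agent_state, last_action, action, reward)
    collected over an episode in S201-S206; the loss form is an assumption."""
    sensors, agent_state, last_action, action, reward = batch
    returns = torch.zeros_like(reward)
    running = 0.0
    for t in reversed(range(len(reward))):   # discounted return with discount factor gamma
        running = reward[t] + gamma * running
        returns[t] = running
    screened = screen_net(sensors, last_action)          # gradients reach sigma_attention
    log_prob = policy_net(screened, agent_state).log_prob(action)
    loss = -(log_prob * returns).mean()                  # REINFORCE-style surrogate
    optimizer.zero_grad()
    loss.backward()                                      # joint descent over both networks
    optimizer.step()
    return loss.item()
```

A single optimizer over both networks is what makes the descent joint, e.g. `optimizer = torch.optim.Adam(list(screen_net.parameters()) + list(policy_net.parameters()))`.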
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (1)

1. A CPS system reinforcement learning control method based on an attention mechanism, characterized by comprising the following steps:
S1, the control object selects a suitable policy through a policy network and executes it in the environment;
S2, the environment changes and responds under the executed policy, and a reward is generated;
S3, a plurality of preset sensors detect the environment to obtain detection information from the plurality of sensors;
S4, the sensor detection information is fed into a self-attention network; at the same time, the self-attention network automatically acquires the control object's last action and computes the needed sensor information using both the sensor detection information and that last action as references;
S5, the reward and the current state of the screened sensor information are input into the policy network simultaneously, the gradient of the policy network is updated, the policy for the next time period is selected as the next input to the policy network, and the steps are repeated to complete the learning control method, which is further divided into a training mode and an execution mode;
the execution mode comprises the following steps:
S101, at time k, the state of the control object is $s_k^{agent} \in S_{agent}$ and the state of the environment is $s_k^{env} \in S_{env}$; take action $u_k \in A$;
S102, under the influence of this action, the state of the environment becomes $s_{k+1}^{env} = F_{env}(s_k^{env}, u_k)$, the state of the control object becomes $s_{k+1}^{agent} = F_{state}(s_k^{agent}, u_k)$, and the reward value is $r_k = F_{reward}(s_k^{agent}, u_k)$;
S103, for the environment state $s_{k+1}^{env}$ at time k+1, the sensors capture information in the environment, obtaining $s_{k+1}^{sensor} = F_{sensor}(s_{k+1}^{env})$;
S104, based on the sensor information $s_{k+1}^{sensor}$ of this time period and the action $u_k$ of the previous time period, the screened sensor information is obtained using the self-attention model: $s_{k+1}^{att\_sensor} = \sigma_{attention}(s_{k+1}^{sensor}, u_k)$;
S105, combining the above information, the control object infers the action to be performed in the next time period: $u_{k+1} = \sigma_{agent}(s_{k+1}^{att\_sensor}, s_{k+1}^{agent})$;
S106, execute action $u_{k+1}$ and return to S101;
wherein $S_{env}$ represents the state space of the environment; $S_{agent}$ represents the state space of the control object; $S_{sensor}$ represents the states of the parameters obtained by the sensors;
$A$ represents the finite set of actions the control object can take; $P$ represents the transition probability, i.e. the probability that, after an action is taken, $s_k^{env}$ transfers to $s_{k+1}^{env}$; $R$ is the reward function; $\gamma$ represents the discount factor;
Sensor reading of the environment: $F_{sensor}: S_{env} \to S_{sensor}$;
Environment change: $F_{env}: S_{env} \times A \to S_{env}$;
Reward function: $F_{reward}: S_{agent} \times A \to R$;
State change function: $F_{state}: S_{agent} \times A \to S_{agent}$;
There are also two end-to-end models that can be obtained by machine learning: the self-attention neural network $\sigma_{attention}: S_{sensor} \times A \to S_{att\_sensor}$, and the neural network through which the control object selects its action policy, $\sigma_{agent}: S_{att\_sensor} \times S_{agent} \to A$;
$S_{sensor}$ represents the information obtained by the sensors sensing the external environment;
$S_{att\_sensor}$ represents the sensor information retained after passing through the self-attention mechanism;
the training mode comprises the following steps:
S201, at time k, the state of the control object is $s_k^{agent} \in S_{agent}$ and the state of the environment is $s_k^{env} \in S_{env}$; take action $u_k \in A$;
S202, under the influence of this action, the state of the environment becomes $s_{k+1}^{env} = F_{env}(s_k^{env}, u_k)$, the state of the control object becomes $s_{k+1}^{agent} = F_{state}(s_k^{agent}, u_k)$, and the reward value is $r_k = F_{reward}(s_k^{agent}, u_k)$;
S203, for the environment state $s_{k+1}^{env}$ at time k+1, the sensors capture information in the environment, obtaining $s_{k+1}^{sensor} = F_{sensor}(s_{k+1}^{env})$;
S204, based on the sensor information $s_{k+1}^{sensor}$ of this time period and the action $u_k$ of the previous time period, the screened sensor information is obtained using the self-attention model: $s_{k+1}^{att\_sensor} = \sigma_{attention}(s_{k+1}^{sensor}, u_k)$;
S205, combining the above information, the control object infers the action to be performed in the next time period: $u_{k+1} = \sigma_{agent}(s_{k+1}^{att\_sensor}, s_{k+1}^{agent})$;
S206, execute action $u_{k+1}$ and return to S201, collecting each data pairing along the way;
S207, with the collected data pairings as a data set, perform joint gradient descent on the neural networks $\sigma_{attention}$ and $\sigma_{agent}$, take the updated parameters as the new networks, and return to the first step until convergence.
CN202210221958.8A 2022-03-09 2022-03-09 CPS system reinforcement learning control method based on attention mechanism Active CN114527666B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210221958.8A CN114527666B (en) 2022-03-09 2022-03-09 CPS system reinforcement learning control method based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210221958.8A CN114527666B (en) 2022-03-09 2022-03-09 CPS system reinforcement learning control method based on attention mechanism

Publications (2)

Publication Number Publication Date
CN114527666A CN114527666A (en) 2022-05-24
CN114527666B true CN114527666B (en) 2023-08-11

Family

Family ID: 81626389

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210221958.8A Active CN114527666B (en) 2022-03-09 2022-03-09 CPS system reinforcement learning control method based on attention mechanism

Country Status (1)

Country Link
CN (1) CN114527666B (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3827092B2 (en) * 2003-10-22 2006-09-27 オムロン株式会社 Control system setting device, control system setting method, and setting program

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110824917A (en) * 2019-10-29 2020-02-21 西北工业大学 Semiconductor chip test path planning method based on attention mechanism reinforcement learning
CN111881772A (en) * 2020-07-06 2020-11-03 上海交通大学 Multi-mechanical arm cooperative assembly method and system based on deep reinforcement learning
CN113255054A (en) * 2021-03-14 2021-08-13 南京晓庄学院 Reinforcement learning automatic driving method based on heterogeneous fusion characteristics
CN113283169A (en) * 2021-05-24 2021-08-20 北京理工大学 Three-dimensional group exploration method based on multi-head attention asynchronous reinforcement learning
CN113255936A (en) * 2021-05-28 2021-08-13 浙江工业大学 Deep reinforcement learning strategy protection defense method and device based on simulation learning and attention mechanism
CN113313267A (en) * 2021-06-28 2021-08-27 浙江大学 Multi-agent reinforcement learning method based on value decomposition and attention mechanism
CN113641192A (en) * 2021-07-06 2021-11-12 暨南大学 Route planning method for unmanned aerial vehicle crowd sensing task based on reinforcement learning
CN113392935A (en) * 2021-07-09 2021-09-14 浙江工业大学 Multi-agent deep reinforcement learning strategy optimization method based on attention mechanism
CN114038212A (en) * 2021-10-19 2022-02-11 南京航空航天大学 Signal lamp control method based on two-stage attention mechanism and deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yantao Lu et al. "Efficient Human Activity Classification from Egocentric Videos Incorporating Actor-Critic Reinforcement Learning." 2019 IEEE International Conference on Image Processing (ICIP), 2019, full text. *

Also Published As

Publication number Publication date
CN114527666A 2022-05-24

Similar Documents

Publication Publication Date Title
Salmeron et al. Dynamic optimization of fuzzy cognitive maps for time series forecasting
CN114818515A (en) Multidimensional time sequence prediction method based on self-attention mechanism and graph convolution network
CN108683614B (en) Virtual reality equipment cluster bandwidth allocation device based on threshold residual error network
CN111027686A (en) Landslide displacement prediction method, device and equipment
CN113325721B (en) Model-free adaptive control method and system for industrial system
CN109344992B (en) Modeling method for user control behavior habits of smart home integrating time-space factors
CN113077052A (en) Reinforced learning method, device, equipment and medium for sparse reward environment
CN116842856B (en) Industrial process optimization method based on deep reinforcement learning
EP3502978A1 (en) Meta-learning system
CN109615058A (en) A kind of training method of neural network model
Zhou et al. Time-varying trajectory modeling via dynamic governing network for remaining useful life prediction
CN114527666B (en) CPS system reinforcement learning control method based on attention mechanism
CN114781248A (en) Off-line reinforcement learning method and device based on state offset correction
JP7438365B2 (en) Learning utilization system, utilization device, learning device, program and learning utilization method
Kuo et al. Generalized part family formation through fuzzy self-organizing feature map neural network
CN117058235A (en) Visual positioning method crossing various indoor scenes
Mabu et al. Adaptability analysis of genetic network programming with reinforcement learning in dynamically changing environments
CN113743572A (en) Artificial neural network testing method based on fuzzy
CN116798198A (en) Sensor abnormality detection and early warning method based on multivariate time sequence prediction model
Markinos et al. Introducing Fuzzy Cognitive Maps for decision making in precision agriculture
CN116128028A (en) Efficient deep reinforcement learning algorithm for continuous decision space combination optimization
CN115459982A (en) Power network false data injection attack detection method
CN113723757A (en) Decision generation model training method, decision generation method and device
JP2022035737A (en) Control system, control method, control device and program
CN110084358A (en) A kind of smart home multi-Sensor Information Fusion Approach neural network based

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant