CN113050565B - Gate control method and device, electronic device and storage medium - Google Patents

Gate control method and device, electronic device and storage medium

Info

Publication number
CN113050565B
CN113050565B CN202110272117.5A
Authority
CN
China
Prior art keywords
reward
gate
control
training
inputting
Prior art date
Legal status
Active
Application number
CN202110272117.5A
Other languages
Chinese (zh)
Other versions
CN113050565A (en)
Inventor
马曼曼
豆渊博
李青锋
段春青
马丹丹
张卫红
高帆
乔雨
杜东峰
Current Assignee
Hangzhou Innovation Research Institute of Beihang University
Original Assignee
Hangzhou Innovation Research Institute of Beihang University
Priority date
Filing date
Publication date
Application filed by Hangzhou Innovation Research Institute of Beihang University
Priority to CN202110272117.5A
Publication of CN113050565A
Application granted
Publication of CN113050565B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00 - Programme-control systems
    • G05B19/02 - Programme-control systems electric
    • G05B19/418 - Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS], computer integrated manufacturing [CIM]
    • G05B19/41885 - Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS], computer integrated manufacturing [CIM] characterised by modeling, simulation of the manufacturing system
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 - Program-control systems
    • G05B2219/30 - Nc systems
    • G05B2219/32 - Operator till task planning
    • G05B2219/32339 - Object oriented modeling, design, analysis, implementation, simulation language
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 - Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 - Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Abstract

The embodiment of the application provides a gate control method and device, an electronic device and a storage medium, relating to the technical field of gate control. The gate control method comprises the following steps: first, acquiring a state parameter set of a gate to be processed; second, inputting the state parameter set into a preset control model to obtain control data of the gate to be processed; and then controlling the gate to be processed according to the control data. In this way, gate control is carried out directly through the control model. This solves the prior-art problem that an accurate physical model of the canal must be established, yet such models are not always effective due to complex construction, mechanical operation and unpredictable disturbances, so that gate control efficiency is low.

Description

Gate control method and device, electronic device and storage medium
Technical Field
The present disclosure relates to the field of gate technologies, and in particular, to a gate control method and apparatus, an electronic device, and a storage medium.
Background
The inventor has found that the conventional gate control strategy in the prior art requires the establishment of an accurate physical model of the canal, but due to complicated construction, mechanical operation and unpredictable disturbances, these models are not always effective and thus suffer from inefficient gate control.
Disclosure of Invention
In view of the above, the present application aims to provide a gate control method and apparatus, an electronic device, and a storage medium, so as to solve the problems in the prior art.
In order to achieve the above purpose, the embodiment of the present application adopts the following technical solutions:
in a first aspect, the present invention provides a gate control method, including:
acquiring a state parameter set of a gate to be processed;
inputting the state parameter set into a preset control model to obtain control data of the gate to be processed;
and controlling the gate to be processed according to the control data.
In an optional embodiment, the control data includes at least one control combination and an expected value corresponding to each control combination, and the step of controlling the gate to be processed according to the control data includes:
selecting a target control combination from the at least one control combination according to the expected value;
and controlling the gate to be processed according to the target control combination.
In an alternative embodiment, the step of selecting a target control combination from the at least one control combination according to the desired value includes:
and sorting the expected values corresponding to the control combinations, and taking the control combination with the largest expected value as the target control combination.
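The selection rule above is an argmax over expected values. A minimal sketch follows; the data layout (a list of combination/expected-value pairs) is an assumption of this example, not taken from the patent:

```python
def select_target_combination(control_data):
    """Pick the control combination with the largest expected value.

    `control_data` is assumed to be a list of (combination, expected_value)
    pairs as produced by the control model; the structure is illustrative.
    """
    return max(control_data, key=lambda pair: pair[1])[0]

# Three hypothetical control combinations (per-gate opening adjustments).
control_data = [
    ((-0.1, 0.0, 0.1), 0.42),
    ((0.0, 0.0, 0.0), 0.57),
    ((0.1, 0.1, -0.1), 0.31),
]
target = select_target_combination(control_data)  # -> (0.0, 0.0, 0.0)
```

Because only the maximum is needed, a single pass with `max` avoids a full sort.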
In an optional embodiment, the gate control method further includes a step of training an initial model to obtain a preset control model, where the initial model includes a depth network and a reward network, and the step includes:
acquiring training data, wherein the training data comprises a current state parameter set and a current control combination of a training gate;
inputting the current state parameter set into the depth network to obtain the current high-level feature and the next control combination of the training gate, and controlling the training gate according to the next control combination to obtain the next state parameter set;
inputting the current state parameter set, the current high-level characteristics and the current control combination into the reward network to obtain the total reward of the training gate;
and training the initial model according to the total reward to obtain the preset control model.
In an alternative embodiment, the reward network stores a first correspondence between a state parameter set and a safety reward and a speed reward, and a second correspondence between a control combination and an efficiency reward; the reward network includes a reward fully connected layer, and the step of inputting the current state parameter set, the current high-level features and the current control combination into the reward network to obtain the total reward of the training gate includes:
processing according to the current state parameter set and the first correspondence to obtain the safety reward and the speed reward of the training gate;
processing according to the current control combination and the second correspondence to obtain the efficiency reward of the training gate;
and inputting the safety reward, the speed reward, the efficiency reward and the current high-level features into the reward fully connected layer to obtain the total reward of the training gate.
In an alternative embodiment, the step of inputting the safety reward, the speed reward, the efficiency reward and the current high-level features into the reward fully connected layer to obtain the total reward of the training gate includes:
inputting the safety reward, the speed reward, the efficiency reward and the current high-level features into the reward fully connected layer, and processing to obtain an output reward;
and clipping the safety reward, the speed reward, the efficiency reward and the output reward to obtain the total reward of the training gate.
In an optional embodiment, the deep network includes six fully connected layers, and the step of inputting the current state parameter set into the deep network to obtain the current high-level features and the next control combination of the training gate includes:
inputting the current state parameter set into the first fully connected layer to obtain a first result, inputting the first result into the second fully connected layer to obtain a second result, inputting the second result into the third fully connected layer to obtain a third result, inputting the third result into the fourth fully connected layer to obtain a fourth result, and inputting the fourth result into the fifth fully connected layer to obtain the current high-level features of the training gate;
and inputting the current high-level features into the sixth fully connected layer to obtain the next control combination.
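The six-layer forward pass can be sketched with plain NumPy; the layer sizes, random weights and ReLU activations below are illustrative assumptions, since the patent does not specify them:

```python
import numpy as np

rng = np.random.default_rng(0)

def fc(x, w, b):
    # One fully connected layer with ReLU activation (activation choice is an assumption).
    return np.maximum(w @ x + b, 0.0)

state_dim, hidden, n_actions = 12, 32, 27  # illustrative sizes, not from the patent

# Layers 1-5 extract the high-level features; layer 6 maps them to Q-values.
ws = [rng.normal(0, 0.1, (hidden, state_dim))] + \
     [rng.normal(0, 0.1, (hidden, hidden)) for _ in range(4)]
bs = [np.zeros(hidden) for _ in range(5)]
w_out, b_out = rng.normal(0, 0.1, (n_actions, hidden)), np.zeros(n_actions)

state = rng.normal(size=state_dim)   # stand-in for the current state parameter set
x = state
for w, b in zip(ws, bs):
    x = fc(x, w, b)
high_level = x                       # current high-level features (fifth-layer output)
q_values = w_out @ high_level + b_out  # sixth layer: one Q-value per control combination
next_combination = int(np.argmax(q_values))
```

The index of the largest Q-value selects the next control combination, matching the argmax selection described for step S230.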
In a second aspect, the present invention provides a shutter control device including:
the parameter acquisition module is used for acquiring a state parameter set of the gate to be processed;
the control data acquisition module is used for inputting the state parameter set into a preset control model to obtain control data of the gate to be processed;
and the control module is used for controlling the gate to be processed according to the control data.
In a third aspect, the present invention provides an electronic device comprising: the gate control system comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the gate control method of any one of the preceding embodiments.
In a fourth aspect, the present invention provides a storage medium, where the storage medium includes a computer program, and the computer program controls, when running, an electronic device where the storage medium is located to execute the gate control method according to any one of the foregoing embodiments.
According to the gate control method and device, the electronic device and the storage medium, the state parameter set is input into the preset control model to obtain the control data, and the gate is controlled according to the control data, so that gate control is carried out directly through the control model. This solves the prior-art problem that an accurate physical model of the canal must be established, yet such models are not always effective due to complex construction, mechanical operation and unpredictable disturbances, so that gate control efficiency is low.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting the scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
Fig. 1 shows a block diagram of an electronic device according to an embodiment of the present application.
Fig. 2 shows a schematic flow chart of a gate control method according to an embodiment of the present application.
Fig. 3 shows an application scenario diagram of a gate provided in an embodiment of the present application.
Fig. 4 shows a schematic structural diagram of a sensor network provided in an embodiment of the present application.
Fig. 5 shows another schematic flow chart of a gate control method according to an embodiment of the present application.
Fig. 6 shows an application scenario diagram of a preset control model provided in an embodiment of the present application.
Fig. 7 shows a schematic structural diagram of a preset control model provided in an embodiment of the present application.
Fig. 8 illustrates another structural schematic diagram of the preset control model provided in the embodiment of the present application.
Fig. 9 is a block diagram illustrating a structure of a gate control device according to an embodiment of the present application.
Icon: 100-an electronic device; 110 — a first memory; 120-a first processor; 130-a communication module; 900-gate control device; 910-parameter obtaining module; 920-a control data acquisition module; 930-control module.
Detailed Description
Fresh water on Earth is very limited and very unevenly distributed. To increase the availability of water resources, many efforts have been made; one of them is to transport water from water-rich areas to water-deficient areas. There are many large-scale water diversion projects, such as California's water projects in the U.S. and the South-to-North Water Diversion Project in China. These diversion works generally consist of multiple canal pools, with a gate built between adjacent pools to regulate water diversion. To achieve automatic control of the canal pools, a large number of sensors are arranged along them to form an Internet of Things (IoT), and an automatic control strategy is integrated into the gate actuators.
A great deal of research has been carried out on the effective management of canals, and great progress has been made. Existing methods can be broadly classified into model-based and model-free channel gate control methods. Model-based predictive control is a method researchers use to effectively control channels with complex dynamic characteristics; according to the number of controllers used, it can be divided into centralized and distributed control modes. Model-free channel gate control methods include genetic algorithms, based on the concepts of natural selection ("survival of the fittest") and natural evolution, and methods based on reinforcement learning. Reinforcement learning has two important characteristics, trial-and-error search and delayed reward, and is suitable for channel control when a physical model cannot be established.
Common gate control strategies are based on accurate canal physical models, but these models are not always effective due to complex construction, mechanical operation and unpredictable disturbances; moreover, the computational cost of each decision step is non-negligible.
To address these problems, model-free control strategies such as genetic algorithms and reinforcement learning can be adopted to achieve effective control of the waterway, but most of these methods focus on single-objective channel control and have difficulty effectively controlling channels with multiple optimization objectives.
To address at least one of the above technical problems, embodiments of the present application provide a gate control method and apparatus, an electronic device and a storage medium; the technical solutions of the present application are described below through possible implementations.
The defects in the above solutions were identified by the inventor through practice and careful study. Therefore, the discovery of the above problems and the solutions proposed below for them should be regarded as contributions made by the inventor during the invention process.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It is to be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
It should be noted that the features of the embodiments of the present application may be combined with each other without conflict.
Referring to fig. 1, a block diagram of an electronic device 100 according to an embodiment of the present disclosure is shown, where the electronic device 100 in this embodiment may be a server, a processing device, a processing platform, and the like, which are capable of performing data interaction and processing. The electronic device 100 includes a first memory 110, a first processor 120, and a communication module 130. The elements of the first memory 110, the first processor 120 and the communication module 130 are electrically connected to each other directly or indirectly to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines.
The first memory 110 is used for storing programs or data. The first memory 110 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The first processor 120 is used to read/write data or programs stored in the first memory 110 and perform corresponding functions. The communication module 130 is used for establishing a communication connection between the electronic device 100 and another communication terminal through a network, and for transceiving data through the network.
It should be understood that the configuration shown in fig. 1 is merely a schematic diagram of the configuration of the electronic device 100, and that the electronic device 100 may include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.
Referring to fig. 2, a flowchart of a gate control method according to an embodiment of the present disclosure may be executed by the electronic device 100 in fig. 1, for example, by the first processor 120 in the electronic device 100. It should be understood that, in other embodiments, the order of some steps in the gate control method of this embodiment may be interchanged according to actual needs, or some steps may be omitted or deleted. The flow of the shutter control method shown in fig. 2 is described in detail below.
And step S210, acquiring a state parameter set of the gate to be processed.
And S220, inputting the state parameter set into a preset control model to obtain control data of the gate to be processed.
And step S230, controlling the gate to be processed according to the control data.
According to the method, the state parameter set is input into the preset control model to obtain the control data, and the gate is controlled according to the control data, so that gate control is carried out directly through the control model. This solves the prior-art problem that an accurate physical model of the canal must be established, yet such models are not always effective due to complex construction, mechanical operation and unpredictable disturbances, so that gate control efficiency is low.
For step S210, it should be noted that the specific manner of acquiring the state parameter set is not limited, and may be set according to the actual application requirement. For example, in an alternative example, the state parameter set of the gate to be processed may be obtained by manual input. For another example, in another alternative example, the set of state parameters may be acquired over a communicatively connected sensor network.
Referring to fig. 3, a channel is typically made up of a plurality of interconnected canal pools, with adjacent pools connected by gates to control the flow of water from upstream to downstream. Multiple gates operate in coordination to control the volume and speed of the water passing through the cascade of pools. In channel control, the status parameters to be acquired may include o_i (the opening of the i-th gate), h_i^u (the water level upstream of the i-th gate), h_i^d (the water level downstream of the i-th gate), f_i (the flow rate of the i-th gate), and q_{i,j} (the j-th water offtake of the channel downstream of the i-th gate).
The status information of the channel can be collected by arranging a sensor network around the channel; as shown in fig. 4, its structure is pyramid-shaped. At the lowest level of the network, many sensors are deployed around the canal pools and gates, and the information they collect is sent to second-level gateways in different ways (cellular, ZigBee, Wi-Fi, and wired). The sensor interface can be CAN, RS232, RS485 or wireless. At the second level, the intelligent gateways perform a small amount of computation for data pre-processing (e.g., compression, translation, and encryption). The preprocessed data is sent from the gateways to the data center via a telecommunications network or an internal network. Finally, the data stored in the data center is converted into a state parameter set and input into the preset control model.
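As an illustration of that last conversion step, the per-gate readings arriving at the data center might be flattened into a single state parameter set; the field names and values below are hypothetical:

```python
# Hypothetical sensor readings for each gate: opening, upstream level,
# downstream level, and flow rate, as described in the text above.
def gate_state_vector(reading):
    return [reading["opening"], reading["level_up"],
            reading["level_down"], reading["flow"]]

readings = [
    {"opening": 0.45, "level_up": 3.2, "level_down": 2.9, "flow": 18.0},
    {"opening": 0.50, "level_up": 3.1, "level_down": 2.8, "flow": 17.5},
]
# Flatten the per-gate vectors into one state parameter set for the model.
state_parameter_set = [v for r in readings for v in gate_state_vector(r)]
```

The flat vector is what a fully connected input layer of the control model would consume.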
For step S220, it should be noted that, to cope with both the unavailability of a mathematical model and the multiple optimization objectives of the canal gates (safety, rapidity and efficiency), the invention provides a model-free multi-objective optimization method to realize effective control of the channel. The method formulates a control strategy based on a Deep Q-Network (DQN), realizes model-free control through reinforcement learning over a large number of episodes, establishes a reward neural network (R-Network) that learns to approximate an appropriate final reward from the rewards of multiple optimization objectives, and integrates the R-Network into the DQN to construct a deep canal control (RDCC) model for model-free, multi-objective channel control.
Before step S220, the gate control method provided in this embodiment of the present application may further include a step of training an initial model to obtain a preset control model, where the initial model includes a depth network and a reward network, and with reference to fig. 5, the step may include:
in step S240, training data is acquired.
Wherein the training data comprises a current state parameter set and a current control combination of the training gate.
And step S250, inputting the current state parameter set into the depth network to obtain the current high-level feature and the next control combination of the training gate, and controlling the training gate according to the next control combination to obtain the next state parameter set.
And step S260, inputting the current state parameter set, the current high-level characteristics and the current control combination into a reward network to obtain the total reward of the training gate.
And step S270, training the initial model according to the total reward to obtain a preset control model.
In conjunction with fig. 6, from the perspective of reinforcement learning, the cascade channel may be regarded as the environment (Env for short), and the proposed control model RDCC may be regarded as the Agent. During learning, the Agent interacts with Env over multiple episodes, each potentially involving a different number of time steps. At each time step t, the environment generates a state, which may be represented as S_t = (S_1, S_2, ..., S_{N_g}), where S_i is the status of the i-th gate and collects the parameters listed above (opening, upstream and downstream water levels, flow rate, and offtakes). At the same time, a reward is generated that includes several sub-rewards: a safety reward r_t^s, a speed reward r_t^v and an efficiency reward r_t^e.
There has been considerable research on multi-objective reinforcement learning, which can be divided into single-policy and multi-policy methods. The invention designs a reward neural network (R-Network) to approximate the total reward and adopts a single-policy method for multi-objective reinforcement learning, which differs from the traditional single-policy approach of constructing the reward manually. The total reward provided to the Agent at each time step may be expressed as r_t = R(r_t^s, r_t^v, r_t^e, F(S_t)), where R is the approximation function of the R-Network, r_t^s, r_t^v and r_t^e are the three sub-rewards obtained by querying the corresponding tables, and F(S_t) is the high-level feature of the Env state at each time step.
Each set of actions can be represented as A_t = (a_1, a_2, ..., a_{N_g}), where N_g is the number of gates in the cascade channel. The output of the Q-network is the Q-value of each possible set of actions for all gates. The Q-value is the expected reward that the Agent can obtain, from the current time step to the end of the episode, after taking a given set of actions.
For step S250, it should be noted that the specific steps for obtaining the current high-level feature and the next control combination are not limited, and may be set according to the actual application requirements. For example, in an alternative example, where the deep network (Q network) includes six deep fully connected layers, step S250 may include the following sub-steps:
inputting the current state parameter set into the first fully connected layer to obtain a first result, inputting the first result into the second fully connected layer to obtain a second result, inputting the second result into the third fully connected layer to obtain a third result, inputting the third result into the fourth fully connected layer to obtain a fourth result, and inputting the fourth result into the fifth fully connected layer to obtain the current high-level features of the training gate; and inputting the current high-level features into the sixth fully connected layer to obtain the next control combination.
In detail, the method enhances the DQN by adding a reward neural network that fits and approximates the reward value on top of the original deep Q-network, thereby constructing the RDCC model. As shown in fig. 7, at each time step, the set of state parameters of all the training gates is input to the RDCC model. These inputs can be considered raw features; the invention uses several fully connected layers to extract high-level features from them and obtains the expected Q-value of each set of gate actions at the output of the model. At each time step, an epsilon-greedy strategy selects the next control combination according to the maximum Q-value, which is then applied to the training gate to obtain the next state parameter set. Experience replay reduces the correlation between successive gate actions, and an additional target Q-network makes learning more stable.
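The epsilon-greedy selection and experience replay mentioned above can be sketched as follows; the buffer size and transition layout are assumptions of this example:

```python
import random
from collections import deque

random.seed(0)

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a random action index (explore),
    otherwise pick the index of the maximum Q-value (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])

# Experience replay: store (state, action, reward, next_state) transitions
# and sample minibatches to break correlations between successive actions.
replay_buffer = deque(maxlen=10000)

q_values = [0.1, 0.9, 0.3]
action = epsilon_greedy(q_values, epsilon=0.0)  # pure exploitation -> index 1
replay_buffer.append(("s_t", action, 0.5, "s_t1"))
batch = random.sample(replay_buffer, k=1)       # minibatch for a training step
```

In practice epsilon is typically annealed from near 1 toward a small value as training proceeds, shifting from exploration to exploitation.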
For step S260, it should be noted that the specific steps for obtaining the total reward are not limited and may be set according to the actual application requirements. For example, in an alternative example, the reward network (R-Network) stores a first correspondence between a state parameter set and a safety reward and a speed reward, and a second correspondence between a control combination and an efficiency reward; the reward network includes a reward fully connected layer, and step S260 may include the following sub-steps:
processing according to the current state parameter set and the first correspondence to obtain the safety reward and the speed reward of the training gate; processing according to the current control combination and the second correspondence to obtain the efficiency reward of the training gate; and inputting the safety reward, the speed reward, the efficiency reward and the current high-level features into the reward fully connected layer to obtain the total reward of the training gate.
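These sub-steps amount to two table lookups followed by one fully connected layer and a clipping step. A minimal sketch; all table entries, weights and the clipping range are illustrative assumptions, not values from the patent:

```python
# First correspondence: state -> (safety reward, speed reward).
# Second correspondence: control combination -> efficiency reward.
def lookup_safety_speed(state_key, table):
    return table[state_key]

def lookup_efficiency(combo_key, table):
    return table[combo_key]

def reward_fc(inputs, weights, bias):
    # Single fully connected layer combining sub-rewards and the high-level feature.
    return sum(w * x for w, x in zip(weights, inputs)) + bias

first_table = {"near_design_level": (1.0, 0.5)}
second_table = {"small_adjustment": 0.8}

safety, speed = lookup_safety_speed("near_design_level", first_table)
efficiency = lookup_efficiency("small_adjustment", second_table)
high_level_feature = 0.2  # scalar stand-in for F(S_t)

total = reward_fc([safety, speed, efficiency, high_level_feature],
                  weights=[0.4, 0.3, 0.2, 0.1], bias=0.0)
clipped_total = max(-1.0, min(1.0, total))  # clipping range [-1, 1] is an assumption
```

Clipping the combined reward is a common stabilization trick in DQN-style training, which is consistent with the clipping sub-step described in the claims.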
It is noted that effective channel control is achieved byEach time step is realized by sending appropriate switching-off action to the channel. The time step may be a fixed minute, hour or day, which may depend on the specific needs of the channel. The opening of the gate can be adjusted independently, but is actually adjusted and controlled globally according to the overall goal. To simplify this problem, the present invention assumes that all gates are regulated synchronously. a is ai,r,siRespectively representing the motion of the ith gate, the reward generated by the environment, and the state variable of the ith gate at time step t.
The action a_i takes three values, a_i ∈ {+a, −a, 0}, where +a (−a) means the opening of the i-th gate increases (decreases) at rate a, and 0 means no operation. In effective canal control, multiple optimization objectives such as safety, speed and efficiency need to be realized simultaneously.
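Since every gate independently chooses one of the three actions, the joint control combinations form a Cartesian product of size 3^Ng. A small enumeration sketch (function name is illustrative):

```python
from itertools import product


def gate_action_space(n_gates, a):
    """Enumerate every joint control combination for n_gates gates.

    Each gate independently chooses +a (increase opening), -a (decrease
    opening), or 0.0 (no operation), giving 3**n_gates combinations.
    """
    return list(product((+a, -a, 0.0), repeat=n_gates))
```

For example, two gates with rate 0.1 yield 9 combinations, from (0.1, 0.1) down to (0.0, 0.0); this combinatorial growth is why the Q-network outputs one expected value per joint combination.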
Safety: safety means ensuring that the water level stays stable around the design water level. The design water level is a fixed value, related to the characteristics of the canal, determined during canal design and construction; different canals have different design water levels. The invention uses the design water level upstream of the gate, denoted h_d^u, to express it. The actual water level should be neither far above nor far below h_d^u, otherwise the canal risks damage and flooding. Therefore, in the reinforcement learning process, the following constraint must be satisfied (N_g is the number of gates in the cascade pools):

[formula image: water-level constraint around h_d^u for each gate i = 1, …, N_g]
In each time step, if the water level at any gate does not satisfy the above constraint, a failure is triggered and training of the RDCC model is stopped; otherwise each gate generates a reward according to the reward function in Table 1. The total safety reward of all gates at the current time step is expressed as:

[formula image: safety objective, sum of the per-gate safety rewards over the current time step]
According to human experience in operating channels, the gates directly connected to longer upstream and downstream canal pools play a more important role in channel control. Therefore, the present invention divides the rewards into two different levels (the reward of an important gate and the reward of a normal gate) and sets the important gate's reward to be larger in absolute value, i.e., larger than the normal gate's positive reward and more negative than its negative reward. In Table 1, h_j and λr_j are the criteria and rewards of the different levels, h_d^u is the design water level upstream of the gate, and λ is the importance coefficient of the channel. The safety reward criterion is divided into n_s levels, which may be set to appropriate values according to the actual situation. Each reward in Table 1 satisfies:

[formula image: ordering condition on the safety rewards in Table 1]
Table 1: reward table for the safety objective

[table image: leveled safety rewards]
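The leveled safety reward with an importance coefficient can be sketched as below; the thresholds and the λ scaling rule are this description's assumptions, not the patent's actual Table 1 values:

```python
def safety_reward(h, h_design, levels, important, lam=2.0):
    """Leveled safety reward for one gate (illustrative sketch).

    `levels` is a list of (max_deviation, reward) pairs ordered from the
    tightest water-level band to the loosest. An important gate's reward
    is scaled by lam > 1, making it larger in absolute value (more
    positive when positive, more negative when negative), as the text
    requires. Returning None signals a constraint violation (failure).
    """
    dev = abs(h - h_design)
    for max_dev, reward in levels:
        if dev <= max_dev:
            return lam * reward if important else reward
    # outside every band: the safety constraint is violated
    return None
```

A deviation inside the tightest band earns the top reward; larger deviations earn progressively smaller (then negative) rewards until the constraint fails outright.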
High speed: high speed means that the water should be transferred from the first canal pool to the last as quickly as possible. This goal can be achieved by encouraging large flows, i.e., by maximizing the state variable f_min = min f_i, i = 1, 2, …, N_g. In order to obtain a larger f_min, the speed reward of the current time step is generated from the f_min value as shown in Table 2, and these speed rewards satisfy:

[formula image: ordering condition on the speed rewards]

In the table, f_j and r_j respectively represent the criteria and values of the different rewards, where the speed reward criterion is divided into n_q levels.

Table 2: reward table for the speed objective

[table image: leveled speed rewards]
Efficiency: efficiency means performing as few gate operations as possible, that is, each gate stays still as much as possible. The number of regulation operations of the i-th gate is expressed as:

[formula image: per-gate count of non-zero actions over the time window]

The smaller the average number of regulation operations of all gates over a fixed time period, the better; it can be expressed as:

[formula image: average regulation count as the mean of the per-gate counts]

The efficiency objective may also be understood through the reward table shown in Table 3 below, which assigns a larger reward to a smaller average regulation count; each reward in the table satisfies:

[formula image: ordering condition on the efficiency rewards]

where the average is calculated from the per-gate regulation counts of the n_g gates over n_t time steps.

Table 3: reward table for the efficiency objective

[table image: leveled efficiency rewards]
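The average regulation count that drives the efficiency reward can be sketched as follows (function name and data layout are illustrative assumptions):

```python
def average_regulations(actions_per_step):
    """Average number of gate operations per gate over a fixed window;
    smaller is better for the efficiency objective.

    `actions_per_step` is a list of joint actions, one tuple per time
    step; an entry of 0 means that gate was not moved at that step.
    """
    n_gates = len(actions_per_step[0])
    # count every non-zero action across all gates and time steps
    moves = sum(1 for step in actions_per_step for a in step if a != 0)
    return moves / n_gates
```

The efficiency reward table then maps a smaller average to a larger reward, discouraging unnecessary gate movements.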
The first corresponding relationship between the state parameter set and the safety reward and the speed reward can be obtained through reward tables in tables 1 and 2, and the second corresponding relationship between the control combination and the efficiency reward can be obtained through the reward table in table 3.
It should be noted that the specific steps of obtaining the total reward according to the security reward, the speed reward, the efficiency reward and the current advanced features are not limited, and can be set according to the actual application requirements. For example, in one alternative example, the following sub-steps may be included:
inputting the safety reward, speed reward, efficiency reward and current high-level features into the reward fully connected layer and processing to obtain an output reward; and clipping the combined safety, speed, efficiency and output rewards to obtain the total reward of the training gate.
According to the embodiment of the application, three subtask rewards corresponding to the safety, speed and efficiency objectives can be generated from the three reward tables (i.e., the first and second correspondences), and the R-Network designed by the invention approximately fits the final reward R and provides it to the Agent for reinforcement learning. As shown in FIG. 8, the input to the R-Network comprises the current high-level features from the penultimate layer of the Q-Network together with the three sub-rewards, and a fully connected layer in the middle of the R-Network approximates the Agent's total reward. Notably, following the idea of the residual unit, the invention adds the three sub-reward inputs to the output of the fully connected layer, which makes the learning and output of the R-Network more stable at the beginning of training. After the three directly connected sub-rewards are added to the output of the fully connected layer, the final output reward is clipped to [-1, +1] using the tanh activation function, which prevents the Q-value from growing too large and preserves a healthy gradient. To avoid oscillation, a separate target R-network is used to stabilize the parameters learned by the R-network.
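The residual-style reward head described here can be sketched as a single function, assuming the fully connected layer produces a scalar (the name and signature are illustrative, not the patent's actual implementation):

```python
import math


def r_network_output(fc_out, sub_rewards):
    """Residual reward head: the fully connected layer's scalar output is
    summed with the three directly connected sub-rewards (safety, speed,
    efficiency), then squashed into [-1, +1] with tanh so Q-values stay
    bounded and gradients remain well-behaved.
    """
    return math.tanh(fc_out + sum(sub_rewards))
```

Early in training, when the fully connected layer's output is near zero, the residual connection lets the output track the tabulated sub-rewards directly, which is what stabilizes the initial learning phase.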
For step S270, it should be noted that channel control is a sequential decision problem, and the RDCC model can be formulated as a Markov Decision Process (MDP) represented by the 4-tuple (S, A, R, P). The learning algorithm described in the invention learns the Agent's action policy so as to obtain the maximum return. Let Q(s_t) = max_a Q(s_t, a); then the policy can be expressed as π(s_t) = argmax_a Q(s_t, a), meaning that in state s_t the action producing the maximum Q-value is chosen. With R_t the reward obtained by taking action a_t in state s_t, and γ the decay factor of the reward value, the recurrence Q(s_t, a_t) = R_t + γ max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) is obtained.
The function Q(s_t, a_t) may be approximated by a deep Q-network, and the reward value at time t by an R-network. Denoting the parameters of the Q-network and the R-network by θ_Q and θ_R respectively gives:

[formula image: parameterized approximations Q(s, a; θ_Q) and R(s, a; θ_R)]
As shown in fig. 7, the RDCC model comprises a Q-network and an R-network, and a common loss function must be specified to optimize the model's neural network parameters. To make learning more stable, the target networks used for the Q-network and the R-network are denoted Q* and R* respectively, and the loss function can be expressed as:

[formula image: joint loss over the Q-network and R-network against the Q* and R* targets]
where a is the action taken by the Agent in state s, s' is the state returned by the environment after taking action a, and a' is an action that can be taken in state s'. During learning of the RDCC model, the invention stores the Agent's experience as in DQN, randomly selects a mini-batch, and updates the model parameters by back-propagation through minimizing the loss function.
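The regression targets for such a mini-batch update can be sketched as follows, with `q_target` and `r_net` as stand-ins for the frozen Q* target network and the fitted reward network (all names are illustrative, not the patent's actual API):

```python
def td_targets(batch, q_target, r_net, gamma):
    """Compute DQN regression targets y = R(s, a) + gamma * max_a' Q*(s', a')
    for each stored transition, per the recurrence above.

    `batch` holds (s, a, s') transitions; q_target(s) returns a list of
    Q-values over the joint action combinations, and r_net(s, a) returns
    the fitted reward for taking action a in state s.
    """
    targets = []
    for s, a, s_next in batch:
        y = r_net(s, a) + gamma * max(q_target(s_next))
        targets.append(y)
    return targets
```

The Q-network's prediction Q(s, a; θ_Q) is then regressed toward these targets, while the target networks' parameters are updated only periodically to avoid oscillation.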
For step S230, it should be noted that the specific step of controlling the gate to be processed is not limited, and may be set according to the actual application requirement. For example, in an alternative example where the control data includes at least one control combination and a desired value corresponding to each control combination, step S230 may include the sub-steps of:
selecting a target control combination from at least one control combination according to the expected value; and controlling the gate to be processed according to the target control combination.
Optionally, the specific step of selecting the target control combination is not limited, and may be set according to the actual application requirements. For example, in an alternative example, the expected values corresponding to the respective control combinations may be sorted, and the control combination with the highest expected value may be directly used as the target control combination.
For another example, in another alternative example, a greedy-epsilon strategy may be used to select a target desired value from desired values corresponding to respective control combinations, and the control combination of the target desired values may be used as the target control combination.
It should be noted that DNN is used in the hidden layers when constructing the RDCC model in step 3, but other neural network models (such as DNN + window, LSTM, etc.) could also implement the technical solution of the invention. By this method, the multi-objective channel control problem can be solved: DQN makes the control strategy, model-free control is realized through extensive reinforcement learning, and the problem that a mathematical model cannot be used in a complex environment is overcome. By realizing the reward neural network and the multi-objective reward function, the invention addresses the multi-objective control problems of safety, speed and efficiency, and obtains satisfactory performance in terms of water-level difference, flow integral, number of gate operations, and other aspects alike.
With reference to fig. 9, an embodiment of the present application further provides a gate control device 900, where the functions implemented by the gate control device 900 correspond to the steps executed by the method described above. The gate control device 900 may be understood as a processor of the electronic apparatus 100, or as a component, independent of the electronic apparatus 100 or the processor, that implements the functions of the present application under the control of the electronic apparatus 100. The gate control device 900 may include a parameter obtaining module 910, a control data obtaining module 920, and a control module 930.
The parameter obtaining module 910 is configured to obtain a state parameter set of the gate to be processed. In this embodiment of the application, the parameter obtaining module 910 may be configured to perform step S210 shown in fig. 2; for the relevant contents of the parameter obtaining module 910, reference may be made to the foregoing description of step S210.
The control data obtaining module 920 is configured to input the state parameter set into a preset control model to obtain control data of the gate to be processed. In this embodiment of the application, the control data obtaining module 920 may be configured to perform step S220 shown in fig. 2; for the relevant contents of the control data obtaining module 920, reference may be made to the foregoing description of step S220.
The control module 930 is configured to control the gate to be processed according to the control data. In this embodiment of the application, the control module 930 may be configured to perform step S230 shown in fig. 2; for the relevant contents of the control module 930, reference may be made to the foregoing description of step S230.
In addition, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and the computer program is executed by a processor to perform the steps of the gate control method.
The computer program product of the gate control method provided in the embodiment of the present application includes a computer readable storage medium storing a program code, where instructions included in the program code may be used to execute steps of the gate control method in the foregoing method embodiment, which may be referred to specifically in the foregoing method embodiment, and details are not described here again.
In summary, the gate control method and apparatus, electronic device, and storage medium provided in the embodiments of the present application obtain control data by inputting the state parameter set into a preset control model and control the gate according to that control data, so that the gate is controlled directly through the control model. This avoids the prior-art need to establish an accurate physical model of the canal; such models, owing to complicated construction, mechanical operation and unpredictable disturbances, are not always effective, which makes gate control inefficient.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (8)

1. A gate control method, comprising:
acquiring a state parameter set of a gate to be processed;
inputting the state parameter set into a preset control model to obtain control data of the gate to be processed;
controlling the gate to be processed according to the control data;
the gate control method further comprises the step of training an initial model to obtain the preset control model, wherein the initial model comprises a deep network and a reward network, and the step comprises:
acquiring training data, wherein the training data comprises a current state parameter set and a current control combination of a training gate;
inputting the current state parameter set into the deep network to obtain the current high-level features and the next control combination of the training gate, and controlling the training gate according to the next control combination to obtain the next state parameter set;
inputting the current state parameter set, the current high-level features and the current control combination into the reward network to obtain the total reward of the training gate;
training the initial model according to the total reward to obtain the preset control model;
the reward network stores a first correspondence between a state parameter set and a safety reward and a speed reward, and a second correspondence between a control combination and an efficiency reward; the reward network comprises a reward fully connected layer, and the step of inputting the current state parameter set, the current high-level features and the current control combination into the reward network to obtain the total reward of the training gate comprises:
processing according to the current state parameter set and the first corresponding relation to obtain the safety reward and the speed reward of the training gate;
processing according to the current control combination and the second corresponding relation to obtain the efficiency reward of the training gate;
and inputting the safety reward, the speed reward, the efficiency reward and the current high-level features into the reward fully connected layer to obtain the total reward of the training gate.
2. The gate control method according to claim 1, wherein the control data includes at least one control combination and a desired value corresponding to each of the control combinations, and the step of controlling the gate to be processed according to the control data includes:
selecting a target control combination from the at least one control combination according to the expected value;
and controlling the gate to be processed according to the target control combination.
3. The gate control method of claim 2, wherein said step of selecting a target control combination from said at least one control combination based on said desired value comprises:
and sequencing the expected values corresponding to the control combinations, and taking the control combination with the maximum expected value as a target control combination.
4. The gate control method of claim 1, wherein the step of inputting the safety reward, speed reward, efficiency reward, and current high-level features into the reward fully connected layer to obtain the total reward of the training gate comprises:
inputting the safety reward, the speed reward, the efficiency reward and the current high-level features into the reward fully connected layer and processing to obtain an output reward;
and clipping the safety reward, the speed reward, the efficiency reward and the output reward to obtain the total reward of the training gate.
5. The gate control method of claim 1, wherein the deep network comprises six deep fully-connected layers, and the step of inputting the current set of state parameters into the deep network to obtain a current high-level feature and a next control combination of the training gate comprises:
inputting the current state parameter set into a first deep fully connected layer to obtain a first result, inputting the first result into a second deep fully connected layer to obtain a second result, inputting the second result into a third deep fully connected layer to obtain a third result, inputting the third result into a fourth deep fully connected layer to obtain a fourth result, and inputting the fourth result into a fifth deep fully connected layer to obtain the current high-level features of the training gate;
and inputting the current high-level features into a sixth deep fully connected layer to obtain the next control combination.
6. A gate control device, comprising:
the parameter acquisition module is used for acquiring a state parameter set of the gate to be processed;
the control data acquisition module is used for inputting the state parameter set into a preset control model to obtain control data of the gate to be processed;
the control module is used for controlling the gate to be processed according to the control data;
the device also comprises a training module used for training the initial model to obtain a preset control model;
the initial model comprises a deep network and a reward network, and the training module is further configured to:
acquiring training data, wherein the training data comprises a current state parameter set and a current control combination of a training gate;
inputting the current state parameter set into the deep network to obtain the current high-level features and the next control combination of the training gate, and controlling the training gate according to the next control combination to obtain the next state parameter set;
inputting the current state parameter set, the current high-level features and the current control combination into the reward network to obtain the total reward of the training gate;
training the initial model according to the total reward to obtain the preset control model;
the reward network stores a first correspondence between a state parameter set and a safety reward and a speed reward, and a second correspondence between a control combination and an efficiency reward; the reward network comprises a reward fully connected layer, and the training module is further configured to:
processing according to the current state parameter set and the first corresponding relation to obtain the safety reward and the speed reward of the training gate;
processing according to the current control combination and the second corresponding relation to obtain the efficiency reward of the training gate;
and inputting the safety reward, the speed reward, the efficiency reward and the current high-level features into the reward fully connected layer to obtain the total reward of the training gate.
7. An electronic device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the gate control method of any one of claims 1 to 5 when executing the program.
8. A storage medium, characterized in that the storage medium includes a computer program, and the computer program controls an electronic device where the storage medium is located to execute the gate control method according to any one of claims 1 to 5 when executed.
CN202110272117.5A 2021-03-12 2021-03-12 Gate control method and device, electronic device and storage medium Active CN113050565B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110272117.5A CN113050565B (en) 2021-03-12 2021-03-12 Gate control method and device, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN113050565A CN113050565A (en) 2021-06-29
CN113050565B true CN113050565B (en) 2022-05-20

Family

ID=76512420

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110272117.5A Active CN113050565B (en) 2021-03-12 2021-03-12 Gate control method and device, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN113050565B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6311949B1 (en) * 1997-12-11 2001-11-06 Akio Iida Apparatus for operating water gate
CN104880985A (en) * 2015-05-28 2015-09-02 广州番禺职业技术学院 Internet of things sluice remote detection control system
JP2016199930A (en) * 2015-04-13 2016-12-01 株式会社日立製作所 Gate door control system
CN108459570A (en) * 2018-03-14 2018-08-28 河海大学常州校区 Based on the irrigation water distribution intelligence control system and method for generating the confrontation network architecture
CN110262218A (en) * 2019-05-20 2019-09-20 北京航空航天大学 Control method, device, equipment and the storage medium of machine fish
JP2020092490A (en) * 2018-12-03 2020-06-11 富士通株式会社 Reinforcement learning program, reinforcement learning method, and reinforcement learning device

Family Cites Families (10)

Publication number Priority date Publication date Assignee Title
US9679258B2 (en) * 2013-10-08 2017-06-13 Google Inc. Methods and apparatus for reinforcement learning
US11774944B2 (en) * 2016-05-09 2023-10-03 Strong Force Iot Portfolio 2016, Llc Methods and systems for the industrial internet of things
US10619760B2 (en) * 2016-10-24 2020-04-14 Fisher Controls International Llc Time-series analytics for control valve health assessment
WO2018211139A1 (en) * 2017-05-19 2018-11-22 Deepmind Technologies Limited Training action selection neural networks using a differentiable credit function
CN108303896A (en) * 2018-02-28 2018-07-20 武汉理工大学 Ditch shutter intelligent control method and device
JP2020119008A (en) * 2019-01-18 2020-08-06 富士通株式会社 Reinforcement learning method, reinforcement learning program, and reinforcement learning apparatus
CN110053053B (en) * 2019-06-14 2022-04-12 西南科技大学 Self-adaptive method of mechanical arm screwing valve based on deep reinforcement learning
CN111191399B (en) * 2019-12-24 2021-11-05 北京航空航天大学 Control method, device and equipment of robot fish and storage medium
CN111783369B (en) * 2020-07-22 2024-01-26 中国水利水电科学研究院 Short-term multi-objective optimal scheduling method for multi-gate-group open channel water diversion project
CN112241123B (en) * 2020-10-23 2022-05-03 南京航空航天大学 Aeroengine acceleration control method based on deep reinforcement learning

Non-Patent Citations (5)

Title
A Survey of Routing Protocols for Underwater Wireless Sensor Networks; Junhai Luo et al.; IEEE Communications Surveys & Tutorials; 2021-01-01 *
Research on Error Correction and Mutual Control in Railway Station Sections to Ensure Railway Transportation Safety; Fu Yanbo; Journal of Liaoning University of Technology (Social Science Edition); 2020-10-15 (No. 05) *
Monocular-Vision Autonomous Driving Decision-Making System Based on Deep Reinforcement Learning; Yang Mingzhu; Automation Panorama; 2020-05-15 (No. 05) *
A Survey of Research on the Sparse Reward Problem in Deep Reinforcement Learning; Yang Weiyi et al.; Computer Science; 2019-11-22 (No. 03) *
Research on Gate Regulation Speed and Water Surface Profile Variation in Water Conveyance Channels; Ding Zhiliang et al.; South-to-North Water Transfers and Water Science & Technology; 2005-12-30 *


Similar Documents

Publication Publication Date Title
Amjady et al. Daily hydrothermal generation scheduling by a new modified adaptive particle swarm optimization technique
US8260441B2 (en) Method for computer-supported control and/or regulation of a technical system
Bernardelli et al. Real-time model predictive control of a wastewater treatment plant based on machine learning
Tyukin et al. Feasibility of random basis function approximators for modeling and control
CN107590567A (en) A kind of Recognition with Recurrent Neural Network short-term load forecasting method based on comentropy cluster and notice mechanism
Espitia Observer-based event-triggered boundary control of a linear 2× 2 hyperbolic systems
Behandish et al. Concurrent pump scheduling and storage level optimization using meta-models and evolutionary algorithms
CN104900063B (en) A kind of short distance running time Forecasting Methodology
CN106200381B (en) A method of according to the operation of processing water control by stages water factory
CN107038878A (en) Signal phase design method based on integer programming model
CN113050565B (en) Gate control method and device, electronic device and storage medium
CN107643684B (en) Valve flow function optimization method and device
CN102566426A (en) Fractional order parameter adjustment controller algorithm of PI<alpha>D<beta> controller
Le Ngo Optimising reservoir operation: A case study of the Hoa Binh reservoir, Vietnam
CN104112035A (en) Effectiveness and fuzzy theory based collaborative decision-making method for product design
CN116880191A (en) Intelligent control method of process industrial production system based on time sequence prediction
Bergman et al. Adjusting parameters of genetic algorithms by fuzzy control rules
CN108459570A (en) Based on the irrigation water distribution intelligence control system and method for generating the confrontation network architecture
Blanco et al. Flooding prevention of the demer river using model predictive control
Hoang et al. Dissolved oxygen control of the activated sludge wastewater treatment process using Hedge Algebraic control
CN113341768B (en) Gate regulating method, device, equipment and medium
CN111062485A (en) Novel AUTOML frame
Salama et al. Short term optimal generation scheduling of fixed head hydrothermal system using genetic algorithm and constriction factor based particle swarm optimization technique
Rosenbloom et al. An architectural integration of Temporal Motivation Theory for decision making
CN116436013B (en) Power distribution system power distribution method, system, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant