CN113050565B - Gate control method and device, electronic device and storage medium - Google Patents

Gate control method and device, electronic device and storage medium

Info

Publication number
CN113050565B
CN113050565B CN202110272117.5A
Authority
CN
China
Prior art keywords
reward
gate
control
training
inputting
Prior art date
Legal status
Active
Application number
CN202110272117.5A
Other languages
Chinese (zh)
Other versions
CN113050565A (en)
Inventor
马曼曼
豆渊博
李青锋
段春青
马丹丹
张卫红
高帆
乔雨
杜东峰
Current Assignee
Hangzhou Innovation Research Institute of Beihang University
Original Assignee
Hangzhou Innovation Research Institute of Beihang University
Priority date
Filing date
Publication date
Application filed by Hangzhou Innovation Research Institute of Beihang University
Priority to CN202110272117.5A
Publication of CN113050565A
Application granted
Publication of CN113050565B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00 - Programme-control systems
    • G05B19/02 - Programme-control systems electric
    • G05B19/418 - Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS], computer integrated manufacturing [CIM]
    • G05B19/41885 - Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS], computer integrated manufacturing [CIM] characterised by modeling, simulation of the manufacturing system
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 - Program-control systems
    • G05B2219/30 - Nc systems
    • G05B2219/32 - Operator till task planning
    • G05B2219/32339 - Object oriented modeling, design, analysis, implementation, simulation language
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 - Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 - Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Abstract

The embodiment of the application provides a gate control method and device, an electronic device and a storage medium, relating to the technical field of gate control. The gate control method comprises the following steps: first, acquiring a state parameter set of a gate to be processed; second, inputting the state parameter set into a preset control model to obtain control data of the gate to be processed; and then controlling the gate to be processed according to the control data. In this way, gate control is carried out directly through the control model. This solves the prior-art problem that an accurate physical model of the canal must be established, yet such models are not always effective due to complex construction, mechanical operation and unpredictable disturbances, so that gate control efficiency is low.

Description

Gate control method and device, electronic device and storage medium
Technical Field
The present disclosure relates to the field of gate technologies, and in particular, to a gate control method and apparatus, an electronic device, and a storage medium.
Background
The inventor has found that the conventional gate control strategy in the prior art requires the establishment of an accurate physical model of the canal, but due to complicated construction, mechanical operation and unpredictable disturbances, these models are not always effective and thus suffer from inefficient gate control.
Disclosure of Invention
In view of the above, the present application aims to provide a gate control method and apparatus, an electronic device, and a storage medium, so as to solve the problems in the prior art.
In order to achieve the above purpose, the embodiment of the present application adopts the following technical solutions:
in a first aspect, the present invention provides a gate control method, including:
acquiring a state parameter set of a gate to be processed;
inputting the state parameter set into a preset control model to obtain control data of the gate to be processed;
and controlling the gate to be processed according to the control data.
In an optional embodiment, the control data includes at least one control combination and an expected value corresponding to each control combination, and the step of controlling the gate to be processed according to the control data includes:
selecting a target control combination from the at least one control combination according to the expected value;
and controlling the gate to be processed according to the target control combination.
In an alternative embodiment, the step of selecting a target control combination from the at least one control combination according to the desired value includes:
and sorting the expected values corresponding to the control combinations, and taking the control combination with the largest expected value as the target control combination.
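The selection rule above is an argmax over expected values. A minimal sketch follows; the data layout (a list of combination/expected-value pairs) is an assumption of this example, not taken from the patent:

```python
def select_target_combination(control_data):
    """Pick the control combination with the largest expected value.

    `control_data` is assumed to be a list of (combination, expected_value)
    pairs as produced by the control model; the structure is illustrative.
    """
    return max(control_data, key=lambda pair: pair[1])[0]

# Three hypothetical control combinations (per-gate opening adjustments).
control_data = [
    ((-0.1, 0.0, 0.1), 0.42),
    ((0.0, 0.0, 0.0), 0.57),
    ((0.1, 0.1, -0.1), 0.31),
]
target = select_target_combination(control_data)  # -> (0.0, 0.0, 0.0)
```

Because only the maximum is needed, a single pass with `max` avoids a full sort.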
In an optional embodiment, the gate control method further includes a step of training an initial model to obtain a preset control model, where the initial model includes a depth network and a reward network, and the step includes:
acquiring training data, wherein the training data comprises a current state parameter set and a current control combination of a training gate;
inputting the current state parameter set into the depth network to obtain the current high-level feature and the next control combination of the training gate, and controlling the training gate according to the next control combination to obtain the next state parameter set;
inputting the current state parameter set, the current high-level characteristics and the current control combination into the reward network to obtain the total reward of the training gate;
and training the initial model according to the total reward to obtain the preset control model.
In an alternative embodiment, the reward network stores a first correspondence between a state parameter set and a safety reward and a speed reward, and a second correspondence between a control combination and an efficiency reward; the reward network includes a reward fully connected layer, and the step of inputting the current state parameter set, the current high-level features and the current control combination into the reward network to obtain the total reward of the training gate includes:
processing according to the current state parameter set and the first correspondence to obtain the safety reward and the speed reward of the training gate;
processing according to the current control combination and the second correspondence to obtain the efficiency reward of the training gate;
and inputting the safety reward, the speed reward, the efficiency reward and the current high-level features into the reward fully connected layer to obtain the total reward of the training gate.
In an alternative embodiment, the step of inputting the safety reward, the speed reward, the efficiency reward and the current high-level features into the reward fully connected layer to obtain the total reward of the training gate includes:
inputting the safety reward, the speed reward, the efficiency reward and the current high-level features into the reward fully connected layer, and processing to obtain an output reward;
and clipping the safety reward, the speed reward, the efficiency reward and the output reward to obtain the total reward of the training gate.
In an optional embodiment, the deep network includes six fully connected layers, and the step of inputting the current state parameter set into the deep network to obtain the current high-level features and the next control combination of the training gate includes:
inputting the current state parameter set into the first fully connected layer to obtain a first result, inputting the first result into the second fully connected layer to obtain a second result, inputting the second result into the third fully connected layer to obtain a third result, inputting the third result into the fourth fully connected layer to obtain a fourth result, and inputting the fourth result into the fifth fully connected layer to obtain the current high-level features of the training gate;
and inputting the current high-level features into the sixth fully connected layer to obtain the next control combination.
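The six-layer forward pass can be sketched with plain NumPy; the layer sizes, random weights and ReLU activations below are illustrative assumptions, since the patent does not specify them:

```python
import numpy as np

rng = np.random.default_rng(0)

def fc(x, w, b):
    # One fully connected layer with ReLU activation (activation choice is an assumption).
    return np.maximum(w @ x + b, 0.0)

state_dim, hidden, n_actions = 12, 32, 27  # illustrative sizes, not from the patent

# Layers 1-5 extract the high-level features; layer 6 maps them to Q-values.
ws = [rng.normal(0, 0.1, (hidden, state_dim))] + \
     [rng.normal(0, 0.1, (hidden, hidden)) for _ in range(4)]
bs = [np.zeros(hidden) for _ in range(5)]
w_out, b_out = rng.normal(0, 0.1, (n_actions, hidden)), np.zeros(n_actions)

state = rng.normal(size=state_dim)   # stand-in for the current state parameter set
x = state
for w, b in zip(ws, bs):
    x = fc(x, w, b)
high_level = x                       # current high-level features (fifth-layer output)
q_values = w_out @ high_level + b_out  # sixth layer: one Q-value per control combination
next_combination = int(np.argmax(q_values))
```

The index of the largest Q-value selects the next control combination, matching the argmax selection described for step S230.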
In a second aspect, the present invention provides a shutter control device including:
the parameter acquisition module is used for acquiring a state parameter set of the gate to be processed;
the control data acquisition module is used for inputting the state parameter set into a preset control model to obtain control data of the gate to be processed;
and the control module is used for controlling the gate to be processed according to the control data.
In a third aspect, the present invention provides an electronic device comprising: the gate control system comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the gate control method of any one of the preceding embodiments.
In a fourth aspect, the present invention provides a storage medium, where the storage medium includes a computer program, and the computer program controls, when running, an electronic device where the storage medium is located to execute the gate control method according to any one of the foregoing embodiments.
According to the gate control method and device, the electronic device and the storage medium, the state parameter set is input into the preset control model to obtain the control data, and the gate is controlled according to the control data, so that gate control is carried out directly through the control model. This solves the prior-art problem that an accurate physical model of the canal must be established, yet such models are not always effective due to complex construction, mechanical operation and unpredictable disturbances, so that gate control efficiency is low.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting the scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
Fig. 1 shows a block diagram of an electronic device according to an embodiment of the present application.
Fig. 2 shows a schematic flow chart of a gate control method according to an embodiment of the present application.
Fig. 3 shows an application scenario diagram of a gate provided in an embodiment of the present application.
Fig. 4 shows a schematic structural diagram of a sensor network provided in an embodiment of the present application.
Fig. 5 shows another schematic flow chart of a gate control method according to an embodiment of the present application.
Fig. 6 shows an application scenario diagram of a preset control model provided in an embodiment of the present application.
Fig. 7 shows a schematic structural diagram of a preset control model provided in an embodiment of the present application.
Fig. 8 illustrates another structural schematic diagram of the preset control model provided in the embodiment of the present application.
Fig. 9 is a block diagram illustrating a structure of a gate control device according to an embodiment of the present application.
Icon: 100-an electronic device; 110 — a first memory; 120-a first processor; 130-a communication module; 900-gate control device; 910-parameter obtaining module; 920-a control data acquisition module; 930-control module.
Detailed Description
Fresh water on Earth is very limited and very unevenly distributed. To increase the availability of water resources, many efforts have been made; one of them is to transport water from water-rich areas to water-deficient areas. There are many large-scale water diversion projects, such as California's water projects in the U.S. and the South-to-North Water Diversion Project in China. These diversion works generally consist of multiple canal pools, with a gate built between adjacent pools to regulate water diversion. To achieve automatic control of the canal pools, a large number of sensors are arranged along them to form an Internet of Things (IoT), and an automatic control strategy is integrated into the gate actuators.
A great deal of research has been carried out on the effective management of canals, and great progress has been made. Existing methods can be broadly classified into model-based and model-free channel gate control methods. Model-based predictive control is a method researchers use to effectively control channels with complex dynamic characteristics; according to the number of controllers used, it can be divided into centralized and distributed control modes. Model-free channel gate control methods include genetic algorithms, based on the concepts of natural selection ("survival of the fittest") and natural evolution, and methods based on reinforcement learning. Reinforcement learning has two important characteristics, trial-and-error search and delayed reward, and is suitable for channel control when a physical model cannot be established.
Common gate control strategies are based on accurate canal physical models, but these models are not always effective due to complex construction, mechanical operation and unpredictable disturbances; moreover, the computational cost of each decision step is non-negligible.
To address these problems, model-free control strategies such as genetic algorithms and reinforcement learning can be adopted to achieve effective control of the waterway, but most of these methods focus on single-objective channel control and have difficulty effectively controlling channels with multiple optimization objectives.
To address at least one of the above technical problems, embodiments of the present application provide a gate control method and apparatus, an electronic device and a storage medium; the technical solutions of the present application are described below through possible implementations.
The defects in the above solutions were identified by the inventor through practice and careful study. Therefore, the discovery of the above problems and the solutions proposed below for them should be regarded as contributions made by the inventor during the invention process.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It is to be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
It should be noted that the features of the embodiments of the present application may be combined with each other without conflict.
Referring to fig. 1, a block diagram of an electronic device 100 according to an embodiment of the present disclosure is shown, where the electronic device 100 in this embodiment may be a server, a processing device, a processing platform, and the like, which are capable of performing data interaction and processing. The electronic device 100 includes a first memory 110, a first processor 120, and a communication module 130. The elements of the first memory 110, the first processor 120 and the communication module 130 are electrically connected to each other directly or indirectly to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines.
The first memory 110 is used for storing programs or data. The first memory 110 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The first processor 120 is used to read/write data or programs stored in the first memory 110 and perform corresponding functions. The communication module 130 is used for establishing a communication connection between the electronic device 100 and another communication terminal through a network, and for transceiving data through the network.
It should be understood that the configuration shown in fig. 1 is merely a schematic diagram of the configuration of the electronic device 100, and that the electronic device 100 may include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.
Referring to fig. 2, a flowchart of a gate control method according to an embodiment of the present disclosure may be executed by the electronic device 100 in fig. 1, for example, by the first processor 120 in the electronic device 100. It should be understood that, in other embodiments, the order of some steps in the gate control method of this embodiment may be interchanged according to actual needs, or some steps may be omitted or deleted. The flow of the shutter control method shown in fig. 2 is described in detail below.
And step S210, acquiring a state parameter set of the gate to be processed.
And S220, inputting the state parameter set into a preset control model to obtain control data of the gate to be processed.
And step S230, controlling the gate to be processed according to the control data.
According to the method, the state parameter set is input into the preset control model to obtain the control data, and the gate is controlled according to the control data, so that gate control is carried out directly through the control model. This solves the prior-art problem that an accurate physical model of the canal must be established, yet such models are not always effective due to complex construction, mechanical operation and unpredictable disturbances, so that gate control efficiency is low.
For step S210, it should be noted that the specific manner of acquiring the state parameter set is not limited, and may be set according to the actual application requirement. For example, in an alternative example, the state parameter set of the gate to be processed may be obtained by manual input. For another example, in another alternative example, the set of state parameters may be acquired over a communicatively connected sensor network.
Referring to fig. 3, a channel is typically made up of a plurality of interconnected canal pools, with adjacent pools connected by gates to control the flow of water from upstream to downstream. Multiple gates operate in coordination to control the volume and speed of the water passing through the cascade of pools. In channel control, the status parameters to be acquired may include o_i (the opening of the i-th gate), h_i^u (the water level upstream of the i-th gate), h_i^d (the water level downstream of the i-th gate), f_i (the flow rate of the i-th gate), and q_{i,j} (the j-th water offtake of the channel downstream of the i-th gate).
The status information of the channel can be collected by arranging a sensor network around the channel; as shown in fig. 4, its structure is pyramid-shaped. At the lowest level of the network, many sensors are deployed around the canal pools and gates, and the information they collect is sent to second-level gateways in different ways (cellular, ZigBee, Wi-Fi, and wired). The sensor interface can be CAN, RS232, RS485 or wireless. At the second level, the intelligent gateways perform a small amount of computation for data pre-processing (e.g., compression, translation, and encryption). The preprocessed data is sent from the gateways to the data center via a telecommunications network or an internal network. Finally, the data stored in the data center is converted into a state parameter set and input into the preset control model.
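As an illustration of that last conversion step, the per-gate readings arriving at the data center might be flattened into a single state parameter set; the field names and values below are hypothetical:

```python
# Hypothetical sensor readings for each gate: opening, upstream level,
# downstream level, and flow rate, as described in the text above.
def gate_state_vector(reading):
    return [reading["opening"], reading["level_up"],
            reading["level_down"], reading["flow"]]

readings = [
    {"opening": 0.45, "level_up": 3.2, "level_down": 2.9, "flow": 18.0},
    {"opening": 0.50, "level_up": 3.1, "level_down": 2.8, "flow": 17.5},
]
# Flatten the per-gate vectors into one state parameter set for the model.
state_parameter_set = [v for r in readings for v in gate_state_vector(r)]
```

The flat vector is what a fully connected input layer of the control model would consume.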
For step S220, it should be noted that, to cope with both the unavailability of a mathematical model and the multiple optimization objectives of the canal gates (safety, rapidity and efficiency), the invention provides a model-free multi-objective optimization method to realize effective control of the channel. The method formulates a control strategy based on a Deep Q-Network (DQN), realizes model-free control through reinforcement learning over a large number of episodes, establishes a reward neural network (R-Network) that learns to approximate an appropriate final reward from the rewards of multiple optimization objectives, and integrates the R-Network into the DQN to construct a deep canal control (RDCC) model for model-free, multi-objective channel control.
Before step S220, the gate control method provided in this embodiment of the present application may further include a step of training an initial model to obtain a preset control model, where the initial model includes a depth network and a reward network, and with reference to fig. 5, the step may include:
in step S240, training data is acquired.
Wherein the training data comprises a current state parameter set and a current control combination of the training gate.
And step S250, inputting the current state parameter set into the depth network to obtain the current high-level feature and the next control combination of the training gate, and controlling the training gate according to the next control combination to obtain the next state parameter set.
And step S260, inputting the current state parameter set, the current high-level characteristics and the current control combination into a reward network to obtain the total reward of the training gate.
And step S270, training the initial model according to the total reward to obtain a preset control model.
In conjunction with fig. 6, from the perspective of reinforcement learning, the cascade channel may be regarded as the environment (Env for short), and the proposed control model RDCC may be regarded as the Agent. During learning, the Agent interacts with Env over multiple episodes, each potentially involving a different number of time steps. At each time step t, the environment generates a state, which may be represented as S_t = (S_1, S_2, ..., S_{N_g}), where S_i is the status of the i-th gate and collects the parameters listed above (opening, upstream and downstream water levels, flow rate, and offtakes). At the same time, a reward is generated that includes several sub-rewards: a safety reward r_t^s, a speed reward r_t^v and an efficiency reward r_t^e.
There has been considerable research on multi-objective reinforcement learning, which can be divided into single-policy and multi-policy methods. The invention designs a reward neural network (R-Network) to approximate the total reward and adopts a single-policy method for multi-objective reinforcement learning, which differs from the traditional single-policy approach of constructing the reward manually. The total reward provided to the Agent at each time step may be expressed as r_t = R(r_t^s, r_t^v, r_t^e, F(S_t)), where R is the approximation function of the R-Network, r_t^s, r_t^v and r_t^e are the three sub-rewards obtained by querying the corresponding tables, and F(S_t) is the high-level feature of the Env state at each time step.
Each set of actions can be represented as A_t = (a_1, a_2, ..., a_{N_g}), where N_g is the number of gates in the cascade channel. The output of the Q-network is the Q-value of each possible set of actions for all gates. The Q-value is the expected reward that the Agent can obtain, from the current time step to the end of the episode, after taking a given set of actions.
For step S250, it should be noted that the specific steps for obtaining the current high-level feature and the next control combination are not limited, and may be set according to the actual application requirements. For example, in an alternative example, where the deep network (Q network) includes six deep fully connected layers, step S250 may include the following sub-steps:
inputting the current state parameter set into the first fully connected layer to obtain a first result, inputting the first result into the second fully connected layer to obtain a second result, inputting the second result into the third fully connected layer to obtain a third result, inputting the third result into the fourth fully connected layer to obtain a fourth result, and inputting the fourth result into the fifth fully connected layer to obtain the current high-level features of the training gate; and inputting the current high-level features into the sixth fully connected layer to obtain the next control combination.
In detail, the method enhances the DQN by adding a reward neural network that fits and approximates the reward value on top of the original deep Q-network, thereby constructing the RDCC model. As shown in fig. 7, at each time step, the set of state parameters of all the training gates is input to the RDCC model. These inputs can be considered raw features; the invention uses several fully connected layers to extract high-level features from them and obtains the expected Q-value of each set of gate actions at the output of the model. At each time step, an epsilon-greedy strategy selects the next control combination according to the maximum Q-value, which is then applied to the training gate to obtain the next state parameter set. Experience replay reduces the correlation between successive gate actions, and an additional target Q-network makes learning more stable.
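The epsilon-greedy selection and experience replay mentioned above can be sketched as follows; the buffer size and transition layout are assumptions of this example:

```python
import random
from collections import deque

random.seed(0)

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a random action index (explore),
    otherwise pick the index of the maximum Q-value (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])

# Experience replay: store (state, action, reward, next_state) transitions
# and sample minibatches to break correlations between successive actions.
replay_buffer = deque(maxlen=10000)

q_values = [0.1, 0.9, 0.3]
action = epsilon_greedy(q_values, epsilon=0.0)  # pure exploitation -> index 1
replay_buffer.append(("s_t", action, 0.5, "s_t1"))
batch = random.sample(replay_buffer, k=1)       # minibatch for a training step
```

In practice epsilon is typically annealed from near 1 toward a small value as training proceeds, shifting from exploration to exploitation.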
For step S260, it should be noted that the specific steps for obtaining the total reward are not limited and may be set according to the actual application requirements. For example, in an alternative example, the reward network (R-Network) stores a first correspondence between a state parameter set and a safety reward and a speed reward, and a second correspondence between a control combination and an efficiency reward; the reward network includes a reward fully connected layer, and step S260 may include the following sub-steps:
processing according to the current state parameter set and the first correspondence to obtain the safety reward and the speed reward of the training gate; processing according to the current control combination and the second correspondence to obtain the efficiency reward of the training gate; and inputting the safety reward, the speed reward, the efficiency reward and the current high-level features into the reward fully connected layer to obtain the total reward of the training gate.
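These sub-steps amount to two table lookups followed by one fully connected layer and a clipping step. A minimal sketch; all table entries, weights and the clipping range are illustrative assumptions, not values from the patent:

```python
# First correspondence: state -> (safety reward, speed reward).
# Second correspondence: control combination -> efficiency reward.
def lookup_safety_speed(state_key, table):
    return table[state_key]

def lookup_efficiency(combo_key, table):
    return table[combo_key]

def reward_fc(inputs, weights, bias):
    # Single fully connected layer combining sub-rewards and the high-level feature.
    return sum(w * x for w, x in zip(weights, inputs)) + bias

first_table = {"near_design_level": (1.0, 0.5)}
second_table = {"small_adjustment": 0.8}

safety, speed = lookup_safety_speed("near_design_level", first_table)
efficiency = lookup_efficiency("small_adjustment", second_table)
high_level_feature = 0.2  # scalar stand-in for F(S_t)

total = reward_fc([safety, speed, efficiency, high_level_feature],
                  weights=[0.4, 0.3, 0.2, 0.1], bias=0.0)
clipped_total = max(-1.0, min(1.0, total))  # clipping range [-1, 1] is an assumption
```

Clipping the combined reward is a common stabilization trick in DQN-style training, which is consistent with the clipping sub-step described in the claims.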
It is noted that effective channel control is achieved byEach time step is realized by sending appropriate switching-off action to the channel. The time step may be a fixed minute, hour or day, which may depend on the specific needs of the channel. The opening of the gate can be adjusted independently, but is actually adjusted and controlled globally according to the overall goal. To simplify this problem, the present invention assumes that all gates are regulated synchronously. a is ai,r,siRespectively representing the motion of the ith gate, the reward generated by the environment, and the state variable of the ith gate at time step t.
The action a_i takes three values, a_i ∈ {+a, −a, 0}, where +a (−a) means the opening of the i-th gate increases (decreases) at rate a, and 0 means no operation. In effective canal control, multiple optimization objectives such as safety, speed and efficiency need to be realized simultaneously.
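Since every gate independently chooses one of the three actions, the joint control combinations form a Cartesian product of size 3^Ng. A small enumeration sketch (function name is illustrative):

```python
from itertools import product


def gate_action_space(n_gates, a):
    """Enumerate every joint control combination for n_gates gates.

    Each gate independently chooses +a (increase opening), -a (decrease
    opening), or 0.0 (no operation), giving 3**n_gates combinations.
    """
    return list(product((+a, -a, 0.0), repeat=n_gates))
```

For example, two gates with rate 0.1 yield 9 combinations, from (0.1, 0.1) down to (0.0, 0.0); this combinatorial growth is why the Q-network outputs one expected value per joint combination.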
Safety: safety means ensuring that the water level stays stable around the design water level. The design water level is a fixed value, related to the characteristics of the canal, determined during canal design and construction; different canals have different design water levels. The invention uses the design water level upstream of the gate, denoted h_d^u, to express it. The actual water level should be neither far above nor far below h_d^u, otherwise the canal risks damage and flooding. Therefore, in the reinforcement learning process, the following constraint must be satisfied (N_g is the number of gates in the cascade pools):

[formula image: water-level constraint around h_d^u for each gate i = 1, …, N_g]
In each time step, if the water level at any gate does not satisfy the above constraint, a failure is triggered and training of the RDCC model is stopped; otherwise each gate generates a reward according to the reward function in Table 1. The total safety reward of all gates at the current time step is expressed as:

[formula image: safety objective, sum of the per-gate safety rewards over the current time step]
According to human experience in operating channels, the gates directly connected to longer upstream and downstream canal pools play a more important role in channel control. Therefore, the present invention divides the rewards into two different levels (the reward of an important gate and the reward of a normal gate) and sets the important gate's reward to be larger in absolute value, i.e., larger than the normal gate's positive reward and more negative than its negative reward. In Table 1, h_j and λr_j are the criteria and rewards of the different levels, h_d^u is the design water level upstream of the gate, and λ is the importance coefficient of the channel. The safety reward criterion is divided into n_s levels, which may be set to appropriate values according to the actual situation. Each reward in Table 1 satisfies:

[formula image: ordering condition on the safety rewards in Table 1]
Table 1: reward table for the safety objective

[table image: leveled safety rewards]
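The leveled safety reward with an importance coefficient can be sketched as below; the thresholds and the λ scaling rule are this description's assumptions, not the patent's actual Table 1 values:

```python
def safety_reward(h, h_design, levels, important, lam=2.0):
    """Leveled safety reward for one gate (illustrative sketch).

    `levels` is a list of (max_deviation, reward) pairs ordered from the
    tightest water-level band to the loosest. An important gate's reward
    is scaled by lam > 1, making it larger in absolute value (more
    positive when positive, more negative when negative), as the text
    requires. Returning None signals a constraint violation (failure).
    """
    dev = abs(h - h_design)
    for max_dev, reward in levels:
        if dev <= max_dev:
            return lam * reward if important else reward
    # outside every band: the safety constraint is violated
    return None
```

A deviation inside the tightest band earns the top reward; larger deviations earn progressively smaller (then negative) rewards until the constraint fails outright.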
High speed: high speed means that the water should be transferred from the first canal pool to the last as quickly as possible. This goal can be achieved by encouraging large flows, i.e., by maximizing the state variable f_min = min f_i, i = 1, 2, …, N_g. In order to obtain a larger f_min, the speed reward of the current time step is generated from the f_min value as shown in Table 2, and these speed rewards satisfy:

[formula image: ordering condition on the speed rewards]

In the table, f_j and r_j respectively represent the criteria and values of the different rewards, where the speed reward criterion is divided into n_q levels.

Table 2: reward table for the speed objective

[table image: leveled speed rewards]
Efficiency: efficiency means performing as few gate operations as possible, that is, each gate stays still as much as possible. The number of regulation operations of the i-th gate is expressed as:

[formula image: per-gate count of non-zero actions over the time window]

The smaller the average number of regulation operations of all gates over a fixed time period, the better; it can be expressed as:

[formula image: average regulation count as the mean of the per-gate counts]

The efficiency objective may also be understood through the reward table shown in Table 3 below, which assigns a larger reward to a smaller average regulation count; each reward in the table satisfies:

[formula image: ordering condition on the efficiency rewards]

where the average is calculated from the per-gate regulation counts of the n_g gates over n_t time steps.

Table 3: reward table for the efficiency objective

[table image: leveled efficiency rewards]
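The average regulation count that drives the efficiency reward can be sketched as follows (function name and data layout are illustrative assumptions):

```python
def average_regulations(actions_per_step):
    """Average number of gate operations per gate over a fixed window;
    smaller is better for the efficiency objective.

    `actions_per_step` is a list of joint actions, one tuple per time
    step; an entry of 0 means that gate was not moved at that step.
    """
    n_gates = len(actions_per_step[0])
    # count every non-zero action across all gates and time steps
    moves = sum(1 for step in actions_per_step for a in step if a != 0)
    return moves / n_gates
```

The efficiency reward table then maps a smaller average to a larger reward, discouraging unnecessary gate movements.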
The first corresponding relationship between the state parameter set and the safety reward and the speed reward can be obtained through reward tables in tables 1 and 2, and the second corresponding relationship between the control combination and the efficiency reward can be obtained through the reward table in table 3.
It should be noted that the specific steps of obtaining the total reward according to the security reward, the speed reward, the efficiency reward and the current advanced features are not limited, and can be set according to the actual application requirements. For example, in one alternative example, the following sub-steps may be included:
inputting the safety reward, speed reward, efficiency reward and current high-level features into the reward fully connected layer and processing to obtain an output reward; and clipping the combined safety, speed, efficiency and output rewards to obtain the total reward of the training gate.
According to the embodiment of the application, three subtask rewards corresponding to the safety, speed and efficiency objectives can be generated from the three reward tables (i.e., the first and second correspondences), and the R-Network designed by the invention approximately fits the final reward R and provides it to the Agent for reinforcement learning. As shown in FIG. 8, the input to the R-Network comprises the current high-level features from the penultimate layer of the Q-Network together with the three sub-rewards, and a fully connected layer in the middle of the R-Network approximates the Agent's total reward. Notably, following the idea of the residual unit, the invention adds the three sub-reward inputs to the output of the fully connected layer, which makes the learning and output of the R-Network more stable at the beginning of training. After the three directly connected sub-rewards are added to the output of the fully connected layer, the final output reward is clipped to [-1, +1] using the tanh activation function, which prevents the Q-value from growing too large and preserves a healthy gradient. To avoid oscillation, a separate target R-network is used to stabilize the parameters learned by the R-network.
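The residual-style reward head described here can be sketched as a single function, assuming the fully connected layer produces a scalar (the name and signature are illustrative, not the patent's actual implementation):

```python
import math


def r_network_output(fc_out, sub_rewards):
    """Residual reward head: the fully connected layer's scalar output is
    summed with the three directly connected sub-rewards (safety, speed,
    efficiency), then squashed into [-1, +1] with tanh so Q-values stay
    bounded and gradients remain well-behaved.
    """
    return math.tanh(fc_out + sum(sub_rewards))
```

Early in training, when the fully connected layer's output is near zero, the residual connection lets the output track the tabulated sub-rewards directly, which is what stabilizes the initial learning phase.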
For step S270, it should be noted that channel control is a sequential decision problem, and the RDCC model can be formulated as a Markov Decision Process (MDP) represented by the 4-tuple (S, A, R, P). The learning algorithm described in the invention learns the Agent's action policy so as to obtain the maximum return. Let Q(s_t) = max_a Q(s_t, a); then the policy can be expressed as π(s_t) = argmax_a Q(s_t, a), meaning that in state s_t the action producing the maximum Q-value is chosen. With R_t the reward obtained by taking action a_t in state s_t, and γ the decay factor of the reward value, the recurrence Q(s_t, a_t) = R_t + γ max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) is obtained.
The function Q(s_t, a_t) may be approximated by a deep Q-network, and the reward value at time t by an R-network. Denoting the parameters of the Q-network and the R-network by θ_Q and θ_R respectively gives:

[formula image: parameterized approximations Q(s, a; θ_Q) and R(s, a; θ_R)]
As shown in fig. 7, the RDCC model comprises a Q-network and an R-network, and a common loss function must be specified to optimize the model's neural network parameters. To make learning more stable, the target networks used for the Q-network and the R-network are denoted Q* and R* respectively, and the loss function can be expressed as:

[formula image: joint loss over the Q-network and R-network against the Q* and R* targets]
where a is the action taken by the Agent in state s, s' is the state returned by the environment after taking action a, and a' is an action that can be taken in state s'. During learning of the RDCC model, the invention stores the Agent's experience as in DQN, randomly selects a mini-batch, and updates the model parameters by back-propagation through minimizing the loss function.
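The regression targets for such a mini-batch update can be sketched as follows, with `q_target` and `r_net` as stand-ins for the frozen Q* target network and the fitted reward network (all names are illustrative, not the patent's actual API):

```python
def td_targets(batch, q_target, r_net, gamma):
    """Compute DQN regression targets y = R(s, a) + gamma * max_a' Q*(s', a')
    for each stored transition, per the recurrence above.

    `batch` holds (s, a, s') transitions; q_target(s) returns a list of
    Q-values over the joint action combinations, and r_net(s, a) returns
    the fitted reward for taking action a in state s.
    """
    targets = []
    for s, a, s_next in batch:
        y = r_net(s, a) + gamma * max(q_target(s_next))
        targets.append(y)
    return targets
```

The Q-network's prediction Q(s, a; θ_Q) is then regressed toward these targets, while the target networks' parameters are updated only periodically to avoid oscillation.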
For step S230, it should be noted that the specific step of controlling the gate to be processed is not limited, and may be set according to the actual application requirement. For example, in an alternative example where the control data includes at least one control combination and a desired value corresponding to each control combination, step S230 may include the sub-steps of:
selecting a target control combination from at least one control combination according to the expected value; and controlling the gate to be processed according to the target control combination.
Optionally, the specific step of selecting the target control combination is not limited, and may be set according to the actual application requirements. For example, in an alternative example, the expected values corresponding to the respective control combinations may be sorted, and the control combination with the highest expected value may be directly used as the target control combination.
For another example, in another alternative example, a greedy-epsilon strategy may be used to select a target desired value from desired values corresponding to respective control combinations, and the control combination of the target desired values may be used as the target control combination.
It should be noted that DNN is used in the hidden layers when constructing the RDCC model in step 3, but other neural network models (such as DNN + window, LSTM, etc.) could also implement the technical solution of the invention. By this method, the multi-objective channel control problem can be solved: DQN makes the control strategy, model-free control is realized through extensive reinforcement learning, and the problem that a mathematical model cannot be used in a complex environment is overcome. By realizing the reward neural network and the multi-objective reward function, the invention addresses the multi-objective control problems of safety, speed and efficiency, and obtains satisfactory performance in terms of water-level difference, flow integral, number of gate operations, and other aspects alike.
With reference to fig. 9, an embodiment of the present application further provides a gate control device 900, where the functions implemented by the gate control device 900 correspond to the steps executed by the method described above. The gate control device 900 may be understood as a processor of the electronic apparatus 100, or as a component, independent of the electronic apparatus 100 or the processor, that implements the functions of the present application under the control of the electronic apparatus 100. The gate control device 900 may include a parameter obtaining module 910, a control data obtaining module 920, and a control module 930.
The parameter obtaining module 910 is configured to obtain a state parameter set of the gate to be processed. In this embodiment of the application, the parameter obtaining module 910 may be configured to perform step S210 shown in fig. 2; for the relevant contents of the parameter obtaining module 910, reference may be made to the foregoing description of step S210.
The control data obtaining module 920 is configured to input the state parameter set into a preset control model to obtain control data of the gate to be processed. In this embodiment of the application, the control data obtaining module 920 may be configured to perform step S220 shown in fig. 2; for the relevant contents of the control data obtaining module 920, reference may be made to the foregoing description of step S220.
The control module 930 is configured to control the gate to be processed according to the control data. In this embodiment of the application, the control module 930 may be configured to perform step S230 shown in fig. 2; for the relevant contents of the control module 930, reference may be made to the foregoing description of step S230.
In addition, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and the computer program is executed by a processor to perform the steps of the gate control method.
The computer program product of the gate control method provided in the embodiment of the present application includes a computer readable storage medium storing a program code, where instructions included in the program code may be used to execute steps of the gate control method in the foregoing method embodiment, which may be referred to specifically in the foregoing method embodiment, and details are not described here again.
In summary, the gate control method and apparatus, electronic device, and storage medium provided in the embodiments of the present application obtain control data by inputting the state parameter set into a preset control model and control the gate according to that control data, so that the gate is controlled directly through the control model. This avoids the prior-art need to establish an accurate physical model of the canal; such models, owing to complicated construction, mechanical operation and unpredictable disturbances, are not always effective, which makes gate control inefficient.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (8)

1. A gate control method, comprising:
acquiring a state parameter set of a gate to be processed;
inputting the state parameter set into a preset control model to obtain control data of the gate to be processed;
controlling the gate to be processed according to the control data;
the gate control method further comprises the step of training an initial model to obtain the preset control model, wherein the initial model comprises a deep network and a reward network, and the step comprises:
acquiring training data, wherein the training data comprises a current state parameter set and a current control combination of a training gate;
inputting the current state parameter set into the deep network to obtain the current high-level features and the next control combination of the training gate, and controlling the training gate according to the next control combination to obtain the next state parameter set;
inputting the current state parameter set, the current high-level features and the current control combination into the reward network to obtain the total reward of the training gate;
training the initial model according to the total reward to obtain the preset control model;
the reward network stores a first correspondence between a state parameter set and a safety reward and a speed reward, and a second correspondence between a control combination and an efficiency reward; the reward network comprises a reward fully connected layer, and the step of inputting the current state parameter set, the current high-level features and the current control combination into the reward network to obtain the total reward of the training gate comprises:
processing according to the current state parameter set and the first corresponding relation to obtain the safety reward and the speed reward of the training gate;
processing according to the current control combination and the second corresponding relation to obtain the efficiency reward of the training gate;
and inputting the safety reward, the speed reward, the efficiency reward and the current high-level features into the reward fully connected layer to obtain the total reward of the training gate.
2. The gate control method according to claim 1, wherein the control data includes at least one control combination and a desired value corresponding to each of the control combinations, and the step of controlling the gate to be processed according to the control data includes:
selecting a target control combination from the at least one control combination according to the expected value;
and controlling the gate to be processed according to the target control combination.
3. The gate control method of claim 2, wherein said step of selecting a target control combination from said at least one control combination based on said desired value comprises:
and sequencing the expected values corresponding to the control combinations, and taking the control combination with the maximum expected value as a target control combination.
4. The gate control method of claim 1, wherein the step of inputting the safety reward, speed reward, efficiency reward, and current high-level features into the reward fully connected layer to obtain the total reward of the training gate comprises:
inputting the safety reward, the speed reward, the efficiency reward and the current high-level features into the reward fully connected layer and processing to obtain an output reward;
and clipping the safety reward, the speed reward, the efficiency reward and the output reward to obtain the total reward of the training gate.
5. The gate control method of claim 1, wherein the deep network comprises six deep fully-connected layers, and the step of inputting the current set of state parameters into the deep network to obtain a current high-level feature and a next control combination of the training gate comprises:
inputting the current state parameter set into a first deep fully connected layer to obtain a first result, inputting the first result into a second deep fully connected layer to obtain a second result, inputting the second result into a third deep fully connected layer to obtain a third result, inputting the third result into a fourth deep fully connected layer to obtain a fourth result, and inputting the fourth result into a fifth deep fully connected layer to obtain the current high-level features of the training gate;
and inputting the current high-level features into a sixth deep fully connected layer to obtain the next control combination.
6. A gate control device, comprising:
the parameter acquisition module is used for acquiring a state parameter set of the gate to be processed;
the control data acquisition module is used for inputting the state parameter set into a preset control model to obtain control data of the gate to be processed;
the control module is used for controlling the gate to be processed according to the control data;
the device also comprises a training module used for training the initial model to obtain a preset control model;
the initial model comprises a deep network and a reward network, and the training module is further configured to:
acquiring training data, wherein the training data comprises a current state parameter set and a current control combination of a training gate;
inputting the current state parameter set into the deep network to obtain the current high-level features and the next control combination of the training gate, and controlling the training gate according to the next control combination to obtain the next state parameter set;
inputting the current state parameter set, the current high-level features and the current control combination into the reward network to obtain the total reward of the training gate;
training the initial model according to the total reward to obtain the preset control model;
the reward network stores a first correspondence between a state parameter set and a safety reward and a speed reward, and a second correspondence between a control combination and an efficiency reward; the reward network comprises a reward fully connected layer, and the training module is further configured to:
processing according to the current state parameter set and the first corresponding relation to obtain the safety reward and the speed reward of the training gate;
processing according to the current control combination and the second corresponding relation to obtain the efficiency reward of the training gate;
and inputting the safety reward, the speed reward, the efficiency reward and the current high-level features into the reward fully connected layer to obtain the total reward of the training gate.
7. An electronic device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the gate control method of any one of claims 1 to 5 when executing the program.
8. A storage medium, characterized in that the storage medium includes a computer program, and the computer program controls an electronic device where the storage medium is located to execute the gate control method according to any one of claims 1 to 5 when executed.
CN202110272117.5A 2021-03-12 2021-03-12 Gate control method and device, electronic device and storage medium Active CN113050565B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110272117.5A CN113050565B (en) 2021-03-12 2021-03-12 Gate control method and device, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN113050565A CN113050565A (en) 2021-06-29
CN113050565B true CN113050565B (en) 2022-05-20

Family

ID=76512420

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110272117.5A Active CN113050565B (en) 2021-03-12 2021-03-12 Gate control method and device, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN113050565B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6311949B1 (en) * 1997-12-11 2001-11-06 Akio Iida Apparatus for operating water gate
CN104880985A (en) * 2015-05-28 2015-09-02 广州番禺职业技术学院 Internet of things sluice remote detection control system
JP2016199930A (en) * 2015-04-13 2016-12-01 株式会社日立製作所 Gate door control system
CN108459570A (en) * 2018-03-14 2018-08-28 河海大学常州校区 Based on the irrigation water distribution intelligence control system and method for generating the confrontation network architecture
CN110262218A (en) * 2019-05-20 2019-09-20 北京航空航天大学 Control method, device, equipment and the storage medium of machine fish
JP2020092490A (en) * 2018-12-03 2020-06-11 富士通株式会社 Reinforcement learning program, reinforcement learning method, and reinforcement learning device

Family Cites Families (10)

Publication number Priority date Publication date Assignee Title
US9679258B2 (en) * 2013-10-08 2017-06-13 Google Inc. Methods and apparatus for reinforcement learning
US11774944B2 (en) * 2016-05-09 2023-10-03 Strong Force Iot Portfolio 2016, Llc Methods and systems for the industrial internet of things
US10619760B2 (en) * 2016-10-24 2020-04-14 Fisher Controls International Llc Time-series analytics for control valve health assessment
WO2018211139A1 (en) * 2017-05-19 2018-11-22 Deepmind Technologies Limited Training action selection neural networks using a differentiable credit function
CN108303896A (en) * 2018-02-28 2018-07-20 武汉理工大学 Ditch shutter intelligent control method and device
JP2020119008A (en) * 2019-01-18 2020-08-06 富士通株式会社 Reinforcement learning method, reinforcement learning program, and reinforcement learning apparatus
CN110053053B (en) * 2019-06-14 2022-04-12 西南科技大学 Self-adaptive method of mechanical arm screwing valve based on deep reinforcement learning
CN111191399B (en) * 2019-12-24 2021-11-05 北京航空航天大学 Control method, device and equipment of robot fish and storage medium
CN111783369B (en) * 2020-07-22 2024-01-26 中国水利水电科学研究院 Short-term multi-objective optimal scheduling method for multi-gate-group open channel water diversion project
CN112241123B (en) * 2020-10-23 2022-05-03 南京航空航天大学 Aeroengine acceleration control method based on deep reinforcement learning

Non-Patent Citations (5)

Title
A Survey of Routing Protocols for Underwater Wireless Sensor Networks; Junhai Luo et al.; IEEE Communications Surveys & Tutorials; 2021-01-01 *
Research on Error Correction and Mutual Control in Railway Station Sections to Ensure Railway Transportation Safety; Fu Yanbo; Journal of Liaoning University of Technology (Social Science Edition); 2020-10-15 (No. 05) *
Monocular-Vision Autonomous Driving Decision-Making System Based on Deep Reinforcement Learning; Yang Mingzhu; Automation Panorama; 2020-05-15 (No. 05) *
A Survey of Research on the Sparse Reward Problem in Deep Reinforcement Learning; Yang Weiyi et al.; Computer Science; 2019-11-22 (No. 03) *
Research on Gate Regulation Speed and Water Surface Profile Variation in Water Conveyance Channels; Ding Zhiliang et al.; South-to-North Water Transfers and Water Science & Technology; 2005-12-30 *


Similar Documents

Publication Publication Date Title
Amjady et al. Daily hydrothermal generation scheduling by a new modified adaptive particle swarm optimization technique
US8260441B2 (en) Method for computer-supported control and/or regulation of a technical system
Bernardelli et al. Real-time model predictive control of a wastewater treatment plant based on machine learning
Tyukin et al. Feasibility of random basis function approximators for modeling and control
CN107590567A (en) A kind of Recognition with Recurrent Neural Network short-term load forecasting method based on comentropy cluster and notice mechanism
Espitia Observer-based event-triggered boundary control of a linear 2× 2 hyperbolic systems
Behandish et al. Concurrent pump scheduling and storage level optimization using meta-models and evolutionary algorithms
CN104900063B (en) A kind of short distance running time Forecasting Methodology
CN106200381B (en) A method of according to the operation of processing water control by stages water factory
CN107038878A (en) Signal phase design method based on integer programming model
CN113050565B (en) Gate control method and device, electronic device and storage medium
CN107643684B (en) Valve flow function optimization method and device
CN102566426A (en) Fractional order parameter adjustment controller algorithm of PI<alpha>D<beta> controller
Le Ngo Optimising reservoir operation: A case study of the Hoa Binh reservoir, Vietnam
CN104112035A (en) Effectiveness and fuzzy theory based collaborative decision-making method for product design
CN116880191A (en) Intelligent control method of process industrial production system based on time sequence prediction
Bergman et al. Adjusting parameters of genetic algorithms by fuzzy control rules
CN108459570A (en) Based on the irrigation water distribution intelligence control system and method for generating the confrontation network architecture
Blanco et al. Flooding prevention of the demer river using model predictive control
Hoang et al. Dissolved oxygen control of the activated sludge wastewater treatment process using Hedge Algebraic control
CN113341768B (en) Gate regulating method, device, equipment and medium
CN111062485A (en) Novel AUTOML frame
Salama et al. Short term optimal generation scheduling of fixed head hydrothermal system using genetic algorithm and constriction factor based particle swarm optimization technique
Rosenbloom et al. An architectural integration of Temporal Motivation Theory for decision making
CN116436013B (en) Power distribution system power distribution method, system, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant