CN113723757B - Decision generation model training method, decision generation method and device - Google Patents


Info

Publication number
CN113723757B
CN113723757B
Authority
CN
China
Prior art keywords
agricultural production, data, target, acquiring, decision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110872525.4A
Other languages
Chinese (zh)
Other versions
CN113723757A (en)
Inventor
李茹杨
赵雅倩
李仁刚
张亚强
魏辉
李雪雷
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202110872525.4A
Publication of CN113723757A
Application granted
Publication of CN113723757B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q 10/063 Operations research, analysis or management
    • G06Q 10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/21 Design, administration or maintenance of databases
    • G06F 16/219 Managing data history or versioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 Querying
    • G06F 16/245 Query processing
    • G06F 16/2455 Query execution
    • G06F 16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/15 Correlation function computation including computation of convolution operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q 50/02 Agriculture; Fishing; Forestry; Mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Human Resources & Organizations (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Mathematical Physics (AREA)
  • Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Optimization (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Educational Administration (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Computing Systems (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Biophysics (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Evolutionary Biology (AREA)
  • Primary Health Care (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Algebra (AREA)

Abstract

The application discloses a decision generation model training method, a decision generation method and a decision generation device. Agricultural production learning data at the current moment are acquired and placed in a playback buffer, and the model training step count is incremented by one. A preset number of groups of historical data are acquired from the playback buffer, and the corresponding target values are calculated from them. A loss function corresponding to the evaluation network is obtained according to the target values of the preset number of groups of historical data. Based on the loss function, the evaluation network parameters are updated. An objective function corresponding to the policy network is obtained according to the preset number of groups of historical data, and the policy network parameters are updated according to the objective function. The updated target evaluation network parameters and target policy network parameters are then obtained from the updated evaluation network parameters and updated policy network parameters. Training stops when a preset condition is reached. The decision generation model can adapt to a smart agricultural production environment that varies in real time and can generate accurate agricultural production decisions.

Description

Decision generation model training method, decision generation method and device
Technical Field
The application relates to the technical field of intelligent agriculture, in particular to a decision generation model training method, a decision generation method and a decision generation device.
Background
In the smart agriculture stage, sensor devices collect various parameters of agricultural production, and decisions are made based on the acquired parameters, thereby realizing intelligent agricultural control and production. Examples include decisions on irrigation water quantity, sowing density, pesticide spraying quantity, fertilization quantity, and the like.
Currently, prior art related to intelligent agricultural production mostly adopts supervised learning to train models on large amounts of agricultural production data, and decisions are generated by means of the trained model in specific scenarios such as sowing and irrigation. However, such methods require a large number of existing agricultural production data samples to train the model, resulting in high data acquisition and model training costs. Meanwhile, a model trained on existing data samples may be difficult to adapt to agricultural production scenarios that vary in real time, so the decisions made are not appropriate.
Disclosure of Invention
In order to solve the above technical problems, the application provides a decision generation model training method, a decision generation method and a decision generation device, which can adapt to agricultural production scenarios that vary in real time and make more accurate decisions that meet actual needs.
In order to achieve the above object, the technical solution provided in the embodiments of the present application is as follows:
the embodiment of the application provides a decision generation model training method, wherein the model comprises a policy network, an evaluation network, a target policy network and a target evaluation network, and the method comprises the following steps:
acquiring agricultural production learning data at the current moment, placing the agricultural production learning data at the current moment in a playback buffer, and incrementing the model training step count by one; the agricultural production learning data at the current moment comprises the agricultural production state data at the current moment, the agricultural production decision at the current moment, the reward value at the current moment and the agricultural production state data at the next moment;
acquiring a preset number of groups of historical data from the playback buffer, and calculating the target values corresponding to the preset number of groups of historical data according to the preset number of groups of historical data; the historical data comprises agricultural production state data at a first moment, the agricultural production decision at the first moment, the reward value at the first moment and agricultural production state data at a second moment;
acquiring a loss function corresponding to the evaluation network according to the target values corresponding to the preset number of groups of historical data;
updating the evaluation network parameters based on the loss function, and acquiring the updated evaluation network parameters;
acquiring an objective function corresponding to the policy network according to the preset number of groups of historical data;
updating the policy network parameters according to the objective function, and acquiring the updated policy network parameters;
updating the target evaluation network parameters and the target policy network parameters according to the updated evaluation network parameters and the updated policy network parameters, and acquiring the updated target evaluation network parameters and the updated target policy network parameters;
re-executing the step of acquiring the agricultural production learning data at the current moment and the subsequent steps until a preset condition is reached, and obtaining the trained decision generation model; the preset condition is that a preset model training step count is reached or a quantized value of the crop condition falls outside a preset range.
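Taken together, the steps above follow a familiar experience-replay training loop. The following is a minimal, runnable Python sketch of that loop; the environment, policy, reward, buffer size and step count are toy stand-ins chosen for illustration, since the patent text does not fix these interfaces:

```python
import random
from collections import deque

import numpy as np

# Toy stand-ins for the agricultural environment and the policy network;
# the dynamics and dimensions here are illustrative assumptions.
class ToyEnv:
    def reset(self):
        return np.zeros(3)                      # agricultural production state s_t

    def step(self, action):
        next_state = np.random.rand(3)          # next-moment state s_{t+1}
        reward = -abs(action - 0.5)             # quantized crop-condition value (toy)
        return next_state, reward

def policy(state):                              # stands in for mu(s | theta)
    return float(np.clip(state.mean(), 0.0, 1.0))

MAX_STEPS, BATCH = 50, 8                        # preset step count / sampled group count
buffer = deque(maxlen=10_000)                   # playback buffer
env = ToyEnv()
state, steps = env.reset(), 0

while steps < MAX_STEPS:                        # stop at the preset training step count
    action = policy(state)                      # current-moment agricultural decision
    next_state, reward = env.step(action)
    buffer.append((state, action, reward, next_state))   # learning data tuple
    state, steps = next_state, steps + 1        # model training step count plus one
    if len(buffer) >= BATCH:
        batch = random.sample(list(buffer), BATCH)       # preset number of history groups
        # ...compute target values, the evaluation-network loss, the policy
        # objective, and update the target networks here...
```

The sampled `batch` is what feeds the target-value, loss-function and objective-function steps described above.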
Optionally, the acquiring the agricultural production learning data at the current moment includes:
acquiring the agricultural production state data at the current moment; the agricultural production state data at least comprises environmental state data and agricultural machinery parameter data;
inputting the agricultural production state data into the policy network, and acquiring the agricultural production decision at the current moment output by the policy network;
executing the agricultural production decision, and calculating the reward value at the current moment under the agricultural production decision; the reward value is a quantized value of the growth condition of the crops;
acquiring the agricultural production state data at the next moment based on the agricultural production decision;
and generating the agricultural production learning data at the current moment based on the agricultural production state data at the current moment, the agricultural production decision at the current moment, the reward value at the current moment and the agricultural production state data at the next moment.
Optionally, acquiring a preset number of groups of historical data from the playback buffer, and calculating the target values corresponding to the preset number of groups of historical data according to the preset number of groups of historical data, includes:
acquiring a preset number of groups of historical data from the playback buffer, and calculating the target value corresponding to target historical data; the target historical data is any one of the preset number of groups of historical data;
and determining the target values corresponding to the preset number of groups of historical data based on the target value corresponding to the target historical data.
Optionally, the acquiring a preset number of groups of historical data from the playback buffer, and calculating the target value corresponding to the target historical data, includes:
acquiring a preset number of groups of historical data from the playback buffer, and determining the target historical data from the preset number of groups of historical data;
acquiring an action output by the target policy network and a value output by the target evaluation network based on the target historical data;
and calculating the target value corresponding to the target historical data based on the output action, the output value and the reward value in the target historical data.
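In DDPG-style training, the target value described here is typically computed as y = r + γ·Q′(s′, μ′(s′|θ′)|ω′), combining the reward with the target evaluation network's value for the target policy network's action. A sketch, where the discount factor γ and the toy target networks are assumptions rather than values from the patent:

```python
import numpy as np

GAMMA = 0.99   # discount factor; a standard DDPG choice, not fixed by the patent

def target_policy(next_state):
    """Toy stand-in for mu'(s | theta'), the target policy network."""
    return next_state.mean()

def target_q(state, action):
    """Toy stand-in for Q'(s, a | omega'), the target evaluation network."""
    return state.sum() + action

def target_value(reward, next_state):
    # y = r + gamma * Q'(s_{t+1}, mu'(s_{t+1} | theta') | omega')
    a_next = target_policy(next_state)
    return reward + GAMMA * target_q(next_state, a_next)

y = target_value(1.0, np.array([0.2, 0.4]))
```

Applying this per sampled group of historical data yields the preset number of target values used below.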
Optionally, the acquiring the loss function corresponding to the evaluation network according to the target values corresponding to the preset number of groups of historical data includes:
acquiring a preset number of groups of values output by the evaluation network based on the preset number of groups of historical data;
and acquiring the loss function corresponding to the evaluation network based on the target values corresponding to the preset number of groups of historical data and the values output by the evaluation network.
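A common concrete choice for this loss, consistent with DDPG, is the mean-squared error between the batch of target values and the evaluation network's outputs. A minimal sketch (the MSE form is a standard assumption, not quoted from the patent text):

```python
import numpy as np

def critic_loss(targets, q_values):
    """Mean-squared error between target values y_i and the evaluation
    network's outputs Q(s_i, a_i | omega) over the sampled batch."""
    targets, q_values = np.asarray(targets), np.asarray(q_values)
    return float(np.mean((targets - q_values) ** 2))

# Example with three sampled groups of historical data.
loss = critic_loss([1.0, 2.0, 3.0], [0.5, 2.5, 3.0])
```

Minimizing this loss by gradient descent is what "updating the evaluation network parameters based on the loss function" amounts to in practice.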
Optionally, before the acquiring of the agricultural production learning data at the current moment, the method further includes:
initializing the model parameters; the model parameters include the policy network parameters, the evaluation network parameters, the target policy network parameters, and the target evaluation network parameters.
Optionally, before the acquiring of the agricultural production learning data at the current moment, the method further includes:
setting a preset model training step count and initializing the model training step counter.
The embodiment of the application also provides a decision generation model training device, wherein the model comprises a policy network, an evaluation network, a target policy network and a target evaluation network, and the device comprises:
a first acquisition unit, configured to acquire the agricultural production learning data at the current moment, place it in the playback buffer, and increment the model training step count by one; the agricultural production learning data at the current moment comprises the agricultural production state data at the current moment, the agricultural production decision at the current moment, the reward value at the current moment and the agricultural production state data at the next moment;
a calculation unit, configured to acquire a preset number of groups of historical data from the playback buffer and calculate the target values corresponding to the preset number of groups of historical data according to the preset number of groups of historical data; the historical data comprises agricultural production state data at a first moment, the agricultural production decision at the first moment, the reward value at the first moment and agricultural production state data at a second moment;
a second acquisition unit, configured to acquire the loss function corresponding to the evaluation network according to the target values corresponding to the preset number of groups of historical data;
a third acquisition unit, configured to update the evaluation network parameters based on the loss function and acquire the updated evaluation network parameters;
a fourth acquisition unit, configured to acquire the objective function corresponding to the policy network according to the preset number of groups of historical data;
a fifth acquisition unit, configured to update the policy network parameters according to the objective function and acquire the updated policy network parameters;
a sixth acquisition unit, configured to update the target evaluation network parameters and the target policy network parameters according to the updated evaluation network parameters and the updated policy network parameters, and acquire the updated target evaluation network parameters and the updated target policy network parameters;
and an execution unit, configured to re-execute the step of acquiring the agricultural production learning data at the current moment and the subsequent steps until the preset condition is reached, and obtain the trained decision generation model; the preset condition is that the preset model training step count is reached or the quantized value of the crop condition falls outside a preset range.
The embodiment of the application also provides a decision generation method, which comprises the following steps:
acquiring the agricultural production state data at the current moment; the agricultural production state data at least comprises environmental state data and agricultural machinery parameter data;
inputting the agricultural production state data into a policy network, and acquiring the agricultural production decision at the current moment output by the policy network; the policy network belongs to a decision generation model; the decision generation model is obtained through training according to the decision generation model training method;
and executing the agricultural production decision.
The embodiment of the application also provides a decision generation device, which comprises:
an acquisition unit, configured to acquire the agricultural production state data at the current moment; the agricultural production state data at least comprises environmental state data and agricultural machinery parameter data;
an input unit, configured to input the agricultural production state data into a policy network and acquire the agricultural production decision at the current moment output by the policy network; the policy network belongs to a decision generation model; the decision generation model is obtained through training according to the decision generation model training method;
and an execution unit, configured to execute the agricultural production decision.
According to the above technical solutions, the application has the following beneficial effects:
The embodiment of the application provides a decision generation model training method, a decision generation method and a device. The decision generation model comprises a policy network, an evaluation network, a target policy network and a target evaluation network, and the training method comprises: acquiring agricultural production learning data at the current moment, placing it in a playback buffer, and incrementing the model training step count by one; the agricultural production learning data at the current moment includes the agricultural production state data at the current moment, the agricultural production decision at the current moment, the reward value at the current moment, and the agricultural production state data at the next moment. A preset number of groups of historical data are acquired from the playback buffer, and the corresponding target values are calculated from them; the historical data includes agricultural production state data at a first moment, the agricultural production decision at the first moment, the reward value at the first moment, and agricultural production state data at a second moment. A loss function corresponding to the evaluation network is obtained according to the target values, and the evaluation network parameters are updated based on the loss function. An objective function corresponding to the policy network is obtained according to the preset number of groups of historical data, and the policy network parameters are updated according to the objective function.
The target evaluation network parameters and the target policy network parameters are then updated according to the updated evaluation network parameters and the updated policy network parameters. The step of acquiring the agricultural production learning data at the current moment and the subsequent steps are re-executed until the preset condition is reached, yielding the trained decision generation model; the preset condition is that the preset model training step count is reached or the quantized value of the crop condition falls outside a preset range. The online-learning policy network and evaluation network, together with the target policy network and target evaluation network, form the decision generation model. The model is suitable for continuous action control, can adapt to a smart agricultural production environment that varies in real time, and generates accurate agricultural production decisions.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present application, and that other drawings may be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a schematic diagram of an exemplary application scenario provided in an embodiment of the present application;
FIG. 2 is a flowchart of a training method for a decision-making model according to an embodiment of the present application;
FIG. 3 is a flowchart of a decision generation method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a training device for a decision-making model according to an embodiment of the present application;
fig. 5 is a schematic diagram of a decision making device according to an embodiment of the present application.
Detailed Description
In order to make the above objects, features and advantages of the present application more comprehensible, embodiments accompanied with figures and detailed description are described in further detail below.
With recent technological revolutions and changing social demands, agricultural development has passed through stages of mechanization, greening and precision farming. In recent years, with the wide application of high-performance sensors, Internet technology, big data analysis and intelligent control systems, agricultural production is gradually advancing toward intelligence.
In the smart agriculture stage, agricultural production is divided into a perception layer, a transmission layer, an application layer, and the like. In the perception layer, by means of sensor devices such as cameras, spectrum sensors, soil monitors, temperature/humidity sensors and satellite remote sensing systems, information such as the production environment state, crop growth conditions and agricultural machinery parameters can be obtained, realizing all-round perception and monitoring of the agricultural production scene. In the transmission layer, the sensing data acquired by the sensors are transmitted to the digital management platform through data transmission technologies of different layers, ranges and action scopes, such as the Internet, 4G/5G, wireless local area networks and personal area networks. In the application layer, based on the input agricultural production sensing data, big data analysis is completed by means of computing equipment and intelligent algorithms, the growth condition of the crops and the environmental state are autonomously diagnosed, and adaptive decisions are then made, thereby realizing intelligent agricultural control and production.
Currently, prior art related to intelligent agricultural production mostly adopts supervised learning to train models on large amounts of agricultural production data, and decisions are generated by means of the trained model when the device faces specific scenarios such as sowing, ploughing and irrigation. However, such methods require a large number of existing agricultural production data samples to train the model, resulting in high data acquisition and model training costs. Meanwhile, a model trained on existing data samples may be difficult to adapt to agricultural production scenarios that vary in real time, and cannot make the most appropriate decisions.
Based on this, the embodiment of the application provides a decision generation model training method and device. To facilitate understanding of the decision generation model training method provided in the embodiments of the present application, the following description refers to the scenario example shown in fig. 1. Referring to fig. 1, it is a schematic diagram of an exemplary application scenario provided in an embodiment of the present application. The method may be applied in the terminal device 101.
The decision generation model in the terminal device 101 includes a policy network, an evaluation network, a target policy network, and a target evaluation network.
The terminal device 101 acquires the agricultural production learning data at the current moment based on the agricultural production environment 102, places it in the playback buffer, and increments the model training step count by one. The agricultural production learning data at the current moment comprises the agricultural production state data at the current moment, the agricultural production decision at the current moment, the reward value at the current moment and the agricultural production state data at the next moment.
The terminal device 101 acquires a preset number of groups of historical data from the playback buffer, and calculates the target values corresponding to the preset number of groups of historical data from them. The historical data includes agricultural production state data at a first moment, the agricultural production decision at the first moment, the reward value at the first moment, and agricultural production state data at a second moment.
The terminal device 101 obtains the loss function corresponding to the evaluation network according to the target values corresponding to the preset number of groups of historical data, updates the evaluation network parameters based on the loss function, and acquires the updated evaluation network parameters.
The terminal device 101 obtains the objective function corresponding to the policy network according to the preset number of groups of historical data, updates the policy network parameters according to the objective function, and acquires the updated policy network parameters.
The terminal device 101 updates the target evaluation network parameters and the target policy network parameters according to the updated evaluation network parameters and the updated policy network parameters, and obtains the updated target evaluation network parameters and the updated target policy network parameters.
The terminal device 101 re-executes the acquisition of the agricultural production learning data at the current moment to train the decision generation model until the preset condition is reached, and obtains the trained decision generation model. The preset condition is that the preset model training step count is reached or the quantized value of the crop condition falls outside a preset range.
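In DDPG-style training, the target-parameter update in the walkthrough above is commonly realized as a soft (Polyak) update, θ′ ← τθ + (1−τ)θ′ applied element-wise. A sketch; the rate τ below is a typical value, not one given in the text:

```python
import numpy as np

TAU = 0.005   # soft-update rate; a typical DDPG value, not specified in the patent

def soft_update(target_params, online_params, tau=TAU):
    """Return new target parameters: theta' <- tau*theta + (1-tau)*theta'."""
    return [(1 - tau) * tp + tau * op
            for tp, op in zip(target_params, online_params)]

# Toy one-layer parameter lists for the target and online networks.
target = [np.zeros(2)]
online = [np.ones(2)]
target = soft_update(target, online)
```

The slow-moving target networks stabilize the target values, which is why they are updated in a different manner from the online networks.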
Those skilled in the art will appreciate that the schematic diagram shown in fig. 1 is but one example in which embodiments of the present application may be implemented. The scope of applicability of the embodiments of the application is not limited in any way by the schematic diagram.
Based on the above description, the decision generation model training method provided in the present application will be described in detail with reference to the accompanying drawings.
Referring to fig. 2, fig. 2 is a flowchart of a decision generation model training method according to an embodiment of the present application. The decision generation model includes a policy network, an evaluation network, a target policy network, and a target evaluation network, wherein the policy network and the evaluation network are online learning networks.
As an example, the decision generation model is a deep deterministic policy gradient (DDPG) algorithm model, comprising an online-learned policy network μ(s|θ) and evaluation network Q(s,a|ω), and a target policy network μ′(s|θ′) and target evaluation network Q′(s,a|ω′) that have the same structures but are updated in a different manner.
It should be noted that the decision generation model training method may be performed by the terminal device 101 in the above embodiment. As shown in fig. 2, the decision generation model training method includes S201 to S208:
S201: acquiring the agricultural production learning data at the current moment, placing it in the playback buffer, and incrementing the model training step count by one; the agricultural production learning data at the current moment includes the agricultural production state data at the current moment, the agricultural production decision at the current moment, the reward value at the current moment, and the agricultural production state data at the next moment.
In specific implementation, the agricultural production learning data at the current moment is obtained by a decision generation model.
In one possible implementation manner, the embodiment of the application provides a specific implementation manner for acquiring agricultural production learning data at the current moment, which includes:
a1: acquiring agricultural production state data at the current moment; the agricultural production status data includes at least environmental status data and agricultural machine parameter data.
The agricultural production status data at the current time t is denoted by s_t.
The environmental status data are environmental data in agricultural production. The agricultural machinery parameter data are parameter data related to crops in agricultural production. In an irrigation scenario, the environmental status data include, for example, effective solar radiation, ambient air/soil temperature and humidity, and soil nutrient content, and the agricultural machine parameter data include, for example, crop growth conditions.
In practical application, sensor devices such as cameras, spectrum sensors, soil monitors, temperature/humidity sensors, satellite remote sensing systems, inertial measurement units and the like are utilized to acquire environmental state data and agricultural machinery parameter data. In addition, agricultural production status data includes agronomic expertise, historical data, and the like.
A2: and inputting the agricultural production state data into a strategy network, and obtaining the agricultural production decision at the current moment output by the strategy network.
At time t, the agricultural production status data are input into the policy network, which outputs the agricultural production decision a_t = μ(s_t|θ), where θ is the policy network parameter. μ(s_t|θ) is the input-output function of the policy network, so μ(s_t|θ) directly represents the agricultural production decision a_t, such as an irrigation decision action.
In an alternative example, the policy network uses a 4-layer network architecture. The input layer receives the agricultural production status data. The 2 intermediate hidden layers each consist of 100 neurons and use the rectified linear unit (ReLU) function as the activation function. The output layer does not use an activation function and directly outputs the computed agricultural production decision a_t.
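As an illustrative sketch only (not the patented implementation), the 4-layer policy network described above can be written in NumPy; the state/action dimensions and the weight initialization below are assumptions:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

class PolicyNetwork:
    """4-layer policy network: input -> 100 (ReLU) -> 100 (ReLU) -> action (no activation)."""
    def __init__(self, state_dim, action_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (state_dim, 100));  self.b1 = np.zeros(100)
        self.W2 = rng.normal(0, 0.1, (100, 100));        self.b2 = np.zeros(100)
        self.W3 = rng.normal(0, 0.1, (100, action_dim)); self.b3 = np.zeros(action_dim)

    def forward(self, s):
        h1 = relu(s @ self.W1 + self.b1)   # 1st hidden layer, 100 neurons, ReLU
        h2 = relu(h1 @ self.W2 + self.b2)  # 2nd hidden layer, 100 neurons, ReLU
        return h2 @ self.W3 + self.b3      # linear output: decision a_t = mu(s_t | theta)

# Example: a 6-dimensional state (radiation, temperature, humidity, ... - hypothetical)
policy = PolicyNetwork(state_dim=6, action_dim=1)
a_t = policy.forward(np.zeros(6))
```

With all-zero biases and a zero state, the sketch outputs a zero action; in practice the network would be trained by the updates described in S204 to S207.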
A3: executing the agricultural production decision, and calculating a reward value at the current moment under the agricultural production decision; the reward value is a quantitative value of the growth condition of crops.
Executing the agricultural production decision a_t means, for example, performing an irrigation action. The growth condition of the irrigated crop is quantified, such as whether the growth height is within the normal range and whether a drought/flooded condition occurs, and the reward value r_t of the irrigation action a_t at the current time is calculated.
A4: based on the agricultural production decision, agricultural production state data at the next moment is acquired.
Under the agricultural production decision at the current moment, the environment enters the agricultural production state at the next moment, and the agricultural production status data s_{t+1} at the next moment are acquired.
A5: the agricultural production learning data at the current time is generated based on the agricultural production status data at the current time, the agricultural production decision at the current time, the bonus value at the current time, and the agricultural production status data at the next time.
The agricultural production learning data (s_t, a_t, r_t, s_{t+1}) must be stored in the playback buffer for use in the subsequent parameter update process of the decision generation model.
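A minimal playback-buffer sketch for storing and sampling such transitions; the capacity and the uniform sampling scheme are assumptions for illustration, as the text does not specify them:

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (s_t, a_t, r_t, s_{t+1}) transitions and samples mini-batches."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, n):
        # Uniformly sample a preset number of sets of historical data
        return random.sample(self.buffer, n)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer()
for t in range(5):
    buf.push(s=t, a=t + 1, r=0.5, s_next=t + 1)
batch = buf.sample(3)
```

Each sampled batch plays the role of the "preset number of sets of historical data" used in S202 onwards.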
In one possible implementation manner, before acquiring the agricultural production learning data at the current moment, the method provided by the embodiment of the application further includes:
model parameters are initialized. The model parameters comprise strategy network parameters, evaluation network parameters, target strategy network parameters and target evaluation network parameters. Wherein, the policy network parameter is θ, the evaluation network parameter is ω, the target policy network parameter is θ ', and the target evaluation network parameter ω'.
In one possible implementation manner, before acquiring the agricultural production learning data at the current moment, the method provided by the embodiment of the application further includes:
setting a preset model training step number and initializing the preset model training step number.
Therefore, after S201 is performed, the recorded number of model training steps is incremented by one; when the number of model training steps reaches the preset number of model training steps, training of the decision generation model ends and the trained decision generation model is obtained.
S202: acquiring a preset number of sets of historical data from the playback buffer, and calculating the target values corresponding to the preset number of sets of historical data according to those data; the historical data include agricultural production status data at a first time, an agricultural production decision at the first time, a reward value at the first time, and agricultural production status data at a second time.
A preset number of sets of historical data are acquired from the playback buffer and used to update the model parameters of the decision generation model. On this basis, the target values corresponding to the preset number of sets of historical data are calculated from those data and denoted y_i, i ∈ [0, N−1], where N is the preset number. Each set of historical data (s_i, a_i, r_i, s_{i+1}) includes the agricultural production status data s_i at a first time, the agricultural production decision a_i at the first time, the reward value r_i at the first time, and the agricultural production status data s_{i+1} at a second time.
It will be appreciated that the target value corresponding to the historical data may be regarded as an expected value for the output of the evaluation network under those historical data. The objective is to make the value output by the evaluation network for the historical data approximate the target value more closely. The target value may be used to represent an expected value of agricultural production performance under an agricultural production decision.
In one possible implementation manner, the embodiment of the present application provides a specific implementation manner of obtaining a preset number of sets of history data from a playback buffer, and calculating a target value corresponding to the preset number of sets of history data according to the preset number of sets of history data, where the specific implementation manner includes:
acquiring a preset number of groups of historical data from a playback buffer area, and calculating a target value corresponding to the target historical data; the target historical data is any one of a preset number of groups of historical data;
and determining the target value corresponding to the preset number of groups of historical data based on the target value corresponding to the target historical data.
It can be understood that, the target value corresponding to the history data of any one group in the preset number group history data is calculated first, so that the target value corresponding to the history data of the preset number group can be determined.
In a possible implementation manner, the embodiment of the present application provides a specific implementation manner for obtaining a preset number of sets of history data from a playback buffer, and calculating a target value corresponding to the target history data, where the specific implementation manner includes:
b1: and acquiring a preset number of groups of historical data from the playback buffer area, and determining target historical data from the preset number of groups of historical data.
B2: and acquiring an object strategy output by the target strategy network and an object value output by the target evaluation network based on the target historical data.
After the target historical data are determined, suppose the target historical data are (s_i, a_i, r_i, s_{i+1}). Based on the target historical data, the object policy μ′(s_{i+1}|θ′) output by the target policy network and the object value Q′(s_{i+1}, μ′(s_{i+1}|θ′)|ω′) output by the target evaluation network are obtained.
B3: and calculating the target value corresponding to the target historical data based on the object strategy, the object value and the rewarding value in the target historical data.
As an alternative example, the calculation formula of the target value corresponding to the target history data is:
y_i = r_i + γ·Q′(s_{i+1}, μ′(s_{i+1}|θ′)|ω′)
where γ is a discount factor, typically taken as a constant between 0 and 1.
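The target-value formula above can be sketched as follows; the target policy network and target evaluation network are stood in for by plain functions, which is an assumption made purely for illustration:

```python
def target_value(r_i, s_next, target_policy, target_q, gamma=0.99):
    """y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1} | theta') | omega')."""
    a_next = target_policy(s_next)                 # object policy mu'(s_{i+1} | theta')
    return r_i + gamma * target_q(s_next, a_next)  # reward plus discounted object value

# Toy stand-ins for the target networks (hypothetical, for illustration only)
mu_prime = lambda s: 0.5 * s
q_prime = lambda s, a: s + a

y = target_value(r_i=1.0, s_next=2.0, target_policy=mu_prime,
                 target_q=q_prime, gamma=0.9)
# y = 1.0 + 0.9 * (2.0 + 1.0) = 3.7
```

In the method itself the two stand-in functions would be the target networks μ′ and Q′ whose parameters are softly updated in S207.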
S203: and obtaining a loss function corresponding to the evaluation network according to the target value corresponding to the historical data of the preset number group.
In a possible implementation manner, the embodiment of the present application provides a specific implementation manner of obtaining a loss function corresponding to an evaluation network according to a target value corresponding to a preset number of sets of historical data, where the specific implementation manner includes:
C1: and acquiring the preset quantity group cost value output by the evaluation network based on the preset quantity group history data.
As one example, the evaluation network uses a 5-layer network architecture. The input layer reads in the agricultural production status data s_i. The 1st hidden layer consists of 100 neurons and uses the ReLU function as the activation function; this layer also takes the agricultural production decision a_i obtained from the policy network as an input. The 2nd hidden layer fuses the output of the 1st hidden layer with the agricultural production decision a_i, obtaining a point-wise addition result. The 3rd hidden layer, like the 1st hidden layer, consists of 100 neurons and uses the ReLU function as the activation function. The output layer does not use an activation function and directly computes the value Q(s_i, a_i|ω) based on the environmental state s_i and the irrigation action a_i.
C2: and acquiring a loss function corresponding to the evaluation network based on the target value corresponding to the historical data of the preset quantity group and the cost value output by the evaluation network of the preset quantity group.
It can be understood that the target values corresponding to the preset number of sets of historical data and the values output by the evaluation network correspond one-to-one. For example, y_i corresponds to Q(s_i, a_i|ω).
In a specific implementation, the difference between the target value corresponding to each set of historical data and the value output by the evaluation network is computed, and that difference is squared. The mean of the squared differences over all sets is taken as the value of the loss function.
As an alternative example, the calculation formula of the loss function L corresponding to the evaluation network is:

L = (1/N) Σ_{i=0}^{N−1} (y_i − Q(s_i, a_i|ω))²

where Q(s_i, a_i|ω) is the value calculated by the online-learning evaluation network, y_i is the target value defined above (into which the discount factor γ, a constant between 0 and 1, enters), and N is the preset number.
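A NumPy sketch of this mean-squared loss; the function and variable names are illustrative:

```python
import numpy as np

def critic_loss(y, q):
    """L = (1/N) * sum_i (y_i - Q(s_i, a_i | omega))^2."""
    y = np.asarray(y, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.mean((y - q) ** 2)

# Target values y_i vs. evaluation-network outputs Q(s_i, a_i | omega)
loss = critic_loss([3.7, 1.2, 0.5], [3.5, 1.0, 0.5])
# loss = ((0.2)^2 + (0.2)^2 + 0) / 3 = 0.08 / 3
```

Minimizing this quantity with respect to ω is precisely the update performed in S204.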
S204: and updating the evaluation network parameters based on the loss function, and acquiring the updated evaluation network parameters.
In a specific implementation, the evaluation network parameter ω is updated so as to minimize the loss function.
S205: and acquiring an objective function corresponding to the strategy network according to the historical data of the preset number group.
The objective function corresponding to the policy network is denoted J(θ). The objective function is generally expressed as a function of the reward r_t.
S206: and updating the strategy network parameters according to the objective function, and acquiring the updated strategy network parameters.
As an alternative example, the policy network parameter θ is updated:
∇_θ J(θ) ≈ (1/N) Σ_{i=0}^{N−1} ∇_a Q(s, a|ω)|_{s=s_i, a=μ(s_i)} · ∇_θ μ(s|θ)|_{s=s_i}

θ ← θ + κ·∇_θ J(θ)

where J(θ) is the objective function corresponding to the policy network and ∇_θ J(θ) is the gradient of the objective function with respect to the policy network parameter θ, obtained in the course of maximizing the objective function; ∇_a Q(s, a|ω) is the gradient of Q(s, a|ω) with respect to a = μ(s_i); ∇_θ μ(s|θ)|_{s=s_i} is the gradient of μ(s|θ) with respect to the policy network parameter θ, evaluated at s_i; and κ is a fixed step-size parameter. The policy network parameter θ is updated by the second formula.
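A minimal numeric sketch of the gradient-ascent update θ ← θ + κ·∇_θJ(θ), using a finite-difference gradient and a toy scalar objective for illustration (the method itself uses the analytic chain-rule gradient through Q and μ):

```python
def grad_fd(f, theta, eps=1e-6):
    """Central finite-difference estimate of dJ/dtheta (illustrative only)."""
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)

def ascent_step(f, theta, kappa=0.1):
    # theta <- theta + kappa * grad_theta J(theta): move uphill on the objective
    return theta + kappa * grad_fd(f, theta)

# Toy objective J(theta) = -(theta - 2)^2, maximized at theta = 2 (hypothetical)
J = lambda th: -(th - 2.0) ** 2
theta = 0.0
for _ in range(50):
    theta = ascent_step(J, theta)
# theta converges toward the maximizer 2.0
```

The sign of the update (ascent, not descent) reflects that the policy network maximizes J(θ), whereas the evaluation network in S204 minimizes its loss.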
S207: and updating the target evaluation network parameters and the target strategy network parameters according to the updated evaluation network parameters and the updated strategy network parameters, and acquiring the updated target evaluation network parameters and the updated target strategy network parameters.
As an alternative example, the target evaluation network parameters and the target policy network parameters are updated by the following formulas:
ω′←τω+(1-τ)ω′
θ′←τθ+(1-τ)θ′
where τ is a setting parameter with τ ≪ 1, so that the target policy network and the target evaluation network slowly track the online-learning policy network and evaluation network, which greatly improves training stability.
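The soft update above, sketched with illustrative parameter vectors (the value of τ is an assumption):

```python
def soft_update(online, target, tau=0.01):
    """omega' <- tau * omega + (1 - tau) * omega', element-wise (Polyak averaging)."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]

omega = [1.0, 2.0]        # online evaluation-network parameters omega
omega_prime = [0.0, 0.0]  # target evaluation-network parameters omega'
omega_prime = soft_update(omega, omega_prime, tau=0.1)
# omega_prime = [0.1, 0.2]: the target slowly tracks the online network
```

The same rule applies to θ′ ← τθ + (1−τ)θ′ for the target policy network.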
S208: re-executing the agricultural production learning data at the current moment and the subsequent steps until reaching the preset condition, and obtaining a decision generation model after training; the preset condition is that the number of training steps of the preset model is reached or the quantitative value of the crop condition is beyond a preset range.
It can be understood that the preset range is set according to the actual application scenario and actual conditions. A quantified crop condition value beyond the preset range indicates that crop growth has been severely damaged or that the crop has even died.
The embodiment of the application provides a decision generation model training method, agricultural production learning data at the current moment is obtained and placed in a playback buffer zone, and the number of model training steps is recorded and increased by one. And acquiring the preset quantity group history data from the playback buffer zone, and calculating the target value corresponding to the preset quantity group history data according to the preset quantity group history data. And obtaining a loss function corresponding to the evaluation network according to the target value corresponding to the historical data of the preset number group. Based on the loss function, the evaluation network parameters are updated. And acquiring an objective function corresponding to the strategy network according to the historical data of the preset number group. And updating the strategy network parameters according to the objective function. And acquiring updated target evaluation network parameters and target strategy network parameters from the updated evaluation network parameters and the updated strategy network parameters. And stopping training when the preset condition is reached. The decision generation model can adapt to the intelligent agricultural production environment with real-time variation, and can generate accurate agricultural production decisions.
Referring to fig. 3, fig. 3 is a flowchart of a decision generation method according to an embodiment of the present application. As shown in fig. 3, the method includes S301-S303:
s301: acquiring agricultural production state data at the current moment; the agricultural production status data includes at least environmental status data and agricultural machine parameter data;
S302: inputting the agricultural production state data into a strategy network, and obtaining an agricultural production decision at the current moment output by the strategy network; the policy network belongs to a decision generation model; the decision generation model is obtained by training according to the decision generation model training method;
s303: and executing agricultural production decisions.
Because the policy network belongs to a decision generation model whose training is complete, the agricultural production decisions it outputs are accurate decisions adapted to a real-time-varying intelligent agricultural production environment.
Based on the decision-making model training method provided by the method embodiment, the embodiment of the application also provides a decision-making model training device. The decision-making model training apparatus will be described below with reference to the accompanying drawings.
Referring to fig. 4, fig. 4 is a schematic diagram of a decision-making model training apparatus according to an embodiment of the present application. The decision-making model includes a policy network, an evaluation network, a target policy network, and a target evaluation network, as shown in fig. 4, the apparatus includes:
a first obtaining unit 401, configured to obtain the agricultural production learning data at the current moment, place it in a playback buffer, and increment the recorded number of model training steps by one; the agricultural production learning data at the current moment include the agricultural production status data at the current moment, the agricultural production decision at the current moment, the reward value at the current moment, and the agricultural production status data at the next moment;
A calculating unit 402, configured to obtain a preset number of sets of history data from the playback buffer, and calculate a target value corresponding to the preset number of sets of history data according to the preset number of sets of history data; the historical data comprises agricultural production state data at a first moment, agricultural production decisions at the first moment, rewards values at the first moment and agricultural production state data at a second moment;
a second obtaining unit 403, configured to obtain a loss function corresponding to the evaluation network according to a target value corresponding to the preset number of sets of historical data;
a third obtaining unit 404, configured to update an evaluation network parameter based on the loss function, and obtain the updated evaluation network parameter;
a fourth obtaining unit 405, configured to obtain an objective function corresponding to the policy network according to the preset number of sets of historical data;
a fifth obtaining unit 406, configured to update a policy network parameter according to the objective function, and obtain the updated policy network parameter;
a sixth obtaining unit 407, configured to update a target evaluation network parameter and a target policy network parameter according to the updated evaluation network parameter and the updated policy network parameter, and obtain the updated target evaluation network parameter and the updated target policy network parameter;
The execution unit 408 is configured to re-execute the step of acquiring the agricultural production learning data at the current moment and the subsequent steps until a preset condition is reached, and obtain the trained decision generation model; the preset condition is that the preset number of model training steps is reached or the quantified crop condition value is beyond a preset range.
In one possible implementation manner, the first obtaining unit 401 includes:
the first acquisition subunit is used for acquiring agricultural production state data at the current moment; the agricultural production state data at least comprises environmental state data and agricultural machinery parameter data;
the second acquisition subunit is used for inputting the agricultural production state data into the strategy network and acquiring the agricultural production decision at the current moment output by the strategy network;
a first calculation subunit, configured to execute the agricultural production decision, and calculate a prize value at the current time under the agricultural production decision; the rewarding value is a growth condition quantized value of the crops;
a third obtaining subunit, configured to obtain the agricultural production status data at the next moment based on the agricultural production decision;
and the generation subunit is used for generating the agricultural production learning data at the current moment based on the agricultural production state data at the current moment, the agricultural production decision at the current moment, the rewarding value at the current moment and the agricultural production state data at the next moment.
In one possible implementation, the computing unit 402 includes:
the second calculating subunit is used for acquiring a preset number of groups of historical data from the playback buffer area and calculating a target value corresponding to the target historical data; the target historical data is any one of the preset number of groups of historical data;
and the determining subunit is used for determining the target value corresponding to the preset number group of historical data based on the target value corresponding to the target historical data.
In one possible implementation, the second computing subunit includes:
a fourth obtaining subunit, configured to obtain a preset number of sets of history data from the playback buffer, and determine target history data from the preset number of sets of history data;
a fifth obtaining subunit, configured to obtain, based on the target history data, an object policy output by the target policy network and an object value output by the target evaluation network;
and the third calculation subunit is used for calculating the target value corresponding to the target historical data based on the object strategy, the object value and the rewarding value in the target historical data.
In one possible implementation manner, the second obtaining unit 403 includes:
A sixth obtaining subunit, configured to obtain a preset number group cost value output by the evaluation network based on the preset number group history data;
and a seventh obtaining subunit, configured to obtain a loss function corresponding to the evaluation network based on the target value corresponding to the historical data of the preset number group and the cost value output by the evaluation network of the preset number group.
In one possible implementation, the apparatus further includes:
the initialization unit is used for initializing model parameters before the agricultural production learning data at the current moment is acquired; the model parameters include policy network parameters, evaluation network parameters, target policy network parameters, and target evaluation network parameters.
In one possible implementation, the apparatus further includes:
the step number setting unit is used for setting a preset model training step number and initializing the preset model training step number before the agricultural production learning data at the current moment is acquired.
The embodiment of the application provides a decision generation model training device, which is used for acquiring agricultural production learning data at the current moment, placing the agricultural production learning data in a playback buffer zone and adding one to the number of training steps of a recording model. And acquiring the preset quantity group history data from the playback buffer zone, and calculating the target value corresponding to the preset quantity group history data according to the preset quantity group history data. And obtaining a loss function corresponding to the evaluation network according to the target value corresponding to the historical data of the preset number group. Based on the loss function, the evaluation network parameters are updated. And acquiring an objective function corresponding to the strategy network according to the historical data of the preset number group. And updating the strategy network parameters according to the objective function. And acquiring updated target evaluation network parameters and target strategy network parameters from the updated evaluation network parameters and the updated strategy network parameters. And stopping training when the preset condition is reached. The decision generation model can adapt to the intelligent agricultural production environment with real-time variation, and can generate accurate agricultural production decisions.
Referring to fig. 5, fig. 5 is a schematic diagram of a decision making device according to an embodiment of the present application. As shown in fig. 5, the apparatus includes:
an acquiring unit 501, configured to acquire agricultural production status data at a current time; the agricultural production state data at least comprises environmental state data and agricultural machinery parameter data;
an input unit 502, configured to input the agricultural production status data into a policy network, and obtain an agricultural production decision at the current time output by the policy network; the strategy network belongs to a decision generation model; the decision generation model is obtained by training according to the decision generation model training method;
an execution unit 503 for executing the agricultural production decision.
From the above description of embodiments, it will be apparent to those skilled in the art that all or part of the steps of the above described example methods may be implemented in software plus necessary general purpose hardware platforms. Based on such understanding, the technical solutions of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions to cause a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway, etc.) to perform the methods described in the embodiments or some parts of the embodiments of the present application.
It should be noted that, in the present description, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different manner from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the method disclosed in the embodiment, since it corresponds to the system disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the system part.
It should also be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of training a decision-making model, the model comprising a policy network, an evaluation network, a target policy network, and a target evaluation network, the method comprising:
acquiring agricultural production learning data at the current moment, placing the agricultural production learning data at the current moment in a playback buffer area, and recording the number of model training steps plus one; the agricultural production learning data at the current moment comprises the agricultural production state data at the current moment, the agricultural production decision at the current moment, the rewarding value at the current moment and the agricultural production state data at the next moment;
acquiring a preset number of historical data from the playback buffer zone, and calculating a target value corresponding to the preset number of historical data according to the preset number of historical data; the historical data comprises agricultural production state data at a first moment, agricultural production decisions at the first moment, rewards values at the first moment and agricultural production state data at a second moment;
acquiring a loss function corresponding to the evaluation network according to the target value corresponding to the historical data of the preset number group;
updating the evaluation network parameters based on the loss function, and acquiring the updated evaluation network parameters;
Acquiring an objective function corresponding to the strategy network according to the preset number group history data;
updating the strategy network parameters according to the objective function, and acquiring the updated strategy network parameters;
updating a target evaluation network parameter and a target policy network parameter according to the updated evaluation network parameter and the updated policy network parameter, and acquiring the updated target evaluation network parameter and the updated target policy network parameter;
re-executing the agricultural production learning data at the current moment and the subsequent steps until reaching preset conditions, and obtaining the decision generation model after training; the preset condition is that the number of training steps of the preset model is reached or the quantitative value of the crop condition is beyond a preset range.
2. The method of claim 1, wherein the obtaining agricultural production learning data for the current time comprises:
acquiring agricultural production state data at the current moment; the agricultural production state data at least comprises environmental state data and agricultural machinery parameter data;
inputting the agricultural production state data into the strategy network, and acquiring the agricultural production decision at the current moment output by the strategy network;
Executing the agricultural production decision, and calculating a reward value of the current moment under the agricultural production decision; the rewarding value is a growth condition quantized value of the crops;
acquiring the agricultural production state data at the next moment based on the agricultural production decision;
and generating agricultural production learning data at the current moment based on the agricultural production state data at the current moment, the agricultural production decision at the current moment, the rewarding value at the current moment and the agricultural production state data at the next moment.
3. The method of claim 1, wherein the acquiring a preset number of sets of historical data from the playback buffer and calculating target values corresponding to the preset number of sets of historical data comprises:
acquiring a preset number of sets of historical data from the playback buffer, and calculating a target value corresponding to target historical data; the target historical data is any one set of the preset number of sets of historical data;
and determining the target values corresponding to the preset number of sets of historical data based on the target value corresponding to the target historical data.
4. The method of claim 3, wherein the acquiring a preset number of sets of historical data from the playback buffer and calculating the target value corresponding to the target historical data comprises:
acquiring a preset number of sets of historical data from the playback buffer, and determining the target historical data from the preset number of sets of historical data;
acquiring the policy output by the target policy network and the value output by the target evaluation network based on the target historical data;
and calculating the target value corresponding to the target historical data based on that policy output, that value output, and the reward value in the target historical data.
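Combining the target networks' outputs with the stored reward, as claim 4 describes, matches the standard bootstrapped TD target y = r + γ·Q'(s', μ'(s')). A sketch under that assumption (the discount `gamma` and all function names are invented for illustration):

```python
import numpy as np

def td_target(reward, next_state, target_policy, target_q, gamma=0.99):
    """Target value for one historical tuple: the stored reward plus the
    discounted value the target evaluation network assigns to the action
    the target policy network would take in the next state."""
    next_action = target_policy(next_state)
    return reward + gamma * target_q(next_state, next_action)

# Placeholder target networks.
target_policy = lambda s: -s
target_q = lambda s, a: float(np.dot(s, a))
y = td_target(reward=1.0, next_state=np.array([1.0, 1.0]),
              target_policy=target_policy, target_q=target_q, gamma=0.9)
```

Because both quantities come from the slowly updated target networks, the target values move smoothly between training steps.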
5. The method of claim 1, wherein the acquiring the loss function corresponding to the evaluation network according to the target values corresponding to the preset number of sets of historical data comprises:
acquiring the preset number of value outputs produced by the evaluation network for the preset number of sets of historical data;
and acquiring the loss function corresponding to the evaluation network based on the target values corresponding to the preset number of sets of historical data and the value outputs of the evaluation network.
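In actor-critic methods of this family the evaluation-network loss is typically the mean squared error between the batch of target values and the evaluation network's own value outputs; the patent does not spell out the formula here, so the following is a hedged sketch of that standard choice:

```python
import numpy as np

def critic_loss(targets, q_values):
    """Mean squared error between the target values (claims 3-4) and the
    evaluation network's value outputs over one sampled batch."""
    targets = np.asarray(targets, dtype=float)
    q_values = np.asarray(q_values, dtype=float)
    return float(np.mean((targets - q_values) ** 2))

loss = critic_loss(targets=[1.0, 2.0], q_values=[0.0, 0.0])
```

The evaluation network parameters are then updated by gradient descent on this loss, as in the subsequent step of claim 1.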
6. The method according to any one of claims 1 to 5, wherein before the acquiring agricultural production learning data at the current moment, the method further comprises:
initializing model parameters; the model parameters include policy network parameters, evaluation network parameters, target policy network parameters, and target evaluation network parameters.
7. The method according to any one of claims 1 to 5, wherein before the acquiring agricultural production learning data at the current moment, the method further comprises:
setting a preset number of model training steps and initializing the model training step count.
8. A decision generation method, the method comprising:
acquiring agricultural production state data at the current moment; the agricultural production state data comprises at least environmental state data and agricultural machinery parameter data;
inputting the agricultural production state data into a policy network, and acquiring the agricultural production decision at the current moment output by the policy network; the policy network belongs to a decision generation model; the decision generation model is trained by the decision generation model training method according to any one of claims 1 to 7;
and executing the agricultural production decision.
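At inference time, claim 8 reduces to a single forward pass of the trained policy network on the current state. A minimal sketch, where concatenating environmental and machinery data into one state vector and the clipping stand-in policy are illustrative assumptions:

```python
import numpy as np

def generate_decision(policy, env_state, machinery_params):
    """Claim 8: assemble the agricultural production state data and run the
    trained policy network to obtain the decision to execute."""
    state = np.concatenate([env_state, machinery_params])
    return policy(state)

# Stand-in for a trained policy network with bounded actions.
policy = lambda s: np.clip(s, -1.0, 1.0)
decision = generate_decision(policy,
                             env_state=np.array([0.2, 3.0]),
                             machinery_params=np.array([-2.0]))
```

No target networks or replay buffer are needed at this stage; they exist only to stabilize training.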
9. A decision generation model training apparatus, wherein the model comprises a policy network, an evaluation network, a target policy network, and a target evaluation network, the apparatus comprising:
a first acquisition unit, configured to acquire agricultural production learning data at the current moment, place the agricultural production learning data in a playback buffer, and increment the model training step count by one; the agricultural production learning data at the current moment comprises the agricultural production state data at the current moment, the agricultural production decision at the current moment, the reward value at the current moment, and the agricultural production state data at the next moment;
a calculating unit, configured to acquire a preset number of sets of historical data from the playback buffer and calculate target values corresponding to the preset number of sets of historical data; the historical data comprises agricultural production state data at a first moment, an agricultural production decision at the first moment, a reward value at the first moment, and agricultural production state data at a second moment;
a second acquisition unit, configured to acquire a loss function corresponding to the evaluation network according to the target values corresponding to the preset number of sets of historical data;
a third acquisition unit, configured to update an evaluation network parameter based on the loss function, and acquire the updated evaluation network parameter;
a fourth acquisition unit, configured to acquire, according to the preset number of sets of historical data, an objective function corresponding to the policy network;
a fifth acquisition unit, configured to update a policy network parameter according to the objective function, and acquire the updated policy network parameter;
a sixth acquisition unit, configured to update a target evaluation network parameter and a target policy network parameter according to the updated evaluation network parameter and the updated policy network parameter, and acquire the updated target evaluation network parameter and the updated target policy network parameter;
an execution unit, configured to re-execute the step of acquiring the agricultural production learning data at the current moment and the subsequent steps until a preset condition is reached, and obtain the trained decision generation model; the preset condition is that a preset number of model training steps is reached or the crop growth condition quantized value falls outside a preset range.
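The policy-network objective handled by the fourth and fifth units is, in deterministic policy-gradient methods, commonly the mean value the evaluation network assigns to the policy's own actions over the sampled batch; the claim does not give the formula, so this is a hedged sketch of that conventional objective with invented names:

```python
import numpy as np

def actor_objective(states, policy, q_func):
    """Objective J = mean over the batch of Q(s, mu(s)); the policy
    parameters are updated to ascend (maximize) this quantity."""
    return float(np.mean([q_func(s, policy(s)) for s in states]))

# Placeholder policy and evaluation networks.
policy = lambda s: 2.0 * s
q_func = lambda s, a: float(np.dot(s, a))
J = actor_objective([np.array([1.0, 0.0]), np.array([0.0, 2.0])], policy, q_func)
```

Maximizing J pushes the policy toward decisions the evaluation network currently rates highly, which is why the critic must be updated first.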
10. A decision generation apparatus, the apparatus comprising:
an acquisition unit, configured to acquire agricultural production state data at the current moment; the agricultural production state data comprises at least environmental state data and agricultural machinery parameter data;
an input unit, configured to input the agricultural production state data into a policy network and acquire the agricultural production decision at the current moment output by the policy network; the policy network belongs to a decision generation model; the decision generation model is trained by the decision generation model training method according to any one of claims 1 to 7;
and an execution unit, configured to execute the agricultural production decision.
CN202110872525.4A 2021-07-30 2021-07-30 Decision generation model training method, decision generation method and device Active CN113723757B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110872525.4A CN113723757B (en) 2021-07-30 2021-07-30 Decision generation model training method, decision generation method and device


Publications (2)

Publication Number Publication Date
CN113723757A CN113723757A (en) 2021-11-30
CN113723757B true CN113723757B (en) 2023-07-18

Family

ID=78674393

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110872525.4A Active CN113723757B (en) 2021-07-30 2021-07-30 Decision generation model training method, decision generation method and device

Country Status (1)

Country Link
CN (1) CN113723757B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114818913A (en) * 2022-04-22 2022-07-29 北京百度网讯科技有限公司 Decision generation method and device
CN117376661B (en) * 2023-12-06 2024-02-27 山东大学 Fine-granularity video stream self-adaptive adjusting system and method based on neural network
CN117807410B (en) * 2024-02-29 2024-05-31 东北大学 Method and device for determining set speed of steel-turning roller, storage medium and terminal

Citations (2)

Publication number Priority date Publication date Assignee Title
CN111626776A (en) * 2020-05-26 2020-09-04 创新奇智(西安)科技有限公司 Method for training strategy model, method and device for determining advertisement putting strategy
CN112072643A (en) * 2020-08-20 2020-12-11 电子科技大学 Light-storage system online scheduling method based on depth certainty gradient strategy

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
KR101225853B1 (en) * 2011-05-31 2013-01-23 삼성에스디에스 주식회사 Apparatus and Method for Controlling IP Address of Data



Similar Documents

Publication Publication Date Title
CN113723757B (en) Decision generation model training method, decision generation method and device
CN111310889B (en) Evaporation waveguide profile estimation method based on deep neural network
CN112465151A (en) Multi-agent federal cooperation method based on deep reinforcement learning
CN105921370B (en) Film thickness control method for extrusion-coating machine
Byakatonda et al. Prediction of onset and cessation of austral summer rainfall and dry spell frequency analysis in semiarid Botswana
CN110456026B (en) Soil moisture content monitoring method and device
Chen et al. Greenhouse protection against frost conditions in smart farming using IoT enabled artificial neural networks
CN107255920A (en) PID control method and apparatus and system based on network optimization algorithm
CN110999766A (en) Irrigation decision method, device, computer equipment and storage medium
CN114297907A (en) Greenhouse environment spatial distribution prediction method and device
CN105184400A (en) Tobacco field soil moisture prediction method
Taki et al. Application of Neural Networks and multiple regression models in greenhouse climate estimation
CN112527037A (en) Greenhouse environment regulation and control method and system with environment factor prediction function
CN112131661A (en) Method for unmanned aerial vehicle to autonomously follow moving target
CN115526839A (en) Flower drying control method, device and medium based on neural network
Kalaiarasi et al. Crop yield prediction using multi-parametric deep neural networks
Rahimikhoob Estimating sunshine duration from other climatic data by artificial neural network for ET0 estimation in an arid environment
CN117252344A (en) Suitability evaluation method and device for cultivation facility, electronic equipment and storage medium
CN112925207A (en) Greenhouse environment temperature self-adaption method based on parameter identification
CN110852415B (en) Vegetation index prediction method, system and equipment based on neural network algorithm
KR20180024171A (en) Server and method for determining actuator parameter of greenhouse
CN115907204A (en) Forest transpiration water consumption prediction method for optimizing BP neural network by sparrow search algorithm
CN115034159A (en) Power prediction method, device, storage medium and system for offshore wind farm
Khedkar et al. Estimation of evapotranspiration using neural network approach
Ahmad et al. Neural network modeling and identification of naturally ventilated tropical greenhouse climates

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant