CN114859734B - Greenhouse environment parameter optimization decision method based on improved SAC algorithm - Google Patents

Greenhouse environment parameter optimization decision method based on improved SAC algorithm

Publication number
CN114859734B
Authority
CN
China
Prior art keywords
greenhouse
data
environment parameter
neural network
sac algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210675362.5A
Other languages
Chinese (zh)
Other versions
CN114859734A (en
Inventor
师佳
文柯超
徐星海
谢惠民
洪文晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University
Priority to CN202210675362.5A
Publication of CN114859734A
Application granted
Publication of CN114859734B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00: Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10: Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture
    • Y02A40/25: Greenhouse technology, e.g. cooling systems therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Automation & Control Theory (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a greenhouse environment parameter optimization decision method based on an improved SAC algorithm, comprising the following steps: S1, taking greenhouse state data, greenhouse environment parameter decision data and greenhouse output data as the reinforcement learning elements of a SAC algorithm, and pre-filling an experience buffer with historical manual planting experience data; S2, generating a greenhouse simulator for simulating a greenhouse planting process; S3, designing the number of critic neural networks of the SAC algorithm and designing the objective function of the actor neural network of the SAC algorithm to obtain an improved SAC algorithm; S4, generating new greenhouse environment parameter decision data with the improved SAC algorithm, inputting the new decision data into the greenhouse simulator for a new planting period, putting the resulting data into the experience buffer, and updating the parameters of the critic neural networks and of the actor neural network with data from the experience buffer; and S5, repeating step S4 until a plurality of planting periods are completed, thereby obtaining the greenhouse environment parameter optimization decision neural network.

Description

Greenhouse environment parameter optimization decision method based on improved SAC algorithm
Technical Field
The invention relates to the field of greenhouse environment parameter optimization, in particular to a greenhouse environment parameter optimization decision method based on an improved SAC algorithm.
Background
The greenhouse climate is the climate inside a semi-closed structure that people create with agricultural facilities such as plastic greenhouses, solar greenhouses and multi-span greenhouses. Such facilities are widely used for growing melons, fruits, vegetables and flowers and for aquaculture. The greenhouse climate is a microclimate affected mainly by factors such as radiation, temperature, humidity and carbon dioxide. Because this microclimate directly influences crop growth and development, the climate choices made over the whole planting period determine the final crop yield and the economic return of the greenhouse.
Selecting the set values of the greenhouse climate parameters is a very complex optimization decision problem, because it is influenced by the external environment, crop growth, energy cost and other factors. Current greenhouse environment parameter optimization methods are mainly optimal control and model predictive control. Both rely on a relatively accurate model to optimize the temperature, carbon dioxide concentration, artificial supplemental lighting and so on over the crop planting period, with the goal of maximizing the net profit of the greenhouse. However, these methods depend heavily on model accuracy, which hinders their extension to different types of greenhouse production; in addition, model-based optimal control algorithms have high computational complexity and solve slowly, which hinders real-time decisions on greenhouse environment parameter set values.
The invention addresses these problems of the prior art by providing a greenhouse environment parameter optimization decision method based on an improved SAC algorithm.
Disclosure of Invention
In view of the problems in the prior art, the invention aims to provide a greenhouse environment parameter optimization decision method based on an improved SAC algorithm that effectively solves those problems.
The technical scheme of the invention is as follows:
A greenhouse environment parameter optimization decision method based on an improved SAC algorithm comprises the following steps:
S1, taking greenhouse state data, greenhouse environment parameter decision data and greenhouse output data as the reinforcement learning elements of a SAC algorithm, and pre-filling an experience buffer with historical manual planting experience data, wherein the historical manual planting experience data comprises historical manual greenhouse environment parameter decision data, historical greenhouse state data and historical greenhouse output data;
S2, generating a greenhouse simulator for simulating a greenhouse planting process;
S3, designing the number of critic neural networks of the SAC algorithm, and designing the objective function of the actor neural network of the SAC algorithm, to obtain an improved SAC algorithm;
S4, generating new greenhouse environment parameter decision data with the improved SAC algorithm, inputting the new decision data into the greenhouse simulator for a new planting period, simulating to obtain new greenhouse state data and new greenhouse output data, putting the new decision data, new state data and new output data into the experience buffer, and updating the parameters of the critic neural networks and of the actor neural network with data from the experience buffer;
S5, repeating step S4 until a plurality of planting periods are completed, obtaining a greenhouse environment parameter optimization decision neural network, and generating greenhouse environment parameter optimization decisions with that neural network.
Further, the greenhouse state data includes one or more of greenhouse interior climate data, greenhouse interior crop growth data, and future climate forecast data; the greenhouse environment parameter decision data includes one or more of the future daytime temperature, future daytime carbon dioxide concentration and future daytime artificial supplemental lighting set values, and the future nighttime temperature, future nighttime carbon dioxide concentration and future nighttime artificial supplemental lighting set values;
The greenhouse output data are calculated according to formula (1):
Greenhouse output data = mature crop increment × market price − electricity cost − carbon dioxide fertilizer cost, formula (1).
Further, the historical manual greenhouse environment parameter decision data are the greenhouse daytime and nighttime set values for the seedling stage of the crop, and the greenhouse daytime and nighttime set values for the fruiting stage of the crop.
Further, designing the number of critic neural networks of the SAC algorithm includes:
the number of critic neural networks of the SAC algorithm is three or more, and each critic neural network corresponds to one target critic neural network.
Further, designing the objective function of the actor neural network of the SAC algorithm includes: designing a supervised learning function;
adding the supervised learning function to the actor neural network objective function of the SAC algorithm to obtain formula (2),
wherein E is the mathematical expectation; s_t is the current greenhouse state data; D is the experience buffer; a_t is the action output by the actor; π is the actor policy that generates actions; π(a_t|s_t) is the probability that the actor policy outputs action a_t in state s_t; σ is a weight coefficient initialized to 1; Q(a_t, s_t) is the critic's evaluation of the state-action value; k is the weight coefficient of the supervised learning function and takes a value between 0.3 and 0.5; a_expData is the action from the historical manual planting experience data; and the indicator function takes the value 1 when (s_t, a_t) comes from the historical manual planting experience data and 0 otherwise.
Further, the greenhouse simulator comprises a greenhouse environment model and a crop model; the greenhouse environment model simulates the greenhouse internal environment parameters through mass balance and energy balance equations, and the crop model is built from the photosynthesis, respiration and biomass distribution of the crop and simulates crop growth and yield under the greenhouse internal environment parameters.
Further, the greenhouse internal environment parameters include one or more of temperature, carbon dioxide concentration and illumination intensity.
Further, updating the parameters of the critic neural networks and of the actor neural network with data from the experience buffer comprises:
randomly sampling, from the experience buffer, part of the historical manual planting experience data and part of the greenhouse environment parameter decision data together with the greenhouse state data and greenhouse output data corresponding to that decision data;
and updating the parameters of the critic neural networks and of the actor neural network according to the sampled historical manual planting experience data, greenhouse environment parameter decision data, and corresponding greenhouse state data and greenhouse output data.
Further, a gradient descent algorithm is used to update the critic neural network parameters φ and the actor neural network parameters θ.
Further, repeating step S4 until a plurality of planting periods are completed and obtaining the greenhouse environment parameter optimization decision neural network includes:
repeating step S4 over a plurality of planting periods, and obtaining the greenhouse environment parameter optimization decision neural network when the greenhouse output differs by less than 5% across 5 consecutive planting periods.
Accordingly, the present invention provides the following effects and/or advantages:
The invention uses the improved SAC algorithm to generate new greenhouse environment parameter decision data, inputs them into the greenhouse simulator for a new planting period, and obtains new greenhouse state data and new greenhouse output data by simulation. The new decision data, state data and output data are put into the experience buffer, and data from the buffer are used to update the parameters of the critic neural networks and of the actor neural network. This interaction between the improved SAC algorithm and the greenhouse simulator is repeated until an economically optimal greenhouse climate setting strategy is obtained. Pre-filling the experience buffer with manual planting experience data initializes the buffer in advance, which speeds up early training of the reinforcement learning algorithm and improves its data utilization.
The improved SAC algorithm provided by the application trains efficiently. During training, its greenhouse planting profit is consistently higher than that of the original SAC algorithm, and its final profit is higher than that of both the manual strategy and the original SAC algorithm. The method is therefore feasible, outperforms the original SAC algorithm, and yields an economically optimal greenhouse climate setting strategy.
By increasing the number of critic neural networks, the method mitigates the inaccurate critic value estimates caused by pre-filling the buffer with manual experience data, improves the stability of the algorithm, and prevents a rapid drop in performance early in training. By adding a supervised learning term to the objective function of the original SAC actor neural network, the invention gives the actor policy a good initialization early in training and keeps it from becoming trapped in a local optimum.
The profit per unit greenhouse area obtained with the improved SAC algorithm is higher than that of the manual experience strategy and of the original SAC strategy, which demonstrates the advantage of applying the improved SAC algorithm to greenhouse climate set value decisions.
It is to be understood that both the foregoing general description and the following detailed description of the present invention are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
FIG. 2 is a table of manual planting strategies.
FIG. 3 is a diagram of a greenhouse simulator.
FIG. 4 is a graph showing the change of greenhouse planting profit during training of the invention.
FIG. 5 is a graph of the greenhouse environment parameter set values calculated by the present invention.
FIG. 6 is a comparison of the final greenhouse production profit of the present invention.
Detailed Description
To facilitate understanding by those skilled in the art, the invention is described in further detail below with reference to the accompanying drawings. It should be understood that, unless specifically stated otherwise, the order of the steps mentioned in this embodiment may be adjusted as needed, and the steps may be performed simultaneously or partially simultaneously.
Reinforcement learning is a data-driven artificial intelligence decision-making approach, and the SAC algorithm is currently one of the most advanced reinforcement learning algorithms: it has strong optimization capability and high robustness, and can make real-time decisions from state feedback. However, the SAC algorithm needs a large amount of training data to obtain an optimal control strategy and uses data inefficiently, which makes it poorly suited to deciding greenhouse climate set values.
Accordingly, the applicant proposes the following method. In the present application, the term "SAC algorithm" refers to the Soft Actor-Critic (SAC) algorithm. The SAC algorithm addresses reinforcement learning in both discrete and continuous action spaces and is an off-policy reinforcement learning algorithm.
Referring to FIG. 1, a greenhouse environment parameter optimization decision method based on an improved SAC algorithm comprises the following steps:
S1, taking greenhouse state data, greenhouse environment parameter decision data and greenhouse output data as the reinforcement learning elements of a SAC algorithm, and pre-filling an experience buffer with historical manual planting experience data, wherein the historical manual planting experience data comprises historical manual greenhouse environment parameter decision data, historical greenhouse state data and historical greenhouse output data;
In this embodiment, the SAC algorithm alternates policy evaluation, which updates the value function to evaluate the current policy, with policy improvement, which updates the policy and uses the value function from the previous step to judge whether the policy has improved, so that the policy continually evolves toward an optimal one.
In this embodiment, the experience buffer may be pre-filled with 1000 to 2000 manual experience data pairs; 1200 data pairs are used in this example.
In this step, the experience buffer is pre-filled with historical manual planting experience data. These data may consist of climate set values chosen from manual experience for the greenhouse simulator, or climate set values chosen from manual experience for a real greenhouse, together with the historical greenhouse state data and historical greenhouse output data corresponding to those set values. This embodiment simulates tomatoes: the climate set values of the greenhouse simulator (or of the greenhouse itself) are set from manual experience, and the state data of the greenhouse and the growth, fruiting and other data of the tomatoes are generated under those set values.
Specifically, the greenhouse state data includes one or more of greenhouse interior climate data, greenhouse interior crop growth data, and future climate forecast data; the greenhouse environment parameter decision data includes one or more of the future daytime temperature, future daytime carbon dioxide concentration and future daytime artificial supplemental lighting set values, and the future nighttime temperature, future nighttime carbon dioxide concentration and future nighttime artificial supplemental lighting set values;
The greenhouse output data are calculated according to formula (1):
Greenhouse output data = mature crop increment × market price − electricity cost − carbon dioxide fertilizer cost, formula (1), where electricity cost = electricity consumption × unit electricity price, and carbon dioxide fertilizer cost = carbon dioxide fertilizer consumption × unit carbon dioxide fertilizer price. This embodiment uses a tomato simulator, with a tomato market price of 16 yuan/kg, an electricity price of 0.503 yuan/kWh and a carbon dioxide fertilizer price of 330 yuan/ton.
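As a worked illustration of formula (1), the following minimal Python sketch evaluates the greenhouse output with the prices of this embodiment. The function name, argument names and the input quantities in the last line are illustrative assumptions, not values produced by the simulator.

```python
# Prices taken from this embodiment; every identifier here is illustrative.
TOMATO_PRICE_YUAN_PER_KG = 16.0
ELECTRICITY_PRICE_YUAN_PER_KWH = 0.503
CO2_PRICE_YUAN_PER_TON = 330.0


def greenhouse_output(mature_crop_kg, electricity_kwh, co2_ton):
    """Formula (1): revenue from newly matured crop minus electricity and CO2 costs (yuan)."""
    revenue = mature_crop_kg * TOMATO_PRICE_YUAN_PER_KG
    electricity_cost = electricity_kwh * ELECTRICITY_PRICE_YUAN_PER_KWH
    co2_cost = co2_ton * CO2_PRICE_YUAN_PER_TON
    return revenue - electricity_cost - co2_cost


# Hypothetical inputs: 2 kg of mature tomato, 10 kWh of electricity, 0.01 t of CO2.
print(greenhouse_output(2.0, 10.0, 0.01))
```

With these made-up inputs the revenue is 32 yuan and the two costs are 5.03 yuan and 3.30 yuan, so the output is about 23.67 yuan.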
Specifically, the historical manual planting experience data are the greenhouse daytime and nighttime set values for the seedling stage of the crop and the greenhouse daytime and nighttime set values for the fruiting stage of the crop. The set values are the temperature, carbon dioxide concentration and artificial supplemental lighting settings in the greenhouse. In this example, the appearance of fruit is the dividing line between the tomato seedling stage and fruiting stage, and 6:00-18:00 is defined as daytime and the remaining hours as nighttime. Referring to FIG. 2, one set of historical manual planting experience data may be: seedling stage, daytime: temperature 24 ℃, carbon dioxide concentration 1200 ppm, artificial supplemental lighting 0 W/m²; seedling stage, nighttime: 20 ℃, 400 ppm, 0 W/m²; fruiting stage, daytime: 24 ℃, 1200 ppm, 150 W/m²; fruiting stage, nighttime: 24 ℃, 800 ppm, 150 W/m².
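The manual strategy above can be sketched as a small lookup table. This is only an illustration: the dictionary layout and function name are assumptions, the fourth entry is read here as the fruiting-stage night setting, and treating 18:00 itself as nighttime is also an assumption.

```python
# Set values transcribed from the example above (FIG. 2); structure and names are illustrative.
MANUAL_STRATEGY = {
    ("seedling", "day"):   {"temp_c": 24.0, "co2_ppm": 1200.0, "light_w_m2": 0.0},
    ("seedling", "night"): {"temp_c": 20.0, "co2_ppm": 400.0,  "light_w_m2": 0.0},
    ("fruiting", "day"):   {"temp_c": 24.0, "co2_ppm": 1200.0, "light_w_m2": 150.0},
    ("fruiting", "night"): {"temp_c": 24.0, "co2_ppm": 800.0,  "light_w_m2": 150.0},
}


def manual_set_values(has_fruit, hour):
    """Return the manual set values for a growth stage and an hour of day (0-23)."""
    stage = "fruiting" if has_fruit else "seedling"
    # 6:00-18:00 counts as daytime in this embodiment; 18:00 itself is treated as night here.
    period = "day" if 6 <= hour < 18 else "night"
    return MANUAL_STRATEGY[(stage, period)]


print(manual_set_values(has_fruit=False, hour=22))
```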
S2, referring to FIG. 3, generating a greenhouse simulator for simulating a greenhouse planting process; for different input greenhouse environment parameter decision data, the greenhouse simulator simulates the corresponding greenhouse state data and the greenhouse output data of the crop in the greenhouse;
In this step, the greenhouse simulator is a greenhouse tomato planting simulator that mainly simulates the growth and development of tomatoes in a greenhouse under different temperatures, carbon dioxide concentrations and light intensities, with a complete greenhouse planting period set to 120 days. Other embodiments may simulate the greenhouse planting of other crops. Given input greenhouse environment parameter decision data, the simulator simulates growing tomatoes under those decisions and automatically generates the corresponding tomato growth data.
S3, designing the number of critic neural networks of the SAC algorithm, and designing the objective function of the actor neural network of the SAC algorithm, to obtain an improved SAC algorithm;
The SAC algorithm contains only one actor neural network, while the number of critic neural networks is three or more, and each critic neural network corresponds to one target critic neural network. Referring to FIG. 1, in this embodiment there are three critic neural networks, each with its own target critic neural network.
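The pairing of critics with target critics can be sketched as follows. The text does not state how the three critic outputs are combined or how the target critics are updated; taking the minimum over the ensemble and updating targets by Polyak averaging are assumptions borrowed from common SAC practice, and the toy linear critics, `FEATURE_DIM` and `tau` are hypothetical.

```python
import random

random.seed(0)
N_CRITICS = 3      # three critics, as in this embodiment
FEATURE_DIM = 4    # hypothetical size of a (state, action) feature vector


def make_weights():
    return [random.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)]


# Each critic is a toy linear model Q(s, a) = w . x, and each has its own target copy.
critics = [make_weights() for _ in range(N_CRITICS)]
target_critics = [list(w) for w in critics]


def q_value(weights, features):
    return sum(w * x for w, x in zip(weights, features))


def conservative_q(ensemble, features):
    """Minimum over the ensemble (an assumed aggregation rule)."""
    return min(q_value(w, features) for w in ensemble)


def soft_update(targets, sources, tau=0.005):
    """Move each target critic a fraction tau toward its online critic, in place."""
    for target, source in zip(targets, sources):
        for i in range(len(target)):
            target[i] = (1.0 - tau) * target[i] + tau * source[i]
```

A pessimistic aggregate such as the minimum is one way an ensemble can damp overestimated values, which matches the stated purpose of adding critics, but other aggregation rules are possible.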
The improved SAC algorithm is shown in FIG. 1. The experience buffer is pre-filled with manual planting experience data; initializing the buffer in this way improves the training efficiency of the reinforcement learning algorithm in the early stage and improves its data utilization.
S4, generating new greenhouse environment parameter decision data with the improved SAC algorithm, inputting the new decision data into the greenhouse simulator for a new planting period, simulating to obtain new greenhouse state data and new greenhouse output data, putting the new decision data, new state data and new output data into the experience buffer, and updating the parameters of the critic neural networks and of the actor neural network with data from the experience buffer; at this point the experience buffer contains the new greenhouse environment parameter decision data, new greenhouse state data and new greenhouse output data as well as the historical manual greenhouse environment parameter decision data, historical greenhouse state data and historical greenhouse output data.
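A minimal sketch of such a mixed experience buffer is given below. The class, its field names, the capacity and the placeholder records are all assumptions for illustration; the pre-fill count of 1200 and the batch size of 128 are the values given in this embodiment. Storing the next state follows the usual reinforcement learning transition format and is likewise an assumption.

```python
import random
from collections import deque


class ExperienceBuffer:
    """Replay buffer whose entries remember whether they came from manual experience."""

    def __init__(self, capacity=100_000):
        # With a bounded deque the oldest entries are dropped once capacity is reached.
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, output, next_state, from_manual):
        self.storage.append((state, action, output, next_state, from_manual))

    def sample(self, batch_size=128):
        # Uniform sampling over manual and newly collected transitions alike.
        return random.sample(list(self.storage), min(batch_size, len(self.storage)))


buffer = ExperienceBuffer()
# Pre-fill with 1200 manual data pairs (placeholder values stand in for real records).
for _ in range(1200):
    buffer.add(state=[0.0], action=[24.0, 1200.0, 0.0], output=0.0,
               next_state=[0.0], from_manual=True)
# Transitions produced later by the improved SAC policy are stored with from_manual=False.
buffer.add(state=[0.0], action=[22.0, 900.0, 50.0], output=1.5,
           next_state=[0.1], from_manual=False)
batch = buffer.sample(128)
```

Keeping the `from_manual` flag with each entry is what lets an update step treat manual and self-collected samples differently.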
S5, repeating step S4 until a plurality of planting periods are completed, obtaining a greenhouse environment parameter optimization decision neural network, and generating greenhouse environment parameter optimization decisions with that neural network.
In this method, the improved SAC algorithm interacts with the greenhouse simulator, the collected data are stored in the experience buffer, and data from the buffer are used to update the parameters of the critic neural networks and of the actor neural network until an economically optimal greenhouse climate setting strategy is obtained.
Further, designing the objective function of the actor neural network of the SAC algorithm includes:
designing a supervised learning function, which is the supervised learning term of formula (2);
adding the supervised learning function to the actor neural network objective function of the SAC algorithm to obtain:
Formula (2);
wherein E is the mathematical expectation; s_t is the current greenhouse state data; D is the experience buffer; a_t is the action output by the actor; π is the actor policy that generates actions; π(a_t|s_t) is the probability that the actor policy outputs action a_t in state s_t; σ is a weight coefficient initialized to 1; Q(a_t, s_t) is the critic's evaluation of the state-action value; k is the weight coefficient of the supervised learning function and takes a value between 0.3 and 0.5; a_expData is the action from the historical manual planting experience data; and the indicator function takes the value 1 when (s_t, a_t) comes from the historical manual planting experience data and 0 otherwise. In this embodiment k is 0.3; in other embodiments k may be 0.4 or 0.5.
This step designs the supervised learning function as a squared-error minimization, so that the actor policy is biased toward learning the manual experience strategy in the initial stage of training.
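The body of formula (2) is not reproduced in this text. One plausible reading, assuming the standard SAC actor objective with a squared-error supervision term switched on by the indicator function, is sketched below. It is a reconstruction from the symbol definitions only and may differ from the original formula, for example in how the critic ensemble enters Q or in the exact arguments of the indicator.

```latex
J_{\pi}(\theta) =
  \mathbb{E}_{s_t \sim D,\; a_t \sim \pi_{\theta}}
  \Big[
      \sigma \log \pi_{\theta}(a_t \mid s_t)
      - Q(a_t, s_t)
      + k \, \mathbb{1}_{expData}(s_t, a_t) \,
        \big\lVert a_t - a_{expData} \big\rVert^{2}
  \Big]
```

Under this reading, the first two terms are the usual entropy-regularized actor loss with weight σ, and the third term penalizes deviation from the manual action only for samples drawn from the manual experience data.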
By increasing the number of critic neural networks, the method mitigates the inaccurate critic value estimates caused by pre-filling the buffer with manual experience data, improves the stability of the algorithm, and prevents a rapid drop in performance early in training. By adding a supervised learning term to the objective function of the original SAC actor neural network, the invention gives the actor policy a good initialization early in training and keeps it from becoming trapped in a local optimum.
In this embodiment, the original SAC objective function is prior art and can be taken directly from "Soft Actor-Critic Algorithms and Applications" (Haarnoja et al., 29 Jan. 2019).
In this step, the SAC actor neural network objective function is improved by adding a supervised learning function to the original SAC actor objective function, so that the actor policy is biased toward the manual experience strategy in the initial stage of training and a better initial policy is obtained.
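A per-sample version of such an objective might look like the sketch below. It assumes the usual SAC actor loss (σ · log π − Q) plus k times the squared error to the manual action, applied only to samples flagged as manual experience; the function signature and the scalar inputs are illustrative, and a real implementation would compute these quantities with an automatic differentiation library.

```python
def actor_loss(log_prob, q_value, action, manual_action, from_manual,
               sigma=1.0, k=0.3):
    """Per-sample actor loss: SAC term plus a squared-error pull toward manual actions."""
    sac_term = sigma * log_prob - q_value
    if not from_manual:
        return sac_term  # indicator is 0: plain SAC objective
    squared_error = sum((a - m) ** 2 for a, m in zip(action, manual_action))
    return sac_term + k * squared_error


# Hypothetical sample: the actor's action differs from the manual one in the second component.
print(actor_loss(log_prob=-1.0, q_value=2.0,
                 action=[1.0, 2.0], manual_action=[1.0, 4.0], from_manual=True))
```

For the sample shown, the SAC term is -3.0 and the supervision term is 0.3 × 4 = 1.2, so the loss is about -1.8; the same sample with `from_manual=False` gives -3.0.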
Further, the greenhouse simulator comprises a greenhouse environment model and a crop model, wherein the greenhouse environment model is used for simulating greenhouse internal environment parameters through a mass balance equation and an energy balance equation, and the crop model is used for modeling according to photosynthesis, respiration and biomass distribution of crops and simulating the growth and the yield of the crops under the greenhouse internal environment parameters.
Further, the greenhouse internal environment parameters include one or more of temperature, carbon dioxide concentration, illumination intensity.
Further, updating the parameters of the critic neural networks and of the actor neural network using the experience buffer data includes:
Randomly sampling part of the historical artificial planting experience data, the greenhouse environment parameter decision data, and the greenhouse state data and greenhouse output data corresponding to the greenhouse environment parameter decision data from the experience buffer data;
And updating the critic neural network parameters φ and the actor neural network parameters θ by a gradient descent algorithm according to the sampled historical artificial planting experience data, greenhouse environment parameter decision data, and corresponding greenhouse state data and greenhouse output data.
In this step, a mini-batch is randomly sampled from the experience buffer each time. The buffer contains both the historical manual planting experience data and the data obtained through interaction of the improved SAC algorithm with the greenhouse simulator. 128 groups of data pairs are drawn at random at a time, and the network parameters are updated by a gradient descent algorithm. Training is judged to be complete when the greenhouse output data of successive planting periods are close to one another and the greenhouse output is high.
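The sampling step can be sketched with a simple buffer that is pre-filled with expert transitions and then extended with simulator transitions. The class name `ExperienceBuffer`, the tuple layout, and the `from_expert` flag are assumptions made for illustration; only the batch size of 128 comes from the text.

```python
import random


class ExperienceBuffer:
    """Stores transitions from expert data and from simulator interaction."""

    def __init__(self):
        self.storage = []

    def add(self, state, action, output, next_state, from_expert):
        # from_expert marks historical artificial planting experience data.
        self.storage.append((state, action, output, next_state, from_expert))

    def sample(self, batch_size=128, rng=random):
        # Uniform sampling without replacement; a smaller buffer is returned whole.
        return rng.sample(self.storage, min(batch_size, len(self.storage)))


buffer = ExperienceBuffer()
# Pre-fill with (placeholder) expert transitions before training starts.
for day in range(200):
    buffer.add(state=[day], action=[22.0], output=1.0,
               next_state=[day + 1], from_expert=True)
# Transitions generated later by the improved SAC algorithm in the simulator.
for day in range(300):
    buffer.add(state=[day], action=[24.0], output=1.2,
               next_state=[day + 1], from_expert=False)

mini_batch = buffer.sample(batch_size=128, rng=random.Random(0))
print(len(mini_batch))
```

Keeping the `from_expert` flag with each transition lets the actor update apply the supervised term only to expert samples.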
A neural network is a mathematical model composed of many parameters and is generally used to fit nonlinear functions; θ or φ is generally used to denote all the parameters of a given neural network. The parameters are optimized so that the network output becomes the output desired by the application. The objective function gives the direction of the optimization and is a mathematical abstraction of the actual task. The aim of the present application is for the decision output of the actor neural network to serve as the greenhouse environment parameter set values so as to maximize greenhouse production profit.
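The profit that serves as the learning signal is the greenhouse output defined by formula (1) in claim 2. A minimal sketch of that formula follows; the function and argument names are hypothetical, and it is assumed that electricity and carbon dioxide fertilizer consumption are already expressed as costs in the same currency as the crop value.

```python
def greenhouse_output(crop_increment, market_value,
                      electricity_cost, co2_cost):
    """Formula (1): crop increment x market value - electricity - CO2 fertilizer.

    Assumption: both consumption terms are given as monetary costs.
    """
    return crop_increment * market_value - electricity_cost - co2_cost


# Illustrative numbers only, not taken from the document.
print(greenhouse_output(crop_increment=2.0, market_value=5.0,
                        electricity_cost=3.0, co2_cost=1.5))
```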
Further, repeatedly executing step S4 until a plurality of planting periods are completed and obtaining the greenhouse environment parameter optimization decision neural network includes:
Repeatedly executing step S4 until a plurality of planting periods are completed, and obtaining the greenhouse environment parameter optimization decision neural network when the greenhouse output of 5 consecutive greenhouse planting periods differs by less than 5%.
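The stopping rule can be sketched as a check over the most recent planting periods. The text does not define how the 5% difference is measured, so this sketch assumes the spread between the highest and lowest output in the window, relative to the largest magnitude in the window. The name `training_converged` is hypothetical.

```python
def training_converged(period_outputs, window=5, tolerance=0.05):
    """True when the last `window` period outputs differ by less than `tolerance`.

    Assumption: difference = (max - min) / largest absolute value in the window.
    """
    if len(period_outputs) < window:
        return False
    recent = period_outputs[-window:]
    highest, lowest = max(recent), min(recent)
    scale = max(abs(highest), abs(lowest))
    if scale == 0:
        # All recent outputs are exactly zero, so they do not differ at all.
        return True
    return (highest - lowest) / scale < tolerance


# Illustrative output per planting period (not data from the document).
history = [10.0, 30.0, 48.0, 50.0, 49.5, 50.2, 49.8, 50.48]
print(training_converged(history))
```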
Experimental data
Referring to fig. 5, fig. 5 shows the greenhouse environment parameter decision data obtained after optimization by the improved SAC algorithm provided by the present application, including the daytime temperature, carbon dioxide concentration and artificial light supplement settings and the nighttime temperature, carbon dioxide concentration and artificial light supplement settings over the whole tomato growth period (120 days).
Referring to fig. 4, fig. 4 shows the greenhouse output data produced in different planting periods during training of the improved SAC algorithm and of the existing SAC algorithm. It can be seen that the improved SAC algorithm trains more efficiently and that its greenhouse planting profit during training is always higher than that of the original SAC algorithm. The method provided by the application is therefore feasible, is superior to the original SAC algorithm, and can obtain an economically optimal greenhouse climate setting strategy.
The greenhouse environment parameter decision data obtained after optimization by the improved SAC algorithm (fig. 5) are applied to the planting simulator, giving the result shown in fig. 6. Referring to fig. 6, fig. 6 is the final greenhouse production profit comparison chart. The profit per unit greenhouse area is 50.48 yuan for the improved SAC algorithm, 35.32 yuan for the original SAC algorithm, and 25.18 yuan for the artificial experience strategy. Compared with the original SAC strategy the profit is improved by 42.3%, and compared with the artificial strategy it is improved by 100.5%. The improved SAC algorithm thus yields a higher profit per unit area than both the artificial experience strategy and the original SAC strategy, which illustrates its superiority for deciding greenhouse environment parameter set values.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms should not be understood as necessarily being directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification, and the features of those embodiments or examples, may be combined by those skilled in the art provided they do not contradict one another.

Claims (7)

1. A greenhouse environment parameter optimization decision method based on an improved SAC algorithm is characterized in that: based on the SAC algorithm, comprising the following steps:
S1, taking greenhouse state data, greenhouse environment parameter decision data and greenhouse output data as reinforcement learning elements of a SAC algorithm, and filling an experience buffer area in advance by using historical artificial planting experience data, wherein the historical artificial planting experience data comprises historical artificial greenhouse environment parameter decision data, historical greenhouse state data and historical greenhouse output data;
S2, generating a greenhouse simulator for simulating a greenhouse planting process;
S3, designing the number of SAC algorithm critic neural networks, and designing an objective function of the SAC algorithm actor neural network, to obtain an improved SAC algorithm;
S4, generating new greenhouse environment parameter decision data by using the improved SAC algorithm, inputting the new greenhouse environment parameter decision data into the greenhouse simulator for a new planting period, simulating to obtain new greenhouse state data and new greenhouse output data, putting the new greenhouse environment parameter decision data, the new greenhouse state data and the new greenhouse output data into the experience buffer, and updating the parameters of the critic neural networks and the parameters of the actor neural network by using the experience buffer data;
S5, repeatedly executing the step S4 until a plurality of planting periods are completed, obtaining a greenhouse environment parameter optimization decision neural network, and generating a greenhouse environment parameter optimization decision through the greenhouse environment parameter optimization decision neural network;
Designing the number of SAC algorithm critic neural networks comprises:
Designing more than three SAC algorithm critic neural networks, wherein each SAC algorithm critic neural network corresponds to one SAC algorithm target critic neural network;
The objective function of designing the SAC algorithm actor neural network comprises the following steps:
Designing a supervised learning function;
adding the supervised learning function into the actor neural network objective function of the SAC algorithm to obtain:
Wherein E is the mathematical expectation, s_t is the current greenhouse state data, D is the experience buffer, a_t is the action output by the actor, π is the actor policy that generates actions, π(a_t|s_t) is the probability that the actor policy outputs action a_t in state s_t, σ is a weight coefficient initialized to 1, Q(a_t, s_t) is the critic's evaluation of the state-action value, k is the weight coefficient of the supervised learning function and takes a value between 0.3 and 0.5, a_expData represents the historical artificial planting experience data, and the indicator function takes the value 1 when (s_t, a_t) comes from the historical artificial planting experience data and 0 otherwise.
2. The greenhouse environment parameter optimization decision-making method based on the improved SAC algorithm as claimed in claim 1, wherein the method comprises the following steps: the greenhouse state data comprises one or more of greenhouse internal climate data, greenhouse internal crop growth data and future climate forecast data; the greenhouse environment parameter decision data comprises one or more of the set values, for the coming day, of daytime temperature, daytime carbon dioxide concentration, daytime artificial light supplementation, nighttime temperature, nighttime carbon dioxide concentration and nighttime artificial light supplementation;
The greenhouse output data are obtained through calculation according to the formula (1);
Greenhouse output data = mature crop increment × market value − electricity consumption − carbon dioxide fertilizer consumption    (1).
3. The greenhouse environment parameter optimization decision-making method based on the improved SAC algorithm as claimed in claim 1, wherein the method comprises the following steps: the historical artificial greenhouse environment parameter decision data are greenhouse day set values and night set values corresponding to seedling periods of crops, and greenhouse day set values and night set values corresponding to fruiting periods of the crops.
4. The greenhouse environment parameter optimization decision-making method based on the improved SAC algorithm as claimed in claim 1, wherein the method comprises the following steps: the greenhouse simulator comprises a greenhouse environment model and a crop model, wherein the greenhouse environment model is used for simulating greenhouse internal environment parameters through a mass balance equation and an energy balance equation, and the crop model is used for modeling according to photosynthesis, respiration and biomass distribution of crops and simulating the growth and the yield of the crops under the greenhouse internal environment parameters.
5. The greenhouse environment parameter optimization decision-making method based on the improved SAC algorithm as claimed in claim 4, wherein the method comprises the following steps: the greenhouse internal environment parameters include one or more of temperature, carbon dioxide concentration, illumination intensity.
6. The greenhouse environment parameter optimization decision-making method based on the improved SAC algorithm as claimed in claim 1, wherein the method comprises the following steps: updating the parameters of the critic neural networks and the parameters of the actor neural network by using the experience buffer data comprises:
Randomly sampling part of the historical artificial planting experience data, the greenhouse environment parameter decision data, and the greenhouse state data and greenhouse output data corresponding to the greenhouse environment parameter decision data from the experience buffer data;
And updating the parameters of the critic neural networks and the parameters of the actor neural network according to the sampled historical artificial planting experience data, greenhouse environment parameter decision data, and corresponding greenhouse state data and greenhouse output data.
7. The greenhouse environment parameter optimization decision-making method based on the improved SAC algorithm as claimed in claim 1, wherein the method comprises the following steps: repeatedly executing step S4 until a plurality of planting periods are completed and obtaining the greenhouse environment parameter optimization decision neural network comprises:
Repeatedly executing step S4 until a plurality of planting periods are completed, and obtaining the greenhouse environment parameter optimization decision neural network when the greenhouse output of 5 consecutive greenhouse planting periods differs by less than 5%.
CN202210675362.5A 2022-06-15 2022-06-15 Greenhouse environment parameter optimization decision method based on improved SAC algorithm Active CN114859734B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210675362.5A CN114859734B (en) 2022-06-15 2022-06-15 Greenhouse environment parameter optimization decision method based on improved SAC algorithm

Publications (2)

Publication Number Publication Date
CN114859734A CN114859734A (en) 2022-08-05
CN114859734B true CN114859734B (en) 2024-06-07

Family

ID=82624122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210675362.5A Active CN114859734B (en) 2022-06-15 2022-06-15 Greenhouse environment parameter optimization decision method based on improved SAC algorithm

Country Status (1)

Country Link
CN (1) CN114859734B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115453868B (en) * 2022-08-31 2024-04-12 中国农业大学 Full-growth-period light intensity regulation and control method based on tomato light response difference characteristics
CN116452358B (en) * 2023-03-07 2024-06-07 东莞市众冠网络科技有限公司 Intelligent agriculture management system based on Internet of things

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110244559A (en) * 2019-05-21 2019-09-17 中国农业大学 A kind of greenhouse intelligent regulation method based on agriculture solar term empirical data
CN110647839A (en) * 2019-09-18 2020-01-03 深圳信息职业技术学院 Method and device for generating automatic driving strategy and computer readable storage medium
CN111008449A (en) * 2019-04-26 2020-04-14 成都蓉奥科技有限公司 Acceleration method for deep reinforcement learning deduction decision training in battlefield simulation environment
CN113050412A (en) * 2021-03-09 2021-06-29 厦门大学 Generation method of batch reaction kettle control method based on iterative learning control
WO2022052406A1 (en) * 2020-09-08 2022-03-17 苏州浪潮智能科技有限公司 Automatic driving training method, apparatus and device, and medium
CN114217524A (en) * 2021-11-18 2022-03-22 国网天津市电力公司电力科学研究院 Power grid real-time self-adaptive decision-making method based on deep reinforcement learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020190460A1 (en) * 2019-03-20 2020-09-24 Sony Corporation Reinforcement learning through a double actor critic algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
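Artificial neural network modeling of a catalytic cracking unit; 邓毅; 江青茵; 曹志凯; 师佳; 周华; Computers and Applied Chemistry; 2011-09-28 (No. 09); full text *
Minimum-time optimization strategy for batch distillation based on a batch-wise parameter adjustment model; 吴微; 师佳; 周华; 曹志凯; 江青茵; Journal of Xiamen University (Natural Science); 2013-03-28 (No. 02); full text *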

Also Published As

Publication number Publication date
CN114859734A (en) 2022-08-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant