CN114859734B

CN114859734B - Greenhouse environment parameter optimization decision method based on improved SAC algorithm

Info

Publication number: CN114859734B
Application number: CN202210675362.5A
Authority: CN
Inventors: 师佳; 文柯超; 徐星海; 谢惠民; 洪文晶
Original assignee: Xiamen University
Current assignee: Xiamen University
Priority date: 2022-06-15
Filing date: 2022-06-15
Publication date: 2024-06-07
Anticipated expiration: 2042-06-15
Also published as: CN114859734A

Abstract

The invention relates to a greenhouse environment parameter optimization decision method based on an improved SAC algorithm, comprising the following steps: S1, taking greenhouse state data, greenhouse environment parameter decision data and greenhouse output data as the reinforcement learning elements of the SAC algorithm, and pre-filling the experience buffer with historical manual planting experience data; S2, generating a greenhouse simulator for simulating the greenhouse planting process; S3, designing the number of critic neural networks of the SAC algorithm and designing the objective function of the SAC actor neural network to obtain an improved SAC algorithm; S4, generating new greenhouse environment parameter decision data with the improved SAC algorithm, inputting the data into the greenhouse simulator for a new planting period, putting the resulting data into the experience buffer, and updating the parameters of the critic neural networks and of the actor neural network using the data in the experience buffer; and S5, repeating step S4 until a plurality of planting periods are completed, thereby obtaining the neural network.

Description

Greenhouse environment parameter optimization decision method based on improved SAC algorithm

Technical Field

The invention relates to the field of greenhouse environment parameter optimization, in particular to a greenhouse environment parameter optimization decision method based on an improved SAC algorithm.

Background

The greenhouse climate is the climate inside a semi-closed structure that people create with agricultural facilities such as plastic greenhouses, solar greenhouses and multi-span greenhouses. Greenhouses are widely used for growing melons, fruits, vegetables and flowers and for aquaculture. The greenhouse climate is a microclimate that is mainly affected by radiation, temperature, humidity, carbon dioxide and similar factors. Because this microclimate directly influences the growth and development of crops, the climate choices made over the whole planting period determine the final crop yield and the economic benefit of the greenhouse.

Because of the influence of the external environment, crop growth, energy cost and other factors, selecting the set values of the greenhouse climate parameters is a very complex optimization decision problem. Current greenhouse environment parameter optimization methods are mainly optimal control and model predictive control. Both rely on a relatively accurate model and optimize the temperature, carbon dioxide concentration, artificial supplemental lighting level and so on over the crop planting period, with the goal of maximizing the net profit of the greenhouse. However, these methods depend heavily on the accuracy of the model, which hinders their extension to different types of greenhouse production. Model-based optimization control algorithms also have high computational complexity and low solving speed, which hinders real-time decisions on greenhouse environment parameter set values.

The invention aims at solving the problems existing in the prior art and designing a greenhouse environment parameter optimization decision method based on an improved SAC algorithm.

Disclosure of Invention

Aiming at the problems in the prior art, the invention aims to provide a greenhouse environment parameter optimization decision method based on an improved SAC algorithm, which can effectively solve the problems in the prior art.

The technical scheme of the invention is as follows:

a greenhouse environment parameter optimization decision method based on an improved SAC algorithm comprises the following steps:

S1, taking greenhouse state data, greenhouse environment parameter decision data and greenhouse output data as reinforcement learning elements of a SAC algorithm, and filling an experience buffer area in advance by using historical artificial planting experience data, wherein the historical artificial planting experience data comprises historical artificial greenhouse environment parameter decision data, historical greenhouse state data and historical greenhouse output data;

S2, generating a greenhouse simulator for simulating a greenhouse planting process;

S3, designing the number of critic neural networks of the SAC algorithm, and designing the objective function of the SAC actor neural network to obtain an improved SAC algorithm;

S4, generating new greenhouse environment parameter decision data with the improved SAC algorithm, inputting the new greenhouse environment parameter decision data into the greenhouse simulator for a new planting period, obtaining new greenhouse state data and new greenhouse output data by simulation, putting the new greenhouse environment parameter decision data, the new greenhouse state data and the new greenhouse output data into the experience buffer, and updating the parameters of the critic neural networks and of the actor neural network using the data in the experience buffer;

S5, repeating step S4 until a plurality of planting periods are completed, obtaining a greenhouse environment parameter optimization decision neural network, and generating greenhouse environment parameter optimization decisions with that neural network.

Further, the greenhouse state data comprises one or more of greenhouse interior climate data, greenhouse interior crop growth data and future climate forecast data; the greenhouse environment parameter decision data comprises one or more of the future daytime temperature, the future daytime carbon dioxide concentration, the future daytime artificial supplemental lighting set value, the future night temperature, the future night carbon dioxide concentration and the future night artificial supplemental lighting set value;

The greenhouse output data are obtained through calculation according to the formula (1);

Greenhouse output data = mature crop increment × market price − electricity cost − carbon dioxide fertilizer cost, formula (1).

Further, the historical artificial greenhouse environment parameter decisions are greenhouse day set values and night set values corresponding to seedling periods of crops, and greenhouse day set values and night set values corresponding to fruiting periods of the crops.

Further, designing the number of SAC critic neural networks comprises:

the number of SAC critic neural networks is three or more, and each SAC critic neural network corresponds to one SAC target critic neural network.

Further, the objective function of designing the SAC algorithm actor neural network includes: designing a supervised learning function;

adding the supervised learning function into the actor neural network objective function of the SAC algorithm to obtain:

wherein E is the mathematical expectation, s_t is the current greenhouse state data, D is the experience buffer, a_t is the action output by the actor, π is the actor policy that generates actions, π(a_t|s_t) is the probability that the actor policy outputs action a_t in state s_t, σ is a weight coefficient initialized to 1, Q(a_t, s_t) is the critic's evaluation of the state-action value, k is the weight coefficient of the supervised learning function and takes a value between 0.3 and 0.5, a_expData denotes the historical manual planting experience data, and the indicator function takes the value 1 when (s_t, a_t) comes from the historical manual planting experience data and 0 otherwise.

Further, the greenhouse simulator comprises a greenhouse environment model for simulating greenhouse internal environment parameters through a mass balance equation and an energy balance equation, and a crop model for modeling according to photosynthesis, respiration, biomass distribution of crops and simulating growth and yield of crops under the greenhouse internal environment parameters.

Further, the greenhouse internal environment parameters include one or more of temperature, carbon dioxide concentration, illumination intensity.

Further, updating the parameters of the critic neural networks and of the actor neural network using the data in the experience buffer comprises:

randomly sampling from the experience buffer a portion of the historical manual planting experience data and of the greenhouse environment parameter decision data, together with the greenhouse state data and greenhouse output data corresponding to that decision data;

and updating the parameters of the critic neural networks and of the actor neural network according to the sampled historical manual planting experience data, greenhouse environment parameter decision data, and corresponding greenhouse state data and greenhouse output data.

Further, a gradient descent algorithm is used to update the parameters φ of the critic neural networks and the parameters θ of the actor neural network.

Further, repeating step S4 until a plurality of planting periods are completed and obtaining the greenhouse environment parameter optimization decision neural network comprises:

repeating step S4 until a plurality of planting periods are completed, and obtaining the greenhouse environment parameter optimization decision neural network when the greenhouse output of 5 consecutive planting periods differs by less than 5%.
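The stopping rule above can be sketched as a small check. This is a minimal sketch under one assumption: the text does not say how the "difference" is measured, so the function below reads it as the spread (maximum minus minimum) of the last five period outputs relative to their mean. The name `has_converged` and its parameters are hypothetical.

```python
def has_converged(period_outputs, window=5, tolerance=0.05):
    """Return True when the last `window` planting-period outputs are within `tolerance`.

    Assumption: the difference is measured as (max - min) / |mean| over the window.
    """
    if len(period_outputs) < window:
        return False  # not enough planting periods completed yet
    recent = period_outputs[-window:]
    mean = sum(recent) / window
    if mean == 0:
        return False  # relative spread is undefined for a zero mean
    return (max(recent) - min(recent)) / abs(mean) < tolerance
```

Training would call this after each planting period and stop on the first `True`.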

Accordingly, the present invention provides the following effects and/or advantages:

The invention uses the improved SAC algorithm to generate new greenhouse environment parameter decision data, inputs the data into the greenhouse simulator for a new planting period, and obtains new greenhouse state data and new greenhouse output data by simulation. These data are stored in the experience buffer, and the data in the experience buffer are used to update the parameters of the critic neural networks and of the actor neural network. This interaction with the greenhouse simulator is repeated until the economically optimal climate setting strategy for the greenhouse is obtained. Pre-filling the experience buffer with manual planting experience data speeds up the early training of the reinforcement learning algorithm and improves its data utilization.

The improved SAC algorithm provided by the application has high training efficiency, meanwhile, the greenhouse planting profit of the improved SAC algorithm in the training process is always higher than that of the original SAC algorithm, and finally, the greenhouse planting profit is higher than that of the artificial strategy and the original SAC algorithm. The method provided by the application is feasible and superior to the original SAC algorithm, and can obtain the economic and optimal climate setting strategy of the greenhouse.

By increasing the number of critic neural networks, the method mitigates the inaccurate critic value estimates caused by pre-filling the buffer with manual experience data, improves the stability of the algorithm, and prevents a rapid decline in performance early in training. By modifying the objective function of the original SAC actor neural network to include a supervised learning term, the invention gives the actor policy a good initialization early in training and keeps it from becoming trapped in a local optimum.

The obtained greenhouse unit area profit of the improved SAC algorithm is higher than that of an artificial experience strategy and an original SAC algorithm strategy, and the superiority of the improved SAC algorithm applied to greenhouse climate set value decision is demonstrated.

It is to be understood that both the foregoing general description and the following detailed description of the present invention are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.

Drawings

FIG. 1 is a schematic flow chart of the present invention.

FIG. 2 is a table diagram of artificial planting strategies.

FIG. 3 is a diagram of a greenhouse simulator.

FIG. 4 is a graph showing the change of greenhouse planting profits in the training process of the invention.

FIG. 5 is a graph of the calculated greenhouse environment parameter settings of the present invention.

FIG. 6 is a graph of final greenhouse production profit contrast of the present invention.

Detailed Description

To facilitate understanding by those skilled in the art, the present invention is described in further detail below with reference to the accompanying drawings. It should be understood that, unless specifically stated otherwise, the steps mentioned in this embodiment may be performed in the order given or in a different order, and may also be performed simultaneously or partially simultaneously.

The reinforcement learning is an artificial intelligence decision algorithm based on data driving, wherein the SAC algorithm is one of the most advanced algorithms in the reinforcement learning at present, the SAC algorithm has strong optimization capability and high robustness, and can make real-time decisions according to state feedback information. However, the SAC algorithm requires a large amount of data to train to obtain an optimal control strategy, and has low data utilization efficiency, so that the SAC algorithm is unfavorable for the decision of the climate setting value of the greenhouse.

Accordingly, the applicant proposes the following method for optimization. In the present application, the term "SAC algorithm" refers to the Soft Actor-Critic (SAC) algorithm. The SAC algorithm addresses reinforcement learning in both discrete and continuous action spaces and is an off-policy reinforcement learning algorithm.

Referring to fig. 1, a greenhouse environment parameter optimization decision method based on an improved SAC algorithm, based on the SAC algorithm, comprises the following steps:

In this embodiment, the SAC algorithm alternates between policy evaluation, in which the value function is updated to assess the current policy, and policy improvement, in which the policy is updated and the value function from the preceding step is used to judge whether the policy has improved. Repeating these two steps progressively evolves an optimal policy.

In this embodiment, the amount of manual experience data for filling the experience buffer in advance may be 1000 to 2000 sets of data pairs, and in this example, 1200 sets of data pairs are selected.

In this step, the experience buffer is pre-filled with historical manual planting experience data. These data may be obtained either by setting the climate set values of the greenhouse simulator according to manual experience or by setting the climate inside a greenhouse according to manual experience, and they include the historical greenhouse state data and historical greenhouse output data corresponding to those set values. In this embodiment tomatoes are simulated. That is, the climate set values are chosen by manual experience, and the state data produced in the greenhouse and the growth, fruiting and other data of the tomatoes under those set values are recorded.
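The pre-filling described above can be illustrated with a small buffer that tags each data pair by origin, so that manual experience data can later be told apart from data collected by the algorithm. This is a minimal sketch; the names `Transition`, `ExperienceBuffer`, `prefill` and the `capacity` default are hypothetical and not taken from the source.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    state: tuple       # greenhouse state data
    action: tuple      # greenhouse environment parameter decision (set values)
    reward: float      # greenhouse output data
    next_state: tuple  # greenhouse state after applying the decision
    is_expert: bool    # True when the pair comes from manual planting experience


class ExperienceBuffer:
    def __init__(self, capacity=100_000):  # capacity is an assumed value
        self._data = deque(maxlen=capacity)

    def add(self, transition):
        self._data.append(transition)

    def prefill(self, expert_transitions):
        # Store the manual experience pairs before any interaction with the simulator
        for transition in expert_transitions:
            self.add(transition)

    def sample(self, batch_size=128, rng=random):
        # Uniform sampling without replacement over expert and collected data alike
        return rng.sample(list(self._data), min(batch_size, len(self._data)))

    def __len__(self):
        return len(self._data)
```

Note that a bounded `deque` evicts its oldest entries first, so with a small capacity the manual experience data would eventually be overwritten; whether that is desirable is a design choice the source does not address.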

Specifically, the greenhouse state data comprises one or more of greenhouse interior climate data, greenhouse interior crop growth data and future climate forecast data; the greenhouse environment parameter decision data comprises one or more of the future daytime temperature, the future daytime carbon dioxide concentration, the future daytime artificial supplemental lighting set value, the future night temperature, the future night carbon dioxide concentration and the future night artificial supplemental lighting set value;

Greenhouse output data = mature crop increment × market price − electricity cost − carbon dioxide fertilizer cost, formula (1). Here, electricity cost = amount of electricity consumed × unit price of electricity, and carbon dioxide fertilizer cost = amount of carbon dioxide fertilizer consumed × unit price of carbon dioxide fertilizer. In this embodiment a tomato simulator is the object of study, so the tomato market price is 16 yuan/kg, the electricity price is 0.503 yuan/kWh, and the carbon dioxide fertilizer price is 330 yuan/ton.
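Formula (1) with the prices of this embodiment can be written as a short function. This is a minimal sketch; the function and argument names are hypothetical, and "ton" is assumed to mean a metric ton of 1000 kg.

```python
TOMATO_PRICE_YUAN_PER_KG = 16.0
ELECTRICITY_PRICE_YUAN_PER_KWH = 0.503
CO2_PRICE_YUAN_PER_TON = 330.0


def greenhouse_output(ripe_crop_kg, electricity_kwh, co2_kg):
    """Greenhouse output in yuan according to formula (1)."""
    revenue = ripe_crop_kg * TOMATO_PRICE_YUAN_PER_KG
    electricity_cost = electricity_kwh * ELECTRICITY_PRICE_YUAN_PER_KWH
    # Assumption: 1 ton = 1000 kg, so the CO2 mass is converted before pricing
    co2_cost = (co2_kg / 1000.0) * CO2_PRICE_YUAN_PER_TON
    return revenue - electricity_cost - co2_cost
```

For example, 1 kg of ripe tomatoes produced with 10 kWh of electricity and 2 kg of carbon dioxide gives 16 − 5.03 − 0.66 = 10.31 yuan.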

Specifically, the historical manual planting experience data are the greenhouse daytime and night set values for the seedling stage of the crop and the greenhouse daytime and night set values for the fruiting stage of the crop. A set value comprises the temperature setting, the carbon dioxide concentration setting and the artificial supplemental lighting setting in the greenhouse. In this example, the appearance of fruit is the dividing line between the seedling stage and the fruiting stage of the tomato, and 6:00-18:00 is defined as daytime and the remaining hours as night. Referring to fig. 2, one set of historical manual planting experience data may be: seedling stage, daytime: temperature 24 ℃, carbon dioxide concentration 1200 ppm, artificial supplemental lighting 0 W/m²; seedling stage, night: temperature 20 ℃, carbon dioxide concentration 400 ppm, artificial supplemental lighting 0 W/m²; fruiting stage, daytime: temperature 24 ℃, carbon dioxide concentration 1200 ppm, artificial supplemental lighting 150 W/m²; fruiting stage, night: temperature 24 ℃, carbon dioxide concentration 800 ppm, artificial supplemental lighting 150 W/m².
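The manual strategy of this example amounts to a lookup keyed by growth stage and time of day. The sketch below encodes the four set values quoted above; the names `SetPoint`, `MANUAL_STRATEGY` and `manual_setpoint` are hypothetical, and the fourth entry is taken to be the fruiting-stage night set value, which is an interpretation of the text.

```python
from typing import NamedTuple


class SetPoint(NamedTuple):
    temperature_c: float
    co2_ppm: float
    light_w_per_m2: float


MANUAL_STRATEGY = {
    ("seedling", "day"): SetPoint(24.0, 1200.0, 0.0),
    ("seedling", "night"): SetPoint(20.0, 400.0, 0.0),
    ("fruiting", "day"): SetPoint(24.0, 1200.0, 150.0),
    # Assumption: the fourth quoted entry is read as the fruiting-stage night set value
    ("fruiting", "night"): SetPoint(24.0, 800.0, 150.0),
}


def manual_setpoint(has_fruit, hour):
    """Return the manual set value for the current stage and hour of day (0-23)."""
    stage = "fruiting" if has_fruit else "seedling"
    period = "day" if 6 <= hour < 18 else "night"  # 6:00-18:00 counts as daytime
    return MANUAL_STRATEGY[(stage, period)]
```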

Referring to fig. 3, a greenhouse simulator is generated for simulating a greenhouse planting process, and the greenhouse simulator can simulate and generate corresponding greenhouse state data in a greenhouse and greenhouse output data of crops in the greenhouse according to input different greenhouse environment parameter decision data;

In this step, the greenhouse simulator is a greenhouse tomato planting simulator. It mainly simulates the growth and development of tomatoes in a greenhouse under different temperatures, carbon dioxide concentrations and light intensities, and one complete greenhouse planting period is set to 120 days. In other embodiments, greenhouse planting of other crop species may be simulated. Given the input greenhouse environment parameter decision data, the simulator simulates tomato planting under those settings and automatically generates the corresponding tomato growth data.

The SAC algorithm contains only one actor neural network. The number of SAC critic neural networks is three or more, and each SAC critic neural network corresponds to one SAC target critic neural network. Referring to fig. 1, in this embodiment the number of critic neural networks is three, and each of the three corresponds to one target critic neural network.
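The pairing of three critics with three target critics can be sketched with toy linear critics. The source does not state how the critic estimates are combined or how the target critics track their critics; the sketch below assumes a minimum over the target critics (in the spirit of clipped double-Q learning) and Polyak averaging, and all names (`CriticEnsemble`, `conservative_target_q`, `soft_update`, `tau`) are hypothetical.

```python
import copy


class CriticEnsemble:
    """Toy linear critics Q_i(s, a) = w_i . [s, a], each paired with a target copy."""

    def __init__(self, input_dim, n_critics=3):
        if n_critics < 3:
            raise ValueError("the method uses three or more critics")
        # Distinct deterministic initial weights so the critics disagree (illustrative only)
        self.critics = [[0.1 * (i + 1)] * input_dim for i in range(n_critics)]
        self.targets = copy.deepcopy(self.critics)

    @staticmethod
    def _q(weights, state, action):
        features = list(state) + list(action)
        return sum(w * x for w, x in zip(weights, features))

    def q_values(self, state, action):
        return [self._q(w, state, action) for w in self.critics]

    def conservative_target_q(self, state, action):
        # Assumption: aggregate the target critics with a minimum to limit overestimation
        return min(self._q(w, state, action) for w in self.targets)

    def soft_update(self, tau=0.005):
        # Assumption: Polyak averaging, target <- tau * critic + (1 - tau) * target
        for critic, target in zip(self.critics, self.targets):
            for j, w in enumerate(critic):
                target[j] = tau * w + (1.0 - tau) * target[j]
```

A real implementation would replace the linear critics with neural networks; only the one-to-one pairing of critics and target critics is taken from the text.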

The modified SAC algorithm is shown in fig. 1. The experience buffer area is filled with artificial planting experience data in advance, the training efficiency of the reinforcement learning algorithm in the early stage can be improved by initializing the experience buffer area in advance, and the data utilization rate of reinforcement learning is improved.

S4, generating new greenhouse environment parameter decision data with the improved SAC algorithm, inputting the new greenhouse environment parameter decision data into the greenhouse simulator for a new planting period, obtaining new greenhouse state data and new greenhouse output data by simulation, putting the new greenhouse environment parameter decision data, the new greenhouse state data and the new greenhouse output data into the experience buffer, and updating the parameters of the critic neural networks and of the actor neural network using the data in the experience buffer. At this point the experience buffer contains the new greenhouse environment parameter decision data, new greenhouse state data and new greenhouse output data, as well as the historical manual greenhouse parameter decision data, historical greenhouse state data and historical greenhouse output data.

In other words, the improved SAC algorithm interacts with the greenhouse simulator, the collected data are stored in the experience buffer, and the data in the experience buffer are used to update the parameters of the critic neural networks and of the actor neural network until the economically optimal greenhouse climate setting strategy is obtained.
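The interaction loop of steps S4 and S5 can be outlined as follows. This is a structural sketch only: `StubSimulator` and `StubAgent` are hypothetical placeholders that stand in for the greenhouse simulator and the improved SAC agent, the toy reward is invented for illustration, and performing one update per simulated day is an assumption the source does not confirm.

```python
import random

PERIOD_DAYS = 120   # one complete planting period in this embodiment
BATCH_SIZE = 128    # data pairs drawn per update in this embodiment


class StubSimulator:
    """Placeholder simulator; NOT the greenhouse model of the method."""

    def reset(self):
        self.day = 0
        return (0.0,)

    def step(self, action):
        self.day += 1
        reward = -abs(action[0] - 24.0)  # invented toy reward, best at 24
        return (float(self.day),), reward, self.day >= PERIOD_DAYS


class StubAgent:
    """Placeholder for the improved SAC agent."""

    def __init__(self, rng):
        self.rng = rng
        self.updates = 0

    def act(self, state):
        return (self.rng.uniform(18.0, 30.0),)

    def update(self, batch):
        self.updates += 1  # a real agent would update critic and actor parameters here


def train(simulator, agent, buffer, n_periods, rng):
    """Run `n_periods` planting periods and return the total output of each."""
    period_outputs = []
    for _ in range(n_periods):
        state, done, total = simulator.reset(), False, 0.0
        while not done:
            action = agent.act(state)
            next_state, reward, done = simulator.step(action)
            # The final flag marks the pair as collected by the agent, not manual experience
            buffer.append((state, action, reward, next_state, False))
            batch = rng.sample(buffer, min(BATCH_SIZE, len(buffer)))
            agent.update(batch)
            state, total = next_state, total + reward
        period_outputs.append(total)
    return period_outputs
```

Here `buffer` is a plain list that would already hold the pre-filled manual experience pairs before `train` is called.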

Further, the objective function of designing the SAC algorithm actor neural network includes:

designing a supervised learning function and adding it to the objective function of the SAC actor neural network; the resulting objective function, including the supervised learning function, is shown in formula (2);

Formula (2);

wherein E is the mathematical expectation, s_t is the current greenhouse state data, D is the experience buffer, a_t is the action output by the actor, π is the actor policy that generates actions, π(a_t|s_t) is the probability that the actor policy outputs action a_t in state s_t, σ is a weight coefficient initialized to 1, Q(a_t, s_t) is the critic's evaluation of the state-action value, k is the weight coefficient of the supervised learning function and takes a value between 0.3 and 0.5, a_expData denotes the historical manual planting experience data, and the indicator function takes the value 1 when (s_t, a_t) comes from the historical manual planting experience data and 0 otherwise. In this embodiment k is 0.3; in other embodiments k may be 0.4 or 0.5.

The step designs the supervised learning function in a form of minimizing the square error, so that the strategy of the actor can be biased to learn the artificial experience strategy in the initial stage of training.
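The expression of formula (2) is not reproduced in this text. One form that is consistent with the symbol definitions above, with the standard SAC actor objective, and with the squared-error description could be written as below. It is a reconstruction under assumptions, not the formula as filed: the sign convention (an objective to be minimized), the single Q term, and the indicator symbol are all assumed.

```latex
J_{\pi}(\theta) =
\mathbb{E}_{s_t \sim D,\; a_t \sim \pi_\theta}
\Big[
  \sigma \log \pi_\theta(a_t \mid s_t)
  - Q(a_t, s_t)
  + k \,\mathbb{1}_{\mathrm{exp}}(s_t, a_t)\,
    \lVert a_t - a_{\mathrm{expData}} \rVert^{2}
\Big]
```

With k = 0 this reduces to the usual SAC actor objective; with k > 0 the last term pulls the actor's action toward the manual experience action only on data pairs that originate from the manual experience data.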

By increasing the number of critic neural networks, the method mitigates the inaccurate critic value estimates caused by pre-filling the buffer with manual experience data, improves the stability of the algorithm, and prevents a rapid decline in performance early in training. By modifying the objective function of the original SAC actor neural network to include a supervised learning term, the invention gives the actor policy a good initialization early in training and keeps it from becoming trapped in a local optimum.

In this embodiment, the original SAC objective function is prior art and can be taken directly from the paper "Soft Actor-Critic Algorithms and Applications" (Haarnoja et al., 29 Jan 2019).

In the step, in the process of improving the SAC actor neural network objective function, a supervision learning function is added to the original SAC actor objective function, so that the actor strategy is biased to a manual experience strategy in the initial stage of training, and a better initialization strategy is obtained.
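How a supervised squared-error term can be added to an SAC-style actor loss is sketched below on plain Python numbers. The exact form of the objective is an assumption (entropy term minus critic value plus k times the squared error on manual experience pairs), as are the function name and the dictionary keys; the critic value is assumed to be supplied already aggregated over the critics.

```python
def actor_loss(samples, sigma=1.0, k=0.3):
    """Mean over samples of sigma * log_pi - q + k * indicator * ||a - a_exp||^2.

    Each sample is a dict with keys "log_prob", "q_value", "action" and
    "expert_action" (None when the pair does not come from manual experience).
    """
    total = 0.0
    for sample in samples:
        sac_term = sigma * sample["log_prob"] - sample["q_value"]
        if sample["expert_action"] is not None:
            # Indicator = 1: penalize the squared distance to the manual action
            squared_error = sum(
                (a - e) ** 2 for a, e in zip(sample["action"], sample["expert_action"])
            )
        else:
            squared_error = 0.0  # indicator = 0: plain SAC actor term only
        total += sac_term + k * squared_error
    return total / len(samples)
```

For a manual experience pair with log-probability −1, critic value 2 and an action 1 ℃ away from the manual action, the contribution is −1 − 2 + 0.3 × 1 = −2.7; a non-expert pair with log-probability −0.5 and critic value 1 contributes −1.5, so the mean over the two is −2.1.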

Further, the greenhouse simulator comprises a greenhouse environment model and a crop model, wherein the greenhouse environment model is used for simulating greenhouse internal environment parameters through a mass balance equation and an energy balance equation, and the crop model is used for modeling according to photosynthesis, respiration and biomass distribution of crops and simulating the growth and the yield of the crops under the greenhouse internal environment parameters.

Further, updating the parameters of the critic neural networks and of the actor neural network using the data in the experience buffer comprises:

updating the critic neural network parameters φ and the actor neural network parameters θ with a gradient descent algorithm, according to the historical manual planting experience data, the greenhouse environment parameter decision data, and the greenhouse state data and greenhouse output data corresponding to that decision data.

In this step, a mini-batch is randomly sampled from the experience buffer at each update. It contains both historical manual planting experience data and data obtained through interaction of the improved SAC algorithm with the simulator. 128 data pairs are drawn at a time, and the network parameters are updated with a gradient descent algorithm. Training is judged complete when the greenhouse output data of successive planting periods are close to one another and high in value.

A neural network is a mathematical model composed of many parameters and is generally used to fit nonlinear functions; θ or φ is commonly used to denote all the parameters of a given neural network. The parameters are optimized so that the output of the network becomes the output desired by the application. The objective function gives the direction of optimization and is the mathematical abstraction of the actual task. The aim of the application is for the decision output of the actor neural network, namely the greenhouse environment parameter set values, to maximize greenhouse production profit.

Experimental data

Referring to fig. 5, fig. 5 is the greenhouse environment parameter decision data obtained after the improved SAC algorithm provided by the present application is optimized, including daytime temperature, carbon dioxide concentration, artificial light supplement setting, nighttime temperature, carbon dioxide concentration, and artificial light supplement setting in the whole tomato growth period (120 days).

Referring to fig. 4, fig. 4 shows the greenhouse output data of successive planting periods during training for the improved SAC algorithm and for the existing SAC algorithm. It can be seen that the improved SAC algorithm trains efficiently and that its greenhouse planting profit during training is always higher than that of the original SAC algorithm. The method provided by the application is therefore feasible, is superior to the original SAC algorithm, and can obtain the economically optimal climate setting strategy for the greenhouse.

Applying the greenhouse environment parameter decision data of fig. 5, obtained with the improved SAC algorithm, to the planting simulator yields fig. 6, the final greenhouse production profit comparison chart. The profit per unit greenhouse area is 50.48 yuan for the improved SAC algorithm, 25.18 yuan for the manual experience strategy and 35.32 yuan for the original SAC algorithm. The profit is thus 42.3% higher than with the original SAC strategy and 100.5% higher than with the manual strategy. The improved SAC algorithm therefore outperforms both the manual experience strategy and the original SAC strategy, which illustrates its advantage for greenhouse environment parameter set value decisions.
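The relative improvements can be recomputed from the three quoted profits. This is a small arithmetic sketch with a hypothetical helper name. From the rounded profits, the gain over the manual strategy comes out at about 100.5%, while the gain over the original SAC strategy comes out at about 42.9%, slightly above the quoted 42.3%; the quoted figure may rest on unrounded profits, which is an assumption.

```python
def relative_gain_percent(new, baseline):
    """Percentage by which `new` exceeds `baseline`."""
    return (new - baseline) / baseline * 100.0


IMPROVED_SAC = 50.48   # yuan per unit greenhouse area
ORIGINAL_SAC = 35.32
MANUAL = 25.18

gain_vs_manual = relative_gain_percent(IMPROVED_SAC, MANUAL)
gain_vs_original = relative_gain_percent(IMPROVED_SAC, ORIGINAL_SAC)
print(f"vs manual strategy: {gain_vs_manual:.1f}%")
print(f"vs original SAC:    {gain_vs_original:.1f}%")
```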

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms should not be understood as necessarily being directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.

Claims

1. A greenhouse environment parameter optimization decision method based on an improved SAC algorithm, characterized in that the method is based on the SAC algorithm and comprises the following steps:

S5, repeatedly executing the step S4 until a plurality of planting periods are completed, obtaining a greenhouse environment parameter optimization decision neural network, and generating a greenhouse environment parameter optimization decision through the greenhouse environment parameter optimization decision neural network;

Designing the number of SAC algorithm criticizer neural networks comprises:

Designing more than three SAC algorithm criticizer neural networks, wherein each SAC algorithm criticizer neural network corresponds to one SAC algorithm target criticizer neural network;

Designing the objective function of the SAC algorithm actor neural network comprises:

Designing a supervised learning function;

Wherein E is the mathematical expectation; s_t is the current greenhouse state data; D is the experience buffer; a_t is the action output by the actor; π is the actor strategy used to generate actions; π(a_t|s_t) is the probability that the actor strategy outputs action a_t in state s_t; σ is a weight coefficient, initialized to 1; Q(a_t, s_t) is the criticizer's evaluation of the state-action value; k is the weight coefficient of the supervised learning function and takes a value between 0.3 and 0.5; a_expData is the historical artificial planting experience data; and the indicator function takes the value 1 when (s_t, a_t) comes from the historical artificial planting experience data and 0 otherwise.
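The following Python sketch illustrates how the symbols defined above could combine into an actor objective. It is an illustrative reading under stated assumptions, not the claimed formula: the function name `actor_loss`, the use of the element-wise minimum over the criticizer outputs, and the squared-error form of the supervised term are all assumptions introduced here.

```python
from statistics import fmean


def actor_loss(log_probs, q_values, actions, expert_actions, is_expert,
               sigma=1.0, k=0.4):
    """Illustrative actor objective with a supervised learning term.

    log_probs:      log pi(a_t | s_t) for each sample in the batch
    q_values:       one list of Q(a_t, s_t) per criticizer network
    actions:        action vector a_t output by the actor, per sample
    expert_actions: a_expData per sample (ignored where is_expert is 0)
    is_expert:      indicator, 1 if (s_t, a_t) comes from experience data
    sigma:          weight coefficient, initialized to 1
    k:              supervised weight, between 0.3 and 0.5
    """
    if len(q_values) <= 3:
        raise ValueError("expected more than three criticizer networks")

    losses = []
    for i, log_p in enumerate(log_probs):
        # Assumption: aggregate the criticizers by their minimum value.
        q_min = min(q[i] for q in q_values)
        # Standard SAC actor term: sigma * log pi(a_t|s_t) - Q(a_t, s_t).
        sac_term = sigma * log_p - q_min
        # Assumption: the supervised term is a squared error between the
        # actor action and the historical experience action.
        sq_err = sum((a - e) ** 2
                     for a, e in zip(actions[i], expert_actions[i]))
        losses.append(sac_term + k * is_expert[i] * sq_err)
    # Mean over the batch approximates the expectation E over D.
    return fmean(losses)
```

With this form, samples drawn from the historical experience data pull the actor toward the recorded set values, while samples generated by the actor itself are trained by the SAC term alone.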

2. The greenhouse environment parameter optimization decision method based on the improved SAC algorithm as claimed in claim 1, characterized in that: the greenhouse state data comprises one or more of greenhouse internal climate data, greenhouse internal crop growth data and future climate forecast data; the greenhouse environment parameter decision data comprises one or more of the future daytime temperature, the future daytime carbon dioxide concentration, the future daytime artificial light supplementing set value, the future nighttime temperature, the future nighttime carbon dioxide concentration and the future nighttime artificial light supplementing set value.

3. The greenhouse environment parameter optimization decision method based on the improved SAC algorithm as claimed in claim 1, characterized in that: the historical artificial greenhouse environment parameter decision data are the greenhouse daytime set values and nighttime set values corresponding to the seedling period of the crop, and the greenhouse daytime set values and nighttime set values corresponding to the fruiting period of the crop.

4. The greenhouse environment parameter optimization decision method based on the improved SAC algorithm as claimed in claim 1, characterized in that: the greenhouse simulator comprises a greenhouse environment model and a crop model, wherein the greenhouse environment model simulates the greenhouse internal environment parameters through a mass balance equation and an energy balance equation, and the crop model is built from the photosynthesis, respiration and biomass distribution of the crop and simulates the growth and yield of the crop under the greenhouse internal environment parameters.
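As a minimal illustration of the simulator structure described in claim 4, the Python sketch below advances a toy greenhouse by one explicit Euler step. Every name and constant in it (`GreenhouseState`, `step`, `HEAT_CAPACITY`, and so on) is hypothetical and chosen only for illustration. The model is far simpler than a real greenhouse environment model or crop model, and biomass distribution among plant organs is not modeled.

```python
from dataclasses import dataclass

# Hypothetical illustrative constants, not taken from the specification.
HEAT_CAPACITY = 3.0e4    # J/(m^2 K), lumped thermal capacity
LOSS_COEFF = 5.0         # W/(m^2 K), heat loss through the cover
VENT_RATE = 1.0e-3       # 1/s, CO2 exchange with outside air
P_MAX = 1.0e-3           # g/(m^2 s), maximum photosynthesis rate
K_LIGHT = 200.0          # light level at half-saturation
K_CO2 = 400.0            # ppm, CO2 level at half-saturation
RESP_RATE = 1.0e-7       # 1/s, respiration per unit biomass
CO2_PER_BIOMASS = 50.0   # ppm of CO2 consumed per g/m^2 assimilated


@dataclass
class GreenhouseState:
    temp_c: float    # indoor air temperature, deg C
    co2_ppm: float   # indoor CO2 concentration, ppm
    biomass: float   # crop dry matter, g/m^2


def step(state, heat_w, co2_inject, light, outside_temp_c,
         outside_co2_ppm=400.0, dt=60.0):
    """Advance the toy greenhouse by dt seconds (explicit Euler)."""
    # Crop model: light- and CO2-limited photosynthesis minus respiration.
    photo = (P_MAX * light / (light + K_LIGHT)
             * state.co2_ppm / (state.co2_ppm + K_CO2))
    resp = RESP_RATE * state.biomass
    net_growth = photo - resp

    # Energy balance: heating input minus loss to the outside air.
    d_temp = (heat_w
              - LOSS_COEFF * (state.temp_c - outside_temp_c)) / HEAT_CAPACITY

    # Mass balance for CO2: injection, ventilation exchange, crop uptake.
    d_co2 = (co2_inject
             - VENT_RATE * (state.co2_ppm - outside_co2_ppm)
             - CO2_PER_BIOMASS * net_growth)

    return GreenhouseState(
        temp_c=state.temp_c + dt * d_temp,
        co2_ppm=state.co2_ppm + dt * d_co2,
        biomass=state.biomass + dt * net_growth,
    )
```

Calling `step` repeatedly with the set values chosen by the decision network would play the role of one simulated planting period in this toy setting.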

5. The greenhouse environment parameter optimization decision method based on the improved SAC algorithm as claimed in claim 4, characterized in that: the greenhouse internal environment parameters include one or more of temperature, carbon dioxide concentration and illumination intensity.

6. The greenhouse environment parameter optimization decision method based on the improved SAC algorithm as claimed in claim 1, characterized in that: updating the parameters of the criticizer neural networks and the parameters of the actor neural network by using the data of the experience buffer comprises:

7. The greenhouse environment parameter optimization decision method based on the improved SAC algorithm as claimed in claim 1, characterized in that: repeatedly executing step S4 until a plurality of planting periods are completed to obtain the greenhouse environment parameter optimization decision neural network comprises: