CN112365359A - Training method, device, equipment and storage medium for planting decision determination model - Google Patents

Training method, device, equipment and storage medium for planting decision determination model

Info

Publication number
CN112365359A
CN112365359A (application CN202011348133.XA)
Authority
CN
China
Prior art keywords
planting
decision
virtual environment
model
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011348133.XA
Other languages
Chinese (zh)
Inventor
Yao Yao (姚瑶)
Dijun Luo (罗迪君)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202011348133.XA
Publication of CN112365359A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/02Agriculture; Fishing; Mining
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393Score-carding, benchmarking or key performance indicator [KPI] analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067Enterprise or organisation modelling

Abstract

The application discloses a training method, apparatus, device and storage medium for a planting decision determination model and for a decision determination model, and belongs to the field of artificial intelligence. In the embodiments of the application, the server introduces a virtual planting environment during model training and, based on the first virtual environment parameters of that virtual environment, expands a plurality of first planting decisions to serve as training samples. While training the planting decision determination model with the expanded training samples, the server introduces an uncertainty parameter that evaluates the reliability of each sample and trains the model with the uncertainty parameter and the expanded samples together. This both increases the number of training samples and uses the uncertainty parameter to adjust how strongly each sample influences training, so the trained planting decision determination model is more accurate. In subsequent use, the planting decision determination model can output effective planting decisions.

Description

Training method, device, equipment and storage medium for planting decision determination model
Technical Field
The present application relates to the field of artificial intelligence, and in particular to a training method, apparatus, device and storage medium for a planting decision determination model.
Background
With the aging of the global population, and with the soil erosion and soil pollution caused by excessive irrigation and fertilization, agriculture urgently needs to advance from extensive to precision farming while improving yield, resource utilization and profit, so as to alleviate global hunger and environmental pollution.
In the related art, technicians often train a model that outputs planting decisions according to changes in the planting environment, based on data collected during agricultural production, so that agricultural production can be guided by the planting decisions output by the model. Because the growth cycle of crops is long, little historical data is available, and technicians therefore tend to artificially expand the training data used to train the model.
However, artificially expanded training data often differ greatly from real data, so the trained model has low accuracy and cannot output effective planting decisions.
Disclosure of Invention
The embodiments of the application provide a training method, apparatus, device and storage medium for a planting decision determination model and for a decision determination model, which can improve the accuracy of the model. The technical scheme is as follows:
in one aspect, a method for training a plant decision determination model is provided, the method comprising:
inputting a first virtual environment parameter into a planting decision determining model, and obtaining a plurality of first planting decisions corresponding to the first virtual environment parameter through the planting decision determining model, wherein the first virtual environment parameter is used for representing the environment state of a virtual planting environment, and the first planting decisions are used for changing the environment state of the virtual planting environment;
obtaining a plurality of second virtual environment parameters respectively corresponding to the plurality of first planting decisions based on the first virtual environment parameters and the corresponding plurality of first planting decisions, wherein the second virtual environment parameters are predicted virtual environment parameters after the corresponding planting decisions are executed in the virtual planting environment;
determining a target planting decision based on the plurality of second virtual environment parameters, wherein an evaluation value and a corresponding uncertainty parameter of the target planting decision accord with a target condition, the uncertainty parameter is used for representing the credibility of the corresponding second virtual environment parameter, and the evaluation value is used for representing the degree of influence of the corresponding planting decision on the training of the planting decision determination model;
and adjusting the model parameters of the planting decision determination model based on the evaluation value and the uncertainty parameters of the target planting decision.
In one aspect, a method for training a decision-making model is provided, the method comprising:
inputting a third virtual environment parameter into a decision-making determination model, and obtaining a plurality of first decisions corresponding to the third virtual environment parameter through the decision-making determination model, wherein the third virtual environment parameter is used for representing the environment state of a virtual environment, and the first decisions are used for changing the environment state of the virtual environment;
obtaining a plurality of fourth virtual environment parameters respectively corresponding to the plurality of first decisions based on the third virtual environment parameters and the corresponding plurality of first decisions, wherein the fourth virtual environment parameters are predicted virtual environment parameters after corresponding decisions are executed in the virtual environment;
determining a target decision based on the fourth virtual environment parameters, wherein an evaluation value and a corresponding uncertainty parameter of the target decision meet a target condition, the uncertainty parameter is used for representing the credibility of the corresponding fourth virtual environment parameter, and the evaluation value is used for representing the degree of influence of the corresponding decision on training of the decision determination model;
and adjusting the model parameters of the decision-making determination model based on the evaluation value and the uncertainty parameters of the target decision.
In a possible implementation manner, the inputting a third virtual environment parameter into a decision determination model, and obtaining, by the decision determination model, a plurality of first decisions corresponding to the third virtual environment parameter includes:
inputting the third virtual environment parameter into a decision-making determination model, and multiplying the third virtual environment parameter by a weight matrix of the decision-making determination model to obtain a feature vector of the third virtual environment parameter;
adding the characteristic vector and the bias matrix of the decision-making determination model, and then carrying out normalization processing to obtain a second decision corresponding to the third virtual environment parameter;
and performing data enhancement on the second decision to obtain the plurality of first decisions corresponding to the third virtual environment parameter.
In a possible implementation, the data enhancing the second decision to obtain the plurality of first decisions corresponding to the third virtual environment parameter includes:
and respectively adding a plurality of biases to the second decision to obtain the plurality of first decisions corresponding to the third virtual environment parameters.
In one possible embodiment, the determining a target decision based on the plurality of fourth virtual environment parameters comprises:
obtaining a plurality of uncertainty parameters respectively corresponding to the plurality of first decisions based on the plurality of fourth virtual environment parameters;
obtaining a plurality of evaluation values of the plurality of first decisions based on the fourth virtual environment parameter and the plurality of first decisions;
determining the target decision based on the plurality of uncertainty parameters and the plurality of evaluation values.
In a possible implementation manner, the obtaining, based on the fourth virtual environment parameters, a plurality of uncertainty parameters corresponding to the first decisions includes:
obtaining a mean value of the plurality of fourth virtual environment parameters;
obtaining a plurality of uncertainty parameters respectively corresponding to the plurality of first decisions based on the variances between the plurality of fourth virtual environment parameters and the mean, and the number of the fourth virtual environment parameters.
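As an illustration of this step, the following minimal Python sketch derives an uncertainty parameter for one first decision from the fourth virtual environment parameters predicted by several submodels. The exact scaling (dividing the mean variance by the number of predictions) is an assumption; the text only states that the variance around the mean and the number of parameters are used.

```python
import numpy as np

def uncertainty_from_predictions(predicted_params):
    """Uncertainty for one first decision from an ensemble of predicted
    fourth virtual environment parameters (shape: n_predictions x param_dim)."""
    predicted_params = np.asarray(predicted_params, dtype=float)
    n = predicted_params.shape[0]
    mean = predicted_params.mean(axis=0)                       # mean of the predictions
    variance = ((predicted_params - mean) ** 2).mean(axis=0)   # variance around the mean
    return float(variance.mean() / n)                          # assumed scaling by n

# Three hypothetical predictions of a 4-dimensional environment state
preds = [[24.8, 41.0, 410.0, 130.0],
         [25.1, 40.5, 405.0, 131.0],
         [25.0, 40.8, 408.0, 129.5]]
print(uncertainty_from_predictions(preds))
```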
In a possible implementation, the obtaining, based on the fourth virtual environment parameter and the plurality of first decisions, a plurality of evaluation values for the plurality of first decisions includes:
inputting the plurality of first decisions and the plurality of fourth virtual environment parameters into a decision evaluation model, and outputting a plurality of evaluation values of the plurality of first decisions by the decision evaluation model.
In one possible embodiment, the determining the objective decision based on the plurality of uncertainty parameters and the plurality of evaluation values comprises:
respectively fusing the plurality of uncertainty parameters and the corresponding plurality of evaluation values to obtain a plurality of fused evaluation values;
and determining a first decision corresponding to the highest fusion evaluation value in the plurality of fusion evaluation values as the target decision.
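A sketch of one way to realize the fusion and selection just described; the fusion rule (evaluation value minus a weighted uncertainty penalty) and the weight are assumptions, since the text only requires that the fused evaluation values be compared and the highest one selected.

```python
import numpy as np

def select_target_decision(first_decisions, evaluation_values, uncertainties, weight=1.0):
    """Fuse each evaluation value with the matching uncertainty parameter and
    return the first decision with the highest fused evaluation value."""
    evaluation_values = np.asarray(evaluation_values, dtype=float)
    uncertainties = np.asarray(uncertainties, dtype=float)
    fused = evaluation_values - weight * uncertainties   # assumed fusion rule
    best = int(np.argmax(fused))
    return first_decisions[best], float(fused[best])

decisions = ["+1.1", "+0.9", "+1.2"]                     # hypothetical first decisions
print(select_target_decision(decisions, [0.8, 0.6, 0.9], [0.05, 0.01, 0.30]))
```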
In a possible implementation manner, the obtaining, based on the third virtual environment parameter and the corresponding first decisions, a plurality of fourth virtual environment parameters respectively corresponding to the first decisions includes:
inputting the third virtual environment parameter and the corresponding first decisions into a virtual environment model, and outputting a plurality of fourth virtual environment parameters corresponding to the first decisions by the virtual environment model, wherein the virtual environment model is used for simulating the virtual environment.
In a possible embodiment, the virtual environment model includes a plurality of sub models, the plurality of sub models are trained based on different data subsets of the same sample data set, the inputting the third virtual environment parameter and the corresponding first decisions into the virtual environment model, and the outputting, by the virtual environment model, a plurality of fourth virtual environment parameters corresponding to the first decisions, respectively, includes:
and inputting the third virtual environment parameters and the corresponding first decisions into the submodels respectively, and obtaining a plurality of fourth virtual environment parameters corresponding to the first decisions respectively through the submodels.
In a possible implementation, the adjusting the model parameters of the decision-making model based on the evaluation value and uncertainty parameters of the objective decision comprises:
obtaining a penalty weight corresponding to an uncertainty parameter of the target decision, the penalty weight being inversely proportional to the uncertainty parameter of the target decision;
obtaining a loss value corresponding to the decision determination model based on the evaluation value of the target decision;
adjusting model parameters of the decision-making model based on a product of the penalty weight and the loss value.
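The three operations above can be combined as in the sketch below, where the base loss is taken as the negative evaluation value (a common actor-style objective, assumed here) and the penalty weight is one simple choice of a quantity inversely proportional to the uncertainty.

```python
def weighted_policy_loss(evaluation_value, uncertainty, eps=1e-6):
    """Loss used to adjust the model: penalty weight (inversely proportional to
    the uncertainty of the target decision) times the base loss value."""
    penalty_weight = 1.0 / (uncertainty + eps)   # high uncertainty -> small weight
    loss = -evaluation_value                     # assumed base loss
    return penalty_weight * loss

print(weighted_policy_loss(evaluation_value=0.9, uncertainty=0.02))  # strong update
print(weighted_policy_loss(evaluation_value=0.9, uncertainty=0.50))  # damped update
```

An unreliable (high-uncertainty) target decision therefore contributes less to the parameter update, which is the behaviour the surrounding text describes.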
In one aspect, there is provided a training apparatus for a planting decision determination model, the apparatus comprising:
a first input module, configured to input a first virtual environment parameter into a planting decision determination model and obtain, through the planting decision determination model, a plurality of first planting decisions corresponding to the first virtual environment parameter, where the first virtual environment parameter is used to represent the environment state of a virtual planting environment, and the first planting decisions are used to change the environment state of the virtual planting environment;
a second virtual environment parameter obtaining module, configured to obtain, based on the first virtual environment parameter and the corresponding first planting decisions, a plurality of second virtual environment parameters corresponding to the first planting decisions, respectively, where the second virtual environment parameter is a predicted virtual environment parameter after a corresponding planting decision is executed in the virtual planting environment;
a target planting decision determining module, configured to determine a target planting decision based on the plurality of second virtual environment parameters, where an evaluation value and a corresponding uncertainty parameter of the target planting decision meet a target condition, the uncertainty parameter is used to indicate a reliability of the corresponding second virtual environment parameter, and the evaluation value is used to indicate an influence degree of the corresponding planting decision on training of the planting decision determining model;
and the first model training module is used for adjusting the model parameters of the planting decision determining model based on the evaluation value and the uncertainty parameters of the target planting decision.
In a possible implementation manner, the first input module is configured to input the first virtual environment parameter into a planting decision determination model, and multiply the first virtual environment parameter by a weight matrix of the planting decision determination model to obtain a feature vector of the first virtual environment parameter; adding the characteristic vector and the bias matrix of the planting decision determining model, and then carrying out normalization processing to obtain a second planting decision corresponding to the first virtual environment parameter; and performing data enhancement on the second planting decision to obtain a plurality of first planting decisions corresponding to the first virtual environment parameters.
In a possible implementation manner, the first input module is configured to add a plurality of offsets to the second planting decisions, respectively, to obtain the plurality of first planting decisions corresponding to the first virtual environment parameter.
In a possible implementation manner, the target planting decision determining module is configured to obtain, based on the plurality of second virtual environment parameters, a plurality of uncertainty parameters respectively corresponding to the plurality of first planting decisions; obtaining a plurality of evaluation values of the plurality of first planting decisions based on the second virtual environment parameter and the plurality of first planting decisions; determining the target planting decision based on the plurality of uncertainty parameters and the plurality of evaluation values.
In a possible implementation manner, the target planting decision determining module is configured to obtain a mean value of the plurality of second virtual environment parameters; obtaining a plurality of uncertainty parameters respectively corresponding to the plurality of first planting decisions based on the variance between the plurality of second virtual environment parameters and the mean value and the number of the second virtual environment parameters.
In a possible implementation, the target planting decision determining module is configured to input the plurality of first planting decisions and the plurality of second virtual environment parameters into a planting decision evaluation model, and output, by the planting decision evaluation model, a plurality of evaluation values of the plurality of first planting decisions.
In a possible implementation manner, the target planting decision determining module is configured to fuse the plurality of uncertainty parameters and the plurality of corresponding evaluation values, respectively, to obtain a plurality of fused evaluation values; and determining a first planting decision corresponding to the highest fusion evaluation value in the plurality of fusion evaluation values as the target planting decision.
In a possible implementation manner, the second virtual environment parameter obtaining module is configured to input the first virtual environment parameter and the corresponding first planting decisions into a virtual planting environment model, and output, by the virtual planting environment model, a plurality of second virtual environment parameters respectively corresponding to the first planting decisions, where the virtual planting environment model is configured to simulate the virtual planting environment.
In a possible implementation manner, the virtual planting environment model includes a plurality of sub models, the plurality of sub models are obtained by training different data subsets based on the same sample data set, and the second virtual environment parameter obtaining module is configured to input the first virtual environment parameters and the corresponding first planting decisions into the plurality of sub models, and obtain a plurality of second virtual environment parameters corresponding to the first planting decisions through the plurality of sub models.
In a possible implementation manner, the first model training module is configured to obtain a penalty weight corresponding to an uncertainty parameter of the target planting decision, where the penalty weight is inversely proportional to the uncertainty parameter of the target planting decision; obtaining a loss value corresponding to the planting decision determination model based on the evaluation value of the target planting decision; adjusting model parameters of the plant decision determination model based on a product of the penalty weight and the loss value.
In one aspect, a training apparatus for a decision-making model is provided, the apparatus comprising:
a second input module, configured to input a third virtual environment parameter into a decision determination model, and obtain, through the decision determination model, a plurality of first decisions corresponding to the third virtual environment parameter, where the third virtual environment parameter is used to represent an environment state of a virtual environment, and the first decisions are used to change the environment state of the virtual environment;
a fourth environment parameter obtaining module, configured to obtain, based on the third virtual environment parameter and the corresponding first decisions, a plurality of fourth virtual environment parameters corresponding to the first decisions, where the fourth virtual environment parameters are predicted virtual environment parameters after corresponding decisions are executed in the virtual environment;
a target decision determining module, configured to determine a target decision based on the fourth virtual environment parameters, where an evaluation value and a corresponding uncertainty parameter of the target decision meet a target condition, the uncertainty parameter is used to indicate a reliability of the corresponding fourth virtual environment parameter, and the evaluation value is used to indicate a degree of influence of the corresponding decision on training of the decision determining model;
and the second model training module is used for adjusting the model parameters of the decision-making determination model based on the evaluation value and the uncertainty parameters of the target decision.
In a possible implementation manner, the second input module is configured to input the third virtual environment parameter into a decision-making determination model, and multiply the third virtual environment parameter by a weight matrix of the decision-making determination model to obtain a feature vector of the third virtual environment parameter; adding the characteristic vector and the bias matrix of the decision-making determination model, and then carrying out normalization processing to obtain a second decision corresponding to the third virtual environment parameter; and performing data enhancement on the second decision to obtain the plurality of first decisions corresponding to the third virtual environment parameter.
In a possible implementation manner, the second input module is configured to add a plurality of biases to the second decisions, respectively, to obtain the plurality of first decisions corresponding to the third virtual environment parameters.
In a possible implementation manner, the goal decision determining module is configured to obtain, based on the fourth virtual environment parameters, a plurality of uncertainty parameters corresponding to the first decisions, respectively; obtaining a plurality of evaluation values of the plurality of first decisions based on the fourth virtual environment parameter and the plurality of first decisions; determining the target decision based on the plurality of uncertainty parameters and the plurality of evaluation values.
In a possible implementation manner, the fourth environment parameter obtaining module is configured to obtain a mean value of the plurality of fourth virtual environment parameters; obtaining a plurality of uncertainty parameters respectively corresponding to the plurality of first decisions based on the variances between the plurality of fourth virtual environment parameters and the mean, and the number of the fourth virtual environment parameters.
In a possible implementation, the target decision determining module is configured to input the plurality of first decisions and the plurality of fourth virtual environment parameters into a decision evaluation model, and output, by the decision evaluation model, a plurality of evaluation values of the plurality of first decisions.
In a possible implementation manner, the objective decision determining module is configured to fuse the plurality of uncertainty parameters and the plurality of corresponding evaluation values, respectively, to obtain a plurality of fused evaluation values; and determining a first decision corresponding to the highest fusion evaluation value in the plurality of fusion evaluation values as the target decision.
In a possible implementation manner, the fourth environment parameter obtaining module is configured to input the third virtual environment parameter and the corresponding first decisions into a virtual environment model, and output, by the virtual environment model, a plurality of fourth virtual environment parameters respectively corresponding to the first decisions, where the virtual environment model is used to simulate the virtual environment.
In a possible implementation manner, the virtual environment model includes a plurality of submodels, the plurality of submodels are obtained by training different data subsets based on the same sample data set, and the fourth environment parameter obtaining module is configured to input the third virtual environment parameter and the corresponding first decisions into the plurality of submodels, and obtain a plurality of fourth virtual environment parameters corresponding to the first decisions through the plurality of submodels.
In a possible implementation manner, the second model training module is configured to obtain a penalty weight corresponding to an uncertainty parameter of the target decision, where the penalty weight is inversely proportional to the uncertainty parameter of the target decision; obtaining a loss value corresponding to the decision determination model based on the evaluation value of the target decision; adjusting model parameters of the decision-making model based on a product of the penalty weight and the loss value.
In one aspect, a computer device is provided, the computer device comprising one or more processors and one or more memories having stored therein at least one computer program, the computer program being loaded and executed by the one or more processors to implement the training method of the planting decision determination model.
In one aspect, a computer-readable storage medium is provided, in which at least one computer program is stored, which is loaded and executed by a processor to implement the training method of the planting decision determination model.
In one aspect, a computer program product or a computer program is provided, the computer program product or the computer program comprising program code, the program code being stored in a computer-readable storage medium, the program code being read by a processor of a computer device from the computer-readable storage medium, the program code being executable by the processor to cause the computer device to perform the above-mentioned training method of a planting decision determination model.
In the embodiments of the application, the server introduces a virtual planting environment during model training and, based on the first virtual environment parameters of that virtual environment, expands a plurality of first planting decisions to serve as training samples. While training the planting decision determination model with the expanded training samples, the server introduces an uncertainty parameter that evaluates the reliability of each sample and trains the model with the uncertainty parameter and the expanded samples together. In this way the server not only increases the number of training samples but also uses the uncertainty parameter to adjust how strongly each sample influences training, so the trained planting decision determination model is more accurate. In subsequent use, the planting decision determination model can output effective planting decisions.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow diagram of the technical solution provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of an implementation environment of a training method for a planting decision determination model provided by an embodiment of the present application;
FIG. 3 is a flow chart of a training method for a planting decision determination model provided by an embodiment of the present application;
FIG. 4 is a flow chart of a training method for a planting decision determination model provided by an embodiment of the present application;
FIG. 5 is a flow chart of a training method for a planting decision determination model provided by an embodiment of the present application;
FIG. 6 is a flow chart of a training method for a decision determination model provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a training apparatus for a planting decision determination model provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a training apparatus for a decision determination model provided by an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a terminal provided by an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a server provided by an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The terms "first," "second," and the like in this application are used for distinguishing between similar items and items that have substantially the same function or similar functionality, and it should be understood that "first," "second," and "nth" do not have any logical or temporal dependency or limitation on the number or order of execution.
The term "at least one" in this application means one or more, "a plurality" means two or more, for example, a plurality of reference face images means two or more reference face images.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that the machines can perceive, reason and make decisions.
Artificial intelligence is a comprehensive discipline involving a wide range of fields, covering both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It specializes in studying how a computer can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in every field of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and teaching learning.
Reinforcement learning: the intelligent agent mainly comprises an environment and an intelligent agent, the intelligent agent selects actions to execute according to the environment state of the environment, the environment is transferred to a new environment state according to the actions of the intelligent agent and feeds back a value reward, and the intelligent agent continuously optimizes a strategy according to the reward fed back by the environment.
Reinforcement learning is generally modeled with a Markov Decision Process (MDP), which can usually be abstracted as a five-tuple (S, A, P, R, γ), where S denotes the state space, A the decision (action) space, P the state transition probability matrix, R the reward function and γ the discount factor; ρ denotes the initial distribution obeyed by the initial state s_0. At each time step the agent observes the state s_t and executes a decision a_t according to the policy π; on receiving the action, the environment transitions to the next state s_{t+1} and feeds back a reward r_t. The goal of reinforcement learning is to learn a policy that maximizes the cumulative reward. In general, reinforcement learning uses a state-action value function Q to approximate the expected return obtainable from the current state under the current policy, expressed as formula (1):

Q_π(s_t, a_t) = E_π[ Σ_{k=0}^{K} γ^k · r_{t+k} ]    (1)

where E_π[·] denotes the expectation under policy π and K is the number of decisions.
Model-based reinforcement learning: a class of reinforcement learning generally used to address sample complexity. Its main characteristic is that a neural network (called an environment model) is first learned to approximate the real external environment, after which the environment model can assist the agent or controller in learning a policy through planning, prediction, virtual data generation and similar means.
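For illustration only, the following minimal sketch shows how a learned environment model can generate imagined ("virtual") transitions for an agent without touching the real environment; the callables policy and env_model are placeholders, not interfaces defined in the patent.

```python
import numpy as np

def virtual_rollout(policy, env_model, initial_state, horizon=10):
    """Generate an imagined trajectory with a learned environment model.
    policy(state) -> action; env_model(state, action) -> (next_state, reward)."""
    state = np.asarray(initial_state, dtype=float)
    trajectory = []
    for _ in range(horizon):
        action = policy(state)
        next_state, reward = env_model(state, action)
        trajectory.append((state, action, reward, next_state))
        state = np.asarray(next_state, dtype=float)
    return trajectory

# Dummy placeholders just to show the call pattern
dummy_policy = lambda s: 0.1 * float(s.sum())
dummy_env = lambda s, a: (s + a, -abs(a))
print(len(virtual_rollout(dummy_policy, dummy_env, [25.0, 40.0, 410.0])))
```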
Bayesian neural network: a type of artificial neural network whose main characteristic is that its output is a mean and a variance, and the mean and variance uniquely determine the Gaussian distribution corresponding to the output.
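As a toy illustration of a network whose output is a mean and a variance, the sketch below uses two linear heads over shared features; the shapes and the parameterization (exponentiating a log-variance head) are assumptions made for the example, not details from the patent.

```python
import numpy as np

def gaussian_head(features, w_mu, b_mu, w_logvar, b_logvar):
    """Toy forward pass whose output is a mean and a variance; exponentiating
    the second head keeps the variance positive, so (mu, var) fixes one
    Gaussian distribution per output dimension."""
    mu = features @ w_mu + b_mu
    var = np.exp(features @ w_logvar + b_logvar)
    return mu, var

rng = np.random.default_rng(1)
feats = rng.normal(size=8)
mu, var = gaussian_head(feats, rng.normal(size=(8, 2)), np.zeros(2),
                        rng.normal(size=(8, 2)), np.zeros(2))
print(mu, var)   # var is strictly positive
```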
Gaussian Distribution: also known as the Normal Distribution. The curve of the Gaussian distribution is bell-shaped, high in the middle and low at both ends. The expected value μ of the Gaussian distribution determines the position of the curve, and the standard deviation σ determines the spread of the curve. The Gaussian distribution with μ = 0 and σ = 1 is the standard Gaussian distribution.
Uncertainty: in recent years, data-driven machine learning has become popular with the maturity of artificial neural network technology. However, models obtained by data learning have problems of over-fitting and under-fitting, and a problem exists for simulator techniques using neural network modeling: and predicting deviation. Such deviations are unavoidable for several reasons: on one hand, due to data shortage, when the model predicts an area which is not covered by the current data set, the prediction is naturally not accurate enough; on the other hand, any process of data collection inevitably has some noise, and if the training is repeated on the noisy data, the model obtained by final training is easy to over-fit the noise, and the prediction deviation is also uncertainty.
The technical solution provided by the embodiments of the application can be applied to scenarios in which any type of crop is planted in a greenhouse, such as growing cherry tomatoes or cucumbers; the embodiments of the application are not limited in this respect.
In some embodiments, the equipment used in the greenhouse includes a ventilation system, a shading system, a heating system, an air fertilizer machine, light-emitting diode (LED) fill lights, a substrate cultivation system, sensors (for temperature, humidity, light intensity and carbon dioxide measurements), and a controller. The controller is communicatively connected to the ventilation system, shading system, heating system, air fertilizer machine, LED fill lights and substrate cultivation system and can control each of them. The terminal is communicatively connected to the controller and the sensors; the terminal can acquire environmental data in the greenhouse through the sensors and control each device in the greenhouse through the controller.
Because the growth cycle of crops is long, completely collecting environmental data from sowing to harvesting takes a long time, and acquiring such data is costly and difficult. However, obtaining a planting decision determination model with good predictive performance requires a large number of training samples. Therefore, before training the planting decision determination model provided in the embodiments of the application, a technician can use the limited environmental data collected from the real planting environment to construct a virtual planting environment that simulates the real one, and use this virtual planting environment to expand the amount of sample data available for training, thereby improving the accuracy of the planting decisions the model predicts. The terminal can collect environmental data in the greenhouse through the sensors as environmental data samples for constructing the virtual planting environment, and obtain the running state of each device through the controller as planting decision samples for constructing the virtual planting environment. Based on the environmental data samples and the planting decision samples, the terminal constructs a virtual planting environment that simulates the real planting environment and responds differently to different planting decisions. The terminal can then train the planting decision determination model in the virtual planting environment: the terminal inputs the virtual environment parameters of the virtual planting environment into the planting decision determination model, and the model predicts a planting decision based on those parameters. The terminal inputs the predicted planting decision into the virtual planting environment, which outputs new virtual environment parameters under the influence of that decision. The terminal inputs the new virtual environment parameters and the predicted planting decision into a planting decision evaluation model, which outputs an evaluation value for the predicted decision. The terminal updates the model parameters of the planting decision determination model based on the evaluation value and trains the model over many such iterations.
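The interaction loop described in this paragraph can be summarised, purely as an illustrative sketch, as follows; every callable (policy, virtual environment, evaluation model, update rule) is a placeholder for the corresponding component and not an interface defined by the application.

```python
def train_planting_policy(policy, virtual_env, critic, update_policy,
                          initial_params, n_iterations=1000):
    """policy(params) -> predicted planting decision;
    virtual_env(params, decision) -> new virtual environment parameters;
    critic(params, decision) -> evaluation value of the predicted decision;
    update_policy(policy, value) -> policy with adjusted model parameters."""
    params = initial_params
    for _ in range(n_iterations):
        decision = policy(params)                      # predict a planting decision
        new_params = virtual_env(params, decision)     # virtual environment reacts
        value = critic(new_params, decision)           # evaluate the decision
        policy = update_policy(policy, value)          # adjust model parameters
        params = new_params
    return policy

# Minimal dummy components to show the call pattern only
trained = train_planting_policy(
    policy=lambda p: 0.0,
    virtual_env=lambda p, d: p,
    critic=lambda p, d: 0.0,
    update_policy=lambda pol, v: pol,
    initial_params=[25.0, 40.0, 410.0, 130.0],
    n_iterations=3,
)
```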
After the training of the planting decision determining model provided by the embodiment of the application is completed, the terminal can input the environmental parameters obtained through the sensor in the greenhouse into the planting decision determining model, and the planting decision determining model can predict based on the environmental parameters and output the planting decision corresponding to the environmental parameters. The terminal can send control instructions to different devices in the greenhouse through the control system based on the planting decision so as to execute the planting decision.
In this way, each device in the greenhouse can maintain the environment in the greenhouse in an environment state which is most beneficial to crop production under the control of the terminal, thereby improving the quality and yield of crops.
Referring to fig. 1, a flow chart of the technical solution provided by the embodiment of the present application is shown, where 101 is a real environment, that is, data measured by a terminal through a sensor in a greenhouse, 102 is a planting decision made by an agricultural expert based on the data measured by the sensor, and 103 is a real sample data set including data measured by the sensor in the greenhouse (environmental state t), a planting decision made by the agricultural expert based on the data measured by the sensor (decision), and data measured by the sensor in the greenhouse after the planting decision is taken (environmental state t + 1). 104 is a virtual planting environment trained based on the sample data set 103, the virtual planting environment can output a model sample data set 105, and the model sample data set 105 is used for training a planting decision determination model.
Fig. 2 is a schematic diagram of an implementation environment of a training method for a planting decision determination model according to an embodiment of the present application. Referring to fig. 2, the implementation environment includes a terminal 210 and a server 240.
The terminal 210 is connected to the server 240 through a wireless network or a wired network. Optionally, the terminal 210 is a smartphone, a tablet, a laptop, a desktop computer, etc., but is not limited thereto. The terminal 210 is installed and running with an application that supports model training.
Optionally, the server 240 is an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, web services, cloud communication, middleware services, domain name services, security services, Content Delivery Network (CDN), and big data and artificial intelligence platforms.
Optionally, the terminal 210 generally refers to one of a plurality of terminals, and the embodiment of the present application is illustrated by the terminal 210.
Those skilled in the art will appreciate that the number of terminals described above may be greater or fewer. For example, the number of the terminal is only one, or several tens or hundreds, or more, and in this case, other terminals are also included in the implementation environment. The number of terminals and the type of the device are not limited in the embodiments of the present application.
In the embodiments of the application, the planting decision determination model can be trained by the terminal or by the server. When the server trains the model, the terminal collects environmental parameters in the greenhouse through the greenhouse sensors and sends them to the server as model training samples, and the server trains the planting decision determination model based on those parameters. After the server finishes training, it sends the trained planting decision determination model to the terminal, and the terminal controls the devices in the greenhouse according to the planting decisions output by the model. Of course, the server may instead provide the terminal with an interface for calling the planting decision determination model, and the terminal can call the model directly through that interface; the embodiments of the application are not limited in this respect.
The following description takes as an example the case in which the server trains the planting decision determination model and the terminal uses it.
Fig. 3 is a flowchart of a training method of a planting decision determination model according to an embodiment of the present application, and referring to fig. 3, the method includes:
301. the server inputs the first virtual environment parameters into the planting decision determining model, and a plurality of first planting decisions corresponding to the first virtual environment parameters are obtained through the planting decision determining model, wherein the first virtual environment parameters are used for representing the environment state of the virtual planting environment, and the first planting decisions are used for changing the environment state of the virtual planting environment.
The planting decision determination model is a reinforcement learning model; in some embodiments, it is the Actor model in a decision-evaluation (Actor-Critic) framework. The planting decision determination model includes a plurality of fully connected layers, a bias layer and a normalization layer, and is capable of outputting corresponding planting decisions based on virtual environment parameters. In some embodiments, the planting decision includes a decision on at least one of the heating system, ventilation system, shading system, air fertilizer machine, light-emitting diode fill light and substrate cultivation system in the greenhouse. In some embodiments, the first virtual environment parameter includes the temperature, humidity, carbon dioxide concentration and light intensity of the virtual planting environment.
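As a concrete illustration of such an Actor-style model, the following minimal numpy sketch maps environment parameters to a probability distribution over a few discrete actuator adjustments; the single linear layer, the random parameter values and the three-action set are illustrative assumptions rather than values from the patent.

```python
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def actor_forward(env_params, weight, bias, actions=(-1.0, 0.0, +1.0)):
    """Map environment parameters (e.g. temperature, humidity, CO2, light) to a
    probability distribution over a few discrete actuator adjustments and pick
    the most likely one; a single linear layer stands in for the full model."""
    env_params = np.asarray(env_params, dtype=float)
    logits = env_params @ weight + bias
    probs = softmax(logits)
    return actions[int(np.argmax(probs))], probs

rng = np.random.default_rng(0)
decision, probs = actor_forward([25.0, 40.0, 410.0, 130.0],
                                rng.normal(size=(4, 3)), rng.normal(size=3))
print(decision, probs)
```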
302. The server obtains a plurality of second virtual environment parameters respectively corresponding to the plurality of first planting decisions based on the first virtual environment parameters and the corresponding plurality of first planting decisions, wherein the second virtual environment parameters are predicted virtual environment parameters after the corresponding planting decisions are executed in the virtual planting environment.
The server can obtain a plurality of second virtual environment parameters based on the first virtual environment parameters and different first planting decisions due to the existence of a plurality of first planting decisions.
303. The server determines a target planting decision based on the plurality of second virtual environment parameters, wherein the evaluation value and the corresponding uncertainty parameter of the target planting decision accord with a target condition, the uncertainty parameter is used for representing the credibility of the corresponding second virtual environment parameter, and the evaluation value is used for representing the degree of influence of the corresponding planting decision on the training planting decision determination model.
The uncertainty parameter represents the reliability of the corresponding second virtual environment parameter, that is, how much the virtual planting environment trusts the second virtual environment parameter it predicts. The lower the uncertainty of a second virtual environment parameter, the higher its reliability, which indicates that the second virtual environment parameter is closer to the environment parameter that would be obtained after executing the same planting decision in the real environment. The evaluation value represents how much the planting decision determination model can learn when the corresponding planting decision is used to train it; that is, the greater the influence of a planting decision on the planting decision determination model, the more the model can learn from that decision.
304. And the server adjusts the model parameters of the planting decision determining model based on the evaluation value and the uncertainty parameters of the target planting decision.
In the embodiments of the application, the server introduces a virtual planting environment during model training and, based on the first virtual environment parameters of that virtual environment, expands a plurality of first planting decisions to serve as training samples. While training the planting decision determination model with the expanded training samples, the server introduces an uncertainty parameter that evaluates the reliability of each sample and trains the model with the uncertainty parameter and the expanded samples together. In this way the server not only increases the number of training samples but also uses the uncertainty parameter to adjust how strongly each sample influences training, so the trained planting decision determination model is more accurate. In subsequent use, the planting decision determination model can output effective planting decisions.
Fig. 4 is a flowchart of a training method of a planting decision determination model according to an embodiment of the present application, and referring to fig. 4, the method includes:
401. the server inputs the first virtual environment parameters into the planting decision determining model, and a plurality of first planting decisions corresponding to the first virtual environment parameters are obtained through the planting decision determining model, wherein the first virtual environment parameters are used for representing the environment state of the virtual planting environment, and the first planting decisions are used for changing the environment state of the virtual planting environment.
The planting decision determination model includes a plurality of fully connected layers, a bias layer and a normalization layer, where the fully connected layers contain a plurality of weight matrices. Through the interaction between the planting decision determination model and the virtual planting environment, the server can adjust the values in the weight matrices and the bias matrix of the model; this adjustment is the training process of the planting decision determination model. The virtual planting environment is constructed by the server based on data collected from the real planting environment; it is also a model that can predict the environment parameters of the real environment based on the planting decision output by the planting decision determination model.
In a possible implementation manner, the server inputs the first virtual environment parameter into the planting decision determination model, and multiplies the first virtual environment parameter by a weight matrix of the planting decision determination model to obtain a feature vector of the first virtual environment parameter. And the server adds the characteristic vector and the bias matrix of the planting decision determining model and then carries out normalization processing to obtain a second planting decision corresponding to the first virtual environment parameter. And the server performs data enhancement on the second planting decision to obtain a plurality of first planting decisions corresponding to the first virtual environment parameters.
In this implementation, the server outputs, through the planting decision determination model, a second planting decision corresponding to the first virtual environment parameter and generates a candidate planting decision set on the basis of the second planting decision. The candidate planting decision set includes a plurality of first planting decisions, which are planting decisions that have not occurred in the real planting environment; training the planting decision determination model with these first planting decisions can improve its generalization ability.
The method by which the server performs data enhancement on the second planting decision in the above embodiment is now described.
In a possible implementation manner, the server adds a plurality of offsets to the second planting decisions to obtain a plurality of first planting decisions corresponding to the first virtual environment parameters. In some embodiments, the server is capable of data enhancing the second planting decision based on equation (2), resulting in a plurality of first planting decisions.
f(a_t) = a_t + Z,  Z ~ (0, M),  M > 0    (2)
where f(·) is the data enhancement function, a_t is the second planting decision, Z is the bias, and M is the maximum value of the bias.
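A minimal sketch of formula (2); a uniform draw on (0, M) is assumed here, since the text does not name the distribution of Z, and the function and parameter names are illustrative.

```python
import numpy as np

def enhance_decision(second_decision, n_samples=4, max_bias=0.3, seed=0):
    """Expand one second planting decision a_t into several first planting
    decisions f(a_t) = a_t + Z with Z drawn from (0, M), M > 0."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(0.0, max_bias, size=n_samples)   # assumed uniform bias
    return second_decision + z

print(enhance_decision(1.0))   # a few decisions around "+1", cf. Example 1 below
```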
The above embodiments will be described below by way of a few examples for different planting decisions.
Example 1: the planting decision controls the heating temperature of the heating system in the greenhouse. The server inputs a first virtual environment parameter into the planting decision determination model, where the first virtual environment parameter includes the humidity, carbon dioxide concentration and light intensity of the virtual planting environment. In some embodiments, the first virtual environment parameter is the vector (40, 10, 130), whose entries correspond to the humidity, the carbon dioxide concentration and the light intensity, respectively. The server inputs the vector (40, 10, 130) into the planting decision determination model and multiplies it by the model's weight matrix
(the 3×3 weight matrix appears only as an image in the original publication)
to obtain the feature vector (1550, 640, 560). The server adds the feature vector (1550, 640, 560) to the bias matrix (30, 50, 40) of the planting decision determination model to obtain the vector (1580, 690, 600). The normalization layer of the planting decision determination model, i.e. the soft-max (Softmax) layer, normalizes this vector to obtain the probabilities (0.49, 0.3, 0.21) of 3 planting decisions, where 0.49 is the probability of "+1", with "+1" indicating that the heating system temperature is raised by 1 °C; 0.3 is the probability of "0", with "0" indicating that the heating system temperature is unchanged; and 0.21 is the probability of "-1", with "-1" indicating that the heating system temperature is lowered by 1 °C. The server can determine the planting decision with the highest probability among the 3 planting decisions as the second planting decision, namely "+1". The server can then perform data enhancement on the second planting decision, that is, add biases on top of "+1" to obtain a plurality of first planting decisions; for example, when the bias takes values in (0, 0.3), the resulting first planting decisions may be +1.1, +1.2, +0.9, +0.8 and the like, which is not limited in the embodiments of the application.
In this way, the planting decision determination model trained by the server can be used to control the heating system in the greenhouse and thus the temperature in the greenhouse. The terminal can automatically control the temperature in the greenhouse, so that a farmer does not need to manually control the heating system in the greenhouse, which reduces the farmer's workload.
Example 2: the planting decision is to control the illumination intensity of a fill light in the greenhouse. The server inputs a first virtual environment parameter into the planting decision determination model, where the first virtual environment parameter includes a humidity parameter, a carbon dioxide concentration and a temperature of the virtual planting environment. In some embodiments, the first virtual environment parameter is a vector (40, 10, 25), whose numbers correspond to the humidity parameter, the carbon dioxide concentration and the temperature, respectively. The server inputs the vector (40, 10, 25) into the planting decision determination model and multiplies it by the weight matrix of the planting decision determination model to obtain the feature vector (125, 30, 85). The server adds the feature vector (125, 30, 85) to the bias matrix (20, 40, 30) of the planting decision determination model to obtain the vector (145, 70, 115). The server normalizes the vector (145, 70, 115) through the normalization layer of the planting decision determination model, that is, a soft maximization (Softmax) layer, to obtain the probabilities (0.44, 0.21, 0.35) corresponding to 3 planting decisions, where 0.44 is the probability of +10, and +10 indicates that the brightness of the fill light is increased by 10 candelas; 0.21 is the probability of 0, and 0 indicates that the brightness of the fill light is unchanged; 0.35 is the probability of -10, and -10 indicates that the brightness of the fill light is decreased by 10 candelas. The server can determine the planting decision with the highest probability among the 3 planting decisions as the second planting decision, that is, +10. The server can perform data enhancement on the second planting decision, that is, add offsets on the basis of +10, to obtain a plurality of first planting decisions; for example, when the value range of the offset is (0, 3), the obtained plurality of first planting decisions may be +11, +12, +9, +8 and the like, which is not limited in the embodiments of the present application.
In this way, the planting decision determination model trained by the server can be used to control the fill light in the greenhouse and thus the illumination intensity in the greenhouse. The terminal can automatically control the illumination intensity in the greenhouse, so that a farmer does not need to manually control the fill light in the greenhouse, which reduces the farmer's workload.
Example 3: the planting decision is to control the amount of carbon dioxide released by the air fertilizer machine in the greenhouse. The server inputs a first virtual environment parameter into the planting decision determination model, where the first virtual environment parameter includes a humidity parameter, an illumination intensity and a temperature of the virtual planting environment. In some embodiments, the first virtual environment parameter is a vector (40, 130, 25), whose numbers correspond to the humidity parameter, the illumination intensity and the temperature, respectively. The server inputs the vector (40, 130, 25) into the planting decision determination model and multiplies it by the weight matrix of the planting decision determination model to obtain the feature vector (220, 430, 325). The server adds the feature vector (220, 430, 325) to the bias matrix (10, 50, 20) of the planting decision determination model to obtain the vector (230, 480, 345). The server normalizes the vector (230, 480, 345) through the normalization layer of the planting decision determination model, that is, a soft maximization (Softmax) layer, to obtain the probabilities (0.45, 0.22, 0.33) corresponding to 3 planting decisions, where 0.45 is the probability of +10, and +10 indicates that the rate at which the air fertilizer machine releases carbon dioxide is increased by 10%; 0.22 is the probability of 0, and 0 indicates that the rate at which the air fertilizer machine releases carbon dioxide is unchanged; 0.33 is the probability of -10, and -10 indicates that the rate at which the air fertilizer machine releases carbon dioxide is decreased by 10%. The server can determine the planting decision with the highest probability among the 3 planting decisions as the second planting decision, that is, +10. The server can perform data enhancement on the second planting decision, that is, add offsets on the basis of +10, to obtain a plurality of first planting decisions; for example, when the value range of the offset is (0, 3), the obtained plurality of first planting decisions may be +11, +12, +9, +8 and the like, which is not limited in the embodiments of the present application.
In this way, the planting decision determination model trained by the server can be used to control the air fertilizer machine in the greenhouse and thus the concentration of carbon dioxide in the greenhouse. The terminal can automatically control the carbon dioxide concentration in the greenhouse, so that a farmer does not need to manually control the air fertilizer machine in the greenhouse, which reduces the farmer's workload.
It should be noted that the above three examples respectively describe how the terminal controls the heating system, the fill light and the air fertilizer machine. The method by which the terminal controls other devices in the greenhouse through the planting decision determination model belongs to the same inventive concept as the above three examples and is not described herein again.
402. The server inputs the first virtual environment parameters and the corresponding first planting decisions into the virtual planting environment model, the virtual planting environment model outputs a plurality of second virtual environment parameters corresponding to the first planting decisions respectively, the second virtual environment parameters are predicted virtual environment parameters after the corresponding planting decisions are executed in the virtual planting environment, and the virtual planting environment model is used for simulating the virtual planting environment.
In a possible implementation manner, the virtual planting environment model comprises a plurality of sub-models, the plurality of sub-models are obtained by training different data subsets based on the same sample data set, the server inputs the first virtual environment parameters and the corresponding first planting decisions into the plurality of sub-models respectively, and a plurality of second virtual environment parameters corresponding to the first planting decisions are obtained through the plurality of sub-models respectively.
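For reference, a minimal sketch of this step is given below, assuming each submodel is a callable that maps the spliced (environment parameter, planting decision) vector to a predicted environment parameter. The simple linear submodels and all names are illustrative assumptions, not the implementation claimed in the patent.

```python
# A minimal sketch of step 402: the first virtual environment parameter is spliced
# with a first planting decision, fed to each submodel, and the per-submodel
# predictions (the second virtual environment parameters) are returned together
# with their mean, which the examples use as the ensemble output.
import numpy as np

def predict_second_env(submodels, s_t, a_t):
    x = np.concatenate([np.asarray(s_t, dtype=float), np.atleast_1d(float(a_t))])
    preds = np.stack([model(x) for model in submodels])   # one prediction per submodel
    return preds, preds.mean(axis=0)                       # per-submodel outputs and their mean

# Usage with two toy linear submodels (weights are placeholders):
rng = np.random.default_rng(0)
submodels = [lambda x, W=rng.random((4, 3)), b=rng.random(3): x @ W + b for _ in range(2)]
preds, s_next = predict_second_env(submodels, [40, 10, 130], 1)
```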
In the embodiment of the present application, the virtual planting environment model is used to simulate the virtual planting environment. In this way, the planting decision determination model interacts with the virtual planting environment model instead of the real planting environment, so that model training can be completed quickly and efficiently.
In order to more clearly describe the above embodiments, a method for constructing the virtual planting environment model by the server will be described below.
In the data collection process, the terminal can acquire the environment parameter S_t in the real planting environment through the sensors in the greenhouse. The terminal obtains the planting decision θ_t made by an agricultural expert under each environment parameter. The terminal can also acquire, through the sensors, the environment parameter S_{t+1} in the real planting environment after the planting decision made by the agricultural expert has been executed in the real planting environment. The terminal can use a plurality of data pairs (S_t, θ_t, S_{t+1}) as the sample data set for constructing the virtual planting environment model and send the sample data set to the server, and the server constructs the virtual planting environment model.
In the virtual planting environment model construction process, the server receives the sample data set sent by the terminal and initializes a plurality of submodels for constructing the virtual planting environment model, where the plurality of submodels have the same structure. Initializing a submodel is the process of randomly assigning values to the weight matrix and the bias matrix of the submodel, so the initialized submodels have different weight matrices and bias matrices. After initializing the plurality of submodels, the server can randomly select data pairs from the sample data set and input the randomly selected data pairs into the submodels. In some embodiments, the server may train the plurality of submodels simultaneously or sequentially, which is not limited in the embodiments of the present application. If the server trains multiple submodels simultaneously, the server can randomly select data pairs for different submodels from the sample data set.
Since the method for training a plurality of submodels by the server belongs to the same inventive concept, the following description will take the process of training one submodel by the server as an example.
In one possible implementation, in an iterative process, the server randomly acquires a first data pair (S_1, θ_1, S_2) from the sample data set. The server inputs S_1 and θ_1 in the first data pair into a submodel, and the submodel performs full connection processing and normalization processing on S_1 and θ_1 to obtain the predicted environment parameter S_p output by the submodel. Based on the difference information between the predicted environment parameter S_p and the real environment parameter S_2 in the first data pair, the server adjusts the model parameters of the submodel until the loss function of the submodel converges to the target function value or the number of iterations of the submodel meets the target count condition, at which point training of the submodel is finished.
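A minimal sketch of this training loop is given below, assuming a simple linear submodel trained with a squared-error loss and plain gradient steps; the learning rate, stopping threshold and iteration cap are illustrative assumptions, and the normalization processing is omitted for brevity.

```python
# A minimal sketch of training one submodel: randomly draw (S_t, theta_t, S_{t+1})
# pairs, predict S_p from the spliced input, and adjust the weight and bias
# matrices from the difference between S_p and the real S_{t+1} until the loss
# falls below a target value or the iteration budget is exhausted.
import numpy as np

def train_submodel(dataset, dim_s, dim_a, lr=1e-6, max_iters=10000, tol=1e-3, seed=None):
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(dim_s + dim_a, dim_s))   # random initialization of the
    b = rng.normal(scale=0.1, size=dim_s)                    # weight matrix and bias matrix
    for _ in range(max_iters):
        s_t, theta_t, s_next = dataset[rng.integers(len(dataset))]  # randomly selected data pair
        x = np.concatenate([s_t, theta_t])
        s_p = x @ W + b                                      # predicted environment parameter S_p
        err = s_p - s_next                                   # difference from the real S_{t+1}
        if float(err @ err) < tol:                           # loss reaches the target value
            break
        W -= lr * np.outer(x, err)                           # adjust the model parameters
        b -= lr * err
    return W, b
```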
It should be noted that, whether the server trains the plurality of submodels simultaneously or sequentially, the initial weight matrices and bias matrices of the submodels are different, and the server randomly selects data pairs from the sample data set to train each submodel. Random selection means that, when training each submodel, the server may not use all the data pairs in the sample data set but randomly obtains data pairs from it, and these data pairs form a subset of the sample data set. As a result, the model parameters of the trained submodels are different, and the trained submodels together form the virtual planting environment model.
In this embodiment, the server constructs the virtual planting environment model by training a plurality of submodels, which avoids the drop in simulation accuracy of the virtual planting environment with respect to the real environment that overfitting of a single model would cause, and thus improves the fidelity of the virtual planting environment.
In the above description, the weight matrices and bias matrices of the submodels in the virtual planting environment model are described as fixed numerical values, that is, the submodels are structured as logistic regression (LR) models or deep neural networks (DNN); they may also be models of other structures, such as probabilistic graphical models, self-supervised learning models, contrastive learning models and the like, which is not limited in the embodiments of the present application. In other possible implementations, the submodels may also be structured as Bayesian neural networks (BNN), in which the weight matrix is not fixed to particular values; instead, the values in the weight matrix conform to a Gaussian distribution. Training such a submodel is the process of adjusting the values in the weight matrix so that they conform to that Gaussian distribution.
In other words, a BNN can be viewed as a conditional distribution model P(y | x, w): given an input x, it outputs the distribution of the predicted value y, where w denotes the weights in the BNN. In some embodiments, the mean of the Gaussian distribution that the predicted value conforms to is taken as the predicted value y. In the embodiments of the present application, x is the first virtual environment parameter together with a corresponding first planting decision, and y is the second virtual environment parameter. Learning with a BNN can be regarded as maximum likelihood estimation (MLE): each time the submodel produces an output, it outputs not a single predicted value but the mean and variance of the Gaussian distribution that the predicted value conforms to. The server can train the submodel based on the following formula (3).
w_MLE = argmax_w log P(D | w)    (3)

where D is the sample data set used to train the submodel, and w_MLE is the weight matrix of the submodel obtained by maximum likelihood estimation.
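A minimal sketch of the maximum-likelihood objective in formula (3) is given below, assuming each submodel predicts a diagonal-Gaussian mean and variance for the next environment parameter; minimizing this negative log-likelihood over the sample data set D corresponds to searching for w_MLE. The `predict` callable is an assumed interface, not part of the patent.

```python
# A minimal sketch of the MLE view in formula (3): the submodel outputs the mean
# and variance of the Gaussian that the predicted value conforms to, and training
# minimizes the negative log-likelihood of the sample data set D (equivalently,
# maximizes log P(D | w)).
import numpy as np

def gaussian_nll(mu, var, y):
    # -log P(y | x, w) for a diagonal Gaussian prediction, dropping the constant term
    mu, var, y = map(np.asarray, (mu, var, y))
    return float(0.5 * np.sum(np.log(var) + (y - mu) ** 2 / var))

def negative_log_likelihood(predict, dataset):
    # dataset: iterable of (x, y) pairs; predict(x) -> (mean, variance)
    return sum(gaussian_nll(*predict(x), y) for x, y in dataset)
```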
After describing the method for constructing the virtual planting environment model by the server, the following describes a method for acquiring a plurality of second virtual environment parameters by the server by using several examples.
Example 1: the planting decision is to control the heating temperature of the heating system in the greenhouse. The server inputs a first virtual environment parameter and a plurality of corresponding first planting decisions into the plurality of submodels, where the first virtual environment parameter includes a humidity parameter, a carbon dioxide concentration and an illumination intensity of the virtual planting environment, and a first planting decision is a temperature by which the heating system heats. Taking one first planting decision and two submodels as an example, the first virtual environment parameter is a vector (40, 10, 130), whose numbers correspond to the humidity parameter, the carbon dioxide concentration and the illumination intensity, respectively, and the first planting decision is the number 1, indicating that the temperature of the heating system is increased by 1 °C. The server splices the vector (40, 10, 130) and the number 1 to obtain the vector (40, 10, 130, 1). The server inputs the vector (40, 10, 130, 1) into the two submodels respectively and multiplies it by the weight matrices of the two submodels to obtain the feature vectors (170, 131, 10) and (131, 140, 180). The server adds the feature vectors (170, 131, 10) and (131, 140, 180) to the bias matrices (30, 50, 40) and (10, 20, 50) of the two submodels respectively to obtain the vectors (200, 181, 50) and (151, 160, 230). The vectors (200, 181, 50) and (151, 160, 230) are the second virtual environment parameters output by the two submodels respectively, and the numbers in a second virtual environment parameter have the same meaning as those in the first virtual environment parameter. The virtual planting environment model can use the average of the two second virtual environment parameters output by the two submodels as the second virtual environment parameter output by the virtual planting environment model.
Example 2: the planting decision is to control the illumination intensity of the fill light in the greenhouse. The server inputs a first virtual environment parameter and a plurality of corresponding first planting decisions into the plurality of submodels, where the first virtual environment parameter includes a humidity parameter, a carbon dioxide concentration and a temperature of the virtual planting environment, and a first planting decision is an illumination intensity of the fill light. Taking one first planting decision and two submodels as an example, the first virtual environment parameter is a vector (40, 10, 25), whose numbers correspond to the humidity parameter, the carbon dioxide concentration and the temperature, respectively, and the first planting decision is the number 10, indicating that the illumination intensity of the fill light is increased by 10 candelas. The server splices the vector (40, 10, 25) and the number 10 to obtain the vector (40, 10, 25, 10). The server inputs the vector (40, 10, 25, 10) into the two submodels respectively and multiplies it by the weight matrices of the two submodels to obtain the feature vectors (65, 35, 10) and (35, 35, 75). The server adds the feature vectors (65, 35, 10) and (35, 35, 75) to the bias matrices (10, 20, 10) and (5, 15, 20) of the two submodels respectively to obtain the vectors (75, 55, 20) and (40, 50, 95). The vectors (75, 55, 20) and (40, 50, 95) are the second virtual environment parameters output by the two submodels respectively, and the numbers in a second virtual environment parameter have the same meaning as those in the first virtual environment parameter. The virtual planting environment model can use the average of the two second virtual environment parameters output by the two submodels as the second virtual environment parameter output by the virtual planting environment model.
Example 3: the planting decision is to control the amount of carbon dioxide released by the air fertilizer machine in the greenhouse. The server inputs a first virtual environment parameter and a plurality of corresponding first planting decisions into the plurality of submodels, where the first virtual environment parameter includes a humidity parameter, an illumination intensity and a temperature of the virtual planting environment, and a first planting decision is an amount of carbon dioxide released by the air fertilizer machine. Taking one first planting decision and two submodels as an example, the first virtual environment parameter is a vector (40, 130, 25), whose numbers correspond to the humidity parameter, the illumination intensity and the temperature, respectively, and the first planting decision is the number 10, indicating that the amount of carbon dioxide released by the air fertilizer machine is increased by 10%. The server splices the vector (40, 130, 25) and the number 10 to obtain the vector (40, 130, 25, 10). The server inputs the vector (40, 130, 25, 10) into the two submodels respectively and multiplies it by the weight matrices of the two submodels to obtain the feature vectors (65, 35, 130) and (35, 155, 195). The server adds the feature vectors (65, 35, 130) and (35, 155, 195) to the bias matrices (10, 15, 10) and (5, 10, 15) of the two submodels respectively to obtain the vectors (75, 50, 140) and (40, 165, 210). The vectors (75, 50, 140) and (40, 165, 210) are the second virtual environment parameters output by the two submodels respectively, and the numbers in a second virtual environment parameter have the same meaning as those in the first virtual environment parameter. The virtual planting environment model can use the average of the two second virtual environment parameters output by the two submodels as the second virtual environment parameter output by the virtual planting environment model.
Example 4: if the submodels are BNN models, the server can input a state-action pair (s_t, a_t) into a submodel, where s_t is the first virtual environment parameter and a_t is the corresponding first planting decision. Through the weight matrix of the submodel, the server can obtain the mean and variance (μ_t^i, σ_t^i) of the Gaussian distribution that s_{t+1} conforms to. The server can use the mean μ_t^i of the Gaussian distribution as a second virtual environment parameter.
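A minimal sketch of example 4 is given below, assuming a BNN-style submodel exposes a callable that returns the mean and variance of the Gaussian that s_{t+1} conforms to; `bnn_submodel` is an assumed interface for illustration only.

```python
# A minimal sketch of example 4: a BNN-style submodel returns (mu_t^i, sigma_t^i)
# for the state-action pair (s_t, a_t), and the mean mu_t^i is taken as the
# second virtual environment parameter.
import numpy as np

def bnn_second_env(bnn_submodel, s_t, a_t):
    x = np.concatenate([np.asarray(s_t, dtype=float), np.atleast_1d(float(a_t))])
    mu, sigma2 = bnn_submodel(x)        # mean and variance of the Gaussian for s_{t+1}
    return mu                           # the mean is used as the predicted parameter
```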
403. The server determines a target planting decision based on the plurality of second virtual environment parameters, wherein the evaluation value and the corresponding uncertainty parameter of the target planting decision accord with a target condition, the uncertainty parameter is used for representing the credibility of the corresponding second virtual environment parameter, and the evaluation value is used for representing the degree of influence of the corresponding planting decision on the training planting decision determination model.
In a possible implementation manner, the server obtains a plurality of uncertainty parameters corresponding to the plurality of first planting decisions respectively based on the plurality of second virtual environment parameters. The server obtains a plurality of evaluation values of the plurality of first planting decisions based on the second virtual environment parameter and the plurality of first planting decisions. The server determines a target planting decision based on the plurality of uncertainty parameters and the plurality of evaluation values.
The above embodiment will be described in three parts, the first part describing a method for the server to acquire uncertainty parameters, the second part describing a method for the server to acquire evaluation values, and the third part describing a method for the server to determine a target planting decision.
In a first part, the server obtains an average value of a plurality of second virtual environment parameters. The server obtains a plurality of uncertainty parameters respectively corresponding to the plurality of planting decisions based on the variance between the plurality of second virtual environment parameters and the mean value and the number of the second virtual environment parameters.
To illustrate this section more clearly, the physical meaning of the uncertainty parameter is first explained.
As described above, the virtual planting environment model includes a plurality of submodels, each trained on a different subset of the sample data set, so the model parameters of each submodel are different. Although the server inputs the same first virtual environment parameter and corresponding first planting decision into each submodel, the second virtual environment parameters output by the submodels may differ. The differences among the second virtual environment parameters output by the submodels can be regarded as the "divergence" of the virtual planting environment model, and this "divergence" also reflects the uncertainty of the virtual planting environment model about the predicted second virtual environment parameter. The uncertainty parameter in the embodiments of the present application is a parameter that quantifies this "divergence".
On the basis of the above description, the description is continued on the contents of the first section.
In one possible embodiment, the server can obtain a plurality of uncertainty parameters corresponding to a plurality of planting decisions, respectively, based on a manner indicated by the following formula (4).
V(s_t, a_t) = (1 / (K − 1)) Σ_{i=1}^{K} ‖ u_t^i(s_t, a_t) − ū_t(s_t, a_t) ‖_2^2    (4)

where V(s_t, a_t) is the uncertainty parameter of the second virtual environment parameter, K is the number of second virtual environment parameters, u_t^i(s_t, a_t) is the second virtual environment parameter predicted by the i-th submodel, i is the identifier of the submodel, ū_t(s_t, a_t) = (1 / K) Σ_{i=1}^{K} u_t^i(s_t, a_t) is the average of the plurality of second virtual environment parameters, and ‖ x ‖_2^2 = Σ_{i=1}^{N} x_i^2 is the square of the two-norm, where x_i is a number in the vector and N is the number of numbers in the vector.
For example, if there are three submodels in the virtual planting environment model, the server inputs the first virtual environment parameter and a planting decision into the three submodels, respectively, and then obtains three second virtual environment parameters. After the server respectively inputs the first environment parameter and the other planting decision into the three submodels, the other three second virtual environment parameters can be obtained. For the convenience of understanding, the following description will take the example that the server inputs the first environment parameter and a planting decision into three sub-models respectively. The three submodels output three second virtual environment parameters, such as (80, 55, 20), (60, 50, 30) and (70, 45, 10). The server can obtain the mean of the three second virtual environment parameters, i.e. (70, 50, 20). The server obtains the squared values of the two norms of the differences between the three second virtual environment parameters and the mean value, i.e. 125, 200 and 125, respectively. The server adds 125, 200, and 125, and multiplies by 1/(3-1), so as to obtain the variance 225 between the three second virtual environment parameters and the mean value, that is, the uncertainty parameter corresponding to the second virtual environment parameter output by the virtual environment.
In this embodiment, the variance of the submodel predictions about their mean is used as the uncertainty parameter, which reflects the inconsistency among the different submodels.
In one possible embodiment, the server can obtain a plurality of uncertainty parameters corresponding to a plurality of planting decisions, respectively, based on a manner indicated by the following equation (5).
V(s_t, a_t) = (1 / K) Σ_{i=1}^{K} σ_i^2(s_t, a_t)    (5)

where σ_i^2(s_t, a_t) = (1 / N) ‖ u_t^i(s_t, a_t) − ū_t(s_t, a_t) ‖_2^2 is the variance between the second virtual environment parameter output by the i-th submodel and the second virtual environment parameter output by the virtual planting environment model.
For example, if there are three submodels in the virtual planting environment model, the server inputs the first virtual environment parameter and a planting decision into the three submodels respectively and obtains three second virtual environment parameters. After the server inputs the first virtual environment parameter and another planting decision into the three submodels, another three second virtual environment parameters can be obtained. For ease of understanding, the following description takes the case in which the server inputs the first virtual environment parameter and one planting decision into the three submodels as an example. The three submodels output three second virtual environment parameters, such as (80, 55, 20), (60, 50, 30) and (70, 45, 10). The server can obtain the mean of the three second virtual environment parameters, that is, (70, 50, 20). The server obtains the variances between the three second virtual environment parameters and the mean (the squared two-norm of each difference divided by the vector dimension), that is, 41.6, 66.6 and 41.6, respectively. The server adds 41.6, 66.6 and 41.6 and multiplies the sum by 1/3 to obtain the mean of the variances, 49.9, which is the uncertainty parameter corresponding to the second virtual environment parameter output by the virtual planting environment model.
In this embodiment, the uncertainty parameter is represented by the mean of the variances: each variance represents the uncertainty level of an individual submodel, and the mean represents their average level.
In one possible embodiment, the server can obtain a plurality of uncertainty parameters corresponding to a plurality of planting decisions, respectively, based on a manner indicated by the following formula (6).
V(s_t, a_t) = max_i σ_i^2(s_t, a_t)    (6)

where max denotes taking the maximum value over the submodels.
For example, if there are three submodels in the virtual planting environment model, the server inputs the first virtual environment parameter and a planting decision into the three submodels respectively and obtains three second virtual environment parameters. After the server inputs the first virtual environment parameter and another planting decision into the three submodels, another three second virtual environment parameters can be obtained. For ease of understanding, the following description takes the case in which the server inputs the first virtual environment parameter and one planting decision into the three submodels as an example. The three submodels output three second virtual environment parameters, such as (80, 55, 20), (60, 50, 30) and (70, 45, 10). The server can obtain the mean of the three second virtual environment parameters, that is, (70, 50, 20). The server obtains the variances between the three second virtual environment parameters and the mean, that is, 41.6, 66.6 and 41.6, respectively. The server determines the largest of these variances, 66.6, as the uncertainty parameter of the second virtual environment parameter output by the virtual planting environment model.
In this embodiment, the maximum value of the variance is used to represent the uncertainty parameter, which can represent the uncertainty level of a single sub-model, and the maximum value represents the worst case.
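For reference, the three uncertainty parameters of formulas (4) to (6) can be computed as sketched below; with the example predictions (80, 55, 20), (60, 50, 30) and (70, 45, 10), this reproduces 225, about 50 and about 66.7, respectively. The function and mode names are illustrative assumptions.

```python
# A minimal sketch of formulas (4)-(6): given the K second virtual environment
# parameters predicted by the submodels, compute the variance about their mean,
# the mean of the per-submodel variances, or the maximum per-submodel variance.
import numpy as np

def uncertainty(preds, mode="eq4"):
    preds = np.asarray(preds, dtype=float)            # shape (K, N)
    k, n = preds.shape
    mean = preds.mean(axis=0)                          # ensemble mean
    sq_norms = np.sum((preds - mean) ** 2, axis=1)     # ||u_t^i - mean||_2^2 per submodel
    if mode == "eq4":                                  # formula (4)
        return float(sq_norms.sum() / (k - 1))
    per_model_var = sq_norms / n                       # per-submodel variance
    if mode == "eq5":                                  # formula (5): mean of the variances
        return float(per_model_var.mean())
    return float(per_model_var.max())                  # formula (6): worst case

preds = [(80, 55, 20), (60, 50, 30), (70, 45, 10)]
print(uncertainty(preds, "eq4"), uncertainty(preds, "eq5"), uncertainty(preds, "eq6"))
# 225.0 50.0 66.66...
```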
In a possible implementation, if the plurality of submodels are BNN models, each submodel can directly output the mean and variance of the Gaussian distribution obeyed by the second virtual environment parameter, and the server can obtain the uncertainty parameter of the second virtual environment parameter output by the virtual planting environment model based on formulas (4) to (6). The implementation process belongs to the same inventive concept as the above description and is not repeated herein.
In the second part, the server inputs the plurality of first planting decisions and the plurality of second virtual environment parameters into a planting decision evaluation model, and the planting decision evaluation model outputs a plurality of evaluation values of the plurality of first planting decisions.
The planting decision evaluation model has the capability of outputting an evaluation value based on a planting decision and the corresponding second virtual environment parameter, and is a model constructed based on formula (1). In some embodiments, if an actor-critic (Actor-Critic) architecture is used to construct the reinforcement learning model, the planting decision evaluation model is the Critic model and the planting decision determination model is the Actor model.
When the virtual planting environment is in the environment state indicated by the first virtual environment parameter, the planting decision determination model can make different planting decisions, and different planting decisions bring different benefits, where a benefit refers to the "knowledge" that the planting decision determination model can obtain during training. The planting decision evaluation model can evaluate the benefits of the plurality of planting decisions based on formula (1), that is, output a plurality of evaluation values.
In the third part, the server determines the target planting decision based on the plurality of uncertainty parameters and the plurality of evaluation values.

In one possible implementation, the server fuses the plurality of uncertainty parameters and the corresponding plurality of evaluation values to obtain a plurality of fusion evaluation values. The server determines the first planting decision corresponding to the highest fusion evaluation value among the plurality of fusion evaluation values as the target planting decision. In some embodiments, the scheme by which the server determines the target planting decision may also be referred to as data screening.
For example, the server can determine a target planting decision based on a plurality of uncertainty parameters and a plurality of evaluation values based on a manner indicated by equation (7) below.
a_t = argmax_a { Q_θ(s_t, a) + λ V(s_t, a) }    (7)

where a_t is the target planting decision, argmax denotes taking the value of a that maximizes the function, a ranges over the plurality of first planting decisions, s_t is the first virtual environment parameter, Q(·) is the evaluation value of a first planting decision, V(·) is the uncertainty parameter, λ is a constant, and {Q(·) + λV(·)} is the fusion evaluation value.
In this implementation, the uncertainty parameter represents the reliability of the corresponding second virtual environment parameter, that is, the "risk", and the evaluation value represents the degree to which the corresponding planting decision influences training of the planting decision determination model, that is, the "benefit". By combining risk and benefit, the server can select the target planting decision with the maximum benefit under a given level of risk, which improves the training effect of the planting decision determination model.
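A minimal sketch of the data screening in formula (7) is given below; `q_value` stands in for the planting decision evaluation model (the Critic) and `v_value` for the uncertainty parameter, both assumed callables introduced only for illustration.

```python
# A minimal sketch of formula (7): among the candidate first planting decisions,
# select the one that maximizes the fusion evaluation value Q(s_t, a) + lambda * V(s_t, a).
import numpy as np

def select_target_decision(s_t, candidates, q_value, v_value, lam=0.1):
    scores = [q_value(s_t, a) + lam * v_value(s_t, a) for a in candidates]  # fusion values
    return candidates[int(np.argmax(scores))], max(scores)
```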
404. The server adjusts the model parameters of the planting decision determination model based on the evaluation value and the uncertainty parameter of the target planting decision.
In one possible implementation, the server obtains a penalty weight corresponding to the uncertainty parameter of the target planting decision, where the penalty weight is inversely related to the uncertainty parameter of the target planting decision. The server obtains the loss value corresponding to the planting decision determination model based on the evaluation value of the target planting decision, and adjusts the model parameters of the planting decision determination model based on the product of the penalty weight and the loss value. In some embodiments, the process by which the server generates the penalty weight may also be referred to as data generation.
For example, after the server determines the target planting decision, the server obtains a new data pair (s_t, a_t, r_t, s_{t+1}), where s_t is the first virtual environment parameter, a_t is the target planting decision, r_t is the evaluation value of the target planting decision, and s_{t+1} is the second virtual environment parameter corresponding to the target planting decision predicted by the virtual planting environment model. The server can add the data pair (s_t, a_t, r_t, s_{t+1}) to the sample pool used for training the planting decision determination model. To distinguish it from data pairs obtained from the real planting environment, the server attaches to the data pair (s_t, a_t, r_t, s_{t+1}) the uncertainty parameter associated with the second virtual environment parameter. When the server updates the model parameters of the planting decision determination model based on r_t in the data pair (s_t, a_t, r_t, s_{t+1}), the server can generate a penalty weight based on the uncertainty parameter corresponding to the data pair. The penalty weight decreases as the uncertainty parameter increases, that is, the larger the uncertainty parameter, the smaller the penalty weight, and the smaller the influence of r_t when updating the model parameters of the planting decision determination model. In some embodiments, the value range of the penalty weight w_i is (0, 1). The larger the value of w_i, the smaller the uncertainty parameter of the data pair, the higher the reliability of the data pair, and the larger the update magnitude when the model parameters of the planting decision determination model are updated based on the evaluation value in the data pair. The smaller the value of w_i, the larger the uncertainty parameter of the data pair, the lower the reliability of the data pair, and the smaller the update magnitude when the model parameters are updated based on the evaluation value in the data pair. In addition, when w_i is 0, the uncertainty of the data pair is infinite, the penalty weight is 0, and the data pair does not affect the update of the planting decision determination model; when w_i is 1, the reliability of the data pair is the highest, equivalent to data sampled from the real environment, and it acts on the update of the planting decision determination model in the same way as real data.
In the embodiments of the present application, two methods for determining penalty weights are provided, which are explained below.
Mode 1, the server determines penalty weights based on the mode indicated by equation (8).
w_i = σ(−V(s_t, a_t) × T) + 0.5    (8)

where σ(·) is the Sigmoid function, σ(x) = 1 / (1 + e^(−x)), V(·) is the uncertainty parameter, and T is a temperature coefficient used to control the penalty strength.
When the penalty weight is determined in mode 1, the data in the sample pool can be utilized more fully when training the planting decision determination model: even if the uncertainty of a data pair in the sample pool is large, the data pair can still be used to update the model parameters of the planting decision determination model.
Mode 2, the server determines penalty weights based on the mode indicated by formula (9).
w_i = σ(−V(s_t, a_t) × T) × 2    (9)
When the penalty weight is determined in mode 2, the plurality of submodels of the virtual planting environment model can be utilized more fully when training the planting decision determination model: data pairs in the sample pool with high uncertainty have almost no effect on the planting decision determination model during training, while a deterministic sample can be treated approximately as a real sample for training.
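For reference, the two penalty-weight modes of formulas (8) and (9), and their use for scaling the loss of the planting decision determination model, can be sketched as follows. The simple multiplication of loss and weight follows the description in step 404; all names are illustrative assumptions.

```python
# A minimal sketch of formulas (8) and (9) and of the penalty-weighted update in
# step 404: the penalty weight shrinks as the uncertainty parameter grows, and
# the loss derived from the evaluation value is multiplied by the weight before
# the model parameters are adjusted.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def penalty_weight(v, temperature, mode=1):
    if mode == 1:                                    # formula (8)
        return sigmoid(-v * temperature) + 0.5
    return sigmoid(-v * temperature) * 2.0           # formula (9)

def weighted_loss(loss, v, temperature, mode=1):
    # data pairs with large uncertainty contribute little to the update
    return penalty_weight(v, temperature, mode) * loss

print(penalty_weight(0.0, 1.0, mode=2))              # 1.0: treated like a real sample
print(round(penalty_weight(5.0, 1.0, mode=2), 4))    # close to 0: almost no influence
```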
It should be noted that, after the server adjusts the model parameters of the planting decision determination model, the model parameters of the planting decision evaluation model can be updated based on the penalty weight and the second virtual environment parameter. The update manner belongs to the same inventive concept as that of the planting decision determination model and is not described herein again.
All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present application, and are not described herein again.
The present application provides a training method for a planting decision determination model, which addresses the exploration-exploitation trade-off in reinforcement learning. Through data collection based on the uncertainty parameter, the sampling complexity of the data can be greatly reduced, and constructing a virtual planting environment model reduces the cost of interacting with the real environment; through data generation based on the uncertainty parameter, the quality of the generated data can be scored, and corresponding penalty weights can be designed based on the scores. The technical solution provided in the present application can be embedded into the update process of the planting decision determination model, so that model training can be carried out more safely and robustly. The main flow of the application of the method, combining the above possible implementations, is shown in fig. 5.
In the embodiment of the present application, the server introduces the virtual planting environment in the process of training the model and, based on the first virtual environment parameter of the virtual environment, expands a plurality of first planting decisions as training samples. In the process of training the planting decision determination model with the expanded training samples, the server introduces an uncertainty parameter for evaluating the reliability of the samples and trains the planting decision determination model by combining the uncertainty parameter with the expanded training samples. In this way, the server not only increases the number of training samples but also uses the uncertainty parameter to adjust the degree to which each training sample influences model training, so that the planting decision determination model obtained by training has higher accuracy. In subsequent use, the planting decision determination model can output effective planting decisions.
It should be noted that the technical solution provided in the embodiments of the present application can be applied not only to the agricultural planting scenario but also to other scenarios with high sample data collection costs, such as autonomous driving, medical treatment and stock trading scenarios, which is not limited in the embodiments of the present application.
In the autonomous driving scenario, the virtual environment is a virtual driving environment that simulates vehicle driving, and the virtual environment parameters are parameters of the virtual driving environment obtained by the virtual vehicle through virtual sensors. In a medical scenario, the virtual environment is the virtual body environment of a virtual patient, and the virtual environment parameters are parameters of the virtual body environment acquired by virtual instruments. In a stock scenario, the virtual environment is a virtual market in which virtual trades take place, and the virtual environment parameters are the trading information of the virtual market.
Fig. 6 is a flowchart of a training method of a decision-making determination model provided in an embodiment of the present application, and referring to fig. 6, the method includes:
601. the server inputs the third virtual environment parameter into the decision determining model, and a plurality of first decisions corresponding to the third virtual environment parameter are obtained through the decision determining model, wherein the third virtual environment parameter is used for representing the environment state of the virtual environment, and the first decisions are used for changing the environment state of the virtual environment.
The meaning of a decision differs in different scenarios. In the autonomous driving scenario, the decision concerns the steering of the steering wheel, the opening degree of the accelerator and the opening degree of the brake; in the stock scenario, the decision is whether to buy or sell a stock at a given time; in the medical scenario, the decision is whether the patient has a certain disease.
In a possible implementation, the server inputs the third virtual environment parameter into the decision determination model and multiplies the third virtual environment parameter by the weight matrix of the decision determination model to obtain the feature vector of the third virtual environment parameter. The server adds the feature vector to the bias matrix of the decision determination model and then performs normalization processing to obtain a second decision corresponding to the third virtual environment parameter. The server performs data enhancement on the second decision to obtain a plurality of first decisions corresponding to the third virtual environment parameter.
On the basis of the above embodiment, the method for performing data enhancement on the second decision is described below.
in a possible implementation manner, the server adds a plurality of biases to the second decisions respectively to obtain a plurality of first decisions corresponding to the third virtual environment parameters. The above embodiment and step 401 belong to the same inventive concept, and the implementation process refers to the description of step 401, which is not described herein again.
In order to more clearly describe step 601, the following description will be given taking the application scenarios of auto-driving, stock and medical care as examples.
In the autonomous driving scenario, take the decision of controlling the steering angle of the steering wheel as an example. The server inputs a third virtual environment parameter into the decision determination model, where the third virtual environment parameter includes the distance between a virtual obstacle in the virtual driving environment and the virtual vehicle, the angle between the virtual obstacle and the forward direction of the virtual vehicle, and the running speed of the virtual vehicle. In some embodiments, the third virtual environment parameter is a vector (20, 75, 50), whose numbers correspond to the distance between the virtual obstacle and the virtual vehicle, the angle between the virtual obstacle and the forward direction of the virtual vehicle, and the running speed of the virtual vehicle, respectively. The server inputs the vector (20, 75, 50) into the decision determination model and multiplies it by the weight matrix of the decision determination model to obtain the feature vector (195, 75, 220). The server adds the feature vector (195, 75, 220) to the bias matrix (10, 15, 10) of the decision determination model to obtain the vector (205, 90, 230). The server normalizes the vector (205, 90, 230) through the normalization layer of the decision determination model to obtain the probabilities (0.39, 0.17, 0.43) corresponding to 3 decisions, where 0.39 is the probability of +10, and +10 indicates that the steering wheel is rotated 10° clockwise; 0.17 is the probability of 0, and 0 indicates that the steering wheel angle is unchanged; 0.43 is the probability of -10, and -10 indicates that the steering wheel is rotated 10° counterclockwise. The server can determine the decision with the highest probability among the 3 decisions as the second decision, that is, -10. The server can perform data enhancement on the second decision, that is, add offsets on the basis of -10, to obtain a plurality of first decisions; for example, when the value range of the offset is (0, 3), the obtained plurality of first decisions may be -8, -9, -11, -12 and the like, which is not limited in the embodiments of the present application. It should be noted that, in this example, the third virtual environment parameter includes the distance between the virtual obstacle and the virtual vehicle in the virtual driving environment, the angle between the virtual obstacle and the forward direction of the virtual vehicle, and the running speed of the virtual vehicle; in other possible implementations, the third virtual environment parameter may include more or fewer parameters, which is not limited in the present application.
In the stock scenario, take the decision of buying or selling a certain stock as an example. The server inputs a third virtual environment parameter into the decision determination model, where the third virtual environment parameter includes the share price of the virtual stock, the trading volume of the virtual stock and the index of the virtual market. In some embodiments, the third virtual environment parameter is a vector (15, 750, 3000), whose numbers correspond to the share price of the virtual stock, the trading volume of the virtual stock and the index of the virtual market, respectively. The server inputs the vector (15, 750, 3000) into the decision determination model and multiplies it by the weight matrix of the decision determination model to obtain the feature vector (165, 30, 90). The server adds the feature vector (165, 30, 90) to the bias matrix (5, 10, 5) of the decision determination model to obtain the vector (170, 40, 95). The server normalizes the vector (170, 40, 95) through the normalization layer of the decision determination model to obtain the probabilities (0.56, 0.13, 0.31) corresponding to 3 decisions, where 0.56 is the probability of +1000, and +1000 indicates buying 1000 shares of the virtual stock; 0.13 is the probability of 0, and 0 indicates performing no operation; 0.31 is the probability of -1000, and -1000 indicates selling 1000 shares of the virtual stock. The server can determine the decision with the highest probability among the 3 decisions as the second decision, that is, +1000. The server can perform data enhancement on the second decision, that is, add offsets on the basis of +1000, to obtain a plurality of first decisions; for example, when the value range of the offset is (100, 300), the obtained plurality of first decisions may be +800, +900, +1100, +1200 and the like, which is not limited in the embodiments of the present application. It should be noted that, in this example, the third virtual environment parameter includes the share price of the virtual stock, the trading volume of the virtual stock and the index of the virtual market; in other possible implementations, the third virtual environment parameter may include more or fewer parameters, which is not limited in the present application.
In the medical scenario, take the decision of controlling the amount of fluid to be infused into a patient as an example. The server inputs a third virtual environment parameter into the decision determination model, where the third virtual environment parameter includes the body temperature of the virtual patient, the white blood cell count of the virtual patient and the blood oxygen concentration of the virtual patient. In some embodiments, the third virtual environment parameter is a vector (37, 6000, 95), whose numbers correspond to the body temperature of the virtual patient, the white blood cell count of the virtual patient and the blood oxygen concentration of the virtual patient, respectively. The server inputs the vector (37, 6000, 95) into the decision determination model and multiplies it by the weight matrix of the decision determination model to obtain the feature vector (157, 95, 97). The server adds the feature vector (157, 95, 97) to the bias matrix (10, 5, 5) of the decision determination model to obtain the vector (167, 100, 102). The server normalizes the vector (167, 100, 102) through the normalization layer of the decision determination model to obtain the probabilities (0.45, 0.27, 0.28) corresponding to 3 decisions, where 0.45 is the probability of +1, and +1 indicates that the amount of fluid infused into the virtual patient is increased by 10 milliliters; 0.27 is the probability of 0, and 0 indicates that the amount of fluid infused into the virtual patient is unchanged; 0.28 is the probability of -1, and -1 indicates that the amount of fluid infused into the virtual patient is decreased by 10 milliliters. The server can determine the decision with the highest probability among the 3 decisions as the second decision, that is, +1. The server can perform data enhancement on the second decision, that is, add offsets on the basis of +1, to obtain a plurality of first decisions; for example, when the value range of the offset is (0, 0.1), the obtained plurality of first decisions may be +0.92, +0.95, +1.05, +1.06 and the like, which is not limited in the embodiments of the present application.
602. The server obtains a plurality of fourth virtual environment parameters respectively corresponding to the plurality of first decisions based on the third virtual environment parameters and the corresponding plurality of first decisions, wherein the fourth virtual environment parameters are predicted virtual environment parameters after the corresponding decisions are executed in the virtual environment.
In one possible embodiment, the server inputs the third virtual environment parameter and the corresponding first decisions into the virtual environment model, and outputs a plurality of fourth virtual environment parameters corresponding to the first decisions, respectively, from the virtual environment model, and the virtual environment model is used for simulating the virtual environment. The implementation process of this embodiment is referred to the above step 402, and is not described herein again.
For example, the virtual environment model includes a plurality of submodels, the submodels are obtained by training different data subsets based on the same sample data set, the server inputs the third virtual environment parameter and the corresponding first decisions into the submodels, and a plurality of fourth virtual environment parameters corresponding to the first decisions are obtained through the submodels.
The above embodiments are further described below with reference to different application scenarios.
In the autonomous driving scenario, take the decision of controlling the steering angle of the steering wheel as an example. The server inputs a third virtual environment parameter and the corresponding first decision into the plurality of submodels, where the third virtual environment parameter includes the distance between a virtual obstacle in the virtual driving environment and the virtual vehicle, the angle between the virtual obstacle and the forward direction of the virtual vehicle, and the running speed of the virtual vehicle. Taking one first decision and two submodels as an example, the third virtual environment parameter is a vector (20, 75, 50), whose numbers correspond to the distance between the virtual obstacle and the virtual vehicle, the angle between the virtual obstacle and the forward direction of the virtual vehicle, and the running speed of the virtual vehicle, respectively, and the first decision is the number -10, indicating that the steering wheel is rotated 10° counterclockwise. The server splices the vector (20, 75, 50) and the number 10 to obtain the vector (20, 75, 50, 10). The server inputs the vector (20, 75, 50, 10) into the two submodels respectively and multiplies it by the weight matrices of the two submodels to obtain the feature vectors (70, 60, 75) and (60, 125, 145). The server adds the feature vectors (70, 60, 75) and (60, 125, 145) to the bias matrices (20, -10, -25) and (-20, -20, 30) of the two submodels respectively to obtain the vectors (90, 50, 50) and (40, 105, 175). The vectors (90, 50, 50) and (40, 105, 175) are the fourth virtual environment parameters output by the two submodels respectively, and the numbers in a fourth virtual environment parameter have the same meaning as those in the third virtual environment parameter. The virtual environment model can use the average of the two fourth virtual environment parameters output by the two submodels as the fourth virtual environment parameter output by the virtual environment model.
In the stock scenario, take the decision of buying or selling a certain stock as an example. The server inputs a third virtual environment parameter and the corresponding first decision into the plurality of submodels, where the third virtual environment parameter includes the share price of the virtual stock, the trading volume of the virtual stock and the index of the virtual market. Taking one first decision and two submodels as an example, the third virtual environment parameter is a vector (15, 750, 3000), whose numbers correspond to the share price of the virtual stock, the trading volume of the virtual stock and the index of the virtual market, respectively, and the first decision is the number +1000, indicating that 1000 shares of the virtual stock are bought. The server splices the vector (15, 750, 3000) and the number 10 to obtain the vector (15, 750, 3000, 10). The server inputs the vector (15, 750, 3000, 10) into the two submodels respectively and multiplies it by the weight matrices of the two submodels to obtain the feature vectors (130, 10, 22.5) and (10, 105, 25). The server adds the feature vectors (130, 10, 22.5) and (10, 105, 25) to the bias matrices (20, 10, 10) and (10, 20, 10) of the two submodels respectively to obtain the vectors (150, 20, 32.5) and (20, 125, 30). The vectors (150, 20, 32.5) and (20, 125, 30) are the fourth virtual environment parameters output by the two submodels respectively, and the numbers in a fourth virtual environment parameter have the same meaning as those in the third virtual environment parameter. The virtual environment model can use the average of the two fourth virtual environment parameters output by the two submodels as the fourth virtual environment parameter output by the virtual environment model.
In the medical scenario, take as an example a decision that controls the amount of fluid to be infused into a patient. The server inputs a third virtual environment parameter into the decision determination model, where the third virtual environment parameter includes the body temperature of a virtual patient, the white blood cell count of the virtual patient, and the blood oxygen concentration of the virtual patient. Taking the number of first decisions as one and the number of submodels as two as an example, the third virtual environment parameter is the vector (37, 6000, 95), whose numbers correspond to the body temperature of the virtual patient, the white blood cell count of the virtual patient, and the blood oxygen concentration of the virtual patient, respectively, and the first decision is the number "+1", which indicates a 10 ml increase in the amount of fluid infused into the virtual patient. The server splices the vector (37, 6000, 95) and the number 1 to obtain the vector (37, 6000, 95, 1). The server inputs the vector (37, 6000, 95, 1) into the two submodels respectively and multiplies it by the weight matrices of the two submodels (the matrix values are given only as images in the original publication and are not reproduced here) to obtain the feature vector (95, 133, 60) and the feature vector (96, 155, 192). The server adds the feature vector (95, 133, 60) and the feature vector (96, 155, 192) to the bias matrices (-20, 20, -25) and (-30, -50, 30) of the two submodels respectively to obtain the vector (75, 153, 35) and the vector (66, 105, 222). The vector (75, 153, 35) and the vector (66, 105, 222) are the fourth virtual environment parameters output by the two submodels respectively, and the numbers in the fourth virtual environment parameters have the same meaning as those in the third virtual environment parameter. The virtual environment can use the average value of the two virtual environment parameters output by the two submodels as the fourth virtual environment parameter output by the virtual environment.
603. The server determines a target decision based on the plurality of fourth virtual environment parameters, wherein the evaluation value and the corresponding uncertainty parameter of the target decision accord with the target condition, the uncertainty parameter is used for representing the credibility of the corresponding fourth virtual environment parameter, and the evaluation value is used for representing the degree of influence of the corresponding decision on the training of the decision determination model.
In a possible implementation manner, the server obtains a plurality of uncertainty parameters respectively corresponding to the plurality of first decisions based on the plurality of fourth virtual environment parameters. The server obtains a plurality of evaluation values of the plurality of first decisions based on the plurality of fourth virtual environment parameters and the plurality of first decisions. The server determines the target decision based on the plurality of uncertainty parameters and the plurality of evaluation values.
The above-described embodiment will be described in three parts, the first part describing a method for acquiring uncertainty parameters by a server, the second part describing a method for acquiring evaluation values by a server, and the third part describing a method for determining a target decision by a server.
In the first part, the server obtains the average value of the plurality of fourth virtual environment parameters. The server then obtains the plurality of uncertainty parameters respectively corresponding to the plurality of first decisions based on the variance between the plurality of fourth virtual environment parameters and the average value, and on the number of the fourth virtual environment parameters.
In the second part, the server inputs the plurality of first decisions and the plurality of fourth virtual environment parameters into a decision evaluation model, and the decision evaluation model outputs the plurality of evaluation values of the plurality of first decisions.
In the third part, the server fuses the plurality of uncertainty parameters and the corresponding plurality of evaluation values respectively to obtain a plurality of fusion evaluation values, and determines the first decision corresponding to the highest fusion evaluation value among the plurality of fusion evaluation values as the target decision.
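For illustration only, the three parts can be summarized by the following Python sketch. The sketch is not part of the original disclosure: the evaluate function stands in for the decision evaluation model, and the division-based fusion rule is an assumed way of making the fusion evaluation value decrease as the uncertainty parameter grows.

import numpy as np

def uncertainty_parameter(submodel_outputs):
    # submodel_outputs: array of shape (num_submodels, state_dim), the fourth
    # virtual environment parameters predicted by the submodels for one first decision.
    mean = submodel_outputs.mean(axis=0)
    # Spread of the submodel outputs around their mean, divided by the number of
    # outputs: large disagreement means low credibility.
    return float(np.sum((submodel_outputs - mean) ** 2) / len(submodel_outputs))

def select_target_decision(first_decisions, outputs_per_decision, evaluate):
    # evaluate(decision, predicted_state) stands in for the decision evaluation model.
    fused_values = []
    for decision, outputs in zip(first_decisions, outputs_per_decision):
        u = uncertainty_parameter(outputs)
        value = evaluate(decision, outputs.mean(axis=0))
        # Hypothetical fusion rule: discount the evaluation value by the uncertainty.
        fused_values.append(value / (1.0 + u))
    # The first decision with the highest fusion evaluation value is the target decision.
    return first_decisions[int(np.argmax(fused_values))]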
It should be noted that the processing procedure of the three parts and the processing procedure in step 403 belong to the same inventive concept, and are not described herein again.
604. The server adjusts the model parameters of the decision determination model based on the evaluation value and the uncertainty parameters of the target decision.
In one possible embodiment, the server obtains a penalty weight corresponding to the uncertainty parameter of the target decision, the penalty weight being inversely proportional to the uncertainty parameter of the target decision. The server obtains a loss value corresponding to the decision determination model based on the evaluation value of the target decision. The server adjusts the model parameters of the decision determination model based on the product of the penalty weight and the loss value.
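For illustration only, a minimal sketch of this weighting is given below, under the assumption that the penalty weight is the reciprocal of the uncertainty parameter; the embodiment only requires that the two be inversely proportional.

def weighted_loss(loss_value, uncertainty_parameter, eps=1e-8):
    # Penalty weight inversely proportional to the uncertainty of the target decision:
    # samples whose virtual-environment prediction is less credible contribute less.
    penalty_weight = 1.0 / (uncertainty_parameter + eps)
    # The model parameters are then adjusted (for example by gradient descent)
    # on the product of the penalty weight and the loss value.
    return penalty_weight * loss_value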
It should be noted that step 604 and step 404 belong to the same inventive concept, and are not described herein again.
In the embodiment of the application, the server introduces a virtual environment in the process of training the model, and expands a plurality of first decisions as training samples based on the third virtual environment parameter of the virtual environment. In the process of training the decision determination model by using the expanded training samples, the server introduces uncertainty parameters for evaluating sample reliability, and trains the decision determination model by combining the uncertainty parameters with the expanded training samples. In this case, the server not only increases the number of training samples, but also adjusts the degree of influence of the training samples on model training according to the uncertainty parameters, so that the decision determination model obtained by training is more accurate. In the subsequent use process, the decision determination model can output effective decisions.
Fig. 7 is a schematic structural diagram of a training apparatus for a planting decision determination model according to an embodiment of the present application, and referring to fig. 7, the apparatus includes: a first input module 701, a second virtual environment parameter obtaining module 702, a target planting decision determining module 703 and a first model training module 704.
The first input module 701 is configured to input the first virtual environment parameter into the planting decision determining model, and obtain a plurality of first planting decisions corresponding to the first virtual environment parameter through the planting decision determining model, where the first virtual environment parameter is used to represent an environment state of the virtual planting environment, and the first planting decisions are used to change the environment state of the virtual planting environment.
The second virtual environment parameter obtaining module 702 is configured to obtain, based on the first virtual environment parameter and the corresponding multiple first planting decisions, multiple second virtual environment parameters corresponding to the multiple first planting decisions, respectively, where the second virtual environment parameter is a predicted virtual environment parameter after the corresponding planting decision is executed in the virtual planting environment.
The target planting decision determining module 703 is configured to determine a target planting decision based on the plurality of second virtual environment parameters, where an evaluation value and a corresponding uncertainty parameter of the target planting decision conform to a target condition, the uncertainty parameter is used to indicate a reliability of the corresponding second virtual environment parameter, and the evaluation value is used to indicate an influence degree of the corresponding planting decision on the training of the planting decision determining model.
The first model training module 704 is configured to adjust model parameters of the planting decision determination model based on the evaluation value and the uncertainty parameter of the target planting decision.
In a possible implementation manner, the first input module is configured to input the first virtual environment parameter into the planting decision determining model, and multiply the first virtual environment parameter by a weight matrix of the planting decision determining model to obtain a feature vector of the first virtual environment parameter. And adding the characteristic vector and the bias matrix of the planting decision determining model, and then carrying out normalization processing to obtain a second planting decision corresponding to the first virtual environment parameter. And performing data enhancement on the second planting decision to obtain a plurality of first planting decisions corresponding to the first virtual environment parameters.
In a possible implementation manner, the first input module is configured to add a plurality of offsets to the second planting decisions, respectively, to obtain a plurality of first planting decisions corresponding to the first virtual environment parameters.
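For illustration only, the forward pass and the offset-based data enhancement described above might look as follows in Python; the softmax normalization and the offset values are assumptions and are not taken from the disclosure.

import numpy as np

def first_planting_decisions(env_param, W, b, offsets):
    # env_param, W, b and offsets are numpy arrays.
    # Multiply the first virtual environment parameter by the weight matrix of the
    # planting decision determination model to obtain its feature vector.
    feature = W @ env_param
    # Add the bias and normalize (softmax is an assumed choice of normalization)
    # to obtain the second planting decision.
    logits = feature + b
    second_decision = np.exp(logits - logits.max())
    second_decision /= second_decision.sum()
    # Data enhancement: add several offsets to the second planting decision to
    # obtain a plurality of first planting decisions.
    return [second_decision + offset for offset in offsets]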
In a possible implementation manner, the target planting decision determining module is configured to obtain a plurality of uncertainty parameters corresponding to the plurality of first planting decisions, respectively, based on the plurality of second virtual environment parameters. And obtaining a plurality of evaluation values of the plurality of first planting decisions based on the second virtual environment parameter and the plurality of first planting decisions. And determining a target planting decision based on the plurality of uncertainty parameters and the plurality of evaluation values.
In a possible implementation manner, the target planting decision determining module is configured to obtain an average value of a plurality of second virtual environment parameters. Obtaining a plurality of uncertainty parameters respectively corresponding to the plurality of first planting decisions based on the variance between the plurality of second virtual environment parameters and the mean value and the number of the second virtual environment parameters.
In a possible implementation manner, the target planting decision determining module is configured to input the plurality of planting decisions and the plurality of second virtual environment parameters into the planting decision evaluation model, and output a plurality of evaluation values of the plurality of first planting decisions by the planting decision evaluation model.
In a possible implementation manner, the target planting decision determining module is configured to fuse the plurality of uncertainty parameters and the plurality of corresponding evaluation values, respectively, to obtain a plurality of fused evaluation values. And determining the first planting decision corresponding to the highest fusion evaluation value in the plurality of fusion evaluation values as a target planting decision.
In a possible implementation manner, the second virtual environment parameter obtaining module is configured to input the first virtual environment parameter and the corresponding plurality of first planting decisions into the virtual planting environment model, output, by the virtual planting environment model, a plurality of second virtual environment parameters respectively corresponding to the plurality of first planting decisions, and use the virtual planting environment model to simulate the virtual planting environment.
In a possible implementation manner, the virtual planting environment model includes a plurality of submodels, the plurality of submodels are obtained by training different data subsets based on the same sample data set, and the second virtual environment parameter obtaining module is configured to input the first virtual environment parameters and the corresponding plurality of first planting decisions into the plurality of submodels, and obtain a plurality of second virtual environment parameters corresponding to the plurality of first planting decisions through the plurality of submodels.
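For illustration only, one way of obtaining such submodels is sketched below. The random resampling is an assumption, since the disclosure only states that the submodels are trained on different data subsets of the same sample data set, and the fit function stands in for whatever routine trains a single submodel.

import numpy as np

def train_submodels(inputs, targets, num_submodels, fit, seed=0):
    # inputs: (N, d_in) spliced virtual environment parameters and planting decisions;
    # targets: (N, d_out) observed next virtual environment parameters; both numpy arrays.
    rng = np.random.default_rng(seed)
    submodels = []
    for _ in range(num_submodels):
        # Draw a different subset of the same sample data set for each submodel.
        idx = rng.choice(len(inputs), size=max(1, len(inputs) // 2), replace=False)
        submodels.append(fit(inputs[idx], targets[idx]))
    return submodels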
In a possible implementation manner, the first model training module is configured to obtain a penalty weight corresponding to an uncertainty parameter of the target planting decision, where the penalty weight is inversely proportional to the uncertainty parameter of the target planting decision. And obtaining a loss value corresponding to the planting decision determination model based on the evaluation value of the target planting decision. And adjusting the model parameters of the planting decision determining model based on the product of the penalty weight and the loss value.
In the embodiment of the application, the server introduces the virtual planting environment in the process of training the model, and expands a plurality of first planting decisions as training samples based on the first virtual environment parameter of the virtual planting environment. In the process of training the planting decision determination model by using the expanded training samples, the server introduces an uncertainty parameter for evaluating sample reliability, and trains the planting decision determination model by combining the uncertainty parameter with the expanded training samples. In this case, the server not only increases the number of training samples, but also adjusts the degree of influence of the training samples on model training according to the uncertainty parameters, so that the planting decision determination model obtained by training is more accurate. In the subsequent use process, the planting decision determination model can output an effective planting decision.
Fig. 8 is a schematic structural diagram of a training apparatus for a decision determination model according to an embodiment of the present application, and referring to fig. 8, the apparatus includes: a second input module 801, a fourth environment parameter obtaining module 802, a target decision determining module 803, and a second model training module 804.
A second input module 801, configured to input the third virtual environment parameter into the decision determining model, and obtain, through the decision determining model, a plurality of first decisions corresponding to the third virtual environment parameter, where the third virtual environment parameter is used to represent an environment state of the virtual environment, and the first decisions are used to change the environment state of the virtual environment.
A fourth environment parameter obtaining module 802, configured to obtain, based on the third virtual environment parameter and the corresponding multiple first decisions, multiple fourth virtual environment parameters corresponding to the multiple first decisions, where the fourth virtual environment parameters are predicted virtual environment parameters after corresponding decisions are executed in the virtual environment.
A target decision determining module 803, configured to determine a target decision based on the plurality of fourth virtual environment parameters, where an evaluation value and corresponding uncertainty parameters of the target decision meet a target condition, the uncertainty parameters are used to indicate the credibility of the corresponding fourth virtual environment parameters, and the evaluation value is used to indicate the degree of influence of the corresponding decision on the training of the decision determination model.
The second model training module 804 is configured to adjust model parameters of the decision determination model based on the evaluation value and the uncertainty parameter of the target decision.
In a possible implementation manner, the second input module is configured to input the third virtual environment parameter into the decision determination model, and multiply the third virtual environment parameter by the weight matrix of the decision determination model to obtain the feature vector of the third virtual environment parameter. And adding the feature vector and the bias matrix of the decision determination model, and then performing normalization processing to obtain a second decision corresponding to the third virtual environment parameter. And performing data enhancement on the second decision to obtain a plurality of first decisions corresponding to the third virtual environment parameter.
In a possible implementation manner, the second input module is configured to add a plurality of offsets to the second decisions, respectively, to obtain a plurality of first decisions corresponding to the third virtual environment parameters.
In a possible implementation manner, the target decision determining module is configured to obtain a plurality of uncertainty parameters respectively corresponding to the plurality of first decisions based on the plurality of fourth virtual environment parameters. And obtaining a plurality of evaluation values of the plurality of first decisions based on the plurality of fourth virtual environment parameters and the plurality of first decisions. And determining the target decision based on the plurality of uncertainty parameters and the plurality of evaluation values.
In a possible implementation manner, the target decision determining module is configured to obtain an average value of the plurality of fourth virtual environment parameters. Obtaining a plurality of uncertainty parameters respectively corresponding to the plurality of first decisions based on the variance between the plurality of fourth virtual environment parameters and the average value and the number of the fourth virtual environment parameters.
In a possible implementation manner, the target decision determining module is configured to input the plurality of first decisions and the plurality of fourth virtual environment parameters into the decision evaluation model, and output, by the decision evaluation model, a plurality of evaluation values of the plurality of first decisions.
In a possible implementation manner, the target decision determining module is configured to fuse the plurality of uncertainty parameters and the plurality of corresponding evaluation values, respectively, to obtain a plurality of fusion evaluation values. And determining the first decision corresponding to the highest fusion evaluation value in the plurality of fusion evaluation values as the target decision.
In a possible implementation manner, the fourth environment parameter obtaining module is configured to input the third virtual environment parameter and the corresponding multiple first decisions into the virtual environment model, and output, by the virtual environment model, multiple fourth virtual environment parameters corresponding to the multiple first decisions, where the virtual environment model is used to simulate a virtual environment.
In a possible implementation manner, the virtual environment model includes a plurality of submodels, the plurality of submodels are obtained by training different data subsets based on the same sample data set, and the fourth environment parameter obtaining module is configured to input the third virtual environment parameter and the corresponding plurality of first decisions into the plurality of submodels, and obtain a plurality of fourth virtual environment parameters corresponding to the plurality of first decisions through the plurality of submodels.
In a possible implementation manner, the second model training module is configured to obtain a penalty weight corresponding to the uncertainty parameter of the target decision, and the penalty weight is inversely proportional to the uncertainty parameter of the target decision. And obtaining a loss value corresponding to the decision determination model based on the evaluation value of the target decision. Model parameters of the decision determination model are adjusted based on the product of the penalty weight and the loss value.
In the embodiment of the application, the server introduces a virtual environment in the process of training the model, and expands a plurality of first decisions as training samples based on the third virtual environment parameter of the virtual environment. In the process of training the decision determination model by using the expanded training samples, the server introduces uncertainty parameters for evaluating sample reliability, and trains the decision determination model by combining the uncertainty parameters with the expanded training samples. In this case, the server not only increases the number of training samples, but also adjusts the degree of influence of the training samples on model training according to the uncertainty parameters, so that the decision determination model obtained by training is more accurate. In the subsequent use process, the decision determination model can output effective decisions.
An embodiment of the present application provides a computer device, configured to perform the foregoing method, where the computer device may be implemented as a terminal or a server, and a structure of the terminal is introduced below:
fig. 9 is a schematic structural diagram of a terminal according to an embodiment of the present application. The terminal 900 may be: a smartphone, a tablet, a laptop, or a desktop computer. Terminal 900 may also be referred to by other names such as user equipment, portable terminals, laptop terminals, desktop terminals, and the like.
In general, terminal 900 includes: one or more processors 901 and one or more memories 902.
Processor 901 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so forth. The processor 901 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 901 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 901 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, the processor 901 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 902 may include one or more computer-readable storage media, which may be non-transitory. The memory 902 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 902 is used to store at least one computer program for execution by the processor 901 to implement the training method of the planting decision determination model or the training method of the decision determination model provided by the method embodiments in the present application.
In some embodiments, terminal 900 can also optionally include: a peripheral interface 903 and at least one peripheral. The processor 901, memory 902, and peripheral interface 903 may be connected by buses or signal lines. Various peripheral devices may be connected to the peripheral interface 903 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 904, a display screen 905, a camera assembly 906, an audio circuit 907, a positioning assembly 908, and a power supply 909.
The peripheral interface 903 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 901 and the memory 902. In some embodiments, the processor 901, memory 902, and peripheral interface 903 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 901, the memory 902 and the peripheral interface 903 may be implemented on a separate chip or circuit board, which is not limited by this embodiment.
The Radio Frequency circuit 904 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 904 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 904 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 904 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth.
The display screen 905 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 905 is a touch display screen, the display screen 905 also has the ability to capture touch signals on or over the surface of the display screen 905. The touch signal may be input to the processor 901 as a control signal for processing. At this point, the display 905 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard.
The camera assembly 906 is used to capture images or video. Optionally, camera assembly 906 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal.
Audio circuit 907 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 901 for processing, or inputting the electric signals to the radio frequency circuit 904 for realizing voice communication.
The positioning component 908 is used to locate the current geographic Location of the terminal 900 for navigation or LBS (Location Based Service).
Power supply 909 is used to provide power to the various components in terminal 900. The power source 909 may be alternating current, direct current, disposable or rechargeable.
In some embodiments, terminal 900 can also include one or more sensors 910. The one or more sensors 910 include, but are not limited to: acceleration sensor 911, gyro sensor 912, pressure sensor 913, fingerprint sensor 914, optical sensor 915, and proximity sensor 916.
The acceleration sensor 911 can detect the magnitude of acceleration in three coordinate axes of the coordinate system established with the terminal 900.
The gyro sensor 912 may detect a body direction and a rotation angle of the terminal 900, and the gyro sensor 912 may cooperate with the acceleration sensor 911 to acquire a 3D motion of the user with respect to the terminal 900.
The pressure sensor 913 may be disposed on a side bezel of the terminal 900 and/or underneath the display 905. When the pressure sensor 913 is disposed on the side frame of the terminal 900, the user's holding signal of the terminal 900 may be detected, and the processor 901 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 913. When the pressure sensor 913 is disposed at a lower layer of the display screen 905, the processor 901 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 905.
The fingerprint sensor 914 is used for collecting a fingerprint of the user, and the processor 901 identifies the user according to the fingerprint collected by the fingerprint sensor 914, or the fingerprint sensor 914 identifies the user according to the collected fingerprint.
The optical sensor 915 is used to collect ambient light intensity. In one embodiment, the processor 901 may control the display brightness of the display screen 905 based on the ambient light intensity collected by the optical sensor 915.
The proximity sensor 916 is used to collect the distance between the user and the front face of the terminal 900.
Those skilled in the art will appreciate that the configuration shown in fig. 9 does not constitute a limitation of terminal 900, and may include more or fewer components than those shown, or may combine certain components, or may employ a different arrangement of components.
The computer device may also be implemented as a server, and the following describes a structure of the server:
fig. 10 is a schematic structural diagram of a server according to an embodiment of the present application, where the server 1000 may generate a relatively large difference due to different configurations or performances, and may include one or more processors (CPUs) 1001 and one or more memories 1002, where the one or more memories 1002 store at least one computer program, and the at least one computer program is loaded and executed by the one or more processors 1001 to implement the methods provided by the foregoing method embodiments. Of course, the server 1000 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface, so as to perform input and output, and the server 1000 may also include other components for implementing the functions of the device, which are not described herein again.
In an exemplary embodiment, a computer-readable storage medium, such as a memory including a computer program, is also provided, which is executable by a processor to perform the methods provided by the various method embodiments described above. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product or a computer program is also provided, which includes program code stored in a computer-readable storage medium, which is read by a processor of a computer device from the computer-readable storage medium, and which is executed by the processor to cause the computer device to execute the method provided by the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (15)

1. A method of training a planting decision determination model, the method comprising:
inputting a first virtual environment parameter into a planting decision determining model, and obtaining a plurality of first planting decisions corresponding to the first virtual environment parameter through the planting decision determining model, wherein the first virtual environment parameter is used for representing the environment state of a virtual planting environment, and the first planting decisions are used for changing the environment state of the virtual planting environment;
obtaining a plurality of second virtual environment parameters respectively corresponding to the plurality of first planting decisions based on the first virtual environment parameters and the corresponding plurality of first planting decisions, wherein the second virtual environment parameters are predicted virtual environment parameters after the corresponding planting decisions are executed in the virtual planting environment;
determining a target planting decision based on the plurality of second virtual environment parameters, wherein an evaluation value and a corresponding uncertainty parameter of the target planting decision accord with a target condition, the uncertainty parameter is used for representing the credibility of the corresponding second virtual environment parameter, and the evaluation value is used for representing the degree of influence of the corresponding planting decision on the training of the planting decision determination model;
and adjusting the model parameters of the planting decision determination model based on the evaluation value and the uncertainty parameters of the target planting decision.
2. The method of claim 1, wherein the inputting the first virtual environment parameter into the planting decision determination model and obtaining the plurality of first planting decisions corresponding to the first virtual environment parameter through the planting decision determination model comprises:
inputting the first virtual environment parameter into a planting decision determining model, and multiplying the first virtual environment parameter by a weight matrix of the planting decision determining model to obtain a feature vector of the first virtual environment parameter;
adding the characteristic vector and the bias matrix of the planting decision determining model, and then carrying out normalization processing to obtain a second planting decision corresponding to the first virtual environment parameter;
and performing data enhancement on the second planting decision to obtain a plurality of first planting decisions corresponding to the first virtual environment parameters.
3. The method of claim 2, wherein the performing data enhancement on the second planting decision to obtain the plurality of first planting decisions corresponding to the first virtual environment parameter comprises:
and respectively adding a plurality of biases to the second planting decisions to obtain a plurality of first planting decisions corresponding to the first virtual environment parameters.
4. The method of claim 1, wherein determining a target planting decision based on the plurality of second virtual environment parameters comprises:
obtaining a plurality of uncertainty parameters respectively corresponding to the plurality of first planting decisions based on the plurality of second virtual environment parameters;
obtaining a plurality of evaluation values of the plurality of first planting decisions based on the second virtual environment parameter and the plurality of first planting decisions;
determining the target planting decision based on the plurality of uncertainty parameters and the plurality of evaluation values.
5. The method of claim 4, wherein the obtaining a plurality of uncertainty parameters respectively corresponding to the plurality of first planting decisions based on the plurality of second virtual environment parameters comprises:
obtaining the average value of the plurality of second virtual environment parameters;
obtaining a plurality of uncertainty parameters respectively corresponding to the plurality of first planting decisions based on the variance between the plurality of second virtual environment parameters and the mean value and the number of the second virtual environment parameters.
6. The method of claim 4, wherein the obtaining a plurality of evaluation values of the plurality of first planting decisions based on the second virtual environment parameter and the plurality of first planting decisions comprises:
inputting the plurality of planting decisions and the plurality of second virtual environment parameters into a planting decision evaluation model, and outputting a plurality of evaluation values of the plurality of first planting decisions by the planting decision evaluation model.
7. The method of claim 4, wherein the determining the target planting decision based on the plurality of uncertainty parameters and the plurality of evaluation values comprises:
respectively fusing the plurality of uncertainty parameters and the corresponding plurality of evaluation values to obtain a plurality of fused evaluation values;
and determining a first planting decision corresponding to the highest fusion evaluation value in the plurality of fusion evaluation values as the target planting decision.
8. The method of claim 1, wherein deriving a plurality of second virtual environment parameters corresponding to the plurality of first planting decisions, respectively, based on the first virtual environment parameters and the corresponding plurality of first planting decisions comprises:
inputting the first virtual environment parameters and the corresponding first planting decisions into a virtual planting environment model, and outputting a plurality of second virtual environment parameters respectively corresponding to the first planting decisions by the virtual planting environment model, wherein the virtual planting environment model is used for simulating the virtual planting environment.
9. The method of claim 8, wherein the virtual planting environment model comprises a plurality of sub-models trained on different data subsets of the same sample data set, wherein inputting the first virtual environment parameters and the corresponding first plurality of planting decisions into the virtual planting environment model, and wherein outputting, by the virtual planting environment model, a plurality of second virtual environment parameters corresponding to the respective first plurality of planting decisions comprises:
and inputting the first virtual environment parameters and the corresponding first planting decisions into the submodels respectively, and obtaining a plurality of second virtual environment parameters corresponding to the first planting decisions respectively through the submodels.
10. The method of claim 1, wherein the adjusting model parameters of the planting decision determination model based on the evaluation value and uncertainty parameters of the target planting decision comprises:
obtaining a penalty weight corresponding to an uncertainty parameter of the target planting decision, the penalty weight being inversely proportional to the uncertainty parameter of the target planting decision;
obtaining a loss value corresponding to the planting decision determination model based on the evaluation value of the target planting decision;
adjusting model parameters of the planting decision determination model based on a product of the penalty weight and the loss value.
11. A method of training a decision determination model, the method comprising:
inputting a third virtual environment parameter into a decision determination model, and obtaining a plurality of first decisions corresponding to the third virtual environment parameter through the decision determination model, wherein the third virtual environment parameter is used for representing the environment state of a virtual environment, and the first decisions are used for changing the environment state of the virtual environment;
obtaining a plurality of fourth virtual environment parameters respectively corresponding to the plurality of first decisions based on the third virtual environment parameters and the corresponding plurality of first decisions, wherein the fourth virtual environment parameters are predicted virtual environment parameters after corresponding decisions are executed in the virtual environment;
determining a target decision based on the fourth virtual environment parameters, wherein an evaluation value and a corresponding uncertainty parameter of the target decision meet a target condition, the uncertainty parameter is used for representing the credibility of the corresponding fourth virtual environment parameter, and the evaluation value is used for representing the degree of influence of the corresponding decision on training of the decision determination model;
and adjusting the model parameters of the decision determination model based on the evaluation value and the uncertainty parameters of the target decision.
12. A training apparatus for a planting decision determination model, the apparatus comprising:
the system comprises a first input module, a second input module and a control module, wherein the first input module is used for inputting first virtual environment parameters into a planting decision determining model, and obtaining a plurality of first planting decisions corresponding to the first virtual environment parameters through the planting decision determining model, the first virtual environment parameters are used for representing the environment state of a virtual planting environment, and the first planting decisions are used for changing the environment state of the virtual planting environment;
a second virtual environment parameter obtaining module, configured to obtain, based on the first virtual environment parameter and the corresponding first planting decisions, a plurality of second virtual environment parameters corresponding to the first planting decisions, respectively, where the second virtual environment parameter is a predicted virtual environment parameter after a corresponding planting decision is executed in the virtual planting environment;
a target planting decision determining module, configured to determine a target planting decision based on the plurality of second virtual environment parameters, where an evaluation value and a corresponding uncertainty parameter of the target planting decision meet a target condition, the uncertainty parameter is used to indicate a reliability of the corresponding second virtual environment parameter, and the evaluation value is used to indicate an influence degree of the corresponding planting decision on training of the planting decision determining model;
and the first model training module is used for adjusting the model parameters of the planting decision determining model based on the evaluation value and the uncertainty parameters of the target planting decision.
13. An apparatus for training a decision determination model, the apparatus comprising:
a second input module, configured to input a third virtual environment parameter into a decision determination model, and obtain, through the decision determination model, a plurality of first decisions corresponding to the third virtual environment parameter, where the third virtual environment parameter is used to represent an environment state of a virtual environment, and the first decisions are used to change the environment state of the virtual environment;
a fourth environment parameter obtaining module, configured to obtain, based on the third virtual environment parameter and the corresponding first decisions, a plurality of fourth virtual environment parameters corresponding to the first decisions, where the fourth virtual environment parameters are predicted virtual environment parameters after corresponding decisions are executed in the virtual environment;
a target decision determining module, configured to determine a target decision based on the fourth virtual environment parameters, where an evaluation value and a corresponding uncertainty parameter of the target decision meet a target condition, the uncertainty parameter is used to indicate a reliability of the corresponding fourth virtual environment parameter, and the evaluation value is used to indicate a degree of influence of the corresponding decision on training of the decision determining model;
and the second model training module is used for adjusting the model parameters of the decision determination model based on the evaluation value and the uncertainty parameters of the target decision.
14. A computer device, characterized in that the computer device comprises one or more processors and one or more memories, in which at least one computer program is stored, which is loaded and executed by the one or more processors to implement a method of training a planting decision determination model according to any one of claims 1 to 10, or to implement a method of training a decision determination model according to claim 11.
15. A computer-readable storage medium, in which at least one computer program is stored, which is loaded and executed by a processor to implement a method of training a planting decision determination model according to any one of claims 1 to 10, or to implement a method of training a decision determination model according to claim 11.
CN202011348133.XA 2020-11-26 2020-11-26 Training method, device, equipment and storage medium for planting decision determination model Pending CN112365359A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011348133.XA CN112365359A (en) 2020-11-26 2020-11-26 Training method, device, equipment and storage medium for planting decision determination model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011348133.XA CN112365359A (en) 2020-11-26 2020-11-26 Training method, device, equipment and storage medium for planting decision determination model

Publications (1)

Publication Number Publication Date
CN112365359A true CN112365359A (en) 2021-02-12

Family

ID=74533688

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011348133.XA Pending CN112365359A (en) 2020-11-26 2020-11-26 Training method, device, equipment and storage medium for planting decision determination model

Country Status (1)

Country Link
CN (1) CN112365359A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113317058A (en) * 2021-05-11 2021-08-31 深圳市识农智能科技有限公司 Light supplementing method and device for dragon fruit growth, terminal and readable storage medium
CN113317058B (en) * 2021-05-11 2022-06-14 深圳市五谷网络科技有限公司 Light supplementing method and device for dragon fruit growth, terminal and readable storage medium
CN115311844A (en) * 2022-06-22 2022-11-08 东南大学 Highway traffic state estimation method based on self-supervision learning support vector machine
CN115311844B (en) * 2022-06-22 2023-05-16 东南大学 Expressway traffic state estimation method based on self-supervision learning support vector machine
CN115314851A (en) * 2022-07-05 2022-11-08 南京邮电大学 Agricultural information management platform based on big data platform

Similar Documents

Publication Publication Date Title
CN112365359A (en) Training method, device, equipment and storage medium for planting decision determination model
CN110249622A (en) The camera exposure control of real-time Semantic Aware
CN111282267B (en) Information processing method, information processing apparatus, information processing medium, and electronic device
US20200104670A1 (en) Electronic device and method of obtaining emotion information
CN112052948B (en) Network model compression method and device, storage medium and electronic equipment
CN115018017B (en) Multi-agent credit allocation method, system and equipment based on ensemble learning
CN112070241A (en) Plant growth prediction method, device and equipment based on machine learning model
CN113269540B (en) Expert system updating method, service processing method and device
CN111598160A (en) Training method and device of image classification model, computer equipment and storage medium
WO2023103864A1 (en) Node model updating method for resisting bias transfer in federated learning
CN113538162A (en) Planting strategy generation method and device, electronic equipment and storage medium
CN113222123A (en) Model training method, device, equipment and computer storage medium
CN112163671A (en) New energy scene generation method and system
KR102293791B1 (en) Electronic device, method, and computer readable medium for simulation of semiconductor device
CN115238909A (en) Data value evaluation method based on federal learning and related equipment thereof
CN112819152B (en) Neural network training method and device
CN113761355A (en) Information recommendation method, device, equipment and computer readable storage medium
Morcego et al. Reinforcement Learning versus Model Predictive Control on greenhouse climate control
Zhang et al. Universal value iteration networks: When spatially-invariant is not universal
CN113780394B (en) Training method, device and equipment for strong classifier model
CN115648204A (en) Training method, device, equipment and storage medium of intelligent decision model
CN116341924A (en) Multitasking intelligent decision computing method and system based on meta learning and game theory
CN113869186B (en) Model training method and device, electronic equipment and computer readable storage medium
Kalaivani et al. Evolutionary game theory to predict the population growth in Few districts of Tamil Nadu and Kerala
CN114219078A (en) Neural network model interactive training method and device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40037970

Country of ref document: HK

SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination