CN113361768A

CN113361768A - Grain depot health condition prediction method, storage device and server

Info

Publication number: CN113361768A
Application number: CN202110626531.1A
Authority: CN
Inventors: 刘池池; 孔松涛; 史勇; 谢义; 王堃; 王松; 郑袁; 彭博; 蒋思楠
Original assignee: Chongqing University of Science and Technology
Current assignee: Chongqing University of Science and Technology
Priority date: 2021-06-04
Filing date: 2021-06-04
Publication date: 2021-09-07

Abstract

The method adopts deep reinforcement learning for grain depot condition detection and health condition prediction. The prediction made for the previous period is compared with the real result subsequently acquired, a reward or penalty is set according to the interval in which the difference falls, and, in order to obtain the reward, the model corrects itself so that its predictions stay within the set reward interval, thereby achieving reliable prediction of grain depot data. The method retains the strong predictive capability of a general nonlinear model, combines it with the environment feedback capability of reinforcement learning, applies that capability to feedback on predictions for a certain future period, and so improves the reliability of grain depot condition detection and health condition prediction data over that period.

Description

Grain depot health condition prediction method, storage device and server

Technical Field

The invention relates to the field of grain inventory, and in particular to a grain depot health condition prediction method, a storage device and a server.

Background

Grain is essential to the survival of the nation's people, and storing it well helps promote social harmony and stability. However, most grain storage bins in China suffer from grain deterioration. In the prior art, the health condition of a grain depot can only be monitored dynamically and the prediction capability is insufficient: the internal state of the depot cannot be subjected to health analysis or risk alarms, so grain deterioration cannot be prevented.

At present, condition detection for a grain depot can be achieved, but the ability to predict its future health condition is limited. A general prediction model is built from past real data and then applied to data prediction. Its main drawback is that it cannot interact with the real environment, receive feedback and learn from it. If future data is misjudged for some reason, then even after the real data is obtained and the prediction is corrected, the same cause can still produce the same error in the future, because the model has no learning capability.

Disclosure of Invention

The invention aims to solve the technical problem that the health condition of a grain depot in the prior art can only be dynamically monitored but the prediction capability is insufficient.

The invention provides a method for predicting the health condition of a grain depot, comprising the following steps:

S1, collecting data from the grain depot sensors at regular intervals, and defining the parameters to be collected as features, the features comprising the atmospheric temperature of the stored grain, the temperature in the grain depot, the average temperature of the grain pile, the air humidity and the grain moisture content, and establishing a feature element subset $F = [X_1, X_2, X_3, \ldots, X_n]$, where n is the number of features included and corresponds to the dimension of the feature subset;

S2, importing a time parameter into the feature element subset to form a feature subset $F_t$ corresponding to each time instant, and forming a learning sample $X = [F_1, F_2, F_3, \ldots, F_t]^T$ from the feature subsets within a certain time period, where t denotes the time instant and T denotes the time period;

S3, labeling the established samples, wherein each feature subset corresponds to a score label y, y ∈ [0, 100], a higher score indicating a healthier grain depot, and establishing a grain depot health condition data set;

S4, establishing a hybrid deep learning prediction model based on a convolutional neural network and a recurrent neural network;

S5, dividing the data set into a training set and a test set, training the prediction model with the training set, and, in the prediction experiment, hiding the feature subsets $F_t$ of the last several moments of the test-set samples and predicting them with the model;

S6, optimizing the prediction model using a reinforcement learning algorithm.
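Steps S1 to S3 can be pictured with a minimal Python sketch. It is illustrative only: the feature names, the `LabeledSample` type and the `build_dataset` helper are assumptions introduced here, and the sensor values are made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Hypothetical names for the five features listed in S1 (n = 5).
FEATURE_NAMES = (
    "atmospheric_temp",
    "depot_temp",
    "grain_pile_avg_temp",
    "air_humidity",
    "grain_moisture",
)


@dataclass
class LabeledSample:
    time_index: int        # time instant t
    features: List[float]  # feature subset F_t = [X_1, ..., X_n]
    score: float           # health score label y in [0, 100]


def build_dataset(readings: Sequence[Dict[str, float]],
                  scores: Sequence[float]) -> List[LabeledSample]:
    """Turn periodic sensor readings and score labels into labeled samples."""
    if len(readings) != len(scores):
        raise ValueError("each reading needs exactly one score label")
    dataset = []
    for t, (reading, y) in enumerate(zip(readings, scores), start=1):
        if not 0.0 <= y <= 100.0:
            raise ValueError("score label must lie in [0, 100]")
        features = [float(reading[name]) for name in FEATURE_NAMES]
        dataset.append(LabeledSample(t, features, float(y)))
    return dataset


# Made-up readings for two time instants.
readings = [
    {"atmospheric_temp": 21.0, "depot_temp": 18.5, "grain_pile_avg_temp": 16.0,
     "air_humidity": 55.0, "grain_moisture": 13.2},
    {"atmospheric_temp": 23.0, "depot_temp": 19.0, "grain_pile_avg_temp": 16.5,
     "air_humidity": 58.0, "grain_moisture": 13.4},
]
dataset = build_dataset(readings, [92.0, 88.5])

# Learning sample X: one row per time instant, one column per feature.
X = [sample.features for sample in dataset]
```

Each row of `X` is one feature subset $F_t$, so the row order preserves the time-domain structure that the later steps rely on.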

Furthermore, the prediction model is trained using a time-stepping feature extraction method.

Furthermore, the prediction model reads the features of four moments at a time, expanding the input along the time direction, automatically copies the data down by one moment so that the feature data of the fourth moment serves as the feature, and takes the label of the next moment as the corresponding label.

Further, the reinforcement learning algorithm comprises setting a reward function. A value function capable of judging the prediction level is established, and the difference between the predicted value and the real output value is used as the standard for measuring the quality of the prediction result. A range based on this difference is set and defined as the credible range; a reward is given within the credible range and a penalty otherwise. In early training, the input and prediction period is set short so that the number of feedback events increases and model optimization is accelerated; the input update period is subsequently extended.
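The text does not say how the update period is lengthened, so the following is only one possible reading, sketched in Python. The function name `next_update_period`, the doubling rule, and the `min_streak` and `max_period` parameters are all assumptions made for illustration.

```python
from typing import Sequence


def next_update_period(current_period: int,
                       recent_in_range: Sequence[bool],
                       min_streak: int = 5,
                       max_period: int = 7) -> int:
    """Return the next input/prediction period, in days.

    The period starts short so that feedback arrives often. It is doubled
    (capped at max_period) only once the last min_streak predictions all
    fell inside the credible range; otherwise it is left unchanged.
    """
    if current_period < 1:
        raise ValueError("period must be at least one day")
    recent = list(recent_in_range)[-min_streak:]
    if len(recent) == min_streak and all(recent):
        return min(current_period * 2, max_period)
    return current_period


# Early training: a one-day period with five in-range predictions in a row.
period = next_update_period(1, [True] * 5)
# A recent miss keeps the period where it is.
period_after_miss = next_update_period(period, [True, True, False, True, True])
```

Keeping the period short until predictions are stable reflects the stated aim of increasing the number of feedback events early on; any other lengthening rule would fit the text equally well.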

The invention also provides a storage device storing a plurality of instructions, the instructions being adapted to be loaded by a processor to execute:

S6, optimizing the prediction model using a reinforcement learning algorithm.

Furthermore, the prediction model is trained using a time-stepping feature extraction method;

the prediction model reads the features of four moments at a time, expanding the input along the time direction, automatically copies the data down by one moment so that the feature data of the fourth moment serves as the feature, and takes the label of the next moment as the corresponding label;

the reinforcement learning algorithm comprises setting a reward function: a value function capable of judging the prediction level is first established; the difference between the predicted value and the real output value is used as the standard for measuring the quality of the prediction result; a range based on this difference is set and defined as the credible range; a reward is given within the credible range and a penalty otherwise; in early training, the input and prediction period is set short so that the number of feedback events increases and model optimization is accelerated, and the input update period is subsequently extended.

The invention also provides a server comprising:

a processor adapted to implement instructions; and

a storage device adapted to store a plurality of instructions, the instructions being adapted to be loaded and executed by the processor to perform:

and forming a learning sample $X = [F_1, F_2, F_3, \ldots, F_t]^T$ from the feature subsets within a certain time period, where t denotes the time instant and T denotes the time period;

S6, optimizing the prediction model using a reinforcement learning algorithm.

The technical scheme adopted by the invention requires only several sensors for collecting grain depot parameter data and a computer equipped with a deep reinforcement learning model. On the basis of known data, it analyzes and predicts other, hard-to-predict grain depot parameters from reliable environmental information such as weather, ambient temperature and storage capacity. After the actual data is fed back, it generates a reward or penalty according to the difference between the predicted value and the actual result, and optimizes the prediction network.

Drawings

FIG. 1 is a schematic diagram of the data reading method of a conventional CNN model.

FIG. 2 is a schematic diagram of a time-stepping feature extraction method.

FIG. 3 is a schematic diagram of the time-series feature extraction method according to the present invention.

FIG. 4 is a flow chart of predicting the health condition of the grain depot by deep reinforcement learning.

Detailed Description

The inventive concept is as follows. To give the prediction model the ability to interact with its environment, a deep reinforcement learning technique is adopted: the prediction made for the previous period is compared with the real result subsequently acquired, a reward or penalty is set according to the interval in which the difference falls, and, in order to obtain the reward, the model corrects itself so that its predictions stay within the set reward interval, thereby achieving reliable prediction of grain depot data. The technical scheme retains the strong predictive capability of a general nonlinear model, combines it with the environment feedback capability of reinforcement learning, and applies that capability to feedback on predictions for a certain future period, thereby making the prediction data for that period highly reliable.

The power of deep learning in information extraction comes mainly from the nonlinear transformations inside a multilayer neural network. When a convolutional neural network (CNN) processes high-dimensional information, it extracts information and reduces the dimensionality of the image through layer-by-layer computation, so that the input information is mapped into computer language. The recurrent neural network (RNN) is prominent in natural language processing and is a neural network with "memory". The decision-making process of reinforcement learning (RL) is a Markov decision process formed by the interaction between an agent and its environment. In this process, the agent, given the current state $S_t$ of the environment, takes the optimal action $a_t$ in order to maximize the reward that the environment feeds back to it. The choice is based on the value $R_t(S_t, a_t, S_{t+1})$ of the state $S_{t+1}$ reached after the action is taken, plus the value of all possible subsequent actions and resulting states multiplied by a discount factor $\gamma$. The cumulative reward $G_t$ is:

$$G_t = R_t + \gamma R_{t+1} + \gamma^2 R_{t+2} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k}$$
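For a finite sequence of rewards, the cumulative reward can be evaluated directly. The short Python sketch below assumes that `rewards[0]` is $R_t$, `rewards[1]` is $R_{t+1}$ and so on, matching the expansion $R_t + \gamma R_{t+1} + \gamma^2 R_{t+2} + \cdots$; the function name `discounted_return` is introduced here for illustration.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Compute R_t + gamma*R_{t+1} + gamma^2*R_{t+2} + ... for a finite list."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("discount factor must lie in [0, 1]")
    g = 0.0
    # Work backwards so that each step applies G = R + gamma * G_next.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


example = discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # 1 + 0.5 + 0.25 = 1.75
```

A smaller $\gamma$ makes the agent weigh near-term feedback more heavily, while a value close to 1 gives later rewards almost the same weight as immediate ones.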

The method for predicting the health condition of the grain depot by deep reinforcement learning comprises the following steps.

S1, installing intelligent sensors in the grain depot, collecting further data from simulations and online cases to enlarge the data volume, and determining the parameters to be collected as features, such as the three temperatures of the stored grain (atmospheric temperature, temperature in the warehouse and average temperature of the grain pile), the air humidity and the grain moisture content, and establishing a feature element subset $F = [X_1, X_2, X_3, \ldots, X_n]$, where n is the number of features included and corresponds to the dimension of the feature subset, and X denotes a feature element.

S2, importing a time parameter into the features to form a feature subset $F_t$ corresponding to each moment, and forming a learning sample $X = [F_1, F_2, F_3, \ldots, F_t]^T$ from the feature subsets within a certain time period, where t denotes the time instant and T denotes the time period. Such samples retain the time-domain features.

S3, labeling the established samples, wherein each feature subset corresponds to a score label y, y ∈ [0, 100], a higher score indicating a healthier grain depot, and establishing a grain depot health condition data set.

S4, because the data set has a large number of features and pronounced time-domain characteristics, establishing a hybrid deep learning prediction model based on a convolutional neural network (CNN) and a recurrent neural network (RNN). The CNN is strong at feature extraction, and the RNN has the advantage of short-term memory and is suited to extracting time-series features.

S5, dividing the data set into a training set and a test set, and training the prediction model with the training set. The prediction experiment is as follows: the feature subsets $F_t$ of the last several moments of the test-set samples are hidden and then predicted with the model.

S6, introducing a reinforcement learning algorithm to optimize the prediction model and give it the ability to interact with real conditions.
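The prediction experiment in S5 can be pictured with the following minimal Python sketch. The text does not say how the data set is divided, so the chronological split, the 80/20 ratio and the helper names `split_by_time` and `hide_last_moments` are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_by_time(samples: Sequence[T],
                  train_fraction: float = 0.8) -> Tuple[List[T], List[T]]:
    """Split chronologically ordered samples into (training set, test set)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = int(len(samples) * train_fraction)
    return list(samples[:cut]), list(samples[cut:])


def hide_last_moments(test_samples: Sequence[T],
                      hidden: int) -> Tuple[List[T], List[T]]:
    """Return (visible part, hidden part) of the test samples.

    The hidden part holds the last `hidden` moments, which the model must
    predict without seeing them.
    """
    if not 0 < hidden < len(test_samples):
        raise ValueError("hidden must leave at least one visible moment")
    return list(test_samples[:-hidden]), list(test_samples[-hidden:])


# Ten time instants stand in for ten labeled samples.
timeline = list(range(1, 11))
train, test = split_by_time(timeline, train_fraction=0.8)
visible, held_out = hide_last_moments(test, hidden=1)
```

Splitting in time order keeps the held-out moments strictly later than everything the model has seen, which matches the forward-looking nature of the experiment.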

The prediction model is trained with a distinctive time-stepping feature extraction method. In the conventional feature extraction mode, the feature subset and label of a single moment are read in from the data set, and features are extracted and compressed through the CNN to achieve classification or prediction. This reading mode is shown in FIG. 1.

This training mode extracts few time-series features at each step: it splits the samples of each moment into independent samples and ignores how they develop over time. Many parameters of a grain depot, and its health condition, develop continuously in time, so the reading mode of the prediction model must be designed to learn the temporal features fully.

The technical scheme adopted by the invention can improve the prediction reliability. The feature extraction method is shown in FIG. 2.

The reading mode designed for the prediction model does not read a single moment at a time. Instead, it extracts the features of four moments at once, expanding the input along the time direction, and automatically copies the data down by one moment: the feature data of the fourth moment is selected as the feature, but the label of the next moment is taken as the corresponding label, which embodies prediction of the following moment. The principle is shown in FIG. 3.

In summary, to predict the health condition of the grain depot at the next moment, the prediction model does not use the traditional approach of first predicting all feature data at the next moment and then deriving the health score. Instead, it directly takes the features of the fourth moment as the features of the fifth moment, and uses the score of the fifth moment, which is the one to be predicted, as the label.
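One way to read this windowing scheme is sketched below in Python. It is an interpretation rather than the exact implementation: the helper name `make_windows` and the toy data are assumptions, and each training pair is taken to be a block of four consecutive feature subsets together with the health score of the fifth moment.

```python
from typing import List, Sequence, Tuple

WINDOW = 4  # number of consecutive moments read at once


def make_windows(features: Sequence[Sequence[float]],
                 scores: Sequence[float],
                 window: int = WINDOW) -> List[Tuple[List[List[float]], float]]:
    """Pair each block of `window` moments with the score of the next moment."""
    if len(features) != len(scores):
        raise ValueError("features and scores must cover the same moments")
    pairs = []
    # Step forward one moment at a time; stop while a "next moment" still exists.
    for start in range(len(features) - window):
        block = [list(f) for f in features[start:start + window]]
        label = scores[start + window]  # score of the moment after the block
        pairs.append((block, label))
    return pairs


# Six moments with two toy features each.
features = [[float(t), float(t) * 2.0] for t in range(1, 7)]
scores = [90.0, 89.0, 88.0, 87.0, 86.0, 85.0]
pairs = make_windows(features, scores)
```

With six moments and a window of four, the sketch yields two pairs: moments 1 to 4 labeled with the score of moment 5, and moments 2 to 5 labeled with the score of moment 6.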

The input of the prediction model is expanded from a single moment to multiple moments in order to remedy the defect that single-moment samples ignore the temporal order of the samples. The feature values of the fourth moment are assigned to the fifth moment to be predicted, and the prediction is made through the weights inside the neural network. The prediction level is judged by the resulting grain depot health score rather than by comparing the prediction of each feature with its true value. Computing only the difference in health scores, rather than the difference for every feature, improves computational efficiency.

The reinforcement learning algorithm mainly consists of setting a reward function. First, a value function capable of judging the prediction level is established: the difference between the predicted value and the real output is used as the standard for measuring the quality of the prediction result. A range based on this difference is set and called the credible range; a reward is given within the credible range and a penalty otherwise. In early training, the input and prediction period is set short so that the number of feedback events increases and the model is optimized faster; the input update period can be extended later as required. Once the model's predictions have stayed within the credible range for a long time, the model is put into actual use. The usage flow is shown in FIG. 4.

In summary, the key to using reinforcement learning to give the prediction model autonomous feedback is that the invention designs a reward function that provides meaningful guidance for prediction. The reward function and feedback signal are designed on the following principle. Assume the grain depot characteristic parameters predicted by the model are

where p stands for "predicted", i.e. the predicted value, and the true characteristic parameters are

where r stands for "real", n is the number of features included (the dimension of the feature subset), and X denotes a feature element. The difference between the two sets of data is expressed as

and the data of each dimension is normalized,

When the data of any dimension of L lies within the confidence space, a reward is obtained; otherwise a penalty is given according to how the predicted data falls outside the confidence space. On this theoretical basis, because the prediction model outputs only the grain depot health score, the value function simplifies to the difference between the true health score $Y_n$ and the predicted health score:

When the absolute value of this difference lies within the confidence space, a reward is fed back; otherwise a penalty is given. The magnitude of the absolute value determines the size of the reward or penalty, and the difference also shows whether the predicted score is too high or too low.
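A minimal Python sketch of the simplified, score-based feedback signal follows. The linear grading, the default width of 5 points, and the names `reward_signal` and `credible_width` are illustrative assumptions; the text only requires a reward inside the confidence space, a penalty outside it, and magnitudes that depend on the absolute difference.

```python
from typing import Tuple


def reward_signal(true_score: float,
                  predicted_score: float,
                  credible_width: float = 5.0) -> Tuple[float, str]:
    """Return (feedback value, direction of the prediction error).

    Inside the credible range the feedback is a reward between 0.5 and 1.0
    that shrinks as the error grows. Outside it, the feedback is a penalty
    of at most -1.0 that grows with the error.
    """
    if credible_width <= 0:
        raise ValueError("credible_width must be positive")
    diff = predicted_score - true_score
    abs_diff = abs(diff)
    if abs_diff <= credible_width:
        feedback = 1.0 - 0.5 * abs_diff / credible_width
    else:
        feedback = -abs_diff / credible_width
    if diff > 0:
        direction = "too_high"
    elif diff < 0:
        direction = "too_low"
    else:
        direction = "exact"
    return feedback, direction


exact = reward_signal(80.0, 80.0)          # (1.0, "exact")
slightly_low = reward_signal(80.0, 77.5)   # (0.75, "too_low")
far_too_high = reward_signal(80.0, 90.0)   # (-2.0, "too_high")
```

Returning the direction separately from the magnitude keeps both pieces of information that the text mentions: how large the reward or penalty should be, and whether the predicted score was too high or too low.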

In actual use, when the model is used to predict a new grain depot, real data for a certain time period is taken first and the next time period is then predicted. After the real data for that next period is acquired, it is compared with the predicted data and a credible interval is defined on the difference: if the difference lies within the credible interval the feedback signal is a reward, and if it lies outside the feedback signal is a penalty, with the reward and penalty graded according to the difference. The model is corrected according to the feedback signal and accumulates experience. The prediction period and the collection period can differ. For example, grain depot data for seven consecutive days can be predicted at once while data is still collected and used for feedback correction every day; each day a new seven-day prediction is made on this basis, overwriting the values predicted the previous day for the overlapping dates. The advantage is higher learning efficiency: the data for a given moment can be revised many times, so its accuracy is higher. In theory, the closer a predicted value is to the time at which its real data arrives, the more times it has been corrected and the more accurate it is, so that the data within a certain time period can be regarded as fully credible. With this technical method, together with intelligent sensors, the health condition of the grain depot can be predicted dynamically, so that problems that may arise in the depot can be prevented in advance, and the prediction result is continuously corrected and has higher accuracy.
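The seven-day rolling prediction with daily overwriting can be sketched as follows in Python. The `rolling_forecast` helper and the `toy_predict` stand-in are hypothetical, and the feedback correction of the model itself is left out so that only the overwriting logic is shown.

```python
from typing import Callable, Dict, List, Tuple


def rolling_forecast(predict: Callable[[int, int], List[float]],
                     first_day: int,
                     days: int,
                     horizon: int = 7) -> Tuple[Dict[int, float], Dict[int, int]]:
    """Re-predict the next `horizon` days every day, overwriting overlaps.

    `predict(today, horizon)` must return `horizon` scores for the days
    today+1 ... today+horizon. Returns the latest forecast per target day
    and the number of times each target day was predicted.
    """
    forecast: Dict[int, float] = {}
    revisions: Dict[int, int] = {}
    for today in range(first_day, first_day + days):
        scores = predict(today, horizon)
        for offset, score in enumerate(scores, start=1):
            target = today + offset
            forecast[target] = score  # overwrite the earlier value, if any
            revisions[target] = revisions.get(target, 0) + 1
    return forecast, revisions


def toy_predict(today: int, horizon: int) -> List[float]:
    """Stand-in model: tags every predicted score with the day it was made."""
    return [float(today)] * horizon


forecast, revisions = rolling_forecast(toy_predict, first_day=1, days=3)
```

After three days, day 4 has been predicted three times and holds the value produced on day 3, while day 10 has been predicted only once, which illustrates why dates closer to the arrival of real data have been revised more often.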

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for predicting the health condition of a grain depot, characterized by comprising the following steps:

S6, optimizing the prediction model using a reinforcement learning algorithm.

2. The method of claim 1, wherein the training of the prediction model is performed by a time-stepping feature extraction method.

3. The method as claimed in claim 1, wherein the prediction model reads the feature data of four moments at a time, expands it along the time direction, automatically copies the data down by one moment, selects the feature data of the fourth moment as the feature, and takes the label of the next moment as the corresponding label.

4. The method as claimed in claim 1, wherein the reinforcement learning algorithm comprises setting a reward function: a value function capable of judging the prediction level is first established; the difference between the predicted value and the real output is used as the standard for measuring the quality of the prediction result; a range based on the difference is set and defined as the credible range; a reward is given within the credible range and a penalty otherwise; and, in early training, the input and prediction period is set short so that the number of feedback events increases and model optimization is accelerated, the input update period being subsequently extended.

5. A storage device having stored therein a plurality of instructions adapted to be loaded and executed by a processor to perform:

S6, optimizing the prediction model using a reinforcement learning algorithm.

6. The storage device according to claim 5, wherein the prediction model is trained using a time-stepping feature extraction method;

7. A server, comprising:

a processor adapted to implement instructions; and

S6, optimizing the prediction model using a reinforcement learning algorithm.

8. The server of claim 7, wherein the predictive model is trained using a time-stepping feature extraction method;