CN113378383A

CN113378383A - Food supply chain hazard prediction method and device

Info

Publication number: CN113378383A
Application number: CN202110647468.XA
Authority: CN
Inventors: 金学波; 张佳帅; 张家辉; 苏婷立; 白玉廷; 孔建磊
Original assignee: Beijing Technology and Business University
Current assignee: Beijing Technology and Business University
Priority date: 2021-06-10
Filing date: 2021-06-10
Publication date: 2021-09-10
Anticipated expiration: 2041-06-10
Also published as: CN113378383B

Abstract

The invention provides a method for predicting food supply chain hazards, comprising the following steps: defining a noise smoothing loss function by means of regularization; building a prediction model that combines a GRU sub-predictor with the noise smoothing loss function, and training the prediction model; and predicting food supply chain hazards with the trained prediction model to obtain a prediction result. Combining a GRU sub-predictor with a regularization-based noise smoothing loss function reduces the degree to which the prediction model fits random noise and improves prediction accuracy, and it performs better in practical prediction tasks where large noise, measurement errors and similar problems are likely.

Description

Food supply chain hazard prediction method and device

Technical Field

The application relates to the field of time series prediction, in particular to a method and a device for predicting food supply chain hazards.

Background

The food supply chain consists of many interlinked stages, including raw material suppliers, manufacturers, sellers and consumers; it spans a wide geographical range and places high quality requirements on every stage. Once a food safety problem arises, the entire supply chain suffers losses and adverse social effects follow. The safety of multi-layer food supply chains with low transparency is therefore a key concern.

With the development of Internet of Things technology, more and more information about the food supply chain can be obtained. Recently, machine learning, and deep learning in particular, has become widely used. The Recurrent Neural Network (RNN), an important class of neural network in deep learning, provides an effective solution for analyzing sequence data. Food supply chain safety prediction systems that incorporate machine learning or deep neural networks are increasingly common, but these food safety prediction models carry considerable potential risks.

Deep learning models can effectively support prediction and decision making, and are widely applied in time series prediction, autonomous driving, recommendation and personalization, and related fields. In the field of food safety, however, measurement noise and errors are unavoidable, because the measured hazard content may be affected by sensor performance, reading errors, instrument damage and the like. The choice of loss function is an important part of training a deep learning model and determines the training effect. Because food supply chain hazard data are strongly nonlinear and strongly random, the influence of noise is difficult to remove completely when analyzing the noisy data, so the estimated true values still contain noise. The model then over-learns the noise during training; the randomness of the noise degrades predictive performance, and learning the noise also harms the robustness of the model.

Disclosure of Invention

In order to solve one of the above technical problems, the present invention provides a method and an apparatus for predicting food supply chain hazards.

The first aspect of the embodiments of the present invention provides a method for predicting food supply chain hazards, where the method includes:

defining a noise smoothing loss function by means of regularization;

building a prediction model that combines a GRU sub-predictor with the noise smoothing loss function, and training the prediction model;

and predicting food supply chain hazards with the trained prediction model to obtain a prediction result.

Preferably, the noise smoothing loss function includes two parts, one measuring the fit to the input data and one measuring the smoothness of the input data; the fit is represented by the mean absolute error between the predicted value and the true value, and the smoothness is represented by the norm of a matrix that measures the smoothness of every three points in the input data.

Preferably, the norm of the matrix that measures the smoothness of every three points in the input data is calculated as follows:

defining a penalty matrix that applies a smoothness penalty to the input data, and calculating, from the penalty matrix and the input data, the matrix that measures the smoothness of every three points in the input data;

and computing the norm of that matrix to obtain the norm of the matrix that measures the smoothness of every three points in the input data.

Preferably, the prediction model is trained as follows: the prediction model is trained on hazard content source data acquired by an Internet of Things platform.

Preferably, the method further comprises:

optimizing the hyper-parameters of the prediction model by a Bayesian optimization algorithm to obtain optimal hyper-parameters;

training the prediction model according to the optimal hyper-parameter to obtain an optimal prediction model;

and predicting the food supply chain hazards according to the optimal prediction model to obtain a prediction result.

A second aspect of the embodiments of the present invention provides a food supply chain hazard prediction apparatus, including a processor configured with processor-executable operating instructions to perform the following operations:

defining a noise smoothing loss function by means of regularization;

Preferably, the processor is configured with processor-executable operating instructions to perform the following operations:

training the prediction model on hazard content source data acquired by an Internet of Things platform.

The invention has the following beneficial effects: it provides a prediction model that combines a GRU sub-predictor with a regularization-based noise smoothing loss function, which reduces the degree to which the prediction model fits random noise and improves prediction accuracy, and which performs better in practical prediction tasks where large noise, measurement errors and similar problems are likely.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a flowchart of the food supply chain hazard prediction method according to embodiment 1 of the present invention;

FIG. 2 is a graph showing the DON, Cd and Pb concentrations at each link, as provided in the examples;

FIG. 3 is a comparison of the prediction results of the three loss functions provided in the example;

FIG. 4 is a comparison of the convergence of the three loss functions provided in the example;

FIG. 5 shows cadmium content prediction samples for link 11 provided in the example;

FIG. 6 shows lead content prediction samples for link 11 provided in the example;

FIG. 7 shows DON hazard content prediction samples for link 11 provided in the example.

Detailed Description

In order to make the technical solutions and advantages of the embodiments of the present application more apparent, the exemplary embodiments of the present application are described in further detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present application and are not exhaustive of all embodiments. It should be noted that the embodiments and the features of the embodiments in the present application may be combined with each other where no conflict arises.

Example 1

As shown in fig. 1, the present embodiment provides a method for predicting food supply chain hazards, which includes:

S101, defining a noise smoothing loss function by means of regularization;

S102, building a prediction model that combines a GRU sub-predictor with the noise smoothing loss function, and training the prediction model;

S103, predicting food supply chain hazards with the trained prediction model to obtain a prediction result.

Specifically, regularization is widely used in machine learning and deep learning. It constrains the parameters of the model so that they do not grow too large, which reduces the likelihood of overfitting. In this embodiment, in order to reduce the overfitting caused by over-learning the random noise in supply chain data, the Noise Smoothing Loss function (NSL loss function) is designed as:

$$\mathrm{NSL}=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\tilde{y}_i\right|+\beta\,\lVert P_y\rVert$$

where $y_i$ is the predicted value, $\tilde{y}_i$ is the true value, $n$ is the number of samples, $\beta$ is the regularization coefficient, and $P_y$ is a matrix that measures how smooth every three points in the data are. The loss has two parts. The first part measures the degree of fit and represents the goal of minimizing the residuals between the actual sequence and the fitted sequence; it is expressed by the mean absolute error. The second part measures the degree of smoothness and represents how strongly the sequence is smoothed during training; it is computed as the norm of $P_y$. The coefficient $\beta$ sets the trade-off between the two objectives of fitting and smoothing.

The calculation of $P_y$ is the key to implementing the noise smoothing loss function. First, a matrix P is defined, which applies a smoothness penalty to the data. The P matrix is as follows:

The dimensionality of the P matrix is determined by the input data: if the length of the input data is T, the dimensionality of P is (T, T). The matrix $P_y$ is then obtained by multiplying P with the matrix formed by the input data. Finally, a matrix norm is computed, and this norm expresses the degree of smoothness across every three points of the data.
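The computation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, none of which is confirmed by the source: the penalty matrix is taken to be the second-difference operator with rows (1, -2, 1), a common way to measure smoothness over three consecutive points; it therefore has shape (T-2, T) rather than the (T, T) stated in the text; the Euclidean norm is used; and the penalty is applied to the predicted sequence. The names `second_difference_matrix`, `smoothness_norm` and `nsl_loss` are hypothetical.

```python
import math

def second_difference_matrix(T):
    """Assumed penalty matrix P: row r holds (1, -2, 1) at columns r, r+1, r+2."""
    P = [[0.0] * T for _ in range(T - 2)]
    for r in range(T - 2):
        P[r][r], P[r][r + 1], P[r][r + 2] = 1.0, -2.0, 1.0
    return P

def smoothness_norm(seq):
    """Euclidean norm of P @ seq, i.e. of all three-point second differences."""
    P = second_difference_matrix(len(seq))
    p_y = [sum(p * v for p, v in zip(row, seq)) for row in P]
    return math.sqrt(sum(d * d for d in p_y))

def nsl_loss(y_true, y_pred, beta):
    """Mean absolute error plus beta times the smoothness norm (assumed form)."""
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    return mae + beta * smoothness_norm(y_pred)

straight = [1.0, 2.0, 3.0, 4.0]   # perfectly linear: no smoothness penalty
jagged = [1.0, 3.0, 1.0, 3.0]     # zig-zag: large smoothness penalty
print(smoothness_norm(straight), smoothness_norm(jagged))
print(nsl_loss(straight, jagged, beta=0.5))
```

A linear sequence has zero second differences and so incurs no penalty, while a zig-zag sequence is penalized heavily; increasing `beta` shifts the balance from fitting toward smoothing.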

Smoothing is thus performed during model training itself, so that the model learns from smoothed data, which further improves the robustness of the model.

The GRU is a variant of the Long Short-Term Memory network (LSTM); both are recurrent neural networks. Compared with the LSTM network, the GRU network has only an update gate and a reset gate, so it is simpler and gives better results. This example is based on the Keras/TensorFlow framework and uses a GRU to fit the data, with the optimization objective during training replaced by the NSL loss function proposed in this example.
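To make the two gates concrete, the following is a minimal sketch of one time step of a single-unit GRU cell in plain Python. It follows the standard GRU equations rather than anything specific to this patent; the weight values and the name `gru_step` are hypothetical, and libraries differ on whether the update gate weights the old state or the new candidate.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h_prev, p):
    """One time step of a scalar (single-unit) GRU cell.

    p holds hypothetical weights: w_* act on the input, u_* on the previous state.
    """
    z = sigmoid(p["w_z"] * x + p["u_z"] * h_prev + p["b_z"])   # update gate
    r = sigmoid(p["w_r"] * x + p["u_r"] * h_prev + p["b_r"])   # reset gate
    # Candidate state: the reset gate scales how much of h_prev is used.
    h_cand = math.tanh(p["w_h"] * x + p["u_h"] * (r * h_prev) + p["b_h"])
    # Convention used here: z close to 1 favours the new candidate state.
    return (1.0 - z) * h_prev + z * h_cand

params = {"w_z": 0.5, "u_z": 0.1, "b_z": 0.0,
          "w_r": 0.4, "u_r": 0.2, "b_r": 0.0,
          "w_h": 0.9, "u_h": 0.3, "b_h": 0.0}

h = 0.0
for x in [0.2, 0.4, 0.1]:      # a short illustrative input sequence
    h = gru_step(x, h, params)
print(h)
```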

GRU algorithm pseudocode:

(1) Normalize the data set θ

(2) Learn the model from the training data

Learn H based on θ

return H
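Step (1) of the pseudocode can be illustrated as follows. The source says only that the data set is normalized; min-max scaling to [0, 1] is assumed here for illustration, and the function names are hypothetical. The inverse transform is included because predictions made on normalized data must be mapped back to the original units before they are reported.

```python
def fit_minmax(values):
    """Return the (min, max) of a column, to be reused for later transforms."""
    return min(values), max(values)

def normalize(values, lo, hi):
    span = (hi - lo) or 1.0   # guard against a constant column
    return [(v - lo) / span for v in values]

def denormalize(values, lo, hi):
    span = (hi - lo) or 1.0
    return [v * span + lo for v in values]

raw = [2.0, 4.0, 6.0]                 # illustrative values, not source data
lo, hi = fit_minmax(raw)
scaled = normalize(raw, lo, hi)
print(scaled, denormalize(scaled, lo, hi))
```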

The choice of hyper-parameters directly determines the performance of a deep learning model. This embodiment uses the Hyperopt library to implement one of the Bayesian optimization algorithms: Sequential Model-Based Optimization (SMBO). The optimized hyper-parameters mainly comprise the number of neurons in the GRU, the dropout rate, the number of training epochs, the batch size and the optimizer.

When Bayesian optimization is used to determine the model hyper-parameters, a surrogate model is fitted to the true objective function, and the most promising evaluation point is actively selected according to the fit. An objective function g(w) and a hyper-parameter space to be optimized must be defined. The objective function is the quantity that Bayesian optimization minimizes; this embodiment uses the root mean square error of the model as the objective function, in order to find the hyper-parameters that give the best score on this metric:

$$g(w)=\sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i(w)-\tilde{y}_i\right)^{2}}$$

where $m$ is the number of input samples, $y_i(w)$ is the predicted value, and $\tilde{y}_i$ is the true value. The Bayesian optimization problem can be expressed as:

$$w^{*}=\arg\min_{w\in W} g(w)$$

where $w^{*}$ is the optimal set of hyper-parameters determined by Bayesian optimization, $w$ is a set of input hyper-parameters, and $W$ is the multidimensional hyper-parameter space.
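The objective and its minimizer can be sketched in plain Python. This is an illustration only: the exhaustive search over a small candidate list stands in for Bayesian optimization, and the toy "model" (a constant predictor controlled by a hypothetical `level` hyper-parameter) stands in for training a GRU with hyper-parameters w.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted values."""
    m = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / m)

def best_hyperparameters(candidates, evaluate):
    """Return the candidate w with the smallest objective g(w) = evaluate(w)."""
    return min(candidates, key=evaluate)

# Toy stand-in for "train a model with w, then predict": a constant predictor.
y_true = [1.0, 2.0, 3.0]

def g(w):
    y_pred = [w["level"]] * len(y_true)
    return rmse(y_true, y_pred)

candidates = [{"level": 0.0}, {"level": 2.0}, {"level": 5.0}]
w_star = best_hyperparameters(candidates, g)
print(w_star, g(w_star))
```

Exhaustive search is only feasible for tiny spaces; Bayesian optimization exists precisely because each evaluation of g(w) requires training a model.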

Bayesian optimization consists of two main steps: first the Gaussian process is estimated and updated at step t+1, and then the sampling of hyper-parameters is guided by maximizing the acquisition function. For the Gaussian process, this embodiment assumes that the objective function g(w) follows the Gaussian process:

$$g(w)\sim GP\left(\mu(w),K(w,w')\right)$$

where $\mu(w)$ is the mean of g(w) and $K(w,w')$ is the covariance matrix of g(w). The initial $K(w,w')$ can be expressed as:

$$K=\begin{bmatrix}k(w_1,w_1)&\cdots&k(w_1,w_t)\\ \vdots&\ddots&\vdots\\ k(w_t,w_1)&\cdots&k(w_t,w_t)\end{bmatrix}$$

During Bayesian optimization the covariance matrix of the Gaussian process changes with each iteration. Suppose the set of parameters input at step t+1 is $w_{t+1}$; the covariance matrix can then be expressed as:

$$K'=\begin{bmatrix}K&k^{T}\\ k&k(w_{t+1},w_{t+1})\end{bmatrix}$$

where $k=[k(w_{t+1},w_1),k(w_{t+1},w_2),\ldots,k(w_{t+1},w_t)]$. The posterior probability of the objective function is then:

$$P\left(g_{t+1}\mid\theta_{t+1},w_{t+1}\right)\sim N\left(\mu_{t+1}(w),\sigma_{t+1}^{2}(w)\right)$$

where $\theta$ is the observed data, $\mu_{t+1}(w)$ is the mean of g(w) at step t+1, and $\sigma_{t+1}^{2}(w)$ is the variance of g(w) at step t+1.
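For reference, the posterior mean and variance of a Gaussian process have a standard closed form. The expressions below are the textbook result, assuming a zero prior mean and noise-free observations; the source does not state which form it uses. Here $g_{1:t}$ denotes the column vector of objective values observed so far (a notation introduced for this illustration), $K$ is the covariance matrix of the evaluated points, and $k$ is the vector of covariances between the new point $w$ and the evaluated points, written as a column vector.

```latex
\mu_{t+1}(w) = k^{T} K^{-1} g_{1:t},
\qquad
\sigma_{t+1}^{2}(w) = k(w, w) - k^{T} K^{-1} k
```

The variance shrinks toward zero near points that have already been evaluated and stays close to the prior value $k(w, w)$ far from them, which is what lets an acquisition function trade off exploration against exploitation.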

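After the posterior probability is obtained, the optimal hyper-parameters are searched for with an acquisition function. The invention uses the UCB (upper confidence bound) acquisition function to carry out the hyper-parameter search:

$$w_{t+1}=\arg\max_{w\in W}S(w\mid\theta_t)=\arg\max_{w\in W}\left(\mu_t(w)+\zeta_{t+1}^{1/2}\,\sigma_t(w)\right)$$

where $\zeta_{t+1}$ is a constant, $S(w\mid\theta_t)$ is the UCB acquisition function, $\mu_t(w)$ and $\sigma_t(w)$ are the posterior mean and standard deviation at step t, and $w_{t+1}$ is the hyper-parameter selected at step t+1.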
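A minimal sketch of UCB selection in plain Python follows. It assumes the common form "mean plus a multiple of the standard deviation", with the multiple taken as the square root of the constant; the exact form used by the source is not recoverable from the text. The posterior values are made up for illustration, and `ucb_score` and `select_next` are hypothetical names. Because UCB maximizes, an error metric such as RMSE would be negated before being modeled.

```python
import math

def ucb_score(mu, sigma, zeta):
    """Upper confidence bound: posterior mean plus an exploration bonus."""
    return mu + math.sqrt(zeta) * sigma

def select_next(candidates, posterior, zeta):
    """Pick the candidate with the highest UCB score.

    posterior(w) must return (mean, standard deviation) at w.
    """
    return max(candidates, key=lambda w: ucb_score(*posterior(w), zeta))

# Illustrative posterior over three candidate values of one hyper-parameter.
table = {0.1: (-0.5, 0.05), 0.2: (-0.6, 0.40), 0.3: (-0.4, 0.01)}
explore = select_next(list(table), table.get, zeta=4.0)
exploit = select_next(list(table), table.get, zeta=0.0)
print(explore, exploit)
```

With a large constant the uncertain candidate 0.2 wins despite its lower mean; with the constant set to zero the choice collapses to the candidate with the best mean.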

The pseudo code of the Bayesian optimization algorithm is as follows:

Input: θ is the data set, g(w) is the RMSE of the model, W is the hyper-parameter space (w ∈ W), H(w|θ_i) is the UCB acquisition function, T is the number of hyper-parameters to be selected, and l is the number of sub-sequences of the wavelet decomposition.

Output: the optimal hyper-parameter w^*.

(1) Carry out initialization of^(l)←InitSamples(g(w)，θ，l)

(2)

(3) Model the objective function g(w) and calculate the posterior probability

(4) Update the parameters using the UCB acquisition function: w^* ← arg max H(w|θ_i^(l))

(5) Train the model provided by the invention with the hyper-parameters w^* to obtain the prediction y_i ← g(w^*), then calculate and update

(6)

(7) end for

(8)

(9) return w^*
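The overall loop (initial samples, surrogate model, UCB selection, evaluation, update) can be sketched end to end in plain Python. This is a generic Gaussian-process illustration under stated assumptions, not the Hyperopt implementation mentioned in the text: a one-dimensional grid of candidates, an RBF kernel with a fixed length scale, a zero prior mean, and a toy quadratic standing in for the model's RMSE. All names are hypothetical.

```python
import math

def rbf(a, b, length=0.3):
    """Assumed kernel: squared-exponential covariance between two points."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def posterior(w, ws, gs, jitter=1e-6):
    """GP posterior mean and std at w given observations (ws, gs)."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(ws)]
         for i, a in enumerate(ws)]
    k = [rbf(w, a) for a in ws]
    mean = sum(ki * ai for ki, ai in zip(k, solve(K, gs)))
    var = max(rbf(w, w) - sum(ki * bi for ki, bi in zip(k, solve(K, k))), 0.0)
    return mean, math.sqrt(var)

def bayes_opt(g, grid, init, steps, zeta=4.0):
    ws = list(init)
    gs = [-g(w) for w in ws]          # model -g so that maximizing UCB minimizes g
    for _ in range(steps):
        untried = [w for w in grid if w not in ws]
        if not untried:
            break
        def ucb(w):
            m, s = posterior(w, ws, gs)
            return m + math.sqrt(zeta) * s
        w_next = max(untried, key=ucb)
        ws.append(w_next)             # update the observed data set
        gs.append(-g(w_next))
    best = max(range(len(ws)), key=lambda i: gs[i])
    return ws[best], -gs[best]

grid = [i / 20 for i in range(21)]    # candidate values in [0, 1]
toy_g = lambda w: (w - 0.7) ** 2      # stand-in for the model's RMSE
w_star, g_star = bayes_opt(toy_g, grid, init=[0.0, 1.0], steps=8)
print(w_star, g_star)
```

In a real setting each call to `g` would train and evaluate the prediction model, so the loop is kept to a small number of evaluations.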

The hyper-parameters of the GRU sub-predictor in this embodiment are determined by the Bayesian optimization algorithm when the prediction model is trained. Comparison tests of GRU_NSL against models with different loss functions and against the RNN and LSTM are carried out on the same test data set, and model performance is evaluated by calculating the Root Mean Square Error (RMSE), the Mean Absolute Error (MAE) and the Pearson correlation coefficient (R). The smaller the RMSE and MAE, the more accurate the prediction; the larger the Pearson correlation coefficient, the closer the fit between the observed and predicted values. The evaluation indexes are calculated as follows:

$$\mathrm{RMSE}=\sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i-\hat{y}_i\right)^{2}}$$

$$\mathrm{MAE}=\frac{1}{m}\sum_{i=1}^{m}\left|y_i-\hat{y}_i\right|$$

$$R=\frac{\sum_{i=1}^{m}\left(y_i-\bar{y}\right)\left(\hat{y}_i-\bar{\hat{y}}\right)}{\sqrt{\sum_{i=1}^{m}\left(y_i-\bar{y}\right)^{2}}\sqrt{\sum_{i=1}^{m}\left(\hat{y}_i-\bar{\hat{y}}\right)^{2}}}$$

where $m$ is the number of samples, $y$ is the actual value, $\hat{y}$ is the predicted value, $\bar{y}$ is the mean of the true values, and $\bar{\hat{y}}$ is the mean of the predicted results.
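The three evaluation indexes can be computed with a few lines of plain Python. The sketch below uses the standard definitions of RMSE, MAE and the Pearson correlation coefficient; the sample values are illustrative only and are not taken from the experiments.

```python
import math

def rmse(y, y_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def pearson_r(y, y_hat):
    mean_y = sum(y) / len(y)
    mean_p = sum(y_hat) / len(y_hat)
    cov = sum((a - mean_y) * (b - mean_p) for a, b in zip(y, y_hat))
    sd_y = math.sqrt(sum((a - mean_y) ** 2 for a in y))
    sd_p = math.sqrt(sum((b - mean_p) ** 2 for b in y_hat))
    return cov / (sd_y * sd_p)

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]   # every prediction is off by a constant 0.5
print(rmse(y_true, y_pred), mae(y_true, y_pred), pearson_r(y_true, y_pred))
```

The example shows why R is reported alongside the error metrics: a constant offset leaves the correlation at 1 while RMSE and MAE both register the 0.5 error.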

Example 2

Corresponding to embodiment 1, this embodiment proposes a food supply chain hazard prediction apparatus, which includes a processor configured with processor-executable operating instructions to perform the following operations:

defining a noise smoothing loss function by means of regularization;

Specifically, for the working principle of the device proposed in this embodiment, refer to the description in embodiment 1, which is not repeated here. This embodiment provides a prediction model that combines a GRU sub-predictor with a regularization-based noise smoothing loss function, which reduces the degree to which the prediction model fits random noise and improves prediction accuracy, and which performs better in practical prediction tasks where large noise, measurement errors and similar problems are likely.

The prediction process and the presented practical effect of the prediction method proposed by the present invention are further illustrated by two specific examples.

Example 1

Using the constructed food supply chain hazard prediction model, the hazard content of the first six links (X₁ to X₆) is used to predict the hazard content data of the last five links (X₇ to X₁₁). In the experiment, GRU models combined with the Mean Absolute Error (MAE) and Mean Square Error (MSE) loss functions are compared with the model provided by the invention, with variables such as the number of network layers held constant. Performance is evaluated using three indexes: RMSE, MAE and R.

First, the data used in the experiment are described. The data come from a wheat flour supply chain, in which the contents of three hazards, deoxynivalenol (DON), lead and cadmium, were collected. The supply chain has 10 links such as cleaning, wheat tempering and processing 1. The sampling points are denoted as follows: raw grain is X₁, cleaned wheat is X₂, tempered wheat is X₃, processing one (1M core) is X₄, processing two (2M core) is X₅, processing three (3M core) is X₆, processing four (4M core) is X₇, processing five (5M core) is X₈, processing six (6M core) is X₉, packaging is X₁₀, and warehousing (the circulation-link detection value in the existing data) is X₁₁. Spot inspection of the wheat flour at each link yields the content data for the three hazards cadmium, lead and DON. There are 396 groups of DON data, 1061 groups of lead data and 2057 groups of cadmium data, with data shapes (396, 11), (1061, 11) and (2057, 11) respectively; the hazard content at each link is shown in FIG. 2.

The comparative experiment uses only the cadmium hazard content data. The training set has shape (1857, 6) with labels of shape (1857, 5); the test set has shape (200, 6) with labels of shape (200, 5). The training labels are used to adjust the weights and biases of the neurons during training, and the test labels are used to check the prediction results.
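The shapes quoted above follow from splitting each 11-link record into six inputs and five labels, and holding out 200 records for testing. A minimal sketch, assuming the last 200 rows form the test set (the source does not say how the rows were chosen) and using placeholder values rather than real measurements:

```python
def split_links(rows, n_inputs):
    """Split each row of per-link contents into (inputs, labels)."""
    X = [row[:n_inputs] for row in rows]
    Y = [row[n_inputs:] for row in rows]
    return X, Y

def train_test_split(rows, n_test):
    """Hold out the last n_test rows as the test set (ordering is an assumption)."""
    return rows[:-n_test], rows[-n_test:]

# Placeholder rows with the cadmium data shape (2057, 11); the values are dummies.
rows = [[float(link) for link in range(1, 12)] for _ in range(2057)]

train, test = train_test_split(rows, n_test=200)
X_train, Y_train = split_links(train, n_inputs=6)
X_test, Y_test = split_links(test, n_inputs=6)
print(len(X_train), len(X_train[0]), len(Y_train[0]), len(X_test), len(Y_test[0]))
```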

Table 1 shows the training results of the GRU combined with the three loss functions NSL, MAE and MSE on the cadmium hazard data set. The results show that the GRU_NSL provided by the invention performs best, with an RMSE of 2.4484. Compared with the next-best GRU_MAE model, the GRU_NSL model improves by 9.73%, 6.31% and 0.11% on the RMSE, MAE and R indexes respectively.

TABLE 1

FIG. 3 shows the prediction results of the GRU model trained under each of the three loss functions; the NSL loss function has a clear advantage. To highlight the training behaviour of NSL, FIG. 4 shows the convergence of the three loss functions over the same number of iterations during GRU training. NSL descends fastest during iteration, and its loss value finally stabilizes at 0.40. Because this stable value is the loss after the noise variance is taken into account, it is slightly larger than the MAE and MSE losses.

The experimental results show that, in the task of predicting the hazard content of the last 5 links from the first 6 links, combining the regularized loss function with the GRU model improves the prediction performance of the food supply chain hazard prediction model.

Example 2:

Using the constructed food supply chain hazard prediction model, the hazard content data of the eleventh link is predicted from the hazard content of the first ten links.

The experiment is based on the cadmium, lead and DON content data sampled at the 11 links of the wheat flour supply chain. Because warehousing is the last link before the food market and represents the quality of wheat flour on the market, the experiment uses the first 10 links to predict the result at the final warehousing link. The GRU_NSL model provided by the invention is compared with the RNN, LSTM and GRU models; the training set accounts for 80% of the total data and the remainder is used as the test set. The results are evaluated using the three indexes RMSE, MAE and R. According to the national food safety standard for agricultural products, cadmium in food must not exceed 0.05 mg/kg, lead in food must not exceed 0.1 mg/kg, and DON must not exceed 1000 µg/kg.
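The limits above are quoted in mixed units (mg/kg and µg/kg). A small sketch of converting them to a single unit before comparing a predicted content against its limit; the dictionary and function names are hypothetical, and the predicted values in the example are made up for illustration.

```python
# Limits quoted in the text, expressed in ug/kg (1 mg/kg = 1000 ug/kg).
LIMITS_UG_PER_KG = {
    "cadmium": 50.0,    # 0.05 mg/kg
    "lead": 100.0,      # 0.1 mg/kg
    "DON": 1000.0,      # already given in ug/kg
}

def exceeds_limit(hazard, predicted_ug_per_kg):
    """True if a predicted content (in ug/kg) is above the quoted limit."""
    return predicted_ug_per_kg > LIMITS_UG_PER_KG[hazard]

# Illustrative predicted contents, not values from the experiments.
print(exceeds_limit("cadmium", 60.0), exceeds_limit("lead", 60.0))
```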

Tables 2, 3 and 4 show the experimental results for the cadmium, lead and DON data; all results are in µg/kg. The GRU_NSL model performs well in the prediction tasks on all three data sets. On the cadmium data, its RMSE is 0.2595 µg/kg, far below the upper limit for cadmium content in the national food standard, an improvement of 46.37% over the RMSE of the GRU model. On the lead data, the RMSE of the GRU_NSL model is 0.2700 µg/kg, an improvement of 50.52% over the RMSE of the GRU. On the DON data, the RMSE of the GRU_NSL model is 1.5245 µg/kg, far below the national limit, an improvement of 70.06% over the RMSE of the GRU.
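The source does not state how the improvement percentages were computed. One common convention, assumed here, is the reduction in an error metric relative to the baseline model's value. The function name is hypothetical and the numbers in the example are round illustrative values, not figures from the tables.

```python
def relative_improvement(baseline, new):
    """Percentage reduction of an error metric relative to a baseline value."""
    return (baseline - new) / baseline * 100.0

# Illustrative values only: halving an error metric is a 50% improvement.
print(relative_improvement(0.5, 0.25))
```

Under this convention an improvement percentage applies to error metrics such as RMSE and MAE, where lower is better; for R, where higher is better, the sign of the numerator would be reversed.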

TABLE 2

TABLE 3

TABLE 4

FIGS. 5 to 7 show part of the prediction results: 2 groups of prediction results are randomly drawn from each of the three application experiments, and each group shows the predictions of the RNN, LSTM, GRU and GRU_NSL models. The prediction of the GRU_NSL model is closest to the true value.

The experiments therefore verify that the GRU_NSL model performs better in the prediction tasks, and that NSL further improves the model's ability to analyze noisy time series data. The GRU_NSL deep learning unit can thus effectively analyze noisy time series data and predicts well on noisy data such as food supply chain hazards.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. A method of food supply chain hazard prediction, the method comprising:

defining a noise smoothing loss function by means of regularization;

2. The method of claim 1, wherein the noise smoothing loss function comprises measuring input data fitness represented by an average absolute error between a predicted value and a true value and measuring input data smoothness represented by a norm of a matrix measuring smoothness of every three points in the input data.

3. The method of claim 2, wherein the calculating the norm of the matrix that measures the smoothness of every three points in the input data comprises:

4. The method of claim 1, wherein the training of the prediction model comprises: training the prediction model on hazard content source data acquired by an Internet of Things platform.

5. The method according to any one of claims 1 to 4, further comprising:

6. A food supply chain hazard prediction apparatus, the apparatus comprising a processor configured with processor-executable operating instructions to:

defining a noise smoothing loss function by means of regularization;

7. The apparatus of claim 6, wherein the noise smoothing loss function comprises two parts of input data fitting degree measurement and input data smoothing degree measurement, the input data fitting degree measurement part is represented by average absolute error between a predicted value and a true value, and the input data smoothing degree measurement part is represented by a norm of a matrix measuring smoothing degrees of every three points in the input data.

8. The apparatus of claim 7, wherein the processor is configured with processor-executable operating instructions to:

9. The apparatus of claim 6, wherein the processor is configured with processor-executable operating instructions to:

10. The apparatus of any of claims 6 to 9, wherein the processor is configured with processor-executable operating instructions to: