CN114764682B

CN114764682B - Rice safety risk assessment method based on multi-machine learning algorithm fusion

Info

Publication number: CN114764682B
Application number: CN202210306564.2A
Authority: CN
Inventors: 赵峙尧; 王姿懿; 于家斌; 许继平; 白玉廷; 王小艺
Original assignee: Beijing Technology and Business University
Current assignee: Beijing Technology and Business University
Priority date: 2022-03-25
Filing date: 2022-03-25
Publication date: 2023-04-07
Anticipated expiration: 2042-03-25
Also published as: CN114764682A

Abstract

The invention provides a rice safety risk assessment method based on the fusion of multiple machine learning algorithms. The method comprises the following steps: acquiring rice hazard detection data and preprocessing them; starting from the hazard indexes, classifying the experts with the AHP algorithm and the SC algorithm, solving the intra-category and inter-category expert weights from the consistency weight differences of the expert evaluation results, constructing a rice safety risk assessment index system, and computing the weighted sum of the preprocessed hazard detection data and the comprehensive weights to obtain a rice hazard risk value; and fusing multiple machine learning algorithms to construct a rice safety risk assessment model that enables rapid risk assessment. The method takes the opinions of all experts into account in a more objective way and avoids amplifying invalid information or diminishing valid information. The invention can effectively reduce supervision costs, improve the efficiency of risk discovery and response handling, and provide an accurate and efficient decision basis for supervision departments.

Description

Rice safety risk assessment method based on multi-machine learning algorithm fusion

Technical Field

The invention belongs to the technical field of food quality detection and food safety risk assessment, relates to technologies such as big data processing and machine learning, and particularly relates to a rice safety risk assessment method based on multi-machine learning algorithm fusion.

Background

In recent years, food safety incidents have occurred frequently, placing higher demands on food safety supervision, and countries around the world have successively introduced strict food safety supervision policies. To further strengthen risk monitoring, risk assessment and supply chain management, and to improve the efficiency of risk discovery and response handling, government departments at all levels are vigorously promoting digitalization in the food safety field, strengthening 'big data + food' supervision, and exploiting the advantages of technologies such as big data and artificial intelligence in food safety risk assessment and supervision.

At present, food safety risk assessment methods fall into three major categories: qualitative assessment, quantitative assessment and comprehensive risk assessment. Qualitative assessment is strongly subjective: risk indexes are analyzed and judged mainly from the knowledge and experience of the evaluator, and the index risk value is calculated from the judgment result and a matrix model. Qualitative methods based on a single expert's assessment are relatively mature and include the Delphi method, the analytic hierarchy process, the decision laboratory method (DEMATEL) and the index scoring method. Qualitative methods based on multiple experts are divided into subjective weighting and objective weighting. Subjective weighting assigns the expert weights from prior information about the experts, such as prestige and knowledge level, and calculates the risk value from the resulting expert weights; objective weighting assigns the expert weights from consistency index values of the expert evaluation results and calculates the risk value from those weights. In actual decision making, multi-expert qualitative assessment is highly credible, and in research on expert weighting the objective method is more widely applied than the subjective one. Quantitative assessment is strongly objective: index risk values are calculated with a mathematical model. Such methods include Monte Carlo quantitative assessment, grey relational theory, fuzzy comprehensive evaluation models and machine learning models such as artificial neural networks. Comprehensive risk assessment combines the qualitative and quantitative methods: an index system is established by qualitative assessment, and a risk assessment model is then built from the index system with a quantitative method.

With the acceleration of digital transformation, food detection data are growing explosively, and the difficulty of processing and analyzing these data has become the foremost problem restricting food safety risk supervision, directly affecting the accuracy of data-driven risk assessment models. Among the existing risk assessment methods, qualitative assessment has a high labor cost and a long assessment process, while quantitative assessment suffers from problems such as low index precision or poor resistance to overfitting. As a result, the accuracy of the risk assessment result is low, the time cost is high, and the ability to locate risk accurately is lost.

Disclosure of Invention

Aiming at the problems that in the prior art, food safety risk assessment time is long, assessment results are low in accuracy rate, and risks cannot be accurately located, the invention provides a rice safety risk assessment method based on multi-machine learning algorithm fusion.

The invention discloses a rice safety risk assessment method based on multi-machine learning algorithm fusion, which is realized by the following steps:

(1) Acquire rice hazard detection data and preprocess them.

The preprocessing comprises, in sequence, noise filtering, data integration and normalization of the detection data.

Given k kinds of hazards, the preprocessed hazard detection data comprise the standardized detection values of all the hazards.

(2) Construct a rice safety risk assessment index system.

After the experts' evaluation results for the rice hazard indexes are obtained, the following are performed: (1) the evaluation index weights of each expert, that is, the expert's evaluation weight for each rice hazard index, are calculated with the analytic hierarchy process (AHP); (2) the experts are divided into categories with the spectral clustering (SC) method; (3) the inter-category and intra-category expert weights are calculated, where the more experts a category contains and the smaller its consistency difference, the greater the weight of that category; (4) the comprehensive weight of each hazard index is finally determined.

For the jth index, let $w_{ij}$ be the evaluation weight of the ith expert. The SC algorithm groups the evaluation results of the m experts into H classes, the ith expert being assigned to class $h_i \in \{h_1, h_2, \ldots, h_H\}$. The weight of class $h_i$ and the weight of expert i's evaluation within class $h_i$ are obtained by calculation; from these and $w_{ij}$, the weighted evaluation of the ith expert for the jth index is obtained, and then the comprehensive weight of the jth index after the group decision.

The hazard detection data preprocessed in step (1) are weighted by the comprehensive weights and summed to obtain the rice hazard risk value Y.

(3) To provide intuitive rice safety risk assessment results more quickly and accurately, the method fuses multiple machine learning algorithms to construct a rice safety risk assessment model.

A rice hazard risk assessment model is constructed in which the two machine learning algorithms XGBoost and LightGBM form the base learners and a long short-term memory network (LSTM) serves as the meta-learner. The preprocessed hazard detection data are input into the model; the outputs of the two base learners, together with the preprocessed hazard detection data, are input into the meta-learner; and the model finally outputs the rice hazard risk value Y.

The method judges the quality and safety condition of the rice according to the rice hazard risk value Y. From the detection data of each hazard weighted by its comprehensive weight, the influence of that hazard on rice quality and safety can be determined and the main hazards located.

Compared with the prior art, the invention has the advantages that:

(1) The method screens the rice safety risk indexes with a group decision model and constructs a rice safety risk index evaluation system. On the premise that the minority defers to the majority, it effectively avoids amplifying 'invalid information' and diminishing 'valid information' in the group decision, and takes the opinions of all experts into account in a more objective way. The method fully considers that experts differ in knowledge level, experience and familiarity with the rice hazard indexes.

(2) The method is built on a fusion algorithm. It takes into account that each algorithm views the data from a different angle and works on a different principle, uses a Stacking ensemble learning strategy so that the different algorithms compensate for one another's weaknesses, and can analyze the rice hazard risk value rapidly and accurately through the rice safety risk assessment model BXGB-BLGB-GLSTM, providing a scientific and effective basis for the assessment decisions of supervision departments.

(3) The method preprocesses the hazard detection data and extracts the effective information, which can improve the prediction accuracy of the rice hazard risk assessment model.

(4) The method addresses the problems in the prior art that food safety risk assessment takes a long time, that the assessment results have low accuracy, and that risks cannot be accurately located. It can effectively reduce supervision costs, improve the efficiency of risk discovery and response handling, and provide an accurate and efficient decision basis for supervision departments.

Drawings

FIG. 1 is a schematic overall flow chart of the rice safety risk assessment method of the present invention;

FIG. 2 is a schematic diagram of the framework of the hybrid model BXGB-BLGB-GLSTM of the present invention;

FIG. 3 is a comparison graph of the evaluation results of an embodiment of the present invention using the BXGB-BLGB-GLSTM model;

FIG. 4 is a comparison graph of the evaluation results using the XGBoost model according to an embodiment of the present invention;

FIG. 5 is a comparison graph of the results of the LightGBM model evaluation according to the embodiment of the invention;

FIG. 6 is a comparison graph of the results of an evaluation using the LSTM model according to an embodiment of the present invention;

FIG. 7 is a comparison graph of the evaluation results of the embodiment of the present invention using the BP model;

FIG. 8 is a comparison graph of the results of an evaluation using an SVM model according to an embodiment of the present invention;

FIG. 9 is a comparison of the results of the evaluation using the KNN model in accordance with the present invention.

Detailed Description

The invention will be described in further detail below with reference to the drawings and examples.

The rice safety risk assessment method based on multi-machine learning algorithm fusion provided by the invention is implemented and verified in the following five steps, each of which is described below.

Step one: Preprocess the acquired rice hazard detection data.

The embodiment of the invention performs an example analysis based on the 2018 rice hazard spot inspection data of 31 provinces (autonomous regions and municipalities directly under the central government) of China, excluding Hong Kong and Macao. The data comprise the province of detection, the detection time, the detection items and the results. The detection items include chromium, benzo[a]pyrene, lead, inorganic arsenic, aflatoxin B1 and others; according to their kind, the hazards are divided into heavy metal hazards, mycotoxin hazards and pollutant hazards. Detection results are recorded as a specific value, as less than a specific value, or as not detected, and each result is judged as pass or fail. Samples of the rice hazard detection data are shown in Table 1.

TABLE 1 Rice hazard detection data sample

To extract the effective information in the multivariate data, noise filtering, data integration and normalization are carried out on the detection data in sequence. Preprocessing the detection data and extracting the effective information improves the prediction accuracy of the assessment model.

(1) Noise filtering. Because the detection result of a hazard, its unit and the result judgment are recorded separately, noise in the invention refers to statistical errors caused by mis-recorded units. Noise filtering deletes the samples whose detection result is inconsistent with their recorded result judgment.

(2) Data integration and normalization. Because the detection results come in different formats, which hinders the subsequent construction of the risk assessment model, the detection data are unified to floating point format and the unified hazard detection results are standardized with the trapezoidal membership function of formula (1).

In formula (1), x is the detection result of a given hazard, $x_{\max}$ is the national standard value of that hazard, the other threshold is the risk-free maximum value, and $C(x)$ is the standardized value of the detection result x.
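The standardization step can be illustrated with a minimal sketch. Formula (1) itself is not reproduced in this text, so the code below assumes the common rising half-trapezoid form: 0 at or below the risk-free maximum, 1 at or above the national standard value, and linear in between. The function name `standardize` and the parameter name `x_min` (standing for the risk-free maximum value) are hypothetical.

```python
def standardize(x, x_max, x_min=0.0):
    """Rising half-trapezoid membership (assumed form, not taken from formula (1)).

    x     : detection result of one hazard (float)
    x_max : national standard value of the hazard
    x_min : risk-free maximum value (hypothetical name)
    """
    if x_max <= x_min:
        raise ValueError("x_max must be greater than x_min")
    if x <= x_min:
        return 0.0  # at or below the risk-free maximum: no risk
    if x >= x_max:
        return 1.0  # at or above the national standard: full risk
    # linear rise between the two thresholds
    return (x - x_min) / (x_max - x_min)


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the inspection data.
    print(standardize(0.1, x_max=0.2))
```

Under this assumption every standardized value lies in [0, 1], so hazards measured in different units become comparable before they are weighted.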

Step two: Construct a rice safety risk assessment index system.

To construct the rice safety index system, based on the rice hazard detection data and the evaluation data of authoritative industry experts, the mature analytic hierarchy process (AHP) from the qualitative assessment methods is selected to tally and summarize the experts' index scoring results. A spectral clustering (SC) algorithm, which is suitable for high-dimensional clustering, adapts well to the data distribution and clusters well, is then used to build a group decision weighting model based on the index weight distribution, so that the index system is constructed in a more objective way.

Because experts differ in knowledge level, experience and familiarity with the rice hazard indexes, the index system is constructed from the expert scoring results so as to combine the scoring characteristics of the different experts. The experts' scoring results are classified without supervision by a clustering algorithm suited to high-dimensional data, and the group decision weighting model built on the index weight distribution yields the rice hazard risk assessment index system; the specific flow is shown in FIG. 1. First, the scoring result of each expert for the rice hazard indexes is obtained, and then the following steps are carried out.

(1) Calculate the evaluation index weights with the AHP algorithm. When calculating the index weights, the AHP algorithm stratifies the rice hazard detection items to be analyzed according to hazard type, constructs from the expert scoring results the judgment matrix $A_{k \times k}$ of formula (2), and assigns each hazard index a corresponding weight, where k is the number of hazard indexes.

$$A_{k \times k} = \begin{pmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & \ddots & \vdots \\ a_{k1} & \cdots & a_{kk} \end{pmatrix} \tag{2}$$

The element $a_{ij}$ of the judgment matrix expresses the relative influence of the ith hazard index compared with the jth hazard index, judged on the 1-to-9 scale, and satisfies $a_{ij} > 0$, $a_{ji} = 1/a_{ij}$ and $a_{ii} = 1$.

The scales of $a_{ij}$ and their meanings are shown in Table 2.

Table 2 Scales of the judgment matrix elements and their meanings

Scale of $a_{ij}$	Meaning of the scale
$a_{ij} = 1$	The ith hazard index has the same influence as the jth hazard index
$a_{ij} = 3$	The ith hazard index has slightly stronger influence than the jth hazard index
$a_{ij} = 5$	The ith hazard index has stronger influence than the jth hazard index
$a_{ij} = 7$	The ith hazard index has much stronger influence than the jth hazard index
$a_{ij} = 9$	The ith hazard index has extremely stronger influence than the jth hazard index

The judgment matrix of each expert is obtained according to formula (2). For each expert's judgment matrix $A_{k \times k}$, the maximum eigenvalue $\lambda_{\max}$ and the expert's evaluation index weights $W = \{w_1, w_2, \ldots, w_k\}$ are computed as shown in formulas (3) to (5), where $w_i$ is the expert's evaluation weight for the ith rice hazard index.

$$AW = \lambda_{\max} W \tag{3}$$

The maximum eigenvalue can be used to check the consistency of the matrix.

With m experts participating in the evaluation, the evaluation weights of all experts for each hazard index are recorded as W, as shown in formula (6).

$$W = [W_1, W_2, \ldots, W_m]^{T} = \begin{pmatrix} w_{11} & \cdots & w_{1k} \\ \vdots & \ddots & \vdots \\ w_{m1} & \cdots & w_{mk} \end{pmatrix} \tag{6}$$

In formula (6), $w_{ij}$ is the evaluation weight of the ith expert for the jth index, obtained with the AHP algorithm. The superscript T denotes transposition.
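The eigenvalue relation of formula (3) can be sketched in a few lines. The code below is a minimal illustration, not the patented implementation: it finds the principal eigenvector by power iteration and normalizes it to sum to 1. The consistency check uses the conventional AHP consistency index CI = (λmax − k)/(k − 1) and Saaty's random index table, which are assumptions here because formulas (4) and (5) are not reproduced in this text. The names `ahp_weights`, `consistency_ratio` and `RANDOM_INDEX` are hypothetical.

```python
# Saaty's commonly cited random index values (assumed, not from the source).
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32,
                8: 1.41, 9: 1.45, 10: 1.49, 11: 1.51}


def ahp_weights(A, iters=200):
    """Return (weights, lambda_max) for a positive reciprocal matrix A.

    Power iteration converges to the principal eigenvector, i.e. the W
    that satisfies A W = lambda_max W; W is normalized to sum to 1.
    """
    k = len(A)
    w = [1.0 / k] * k
    for _ in range(iters):
        v = [sum(A[i][j] * w[j] for j in range(k)) for i in range(k)]
        total = sum(v)
        w = [x / total for x in v]
    Aw = [sum(A[i][j] * w[j] for j in range(k)) for i in range(k)]
    lam = sum(Aw[i] / w[i] for i in range(k)) / k  # average of (AW)_i / w_i
    return w, lam


def consistency_ratio(lam, k):
    """Conventional AHP consistency ratio CR = CI / RI (assumed check)."""
    if k <= 2:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    if k not in RANDOM_INDEX:
        raise ValueError("no random index tabulated for k = %d" % k)
    ci = (lam - k) / (k - 1)
    return ci / RANDOM_INDEX[k]


if __name__ == "__main__":
    # A perfectly consistent 3x3 judgment matrix (illustrative values).
    A = [[1.0, 2.0, 4.0],
         [0.5, 1.0, 2.0],
         [0.25, 0.5, 1.0]]
    w, lam = ahp_weights(A)
    print(w, lam, consistency_ratio(lam, 3))
```

A common rule of thumb in the AHP literature accepts a judgment matrix when the ratio is below 0.1; whether the patent applies that threshold is not stated in this text.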

(2) Divide the expert categories with the SC algorithm. The SC algorithm is a clustering method based on graph theory. Its main idea is to regard the high-dimensional sample data as points in space and to connect all data points by edges: the edge between two points that are close together has a high weight, and the edge between two points that are far apart has a low weight. The graph is then cut so that the sum of the edge weights inside each sub-graph is as large as possible and the sum of the edge weights between different sub-graphs is as small as possible, which clusters the high-dimensional sample data.

To improve the objectivity of the index weights and reduce subjective error, the invention combines the scoring characteristics of the different experts and uses the SC algorithm, which is suitable for high-dimensional clustering, adapts well to the data distribution and clusters well, to classify the experts' scoring results without supervision. The method calculates the compatibility between experts as the cosine similarity of their high-dimensional index weights, constructs a compatibility matrix, and takes the expert compatibility as the input of the SC algorithm. The cosine similarity l is given by formula (7).

$$l(W_x, W_y) = \frac{\sum_{j=1}^{k} w_{xj}\, w_{yj}}{\sqrt{\sum_{j=1}^{k} w_{xj}^{2}}\ \sqrt{\sum_{j=1}^{k} w_{yj}^{2}}} \tag{7}$$

In formula (7), $W_x$ and $W_y$ are the evaluation index weights of experts x and y respectively, and k is the number of hazard indexes.

Applying the similarity of formula (7) to every pair of experts gives the $m \times m$ compatibility matrix L of formula (8).

$$L = \begin{pmatrix} l_{11} & \cdots & l_{1m} \\ \vdots & \ddots & \vdots \\ l_{m1} & \cdots & l_{mm} \end{pmatrix} \tag{8}$$

The element $l_{xy}$ (x, y = 1, 2, …, m) of the matrix is the compatibility of experts x and y, calculated according to formula (7).
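The compatibility matrix described above can be sketched directly from the cosine similarity definition. This is a minimal illustration in plain Python; the function names `cosine_similarity` and `compatibility_matrix` are hypothetical, and the weight rows in the usage example are made-up numbers.

```python
import math


def cosine_similarity(w_x, w_y):
    """Cosine similarity between two experts' index weight vectors."""
    dot = sum(a * b for a, b in zip(w_x, w_y))
    norm_x = math.sqrt(sum(a * a for a in w_x))
    norm_y = math.sqrt(sum(b * b for b in w_y))
    return dot / (norm_x * norm_y)


def compatibility_matrix(W):
    """Build the m x m matrix of pairwise similarities from the m x k matrix W."""
    m = len(W)
    return [[cosine_similarity(W[x], W[y]) for y in range(m)] for x in range(m)]


if __name__ == "__main__":
    # Three experts, two indexes (illustrative weights only).
    W = [[0.5, 0.5],
         [0.5, 0.5],
         [1.0, 0.0]]
    for row in compatibility_matrix(W):
        print([round(v, 4) for v in row])
```

Because AHP weights are non-negative, every entry lies in [0, 1] and the matrix is symmetric, so it can serve as a precomputed affinity for a spectral clustering routine, for example scikit-learn's `SpectralClustering` with `affinity="precomputed"`. The source does not say which implementation was used.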

When the SC algorithm partitions the experts into d classes, the invention evaluates the clustering effect with the CH_score of formula (9) and, by comparing the CH_score values, selects the clustering result with the maximum value as the optimal one.

$$CH\_score = \frac{\mathrm{tr}(B_D)/(d-1)}{\mathrm{tr}(W_D)/(m-d)} \tag{9}$$

$$B_D = \sum_{q=1}^{d} m_q\,(c_q - c_e)(c_q - c_e)^{T} \tag{10}$$

$$W_D = \sum_{q=1}^{d} \sum_{x \in C_q} (x - c_q)(x - c_q)^{T} \tag{11}$$

Here $B_D$ is the covariance matrix between expert classes, $W_D$ is the covariance matrix within expert classes, tr is the trace of a matrix, and d is the number of classes. $C_q$ is the set of all expert evaluation results in class q, $c_q$ is the cluster center of class q, $c_e$ is the center of all expert evaluation results, and $m_q$ is the number of expert evaluation results in class q. The smaller the covariance within the classes and the larger the covariance between the classes, the higher the Calinski-Harabasz score and the better the clustering result.
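As a rough illustration of how a candidate partition can be scored, the sketch below computes the Calinski-Harabasz score in its standard form: between-class dispersion divided by d − 1, over within-class dispersion divided by m − d. It works on the traces directly, so no matrices are built. The function name `ch_score` is hypothetical, and the source does not state on which representation of the experts the score was evaluated.

```python
def ch_score(points, labels):
    """Calinski-Harabasz score of a labelled set of points (standard form).

    points : list of m equal-length feature lists (one per expert)
    labels : list of m class labels
    """
    m, dim = len(points), len(points[0])
    classes = sorted(set(labels))
    d = len(classes)
    if d < 2 or d >= m:
        raise ValueError("need at least 2 classes and fewer classes than points")
    overall = [sum(p[j] for p in points) / m for j in range(dim)]
    between = 0.0  # tr(B_D)
    within = 0.0   # tr(W_D)
    for q in classes:
        members = [p for p, lab in zip(points, labels) if lab == q]
        m_q = len(members)
        center = [sum(p[j] for p in members) / m_q for j in range(dim)]
        between += m_q * sum((c - e) ** 2 for c, e in zip(center, overall))
        within += sum(sum((x - c) ** 2 for x, c in zip(p, center))
                      for p in members)
    # Raises ZeroDivisionError if every point coincides with its class center.
    return (between / (d - 1)) / (within / (m - d))


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
    print(ch_score(pts, [0, 0, 1, 1]))  # well-separated partition
    print(ch_score(pts, [0, 1, 0, 1]))  # poorly separated partition
```

Running the clustering for several values of d and keeping the partition with the largest score matches the selection rule described above.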

(3) Calculate the inter-category expert weights. For the index weights between expert categories, the invention divides the evaluation results of the m experts into H categories $\{h_1, h_2, \ldots, h_H\}$ with the SC algorithm. For a cluster $h_i$ (i = 1, 2, …, H), the more experts the category contains and the smaller its consistency difference, the higher the weight assigned to $h_i$. The specific steps are as follows:

3.1 Construct the consistency weight differences between expert categories. Let the evaluation index weight of the ith expert obtained with the AHP algorithm be $W_i$ and its category be $h_i$, which contains the evaluation results of its member experts. The consistency weight difference between $W_i$ and the evaluation index weights of the other experts is $D_i$, as shown in formula (12); the consistency weight difference between the experts of category $h_i$ and the experts of the other categories is shown in formula (13).

3.2 Construct the weight constraints between expert categories. Taking both the number of experts and the consistency difference into account, the inter-category weight calculation model and its constraints are obtained, satisfying formulas (14) and (15), which involve the weight of expert category $h_i$.

3.3 Calculate the inter-category weight coefficients. The inter-category weight of cluster $h_i$ is calculated as shown in formula (16).

Based on the expert classification results, the weights of all expert categories are recorded as $\beta$, as shown in formula (17).

(4) Calculate the intra-category expert weights. Again starting from the expert index weights, the invention checks the consistency of the expert evaluation results, eliminates the index weights that fail the check, determines a reasonable interval for each index, and constructs an intra-category weight optimization model. The specific steps are as follows:

4.1 Determine the reasonable index interval. Suppose cluster $h_i$ contains the evaluation results of several experts. From the weight information given by these experts, each risk index has one weight value per expert, and the reasonable index interval is determined from the density distribution of these index weights.

For index j, the range of index values acceptable to all experts forms an interval whose length is $r_j$.

Let $\delta = r_j / 2$ be the consistency test criterion. If the $\delta$-neighborhood of $w_{ij}$ contains no other weight value of index j, then $w_{ij}$ is a singular point.

The reasonable interval of the jth index is determined by traversing all weight values of index j and deleting all singular points.
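The singular-point rule of step 4.1 can be sketched as follows. This is a minimal illustration under stated assumptions: δ is passed in directly rather than derived from the interval length, the neighborhood is treated as closed (distance less than or equal to δ), and the reasonable interval is taken as the span of the surviving weights. The function names `split_singular` and `reasonable_interval` are hypothetical.

```python
def split_singular(weights, delta):
    """Separate one index's weight values into kept values and singular points.

    A value is singular when no other value of the same index lies within
    delta of it (closed neighborhood assumed).
    """
    kept, singular = [], []
    for i, w in enumerate(weights):
        has_neighbor = any(abs(w - other) <= delta
                           for j, other in enumerate(weights) if j != i)
        (kept if has_neighbor else singular).append(w)
    return kept, singular


def reasonable_interval(weights, delta):
    """Interval spanned by the weights that remain after removing singular points."""
    kept, _ = split_singular(weights, delta)
    if not kept:
        raise ValueError("every weight value is a singular point")
    return min(kept), max(kept)


if __name__ == "__main__":
    # Four experts' weights for one index (illustrative values only).
    w_j = [0.10, 0.12, 0.11, 0.40]
    print(split_singular(w_j, delta=0.05))
    print(reasonable_interval(w_j, delta=0.05))
```

In the usage example the value 0.40 has no neighbor within 0.05 and is dropped, so the interval is bounded by the three clustered values.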

4.2 Construct the intra-category weight optimization model. To integrate the expert opinions within the reasonable interval as fully as possible, the objective function Obj of the model, formula (18), is the sum of the deviations between the intra-category weight values and $w_{ij}$. The constraints of the model, formula (19), are that the weights lie within the reasonable index interval and that the weight values of the experts in a category sum to 1.

Based on the expert classification results, the intra-category weights of all experts are recorded as T, as shown in formula (20).

In formula (20), $t_{ij}$ is the weight of the evaluation result of the ith expert in the jth cluster.

4.3 Weight to obtain the comprehensive index weights. The results of the intra-category weight optimization model, the clustering result and the inter-category weights are combined to obtain the comprehensive weights $S = \{s_1, s_2, \ldots, s_k\}$ of the indexes, as shown in formulas (21) and (22).

In formulas (21) and (22), $s_{ij}$ is the weighted evaluation weight of the ith expert for the jth index, and $s_i$ is the comprehensive weight of the ith index after the group decision. The remaining factor is the weight of expert i's evaluation result within category $h_i$.

4.4 Calculate the comprehensive risk value of the rice hazards. The data C cleaned in step one are weighted by the comprehensive index weights S to obtain a low-dimensional comprehensive risk value, which is the output Y of the rice hazard risk assessment model, as shown in formula (23).

$$Y = S \times C(x) \tag{23}$$

In formula (23), $C(x)$ is the vector of the k standardized hazard detection values. Each hazard detection value is multiplied by its comprehensive weight, and the products are summed to give the final rice hazard risk value Y.
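Formula (23) is a plain weighted sum, which the sketch below spells out, together with a per-hazard breakdown that shows how a dominant hazard could be located. The function names `risk_value` and `ranked_contributions`, the hazard names and all numbers in the usage example are illustrative assumptions, not values from the patent's tables.

```python
def risk_value(weights, standardized):
    """Y = sum over hazards of comprehensive weight * standardized value."""
    if len(weights) != len(standardized):
        raise ValueError("weights and standardized values must have equal length")
    return sum(s * c for s, c in zip(weights, standardized))


def ranked_contributions(names, weights, standardized):
    """Per-hazard terms of the sum, largest first, to locate the main hazard."""
    terms = [(n, s * c) for n, s, c in zip(names, weights, standardized)]
    return sorted(terms, key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    names = ["lead", "cadmium", "chromium"]  # illustrative subset
    S = [0.5, 0.3, 0.2]                      # made-up comprehensive weights
    C = [0.2, 1.0, 0.0]                      # made-up standardized values
    print(risk_value(S, C))
    print(ranked_contributions(names, S, C))
```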

The current quality and safety condition of the rice can be assessed from the output risk value Y, and the influence of each hazard on rice quality and safety can be determined from its standardized detection value and comprehensive weight, so that high-risk factors can be located. Based on the risk values obtained, supervisory bodies can give feedback, supervise rice quality and safety, and focus detection and treatment on the important hazards.

The method effectively avoids the amplification of 'invalid information' and the reduction of 'valid information' in group decision, effectively considers the opinions of all experts in a more objective mode, and further constructs a more reasonable and accurate rice hazard index system.

Step three: Construct a rice hazard risk assessment model.

Compared with traditional mathematical models, machine learning algorithms identify risk better, so the risk assessment model is built on machine learning algorithms. Because a single machine learning algorithm has relatively low assessment accuracy, the invention moves beyond the single-algorithm framework: it combines the advantages of ensemble, classification and optimization algorithms and builds, through Stacking model fusion, a rice safety risk assessment model based on the fusion of multiple machine learning algorithms, so that intuitive rice safety risk assessment results can be provided to consumers more quickly and accurately when massive and complex data are analyzed.

The Stacking model selected by the invention is an ensemble model that combines several different algorithms to improve the overall prediction accuracy of the assessment model. To ensure the accuracy of the fusion model, each learner chosen should itself predict well; the extreme gradient boosting (XGBoost) algorithm and the light gradient boosting machine (LightGBM) algorithm, both of which generalize strongly, are therefore selected as base learners. To make the information from the algorithms complement one another effectively, a long short-term memory network (LSTM), whose principle differs considerably from that of the base learners, is selected as the meta-learner of the fusion model.

To improve the accuracy of the model and save manual tuning time, the Bayesian optimization algorithm (BOA) is used to tune the parameters of the XGBoost and LightGBM models, which are tree models with many hyperparameters. For the neural network, which trains slowly, the grey wolf optimizer (GWO), which converges quickly, automatically optimizes the initial weights, the thresholds and the number of hidden-layer neurons of the LSTM. After the model parameters are optimized, the framework of the fusion model BXGB-BLGB-GLSTM is formed, as shown in FIG. 2.

As shown in FIG. 2, the data preprocessed in step one are input into the base learners, where the XGBoost and LightGBM algorithms make predictions. Their predictions, together with the data preprocessed in step one, form the input of the meta-learner, whose LSTM algorithm predicts and outputs the rice hazard risk result Y. The fusion model BXGB-BLGB-GLSTM thus carries out the calculation process of step two.

Step four: Perform model experiments.

1) Partition the data set. The data C(x) preprocessed in step one are taken as the input data of the BXGB-BLGB-GLSTM model, and the data set is divided according to a training test ratio of 3.

2) Train the model. In model fusion, to avoid the overfitting caused by the base learners learning the same data repeatedly, the method performs K-fold cross validation on the training set. K-fold cross validation is a statistical method for assessing generalization performance. The data are divided equally into K parts, each part being one fold; in each round of training, K-1 folds serve as the training set and the remaining fold serves as the validation set for verifying the model. K-fold cross validation makes full use of the data and avoids the extreme case in which differences in the data leave the training set and the validation set unevenly distributed.

The training set is divided into K sub-training sets of equal size, and each sub-training set is traversed so that the base learners (the XGBoost model and the LightGBM model) complete K rounds of training. After each base learner is trained, it outputs the results $\{x_1, x_2, \ldots, x_k\}$ on the training set and the test set respectively. M base learners thus output M test set predictions, which are combined with C(x) to form a metadata set; the meta-learner (the LSTM model) learns from the metadata set and outputs the prediction result of the BXGB-BLGB-GLSTM model.
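The K-fold procedure above can be sketched without any machine learning library. The code below is a minimal illustration of how out-of-fold predictions from base learners are appended to the original features to form the meta-learner's input. The two learners `fit_mean` and `fit_nearest` are deliberately trivial stand-ins, not XGBoost or LightGBM, and every function name here is hypothetical. The use of out-of-fold predictions for the training rows is a common Stacking arrangement and is assumed rather than quoted from the source.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_predictions(fit, X, y, k):
    """Predict each row with a model trained on the other k-1 folds."""
    n = len(X)
    oof = [None] * n
    for fold in k_fold_indices(n, k):
        held = set(fold)
        train_X = [X[i] for i in range(n) if i not in held]
        train_y = [y[i] for i in range(n) if i not in held]
        predict = fit(train_X, train_y)  # fit returns a prediction function
        for i in fold:
            oof[i] = predict(X[i])
    return oof


def build_meta_features(base_fits, X, y, k):
    """Original features plus one out-of-fold prediction column per base learner."""
    columns = [out_of_fold_predictions(fit, X, y, k) for fit in base_fits]
    return [list(X[i]) + [col[i] for col in columns] for i in range(len(X))]


def fit_mean(train_X, train_y):
    """Stand-in base learner: always predicts the training mean."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean


def fit_nearest(train_X, train_y):
    """Stand-in base learner: copies the target of the nearest training row."""
    def predict(x):
        best = min(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
        return train_y[best]
    return predict


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [0.0, 1.0, 2.0, 3.0]
    for row in build_meta_features([fit_mean, fit_nearest], X, y, k=2):
        print(row)
```

Each row of the result would then be fed to the meta-learner. Because every prediction column comes from a model that never saw that row during training, the meta-learner is not rewarded for the base learners' memorization of the training data.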

Step five: and (4) evaluating, analyzing and comparing the model.

In order to compare and illustrate the experimental results more clearly, the invention evaluates the model with 3 indexes, namely the correlation coefficient R², the mean absolute error MAE and the mean squared error MSE; each index is calculated as in formulas (24) to (26).

In formulas (24) to (26), N is the number of samples; yo_i and ym_i respectively represent the comprehensive hazard risk value and the predicted value of the i-th sample; the mean comprehensive risk value and the mean predicted value are taken over all samples. The value of R² is positively correlated with the goodness of fit of the curve; MAE and MSE are important indexes of prediction error and are negatively correlated with model accuracy.
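As a non-limiting illustration, the three indexes can be computed as follows. The function name `evaluate` is hypothetical. Because the definitions above refer to the means of both the risk values and the predicted values, the sketch assumes that R² is the squared Pearson correlation coefficient; another common definition, 1 - SS_res/SS_tot, uses only the mean of the observed values and may give a different number.

```python
def evaluate(observed, predicted):
    """Return (r2, mae, mse) for two equal-length sequences."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_o = sum(observed) / n
    mean_m = sum(predicted) / n
    cov = sum((o - mean_o) * (m - mean_m) for o, m in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_m = sum((m - mean_m) ** 2 for m in predicted)
    r2 = cov ** 2 / (var_o * var_m)  # raises ZeroDivisionError if either series is constant
    mae = sum(abs(o - m) for o, m in zip(observed, predicted)) / n
    mse = sum((o - m) ** 2 for o, m in zip(observed, predicted)) / n
    return r2, mae, mse


r2, mae, mse = evaluate([0.1, 0.2, 0.3, 0.4], [0.15, 0.25, 0.35, 0.45])
print(r2, mae, mse)
```

In this toy case the predictions are offset by a constant, so the correlation-based R² is 1 while MAE and MSE remain non-zero, which is why the three indexes are read together.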

The embodiment of the invention performs a case analysis based on 2018 rice hazard detection data from 31 provinces of China (excluding Hong Kong and Macao). The acquired data are preprocessed according to the method of step one, and a rice hazard risk index system is constructed according to the screening process, as shown in Table 3.

TABLE 3 Rice hazard risk index system

Class of risk index	Risk index
Heavy metal hazards	Lead, cadmium, chromium, total mercury, inorganic arsenic
Mycotoxin hazards	Aflatoxin B1, ochratoxin A, deoxynivalenol, zearalenone
Pollutant hazards	Benzo[a]pyrene, aluminum phosphide

For the construction of index weights, a total of 50 valid expert scoring questionnaires were collected; the results of some of them are shown in Table 4.

TABLE 4 Partial expert scoring questionnaire results

Based on the scoring results in Table 4, the expert comprehensive index weights are calculated according to step two, as shown in Table 5.

TABLE 5 Expert comprehensive index weights

In the rice hazard risk assessment model experiments, the experimental environment is a Windows 10 operating system with an i5-6200U CPU and 8 GB RAM, and the code is implemented in Python 3 on the Jupyter Notebook platform. With this configuration, the data C(x) cleaned in step one are used as the input data of the risk assessment model and the comprehensive risk value Y as its output data; the parameter configuration obtained through model training is shown in Table 6.

TABLE 6 optimal parameter configuration for each model algorithm

Here, n_estimators and epochs control the number of estimators and the number of training epochs, respectively; learning_rate is the learning rate; max_depth is the maximum depth of the tree model; seed is the random number seed; max_features is the number of features considered when searching for the best split; min_samples_split is the minimum number of samples required to split an internal node; subsample is the sample sampling rate; batch_size is the number of samples used in one training step; optimizer is the model optimizer; and activation is the activation function.

Based on the parameter configuration of Table 6, C(x) is input into the BXGB-BLGB-GLSTM risk assessment model to obtain comparison curves of the comprehensive risk value and the predicted value for each risk index; the comparison of evaluation results for the BXGB-BLGB-GLSTM model is shown in Fig. 3. The X-axis represents the sample number (unit: count) and the Y-axis represents the degree of contamination (unit: %) of each type of hazard. A contamination degree greater than 1 (i.e., Y > 1) indicates that the hazard significantly exceeds the standard; when Y ∈ (0, 1), the Y value is positively correlated with the degree of contamination by the hazard.

The evaluation result of the BXGB-BLGB-GLSTM model is compared with the predictions of single models that research has shown to predict well: the comparison of evaluation results for the XGBoost model is shown in Fig. 4, for the LightGBM model in Fig. 5, for the LSTM model in Fig. 6, for the BP model in Fig. 7, for the SVM model in Fig. 8, and for the KNN model in Fig. 9.

As can be seen from the comparison curves in Figs. 3 to 9, when Y ∈ (0.2, 0.35), the predicted values of every model agree closely with the true values; when Y ∈ (0, 0.2) ∪ (0.35, ∞), i.e. when the degree of contamination by the hazards is low or high, the fit of some models (such as KNN, SVM and BP) is poorer, and the degree of contamination tends to be overestimated when it is high and underestimated when it is low.

In order to compare the experimental results of the models more clearly, the invention evaluates them with the 3 indexes R², MAE and MSE; the comparison of the evaluation index values of each algorithm is shown in Table 7.

TABLE 7 Comparison of evaluation index parameters of each model algorithm

Model	R²	MAE	MSE
BXGB-BLGB-GLSTM	0.937165550918625	0.010853262188760	0.000205881888677
XGBoost	0.827113560494595	0.019379027245068	0.000566475670789
LightGBM	0.759188224211823	0.022529495706231	0.000789038241599
LSTM	0.746908638159939	0.021653373041345	0.000829273246528
BP	0.729470385424174	0.024493740457837	0.000886411018260
SVM	0.744607468363948	0.023041551003774	0.000836813205750
KNN	0.739849107809394	0.022235136330000	0.000852404338835
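As a non-limiting illustration, the margin between the fused model and the best single model in Table 7 (XGBoost) can be expressed as relative error reductions. The variable names are hypothetical; the numbers are copied from the table.

```python
# (R2, MAE, MSE) copied from Table 7
fused = (0.937165550918625, 0.010853262188760, 0.000205881888677)
xgboost = (0.827113560494595, 0.019379027245068, 0.000566475670789)

r2_gain = fused[0] - xgboost[0]            # absolute difference in R2
mae_reduction = 1 - fused[1] / xgboost[1]  # fraction of MAE removed
mse_reduction = 1 - fused[2] / xgboost[2]  # fraction of MSE removed

print(f"R2 gain: {r2_gain:.3f}")
print(f"MAE reduction: {mae_reduction:.1%}")
print(f"MSE reduction: {mse_reduction:.1%}")
```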

Compared with the single-model algorithms, the BXGB-BLGB-GLSTM hybrid model provided by the invention is more accurate and more stable in prediction (in Table 7, R² of 0.937 against 0.827 for the best single model, XGBoost, together with the lowest MAE and MSE). It can analyze the risk value of food safety hazards intuitively and accurately, and can provide a scientific and effective basis for the evaluation and decision-making of supervision departments.

The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent replacement or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.

Claims

1. A rice safety risk assessment method based on multi-machine learning algorithm fusion, characterized by comprising the following steps:

step one: acquiring rice hazard detection data and preprocessing the rice hazard detection data;

the preprocessing comprises noise filtering, data integration and normalization processing;

the preprocessed hazard detection data comprise standardized detection values of all hazards;

step two: constructing a rice safety risk assessment index system;

obtaining the experts' evaluation results for the rice hazard indexes, and then executing: (1) calculating the evaluation index weights of each expert based on the analytic hierarchy process (AHP), wherein the evaluation index weights are the evaluation weights of the individual rice hazard indexes; (2) dividing the experts into categories based on the spectral clustering method SC; (3) calculating the inter-expert-category weights and the intra-expert-category weights, wherein the more experts a category contains and the smaller its consistency difference, the larger the weight of that expert category; (4) finally determining the comprehensive weight of each hazard index;

for the j-th index, the evaluation weight of the i-th expert for the j-th index obtained in step (1) is w_ij; step (2) groups the evaluation results of the m experts into H classes, the i-th expert being assigned to class h_i, h_i ∈ {h_1, h_2, ..., h_H}; from step (3), the weight of class h_i is

from step (4), the weight taken by the evaluation result of expert i within class h_i is

weighting gives the weight s_ij of the i-th expert's evaluation result for the j-th index as follows:

the comprehensive weight s_j of the j-th index after the group decision is obtained as follows:

weighting the hazard detection data preprocessed in the step one with the comprehensive weight to obtain a rice hazard risk value Y;

judging the quality safety condition of the rice according to the rice hazard risk value Y;

determining the influence of the hazards on the rice quality safety according to the detection data of the hazards and the weighted value of the corresponding comprehensive weight;

step three: constructing a rice hazard risk assessment model by adopting multi-machine learning algorithm fusion;

the rice hazard risk assessment model selects two machine learning algorithms, XGBoost and LightGBM, as base learners, and selects a long short-term memory network LSTM as the meta-learner; the preprocessed hazard detection data are input into the rice hazard risk assessment model, the outputs of the two machine learning algorithms in the base learner together with the preprocessed hazard detection data are input into the meta-learner, and the model finally outputs a rice hazard risk value Y.
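As a non-limiting illustration of step two of claim 1, the comprehensive weights and the risk value Y can be sketched as follows. The function and parameter names are hypothetical. The sketch assumes that s_ij is the product of the class weight, the intra-class weight and w_ij, and that s_j is the sum of s_ij over the experts; the weighted sum for Y follows the claim.

```python
def comprehensive_weights(w, expert_class, class_weight, within_weight):
    """w[i][j]: weight expert i gives index j; expert_class[i]: class of expert i;
    class_weight[c]: weight of class c; within_weight[i]: weight of expert i
    inside its own class. Returns the list of comprehensive weights s_j."""
    s = [0.0] * len(w[0])
    for i, row in enumerate(w):
        factor = class_weight[expert_class[i]] * within_weight[i]
        for j, w_ij in enumerate(row):
            s[j] += factor * w_ij  # s_ij (assumed product form), summed over experts
    return s


def risk_value(c, s):
    """Weighted sum of the normalized detection values c[j] with weights s[j]."""
    return sum(cj * sj for cj, sj in zip(c, s))


w = [[0.5, 0.3, 0.2], [0.6, 0.2, 0.2], [0.2, 0.3, 0.5]]
s = comprehensive_weights(w, ["a", "a", "b"], {"a": 0.7, "b": 0.3}, [0.5, 0.5, 1.0])
print(s, risk_value([0.2, 0.1, 0.4], s))
```

When the class weights sum to 1, the intra-class weights sum to 1 within each class, and each expert's index weights sum to 1, the comprehensive weights also sum to 1 under this assumed form.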

2. The method of claim 1, wherein in step one, the noise filtering deletes data determined to be inconsistent with the detection result; the data integration and normalization processing unify the detection data format as floating point and standardize the hazard detection results using a trapezoidal membership function.

3. The method according to claim 2, wherein in step one, the hazard detection results in the unified data format are standardized using the following function:

wherein C(x) represents the normalized value of a hazard detection result x, x_max is the national standard value of the hazard, and x_min is the maximum value without risk.
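As a non-limiting illustration, one possible form of the normalization is sketched below. The exact function is not reproduced in the text, so the shape used here is an assumption: 0 at or below x_min, a linear ramp that reaches 1 at x_max, and values above x_max left uncapped unless `cap=True`, so that results above 1 mark exceedance of the national standard value. The function name `normalize` and the `cap` parameter are hypothetical.

```python
def normalize(x, x_min, x_max, cap=False):
    """Map a detection result x to a dimensionless value C(x).

    Assumed shape: 0 at or below x_min (no risk), linear between x_min and
    x_max, equal to 1 at x_max (the national standard value).
    """
    x = float(x)  # unify the data format as floating point
    if x_max <= x_min:
        raise ValueError("x_max must be greater than x_min")
    if x <= x_min:
        return 0.0
    value = (x - x_min) / (x_max - x_min)
    return min(value, 1.0) if cap else value


print(normalize(0.1, 0.0, 0.2), normalize(0.3, 0.0, 0.2), normalize(0.3, 0.0, 0.2, cap=True))
```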

4. The method as claimed in claim 1, wherein in step two, when the evaluation weights of the indexes are calculated based on AHP, a judgment matrix for the rice hazard indexes is constructed from the expert scoring results; the rows and columns of the judgment matrix correspond to the rice hazard indexes, and each matrix element represents the relative influence of two hazard indexes.
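As a non-limiting illustration, index weights can be derived from a judgment matrix as follows. The claim does not state how the weights are extracted from the matrix; the sketch assumes the row geometric-mean method, and the principal-eigenvector method is another common choice. The function name `ahp_weights` is hypothetical.

```python
import math


def ahp_weights(judgment):
    """Derive index weights from a reciprocal pairwise judgment matrix
    using normalized row geometric means (assumed method)."""
    n = len(judgment)
    for i in range(n):
        for j in range(n):
            # a_ij * a_ji must equal 1 in a reciprocal judgment matrix
            if abs(judgment[i][j] * judgment[j][i] - 1.0) > 1e-9:
                raise ValueError("judgment matrix must be reciprocal")
    geo = [math.prod(row) ** (1.0 / n) for row in judgment]
    total = sum(geo)
    return [g / total for g in geo]


# Index 1 is judged twice as influential as index 2 and four times index 3.
matrix = [[1.0, 2.0, 4.0],
          [0.5, 1.0, 2.0],
          [0.25, 0.5, 1.0]]
print(ahp_weights(matrix))
```

For a perfectly consistent matrix such as this one, the geometric-mean and eigenvector methods give the same weights; they differ only when the expert's pairwise judgments are inconsistent.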

5. The method according to claim 1, wherein in step two, dividing the experts into categories based on the SC algorithm comprises: first, calculating a compatibility matrix based on the experts' evaluation weights for the rice hazard indexes, wherein each element of the matrix represents the compatibility of two experts, obtained as the cosine similarity of their index weights; second, inputting the compatibility matrix into the SC algorithm, evaluating the clustering effect with the CH index, and selecting the classification result with the best clustering effect.
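As a non-limiting illustration, the compatibility matrix and the CH (Calinski-Harabasz) index can be sketched in plain Python. The spectral clustering step itself is left to a library and is not shown. The function names are hypothetical, and the sketch assumes that the CH index is evaluated on the experts' weight vectors.

```python
import math


def compatibility_matrix(weights):
    """Cosine similarity between every pair of expert weight vectors."""
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
    return [[cosine(u, v) for v in weights] for u in weights]


def calinski_harabasz(points, labels):
    """CH index: between-cluster dispersion over within-cluster dispersion,
    each divided by its degrees of freedom. Larger is better."""
    n, dim = len(points), len(points[0])
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if not 2 <= k < n:
        raise ValueError("need between 2 and n - 1 clusters")
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = within = 0.0
    for members in clusters.values():
        centre = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sum((c - o) ** 2 for c, o in zip(centre, overall))
        within += sum((p[d] - centre[d]) ** 2 for p in members for d in range(dim))
    return (between / (k - 1)) / (within / (n - k))  # fails if within == 0


experts = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.1, 0.3, 0.6], [0.1, 0.4, 0.5]]
compat = compatibility_matrix(experts)
print(calinski_harabasz(experts, [0, 0, 1, 1]), calinski_harabasz(experts, [0, 1, 0, 1]))
```

A candidate grouping that places like-minded experts together scores a much higher CH index than one that mixes them, which is the basis for selecting the classification result with the best clustering effect.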

6. The method according to claim 1, wherein in the second step, the method for calculating the weight between expert categories comprises:

step 3.1, constructing consistency weight difference values among expert categories;

let the evaluation index weight of the i-th expert be W_i and the expert's category be h_i, where class h_i contains

expert evaluation results; the consistency weight difference D_i between the evaluation index weights of the i-th expert and those of the other experts is calculated as follows:

wherein W_j is the evaluation index weight of the j-th expert, W_i = {w_i1, w_i2, ..., w_ik}, and k is the number of hazard indexes;

the consistency weight difference between the experts of class h_i and the other expert classes

is then calculated as follows:

step 3.2, constructing the weight constraint conditions among expert classes as follows:

wherein

is the weight of expert category h_i;

step 3.3, calculating the weight among the expert categories as follows:

7. The method according to claim 1, wherein in step two, the weights within an expert category are calculated as follows:

step 4.1, determining the reasonable interval of an index;

for the j-th hazard index, the range of index values accepted by all experts is

wherein:

m is the number of experts participating in the evaluation;

the interval length of the values of index j is

let the consistency check criterion be δ = r_j/2; if the δ-neighborhood of w_ij contains no other weight value of index j, then w_ij is a singular point;

traversing all weight values of index j, deleting all singular points, and determining the reasonable interval of the j-th index

step 4.2, constructing a weight optimization model within the expert category;

let the expert category h_i contain

expert evaluation index weights; the objective function Obj of the model requires that the sum of deviations between the intra-category weight value

and w_ij be minimal, as follows:

wherein t_i represents the weight of the i-th expert's evaluation result within category h_i.
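As a non-limiting illustration of step 4.1, singular points can be removed as follows. The function name `reasonable_interval` is hypothetical. The sketch takes δ as a given parameter, treats the δ-neighborhood as open, judges every value against the original set in a single pass, and assumes that the reasonable interval spans the smallest and largest remaining values; none of these details are stated in the claim.

```python
def reasonable_interval(values, delta):
    """Drop weight values that have no other value within delta of them,
    then return (low, high) of the values that remain."""
    kept = [
        v for i, v in enumerate(values)
        # keep v only if some other expert's value lies inside its delta-neighborhood
        if any(abs(v - u) < delta for k, u in enumerate(values) if k != i)
    ]
    if not kept:
        raise ValueError("every value is a singular point for this delta")
    return min(kept), max(kept)


# Three experts agree closely on index j; the fourth is isolated.
print(reasonable_interval([0.10, 0.12, 0.11, 0.40], delta=0.05))
```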

8. The method according to claim 1, wherein in step three, a Bayesian optimization algorithm BOA is selected to calibrate the parameters of the XGBoost and LightGBM models; a grey wolf optimization algorithm GWO is selected to automatically optimize the initial weights, the thresholds and the number of hidden-layer neurons of the LSTM; and the fusion model BXGB-BLGB-GLSTM is finally obtained and used as the rice hazard risk assessment model.

9. The method according to claim 1 or 8, wherein in the third step, the rice hazard risk assessment model is trained, comprising:

(1) dividing the collected hazard detection data into a training set and a test set according to the proportion of 3;

(2) dividing the training set into K sub-training sets of equal size, performing K-fold cross-validation on the training set, and completing K rounds of training of the rice hazard risk assessment model.