CN114764682B - Rice safety risk assessment method based on multi-machine learning algorithm fusion - Google Patents

Publication number
CN114764682B
Authority
CN
China
Prior art keywords
expert
weight
rice
index
hazard
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210306564.2A
Other languages
Chinese (zh)
Other versions
CN114764682A (en
Inventor
赵峙尧
王姿懿
于家斌
许继平
白玉廷
王小艺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Technology and Business University
Original Assignee
Beijing Technology and Business University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Technology and Business University
Priority to CN202210306564.2A
Publication of CN114764682A
Application granted
Publication of CN114764682B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/02 Agriculture; Fishing; Forestry; Mining
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Human Resources & Organizations (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Strategic Management (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Tourism & Hospitality (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Medical Informatics (AREA)
  • Agronomy & Crop Science (AREA)
  • Animal Husbandry (AREA)
  • Marine Sciences & Fisheries (AREA)
  • Mining & Mineral Resources (AREA)

Abstract

The invention provides a rice safety risk assessment method based on the fusion of multiple machine learning algorithms. The method comprises the following steps: acquiring rice hazard detection data and preprocessing them; starting from the hazard indexes, classifying the experts using the AHP algorithm and the SC algorithm, solving the intra-category and inter-category expert weights from the consistency weight differences of the expert evaluation results, constructing a rice safety risk assessment index system, and computing the weighted sum of the preprocessed hazard detection data and the comprehensive weights to obtain a rice hazard risk value; and fusing multiple machine learning algorithms to construct a rice safety risk assessment model that enables rapid risk assessment. The method takes the opinions of all experts into account in a more objective manner and avoids amplifying invalid information or diminishing valid information. The invention can reduce supervision costs, improve the efficiency of risk discovery and response, and provide an accurate and efficient decision basis for supervision departments.

Description

Rice safety risk assessment method based on multi-machine learning algorithm fusion
Technical Field
The invention belongs to the technical field of food quality detection and food safety risk assessment, relates to technologies such as big data processing and machine learning, and particularly relates to a rice safety risk assessment method based on multi-machine learning algorithm fusion.
Background
In recent years, food safety incidents have been frequent, placing higher demands on food safety supervision, and countries around the world have successively introduced a series of strict food safety supervision policies. To further strengthen risk monitoring, risk assessment and supply chain management, and to improve the efficiency of risk discovery and response, government departments at all levels are vigorously promoting digitalization in the food safety field, strengthening 'big data + food' supervision, and applying the advantages of technologies such as big data and artificial intelligence to food safety risk assessment and supervision.
At present, food safety risk assessment methods fall into three major categories: qualitative assessment methods, quantitative assessment methods and comprehensive risk assessment methods. A qualitative assessment method is strongly subjective: it analyzes and judges the risk indexes according to the knowledge and experience of the evaluator and calculates the index risk value from the judgment result and a matrix model. Qualitative assessment based on a single expert is relatively mature and includes the Delphi method, the analytic hierarchy process, the decision-making trial and evaluation laboratory (DEMATEL) method, the index scoring method and the like. Qualitative assessment based on multiple experts is divided into subjective weighting and objective weighting. Subjective weighting assigns the expert weights from prior information about the experts, such as prestige and knowledge level, and calculates the risk value from the resulting expert weights. Objective weighting assigns the expert weights from the consistency index values of the expert evaluation results and calculates the risk value from the resulting expert weights. In actual decision making, qualitative assessment based on multiple experts has high credibility, and in research on expert weighting the objective method is more widely applied than the subjective method. A quantitative assessment method is strongly objective: it calculates the index risk values through a mathematical model, and includes the Monte Carlo quantitative assessment method, the grey correlation theory method, the fuzzy comprehensive evaluation model, machine learning models such as artificial neural networks, and the like.
A comprehensive risk assessment method combines the qualitative and quantitative methods: an index system is established by a qualitative assessment method, and a risk assessment model is then built from the index system using a quantitative assessment method.
With the acceleration of digital transformation, food detection data are growing exponentially. The difficulty of processing and analyzing these data has become the primary problem restricting food safety risk supervision and directly affects the accuracy of data-driven risk assessment models. Among existing risk assessment methods, qualitative methods have high labor costs and long assessment cycles, while quantitative methods suffer from low index precision or poor resistance to overfitting. As a result, the risk assessment results are inaccurate, the time cost is high, and the ability to pinpoint the risk value is lost.
Disclosure of Invention
Aiming at the problems that in the prior art, food safety risk assessment time is long, assessment results are low in accuracy rate, and risks cannot be accurately located, the invention provides a rice safety risk assessment method based on multi-machine learning algorithm fusion.
The invention discloses a rice safety risk assessment method based on multi-machine learning algorithm fusion, which is realized by the following steps:
(1) Acquire rice hazard detection data and preprocess them.
The preprocessing comprises noise filtering, data integration and normalization of the detection data, in that order.
Assuming there are k kinds of hazards, the preprocessed hazard detection data comprise the normalized detection values of all the hazards.
(2) Construct a rice safety risk assessment index system.
The experts' evaluation results for the rice hazard indexes are obtained, and the following are then executed: (1) the evaluation index weights of each expert are calculated based on the Analytic Hierarchy Process (AHP), where the evaluation index weights are the evaluation weights of each rice hazard index; (2) the experts are divided into categories based on the spectral clustering method SC; (3) the inter-category and intra-category expert weights are calculated, where the more experts a category contains and the smaller its consistency difference, the greater the weight of that expert category; (4) the comprehensive weight of each hazard index is finally determined.
For the jth index, the evaluation weight of the ith expert for the jth index is calculated as w_ij. The evaluation results of the m experts are grouped into H categories by the SC algorithm, where the ith expert is assigned to category h_i, h_i ∈ {h_1, h_2, ..., h_H}. The weight of category h_i is obtained by calculation as
Figure BDA0003565697460000021
The weight of the evaluation of expert i within category h_i is
Figure BDA0003565697460000022
The weighted evaluation of the ith expert for the jth index is then obtained as
Figure BDA0003565697460000023
and the comprehensive weight of the jth index after the group decision is
Figure BDA0003565697460000024
The hazard detection data preprocessed in step (1) are weighted and summed with the comprehensive weights to obtain the rice hazard risk value Y.
(3) To provide intuitive rice safety risk assessment results more quickly and accurately, the method fuses multiple machine learning algorithms to construct a rice safety risk assessment model.
A rice hazard risk assessment model is constructed in which two machine learning algorithms, XGBoost and LightGBM, form the base learners and a long short-term memory network (LSTM) serves as the meta-learner. The preprocessed hazard detection data are input into the model; the outputs of the two base learners, together with the preprocessed hazard detection data, are input into the meta-learner; and the model finally outputs the rice hazard risk value Y.
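The data flow of this stacked model can be sketched as follows. This is only an illustration of the wiring described above: `base_a`, `base_b` and `meta` are hypothetical stand-in functions, not XGBoost, LightGBM or an LSTM, and the function name `stacked_predict` is an assumption introduced for the example.

```python
from typing import Callable, List, Sequence

Learner = Callable[[Sequence[float]], float]


def stacked_predict(x: Sequence[float],
                    base_learners: List[Learner],
                    meta_learner: Learner) -> float:
    """Feed the base-learner outputs plus the original features to the meta-learner."""
    base_outputs = [learner(x) for learner in base_learners]
    # The meta-learner sees both the base predictions and the preprocessed data.
    meta_input = base_outputs + list(x)
    return meta_learner(meta_input)


# Hypothetical stand-ins for the trained base learners and meta-learner.
def base_a(x: Sequence[float]) -> float:
    return sum(x) / len(x)


def base_b(x: Sequence[float]) -> float:
    return max(x)


def meta(z: Sequence[float]) -> float:
    # Averages the two base predictions; a trained meta-learner would also
    # make use of the remaining entries of z (the original features).
    return 0.5 * z[0] + 0.5 * z[1]


y = stacked_predict([0.2, 0.4, 0.6], [base_a, base_b], meta)
```

In a real implementation the stand-ins would be replaced by fitted models, and the base-learner outputs used to train the meta-learner would normally come from held-out folds to limit overfitting.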
The method of the invention judges the rice quality safety condition according to the rice hazard risk value Y. According to the detection data of each hazard and the weighted value of the corresponding comprehensive weight, the influence of the hazard on the quality safety of the rice can be determined, and the main hazard can be positioned.
Compared with the prior art, the invention has the advantages that:
(1) The method screens the rice safety risk indexes on the basis of a group decision model and constructs a rice safety risk index evaluation system. On the premise that the minority defers to the majority, it avoids amplifying 'invalid information' and diminishing 'valid information' in the group decision, and it takes the opinions of all experts into account in a more objective manner. The method fully considers that experts differ in knowledge level, experience and familiarity with rice hazard indexes, and constructs the rice safety risk assessment index system accordingly.
(2) The method is built on a fusion algorithm that takes into account the differences in how each algorithm views the data and in its underlying principle. A Stacking ensemble learning strategy lets the differing algorithms compensate for one another's weaknesses, so that the rice safety risk assessment model BXGB-BLGB-GLSTM can analyze the rice hazard risk value rapidly and accurately and provide a scientific and effective basis for the assessment decisions of supervision departments.
(3) The method preprocesses the hazard detection data and extracts the effective information, which can improve the prediction accuracy of the rice hazard risk assessment model.
(4) The method addresses the problems of the prior art that food safety risk assessment takes a long time, that the assessment results have low accuracy, and that risks cannot be accurately located. It can reduce supervision costs, improve the efficiency of risk discovery and response, and provide an accurate and efficient decision basis for supervision departments.
Drawings
FIG. 1 is a schematic overall flow chart of the rice safety risk assessment method of the present invention;
FIG. 2 is a schematic diagram of the framework of the hybrid model BXGB-BLGB-GLSTM of the present invention;
FIG. 3 is a comparison graph of the evaluation results of an embodiment of the present invention using the BXGB-BLGB-GLSTM model;
FIG. 4 is a comparison graph of the evaluation results using the XGBoost model according to an embodiment of the present invention;
FIG. 5 is a comparison graph of the results of the LightGBM model evaluation according to the embodiment of the invention;
FIG. 6 is a comparison graph of the results of an evaluation using the LSTM model according to an embodiment of the present invention;
FIG. 7 is a comparison graph of the evaluation results of the embodiment of the present invention using the BP model;
FIG. 8 is a comparison graph of the results of an evaluation using an SVM model according to an embodiment of the present invention;
FIG. 9 is a comparison of the results of the evaluation using the KNN model in accordance with the present invention.
Detailed Description
The invention will be described in further detail below with reference to the drawings and examples.
The invention provides a rice safety risk assessment method based on multi-machine learning algorithm fusion, which comprises five steps covering the implementation process and effect verification. Each step is described below.
Step one: preprocess the acquired rice hazard detection data.
The embodiment of the invention performs a case analysis based on the 2018 rice hazard spot-inspection data of 31 provinces (autonomous regions and municipalities directly under the central government), excluding Hong Kong and Macao. The data comprise the province of detection, the detection time, the detection items and the results, where the detection items include chromium, benzo[a]pyrene, lead, inorganic arsenic, aflatoxin B and others. By hazard type, the items are divided into heavy metal hazards, mycotoxin hazards and pollutant hazards. By detection result, the data are divided into specific values, values below a specified limit, and not detected. The result judgment is either pass or fail. Samples of the rice hazard detection data are shown in Table 1.
TABLE 1 Rice hazard detection data sample
Figure BDA0003565697460000031
Figure BDA0003565697460000041
To extract the effective information from the multivariate data, noise filtering, data integration and normalization are applied to the detection data in sequence. Preprocessing the detection data and extracting the effective information improves the prediction accuracy of the assessment model.
(1) Noise filtering. Because the hazard detection result, the detection unit and the result judgment are recorded separately, noise in the invention refers to statistical errors caused by incorrectly recorded units. Noise filtering deletes the samples whose detection result is inconsistent with the recorded result judgment.
(2) Data integration and normalization. Because the detection results are recorded in different formats, which hinders the subsequent construction of the risk assessment model, the detection data are converted to a uniform floating-point format, and the unified hazard detection results are normalized using the trapezoidal membership function of formula (1).
Figure BDA0003565697460000042
where x represents the detection result of a certain hazard, x_max is the national standard value of the hazard,
Figure BDA0003565697460000043
is the risk-free maximum value, and C(x) represents the normalized value of the hazard detection result x.
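A minimal sketch of such a normalization is given below. Because formula (1) survives only as an image reference, the exact shape is an assumption: the sketch uses a rising half-trapezoid that is 0 up to the risk-free maximum, 1 at or above the national standard value, and linear in between. The parameter name `x_safe` for the risk-free maximum is hypothetical.

```python
def normalize_detection(x: float, x_safe: float, x_max: float) -> float:
    """Map a raw detection value onto [0, 1] with a rising half-trapezoid.

    x_safe: risk-free maximum value (hypothetical name).
    x_max:  national standard value of the hazard.
    """
    if x_max <= x_safe:
        raise ValueError("x_max must be greater than x_safe")
    if x <= x_safe:
        return 0.0  # at or below the risk-free maximum: no risk
    if x >= x_max:
        return 1.0  # at or above the national standard value: full risk
    # Linear rise between the two thresholds.
    return (x - x_safe) / (x_max - x_safe)
```

For example, with an assumed risk-free maximum of 0.1 and a standard value of 0.2, a detection result of 0.15 maps to roughly 0.5.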
Step two: and constructing a rice safety risk assessment index system.
When constructing the rice safety index system, based on the rice hazard detection data and the evaluation data of authoritative industry experts, the mature Analytic Hierarchy Process (AHP) from the qualitative assessment methods is selected to collate and summarize the experts' index scoring results. The Spectral Clustering algorithm (SC), which is suited to high-dimensional clustering, adapts well to the data distribution and clusters effectively, is adopted to construct a group decision weighting model based on index weight distribution, so that the rice safety risk assessment index system is constructed in a more objective manner.
Because experts differ in knowledge level, experience and familiarity with rice hazard indexes, the rice safety risk assessment index system is constructed from the expert scoring results so as to combine the scoring characteristics of the different experts. The expert scoring results are first classified without supervision; a group decision weighting model based on index weight distribution is then constructed using an unsupervised clustering algorithm suited to high-dimensional data, so that the rice hazard risk assessment index system is constructed in a more objective manner. The specific flow is shown in FIG. 1. The scoring result of each expert for the rice hazard indexes is obtained first, and the following steps are then carried out.
(1) Calculate the evaluation index weights based on the AHP algorithm. When calculating the index weights, the AHP algorithm stratifies the rice hazard detection items to be analyzed by hazard type, constructs the judgment matrix A_{k×k} shown in formula (2) from the expert scoring results, and assigns each hazard index a corresponding weight, where k is the number of hazard indexes.
Figure BDA0003565697460000044
where the element a_ij of the judgment matrix is the relative influence of the ith hazard index compared with the jth hazard index, judged on the 1-to-9 scale, and a_ij satisfies
Figure BDA0003565697460000051
The scale values of a_ij and their meanings in the judgment matrix are shown in Table 2.
Table 2 Scale and meaning of the elements in the judgment matrix
a_ij scale | Meaning of the scale
a_ij = 1 | The ith hazard index has the same influence as the jth hazard index
a_ij = 3 | The ith hazard index has slightly stronger influence than the jth hazard index
a_ij = 5 | The ith hazard index has clearly stronger influence than the jth hazard index
a_ij = 7 | The ith hazard index has much stronger influence than the jth hazard index
a_ij = 9 | The ith hazard index has extremely stronger influence than the jth hazard index
The judgment matrix of each expert is obtained according to formula (2). For each expert's judgment matrix A_{k×k}, the maximum eigenvalue λ_max and the expert evaluation index weights W = {w_1, w_2, ..., w_k} are computed, where w_i represents the expert's evaluation weight for the ith rice hazard index, as shown in formulas (3) to (5).
AW = λ_max W (3)
Figure BDA0003565697460000052
Figure BDA0003565697460000053
The maximum eigenvalue can be used to check the consistency of the matrix.
If m experts participate in the evaluation, the evaluation weights of all experts for each hazard index are recorded as W, as shown in formula (6).
Figure BDA0003565697460000054
In formula (6), w_ij is the weight of the ith expert's evaluation of the jth index, obtained by the AHP algorithm. The superscript T denotes transposition.
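The eigenvector calculation of formula (3) and the consistency check can be sketched as below. The sketch solves AW = λ_max W by power iteration, which may differ from the computation in formulas (4) and (5), since those survive only as image references. The consistency ratio uses Saaty's standard random index table, which the text does not spell out, and the function name `ahp_weights` is an assumption.

```python
from typing import List, Tuple

# Saaty's random consistency index for matrix orders 1 to 9.
RANDOM_INDEX = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45]


def ahp_weights(A: List[List[float]],
                iterations: int = 100) -> Tuple[List[float], float, float]:
    """Return (weights, lambda_max, consistency_ratio) for a judgment matrix A."""
    k = len(A)
    if not 1 <= k <= len(RANDOM_INDEX):
        raise ValueError("this sketch supports matrix orders 1 to 9")
    w = [1.0 / k] * k
    for _ in range(iterations):
        # Power iteration: multiply by A, then renormalize so the weights sum to 1.
        Aw = [sum(A[i][j] * w[j] for j in range(k)) for i in range(k)]
        total = sum(Aw)
        w = [v / total for v in Aw]
    Aw = [sum(A[i][j] * w[j] for j in range(k)) for i in range(k)]
    lambda_max = sum(Aw[i] / w[i] for i in range(k)) / k
    ci = (lambda_max - k) / (k - 1) if k > 1 else 0.0
    ri = RANDOM_INDEX[k - 1]
    cr = ci / ri if ri > 0 else 0.0
    return w, lambda_max, cr


# A perfectly consistent 3x3 judgment matrix (illustrative values only).
A = [[1.0, 2.0, 4.0],
     [0.5, 1.0, 2.0],
     [0.25, 0.5, 1.0]]
weights, lam, cr = ahp_weights(A)
```

A consistency ratio below 0.1 is the usual AHP acceptance threshold. Stacking the weight vectors of all m experts row by row gives the matrix W of formula (6).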
(2) Divide the experts into categories based on the SC algorithm. The SC algorithm is a clustering method based on graph theory. Its main idea is to treat the high-dimensional sample data as points in space and to connect all data points by edges, where the edge between two nearby points has a high weight and the edge between two distant points has a low weight. The graph is then cut so that the sum of the edge weights within each sub-graph is as large as possible and the sum of the edge weights between different sub-graphs is as small as possible, which clusters the high-dimensional sample data.
To improve the objectivity of the index weights and reduce subjective error, the invention combines the scoring characteristics of the different experts and uses the SC algorithm, which is suited to high-dimensional clustering, adapts well to the data distribution and clusters effectively, to classify the expert scoring results without supervision. The method calculates the compatibility between experts from the cosine similarity of their high-dimensional index weights, constructs a compatibility matrix, and uses the expert compatibility as the input of the SC algorithm. The cosine similarity l is shown in formula (7).
Figure BDA0003565697460000061
In formula (7), W_x and W_y represent the evaluation index weights of experts x and y, respectively, and k is the number of hazard indexes.
From the similarity calculation in formula (7), an m×m compatibility matrix L is obtained, as shown in formula (8).
Figure BDA0003565697460000062
The element l_xy (x, y = 1, 2, ..., m) of the matrix represents the compatibility of experts x and y, calculated according to formula (7).
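The compatibility matrix can be sketched as follows, assuming formula (7) is the standard cosine similarity between two weight vectors. The function names and the example weights are illustrative only.

```python
import math
from typing import List, Sequence


def cosine_similarity(wx: Sequence[float], wy: Sequence[float]) -> float:
    """Cosine similarity between the index weight vectors of two experts."""
    dot = sum(a * b for a, b in zip(wx, wy))
    norm_x = math.sqrt(sum(a * a for a in wx))
    norm_y = math.sqrt(sum(b * b for b in wy))
    return dot / (norm_x * norm_y)


def compatibility_matrix(W: List[Sequence[float]]) -> List[List[float]]:
    """Build the m x m matrix L whose entry l_xy is the compatibility of experts x and y."""
    m = len(W)
    return [[cosine_similarity(W[x], W[y]) for y in range(m)] for x in range(m)]


# Three experts scoring k = 3 hazard indexes (illustrative weights only).
W = [[0.5, 0.3, 0.2],
     [0.5, 0.3, 0.2],
     [0.2, 0.3, 0.5]]
L = compatibility_matrix(W)
```

The resulting matrix is symmetric with ones on the diagonal, so it can be passed to a spectral clustering implementation as a precomputed affinity matrix.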
When the SC algorithm divides the experts into d categories, the invention uses the CH_score shown in formula (9) to evaluate the clustering effect and, by comparing the CH_score values, selects the clustering result with the largest value as the optimal clustering result.
Figure BDA0003565697460000063
Figure BDA0003565697460000064
Figure BDA0003565697460000065
where B_D is the between-category covariance matrix, W_D is the within-category covariance matrix, tr denotes the trace of a matrix, and d is the number of categories. Let C_q denote the set of all expert evaluation results in category q, c_q the cluster center of category q, c_e the center of all expert evaluation results, and m_q the number of expert evaluation results in category q. According to the principle of spectral clustering, the smaller the within-category covariance and the larger the between-category covariance, the higher the Calinski-Harabasz score and the better the clustering result.
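The score can be sketched as below, assuming formulas (9) to (11) follow the standard Calinski-Harabasz definition, in which the traces of B_D and W_D reduce to sums of squared distances. The function name and the toy data are illustrative only.

```python
from typing import List, Sequence


def calinski_harabasz(points: List[Sequence[float]], labels: List[int]) -> float:
    """Calinski-Harabasz score: [tr(B_D) / (d - 1)] / [tr(W_D) / (m - d)]."""
    m = len(points)
    dim = len(points[0])
    clusters = sorted(set(labels))
    d = len(clusters)
    if d < 2 or d >= m:
        raise ValueError("need at least 2 clusters and fewer clusters than points")
    # c_e: center of all evaluation results.
    overall = [sum(p[j] for p in points) / m for j in range(dim)]
    between = 0.0  # tr(B_D)
    within = 0.0   # tr(W_D)
    for q in clusters:
        members = [p for p, lab in zip(points, labels) if lab == q]
        m_q = len(members)
        center = [sum(p[j] for p in members) / m_q for j in range(dim)]
        between += m_q * sum((c - e) ** 2 for c, e in zip(center, overall))
        within += sum(sum((x - c) ** 2 for x, c in zip(p, center)) for p in members)
    if within == 0.0:
        return float("inf")  # every point coincides with its cluster center
    return (between / (d - 1)) / (within / (m - d))


points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
good = calinski_harabasz(points, [0, 0, 1, 1])  # compact, well-separated clusters
bad = calinski_harabasz(points, [0, 1, 0, 1])   # clusters that straddle the gap
```

Running the clustering for several candidate values of d and keeping the labelling with the largest score implements the selection rule described above.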
(3) Calculate the weights between the expert categories. To calculate the index weights between expert categories, the evaluation results of the m experts are divided by the SC algorithm into H categories, expressed as {h_1, h_2, ..., h_H}. For a cluster h_i (i = 1, 2, ..., H), the more experts the category contains and the smaller its consistency difference, the higher the weight assigned to h_i. The specific steps are as follows:
Step 3.1: construct the consistency weight difference values between expert categories. Let the evaluation index weight of the ith expert obtained by the AHP algorithm be W_i, let its category be h_i, and let h_i contain
Figure BDA0003565697460000066
expert evaluation results. The consistency weight difference between W_i and the evaluation index weights of the other experts is D_i, as shown in formula (12), and the consistency weight difference between the experts of category h_i and those of the other categories is
Figure BDA0003565697460000067
as shown in formula (13).
Figure BDA0003565697460000068
Figure BDA0003565697460000069
Step 3.2: construct the weight constraint conditions between expert categories. Considering both the number of experts and the consistency difference, the inter-category weight calculation model and its constraint conditions are obtained, satisfying formulas (14) and (15).
Figure BDA0003565697460000071
Figure BDA0003565697460000072
where
Figure BDA0003565697460000073
is the weight of expert category h_i.
Step 3.3: calculate the weight coefficients between expert categories. The inter-category weight of cluster h_i,
Figure BDA0003565697460000074
is obtained by calculation, as shown in formula (16).
Figure BDA0003565697460000075
Based on the expert classification results, the weights of all expert categories are recorded as β, as shown in formula (17).
Figure BDA0003565697460000076
(4) Calculate the weights within the expert categories. Again starting from the expert index weights, the invention performs a consistency check on the expert evaluation results, eliminates the index weights that fail the check, determines a reasonable interval for each index, and constructs an intra-category weight optimization model. The specific steps are as follows:
Step 4.1: determine the reasonable index intervals. Suppose cluster h_i contains
Figure BDA0003565697460000077
experts. Based on the weight information given by the experts, each risk index has
Figure BDA0003565697460000078
weight values. The density distribution of these
Figure BDA0003565697460000079
index weights is used to determine the reasonable index interval.
For index j, the range of index values acceptable to all experts is
Figure BDA00035656974600000710
which satisfies:
Figure BDA00035656974600000711
Let the length of the index value interval be r; for index j it satisfies
Figure BDA00035656974600000712
Let δ = r_j/2 be the consistency test criterion. If the δ-neighborhood of w_ij contains no other weight value of index j, then w_ij is a singular point.
By traversing all weight values of index j and deleting all singular points, the reasonable interval of the jth index is determined as
Figure BDA00035656974600000713
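The singular-point test of step 4.1 can be sketched as follows. The threshold δ is passed in as a parameter because its derivation from the interval length is given in formula images that are not recoverable here. Treating the neighborhood as closed, and taking the reasonable interval as the minimum and maximum of the surviving weights, are both assumptions of this sketch.

```python
from typing import List, Tuple


def remove_singular_points(weights: List[float], delta: float) -> List[float]:
    """Drop every weight whose delta-neighborhood holds no other weight of the same index."""
    kept = []
    for i, w in enumerate(weights):
        has_neighbour = any(
            abs(w - other) <= delta
            for j, other in enumerate(weights)
            if j != i
        )
        if has_neighbour:
            kept.append(w)
    return kept


def reasonable_interval(weights: List[float], delta: float) -> Tuple[float, float]:
    """Return (low, high) spanned by the weights that survive the singular-point test."""
    kept = remove_singular_points(weights, delta)
    if not kept:
        raise ValueError("every weight was classified as a singular point")
    return min(kept), max(kept)


# Weights that four experts in one category gave to index j (illustrative values).
weights_j = [0.20, 0.22, 0.21, 0.45]
kept_j = remove_singular_points(weights_j, delta=0.05)
interval_j = reasonable_interval(weights_j, delta=0.05)
```

Here the weight 0.45 has no neighbor within 0.05 and is removed as a singular point, leaving the interval from 0.20 to 0.22.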
And 4.2, constructing a weight optimization model in the expert category. In order to maximally integrate the expert opinions in a reasonable interval, the objective function Obj in the model as the formula (18) satisfies the weight value in the expert category
Figure BDA00035656974600000714
And w ij The sum of the deviations of (a); the constraint condition in the model is that T is in the reasonable index interval, and the sum of the weight values of the experts in the category is 1, as shown in formula (19).
Figure BDA00035656974600000715
Figure BDA00035656974600000716
And (4) recording the weight results in all the expert categories as T based on the expert classification results, as shown in the formula (20).
Figure BDA0003565697460000081
Wherein, t ij And the weight of the ith expert evaluation result in the jth cluster is calculated.
Step 4.3: weight to obtain the comprehensive index weights. The results of the intra-category weight optimization model, the clustering results, and the inter-category weights are combined by weighting to obtain the comprehensive weight of each index, S = {s_1, s_2, ..., s_k}, as shown in formulas (21) and (22).
Figure BDA0003565697460000082
Figure BDA0003565697460000083
where s_ij is the weighted weight of the ith expert on the jth index, and s_i is the comprehensive weight of the ith index after the group decision. Here
Figure BDA0003565697460000084
denotes the weight of the evaluation result of expert i within category h_i.
Step 4.4: calculate the comprehensive risk value of the rice hazards. The data C cleaned in step one is weighted by the comprehensive index weights S to obtain a low-dimensional comprehensive risk value, which is the output value Y of the rice hazard risk assessment model, as shown in formula (23).
Y=S×C(x) (23)
where C(x) is the vector of the k normalized hazard detection values. The detection value of each hazard is multiplied by its comprehensive weight, and the products are summed to give the final rice hazard risk value Y.
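Formula (23) is a weighted sum, which can be illustrated with a minimal Python sketch. The hazard names, weights, and normalized detection values below are hypothetical and are not taken from Table 5 or from the detection data; the helper names `risk_value` and `contributions` are likewise illustrative.

```python
def risk_value(weights, normalized):
    """Y = S x C(x): sum of comprehensive weight times normalized detection value."""
    if len(weights) != len(normalized):
        raise ValueError("S and C(x) must have the same length k")
    return sum(s * c for s, c in zip(weights, normalized))


def contributions(names, weights, normalized):
    """Per-hazard weighted terms, largest first, to locate high-risk factors."""
    terms = [(n, s * c) for n, s, c in zip(names, weights, normalized)]
    return sorted(terms, key=lambda term: term[1], reverse=True)


# Hypothetical example with k = 3 hazards.
names = ["lead", "cadmium", "aflatoxin B1"]
S = [0.5, 0.3, 0.2]      # comprehensive weights (hypothetical)
C = [0.2, 0.6, 0.1]      # normalized detection values (hypothetical)
print(risk_value(S, C))
print(contributions(names, S, C))
```

Sorting the individual terms is one simple way to see which hazard contributes most to Y.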
The current quality and safety status of the rice can be determined from the output risk value Y, and the influence of each hazard on rice quality and safety can be determined from its standardized detection value and comprehensive weight, so that high-risk factors can be located. Supervisory agencies can act on the resulting risk values to supervise rice quality and safety and to carry out detection and treatment of key hazards.
The method effectively avoids amplifying "invalid information" and diminishing "valid information" in group decision making, takes the opinions of all experts into account in a more objective way, and thereby constructs a more reasonable and accurate rice hazard index system.
Step three: construct the rice hazard risk assessment model.
Compared with traditional mathematical models, machine learning algorithms have stronger risk identification capability, so the risk assessment model is built on machine learning. Because a single machine learning algorithm has limited evaluation accuracy, the invention moves beyond a risk assessment framework built on a single algorithm: it combines the advantages of ensemble, classification, and optimization algorithms and constructs, through Stacking model fusion, a rice safety risk assessment model based on the fusion of multiple machine learning algorithms, so that intuitive rice safety risk assessment results can be provided to consumers more quickly and accurately when massive and complex data are analyzed.
The Stacking model adopted by the invention is an ensemble model that combines several different algorithms to improve the overall prediction accuracy of the assessment model. To ensure the accuracy of the fusion model, each selected learner should have good independent prediction capability, so the extreme gradient boosting algorithm XGBoost (Extreme Gradient Boosting) and the light gradient boosting machine algorithm LightGBM (Light Gradient Boosting Machine), both of which generalize well, are selected as base learners. To make the information from the algorithms effectively complementary, the long short-term memory network LSTM, whose principle differs considerably from that of the base learners, is selected as the meta-learner to construct the fusion model.
To improve the accuracy of the model and save manual tuning time, the Bayesian optimization algorithm (BOA) is used to tune the parameters of the XGBoost and LightGBM models, which are tree models with many hyperparameters. For the neural network algorithm, which trains slowly, the grey wolf optimization algorithm (GWO), which converges quickly, is used to automatically optimize the initial weights, the thresholds, and the number of hidden-layer neurons of the LSTM algorithm. After the model parameters are optimized, the framework of the fusion model BXGB-BLGB-GLSTM is finally formed, as shown in FIG. 2.
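For orientation, the general shape of a grey wolf optimizer can be sketched in a few lines of Python. This is a generic textbook-style variant minimizing a toy sphere function, not the configuration used by the invention: the objective, bounds, population size, and iteration count are all hypothetical, and in the described method the objective would instead score an LSTM configuration.

```python
import random


def grey_wolf_optimize(objective, bounds, n_wolves=20, n_iter=100, seed=0):
    """Minimize `objective` over box `bounds` with a basic grey wolf optimizer."""
    rng = random.Random(seed)
    dim = len(bounds)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_pos, best_val = None, float("inf")
    for it in range(n_iter):
        ranked = sorted(wolves, key=objective)
        leaders = [list(w) for w in ranked[:3]]      # alpha, beta, delta wolves
        val = objective(leaders[0])
        if val < best_val:                           # remember the best seen so far
            best_pos, best_val = list(leaders[0]), val
        a = 2.0 * (1 - it / n_iter)                  # shrinks linearly from 2 toward 0
        new_wolves = []
        for wolf in wolves:
            new_pos = []
            for d in range(dim):
                steps = []
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    dist = abs(C * leader[d] - wolf[d])
                    steps.append(leader[d] - A * dist)
                lo, hi = bounds[d]
                # average the three leader-guided moves and clip to the bounds
                new_pos.append(min(max(sum(steps) / 3, lo), hi))
            new_wolves.append(new_pos)
        wolves = new_wolves
    final = min(wolves, key=objective)
    if objective(final) < best_val:
        best_pos, best_val = list(final), objective(final)
    return best_pos, best_val


def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)


pos, val = grey_wolf_optimize(sphere, [(-5.0, 5.0)] * 2, seed=1)
print(pos, val)
```

The three best wolves guide the rest of the pack, and the coefficient `a` moves the search from exploration to exploitation as the iterations proceed.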
As shown in FIG. 2, the data preprocessed in step one is input into the base learners, where the XGBoost and LightGBM algorithms each make predictions. Their output predictions, together with the data preprocessed in step one, form the input data of the meta-learner, whose LSTM algorithm predicts and outputs the rice hazard risk result Y. The fusion model BXGB-BLGB-GLSTM thus realizes the calculation process of step two.
Step four: perform model experiments.
1) Partition the data set. The data C(x) preprocessed in step one is taken as the input data of the BXGB-BLGB-GLSTM model, and the data set is divided according to a training test ratio of 3.
2) Train the model. In model fusion, to avoid the overfitting caused by the base learners repeatedly learning the same data, the method performs K-fold cross validation on the training set. K-fold cross validation is a statistical method for assessing generalization performance: the data are divided equally into K parts, each called a fold; in each round of training, K-1 folds are used as the training set and the remaining fold is used as the validation set to verify the model. K-fold cross validation makes full use of the data and avoids the extreme case in which differences in the data leave the training set and the validation set unevenly distributed.
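The K-fold partition described above can be sketched as follows. The function name `k_fold_splits` is hypothetical, the folds are taken as contiguous index blocks without shuffling, and when the sample count is not divisible by K the first folds receive one extra sample; these choices are assumptions of the sketch rather than details given in the text.

```python
def k_fold_splits(n_samples, k):
    """Return a list of (train_indices, validation_indices) pairs for K folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Fold sizes differ by at most one sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    splits = []
    for i, validation in enumerate(folds):
        # The other K-1 folds together form the training part of this round.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits


for train, validation in k_fold_splits(10, 5):
    print(train, validation)
```

Every sample appears in exactly one validation fold, so each sample is used for validation once and for training K-1 times.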
The training set is divided into K sub-training sets of equal size, and each sub-training set is traversed so that the base learners (the XGBoost model and the LightGBM model) complete K rounds of training. After training, each base learner outputs results {x_1, x_2, ..., x_k} on the training set and the test set respectively. For M base learners, M test set predictions are output; these are combined with C(x) to form a metadata set, which is learned by the meta-learner (the LSTM model) to output the prediction result of the BXGB-BLGB-GLSTM model.
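How the metadata set is assembled can be sketched with out-of-fold predictions. To stay self-contained, the sketch below uses two trivial stand-in regressors instead of XGBoost and LightGBM and stops before the meta-learner; the class names, the interleaved fold layout, and the averaging of test predictions over the K rounds are assumptions of this sketch, not details stated in the text.

```python
class MeanRegressor:
    """Stand-in base learner: always predicts the training mean."""
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class NearestNeighborRegressor:
    """Stand-in base learner: copies the target of the closest training row."""
    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        out = []
        for row in X:
            dists = [sum((a - b) ** 2 for a, b in zip(row, ref)) for ref in self.X_]
            out.append(self.y_[dists.index(min(dists))])
        return out


def build_meta_features(X_train, y_train, X_test, learner_factories, k):
    """Append one prediction column per base learner to the original features."""
    n = len(X_train)
    folds = [list(range(i, n, k)) for i in range(k)]     # interleaved folds
    meta_train = [list(row) for row in X_train]
    meta_test = [list(row) for row in X_test]
    for make_learner in learner_factories:
        out_of_fold = [None] * n
        test_avg = [0.0] * len(X_test)
        for val_idx in folds:
            held_out = set(val_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            model = make_learner().fit([X_train[i] for i in train_idx],
                                       [y_train[i] for i in train_idx])
            # Each training row is predicted by a model that never saw it.
            preds = model.predict([X_train[i] for i in val_idx])
            for i, p in zip(val_idx, preds):
                out_of_fold[i] = p
            for j, p in enumerate(model.predict(X_test)):
                test_avg[j] += p / k
        for row, p in zip(meta_train, out_of_fold):
            row.append(p)
        for row, p in zip(meta_test, test_avg):
            row.append(p)
    return meta_train, meta_test


X = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0], [5.0, 0.0]]
y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
meta_train, meta_test = build_meta_features(
    X, y, [[2.2, 0.0]], [MeanRegressor, NearestNeighborRegressor], k=3)
print(meta_train[0], meta_test[0])
```

The resulting rows (original features plus M prediction columns) are what a meta-learner would then be trained on.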
Step five: evaluate, analyze, and compare the models.
To compare and illustrate the experimental results more clearly, the invention evaluates the model with 3 indexes: the correlation coefficient R², the mean absolute error MAE, and the mean squared error MSE, calculated as in formulas (24) to (26).
Figure BDA0003565697460000091
Figure BDA0003565697460000092
Figure BDA0003565697460000101
In formulas (24) to (26), N is the number of samples; yo_i and ym_i are the comprehensive hazard risk value and the predicted value of the ith sample, respectively;
Figure BDA0003565697460000102
are the mean comprehensive risk value and the mean predicted value over all samples, respectively. R² is positively correlated with the goodness of fit of the curve; MAE and MSE are important measures of prediction error, and the smaller they are, the more accurate the model.
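The three indexes can be computed as in the sketch below. Formulas (24) to (26) are images that are not reproduced in the text, so MAE and MSE follow their standard definitions, and R² is computed as the squared Pearson correlation between the true and predicted values. That choice for R² is an assumption, inferred from the term "correlation coefficient" and from the fact that both sample means are defined above; the sample values are hypothetical.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(o - m) for o, m in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((o - m) ** 2 for o, m in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Squared Pearson correlation (assumed form of the patent's R^2)."""
    n = len(y_true)
    mean_o = sum(y_true) / n
    mean_m = sum(y_pred) / n
    cov = sum((o - mean_o) * (m - mean_m) for o, m in zip(y_true, y_pred))
    var_o = sum((o - mean_o) ** 2 for o in y_true)
    var_m = sum((m - mean_m) ** 2 for m in y_pred)
    if var_o == 0 or var_m == 0:
        raise ValueError("R^2 is undefined for constant inputs")
    return cov ** 2 / (var_o * var_m)


# Hypothetical comprehensive risk values and predictions.
yo = [0.1, 0.2, 0.3, 0.4]
ym = [0.1, 0.25, 0.3, 0.35]
print(r_squared(yo, ym), mae(yo, ym), mse(yo, ym))
```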
The embodiment of the invention performs a case analysis based on the 2018 rice hazard detection data of 31 provinces of China (excluding Hong Kong and Macao). The obtained data are preprocessed according to the method of step one, and a rice hazard risk index system is constructed according to the screening process, as shown in Table 3.
TABLE 3 Rice hazard Risk indicator System
Risk index category | Risk indexes
Heavy metal hazards | Lead, cadmium, chromium, total mercury, inorganic arsenic
Mycotoxin hazards | Aflatoxin B1, ochratoxin A, deoxynivalenol, zearalenone
Pollutant hazards | Benzo[a]pyrene, aluminum phosphide
For index weight construction, a total of 50 valid expert scoring questionnaires were collected; some of the questionnaire results are shown in Table 4.
TABLE 4 Partial expert scoring questionnaire results
Figure BDA0003565697460000103
The comprehensive index weights of the experts are then calculated from the scoring results in Table 4 according to step two, as shown in Table 5.
TABLE 5 expert synthetic index weights
Figure BDA0003565697460000104
Figure BDA0003565697460000111
For the rice hazard risk assessment model experiments, the experimental environment is a Win10 operating system with an i5-6200U CPU and 8 GB RAM, and the code is implemented in Python 3 on the Jupyter Notebook platform. Under this configuration, the data C(x) cleaned in step one is used as the input data of the risk assessment model and the comprehensive risk value Y as its output data; the parameter configuration obtained through model training is shown in Table 6.
TABLE 6 optimal parameter configuration for each model algorithm
Figure BDA0003565697460000112
Here n_estimators and epochs control the number of estimators and the number of training epochs, respectively; learning_rate is the learning rate; max_depth is the maximum depth of the tree model; seed is the random number seed; max_features is the number of features considered when searching for the best split; min_samples_split is the minimum number of samples required to split an internal node; subsample is the sample sampling rate; batch_size is the number of samples used in one training step; optimizer is the model optimizer; and activation is the activation function.
Based on the parameter configuration in Table 6, C(x) is input into the BXGB-BLGB-GLSTM risk assessment model to obtain comparison curves of the comprehensive risk value and the predicted value for each risk index; the comparison of evaluation results based on the BXGB-BLGB-GLSTM model is shown in FIG. 3. The X-axis represents the number of samples (unit: one), and the Y-axis represents the degree of contamination of each type of hazard (unit: %). A contamination degree greater than 1 (i.e., Y > 1) indicates that the hazard significantly exceeds the standard; when Y ∈ (0, 1), the Y-axis value is positively correlated with the degree of contamination of the hazard.
The evaluation results of the BXGB-BLGB-GLSTM model are compared with the prediction results of single models that research has shown to predict well: the comparison of evaluation results based on the XGBoost model is shown in FIG. 4, the LightGBM model in FIG. 5, the LSTM model in FIG. 6, the BP model in FIG. 7, the SVM model in FIG. 8, and the KNN model in FIG. 9.
As can be seen from the comparison curves in FIGS. 3 to 9, when Y ∈ (0.2, 0.35), the predicted values of every model agree closely with the true values; when Y ∈ (0, 0.2) ∪ (0.35, ∞), i.e., when the degree of contamination of the hazards is low or high, some models (such as KNN, SVM, and BP) fit poorly on average and tend to overestimate (underestimate) the degree of contamination when it is high (low).
To compare the experimental results of the models more clearly, the invention evaluates the models with the 3 indexes R², MAE, and MSE; the comparison of the evaluation index parameters of each algorithm is shown in Table 7.
TABLE 7 comparison of evaluation index parameters of each model algorithm
Model | R² | MAE | MSE
BXGB-BLGB-GLSTM | 0.937165550918625 | 0.010853262188760 | 0.000205881888677
XGBoost | 0.827113560494595 | 0.019379027245068 | 0.000566475670789
LightGBM | 0.759188224211823 | 0.022529495706231 | 0.000789038241599
LSTM | 0.746908638159939 | 0.021653373041345 | 0.000829273246528
BP | 0.729470385424174 | 0.024493740457837 | 0.000886411018260
SVM | 0.744607468363948 | 0.023041551003774 | 0.000836813205750
KNN | 0.739849107809394 | 0.022235136330000 | 0.000852404338835
Compared with single model algorithms, the BXGB-BLGB-GLSTM hybrid model provided by the invention predicts with higher accuracy and stronger stability, can intuitively and accurately analyze the risk value of food safety hazards, and can provide a scientific and effective basis for the evaluation and decision making of supervisory departments.
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent replacement or change that a person skilled in the art makes, within the technical scope disclosed by the present invention, according to its technical solutions and inventive concept shall fall within the scope of protection of the present invention.

Claims (9)

1. A rice safety risk assessment method based on multi-machine learning algorithm fusion is characterized by comprising the following steps:
step one: acquiring rice hazard detection data and preprocessing the rice hazard detection data;
the preprocessing comprises noise filtering, data integration and normalization processing;
the preprocessed hazard detection data comprise standardized detection values of all hazards;
step two: constructing a rice safety risk assessment index system;
obtaining the evaluation result of the expert on the rice hazard indexes, and then executing: (1) Firstly, calculating the evaluation index weight of each expert based on an Analytic Hierarchy Process (AHP), wherein the evaluation index weight refers to the evaluation weight of each rice hazard index; (2) dividing the expert categories based on a spectral clustering method SC; (3) Calculating inter-expert-category weights and intra-expert-category weights; wherein, the more the number of experts in the category is, the smaller the consistency difference is, the larger the weight of the expert category is; (4) finally determining the comprehensive weight of each hazard index;
for the jth index, the evaluation weight of the ith expert on the jth index obtained in step (1) is w_ij; the evaluation results of the m experts are grouped into H classes by step (2), wherein the ith expert is classified into class h_i, h_i ∈ {h_1, h_2, ..., h_H}; the weight of category h_i obtained from step (3) is
Figure FDA0004041816000000011
the weight of the evaluation result of expert i within category h_i obtained from step (4) is
Figure FDA0004041816000000012
the weight s_ij of the ith expert evaluation result on the jth index is obtained by weighting as follows:
Figure FDA0004041816000000013
the comprehensive weight s_i of the jth index after the group decision is obtained as follows:
Figure FDA0004041816000000014
weighting the hazard detection data preprocessed in the step one with the comprehensive weight to obtain a rice hazard risk value Y;
judging the quality safety condition of the rice according to the rice hazard risk value Y;
determining the influence of the hazards on the rice quality safety according to the detection data of the hazards and the weighted value of the corresponding comprehensive weight;
step three: constructing a rice hazard risk assessment model by adopting multi-machine learning algorithm fusion;
the rice hazard risk assessment model selects two machine learning algorithms of XGBoost and LightGBM to form a base learner, and selects a long short-term memory network LSTM as a meta-learner; inputting the preprocessed hazard detection data into a rice hazard risk assessment model, inputting the output of two machine learning algorithms in the base learner and the preprocessed hazard detection data into the meta-learner, and finally outputting a rice hazard risk value Y by the model.
2. The method of claim 1, wherein in the first step, the noise filtering is to delete data that is determined to be inconsistent with the detection result; the data integration and normalization processing means unifying the detection data format as a floating point type and standardizing the hazard detection results by using a trapezoidal membership function.
3. The method according to claim 2, wherein in the first step, the hazard detection result after the unified data format is standardized by using the following function;
Figure FDA0004041816000000021
wherein C (x) represents a value normalized to a hazard detection result x, x max Is the national standard value of the hazard, x min In order to be the maximum value without risk,
Figure FDA0004041816000000022
4. the method as claimed in claim 1, wherein in the second step, when the evaluation weight of the index is calculated based on the AHP, a judgment matrix for the rice hazard index is constructed according to the expert scoring result, the rows and columns of the judgment matrix correspond to the rice hazard index, and the matrix elements represent the relative influence of the two hazard indexes.
5. The method according to claim 1, wherein the step two, when classifying the expert categories based on the SC algorithm, comprises: firstly, calculating a compatibility matrix based on the evaluation weight of experts on rice hazardous material indexes, wherein elements in the matrix represent the compatibility of two experts, and the compatibility is obtained by calculating cosine similarity in the indexes; secondly, inputting the compatibility matrix into an SC algorithm, evaluating the clustering effect by using the CH index, and selecting the classification result with the best clustering effect.
6. The method according to claim 1, wherein in the second step, the method for calculating the weight between expert categories comprises:
step 3.1, constructing consistency weight difference values among expert categories;
let the evaluation index weight of the ith expert be W_i and the expert category be h_i, wherein category h_i contains
Figure FDA0004041816000000023
expert evaluation results; the consistency weight difference value D_i between the evaluation index weights of the ith expert and those of the other experts is calculated as follows:
Figure FDA0004041816000000024
wherein W_j is the evaluation index weight of the jth expert, W_i = {w_i1, w_i2, ..., w_ik}; k is the number of hazard indexes;
then the consistency weight difference value between the experts of category h_i and the other expert categories,
Figure FDA0004041816000000025
is calculated as follows:
Figure FDA0004041816000000026
step 3.2, constructing weight constraint conditions among expert classes as follows:
Figure FDA0004041816000000027
Figure FDA0004041816000000028
wherein
Figure FDA0004041816000000029
is the weight of expert category h_i;
step 3.3, calculating the weight among the expert categories as follows:
Figure FDA00040418160000000210
7. the method according to claim 1, wherein in the second step, the weight in the expert category is calculated as follows:
step 4.1, determining an index reasonable interval;
for the jth hazard index, the range of acceptable indexes of all experts is
Figure FDA0004041816000000031
Wherein:
Figure FDA0004041816000000032
the number of the experts participating in the evaluation is m;
the interval length of the values of index j is
Figure FDA0004041816000000033
let the consistency check criterion be δ = r_j/2; if the δ-neighborhood of w_ij contains no other weight value of index j, then w_ij is a singular point;
all weight values of index j are traversed and all singular points are deleted to determine the reasonable interval of the jth index
Figure FDA0004041816000000034
Step 4.2, constructing a weight optimization model in the expert category;
let expert category h_i contain
Figure FDA0004041816000000035
expert evaluation index weights; the objective function Obj of the model minimizes the sum of the deviations between the intra-category weight value
Figure FDA0004041816000000036
and w_ij, as follows:
Figure FDA0004041816000000037
Figure FDA0004041816000000038
wherein t_i represents the weight of the evaluation result of the ith expert within category h_i.
8. The method according to claim 1, wherein in the third step, the Bayesian optimization algorithm BOA is selected to tune the XGBoost and LightGBM model parameters; the grey wolf optimization algorithm GWO is selected to automatically optimize the initial weights, the thresholds, and the number of hidden-layer neurons of the LSTM; finally, a fusion model BXGB-BLGB-GLSTM is obtained and used as the rice hazard risk assessment model.
9. The method according to claim 1 or 8, wherein in the third step, the rice hazard risk assessment model is trained, comprising:
(1) Dividing collected hazard detection data into a training set and a testing set according to the proportion of 3;
(2) Dividing the training set into K sub-training sets with equal sizes, performing K-fold cross validation on the training set, and completing K times of training on the rice hazard risk assessment model.
CN202210306564.2A 2022-03-25 2022-03-25 Rice safety risk assessment method based on multi-machine learning algorithm fusion Active CN114764682B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210306564.2A CN114764682B (en) 2022-03-25 2022-03-25 Rice safety risk assessment method based on multi-machine learning algorithm fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210306564.2A CN114764682B (en) 2022-03-25 2022-03-25 Rice safety risk assessment method based on multi-machine learning algorithm fusion

Publications (2)

Publication Number Publication Date
CN114764682A CN114764682A (en) 2022-07-19
CN114764682B true CN114764682B (en) 2023-04-07

Family

ID=82364952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210306564.2A Active CN114764682B (en) 2022-03-25 2022-03-25 Rice safety risk assessment method based on multi-machine learning algorithm fusion

Country Status (1)

Country Link
CN (1) CN114764682B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115758888B (en) * 2022-11-17 2024-04-23 厦门智康力奇数字科技有限公司 Agricultural product security risk assessment method based on multi-machine learning algorithm fusion
CN116739617A (en) * 2023-06-08 2023-09-12 中国标准化研究院 Food related product risk management system and method based on data analysis

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014092230A1 (en) * 2012-12-13 2014-06-19 대한민국 (식품의약품안전청장) System and method for inspecting imported food-based harm prediction
CN111461576A (en) * 2020-04-27 2020-07-28 宁波市食品检验检测研究院 Fuzzy comprehensive evaluation method for safety risk of chemical hazards in food
CN111582718A (en) * 2020-05-08 2020-08-25 国网安徽省电力有限公司电力科学研究院 Cable channel fire risk assessment method and device based on network analytic hierarchy process

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
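Cheng Jiaqian et al. Method for determining risk weights in the dietary exposure assessment of heavy metals in vegetables and fruits. Food Science (《食品科学》). 2018, full text. *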

Also Published As

Publication number Publication date
CN114764682A (en) 2022-07-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant