CN113159546B - Crop supply chain hazard risk judging method and system based on unsupervised dimension reduction density clustering - Google Patents
- Publication number
- CN113159546B (application CN202110386361.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- risk
- sample set
- data sample
- density
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/02—Agriculture; Fishing; Forestry; Mining
Abstract
The invention provides a crop supply chain hazard risk judging method based on unsupervised dimension-reduction density clustering, comprising the following steps: acquiring a data sample set that comprises risk indexes; performing vector encoding and standardized normalization preprocessing on the risk indexes in the data sample set to obtain a data sample set containing standardized data vectors; performing feature dimension reduction on the data sample set containing the standardized data vectors to obtain a data sample set containing high-dimensional feature vectors; performing clustering calculation and screening on the data sample set containing the high-dimensional feature vectors to obtain risk-level clustering centers; and performing a neighborhood search on the data surrounding the risk-level clustering centers to obtain a risk judgment result. The invention enables graded evaluation of the risk conditions of different crops and foods.
Description
Technical Field
The application relates to the field of food safety, in particular to a crop supply chain hazard risk judging method and system based on unsupervised dimension-reduction density clustering.
Background
Crops, including food crops and cash crops, are indispensable in people's daily life and bear on national food security and on stable economic and social development. In recent years, however, crop quality and safety problems have occurred more frequently, and risk grading evaluation is becoming a strong safeguard for strengthening the grain food safety system and reducing risks at the food source. Its basic content is a science-based technical evaluation of the risks that hazardous substances and potential hazards may cause: the contamination level of food-borne hazards is graded by combining factors such as food characteristics, food contamination level and dietary exposure, so that risk levels are quantified and risk priorities identified among many complex food safety problems. The World Health Organization (WHO) and the Food and Agriculture Organization of the United Nations (FAO) point out that food safety risk grading evaluation is a structured decision-making process closely tied to risk management, risk communication, and risk prevention and control. It helps risk assessors accurately grasp the differences in risk probability among hazards, guides risk managers in giving priority to key supervision targets, and supports reasonable allocation of resources and selection of the corresponding management measures. Current risk grading evaluation methods fall into two main categories:
1) Index system method: significant and latent factors are extracted from data such as food sampling inspections and survey statistics, and a risk grading evaluation index system is constructed to quantify and grade risk. The method needs to integrate the characteristics of chemical hazards and of epidemiological data on food-borne disease outbreaks, evaluate the food safety supervision and sampling inspection data of the corresponding regions, build a food risk grading index system on the basis of a qualitative or semi-quantitative risk evaluation mechanism, and analyze and evaluate the high-risk food safety combinations that require attention.
2) Hierarchical model method: taking into account the diversity of foods and hazards, the toxicity differences among hazards and the correlation among evaluation indexes, together with factors such as growth and diffusion, the likelihood and severity of risk occurrence are assigned values, weighted and ranked in a quantitative or semi-quantitative manner. Common models include the probabilistic exposure evaluation model, the decision evaluation model, the sQMRA model, the FIRRM model, the iRisk method, the fuzzy comprehensive evaluation method and the integrated analysis method. On the basis of comprehensively considering the diversity of foods and hazards and the differences in hazard toxicity, these methods use statistical analysis and machine learning algorithms to build the corresponding hazard risk evaluation models, mine the inherent correlations among indexes, and explore the risk grades and probability indexes of different types of hazards.
At present, various risk evaluation techniques are applied to food safety evaluation, prevention and supervision. Crop supply, however, is a multi-link process covering planting, production and processing, circulation and storage, and sales and consumption. Every link carries hazard risk factors of different categories and degrees, and each factor is affected by food diversity, multi-source heterogeneous data, regional distribution differences, temporal variability and the like. Risk grading evaluation of crop supply chain hazards involves many indexes, and the severity of harm and the impact on social stability vary widely, so traditional risk grading methods are difficult to apply. In particular, traditional methods rely on measurement and statistical data during risk analysis; index quantification and weight assignment involve excessive manual settings and lack verification against objective sampling inspection and supervision data; and the way multidimensional heterogeneous food data act on hazards within a supply chain link is ignored. In practice it is therefore difficult to uncover the coupling mechanisms of hazards within the supply chain links, and the mathematical statistics easily yield false conclusions that contradict the actual association rules.
Disclosure of Invention
To solve at least one of the above technical problems, the invention provides a crop supply chain hazard risk judging method and system based on unsupervised dimension-reduction density clustering.
A first aspect of embodiments of the present invention provides a crop supply chain hazard risk judging method based on unsupervised dimension-reduction density clustering, the method comprising:
acquiring a data sample set, wherein the data sample set comprises risk indexes;
performing vector encoding and standardized normalization preprocessing on the risk indexes in the data sample set to obtain a data sample set containing standardized data vectors;
performing feature dimension reduction on the data sample set containing the standardized data vectors to obtain a data sample set containing high-dimensional feature vectors;
performing clustering calculation and screening on the data sample set containing the high-dimensional feature vectors to obtain risk-level clustering centers;
and performing a neighborhood search on the data surrounding the risk-level clustering centers to obtain a risk judgment result.
A second aspect of embodiments of the present invention provides an unsupervised dimension-reduction density clustered crop supply chain hazard risk determination system, the system comprising a processor configured with processor-executable operating instructions to perform operations comprising:
acquiring a data sample set, wherein the data sample set comprises risk indexes;
performing vector encoding and standardized normalization preprocessing on the risk indexes in the data sample set to obtain a data sample set containing standardized data vectors;
performing feature dimension reduction on the data sample set containing the standardized data vectors to obtain a data sample set containing high-dimensional feature vectors;
performing clustering calculation and screening on the data sample set containing the high-dimensional feature vectors to obtain risk-level clustering centers;
and performing a neighborhood search on the data surrounding the risk-level clustering centers to obtain a risk judgment result.
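The last two operations (screening clustering centers, then searching the surrounding data) can be illustrated with a minimal Python sketch. It assumes the clustering follows the general density-peak scheme, which the figure captions suggest but this passage does not spell out. The function name `density_peak_clusters`, the Gaussian local density, the `cutoff` distance and the fixed `n_centers` are all illustrative assumptions, not the patent's actual parameters.

```python
import math

def density_peak_clusters(points, cutoff, n_centers):
    """Toy density-peak clustering: returns (center indices, label per point)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density (assumed Gaussian form): weighted count of the other points.
    rho = [sum(math.exp(-(dist[i][j] / cutoff) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    order = sorted(range(n), key=lambda i: -rho[i])  # densest point first
    delta = [0.0] * n   # distance to the nearest point of higher density
    parent = [-1] * n   # index of that nearest higher-density point
    delta[order[0]] = max(dist[order[0]])
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j
    # Points that are both dense and far from denser points are centre candidates.
    gamma = [rho[i] * delta[i] for i in range(n)]
    gamma[order[0]] = math.inf  # the global density peak is always a centre
    centers = sorted(range(n), key=lambda i: -gamma[i])[:n_centers]
    labels = [-1] * n
    for level, c in enumerate(centers):
        labels[c] = level
    # Neighbourhood search: every remaining point inherits the label of its
    # nearest higher-density neighbour, visited from dense to sparse.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return centers, labels

# Two well-separated toy groups standing in for two risk levels.
blob_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
blob_b = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (5.05, 5.05)]
centers, labels = density_peak_clusters(blob_a + blob_b, cutoff=0.5, n_centers=2)
```

In this toy run each group collapses onto its own center. In the sketch the number of centers is passed in by hand, whereas the invention states that it avoids a subjectively preset number of risk levels.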
The beneficial effects of the invention are as follows. Targeting the main hazards in the supply chain, the invention constructs a hazard risk judging method based on unsupervised dimension-reduction density clustering that can grade the risk conditions of different crops and foods. The invention needs no manually preset risk evaluation indexes: it mines the internal relations among hazards distributed across the supply chain links in a purely data-driven way, adaptively calculates the risk weight of each hazard factor, and evaluates more scientifically how the risk of each hazard is graded in each link of the crop supply chain. It avoids the interference of a subjectively preset number of risk levels and of preset limit values, provides a targeted and credible basis for work such as identifying key supply chain hazards and ordering supervision priorities, improves management efficiency, and reduces the occurrence of related food safety accidents.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a flow chart of the crop supply chain hazard risk judging method based on unsupervised dimension-reduction density clustering according to embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of principal component analysis;
FIG. 3 is a schematic diagram of a principal component analysis method according to example 1 of the present invention;
FIG. 4 is a schematic view of the local density according to example 1 of the present invention;
FIG. 5 is a schematic diagram of a comprehensive evaluation parameter screening clustering center according to embodiment 1 of the present invention;
FIG. 6 is a schematic diagram of the density peak clustering principle according to embodiment 1 of the present invention, where (a) is two-dimensional data clustering and (b) is a clustering center partition diagram;
FIG. 7 is a schematic view of the risk level density clustering results described in the examples;
FIG. 8 is a three-dimensional view of the risk level density clustering results described in the examples;
FIG. 9 is a graph showing the statistics of the risk level of a part of the dangerous objects described in the example;
FIG. 10 is a schematic diagram showing the risk level distribution of the various types of hazards in the crop supply chain as described in the examples;
FIG. 11 is a schematic diagram of risk level distribution of hazards within each supply chain link as described in the examples.
Detailed Description
In order to make the technical solutions and advantages of the embodiments of the present application more apparent, the following detailed description of exemplary embodiments of the present application is provided in conjunction with the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present application and not exhaustive of all embodiments. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.
Example 1
As shown in FIG. 1, this embodiment provides a crop supply chain hazard risk judging method based on unsupervised dimension-reduction density clustering, which includes:
S101, acquiring a data sample set.
Specifically, based on food safety sampling inspection data, this embodiment builds a hazard risk judging method based on unsupervised dimension-reduction density clustering for the main hazards present in the supply chain (heavy metals, mycotoxins, pesticide residues, food additives and the like). The example analysis uses hazard data from the provinces and cities that are the country's main agricultural producers. Information and data published on the websites of the national grain bureau, the national quality inspection bureau and the like are collected, then classified, sorted and preprocessed according to the type of hazard safety problem (heavy metals, mycotoxins, microorganisms, food additives, pesticide residues and the like) and the supply chain link from which it originates. The hazard types and sources of a large volume of sampling inspection data on processed grain products over a period of time are gathered, covering the main producing areas and major consuming provinces of Chinese grain foods.
Grain safety is affected by many factors, such as policies and regulations and economic and social conditions, and the grain supply chain covers every link from planting, production and processing, and storage and logistics to final sale and consumption in markets, supermarkets and catering venues; safety hazards and risk factors can appear in any of these links. In this embodiment, literature, professional food industry websites and news media information are analyzed and preprocessed to obtain the hazard categories of food safety incidents and their source data (for example, for 2013-2018). These are merged into the sampling inspection data to complete data expansion, yielding a data set of 21027 samples. Each data sample comprises 30 risk indexes and reflects the condition of each risk factor in the supply chain.
S102, vector coding and standard normalization preprocessing are carried out on risk indexes in the data sample set to obtain the data sample set containing the standardized data vector.
Specifically, the risk indexes in the data sample set differ in category (English text, Chinese characters, placeholders and the like) and span several data types, such as numeric, logical, character and floating-point. If these risk indexes were fed directly into a cloud platform server or computer, they could not be stored or analyzed. In addition, the values of the risk indexes do not follow an approximately normal distribution, so feeding them directly into the hazard risk judging model would produce poor grading results. The risk indexes of different categories and structures therefore first need to be encoded, converting semi-structured and unstructured data into structured numeric data.
In this embodiment, one-hot encoding (One Hot Encoder) and Embedding encoding are used to encode the data sample set so that the high-dimensional distances between different risk indexes are approximately the same. First, one-hot encoding applies 0/1 binarization to the different categorical feature values of each risk index column in the data sample set. Taking the production-province index as an example, five feature categories, Henan, Shandong, Heilongjiang, Jiangsu and Anhui, are selected as the basis; after the vector encoding operation the codes become Beijing 10000, Hunan 01000, Hebei 00100, Hubei 00010 and Guangxi 00001. The unstructured region index is thereby encoded as structured numeric data, and no spurious numeric ordering is introduced. However, the production provinces in the food data span many regions, and as the dimension grows rapidly a large, redundant sparse matrix is produced. This increases the computational load, with the computer performing many needless operations that do not improve the hazard risk judging process. This embodiment therefore goes on to use Embedding encoding for weight-matrix embedding, projecting the i-th feature value $x_i$ of an index in the sample into the embedding space, as shown in the following formula:
$$e_\beta(x_i) = \sum_{\alpha=1}^{m} \omega_{\beta\alpha}\,\delta_{\alpha x_i}, \qquad \beta = 1, 2, \ldots, n$$
where δ is the Kronecker delta function and α ranges over all feature values of the risk index; the function outputs 1 when $\alpha = x_i$ and 0 otherwise. If index $x_i$ has m possible feature values, δ is a feature vector of length m. ω is the inter-layer weight matrix connecting the one-hot encoding and the Embedding encoding, and β is the corresponding embedding index. Taking the feature value vector "Beijing 10000" of the province index as an example, after vector encoding the vector is embedded into a low-dimensional matrix space through the linear projection of an n×m weight matrix ω. With n=3 and m=5, the embedded vector feature is:
$$\omega\,\delta = \begin{pmatrix} \omega_{11} & \omega_{12} & \omega_{13} & \omega_{14} & \omega_{15} \\ \omega_{21} & \omega_{22} & \omega_{23} & \omega_{24} & \omega_{25} \\ \omega_{31} & \omega_{32} & \omega_{33} & \omega_{34} & \omega_{35} \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} \omega_{11} \\ \omega_{21} \\ \omega_{31} \end{pmatrix}$$
Once Embedding encoding has been used to represent all categorical features, the embedded outputs are concatenated with the inputs of all continuous variables. Compared with one-hot encoding alone, the feature dimension is reduced, heavy use of computing resources and memory is avoided, and subsequent data processing and training of the judging model are made easier.
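The relation between the two encodings can be sketched in a few lines of Python: multiplying the weight matrix by a one-hot vector simply selects one column. The helper names `one_hot` and `embed` and the numeric weights in `omega` are hypothetical; in practice the weights would be learned rather than written by hand.

```python
def one_hot(value, categories):
    """Return the 0/1 indicator vector of `value` over `categories`."""
    return [1 if value == c else 0 for c in categories]

def embed(value, categories, weights):
    """Multiply the n x m weight matrix by the one-hot vector of `value`."""
    delta = one_hot(value, categories)
    return [sum(w * d for w, d in zip(row, delta)) for row in weights]

provinces = ["Beijing", "Hunan", "Hebei", "Hubei", "Guangxi"]
# Hypothetical 3 x 5 weight matrix (n = 3, m = 5), for illustration only.
omega = [
    [0.2, -0.1, 0.4, 0.0, 0.3],
    [0.5, 0.1, -0.3, 0.2, 0.0],
    [-0.4, 0.6, 0.1, 0.3, -0.2],
]
code = one_hot("Beijing", provinces)         # five-element indicator vector
vector = embed("Beijing", provinces, omega)  # first column of omega
```

Because the product only selects a column, an implementation can replace the multiplication with a direct table lookup, which is where the saving over a wide sparse one-hot matrix comes from.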
Within the data sample set, the meaning and numerical range of each risk index differ from sample to sample, and the feature values of the same risk index can differ greatly. Abnormally scaled data and discrete distribution patterns prevent the model from being trained correctly. To make the different attribute indexes comparable, help the model better capture the meaning of the data, and remove the influence of attribute dimensions (units), this embodiment applies linear normalization and standardization to the numeric attributes obtained after vector encoding. A linear transformation maps each numeric attribute into the range [0,1], scaling it proportionally. The normalization process is shown in the following formula:
$$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}$$
where X is the index feature value after vector encoding, $X_{norm}$ is the normalized result, and $X_{max}$ and $X_{min}$ are the maximum and minimum of the data, respectively. The normalized data provide the base data set for the subsequent risk scoring and early warning model. Normal-distribution standardization is then applied to the feature results of each risk index, bringing each risk index to an approximately normal distribution with mean 0 and variance 1. The standardization formula is as follows:
$$y = \frac{X_{norm} - \mu}{s}$$
where μ is the mean of the training set, s is the standard deviation of the training set, and y is the standardized result. In this way, risk indexes whose raw values differ widely are finally confined to the same range.
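The two preprocessing steps can be sketched with the Python standard library. The function names are illustrative, and the use of the population standard deviation (`statistics.pstdev`) for s is an assumption, since the text does not say whether a population or sample estimate is intended.

```python
import statistics

def min_max_normalize(values):
    """Linearly rescale values into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: nothing to scale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values, mean=None, std=None):
    """Shift and scale values to zero mean and unit variance.

    `mean` and `std` default to the statistics of `values` itself; pass the
    training-set statistics explicitly when transforming new data.
    """
    mean = statistics.fmean(values) if mean is None else mean
    std = statistics.pstdev(values) if std is None else std
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

raw = [2.0, 4.0, 6.0, 10.0]          # hypothetical values of one risk index
scaled = min_max_normalize(raw)      # values now lie in [0, 1]
z = standardize(scaled)              # zero mean, unit variance
```

Keeping `mean` and `std` as optional arguments lets the statistics computed on the training set be reused for later samples, which matches the description of μ and s as training-set quantities.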
S103, performing feature dimension reduction on the data sample set containing the standardized data vectors to obtain a data sample set containing high-dimensional feature vectors.
Specifically, the data sample set is nonlinear, multi-source and heterogeneously distributed, and an excessive number of dimensions and an overly complex data structure can be counterproductive, giving worse performance in practical analysis. The traditional linear principal component analysis (Principal Component Analysis, PCA) method maps the correlated variables of the original vector space onto new coordinates to form new, mutually independent vectors while retaining most of the information in the original variables. As shown in FIG. 2, PCA reduces the features of an n-dimensional data set to p newly composed principal component vectors; the principal component with the largest variance carries the largest amount of information. Let the input data set be $X = (X_1, X_2, \ldots, X_n)^T$ and write $E(X) = \mu$, $\mathrm{cov}(X) = \Sigma$. The linear transformation gives:
$$Pc_i = a_i^{T} X = a_{i1} X_1 + a_{i2} X_2 + \cdots + a_{in} X_n, \qquad i = 1, 2, \ldots, p$$
where $a_1, a_2, \ldots, a_p$ are p mutually orthogonal unit vectors, namely the vector feature results of the new, transformed coordinate system. Maximizing the variance of Pc and sorting the eigenvalues yields the principal components that contain sufficient information, achieving dimension reduction and parameter compression of the data. PCA, however, can only process linear data and performs poorly on nonlinear food data.

This embodiment therefore improves the PCA algorithm and proposes a membership-optimized kernel principal component analysis method (Kernel Principal Component Analysis, KPCA). It maps the linearly inseparable input vectors into a high-dimensional nonlinear feature space and uses a kernel function to quantitatively analyze the membership of each principal component. This achieves effective, unsupervised dimension reduction of the nonlinear data in the data sample set and also avoids the curse of dimensionality. The schematic diagram is shown in fig. 3. Based on the kernel function inner product transformation principle, let $x_i$ and $x_j$ be data sample points of the input space. The linearly inseparable standardized data vectors in the data sample set are projected into a high-dimensional feature space through a nonlinear mapping kernel function Φ, and the kernel function is expressed in Gaussian form, as follows:
$$(x_i, x_j) \rightarrow K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$$
$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$$
where $x_i, x_j$ are sample points of the input space and σ is the Gaussian kernel bandwidth. Consider n samples $x_k$ (k = 1, 2, ..., n) of the input space. Since each index has undergone encoding and standardization preprocessing, $\sum_{k=1}^{n} x_k = 0$, and the principal component covariance matrix C is:
$$C = \frac{1}{n}\sum_{k=1}^{n} x_k x_k^T$$
Introducing the kernel function transforms the original input sample points into sample points in the high-dimensional feature space, and the covariance matrix in the feature space, which takes the place of C below, becomes:
$$C = \frac{1}{n}\sum_{k=1}^{n} \Phi(x_k)\Phi(x_k)^T$$
The eigenvalues λ and the corresponding eigenvectors ν are obtained by solving:
$$\lambda(\Phi(x_k) \cdot \nu) = \Phi(x_k) \cdot C\nu$$
Since the eigenvector can be expressed linearly in terms of the data set, ν can be expressed linearly in terms of $\Phi(x_i)$:
where $s_i$ is the membership value ($0 < s_i \le 1$), representing the importance of each sample to the principal component partitioning hyperplane.
In this embodiment, the membership value is determined by computing the class centers of the positive and negative classes and comparing the Euclidean distances from each sample point to the two class centers. The membership function is expressed as:
When the distance from a positive-class sample to the positive class center is smaller than its distance to the negative class center, the membership value is 1. When its distance to the positive class center is larger than its distance to the negative class center, the sample is regarded as a noise point and its membership value is calculated from a distance function. In this way, different data are divided more reasonably among the corresponding feature vectors.
With the membership-optimized kernel principal component analysis method, the data sample set containing the standardized data vectors is projected into a high-dimensional space in which it becomes separable, the principal components representing the key risk factors in each index are extracted, and unsupervised dimension reduction is achieved together with compression of the data volume and computation. This provides the basis for adaptively mining the risk judgment levels of supply chain hazards.
S104: Perform clustering calculation and screening on the data sample set containing the high-dimensional feature vectors to obtain the risk level cluster centers, and perform a neighborhood search on the data surrounding the risk level cluster centers to obtain the risk judgment result.
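For orientation, the Gaussian-kernel dimension reduction described above can be sketched with plain kernel PCA. This is a minimal illustration under stated assumptions: it uses NumPy, takes the kernel as exp(-||x_i - x_j||^2 / (2σ^2)), and omits the membership weighting $s_i$ entirely, so it is standard KPCA rather than the membership-optimized variant of this embodiment. The function names and the random test data are hypothetical.

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma):
    """Pairwise Gaussian kernel values K[i, j] = exp(-||xi - xj||^2 / (2 sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives caused by rounding
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_pca(X, n_components, sigma=1.0):
    """Project the rows of X onto the leading kernel principal components."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    K = gaussian_kernel_matrix(X, sigma)
    # Center the kernel matrix, i.e. center the mapped data in feature space.
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale coefficients so the feature-space eigenvectors have unit norm.
    alphas = eigvecs / np.sqrt(np.maximum(eigvals, 1e-12))
    return Kc @ alphas, eigvals

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))      # stand-in for standardized data vectors
scores, eigvals = kernel_pca(X, n_components=2, sigma=2.0)
```

The eigen-decomposition is carried out on the centered kernel matrix rather than on the feature-space covariance matrix, which is the usual way to avoid computing Φ explicitly.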
Specifically, after the dimension reduction of the feature values of each index and the extraction of the principal components of the key risk factors, a data-driven density peak clustering method performs unsupervised cluster analysis on the risk distribution and levels of the various hazards. Risk level cluster centers are screened automatically according to three basic principles: (1) a risk cluster center has a high local density in the feature space and is surrounded by nearby samples of lower density; (2) a risk cluster center lies at a large distance from other samples of higher density, that is, the cluster centers of different risk levels are far apart; (3) outlier noise samples have a low local density and lie far from other samples. The specific implementation process is as follows:
S1041: Nonlinear estimation of the local density.
The risk distribution of the hazards is described with the density peak clustering method. Let the local density value of a risk level be $\rho_i$ and the high-density distance value be $\delta_i$, expressed as follows:
$$\rho_i = \sum_{j \ne i} \chi(D_{ij} - D_c), \qquad \chi(x) = \begin{cases} 1, & x < 0 \\ 0, & x \ge 0 \end{cases}$$
$$\delta_i = \min_{j:\, \rho_j > \rho_i} D_{ij}$$
where $D_{ij}$ is the feature distance between two samples and $D_c$ is the discrimination value used to decide which cluster two samples belong to, called the truncation radius (cut-off distance). The local density $\rho_i$ is the number of samples j whose distance to sample i is less than $D_c$, so it is directly affected by the cut-off distance $D_c$; the two are related by the distribution function $\rho_i = f(D_{ij}, D_c)$. Data statistics show that the local densities of all samples in the high-dimensional feature space follow a Gaussian distribution, so this embodiment estimates the distribution function non-parametrically to obtain the local density estimate of the dimension-reduced feature vector $v_i$:
where w is a dimensional constant and f is a non-parametric kernel function. Constraining the kernel function to be symmetric about the origin and to have a total integral of 1 yields the Gaussian estimation kernel function:
$$f(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right)$$
Substituting the Gaussian kernel function gives the Gaussian kernel density estimate of the local density:
The above equation shows that the local density of a feature vector is directly affected by the cut-off distance. When $D_c$ is small, only feature vectors $v_a$ and $v_b$ that are very close to each other give a small intermediate variable z that tends to zero; the Gaussian curve then concentrates around its axis of symmetry, the kernel value is large, and the local density is high, so only samples that are particularly close to each other have a large probability density and influence. As $D_c$ increases, the local density function flattens, so that samples at widely different distances all affect the local density; the contributions of distant samples are then spread too broadly by the kernel estimator, which does not reflect the actual situation. This embodiment therefore selects the cut-off distance with a ratio of 2%: $D_c$ is the largest distance among the smallest 2% of all distances. Determining the cut-off distance through a ratio reduces, to some extent, the dependence of the parameter on the specific problem. The local density estimation process is shown in fig. 4. The rightmost point has the largest local density, with a value of 4; the point that the arrow from the rightmost point points to has the smallest local density, with a value of 1; the other points lie in between, with values of 2 and 3.
S1042: Calculate the minimum high-density distance.
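The cut-off selection and local density estimate discussed above can be sketched as follows. This is a minimal illustration assuming NumPy and the Gaussian form exp(-(D_ij / D_c)^2) commonly used in density peak clustering, which may differ in normalization from the expression of this embodiment. The function names and the sample points are hypothetical.

```python
import numpy as np

def pairwise_distances(V):
    """Euclidean distance matrix between the rows of V."""
    diff = V[:, None, :] - V[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def cutoff_distance(D, ratio=0.02):
    """Largest distance among the smallest `ratio` share of pairwise distances."""
    iu = np.triu_indices_from(D, k=1)          # each pair counted once
    d_sorted = np.sort(D[iu])
    k = max(int(np.ceil(ratio * d_sorted.size)), 1)
    return d_sorted[k - 1]

def local_density(D, d_c, gaussian=True):
    """Local density of every sample, excluding the sample itself."""
    if gaussian:
        # Smooth estimate: nearby samples contribute close to 1, distant ones close to 0.
        return np.exp(-(D / d_c) ** 2).sum(axis=1) - 1.0
    # Hard cut-off estimate: count neighbors strictly closer than d_c.
    return (D < d_c).sum(axis=1) - 1

# Illustrative points: two tight groups and one isolated point (index 7).
V = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
              [10.0, 0.0]])
D = pairwise_distances(V)
d_c = cutoff_distance(D, ratio=0.02)
rho = local_density(D, d_c)
```

With only 28 pairwise distances, the 2% rule selects the single smallest distance here; on a data set of realistic size it selects a genuine low quantile.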
After the cut-off distance has been determined, the samples are sorted by local density, and samples with higher local density are selected as candidate risk level cluster centers. For a sample $v_i$, the minimum distance to any sample $v_j$ with a higher local density value is calculated as follows:
$$W_{\min} = \min_{j:\, \rho_j > \rho_i} D_{ij}$$
$$\delta_i = W_{\min}$$
where $\delta_i$ is the minimum high-density distance. Each sample feature vector can be redefined by its local density value $\rho_i$ and minimum high-density distance $\delta_i$, that is, $v_i(\rho_i, \delta_i)$. Each feature vector that can serve as a cluster center must satisfy:
$$v_c = v_i(\rho_i > \rho_\Delta,\ \delta_i > \delta_\Delta)$$
where $\rho_\Delta$ and $\delta_\Delta$ are the division thresholds of the two parameters. If the $\delta_i$ of a sample is relatively large but its density is below the threshold, the sample is outlier noise and must be removed. The noise discrimination is:
$$v_{noise} = v_i(\rho_i < \rho_\Delta,\ \delta_i > \delta_\Delta)$$
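The threshold rules of S1042 can be sketched as follows. This is a minimal illustration assuming NumPy, a precomputed distance matrix, and hand-picked density values and thresholds. Assigning the maximum distance to the globally densest sample is a common convention in density peak clustering and an assumption here; the function names are hypothetical.

```python
import numpy as np

def high_density_distance(D, rho):
    """delta_i: distance from sample i to the nearest sample of higher density."""
    n = len(rho)
    delta = np.empty(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        if higher.size:
            delta[i] = D[i, higher].min()
        else:
            # Assumed convention: the densest sample gets its largest distance.
            delta[i] = D[i].max()
    return delta

def label_points(rho, delta, rho_thr, delta_thr):
    """Split samples into cluster centers and noise using the two thresholds."""
    centers = np.where((rho > rho_thr) & (delta > delta_thr))[0]
    noise = np.where((rho < rho_thr) & (delta > delta_thr))[0]
    return centers, noise

# Illustrative 1-D positions and densities (made up for the example).
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0])
rho = np.array([2.0, 3.0, 2.0, 2.0, 2.5, 2.0, 0.1])
D = np.abs(x[:, None] - x[None, :])
delta = high_density_distance(D, rho)
centers, noise = label_points(rho, delta, rho_thr=1.0, delta_thr=1.0)
```

In this example samples 1 and 4 are dense and far from any denser sample, so they qualify as centers, while the isolated sample 6 has a large distance but low density and is flagged as noise.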
S1043: Screen the cluster centers with an optimized comprehensive evaluation parameter.
In the two-dimensional plane spanned by the local density value and the minimum high-density distance, the traditional density peak clustering method separates the cluster centers by manually inspecting the plot and empirically setting a local density threshold $\rho_\Delta$ and a distance threshold $\delta_\Delta$. However, when evaluating hazard risk levels in an actual crop supply chain, such manual intervention makes the evaluation subjective and not fault tolerant: when the data distribution changes, or when different observers apply different classification standards, the number of risk level cluster centers and their neighborhood ranges easily change. This embodiment therefore optimizes the density peak clustering method with a comprehensive evaluation parameter, expressed as:
$$\gamma_i = \delta_i \cdot \rho_i$$
When $\gamma_i$ is large, both $\delta_i$ and $\rho_i$ tend to be large, and the sample point is more likely to be a cluster center. However, for some noise points $\rho_i$ is small but $\delta_i$ is large, which still makes $\gamma_i$ large. Therefore, $\delta_i$ and $\rho_i$ are standardized as follows:
$$\delta_i' = \frac{\delta_i - \bar{\delta}}{u_\delta}, \qquad \rho_i' = \frac{\rho_i - \bar{\rho}}{u_\rho}$$
where $u_\delta$ and $u_\rho$ are the variances of $\delta_i$ and $\rho_i$, respectively, $\bar{\delta}$ and $\bar{\rho}$ are the averages of $\delta_i$ and $\rho_i$, and i is the sample index. After standardization, the two parameters share a common scale, and the new comprehensive evaluation function is:
$$\gamma_i' = \delta_i' \cdot \rho_i'$$
The comprehensive evaluation function value of each sample is calculated, and the values are arranged in ascending order:
$$\gamma_1' < \cdots < \gamma_i' < \cdots < \gamma_k'$$
After the comprehensive evaluation function value of each sample has been calculated, the $\gamma_i$ values are plotted against the ascending sample rank, as shown in fig. 5. The $\gamma_i$ values of non-center samples vary smoothly, whereas a clear jump occurs at the transition from non-center samples to cluster centers, and this jump can be detected numerically. In addition, $\gamma_i$ follows a straight line in the non-center region, while its distribution in the cluster center region follows a power law. The search over the feature vectors $F_1$ to $F_K$ of this embodiment therefore proceeds as follows: $X = x_1 \rightarrow \ldots \rightarrow x_k$, advancing step by step from $x_1$ toward $x_k$ until a jump occurs at some $x_i$. A jump threshold $\varepsilon_1$ is set; when $\|x_k - x_{k-1}\| \ge \varepsilon_1$, all data from $x_k$ onward are regarded as cluster center points, and the data are divided into different clusters accordingly.
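The jump-based screening of S1043 can be sketched as follows. This is a minimal illustration assuming NumPy; it divides by the variance because the text names the variance as the scaling term, and the threshold `eps` is a free parameter chosen by hand. The function names and input values are hypothetical.

```python
import numpy as np

def comprehensive_score(rho, delta):
    """gamma' = delta' * rho' after centering and scaling both parameters."""
    rho = np.asarray(rho, dtype=float)
    delta = np.asarray(delta, dtype=float)
    # Scaling by the variance follows the wording of the text; this assumes
    # neither parameter is constant, otherwise the variance would be zero.
    rho_std = (rho - rho.mean()) / rho.var()
    delta_std = (delta - delta.mean()) / delta.var()
    return delta_std * rho_std

def centers_by_jump(gamma, eps):
    """Indices of samples lying after the first jump >= eps in sorted gamma."""
    gamma = np.asarray(gamma, dtype=float)
    order = np.argsort(gamma, kind="stable")   # ascending order of gamma
    gaps = np.diff(gamma[order])
    jumps = np.where(gaps >= eps)[0]
    if jumps.size == 0:
        return np.array([], dtype=int)         # no jump found, no centers
    return order[jumps[0] + 1:]

# Illustrative values only.
gamma = comprehensive_score([1.0, 1.0, 4.0], [1.0, 1.0, 4.0])
centers = centers_by_jump([0.1, 0.2, 0.15, 3.0, 0.12, 4.0], eps=1.0)
```

Note that after centering, a sample with both low density and small distance also receives a positive product, so in practice the score may need to be combined with the noise rule of S1042.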
Once c samples have been identified as cluster centers, the other samples in the data set must be assigned to the corresponding clusters according to local density. If a feature vector $v_i$ meets the threshold conditions on the comprehensive evaluation parameter and is the center point of a risk level cluster, the samples near it are attracted by local density and form the neighborhood of that risk level cluster. The specific expression is:
$$V_k: \{v_1, \ldots, v_k\} \rightarrow \{peak_1, \ldots, peak_k\}$$
$$C_k: near(v_1, \ldots, v_{k-m}) \rightarrow peak_k$$
where the k risk level cluster centers $\{v_1, \ldots, v_k\}$, after analysis by the optimized density peak clustering method, each form a cluster center in the set $\{peak_1, \ldots, peak_k\}$, corresponding to k risk level groups. The remaining samples around the cluster centers are then assigned to the corresponding risk level clusters, achieving unsupervised risk level evaluation of the supply chain hazards. The whole risk level cluster center division process is shown in fig. 5: 15 samples are distributed in the two-dimensional plane spanned by the local density value and the minimum high-density distance, and the original data space contains two risk level clusters, A and B. The $\rho_i$ of each sample point is calculated and sorted, and then the $\delta_i$ of each sample is calculated. Fig. 6(b) shows that sample points 6 and 11 have both a large $\delta_i$ and a large $\rho_i$; they are density peaks relative to the other samples and are the cluster centers of clusters A and B, respectively. Although sample points 14 and 15 have a large minimum high-density distance $\delta_i$, they lie far from clusters A and B and are defined as noise points, which must be removed.
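The neighborhood assignment around the cluster centers can be sketched as follows. This is a minimal illustration assuming NumPy and the usual density peak clustering rule, in which each remaining sample takes the label of its nearest neighbor of higher density; the text only says that nearby samples are attracted by local density, so the exact rule is an assumption. The function name and sample values are hypothetical.

```python
import numpy as np

def assign_clusters(D, rho, centers):
    """Label every sample with the index of its cluster center (or -1)."""
    n = len(rho)
    labels = np.full(n, -1)
    for c_id, c in enumerate(centers):
        labels[c] = c_id
    # Visit samples from highest to lowest density so that the neighbor a
    # sample inherits from has already been labeled.
    order = np.argsort(-np.asarray(rho, dtype=float), kind="stable")
    for pos, i in enumerate(order):
        if labels[i] != -1:
            continue
        higher = order[:pos]                   # samples visited before i
        if higher.size == 0:
            continue                           # densest sample is not a center
        nearest = higher[np.argmin(D[i, higher])]
        labels[i] = labels[nearest]
    return labels

# Illustrative 1-D positions and densities (made up for the example).
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
rho = np.array([2.0, 3.0, 2.0, 2.0, 2.5, 2.0])
D = np.abs(x[:, None] - x[None, :])
labels = assign_clusters(D, rho, centers=[1, 4])
```

Noise points identified beforehand would be dropped from `D` and `rho` before this step, so that they are not absorbed into a risk level cluster.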
Example 2
Corresponding to embodiment 1, this embodiment proposes an unsupervised dimension-reduction density clustering crop supply chain hazard risk determination system, the system comprising a processor configured with processor-executable operating instructions to perform the following operations:
acquiring a data sample set, wherein the data sample set comprises risk indexes;
vector coding and standard normalization preprocessing are carried out on risk indexes in the data sample set to obtain a data sample set containing standardized data vectors;
Performing feature dimension reduction on the data sample set containing the standardized data vector to obtain a data sample set containing a high-dimension feature vector;
performing clustering calculation screening on the data sample set containing the high-dimensional feature vector to obtain a risk level clustering center;
And carrying out neighborhood search on the peripheral data of the risk level clustering center to obtain a risk judgment result.
The specific working principle and calculation process of the system can be referred to the content described in embodiment 1, and will not be described herein.
Building on the encoded and standardized data, the method uses Gaussian kernel principal component analysis to reduce the dimension of the original data, extracting the key risk factors from the multi-source heterogeneous indexes and reducing the complexity and computational difficulty of the original input data. An optimized density peak clustering method then performs unsupervised cluster analysis, with a comprehensive evaluation function designed to optimize the cluster center screening and neighborhood search. This reduces the influence of subjective human factors and achieves, in an unsupervised, data-driven way, the automatic evaluation and ranking of the risk levels of the various hazards in the crop supply chain. It effectively addresses the low accuracy, long running time, and high labor cost of traditional evaluation methods, improves the accuracy and reliability of crop supply chain risk evaluation for government regulators, enterprises, and consumers, and helps safeguard crop quality and safety and production in the supply chain. The practical effects of the present application are described in further detail below with reference to examples.
First, the experimental data are described. The experiments use a grain data set comprising 30 risk indicators and 21027 samples. The purpose of this example is to achieve automatic assessment of the risk level of supply chain hazards without manual labeling. The risk indicators in the supply chain are shown in table 1.
TABLE 1
Table 1 shows that the risk indexes include quantifiable indexes, such as production and consumption, and non-quantifiable indexes, such as food category and link, so the data are digitized using one-hot encoding and Embedding encoding. To make indexes with different attributes comparable, help the model interpret the data, and remove the influence of attribute units, linear normalization and standardization are applied to the numerical attributes obtained by preprocessing. Part of the processed data is shown in table 2.
TABLE 2
Table 2 shows that, after one-hot encoding, Embedding encoding, and normalization, non-numerical risk indicators such as food category and link have been converted to numerical representations, and all data have been scaled to the interval 0-1.
The multidimensional heterogeneous data, containing 30 risk indexes and 21027 samples, are reduced in dimension with the KPCA method. After KPCA extraction, the total variance explained by the risk index components is shown in table 3, and the risk index component matrix is shown in table 4.
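The encoding of a non-quantifiable index can be sketched as follows. This is a minimal illustration assuming NumPy; the category names are placeholders, and the embedding weights are random and untrained, standing in for whatever weights the Embedding step would learn. The function name `one_hot` is hypothetical.

```python
import numpy as np

def one_hot(values):
    """One-hot encode a list of category labels.

    Returns the 0/1 matrix and the sorted category list that defines
    the column order.
    """
    categories = sorted(set(values))
    index = {c: k for k, c in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)))
    for row, v in enumerate(values):
        encoded[row, index[v]] = 1.0
    return encoded, categories

# Placeholder supply chain links, for illustration only.
links = ["production", "consumption", "production", "circulation"]
encoded, categories = one_hot(links)

# Hypothetical embedding: each category maps to a dense 2-D vector.
rng = np.random.default_rng(0)
W = rng.normal(size=(len(categories), 2))
embedded = encoded @ W
```

Multiplying the one-hot matrix by `W` simply selects one row of weights per sample, which is why identical categories receive identical vectors.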
TABLE 3
Table 3 shows that only the first 6 components have eigenvalues greater than 1 in the initial eigenvalue column, and that these 6 components account for 93.325% of the total information. The information of the original variables is therefore essentially retained, and the first 6 components are extracted as principal components. The weight of each principal component is calculated as follows:
$$\omega_i = \frac{\lambda_i}{\sum_{j=1}^{6} \lambda_j}$$
where $\omega_i$ is the weight of the i-th principal component, $\lambda_i$ is the variance of the i-th component, i = 1, 2, …, 6, and $\sum_{j=1}^{6} \lambda_j$ is the cumulative variance of the selected components. The final calculated weights are shown below.
Assuming that the score of the i-th extracted principal component is $F_i$, the score model of each principal component can be calculated from table 4, as follows:
$$F_i = m_1 x_1 + m_2 x_2 + \cdots + m_{30} x_{30}$$
where $m_1, m_2, \ldots, m_{30}$ are the component values of the corresponding risk indicators.
TABLE 4
A mathematical model of the comprehensive evaluation score is then established from the weight of each principal component:
$$Q = \sum_{k=1}^{n} \omega_k F_k$$
where Q is the composite score, $\omega_k$ is the weight of the k-th principal component, $F_k$ is its score, and n is the total number of principal components extracted.
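The weighting and scoring steps can be sketched as follows. This is a minimal illustration assuming NumPy, with each weight taken as the component's share of the summed variance of the selected components and the composite score as the weighted sum of component scores. The variances and scores below are placeholders, not values from tables 3 or 4, and the function names are hypothetical.

```python
import numpy as np

def component_weights(variances):
    """Weight of each principal component: its share of the total variance."""
    variances = np.asarray(variances, dtype=float)
    return variances / variances.sum()

def composite_score(F, weights):
    """Q for each sample: weighted sum of its principal component scores."""
    return np.asarray(F, dtype=float) @ np.asarray(weights, dtype=float)

# Placeholder variances for four components (illustration only).
weights = component_weights([4.0, 3.0, 2.0, 1.0])

# Placeholder component scores for two samples, one row per sample.
F = np.array([[1.0, 0.0, 0.0, 0.0],
              [1.0, 1.0, 1.0, 1.0]])
Q = composite_score(F, weights)
```

Because the weights sum to 1, a sample with equal scores on every component receives that same value as its composite score.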
Based on the risk level analysis of the hazards, the risk levels of the hazards in different regions and links are obtained from the comprehensive evaluation scores. By comparing the total scores of all hazards, they are divided into 8 levels: safety level I, safer level II, early warning level III, lower risk level IV, medium risk level V, higher risk level VI, high risk level VII, and ultra-high risk level VIII, as shown in table 5:
TABLE 5
According to the risk levels of the hazards, the risk levels of mycotoxins, heavy metals, microorganisms, and illegal additives in the grain data are counted for rural areas and cities, as shown in table 6.
The risk levels of zearalenone and chromium are lower in cities than in rural areas, while the risk levels of the remaining hazards in cities are higher than or equal to those in rural areas, because the urban grain supply chain is relatively more complex. Overall, the risk level of illegal additives is the highest, the risk level of microorganisms is the lowest, and the risk levels of mycotoxins and heavy metals lie in between.
TABLE 6
The risk principal components are extracted with the kernel principal component analysis method for feature dimension reduction, and unsupervised density peak clustering is used to mine the supply chain hazard risk evaluation adaptively from the data. The unsupervised clustering was run in a Python development environment using the scikit-learn machine learning library; the result is shown in fig. 7.
The three-dimensional scatter plot of fig. 8 shows that the categories are well separated. Unsupervised density peak clustering achieves automatic division into 8 risk levels: safety level I, safer level II, early warning level III, lower risk level IV, medium risk level V, higher risk level VI, high risk level VII, and ultra-high risk level VIII. Safety level I has the lowest risk and ultra-high risk level VIII the highest. The risk level classification is shown in table 7.
TABLE 7
Safety level I contains 7200 samples, safer level II 7025, early warning level III 2917, lower risk level IV 1884, medium risk level V 764, higher risk level VI 482, high risk level VII 450, and ultra-high risk level VIII 305. The classes are clearly separated overall, with only two classes partly overlapping, so the clustering effect is good and the risk levels are divided automatically by unsupervised clustering. Samples at the dangerous levels make up a small share of the 21027 samples, which indicates that the overall grain safety situation in China has been favorable in recent years; this is mainly attributed to increasingly complete laws and regulations and a continuously improving food safety monitoring system. However, samples at the early warning level still exist, so the relevant regulations should continue to be strengthened and foods in high-risk areas should be supervised closely to ensure the safety of the grain supply chain.
This example uses the silhouette coefficient (contour coefficient) to evaluate the quality of the clustering result. For a given point p, the silhouette coefficient is defined as:
$$s(p) = \frac{b(p) - a(p)}{\max\{a(p),\, b(p)\}}$$
where a(p) is the average distance between point p and the other points in the same cluster, and b(p) is the minimum average distance between point p and the points of any other cluster. a(p) reflects how compact the cluster containing p is, and b(p) reflects how well that cluster is separated from neighboring clusters. The larger b(p) and the smaller a(p), the better the clustering quality; the silhouette coefficients s(p) are averaged to measure the quality of the final clustering result. The silhouette coefficient ranges over [-1, 1]: the closer together samples of the same class are and the farther apart samples of different classes are, the higher the score. The silhouette coefficient of the clustering in this example is 0.56, which indicates good clustering quality.
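The evaluation metric can be sketched directly from the standard silhouette definition, (b - a) / max(a, b) per point. This is a minimal illustration assuming NumPy, Euclidean distance, and at least two clusters; the function name and the four sample points are hypothetical and unrelated to the 0.56 reported in this example.

```python
import numpy as np

def silhouette_scores(X, labels):
    """Silhouette coefficient s(p) of every sample (requires >= 2 clusters)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    scores = np.zeros(len(X))
    for p in range(len(X)):
        same = labels == labels[p]
        same[p] = False                        # exclude the point itself
        if not same.any():
            continue                           # singleton cluster: s(p) = 0
        a = D[p, same].mean()                  # compactness of own cluster
        b = min(D[p, labels == other].mean()   # nearest other cluster
                for other in np.unique(labels) if other != labels[p])
        scores[p] = (b - a) / max(a, b)
    return scores

# Illustrative data: two well separated pairs of points.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
labels = np.array([0, 0, 1, 1])
s = silhouette_scores(X, labels)
mean_silhouette = s.mean()
```

scikit-learn, which the example reports using for clustering, also provides `sklearn.metrics.silhouette_score` for the averaged value.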
Fig. 9 shows the proportion of each hazard among the sampled data. Aluminum residue and cadmium content are detected in most of the samples, making them key subjects of hazard risk detection. Fig. 10 shows the hazard levels of these samples as determined by the clustering method of this example; the share of aluminum and cadmium samples belonging to the safer level (II) is very low, indicating that aluminum and cadmium content should be a focus of detection in the supply chain.
In addition, by counting the risk level of each hazard, the distribution of risk levels across the links of the supply chain can be analyzed, as shown in fig. 11. The risk level distribution of the hazards varies between supply chain links. Overall, safety level I has a high proportion in every link, while ultra-high risk level VIII has a higher proportion in the production link, which means that some hazard risks arise more easily in the production stage. High-risk samples rarely appear in the consumption link, which means that risk control in the consumption link of the supply chain is better. Based on this result, the links prone to high risk can be given focused control.
In conclusion, according to the experimental results of the examples, the method for clustering the unsupervised density peaks can realize the self-adaptive classification of the risk of the hazard in the crop supply chain according to various indexes, and implement crop safety early warning.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (8)
1. An unsupervised dimension-reduction density clustering crop supply chain hazard risk judging method, which is characterized by comprising the following steps:
acquiring a data sample set, wherein the data sample set comprises risk indexes;
vector coding and standard normalization preprocessing are carried out on risk indexes in the data sample set to obtain a data sample set containing standardized data vectors;
Performing feature dimension reduction on the data sample set containing the standardized data vector to obtain a data sample set containing a high-dimension feature vector;
performing clustering calculation screening on the data sample set containing the high-dimensional feature vector to obtain a risk level clustering center;
carrying out neighborhood search on the peripheral data of the risk level clustering center to obtain a risk judgment result;
The process of performing feature dimension reduction on the data sample set containing the standardized data vector to obtain the data sample set containing the high-dimension feature vector comprises the following steps:
Based on the kernel function inner product transformation principle, x i and x j are set as data sample points of an input space, linear inseparable standardized data vectors in a data sample set containing the standardized data vectors are projected into a high-dimensional feature space through a nonlinear mapping kernel function phi, and the kernel function is expressed in a Gaussian form, wherein the expression is as follows:
$$(x_i, x_j) \rightarrow K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$$
$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$$
where $x_i, x_j$ are sample points of the input space and σ is the Gaussian kernel bandwidth; given n samples $x_k$ (k = 1, 2, ..., n) of the input space, and since each index has undergone encoding and standardization preprocessing, $\sum_{k=1}^{n} x_k = 0$, and the principal component covariance matrix C is:
$$C = \frac{1}{n}\sum_{k=1}^{n} x_k x_k^T$$
Introducing the kernel function transforms the original input sample points into sample points in the high-dimensional feature space, and the covariance matrix in the feature space, which takes the place of C below, becomes:
$$C = \frac{1}{n}\sum_{k=1}^{n} \Phi(x_k)\Phi(x_k)^T$$
the eigenvalues λ and the corresponding eigenvectors ν are obtained by solving:
$$\lambda(\Phi(x_k) \cdot \nu) = \Phi(x_k) \cdot C\nu$$
since the eigenvector can be expressed linearly in terms of the data set, ν can be expressed linearly in terms of $\Phi(x_i)$:
where $s_i$ is the membership value ($0 < s_i \le 1$), representing the importance of each sample to the principal component partitioning hyperplane,
Calculating class centers of each data sample from the positive class and the negative class through the membership value, judging membership conditions of the data samples by comparing Euclidean distances between sample points and the positive class and the negative class, wherein membership functions are expressed as follows:
When the distance from a positive-class sample to the positive class center is smaller than its distance to the negative class center, the membership value is 1; when its distance to the positive class center is larger than its distance to the negative class center, the sample is regarded as a noise point and its membership value is calculated from a distance function, so that different data are divided more reasonably among the corresponding feature vectors.
2. The method of claim 1, wherein vector encoding and standard normalization preprocessing the risk indicator in the set of data samples to obtain the set of data samples containing the normalized data vector comprises:
performing binarization processing on different types of characteristic values of each column of risk indexes in the data sample set;
the data output after binarization processing is subjected to weight embedding to obtain vector features;
Performing linear transformation on the numerical attribute features in the vector features to obtain normalized data;
And carrying out normal distribution standardization processing on the normalized data, and aggregating the numerical attribute characteristics to an approximate normal distribution state with the mean value of 0 and the variance of 1 to obtain a data sample set containing standardized data vectors.
3. The method according to claim 1, wherein the process of performing cluster computation screening on the data sample set containing the high-dimensional feature vector to obtain a risk level cluster center comprises:
performing distance standardization processing on each component of the high-dimensional feature vector in the data sample set containing the high-dimensional feature vector, and calculating the local density and the higher density distance difference value of each component of the high-dimensional feature vector according to density peak clustering to obtain the local density value and the high density distance minimum value of the high-dimensional feature vector;
and constructing a comprehensive evaluation function according to the local density value and the high-density distance minimum value of the high-dimensional feature vector, and obtaining a risk level clustering center according to the comprehensive evaluation function.
4. The method of claim 3, wherein the performing a neighborhood search on the risk level clustering center peripheral data to obtain a risk judgment result includes:
Performing a neighborhood search on the peripheral data of the risk level clustering center, and sorting the local density values and high-density distance minimum values of the high-dimensional feature vectors around the risk level clustering center to obtain sorting results;
and removing noise points according to the sorting result to obtain a risk judgment result.
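The neighborhood search and noise removal can be sketched as below. This is an assumption-laden illustration rather than the claimed procedure: points are sorted by descending local density, a point that is both sparse (`rho < rho_min`) and far from any denser point (`delta > delta_max`) is marked as noise, and every other point inherits the label of its nearest already-labelled point. The thresholds, the function name and the precomputed `rho` and `delta` values are hypothetical.

```python
from math import dist


def assign_labels(points, rho, delta, centers, rho_min, delta_max):
    """Label each point with a cluster index, or -1 for noise."""
    n = len(points)
    labels = [None] * n
    for k, c in enumerate(centers):
        labels[c] = k
    # Sort by descending local density so denser points are labelled first.
    order = sorted(range(n), key=lambda i: rho[i], reverse=True)
    for i in order:
        if labels[i] is not None:
            continue
        if rho[i] < rho_min and delta[i] > delta_max:
            labels[i] = -1  # sparse and isolated: treated as noise
            continue
        # Candidates: centers plus non-noise points processed earlier.
        labelled = [j for j in range(n) if labels[j] not in (None, -1)]
        nearest = min(labelled, key=lambda j: dist(points[i], points[j]))
        labels[i] = labels[nearest]
    return labels


# Hypothetical samples: two groups plus one isolated point at (5, 5).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (10.5, 10.5), (5, 5)]
rho = [1, 1, 1, 1, 4, 1, 1, 1, 3, 0]
delta = [0.71, 0.71, 0.71, 0.71, 14.16, 0.71, 0.71, 0.71, 14.14, 5.66]
labels = assign_labels(points, rho, delta, centers=[4, 8],
                       rho_min=1, delta_max=2.0)
```

Here the isolated point receives the label -1 and is excluded, while the remaining points split into the two groups around their centers.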
5. A crop supply chain hazard risk judging system based on unsupervised dimension reduction density clustering, the system comprising a processor configured with processor-executable operating instructions to perform the method of claim 1.
6. The system of claim 5, wherein the processor is configured with processor-executable operating instructions to perform operations comprising:
performing binarization processing on the different types of characteristic values in each column of risk indexes in the data sample set;
performing weight embedding on the data output by the binarization processing to obtain vector features;
performing linear transformation on the numerical attribute features in the vector features to obtain normalized data;
and performing normal distribution standardization processing on the normalized data, so that the numerical attribute features are gathered into an approximately normal distribution with a mean of 0 and a variance of 1, to obtain a data sample set containing standardized data vectors.
7. The system of claim 5, wherein the processor is configured with processor-executable operating instructions to perform operations comprising:
performing distance standardization processing on each component of the high-dimensional feature vector in the data sample set containing the high-dimensional feature vector, and calculating the local density and the higher density distance difference value of each component of the high-dimensional feature vector according to density peak clustering to obtain the local density value and the high density distance minimum value of the high-dimensional feature vector;
and constructing a comprehensive evaluation function according to the local density value and the high-density distance minimum value of the high-dimensional feature vector, and obtaining a risk level clustering center according to the comprehensive evaluation function.
8. The system of claim 7, wherein the processor is configured with processor-executable operating instructions to perform operations comprising:
performing neighborhood searching on the peripheral data of the risk level clustering center, and sorting local density values and high-density distance minimum values of the high-dimensional feature vectors around the risk level clustering center to obtain sorting results;
and removing noise points according to the sorting result to obtain a risk judgment result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110386361.4A CN113159546B (en) | 2021-04-12 | 2021-04-12 | Crop supply chain hazard risk judging method and system based on unsupervised dimension reduction density clustering |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110386361.4A CN113159546B (en) | 2021-04-12 | 2021-04-12 | Crop supply chain hazard risk judging method and system based on unsupervised dimension reduction density clustering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113159546A (en) | 2021-07-23 |
CN113159546B (en) | 2024-05-14 |
Family
ID=76889797
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110386361.4A Active CN113159546B (en) | 2021-04-12 | 2021-04-12 | Crop supply chain hazard risk judging method and system based on unsupervised dimension reduction density clustering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113159546B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116187769B (en) * | 2023-05-04 | 2023-07-04 | 四川省安全科学技术研究院 | Urban flood disaster risk studying and judging method based on scene simulation |
CN116230193B (en) * | 2023-05-11 | 2023-07-21 | 聊城市第二人民医院 | Intelligent hospital file management method and system |
CN117787792B (en) * | 2023-12-27 | 2024-06-21 | 江苏科佳软件开发有限公司 | Medical instrument quality safety risk supervision-based method and system |
CN118134260B (en) * | 2024-04-30 | 2024-07-26 | 元尔科技(无锡)有限公司 | Food safety risk assessment method and system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20180043420A (en) * | 2016-10-19 | 2018-04-30 | 한국식품연구원 | Method, Apparatus for Regional Food Safety Factor Computing, And a Computer-readable Storage Medium for executing the Method |
CN108876100A (en) * | 2018-04-28 | 2018-11-23 | 北京化工大学 | Neural network food safety risk prediction model based on ISM and AHP |
CN109409416A (en) * | 2018-09-29 | 2019-03-01 | 上海联影智能医疗科技有限公司 | Feature vector dimension reduction method and medical image recognition method, apparatus and storage medium |
CN109886352A (en) * | 2019-03-04 | 2019-06-14 | 北京航空航天大学 | A kind of unsupervised appraisal procedure of airspace complexity |
CN112381364A (en) * | 2020-10-30 | 2021-02-19 | 浪潮云信息技术股份公司 | Comprehensive evaluation method for food quality spot check |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10839256B2 (en) * | 2017-04-25 | 2020-11-17 | The Johns Hopkins University | Method and apparatus for clustering, analysis and classification of high dimensional data sets |
Non-Patent Citations (3)
Title |
---|
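"Comprehensive Risk Assessment of Hazards in Grain Supply Chain Based on Multi-Dimensional Data"; Wang Xiaoyi et al.; Journal of Food Science and Technology; Vol. 37, No. 6; pp. 129-138 * |
"Clustering method for large-sample data sets based on a density index"; Wang Bing et al.; Computer Engineering and Design; Vol. 37, No. 5; pp. 1245-1248 and 1290 * |
"Risk early warning of hazards in the grain supply chain based on a deep belief network and multi-class fuzzy support vector machine"; Wang Xiaoyi et al.; Food Science; Vol. 41, No. 9; pp. 17-24 * |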
Also Published As
Publication number | Publication date |
---|---|
CN113159546A (en) | 2021-07-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113159546B (en) | Crop supply chain hazard risk judging method and system based on unsupervised dimension reduction density clustering | |
CN107918921B (en) | Criminal case judgment result measuring method and system | |
Xu et al. | Machine learning-based wear fault diagnosis for marine diesel engine by fusing multiple data-driven models | |
CN113657545B (en) | User service data processing method, device, equipment and storage medium | |
CN109657947B (en) | Enterprise industry classification-oriented anomaly detection method | |
CN104156403B (en) | A kind of big data normal mode extracting method and system based on cluster | |
CN109657011B (en) | Data mining system for screening terrorist attack event crime groups | |
CN112039903B (en) | Network security situation assessment method based on deep self-coding neural network model | |
Momeni et al. | Clustering stock market companies via k-means algorithm | |
Yan et al. | Intelligent wear mode identification system for marine diesel engines based on multi-level belief rule base methodology | |
CN111612340A (en) | Network commodity inspection sampling method based on big data | |
Zhang | Food safety risk intelligence early warning based on support vector machine | |
CN105306438B (en) | Network security situation evaluating method based on fuzzy coarse central | |
CN115099149A (en) | Result prediction method based on multiple feature comparison and random forest algorithm | |
CN116384736A (en) | Smart city risk perception method and system | |
CN104102730B (en) | Known label-based big data normal mode extracting method and system | |
CN116433333A (en) | Digital commodity transaction risk prevention and control method and device based on machine learning | |
CN118171204A (en) | Method and system for classifying risk levels of potential safety hazards of electric power based on knowledge graph | |
Sun et al. | Incomplete data processing method based on the measurement of missing rate and abnormal degree: Take the loose particle localization data set as an example | |
CN115794803A (en) | Engineering audit problem monitoring method and system based on big data AI technology | |
CN110689140A (en) | Method for intelligently managing rail transit alarm data through big data | |
CN117132383A (en) | Credit data processing method, device, equipment and readable storage medium | |
CN113705920B (en) | Method for generating water data sample set for thermal power plant and terminal equipment | |
CN110928924A (en) | Power system customer satisfaction analyzing and predicting method based on neural network | |
CN110807174A (en) | Effluent analysis and abnormity identification method for sewage plant group based on statistical distribution | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||