CN112216399A - Food-borne disease pathogenic factor prediction method and system based on BP neural network
- Publication number
- CN112216399A (application CN202011076959.5A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/80—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for detecting, monitoring or modelling epidemics or pandemics, e.g. flu
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Abstract
The invention discloses a food-borne disease pathogenic factor prediction method based on a BP neural network, comprising the following steps: S1, collecting and collating food-borne disease incident cases, establishing a food-borne disease sample analysis database, and recording the feature items contained in each sample; S2, determining a training set and a test set, and performing attribute selection and neuron definition. The advantages are as follows: a deep BP neural network model is established, the network structure is improved by increasing the number of hidden layers, and the computational complexity of the network is optimized; an accurate analysis and prediction network model for the epidemiological pathogenic factors of food-borne diseases is established and iteratively updated in real time through a dynamic migration network with a self-learning function, which improves the execution efficiency and sensitivity of the discrimination model that predicts food-borne disease pathogenic factors; missing data are preprocessed, and data containing missing items are reconstructed and analyzed so that they can take part in effective network computation.
Description
Technical field:
The invention belongs to the technical field of pathogenic factor prediction, and particularly relates to a method and a system for predicting the pathogenic factors of food-borne diseases based on a BP neural network.
Background art:
Food-borne diseases (foodborne diseases) are defined as "diseases caused by ingestion of contaminated food by the human body, including a wide range of diseases caused by parasites, chemicals and pathogenic bacteria which contaminate food at different stages during the production and preparation of food". Food-borne diseases cause great harm to public health, to the healthy development of the food industry and to social stability, and many countries currently carry out food-borne disease monitoring. A food-borne disease monitoring system aims to: identify and control outbreaks of food-borne diseases and analyze and determine their pathogenic factors; identify susceptible populations, high-risk foods and poor food-handling procedures; determine the food-borne transmission routes of specific pathogenic bacteria; evaluate the impact of food-borne diseases and reduce the harm they cause; formulate food safety assessment plans; and study strategies for tracing and giving early warning of food-borne disease outbreaks. Several algorithms have been used to analyze and predict the pathogenic factors of food-borne diseases. Logistic regression is a traditional statistical method whose process is easy to understand, but it places strict requirements on the input data: the feature items of the analyzed data must be mutually independent. The decision tree algorithm has no independence requirement on the input data, but its stability is poor and it handles missing data with difficulty. Bayesian discriminant analysis predicts food-borne disease pathogenic factors stably, but its accuracy is low. The SVM algorithm is sensitive to missing data. The invention therefore provides a food-borne disease pathogenic factor prediction method and system based on a BP neural network to solve these problems.
Summary of the invention:
The invention aims to solve the above problems by providing a method and a system for predicting the pathogenic factors of food-borne diseases based on a BP neural network. It addresses attribute selection and accurate neuron definition for small, sparse sample data, as well as the low accuracy, sensitivity and specificity of food-borne disease pathogenic factor prediction when a large amount of data is missing.
To solve the above problems, the invention provides the following technical solution:
A food-borne disease pathogenic factor prediction method based on a BP neural network comprises the following steps:
S1, collecting and collating food-borne disease incident cases, establishing a food-borne disease sample analysis database, and recording the feature items contained in each sample;
S2, determining a training set and a test set, and performing attribute selection and neuron definition;
S3, preprocessing the missing data: null values are represented by NaN, and, in order to build a highly accurate deep BP neural network prediction model for food-borne disease pathogenic factors, data containing NaN are processed before the network is trained;
S4, establishing a deep BP neural network system model and training the network with the training set data;
S5, inputting the test set data into the model and analyzing sensitivity and specificity.
Preferably, the data collected in S1 are derived from food-borne disease incident analysis data compiled by a center for disease prevention and control.
Preferably, the 2 neurons in the output layer of the neural network in S2 are J7 "pathogenic factor" and J71 "classification". J7 "pathogenic factor" is a string variable, and J71 "classification" is an integer variable coded as "1 = chemical pollutant, 2 = pathogenic bacterium, 3 = virus, 4 = mycotoxin, 5 = parasite, 6 = toxic animal, 7 = toxic plant".
It can be seen that the sample set contains only one mycotoxin, namely "mold", and no parasitic pathogens, and that it is dominated by pathogenic bacteria. The remaining 5 categories can then be subdivided according to J7. The subdivision criterion is: if J7 contains 4 or more samples of the same type, those samples form a separate class; otherwise they are not split off. The attribute J70 is then added, with the following values:
0 = carbamates, 1 = general chemical pollutants, 2 = clenbuterol, 3 = barium salts, 4 = nitrites, 5 = organophosphates, 6 = spasmodics, 7 = histamine, 8 = proteus, 9 = general pathogenic bacteria, 10 = escherichia coli, 11 = escherichia coli, 12 = citrobacter freundii, 13 = vibrio parahaemolyticus, 14 = staphylococcus aureus, 15 = bacillus cereus, 16 = salmonella, 17 = enterobacter cloacae, 18 = shigella, 19 = norovirus, 20 = mould, 21 = general toxic animals, 22 = tetrodotoxin, 23 = general toxic plants, 24 = kidney bean, 25 = colatole, 26 = geloninum, 27 = 28, 28 = gyrocarpus, 29 = total theophylline, 30 = calamine
Meanwhile, if two or three pathogenic factors, such as "proteus, vibrio parahaemolyticus, pathogenic escherichia coli", appear together in J7 for one record, the record is treated as two or three different samples. Samples with an "unexplained cause" are discarded, leaving a sample size of 849.
For the food-borne disease pathogenic factor prediction model herein, K = 30.
Output layer: 30 neurons, corresponding to the 30 different pathogenic factors in J70.
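The labeling rules above (one sample per co-occurring factor, discarding "unexplained cause", and a separate class only for factors with at least 4 samples) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `expand_and_label`, the comma as separator inside J7, and the fallback label `"general <category>"` for rare factors are all assumptions made for the example.

```python
from collections import Counter

def expand_and_label(records, min_count=4):
    """Turn (category, J7 string) records into one class label per sample.

    Assumptions for illustration: factors inside J7 are comma-separated,
    and rare factors fall back to a "general <category>" class.
    """
    expanded = []
    for category, agents in records:
        if agents.strip() == "unexplained cause":
            continue  # such samples are discarded
        for agent in agents.split(","):
            # a record with several factors becomes several samples
            expanded.append((category, agent.strip()))

    counts = Counter(agent for _, agent in expanded)
    labels = []
    for category, agent in expanded:
        # a factor gets its own class only with >= min_count samples
        labels.append(agent if counts[agent] >= min_count else f"general {category}")
    return labels
```

Each distinct label would then be mapped to an integer code of the J70 kind before training.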
Preferably, in the data matrix A of S3, each column is one sample and each row is one feature item. When the first feature item of the second sample in A is missing, the NaN in A is replaced by the mean of the remaining, non-missing data of the first row, and an identification row is added below that row: a 0 in the identification row indicates that the corresponding position in the previous row was NaN, and a 1 indicates that it was not. The processed data matrix is obtained by this treatment.
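The missing-data treatment described above can be sketched in plain Python. This is an illustrative sketch only: the function name `fix_unknowns` is hypothetical, and the choice to add an identification row only under rows that actually contain NaN is an assumption about the intended behavior.

```python
import math

def fix_unknowns(matrix):
    """Rows are feature items, columns are samples (as in the text).

    Each row containing NaN is replaced by two rows: the row with NaN
    filled by the mean of its known values, followed by an identification
    row holding 0 where the value was NaN and 1 where it was known.
    """
    out = []
    for row in matrix:
        known = [v for v in row if not math.isnan(v)]
        if len(known) == len(row):
            out.append(list(row))  # nothing missing: keep the row unchanged
            continue
        mean = sum(known) / len(known) if known else 0.0
        out.append([mean if math.isnan(v) else v for v in row])
        out.append([0.0 if math.isnan(v) else 1.0 for v in row])
    return out

# Example: the first feature item of the second sample is missing.
A = [[1.0, math.nan, 3.0],
     [4.0, 5.0, 6.0]]
processed = fix_unknowns(A)
```

The identification row lets the network distinguish an imputed mean from a genuinely observed value.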
Preferably, S4 comprises the following steps:
1) constructing a food-borne disease pathogenic factor prediction model using a deep BP neural network;
2) building a mathematical model based on real sample data;
3) establishing a computer application model;
4) obtaining a pathogenic factor prediction result based on the computer application model.
Preferably, the deep BP neural network in the constructed prediction model works in two stages: a forward propagation stage and an error back-propagation stage.
Forward propagation stage: the propagation direction is input layer → hidden layers → output layer. The output value of each layer is calculated from its input values and then used as the input of the next layer; proceeding layer by layer yields the actual output of the final output layer, $y_i$, $i = 1, 2$.
The output of the i-th node of the input layer is:
The output of a node of the (l-1)-th hidden layer is:
The output of a node of the output layer is:
Error back-propagation stage: for training samples with input data $x^{(i)} = (x_1, x_2, \ldots, x_p)^{(i)}$ and output data $y^{(i)} = (y_1, y_2, \ldots, y_q)^{(i)}$, where $i = 1, \ldots, N$, the output of the network net is $f_{net}(x \mid W, b)$. The objective function is:
where L(x) is an error function, which may be the mean square error. The goal of training is to minimize J(W, b; x, y). Using the gradient descent method, the error is passed back layer by layer in the direction opposite to forward propagation, and the weights and thresholds of the neuron nodes in each layer are corrected:
Through training the error is gradually reduced, and the final output of the corrected network approaches the target output value.
In the above model, $n_l$ is the number of neurons in layer l; L is the number of hidden layers;
$x_j$, $j = 1, 2, \ldots, p$, is the j-th neuron node of the input layer, where p is the number of neurons in the input layer;
$y_i$, $i = 1, 2, \ldots, q$, is the i-th neuron node of the output layer, where q is the number of neurons in the output layer;
$w_{ij}^{(l)}$ represents the weight from the j-th neuron node of layer l-1 to the i-th neuron node of layer l; $b_i^{(l)}$ represents the threshold of the i-th neuron node of layer l;
$f^{(l)}(x)$ represents the transfer function of the neurons in layer l; g(x) represents the transfer function of the output layer.
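The formulas referred to in this passage are not reproduced in the text. A reconstruction in the standard back-propagation form, consistent with the symbol definitions given here, might read as follows. The symbol $a_i^{(l)}$ (output of node i in layer l, with $n_0 = p$), the indexing $w_{ij}^{(l)}$ and $b_i^{(l)}$, the learning rate $\eta$, the additive sign of the threshold, and the averaging over the N samples are all assumed notation; the original formulas may differ in detail.

```latex
\begin{aligned}
a_j^{(0)} &= x_j, && j = 1, \dots, p \\
a_i^{(l)} &= f^{(l)}\Big(\sum_{j=1}^{n_{l-1}} w_{ij}^{(l)}\, a_j^{(l-1)} + b_i^{(l)}\Big),
  && l = 1, \dots, L,\quad i = 1, \dots, n_l \\
y_i &= g\Big(\sum_{j=1}^{n_L} w_{ij}^{(L+1)}\, a_j^{(L)} + b_i^{(L+1)}\Big),
  && i = 1, \dots, q \\
J(W, b; x, y) &= \frac{1}{N} \sum_{i=1}^{N}
  L\big(f_{net}(x^{(i)} \mid W, b) - y^{(i)}\big) \\
w_{ij}^{(l)} &\leftarrow w_{ij}^{(l)} - \eta\, \frac{\partial J}{\partial w_{ij}^{(l)}},
\qquad
b_i^{(l)} \leftarrow b_i^{(l)} - \eta\, \frac{\partial J}{\partial b_i^{(l)}}
\end{aligned}
```

Here the error function L(·) is distinct from the hidden-layer count L; with the mean square error, $L(e) = \lVert e \rVert^2$.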
Preferably, the mathematical model based on real sample data is built as follows: following the principles of deep neural networks and using the neurons defined in the previous steps, a deep BP neural network is constructed as the food-borne disease pathogenic factor prediction model, comprising an input layer, L hidden layers and an output layer.
Input layer: contains 96 neurons, one per feature attribute.
Hidden layers: L hidden layers are used, each containing n neurons.
Input processing function: 'fixunknowns', which preprocesses the input data.
Transfer functions: the odd-numbered hidden layers use the tan-sigmoid function $f(x) = \frac{2}{1 + e^{-2x}} - 1$, and the even-numbered hidden layers use the log-sigmoid function $f(x) = \frac{1}{1 + e^{-x}}$. The output layer uses the softmax function: for K scalars $x_1, \ldots, x_K$, $y_i = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}}$, $i = 1, \ldots, K$, where $y_1, \ldots, y_K$ satisfy $\sum_{i=1}^{K} y_i = 1$. For the food-borne disease pathogenic factor prediction model of the invention, K = 2: if $y_1 > y_2$, then $y_1 = 1$ and $y_2 = 0$; if $y_1 < y_2$, then $y_1 = 0$ and $y_2 = 1$.
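The three transfer functions named above, together with the rule that sets the largest softmax output to 1 and the others to 0, can be written in a few lines of Python. This is a sketch for illustration; the function names (`tansig`, `logsig`, `softmax`, `one_hot_argmax`) are chosen here and are not taken from the source.

```python
import math

def tansig(x):
    # tan-sigmoid: 2 / (1 + exp(-2x)) - 1, which equals tanh(x)
    return math.tanh(x)

def logsig(x):
    # log-sigmoid: 1 / (1 + exp(-x)), written to avoid overflow for large |x|
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def softmax(xs):
    # subtracting the maximum leaves the result unchanged and avoids overflow
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot_argmax(ys):
    # set the largest output to 1 and all others to 0
    k = max(range(len(ys)), key=ys.__getitem__)
    return [1 if i == k else 0 for i in range(len(ys))]
```

For two outputs, `one_hot_argmax(softmax([0.2, 1.5]))` selects the second class, matching the rule stated for $y_1 < y_2$.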
Output layer: 30 neurons, corresponding to the 30 different pathogenic factors in J70.
Network training function: the scaled conjugate gradient method.
Preferably, the computer application model is established as follows: according to the mathematical model, a program is written that sets the corresponding parameters and uses Python's built-in BP neural network functions to carry out computer simulation modeling, thereby establishing the deep neural network application model for food-borne disease pathogenic factor prediction. The deep BP neural network is trained with the training set data and the trained network net is saved; the output of net is then tested with the test set data, and its accuracy, sensitivity and specificity are analyzed.
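The source does not name the Python functions used, so the following is only a dependency-free sketch of how the forward pass of such a network could be assembled: 96 inputs, hidden layers alternating tan-sigmoid and log-sigmoid, and a 30-way softmax output. The helper names, the random initialization range and the small hidden-layer sizes in the example are assumptions; training (for example by scaled conjugate gradient) is not shown.

```python
import math
import random

def tansig(x):
    return math.tanh(x)

def logsig(x):
    return 1.0 / (1.0 + math.exp(-x)) if x >= 0 else math.exp(x) / (1.0 + math.exp(x))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    # (weights, thresholds); the uniform range is an arbitrary illustrative choice
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(x, hidden_layers, output_layer):
    a = x
    for index, (weights, biases) in enumerate(hidden_layers, start=1):
        f = tansig if index % 2 == 1 else logsig  # odd layers: tansig, even: logsig
        a = [f(sum(w * v for w, v in zip(row, a)) + b)
             for row, b in zip(weights, biases)]
    weights, biases = output_layer
    z = [sum(w * v for w, v in zip(row, a)) + b for row, b in zip(weights, biases)]
    return softmax(z)

rng = random.Random(0)
hidden = [init_layer(96, 8, rng), init_layer(8, 8, rng)]  # 2 small hidden layers
out_layer = init_layer(8, 30, rng)
y = forward([0.0] * 96, hidden, out_layer)
```

The output `y` is a probability vector over the 30 classes; the predicted class is the index of its largest entry.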
Preferably, in S5, according to the calculation formulas:
where Ac denotes the accuracy on the training set, Se the sensitivity on the training set, Sp the specificity on the training set, TAC the accuracy on the test set, TSe the sensitivity on the test set, and TSp the specificity on the test set,
the system performance of the BP neural network discrimination model for food-borne disease pathogenic factors can be obtained.
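The calculation formulas themselves are not reproduced in the text. Under the usual definitions, accuracy is (TP + TN) / (TP + TN + FP + FN), sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP), where TP, TN, FP and FN are the true/false positive/negative counts. The sketch below assumes these standard definitions and a one-versus-rest reading for a chosen class; the function name `performance` is hypothetical.

```python
def performance(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for one class versus the rest."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return accuracy, sensitivity, specificity

ac, se, sp = performance([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Applying the same function to training-set and test-set predictions would give the pairs (Ac, Se, Sp) and (TAC, TSe, TSp) respectively.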
The ROC curves for pathogenic factor prediction by the deep BP neural network show that the prediction model with 50 hidden layers and 200 neurons per layer gives the best prediction performance; the number of hidden layers of the deep BP neural network prediction model is therefore set to 50, and the number of neurons per hidden layer to 200.
A food-borne disease pathogenic factor prediction system based on a BP neural network comprises a data acquisition module, a data storage module, a data normalization module, a data preprocessing module, a neuron training module, a BP neural network prediction module, a WEB server and a mobile device;
the data acquisition module is used for acquiring food-borne disease data;
the data storage module is used for storing the data acquired by the data acquisition module;
the data normalization module is used for dividing the food-borne disease data into a training sample set and a test sample set and for normalizing the data;
the data preprocessing module is used for processing data containing null values before the network is trained;
the neuron training module is used for training the neural network;
the BP neural network prediction module is used for constructing the neural network prediction model;
the WEB server is used for storing the prediction result data and sending the prediction results to the mobile device.
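As an illustration of what the data normalization module does, the sketch below scales each feature to [0, 1] and splits the samples into training and test sets. The source specifies neither the normalization formula nor the split ratio, so min-max scaling, the 80/20 split, the fixed seed and both function names are assumptions made for the example.

```python
import random

def min_max_normalize(samples):
    """Scale every feature (column) to [0, 1]; constant features become 0.0."""
    columns = list(zip(*samples))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(sample, lows, highs)]
        for sample in samples
    ]

def split_samples(samples, test_fraction=0.2, seed=0):
    """Shuffle reproducibly and return (training set, test set)."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

normalized = min_max_normalize([[0, 5], [10, 5], [5, 5]])
train, test = split_samples(list(range(10)))
```

In practice the scaling bounds would be computed on the training set only and reused for the test set, so that no test information leaks into training.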
Beneficial effects of the invention:
In the food-borne disease pathogenic factor prediction method and system based on a BP neural network provided by the invention, a deep BP neural network model is established, the network structure is improved by increasing the number of hidden layers, and the computational complexity of the network is optimized. An accurate analysis and prediction network model for the epidemiological pathogenic factors of food-borne diseases is established and iteratively updated in real time through a dynamic migration network with a self-learning function, which improves the execution efficiency and sensitivity of the discrimination model that predicts food-borne disease pathogenic factors. Missing data are preprocessed, and data containing missing items are reconstructed and analyzed so that they can take part in effective network computation. Non-numerical feature item data are encoded, converting text data into numerical data for analysis. The feature items that influence the prediction of food-borne disease pathogenic factors are determined and encoded, and the neurons of the input layer are determined. Self-learning is realized for pathogenic factors that currently have only a small number of samples: as the system is used, the sample size grows, which raises the prediction probability for pathogenic factors that are currently under-sampled. This provides a theoretical and practical basis for on-site collection of food-borne disease data, analysis and prediction of pathogenic factors, laboratory test support and medically assisted diagnosis.
Description of the drawings:
For ease of illustration, the invention is described in detail through the following embodiments and the accompanying drawings.
FIG. 1 is a schematic diagram of a prediction method according to the present invention;
FIG. 2 is a diagram of a deep BP neural network according to the present invention;
FIG. 3 is a schematic model diagram of the deep BP neural network of the present invention;
FIG. 4 is a schematic diagram of ROC curve of deep BP neural network prediction pathogenic factor of the present invention;
FIG. 5 is a diagram of a prediction system according to the present invention.
Detailed description:
As shown in FIGS. 1 to 5, this embodiment adopts the following technical solution: a food-borne disease pathogenic factor prediction method based on a BP neural network comprises the following steps:
S1, collecting and collating food-borne disease incident cases, establishing a food-borne disease sample analysis database, and recording the feature items contained in each sample;
S2, determining a training set and a test set, and performing attribute selection and neuron definition;
S3, preprocessing the missing data: null values are represented by NaN, and, in order to build a highly accurate deep BP neural network prediction model for food-borne disease pathogenic factors, data containing NaN are processed before the network is trained;
S4, establishing a deep BP neural network system model and training the network with the training set data;
S5, inputting the test set data into the model and analyzing sensitivity and specificity.
The data collected in S1 are derived from food-borne disease incident analysis data compiled by a center for disease prevention and control.
The 2 neurons in the output layer of the neural network in S2 are J7 "pathogenic factor" and J71 "classification". J7 "pathogenic factor" is a string variable, and J71 "classification" is an integer variable coded as "1 = chemical pollutant, 2 = pathogenic bacterium, 3 = virus, 4 = mycotoxin, 5 = parasite, 6 = toxic animal, 7 = toxic plant".
In the data matrix A of S3, each column is one sample and each row is one feature item. When the first feature item of the second sample in A is missing, the NaN in A is replaced by the mean of the remaining, non-missing data of the first row, and an identification row is added below that row: a 0 in the identification row indicates that the corresponding position in the previous row was NaN, and a 1 indicates that it was not. The processed data matrix is obtained by this treatment.
S4 comprises the following steps:
1) constructing a food-borne disease pathogenic factor prediction model using a deep BP neural network;
2) building a mathematical model based on real sample data;
3) establishing a computer application model;
4) obtaining a pathogenic factor prediction result based on the computer application model.
The deep BP neural network in the constructed prediction model works in two stages: a forward propagation stage and an error back-propagation stage.
Forward propagation stage: the propagation direction is input layer → hidden layers → output layer. The output value of each layer is calculated from its input values and then used as the input of the next layer; proceeding layer by layer yields the actual output of the final output layer, $y_i$, $i = 1, 2$.
The output of the i-th node of the input layer is:
The output of a node of the (l-1)-th hidden layer is:
The output of a node of the output layer is:
Error back-propagation stage: for training samples with input data $x^{(i)} = (x_1, x_2, \ldots, x_p)^{(i)}$ and output data $y^{(i)} = (y_1, y_2, \ldots, y_q)^{(i)}$, where $i = 1, \ldots, N$, the output of the network net is $f_{net}(x \mid W, b)$. The objective function is:
where L(x) is an error function, which may be the mean square error. The goal of training is to minimize J(W, b; x, y). Using the gradient descent method, the error is passed back layer by layer in the direction opposite to forward propagation, and the weights and thresholds of the neuron nodes in each layer are corrected:
Through training the error is gradually reduced, and the final output of the corrected network approaches the target output value.
In the above model, $n_l$ is the number of neurons in layer l; L is the number of hidden layers;
$x_j$, $j = 1, 2, \ldots, p$, is the j-th neuron node of the input layer, where p is the number of neurons in the input layer;
$y_i$, $i = 1, 2, \ldots, q$, is the i-th neuron node of the output layer, where q is the number of neurons in the output layer;
$w_{ij}^{(l)}$ represents the weight from the j-th neuron node of layer l-1 to the i-th neuron node of layer l;
$f^{(l)}(x)$ represents the transfer function of the neurons in layer l; g(x) represents the transfer function of the output layer.
The mathematical model based on real sample data is built as follows: following the principles of deep neural networks and using the neurons defined in the previous steps, a deep BP neural network is constructed as the food-borne disease pathogenic factor prediction model, comprising an input layer, L hidden layers and an output layer.
Input layer: contains 96 neurons, one per feature attribute:
"J31","J32","J33","J431","J432","J531","J532","J533","Q","Q11","Q13","Q14","Q15","Q2","Q3","Q4","Q5","Q6","Q7","Q8","Q9","Q10","Q12","Q16","Q17","Q18","Q19","Q20","Q21","Q22","Q23","Q24","Q25","Q26","Z","Z1","Z21","Z22","Z23","Z24","Z3","Z4","Z5","Z111","Z112","Z113","Z114","Z115","Z116","Z117","Z118","Z119","Z1110","Z1111","Z1112","H","H1","H2","H3","X","X1","X2","X3","X4","M","M1","M2","M3","M4","S","S1","S2","S3","S4","S5","S6","S7","S8","S9","S10","S11","S12","S13","S14","S15","S161","S162","S163","S164","S165","P","P1","P2","P3","P4","P5";
Hidden layers: L hidden layers are used, each containing n neurons, as shown in the following table;
Number of neurons and layers in the hidden layer
Input processing function: 'fixunknowns', which preprocesses the input data.
Transfer functions: the odd-numbered hidden layers use the tan-sigmoid function $f(x) = \frac{2}{1 + e^{-2x}} - 1$, and the even-numbered hidden layers use the log-sigmoid function $f(x) = \frac{1}{1 + e^{-x}}$. The output layer uses the softmax function: for K scalars $x_1, \ldots, x_K$, $y_i = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}}$, $i = 1, \ldots, K$, where $y_1, \ldots, y_K$ satisfy $\sum_{i=1}^{K} y_i = 1$. For the food-borne disease pathogenic factor prediction model of the invention, K = 2: if $y_1 > y_2$, then $y_1 = 1$ and $y_2 = 0$; if $y_1 < y_2$, then $y_1 = 0$ and $y_2 = 1$.
Output layer: 30 neurons, corresponding to the 30 different pathogenic factors in J70.
Network training function: the scaled conjugate gradient method.
The computer application model is established as follows: according to the mathematical model, a program is written that sets the corresponding parameters and uses Python's built-in BP neural network functions to carry out computer simulation modeling, thereby establishing the deep neural network application model for food-borne disease pathogenic factor prediction. The deep BP neural network is trained with the training set data and the trained network net is saved; the output of net is then tested with the test set data, and its accuracy, sensitivity and specificity are analyzed.
In S5, according to the calculation formula:
where Ac represents the Accuracy of the training set (Accuracy), Se represents the Sensitivity of the training set (Sensitivity), Sp represents the Specificity of the training set (Specificity), TAC represents the Accuracy of the test set (Accuracy of the test set), TSe represents the Sensitivity of the test set (Sensitivity of the test set), and TSp represents the Specificity of the test set (Specificity of the test set);
the system performance of a BP neural network food-borne disease pathogenic factor discrimination model can be obtained as shown in the following table
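The calculation formula itself is not reproduced in this text. Assuming the usual confusion-matrix definitions (accuracy = (TP + TN) / (TP + TN + FP + FN), sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)), the three quantities could be computed as in the sketch below. The function name `classification_metrics` and the `positive` parameter are illustrative only.

```python
def classification_metrics(actual, predicted, positive=1):
    """Return (accuracy, sensitivity, specificity) for a two-class problem.

    Assumes the standard confusion-matrix definitions; a ratio whose
    denominator is zero is reported as NaN.
    """
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a != positive and p != positive:
            tn += 1
        elif a != positive and p == positive:
            fp += 1
        else:
            fn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return accuracy, sensitivity, specificity
```

Calling the function once on the training set predictions and once on the test set predictions would give the pairs (Ac, TAC), (Se, TSe) and (Sp, TSp) described above.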
A food-borne disease pathogenic factor prediction system based on a BP neural network comprises a data acquisition module, a data storage module, a data normalization module, a data preprocessing module, a neuron training module, a BP neural network prediction module, a WEB server and mobile equipment;
the data acquisition module is used for acquiring food-borne disease data;
the data storage module is used for storing the data acquired by the data acquisition module;
the data normalization module is used for dividing food-borne disease data into a training sample set and a testing sample set and carrying out data normalization;
the data preprocessing module is used for processing data containing null values before training the network;
the neuron training module is used for training a neural network;
the BP neural network prediction module is used for constructing a neural network prediction model;
and the WEB server is used for storing the prediction result data and sending the prediction result to the mobile equipment.
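The data normalization module is described only by its role, so the sketch below makes two assumptions that are not in the source: an 80/20 random split and min-max scaling to [0, 1], with the scaling ranges taken from the training set only. The function names are hypothetical.

```python
import random


def split_samples(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_min_max(train_rows):
    """Record the (min, max) of every feature column of the training set."""
    return [(min(col), max(col)) for col in zip(*train_rows)]


def apply_min_max(rows, ranges):
    """Scale each feature with the stored ranges; constant columns map to 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, (lo, hi) in zip(row, ranges)]
        for row in rows
    ]
```

Fitting the ranges on the training set and reusing them for the test set keeps test data from influencing preprocessing, although test values may then fall slightly outside [0, 1].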
Specifically: in the food-borne disease pathogenic factor prediction method and system based on a BP neural network, data are collected by gathering and organizing food-borne disease incident cases, a food-borne disease sample analysis database is established, and the feature items contained in each sample are recorded; a training set and a test set are determined, and attribute selection and neuron definition are carried out; missing data are preprocessed, with null values expressed as NaN and data containing NaN processed before the network is trained; a deep BP neural network system model is established and trained with the training set data; and the test set data are input into the model for sensitivity and specificity analysis. By establishing a deep BP neural network model, the network structure is improved by increasing the number of hidden layers, the computational complexity of the network is optimized, and an accurate analysis and prediction network model for the epidemiological pathogenic factors of food-borne diseases is established. Iterations are updated in real time through a dynamic migration network with a self-learning function, which improves the execution efficiency and sensitivity of the discrimination model network for predicting food-borne disease pathogenic factors. Missing data are preprocessed: data containing missing items are reconstructed and analyzed so that they can take part in effective network calculation. Non-numerical feature item data are encoded, converting text data into numerical data for analysis. The feature items that influence the prediction of food-borne disease pathogenic factors are determined and encoded, and the input layer neurons are determined. For pathogenic factors that have only a small number of samples, self-learning is realized and the sample size grows through future use, so that the prediction probability for pathogenic factors that currently have few samples increases. This provides a theoretical and practical basis for on-site data collection, pathogenic factor analysis and prediction, laboratory test support and medical auxiliary diagnosis for food-borne diseases.
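The encoding of non-numerical feature items mentioned above might look like the following sketch. The coding scheme (integer codes assigned in first-seen order, NaN for values not in the codebook) and the example food names are assumptions for illustration; the source does not state which scheme is used.

```python
def build_codebook(values):
    """Assign an integer code to each distinct text value, in first-seen order."""
    codebook = {}
    for v in values:
        if v not in codebook:
            codebook[v] = len(codebook) + 1
    return codebook


def encode(values, codebook, missing=float("nan")):
    """Convert text values to numbers; unseen values become NaN (assumption)."""
    return [codebook.get(v, missing) for v in values]


# Hypothetical text feature item, e.g. a suspected food category per sample.
foods = ["seafood", "meat", "seafood", "mushroom"]
codebook = build_codebook(foods)
encoded = encode(foods, codebook)
```

Mapping unseen values to NaN lets them flow into the same missing-data preprocessing that the method applies before training.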
While there have been shown and described what are at present considered to be the fundamental principles of the invention and its essential features and advantages, it will be understood by those skilled in the art that the invention is not limited by the embodiments described above, which are merely illustrative of the principles of the invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined by the appended claims and their equivalents.
Claims (10)
1. A food-borne disease pathogenic factor prediction method based on a BP neural network is characterized in that: the prediction method comprises the following steps:
S1, collecting and organizing food-borne disease incident cases, establishing a food-borne disease sample analysis database, and recording the feature items contained in each sample;
S2, determining a training set and a test set, and performing attribute selection and neuron definition;
S3, preprocessing the missing data, expressing null values as NaN, and processing the data containing NaN before training the network, in order to construct a deep BP neural network food-borne disease pathogenic factor prediction model with high accuracy;
S4, establishing a deep BP neural network system model, and training the network with the training set data;
and S5, inputting the test set data into the model, and analyzing the sensitivity and specificity.
2. The method for predicting pathogenic factors of food-borne diseases based on the BP neural network as claimed in claim 1, wherein: the data collected in S1 are derived from food-borne disease incident analysis data compiled by a disease prevention and control center.
3. The method for predicting pathogenic factors of food-borne diseases based on the BP neural network as claimed in claim 1, wherein: the 2 neurons in the output layer of the neural network in S2 are J7 "pathogenic factor" and J71 "classification", respectively; J7 "pathogenic factor" is a string-type variable, and J71 "classification" is an integer variable coded as "1 = chemical pollutant, 2 = pathogenic bacterium, 3 = virus, 4 = mycotoxin, 5 = parasite, 6 = toxic animal, 7 = toxic plant".
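The J71 coding listed in this claim can be held in a simple lookup table, as in the sketch below. The helper names `j71_label` and `j71_one_hot` are hypothetical, and the one-hot vector is an optional representation added here for illustration; the claim itself only defines the integer codes.

```python
# Integer codes for J71 as listed in the claim.
J71_CATEGORIES = {
    1: "chemical pollutant",
    2: "pathogenic bacterium",
    3: "virus",
    4: "mycotoxin",
    5: "parasite",
    6: "toxic animal",
    7: "toxic plant",
}


def j71_label(code):
    """Return the category name for a J71 code, rejecting unknown codes."""
    if code not in J71_CATEGORIES:
        raise ValueError(f"unknown J71 code: {code!r}")
    return J71_CATEGORIES[code]


def j71_one_hot(code):
    """Illustrative one-hot vector of length 7 for a valid J71 code."""
    j71_label(code)  # validates the code
    return [1 if c == code else 0 for c in sorted(J71_CATEGORIES)]
```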
4. The method for predicting pathogenic factors of food-borne diseases based on the BP neural network as claimed in claim 1, wherein: in the data A in S3, each column is the data of one sample and each row is the data of one feature item; the first feature item of the second sample in A is missing; the NaN in A is replaced by the mean of the remaining, non-missing values of that feature item (the first row), and an identification row is added below it, in which 0 indicates that the corresponding position of the preceding row was NaN and 1 indicates that it was not; the processed data are obtained by the above treatment.
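The treatment described in this claim can be sketched in plain Python as follows. The matrix values are hypothetical, because the original example matrices are not reproduced in this text; the function name `fix_unknowns` is chosen to echo the 'fixunknowns' processing function named in the description, and is not claimed to be that function.

```python
import math


def fix_unknowns(matrix):
    """Replace NaN by the row mean and add a 0/1 identification row.

    Rows are feature items and columns are samples. Only rows that
    contain NaN get an identification row, placed directly below them:
    0 marks a position that was NaN, 1 a position that was known.
    """
    out = []
    for row in matrix:
        known = [v for v in row if not math.isnan(v)]
        if len(known) == len(row):
            out.append(list(row))
            continue
        # Assumption: a row with no known values is filled with 0.0.
        mean = sum(known) / len(known) if known else 0.0
        out.append([mean if math.isnan(v) else v for v in row])
        out.append([0 if math.isnan(v) else 1 for v in row])
    return out


# Hypothetical data: the first feature item of the second sample is missing.
A = [
    [1.0, float("nan"), 3.0, 5.0],
    [4.0, 5.0, 6.0, 7.0],
]
processed = fix_unknowns(A)
```

Here the missing entry is replaced by the mean of 1.0, 3.0 and 5.0, and the identification row `[1, 0, 1, 1]` records where the replacement happened, so the network can still distinguish imputed values from observed ones.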
5. The method for predicting pathogenic factors of food-borne diseases based on the BP neural network as claimed in claim 1, wherein: S4 includes the following steps:
1) constructing a food-borne disease pathogenic factor prediction model by adopting a deep BP neural network;
2) modeling a mathematical model based on real sample data;
3) establishing a computer application model;
4) obtaining a pathogenic factor prediction result based on the computer application model.
6. The method for predicting pathogenic factors of food-borne diseases based on the BP neural network as claimed in claim 5, wherein: the deep BP neural network in the constructed prediction model mainly comprises two stages: a forward propagation stage and an error backward propagation stage;
Forward propagation stage: the propagation direction is input layer → hidden layers → output layer; the output value of each layer is calculated from its input values and is then used as the input value of the next layer, layer by layer, to obtain the actual output $y_i$, $i = 1, 2$, of the final output layer.
The output of the ith node of the input layer is:
the output of the first node of the l-1 layer hidden layer is as follows:
the output of the first node of the output layer is as follows:
Error back propagation stage: for training samples with input data $x^{(i)} = (x_1, x_2, \ldots, x_p)^{(i)}$ and output data $y^{(i)} = (y_1, y_2, \ldots, y_q)^{(i)}$, where $i = 1, \ldots, N$, the output of the network net is $f_{net}(x \mid W, b)$, and the objective function is:
where L(x) is an error function, which may be the mean square error. The goal of training is to minimize $J(W, b; x, y)$. Using the gradient descent method, the error is propagated back layer by layer in the reverse direction of the forward propagation stage, and the weights and thresholds of the neuron nodes in each layer are corrected:
The error is gradually reduced through the training, and the final output of the modified network is close to the target output value.
In the above model, $n_l$ is the number of neurons in layer $l$; $L$ is the number of hidden layers;
$x_j$, $j = 1, 2, \ldots, p$, is the $j$-th neuron node of the input layer, where $p$ is the number of neurons in the input layer;
$y_i$, $i = 1, 2, \ldots, q$, is the $i$-th neuron node of the output layer, where $q$ is the number of neurons in the output layer;
$w_{ij}^{(l)}$ represents the weight from the $j$-th neuron node of layer $l-1$ to the $i$-th neuron node of layer $l$;
$f^{(l)}(x)$ represents the transfer function of the neurons in layer $l$; $g(x)$ represents the transfer function of the output layer.
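The formulas of this claim are not reproduced in the text. One standard formulation that is consistent with the symbols defined above is given below as a hedged reconstruction, not as the patent's own equations. The node outputs $o_i^{(l)}$, the thresholds $b_i^{(l)}$, the learning rate $\eta$ and the specific mean-square form of the error function are assumptions introduced here.

```latex
% Forward propagation (assumed notation: o_i^{(l)} is the output of node i in layer l)
o_j^{(0)} = x_j, \qquad j = 1, \ldots, p
\\
o_i^{(l)} = f^{(l)}\!\Big( \sum_{j=1}^{n_{l-1}} w_{ij}^{(l)} \, o_j^{(l-1)} + b_i^{(l)} \Big),
\qquad l = 1, \ldots, L
\\
y_i = g\!\Big( \sum_{j=1}^{n_L} w_{ij}^{(L+1)} \, o_j^{(L)} + b_i^{(L+1)} \Big),
\qquad i = 1, \ldots, q

% Objective function with a mean-square error (one possible choice of the error function)
J(W, b; x, y) = \frac{1}{N} \sum_{i=1}^{N}
  \frac{1}{2} \big\| f_{net}(x^{(i)} \mid W, b) - y^{(i)} \big\|^{2}

% Gradient descent correction of weights and thresholds (eta is an assumed learning rate)
w_{ij}^{(l)} \leftarrow w_{ij}^{(l)} - \eta \, \frac{\partial J}{\partial w_{ij}^{(l)}},
\qquad
b_i^{(l)} \leftarrow b_i^{(l)} - \eta \, \frac{\partial J}{\partial b_i^{(l)}}
```

Note that the source uses the letter L both for the number of hidden layers and for the error function; the reconstruction writes the mean-square error out explicitly to avoid that clash.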
7. The method for predicting pathogenic factors of food-borne diseases based on the BP neural network as claimed in claim 5, wherein: the mathematical model modeling based on real sample data constructs a deep BP neural network, based on the deep neural network principle and the neurons defined in the preceding steps, to build the food-borne disease pathogenic factor prediction model, wherein the model comprises an input layer, L hidden layers and an output layer;
Input layer: comprises 96 neurons, corresponding to the feature attributes;
Hidden layers: L hidden layers are used, each containing n neurons;
Input processing function: 'fixunknowns' is used to preprocess the input data;
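Transfer function: the transfer function of the odd-numbered hidden layers is the tan-sigmoid function $f(x) = \frac{2}{1 + e^{-2x}} - 1$; the transfer function of the even-numbered hidden layers is the log-sigmoid function $f(x) = \frac{1}{1 + e^{-x}}$; the transfer function of the output layer is the softmax function, i.e. for K scalars $x_1, \ldots, x_K$, $y_i = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}}$, $i = 1, \ldots, K$, where $y_1, \ldots, y_K$ satisfy $\sum_{i=1}^{K} y_i = 1$. For the food-borne disease pathogenic factor prediction model of the invention, K = 2; if $y_1 > y_2$, then $y_1 = 1$ and $y_2 = 0$; if $y_1 < y_2$, then $y_1 = 0$ and $y_2 = 1$.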
Output layer: 30 neurons, corresponding to the 30 different pathogenic factors in J70.
Network training function: the scaled conjugate gradient method.
8. The method for predicting pathogenic factors of food-borne diseases based on the BP neural network as claimed in claim 5, wherein: establishing the computer application model means writing a program, with the corresponding parameters set according to the mathematical model, that uses Python's built-in BP neural network functions to carry out computer simulation modeling; a food-borne disease pathogenic factor prediction application model based on the deep neural network is established, the deep BP neural network is trained with the training set data, the trained network net is stored, and the output of net is tested with the test set data and analyzed for accuracy, sensitivity and specificity.
9. The method for predicting pathogenic factors of food-borne diseases based on the BP neural network as claimed in claim 5, wherein: in S5, according to the calculation formula:
where Ac represents the Accuracy of the training set (Accuracy), Se represents the Sensitivity of the training set (Sensitivity), Sp represents the Specificity of the training set (Specificity), TAC represents the Accuracy of the test set (Accuracy of the test set), TSe represents the Sensitivity of the test set (Sensitivity of the test set), and TSp represents the Specificity of the test set (Specificity of the test set);
the system performance of a BP neural network food-borne disease pathogenic factor discrimination model can be obtained.
10. A food-borne disease pathogenic factor prediction system based on a BP neural network is characterized in that: the system comprises a data acquisition module, a data storage module, a data normalization module, a data preprocessing module, a neuron training module, a BP neural network prediction module, a WEB server and mobile equipment;
the data acquisition module is used for acquiring food-borne disease data;
the data storage module is used for storing the data acquired by the data acquisition module;
the data normalization module is used for dividing food-borne disease data into a training sample set and a testing sample set and carrying out data normalization;
the data preprocessing module is used for processing data containing null values before training the network;
the neuron training module is used for training a neural network;
the BP neural network prediction module is used for constructing a neural network prediction model;
and the WEB server is used for storing the prediction result data and sending the prediction result to the mobile equipment.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011076959.5A CN112216399A (en) | 2020-10-10 | 2020-10-10 | Food-borne disease pathogenic factor prediction method and system based on BP neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112216399A true CN112216399A (en) | 2021-01-12 |
Family
ID=74053021
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011076959.5A Pending CN112216399A (en) | 2020-10-10 | 2020-10-10 | Food-borne disease pathogenic factor prediction method and system based on BP neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112216399A (en) |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1881227A (en) * | 2006-05-16 | 2006-12-20 | 中国人民解放军第三军医大学 | Intelligent analytical model technology for diagnosing epidemic situation and classifying harmfulness degree of contagious disease |
CN104597753A (en) * | 2014-12-19 | 2015-05-06 | 徐工集团工程机械股份有限公司道路机械分公司 | Method and device for intelligently controlling asphalt and crushed stone spreading of synchronous chip sealer |
CN106096286A (en) * | 2016-06-15 | 2016-11-09 | 北京千安哲信息技术有限公司 | Clinical path formulating method and device |
CN106874581A (en) * | 2016-12-30 | 2017-06-20 | 浙江大学 | A kind of energy consumption of air conditioning system in buildings Forecasting Methodology based on BP neural network model |
CN107358294A (en) * | 2017-07-21 | 2017-11-17 | 河北工程大学 | A kind of water demand prediction method based on Elman neutral nets |
CN107506590A (en) * | 2017-08-26 | 2017-12-22 | 郑州大学 | A kind of angiocardiopathy forecast model based on improvement depth belief network |
CN109492287A (en) * | 2018-10-30 | 2019-03-19 | 成都云材智慧数据科技有限公司 | A kind of solid electrolyte ionic conductivity prediction technique based on BP neural network |
CN110087207A (en) * | 2019-05-05 | 2019-08-02 | 江南大学 | Wireless sensor network missing data method for reconstructing |
CN110322014A (en) * | 2019-07-10 | 2019-10-11 | 燕山大学 | A kind of finished cement specific surface area prediction technique based on BP neural network |
CN110610209A (en) * | 2019-09-16 | 2019-12-24 | 北京邮电大学 | Air quality prediction method and system based on data mining |
CN111047073A (en) * | 2019-11-14 | 2020-04-21 | 佛山科学技术学院 | Neural network-based aquaculture water quality prediction method and system |
CN111489046A (en) * | 2019-01-29 | 2020-08-04 | 广东省公共卫生研究院 | Regional food safety evaluation model based on supply chain and BP neural network |
CN111681764A (en) * | 2020-06-11 | 2020-09-18 | 西南大学 | BP neural network-based children respiratory system disease incidence prediction method |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113205055A (en) * | 2021-05-11 | 2021-08-03 | 北京知见生命科技有限公司 | Fungus microscopic image classification method and system based on multi-scale attention mechanism |
CN115115260A (en) * | 2022-07-19 | 2022-09-27 | 东南大学溧阳研究院 | Quantitative analysis method for social electric influence caused by emergency based on BP neural network |
CN115115260B (en) * | 2022-07-19 | 2023-04-07 | 东南大学溧阳研究院 | Quantitative analysis method for social electric influence caused by emergency based on BP neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||