CN112216399A - Food-borne disease pathogenic factor prediction method and system based on BP neural network - Google Patents

Food-borne disease pathogenic factor prediction method and system based on BP neural network

Info

Publication number
CN112216399A
Authority
CN
China
Prior art keywords
data
layer
food
neural network
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011076959.5A
Other languages
Chinese (zh)
Inventor
高飞
张剑峰
刘忠卫
闫军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Heilongjiang Center For Disease Control And Prevention
Original Assignee
Heilongjiang Center For Disease Control And Prevention
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Heilongjiang Center For Disease Control And Prevention filed Critical Heilongjiang Center For Disease Control And Prevention
Priority to CN202011076959.5A priority Critical patent/CN112216399A/en
Publication of CN112216399A publication Critical patent/CN112216399A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/80ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for detecting, monitoring or modelling epidemics or pandemics, e.g. flu
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Abstract

The invention discloses a food-borne disease pathogenic factor prediction method based on a BP neural network, which comprises the following steps: S1, collecting and sorting food-borne disease accident cases, establishing a food-borne disease sample analysis database, and recording the characteristic items contained in each sample; S2, determining a training set and a test set, and performing attribute selection and neuron definition. The invention has the following advantages: a deep BP neural network model is established, the network structure is improved by increasing the number of hidden layers, and the computational complexity of the network is optimized, yielding an accurate analysis and prediction network model for the epidemiological pathogenic factors of food-borne diseases; a dynamic migration network with a self-learning function is updated iteratively in real time, which improves the execution efficiency and sensitivity of the discrimination model network for predicting food-borne disease pathogenic factors; and missing data are preprocessed, the data containing missing items being reconstructed and analyzed so that they can take part in effective network calculation.

Description

Food-borne disease pathogenic factor prediction method and system based on BP neural network
Technical field:
The invention belongs to the technical field of pathogenic factor prediction, and particularly relates to a method and a system for predicting pathogenic factors of food-borne diseases based on a BP neural network.
Background art:
Food-borne diseases (foodborne diseases) are defined as "diseases caused by ingestion of contaminated food by the human body, including a wide range of diseases caused by parasites, chemicals and pathogenic bacteria which contaminate food at different stages during the production and preparation of food". Food-borne diseases cause great harm to public health, to the healthy development of the food industry and to social stability, and many countries currently carry out food-borne disease monitoring. A food-borne disease monitoring system aims to: identify and control outbreaks of food-borne diseases and analyze and judge pathogenic factors; identify susceptible populations, high-risk foods and poor food-handling procedures; determine the food-borne transmission routes of specific pathogenic bacteria; evaluate the impact of food-borne diseases and reduce their harm; formulate food safety assessment plans; and study traceability and early-warning strategies for food-borne disease outbreaks. Among the algorithms used to analyze and predict the pathogenic factors of food-borne diseases, logistic regression is a traditional statistical method whose process is easy to understand, but it places higher requirements on the input data and requires the characteristic items of the analyzed data to be independent of each other. The decision tree algorithm has no independence requirement on the input data, but its stability is poor and it handles missing data with difficulty. Bayesian discriminant analysis gives stable predictions of food-borne disease pathogenic factors, but its accuracy is low. The SVM algorithm is sensitive to missing data. The invention therefore provides a food-borne disease pathogenic factor prediction method and system based on a BP neural network to solve these problems.
Summary of the invention:
The invention aims to solve the above problems by providing a method and a system for predicting pathogenic factors of food-borne diseases based on a BP neural network. They address attribute selection and accurate neuron definition for small-sample sparse data, and the low accuracy, sensitivity and specificity of food-borne disease pathogenic factor prediction when a large amount of data is missing.
In order to solve the above problems, the present invention provides a technical solution:
a food-borne disease pathogenic factor prediction method based on a BP neural network comprises the following steps:
S1, collecting and sorting food-borne disease accident cases, establishing a food-borne disease sample analysis database, and recording the characteristic items contained in each sample;
S2, determining a training set and a test set, and performing attribute selection and neuron definition;
S3, preprocessing the missing data: null values are expressed as NaN, and data containing NaN are processed before the network is trained, so that a high-accuracy deep BP neural network food-borne disease pathogenic factor prediction model can be constructed;
S4, establishing a deep BP neural network system model, and training the network with the training set data;
S5, inputting the test set data into the model, and analyzing the sensitivity and specificity.
Preferably, the data acquisition in S1 is derived from food-borne disease accident analysis data generated by the center for disease prevention and control.
Preferably, the 2 neurons in the output layer of the neural network in S2 are J7 "pathogenic factor" and J71 "classification"; J7 is a string-type variable and J71 is an integer-type variable coded as "1 = chemical pollutant, 2 = pathogenic bacterium, 3 = virus, 4 = mycotoxin, 5 = parasite, 6 = toxic animal, 7 = toxic plant";
It can be seen that the sample set contains only one mycotoxin, "mould", and no parasitic pathogens, and is dominated by pathogenic bacteria; the remaining 5 categories can then be subdivided according to J7. The subdivision criterion is: if the number of samples of the same type in J7 is greater than or equal to 4, those samples form a separate class; otherwise they are not separated. The J70 attribute is then added, coded as:
0 = carbamates, 1 = general chemical pollutants, 2 = clenbuterol, 3 = barium salts, 4 = nitrites, 5 = organophosphates, 6 = spasmodics, 7 = histamine, 8 = proteus, 9 = general pathogenic bacteria, 10 = escherichia coli, 11 = escherichia coli, 12 = citrobacter freundii, 13 = vibrio parahaemolyticus, 14 = staphylococcus aureus, 15 = bacillus cereus, 16 = salmonella, 17 = enterobacter cloacae, 18 = shigella, 19 = norovirus, 20 = mould, 21 = general toxic animals, 22 = tetrodotoxin, 23 = general toxic plants, 24 = kidney bean, 25 = colatole, 26 = geloninum, 27 = 28, 28 = gyrocarpus, 29 = total theophylline, 30 = calamine
Meanwhile, if two or three pathogenic factors, such as "proteus, vibrio parahaemolyticus, pathogenic escherichia coli", are present in J7 at the same time, the record is treated as two or three different samples. Samples with an "unexplained cause" are discarded, leaving a sample size of 849.
For the food-borne disease pathogenic factor prediction model herein, K = 30.
An output layer: 30 neurons, corresponding to 30 different etiologic agents in J70.
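The sample preparation rules above (one sample per pathogenic factor, discarding "unexplained cause", and a separate class only for factors with at least 4 samples) can be sketched as follows. This is only an illustration: the record layout, the comma separator, the function names and the "general <category>" fallback label are assumptions, not details given in the source.

```python
from collections import Counter

UNKNOWN = "unexplained cause"  # assumed spelling of the discarded label


def expand_samples(records):
    """Turn each (features, J7 string, J71 category) record into one sample per agent."""
    expanded = []
    for features, j7, category in records:
        for agent in (part.strip() for part in j7.split(",")):
            # Skip empty fragments and samples whose cause is unexplained.
            if agent and agent != UNKNOWN:
                expanded.append((features, agent, category))
    return expanded


def assign_labels(samples, min_count=4):
    """Give an agent its own class only if it has at least min_count samples."""
    counts = Counter(agent for _, agent, _ in samples)
    return [
        agent if counts[agent] >= min_count else "general " + category
        for _, agent, category in samples
    ]


# Hypothetical records; the feature dictionaries are placeholders.
records = [
    ({"id": 1}, "proteus, vibrio parahaemolyticus", "pathogenic bacterium"),
    ({"id": 2}, "unexplained cause", "pathogenic bacterium"),
    ({"id": 3}, "nitrite", "chemical pollutant"),
]
samples = expand_samples(records)
labels = assign_labels(samples)
```

With only one sample per agent in this toy input, every label falls back to its general category; on a real data set the frequent agents would keep their own class.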
Preferably, the data in S3
Figure BDA0002717476880000031
Each column is one sample and each row is one characteristic item. The first characteristic item of the second sample in A is missing. The NaN in A is replaced by the mean of the non-missing values of that characteristic item (the first row), and an identification row is added below it: 0 in the identification row means that the corresponding position of the previous row was NaN, and 1 means that it was not NaN,
Figure BDA0002717476880000032
obtained by the above treatment
Figure BDA0002717476880000033
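The missing-data treatment described above can be sketched in plain Python as follows. The matrix values are illustrative only (the original matrices survive only as image placeholders), the function name is hypothetical, and the sketch assumes that an identification row is added only after rows that actually contain NaN.

```python
import math


def fill_missing_with_flags(matrix):
    """Rows are characteristic items, columns are samples.

    For each row containing NaN, replace NaN by the mean of the known values
    and append an identification row (0 = was NaN, 1 = was present).
    """
    result = []
    for row in matrix:
        if any(math.isnan(value) for value in row):
            known = [value for value in row if not math.isnan(value)]
            # Assumption: a row with no known values is filled with 0.0.
            mean = sum(known) / len(known) if known else 0.0
            result.append([mean if math.isnan(value) else value for value in row])
            result.append([0 if math.isnan(value) else 1 for value in row])
        else:
            result.append(list(row))
    return result


# Illustrative data: the first characteristic item of the second sample is missing.
A = [
    [1.0, math.nan, 3.0],
    [4.0, 5.0, 6.0],
]
B = fill_missing_with_flags(A)
```

Here the missing entry becomes 2.0 (the mean of 1.0 and 3.0) and the inserted row `[1, 0, 1]` records where the gap was, so the network can still distinguish imputed values from observed ones.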
Preferably, S4 comprises the following steps:
1) constructing a food-borne disease pathogenic factor prediction model by adopting a deep BP neural network;
2) modeling a mathematical model based on real sample data;
3) establishing a computer application model;
4) obtaining a pathogenic factor prediction result based on the computer application model.
Preferably, the deep BP neural network in the constructed prediction model mainly comprises two stages: a forward propagation stage and an error backward propagation stage;
Forward propagation stage: the propagation direction is input layer → hidden layer → output layer, and the output value of each layer is calculated from its input value
Figure BDA0002717476880000041
which is then used as the input value of the next layer; proceeding layer by layer gives the actual output $y_i$, $i = 1, 2$, of the final output layer.
The output of the ith node of the input layer is:
Figure BDA0002717476880000042
the output of the first node of the l-1 layer hidden layer is as follows:
Figure BDA0002717476880000043
the output of the first node of the output layer is as follows:
Figure BDA0002717476880000044
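As a minimal sketch of the forward propagation stage, the following pure-Python code computes each layer's output from the previous layer's output. The per-layer formulas in the source are image placeholders, so the form used here (weighted sum plus threshold, then transfer function) is the textbook one and is an assumption; all names and the toy weights are hypothetical.

```python
import math


def layer_output(inputs, weights, thresholds, transfer):
    """Output of one layer: transfer(weighted sum of inputs + threshold) per node."""
    return [
        transfer(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, thresholds)
    ]


def forward(x, layers):
    """Propagate x through layers given as (weights, thresholds, transfer) tuples,
    ordered from the first hidden layer to the output layer."""
    activation = list(x)
    for weights, thresholds, transfer in layers:
        activation = layer_output(activation, weights, thresholds, transfer)
    return activation


def identity(z):
    return z


# Toy network: 2 inputs, one hidden layer of 2 tanh nodes, 2 linear output nodes.
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], math.tanh),
    ([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0], identity),
]
y = forward([1.0, 2.0], layers)
```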
Error back propagation stage: for training samples with input data $x^{(i)} = (x_1, x_2, \ldots, x_p)^{(i)}$ and output data $y^{(i)} = (y_1, y_2, \ldots, y_q)^{(i)}$, where $i = 1, \ldots, N$, the output of the network net is $f_{net}(x \mid W, b)$. The objective function is:
Figure BDA0002717476880000045
where $L(\cdot)$ is an error function, which may be the mean square error. The goal of the training is to minimize $J(W, b; x, y)$. Using the gradient descent method, the error is propagated back layer by layer in the reverse direction of the forward propagation stage, and the weights and thresholds of the neuron nodes of each layer are corrected:
Figure BDA0002717476880000046
Figure BDA0002717476880000047
Let
Figure BDA0002717476880000048
represent the effect of the layer-$l$ neurons on the final error; it can then be derived that
Figure BDA0002717476880000049
The error is gradually reduced through the training, and the final output of the modified network is close to the target output value.
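For orientation, the textbook form of the back propagation update is given below. It is not a reconstruction of the image placeholders above: the pre-activation $z_i^{(l)}$, the activation $a_i^{(l)}$, the sensitivity $\delta_i^{(l)}$, the learning rate $\eta$ and the convention that the threshold is added are all assumptions chosen to match the symbols $w$, $b$, $f^{(l)}$ and $J$ used in the text, and the recursion for $\delta$ applies to layers with element-wise transfer functions.

```latex
\begin{aligned}
z_i^{(l)} &= \sum_{j=1}^{n_{l-1}} w_{ij}^{(l)} a_j^{(l-1)} + b_i^{(l)},
  \qquad a_i^{(l)} = f^{(l)}\bigl(z_i^{(l)}\bigr) \\
\delta_i^{(l)} &= \frac{\partial J}{\partial z_i^{(l)}}
  = f^{(l)\prime}\bigl(z_i^{(l)}\bigr) \sum_{k=1}^{n_{l+1}} w_{ki}^{(l+1)} \delta_k^{(l+1)} \\
\frac{\partial J}{\partial w_{ij}^{(l)}} &= \delta_i^{(l)} a_j^{(l-1)},
  \qquad \frac{\partial J}{\partial b_i^{(l)}} = \delta_i^{(l)} \\
w_{ij}^{(l)} &\leftarrow w_{ij}^{(l)} - \eta \frac{\partial J}{\partial w_{ij}^{(l)}},
  \qquad b_i^{(l)} \leftarrow b_i^{(l)} - \eta \frac{\partial J}{\partial b_i^{(l)}}
\end{aligned}
```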
In the above model, $n_l$ is the number of neurons in layer $l$; $L$ is the number of hidden layers;
$x_j$, $j = 1, 2, \ldots, p$, is the $j$-th neuron node of the input layer, where $p$ is the number of neurons in the input layer;
$y_i$, $i = 1, 2, \ldots, q$, is the $i$-th neuron node of the output layer, where $q$ is the number of neurons in the output layer;
Figure BDA0002717476880000051
represents the weight from the $j$-th neuron node of layer $l-1$ to the $i$-th neuron node of layer $l$;
Figure BDA0002717476880000052
represents the threshold of the $i$-th neuron node of layer $l$;
$f^{(l)}(x)$ represents the transfer function of the layer-$l$ neurons; $g(x)$ represents the transfer function of the output layer.
Preferably, the mathematical model modeling based on the real sample data means that, based on the deep neural network principle and the neurons defined in the previous steps, a deep BP neural network is constructed to build the food-borne disease pathogenic factor prediction model, which comprises an input layer, L hidden layers and an output layer;
Input layer: contains 96 neurons, one per characteristic attribute;
Hidden layers: L hidden layers are used, each containing n neurons.
Input processing function: 'fixunknowns', which preprocesses the input data;
Transfer function: the odd hidden layers use the tan-sigmoid function
Figure BDA0002717476880000053
The even hidden layers use the log-sigmoid function
Figure BDA0002717476880000054
The transfer function of the output layer is the softmax function, i.e. for K scalars $x_1, \ldots, x_K$
Figure BDA0002717476880000055
where $y_1, \ldots, y_K$ satisfy
Figure BDA0002717476880000056
For the food-borne disease pathogenic factor prediction model of the invention, K = 2: if $y_1 > y_2$, then $y_1 = 1$ and $y_2 = 0$; if $y_1 < y_2$, then $y_1 = 0$ and $y_2 = 1$.
An output layer: 30 neurons, corresponding to 30 different etiologic agents in J70.
Network training function: the scaled conjugate gradient method.
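The transfer functions named above can be sketched with their standard definitions: tan-sigmoid as $2/(1+e^{-2x})-1$ (mathematically equal to tanh), log-sigmoid as $1/(1+e^{-x})$, and softmax as $e^{x_i}/\sum_j e^{x_j}$. The source shows its own formulas only as images, so these standard forms are an assumption, and the function names (including the winner-takes-all helper for the decision rule) are hypothetical.

```python
import math


def tansig(x):
    """Tan-sigmoid; computed via tanh, which equals 2 / (1 + exp(-2x)) - 1."""
    return math.tanh(x)


def logsig(x):
    """Log-sigmoid 1 / (1 + exp(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(xs):
    """Softmax over K scalars; outputs are positive and sum to 1."""
    shift = max(xs)  # subtracting the maximum keeps exp() in range
    exps = [math.exp(x - shift) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def winner_takes_all(ys):
    """Set the largest output to 1 and all others to 0."""
    winner = max(range(len(ys)), key=lambda i: ys[i])
    return [1 if i == winner else 0 for i in range(len(ys))]


probs = softmax([2.0, 1.0])
decision = winner_takes_all(probs)
```

For K = 2 the helper reproduces the rule stated in the text: the larger of $y_1$ and $y_2$ becomes 1 and the other becomes 0. Ties, which the text does not cover, resolve to the first output here.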
Preferably, establishing the computer application model means writing a program, with the corresponding parameters set according to the mathematical model, that applies a built-in BP neural network function of python to implement computer simulation modeling. The food-borne disease pathogenic factor prediction application model of the deep neural network is established, the deep BP neural network is trained with the training set data and the trained network net is saved, and the output of net is tested with the test set data and its accuracy, sensitivity and specificity are analyzed.
Preferably, in S5, according to the calculation formula:
Figure BDA0002717476880000061
Figure BDA0002717476880000062
Figure BDA0002717476880000063
where Ac represents the Accuracy of the training set (Accuracy), Se represents the Sensitivity of the training set (Sensitivity), Sp represents the Specificity of the training set (Specificity), TAC represents the Accuracy of the test set (Accuracy of the test set), TSe represents the Sensitivity of the test set (Sensitivity of the test set), and TSp represents the Specificity of the test set (Specificity of the test set).
The system performance of a BP neural network food-borne disease pathogenic factor discrimination model can be obtained.
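The three performance measures can be computed from confusion-matrix counts as sketched below. The source gives its formulas only as images, so the standard binary definitions are assumed here: accuracy = (TP + TN) / all, sensitivity = TP / (TP + FN), specificity = TN / (TN + FP). The label vectors are made-up examples, and a multi-class model would need these counts per class (for instance one class against the rest).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


# Hypothetical test-set labels and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
t_ac = accuracy(tp, fp, tn, fn)
t_se = sensitivity(tp, fn)
t_sp = specificity(tn, fp)
```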
According to the ROC curve of the pathogenic factors predicted by the deep BP neural network, the prediction model with 50 hidden layers and 200 neurons in each layer gives the best prediction effect; the number of hidden layers of the deep BP neural network prediction model is therefore set to 50, and the number of neurons in each hidden layer to 200.
A food-borne disease pathogenic factor prediction system based on a BP neural network comprises a data acquisition module, a data storage module, a data normalization module, a data preprocessing module, a neuron training module, a BP neural network prediction module, a WEB server and mobile equipment;
the data acquisition module is used for acquiring food-borne disease data;
the data storage module is used for storing the data acquired by the data acquisition module;
the data normalization module is used for dividing food-borne disease data into a training sample set and a testing sample set and carrying out data normalization;
the data preprocessing module is used for processing data containing null values before training the network;
the neuron training module is used for training a neural network;
the BP neural network prediction module is used for constructing a neural network prediction model;
and the WEB server is used for storing the prediction result data and sending the prediction result to the mobile equipment.
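What the data normalization module does (splitting the data into training and test sample sets, then normalizing) could look like the sketch below. The source does not state the split ratio or the normalization scheme, so the 80/20 split, the fixed random seed and min-max scaling fitted on the training set are all assumptions, as are the function names.

```python
import random


def split_samples(samples, train_fraction=0.8, seed=0):
    """Shuffle reproducibly and split into (training set, test set)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_min_max(rows):
    """Return the (min, max) of every feature column of the training rows."""
    return [(min(column), max(column)) for column in zip(*rows)]


def apply_min_max(rows, ranges):
    """Scale each feature to [0, 1] using ranges fitted on the training set."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, ranges)]
        for row in rows
    ]


# Illustrative data: 10 samples with 2 numeric features each.
data = [[float(i), float(10 * i)] for i in range(10)]
train, test = split_samples(data)
ranges = fit_min_max(train)
train_scaled = apply_min_max(train, ranges)
test_scaled = apply_min_max(test, ranges)
```

Fitting the ranges on the training set only keeps test-set information out of training; test values outside the training range will then fall outside [0, 1].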
The invention has the beneficial effects that:
The food-borne disease pathogenic factor prediction method and system based on the BP neural network provided by the invention establish a deep BP neural network model, improve the network structure by increasing the number of hidden layers, and optimize the computational complexity of the network, yielding an accurate analysis and prediction network model for the epidemiological pathogenic factors of food-borne diseases. A dynamic migration network with a self-learning function is updated iteratively in real time, which improves the execution efficiency and sensitivity of the discrimination model network for predicting food-borne disease pathogenic factors. Missing data are preprocessed: data containing missing items are reconstructed and analyzed so that they can take part in effective network calculation. Non-numerical characteristic item data are encoded, converting text data into numerical data for analysis. The characteristic items that influence the prediction of food-borne disease pathogenic factors are determined and encoded, and the neurons of the input layer are determined accordingly. For pathogenic factors that currently have few samples, the model learns by itself: as the sample size grows through future use, the prediction probability of these small-sample pathogenic factors increases. This provides a theoretical and practical basis for food-borne disease field data collection, pathogenic factor analysis and prediction, laboratory test support and medical auxiliary diagnosis.
Description of the drawings:
for ease of illustration, the invention is described in detail by the following detailed description and the accompanying drawings.
FIG. 1 is a schematic diagram of a prediction method according to the present invention;
FIG. 2 is a diagram of a deep BP neural network according to the present invention;
FIG. 3 is a schematic model diagram of the deep BP neural network of the present invention;
FIG. 4 is a schematic diagram of ROC curve of deep BP neural network prediction pathogenic factor of the present invention;
FIG. 5 is a diagram of a prediction system according to the present invention.
Detailed description:
As shown in FIG. 1 to FIG. 5, this embodiment adopts the following technical solution: a food-borne disease pathogenic factor prediction method based on a BP neural network, comprising the following steps:
S1, collecting and sorting food-borne disease accident cases, establishing a food-borne disease sample analysis database, and recording the characteristic items contained in each sample;
S2, determining a training set and a test set, and performing attribute selection and neuron definition;
S3, preprocessing the missing data: null values are expressed as NaN, and data containing NaN are processed before the network is trained, so that a high-accuracy deep BP neural network food-borne disease pathogenic factor prediction model can be constructed;
S4, establishing a deep BP neural network system model, and training the network with the training set data;
S5, inputting the test set data into the model, and analyzing the sensitivity and specificity.
Wherein the data acquisition in S1 is derived from food-borne disease accident analysis data formed by a disease prevention control center.
Wherein, the 2 neurons in the output layer of the neural network in S2 are J7 "pathogenic factor" and J71 "classification"; J7 is a string-type variable and J71 is an integer-type variable coded as "1 = chemical pollutant, 2 = pathogenic bacterium, 3 = virus, 4 = mycotoxin, 5 = parasite, 6 = toxic animal, 7 = toxic plant".
Wherein, the data in S3
Figure BDA0002717476880000081
Each column is one sample and each row is one characteristic item. The first characteristic item of the second sample in A is missing. The NaN in A is replaced by the mean of the non-missing values of that characteristic item (the first row), and an identification row is added below it: 0 in the identification row means that the corresponding position of the previous row was NaN, and 1 means that it was not NaN,
Figure BDA0002717476880000091
obtained by the above treatment
Figure BDA0002717476880000092
Wherein, S4 comprises the following steps:
1) constructing a food-borne disease pathogenic factor prediction model by adopting a deep BP neural network;
2) modeling a mathematical model based on real sample data;
3) establishing a computer application model;
4) obtaining a pathogenic factor prediction result based on the computer application model.
The deep BP neural network in the constructed prediction model mainly comprises two stages: a forward propagation stage and an error backward propagation stage;
Forward propagation stage: the propagation direction is input layer → hidden layer → output layer, and the output value of each layer is calculated from its input value
Figure BDA0002717476880000093
which is then used as the input value of the next layer; proceeding layer by layer gives the actual output $y_i$, $i = 1, 2$, of the final output layer.
The output of the ith node of the input layer is:
Figure BDA0002717476880000094
the output of the first node of the l-1 layer hidden layer is as follows:
Figure BDA0002717476880000095
the output of the first node of the output layer is as follows:
Figure BDA0002717476880000096
Error back propagation stage: for training samples with input data $x^{(i)} = (x_1, x_2, \ldots, x_p)^{(i)}$ and output data $y^{(i)} = (y_1, y_2, \ldots, y_q)^{(i)}$, where $i = 1, \ldots, N$, the output of the network net is $f_{net}(x \mid W, b)$. The objective function is:
Figure BDA0002717476880000101
where $L(\cdot)$ is an error function, which may be the mean square error. The goal of the training is to minimize $J(W, b; x, y)$. Using the gradient descent method, the error is propagated back layer by layer in the reverse direction of the forward propagation stage, and the weights and thresholds of the neuron nodes of each layer are corrected:
Figure BDA0002717476880000102
Figure BDA0002717476880000103
Let
Figure BDA0002717476880000104
represent the effect of the layer-$l$ neurons on the final error; it can then be derived that
Figure BDA0002717476880000105
The error is gradually reduced through the training, and the final output of the modified network is close to the target output value.
In the above model, $n_l$ is the number of neurons in layer $l$; $L$ is the number of hidden layers;
$x_j$, $j = 1, 2, \ldots, p$, is the $j$-th neuron node of the input layer, where $p$ is the number of neurons in the input layer;
$y_i$, $i = 1, 2, \ldots, q$, is the $i$-th neuron node of the output layer, where $q$ is the number of neurons in the output layer;
Figure BDA0002717476880000106
represents the weight from the $j$-th neuron node of layer $l-1$ to the $i$-th neuron node of layer $l$;
Figure BDA0002717476880000107
represents the threshold of the $i$-th neuron node of layer $l$;
$f^{(l)}(x)$ represents the transfer function of the layer-$l$ neurons; $g(x)$ represents the transfer function of the output layer.
The mathematical model modeling based on the real sample data means that, based on the deep neural network principle and the neurons defined in the previous steps, a deep BP neural network is constructed to build the food-borne disease pathogenic factor prediction model, which comprises an input layer, L hidden layers and an output layer;
Input layer: contains 96 neurons, one per characteristic attribute:
"J31","J32","J33","J431","J432","J531","J532","J533","Q","Q11","Q13","Q14","Q15","Q2","Q3","Q4","Q5","Q6","Q7","Q8","Q9","Q10","Q12","Q16","Q17","Q18","Q19","Q20","Q21","Q22","Q23","Q24","Q25","Q26","Z","Z1","Z21","Z22","Z23","Z24","Z3","Z4","Z5","Z111","Z112","Z113","Z114","Z115","Z116","Z117","Z118","Z119","Z1110","Z1111","Z1112","H","H1","H2","H3","X","X1","X2","X3","X4","M","M1","M2","M3","M4","S","S1","S2","S3","S4","S5","S6","S7","S8","S9","S10","S11","S12","S13","S14","S15","S161","S162","S163","S164","S165","P","P1","P2","P3","P4","P5";
Hidden layers: L hidden layers are used, each containing n neurons, as shown in the following table;
Number of neurons and layers in the hidden layer
Figure BDA0002717476880000111
Input processing function: 'fixunknowns', which preprocesses the input data;
Transfer function: the odd hidden layers use the tan-sigmoid function
Figure BDA0002717476880000112
The even hidden layers use the log-sigmoid function
Figure BDA0002717476880000113
The transfer function of the output layer is the softmax function, i.e. for K scalars $x_1, \ldots, x_K$
Figure BDA0002717476880000121
where $y_1, \ldots, y_K$ satisfy
Figure BDA0002717476880000122
For the food-borne disease pathogenic factor prediction model of the invention, K = 2: if $y_1 > y_2$, then $y_1 = 1$ and $y_2 = 0$; if $y_1 < y_2$, then $y_1 = 0$ and $y_2 = 1$.
An output layer: 30 neurons, corresponding to 30 different etiologic agents in J70.
Network training function: the scaled conjugate gradient method.
Establishing the computer application model means writing a program, with the corresponding parameters set according to the mathematical model, that applies a built-in BP neural network function of python to implement computer simulation modeling. The food-borne disease pathogenic factor prediction application model of the deep neural network is established, the deep BP neural network is trained with the training set data and the trained network net is saved, and the output of net is tested with the test set data and its accuracy, sensitivity and specificity are analyzed.
Wherein in S5, according to a calculation formula:
Figure BDA0002717476880000123
Figure BDA0002717476880000124
Figure BDA0002717476880000125
where Ac represents the Accuracy of the training set (Accuracy), Se represents the Sensitivity of the training set (Sensitivity), Sp represents the Specificity of the training set (Specificity), TAC represents the Accuracy of the test set (Accuracy of the test set), TSe represents the Sensitivity of the test set (Sensitivity of the test set), and TSp represents the Specificity of the test set (Specificity of the test set);
the system performance of a BP neural network food-borne disease pathogenic factor discrimination model can be obtained as shown in the following table
Figure BDA0002717476880000131
A food-borne disease pathogenic factor prediction system based on a BP neural network comprises a data acquisition module, a data storage module, a data normalization module, a data preprocessing module, a neuron training module, a BP neural network prediction module, a WEB server and mobile equipment;
the data acquisition module is used for acquiring food-borne disease data;
the data storage module is used for storing the data acquired by the data acquisition module;
the data normalization module is used for dividing food-borne disease data into a training sample set and a testing sample set and carrying out data normalization;
the data preprocessing module is used for processing data containing null values before training the network;
the neuron training module is used for training a neural network;
the BP neural network prediction module is used for constructing a neural network prediction model;
and the WEB server is used for storing the prediction result data and sending the prediction result to the mobile equipment.
Specifically, the method and system for predicting food-borne disease pathogenic factors based on a BP neural network comprise the following: collecting and collating food-borne disease incident cases, establishing a food-borne disease sample analysis database, and recording the feature items contained in each sample; determining a training set and a test set, and performing attribute selection and neuron definition; preprocessing the missing data, representing null values by NaN, and processing the data containing NaN before the network is trained; establishing a deep BP neural network system model and training the network with the training set data; and inputting the test set data into the model and analyzing sensitivity and specificity. By establishing a deep BP neural network model, the network structure is improved by increasing the number of hidden layers and the computational complexity of the network is optimized, so that an accurate analysis and prediction network model for the epidemiological pathogenic factors of food-borne diseases is established; the model is iteratively updated in real time through a dynamic migration network with a self-learning function, which improves the execution efficiency and sensitivity of the discrimination model network for predicting food-borne disease pathogenic factors. The missing data are preprocessed: data containing missing items are reconstructed so that they can take part in effective network computation. Non-numerical feature item data are encoded, converting text data into numerical data for analysis. The feature items that influence the prediction of food-borne disease pathogenic factors are determined and encoded, and the input-layer neurons are determined accordingly. Self-learning is realized for pathogenic factors that have only a small number of samples, and the sample size grows through future use, so that the prediction probability for pathogenic factors that currently have few samples increases. This provides a theoretical and practical basis for food-borne disease field data acquisition, pathogenic factor analysis and prediction, laboratory test support, and medical auxiliary diagnosis.
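The encoding of non-numerical feature items mentioned above can be illustrated with a minimal sketch. The description does not specify the coding scheme, so the first-seen integer coding below, the helper names `build_codebook` and `encode`, and the example feature `food_category` are all assumptions made for illustration only.

```python
def build_codebook(values):
    """Map each distinct non-missing text value to an integer code (first-seen order)."""
    codebook = {}
    for value in values:
        if value is not None and value not in codebook:
            codebook[value] = len(codebook) + 1  # codes start at 1
    return codebook


def encode(values, codebook):
    """Replace text values by their codes; missing or unknown values become NaN."""
    return [float(codebook[v]) if v in codebook else float("nan") for v in values]


# Hypothetical feature item: suspected food category recorded as text.
food_category = ["meat", "seafood", None, "meat", "mushroom"]
codebook = build_codebook(food_category)
encoded = encode(food_category, codebook)
```

Missing entries are deliberately left as NaN here so that they can be handled by the missing-data preprocessing step rather than being silently assigned a code.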
While there have been shown and described what are at present considered to be the fundamental principles of the invention and its essential features and advantages, it will be understood by those skilled in the art that the invention is not limited by the embodiments described above, which are merely illustrative of the principles of the invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined by the appended claims and their equivalents.

Claims (10)

1. A food-borne disease pathogenic factor prediction method based on a BP neural network, characterized in that the prediction method comprises the following steps:
S1, collecting and collating food-borne disease incident cases, establishing a food-borne disease sample analysis database, and recording the feature items contained in each sample;
S2, determining a training set and a test set, and performing attribute selection and neuron definition;
S3, preprocessing the missing data, representing null values by NaN, and processing the data containing NaN before training the network, in order to construct a highly accurate deep BP neural network prediction model for food-borne disease pathogenic factors;
S4, establishing a deep BP neural network system model, and training the network with the training set data;
S5, inputting the test set data into the model, and analyzing sensitivity and specificity.
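As an illustration of the training set and test set determination in step S2, the following is a minimal sketch. The claims do not state how the split is made, so the random shuffle, the 30% test fraction, the fixed seed, and the function name `split_samples` are assumptions.

```python
import random


def split_samples(samples, test_fraction=0.3, seed=0):
    """Shuffle a copy of the samples and split it into (training, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical case records; real records would carry the recorded feature items.
samples = [{"case_id": i} for i in range(10)]
train_set, test_set = split_samples(samples)
```

Every sample ends up in exactly one of the two sets, which keeps the test-set sensitivity and specificity of step S5 independent of the data used for training.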
2. The method for predicting pathogenic factors of food-borne diseases based on the BP neural network as claimed in claim 1, wherein: the data collected in S1 are derived from food-borne disease incident analysis data compiled by a disease prevention and control center.
3. The method for predicting pathogenic factors of food-borne diseases based on the BP neural network as claimed in claim 1, wherein: the 2 neurons in the output layer of the neural network in S2 are J7 "pathogenic factor" and J71 "classification", respectively; J7 "pathogenic factor" is a string-type variable, and J71 "classification" is an integer variable whose values are: 1 = chemical pollutant, 2 = pathogenic bacterium, 3 = virus, 4 = mycotoxin, 5 = parasite, 6 = toxic animal, 7 = toxic plant.
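The integer coding of the classification variable in claim 3 can be written down as a simple lookup table. This is only a sketch: the seven codes and labels come from the claim, while the names `PATHOGENIC_FACTOR_CLASSES` and `class_code` are hypothetical.

```python
# Codes and labels as listed in claim 3 for the integer classification variable.
PATHOGENIC_FACTOR_CLASSES = {
    1: "chemical pollutant",
    2: "pathogenic bacterium",
    3: "virus",
    4: "mycotoxin",
    5: "parasite",
    6: "toxic animal",
    7: "toxic plant",
}


def class_code(label):
    """Return the integer code for a class label, or raise KeyError if unknown."""
    for code, name in PATHOGENIC_FACTOR_CLASSES.items():
        if name == label:
            return code
    raise KeyError(label)
```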
4. The method for predicting pathogenic factors of food-borne diseases based on the BP neural network as claimed in claim 1, wherein: the data in S3 are
Figure FDA0002717476870000011
where each column is the data of one sample and each row is the data of one feature item. The first feature item of the second sample in A is missing. The NaN in A is replaced by the mean of the remaining (non-missing) data of that feature item in the first row, and an identification row is added after it, in which 0 indicates that the corresponding position of the preceding row was NaN and 1 indicates that it was not,
Figure FDA0002717476870000021
and the above processing yields
Figure FDA0002717476870000022
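The missing-data treatment described in claim 4 can be sketched as follows. The matrices in the figures are not reproduced here, so the example matrix `A` is hypothetical; the function name `fix_unknowns` is likewise an assumption, chosen because the behaviour resembles the 'fixunknowns' input processing function named in claim 7.

```python
import math


def fix_unknowns(matrix):
    """Rows are feature items, columns are samples.

    A row containing NaN is replaced by two rows: the row with each NaN set to
    the mean of its remaining values, followed by an identification row that
    holds 0 where the value was NaN and 1 where it was present.
    """
    result = []
    for row in matrix:
        known = [v for v in row if not math.isnan(v)]
        if len(known) == len(row):
            result.append(list(row))  # nothing missing: keep the row unchanged
            continue
        mean = sum(known) / len(known) if known else 0.0  # all-NaN row: assume 0
        result.append([mean if math.isnan(v) else v for v in row])
        result.append([0.0 if math.isnan(v) else 1.0 for v in row])
    return result


# Hypothetical data: the first feature item of the second sample is missing.
A = [[1.0, float("nan"), 3.0],
     [4.0, 5.0, 6.0]]
B = fix_unknowns(A)
```

With this input, the first row becomes `[1.0, 2.0, 3.0]`, the identification row `[1.0, 0.0, 1.0]` follows it, and the complete second feature row is passed through unchanged.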
5. The method for predicting pathogenic factors of food-borne diseases based on the BP neural network as claimed in claim 1, wherein: step S4 comprises the following steps:
1) constructing a food-borne disease pathogenic factor prediction model by adopting a deep BP neural network;
2) building a mathematical model based on real sample data;
3) establishing a computer application model;
4) obtaining a pathogenic factor prediction result based on the computer application model.
6. The method for predicting pathogenic factors of food-borne diseases based on the BP neural network as claimed in claim 5, wherein: the deep BP neural network in the constructed prediction model mainly comprises two stages: a forward propagation stage and an error back propagation stage;
Forward propagation stage: the propagation direction is input layer → hidden layer → output layer; the output value of each layer is calculated from the input value of that layer
Figure FDA0002717476870000023
and then
Figure FDA0002717476870000024
is used as the input value of the next layer, and the calculation proceeds layer by layer to obtain the actual output $y_i$, $i = 1, 2$, of the final output layer.
The output of the $i$-th node of the input layer is:
Figure FDA0002717476870000025
the output of the $i$-th node of the $l$-th hidden layer is:
Figure FDA0002717476870000026
the output of the $i$-th node of the output layer is:
Figure FDA0002717476870000031
Error back propagation stage: for the training samples, the input data are $x^{(i)} = (x_1, x_2, \ldots, x_p)^{(i)}$ and the output data are $y^{(i)} = (y_1, y_2, \ldots, y_q)^{(i)}$, where $i = 1, \ldots, N$; the output of the network net is $f_{net}(x \mid W, b)$, and the objective function is:
Figure FDA0002717476870000032
where $L(x)$ is an error function, which may be the mean square error. The goal of training is to minimize $J(W, b; x, y)$. Using the gradient descent method, the error is returned layer by layer in the direction opposite to the forward propagation stage, and the weights and thresholds of the neuron nodes of each layer are corrected:
Figure FDA0002717476870000033
Figure FDA0002717476870000034
Let
Figure FDA0002717476870000035
denote the effect of the layer-$l$ neurons on the final error; it can then be derived that
Figure FDA0002717476870000036
Through this training the error is gradually reduced, and the final output of the corrected network approaches the target output value.
In the above model, $n_l$ is the number of neurons in layer $l$; $L$ is the number of hidden layers;
$x_j$, $j = 1, 2, \ldots, p$, is the $j$-th neuron node of the input layer, where $p$ is the number of neurons in the input layer;
$y_i$, $i = 1, 2, \ldots, q$, is the $i$-th neuron node of the output layer, where $q$ is the number of neurons in the output layer;
Figure FDA0002717476870000037
represents the weight from the $j$-th neuron node of layer $l-1$ to the $i$-th neuron node of layer $l$;
Figure FDA0002717476870000038
represents the threshold of the $i$-th neuron node of layer $l$;
$f^{(l)}(x)$ represents the transfer function of the layer-$l$ neurons; $g(x)$ represents the transfer function of the output layer.
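The two stages of claim 6 can be sketched in a few lines of plain Python. This is not the patented model: the equations in the figures are not reproduced, so the sketch assumes a log-sigmoid transfer function in every layer, a squared-error loss, per-sample gradient descent, and a toy OR data set; the thresholds are represented as additive biases, and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layers(sizes, seed=0):
    """One (weights, biases) pair per layer; weights[i][j] links input j to node i."""
    rng = random.Random(seed)
    return [([[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes, sizes[1:])]


def forward(x, layers):
    """Forward propagation: return the activations of every layer, input included."""
    activations = [x]
    for weights, biases in layers:
        prev = activations[-1]
        activations.append([sigmoid(sum(w * a for w, a in zip(row, prev)) + b)
                            for row, b in zip(weights, biases)])
    return activations


def backward(activations, target, layers, rate):
    """Error back propagation: one gradient-descent step on 0.5 * squared error."""
    out = activations[-1]
    # Output-layer delta: (output - target) * f'(z), where f'(z) = a * (1 - a).
    delta = [(o - t) * o * (1.0 - o) for o, t in zip(out, target)]
    for index in range(len(layers) - 1, -1, -1):
        weights, biases = layers[index]
        prev = activations[index]
        if index > 0:
            # Delta of the previous layer, using the weights before they change.
            prev_delta = [sum(weights[i][j] * delta[i] for i in range(len(delta)))
                          * prev[j] * (1.0 - prev[j]) for j in range(len(prev))]
        for i in range(len(delta)):
            for j in range(len(prev)):
                weights[i][j] -= rate * delta[i] * prev[j]
            biases[i] -= rate * delta[i]
        if index > 0:
            delta = prev_delta


def mean_loss(data, layers):
    return sum(0.5 * sum((o - t) ** 2 for o, t in zip(forward(x, layers)[-1], y))
               for x, y in data) / len(data)


# Toy data (logical OR), purely illustrative.
data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
        ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
layers = init_layers([2, 3, 1])
loss_before = mean_loss(data, layers)
for _ in range(2000):
    for x, y in data:
        backward(forward(x, layers), y, layers, rate=0.5)
loss_after = mean_loss(data, layers)
```

Comparing `loss_before` with `loss_after` shows the gradual error reduction that the claim describes; a deeper network is obtained simply by passing a longer list of sizes to `init_layers`.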
7. The method for predicting pathogenic factors of food-borne diseases based on the BP neural network as claimed in claim 5, wherein: the mathematical modeling based on real sample data uses the deep neural network principle and the neurons defined in the preceding steps to construct a deep BP neural network as the food-borne disease pathogenic factor prediction model, the model comprising an input layer, L hidden layers and an output layer;
Input layer: contains 96 neurons, corresponding to the feature attributes;
Hidden layers: L hidden layers are used, each containing n neurons;
Input processing function: 'fixunknowns', which preprocesses the input data;
Transfer function: for the odd-numbered hidden layers, the transfer function is of the tan-sigmoid type
Figure FDA0002717476870000041
and for the even-numbered hidden layers the transfer function is of the log-sigmoid type
Figure FDA0002717476870000042
The transfer function of the output layer is the softmax function, i.e. for K scalars $x_1, \ldots, x_K$
Figure FDA0002717476870000043
where $y_1, \ldots, y_K$ satisfy
Figure FDA0002717476870000044
For the food-borne disease pathogenic factor prediction model of the invention, K = 2: if $y_1 > y_2$, then $y_1 = 1$, $y_2 = 0$; if $y_1 < y_2$, then $y_1 = 0$, $y_2 = 1$.
Output layer: 30 neurons, corresponding to the 30 different pathogenic factors in J70.
Network training function: the scaled conjugate gradient method.
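The transfer functions named in claim 7 can be sketched with their standard textbook definitions. The formulas in the figures are not reproduced here, so the definitions below are the usual ones rather than a transcription; the function names and the tie-breaking rule in `hard_decision` (the claim does not cover equal outputs) are assumptions.

```python
import math


def tansig(x):
    """Tan-sigmoid: 2 / (1 + exp(-2x)) - 1, mathematically equal to tanh(x)."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0


def logsig(x):
    """Log-sigmoid: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    """Softmax over K scalars; the outputs are positive and sum to 1."""
    shift = max(xs)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(x - shift) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def hidden_transfer(layer_number):
    """Odd-numbered hidden layers use tansig, even-numbered ones use logsig (1-based)."""
    return tansig if layer_number % 2 == 1 else logsig


def hard_decision(ys):
    """Set the largest output to 1 and the rest to 0; ties go to the first maximum."""
    winner = ys.index(max(ys))
    return [1 if i == winner else 0 for i in range(len(ys))]
```

For example, `hard_decision(softmax([0.2, 1.5]))` selects the second output, matching the K = 2 rule stated in the claim.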
8. The method for predicting pathogenic factors of food-borne diseases based on the BP neural network as claimed in claim 5, wherein: the establishment of the computer application model comprises: according to the mathematical model, writing a program with the corresponding parameters using a built-in BP neural network function of Python to implement computer simulation modeling; establishing the deep neural network application model for food-borne disease pathogenic factor prediction; training the deep BP neural network with the training set data; saving the trained network net; and testing the output of net with the test set data and analyzing its accuracy, sensitivity and specificity.
9. The method for predicting pathogenic factors of food-borne diseases based on the BP neural network as claimed in claim 5, wherein: in S5, according to the calculation formula:
Figure FDA0002717476870000051
Figure FDA0002717476870000052
Figure FDA0002717476870000053
where Ac represents the accuracy of the training set, Se the sensitivity of the training set, Sp the specificity of the training set, TAC the accuracy of the test set, TSe the sensitivity of the test set, and TSp the specificity of the test set;
the system performance of the BP neural network food-borne disease pathogenic factor discrimination model is thereby obtained.
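The performance measures of claim 9 can be sketched using the standard confusion-matrix definitions. The formulas in the figures are not reproduced, so the definitions below (accuracy = (TP + TN) / total, sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)) are the conventional ones, and the labels in the example are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one class treated as positive."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def performance(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity); NaN if a denominator is zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    nan = float("nan")
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else nan
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    specificity = tn / (tn + fp) if (tn + fp) else nan
    return accuracy, sensitivity, specificity


# Hypothetical test-set labels and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
test_accuracy, test_sensitivity, test_specificity = performance(y_true, y_pred)
```

Running the same function on the training-set predictions gives the training-set counterparts of the three measures.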
10. A food-borne disease pathogenic factor prediction system based on a BP neural network is characterized in that: the system comprises a data acquisition module, a data storage module, a data normalization module, a data preprocessing module, a neuron training module, a BP neural network prediction module, a WEB server and mobile equipment;
the data acquisition module is used for acquiring food-borne disease data;
the data storage module is used for storing the data acquired by the data acquisition module;
the data normalization module is used for dividing food-borne disease data into a training sample set and a testing sample set and carrying out data normalization;
the data preprocessing module is used for processing data containing null values before training the network;
the neuron training module is used for training a neural network;
the BP neural network prediction module is used for constructing a neural network prediction model;
and the WEB server is used for storing the prediction result data and sending the prediction result to the mobile equipment.
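The data normalization module of claim 10 is not specified further. As one possible reading, the sketch below applies min-max scaling to a single feature row; the choice of min-max scaling and the function name `min_max_normalize` are assumptions, not part of the claim.

```python
def min_max_normalize(row):
    """Scale one feature row linearly to [0, 1]; a constant row maps to all zeros."""
    low, high = min(row), max(row)
    if high == low:
        return [0.0 for _ in row]
    return [(value - low) / (high - low) for value in row]
```

In practice the minimum and maximum would be taken from the training sample set and reused for the test sample set, so that both are scaled consistently.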
CN202011076959.5A 2020-10-10 2020-10-10 Food-borne disease pathogenic factor prediction method and system based on BP neural network Pending CN112216399A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011076959.5A CN112216399A (en) 2020-10-10 2020-10-10 Food-borne disease pathogenic factor prediction method and system based on BP neural network

Publications (1)

Publication Number Publication Date
CN112216399A true CN112216399A (en) 2021-01-12

Family

ID=74053021

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011076959.5A Pending CN112216399A (en) 2020-10-10 2020-10-10 Food-borne disease pathogenic factor prediction method and system based on BP neural network

Country Status (1)

Country Link
CN (1) CN112216399A (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1881227A (en) * 2006-05-16 2006-12-20 中国人民解放军第三军医大学 Intelligent analytical model technology for diagnosing epidemic situation and classifying harmfulness degree of contagious disease
CN104597753A (en) * 2014-12-19 2015-05-06 徐工集团工程机械股份有限公司道路机械分公司 Method and device for intelligently controlling asphalt and crushed stone spreading of synchronous chip sealer
CN106096286A (en) * 2016-06-15 2016-11-09 北京千安哲信息技术有限公司 Clinical path formulating method and device
CN106874581A (en) * 2016-12-30 2017-06-20 浙江大学 A kind of energy consumption of air conditioning system in buildings Forecasting Methodology based on BP neural network model
CN107358294A (en) * 2017-07-21 2017-11-17 河北工程大学 A kind of water demand prediction method based on Elman neutral nets
CN107506590A (en) * 2017-08-26 2017-12-22 郑州大学 A kind of angiocardiopathy forecast model based on improvement depth belief network
CN109492287A (en) * 2018-10-30 2019-03-19 成都云材智慧数据科技有限公司 A kind of solid electrolyte ionic conductivity prediction technique based on BP neural network
CN110087207A (en) * 2019-05-05 2019-08-02 江南大学 Wireless sensor network missing data method for reconstructing
CN110322014A (en) * 2019-07-10 2019-10-11 燕山大学 A kind of finished cement specific surface area prediction technique based on BP neural network
CN110610209A (en) * 2019-09-16 2019-12-24 北京邮电大学 Air quality prediction method and system based on data mining
CN111047073A (en) * 2019-11-14 2020-04-21 佛山科学技术学院 Neural network-based aquaculture water quality prediction method and system
CN111489046A (en) * 2019-01-29 2020-08-04 广东省公共卫生研究院 Regional food safety evaluation model based on supply chain and BP neural network
CN111681764A (en) * 2020-06-11 2020-09-18 西南大学 BP neural network-based children respiratory system disease incidence prediction method

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113205055A (en) * 2021-05-11 2021-08-03 北京知见生命科技有限公司 Fungus microscopic image classification method and system based on multi-scale attention mechanism
CN115115260A (en) * 2022-07-19 2022-09-27 东南大学溧阳研究院 Quantitative analysis method for social electric influence caused by emergency based on BP neural network
CN115115260B (en) * 2022-07-19 2023-04-07 东南大学溧阳研究院 Quantitative analysis method for social electric influence caused by emergency based on BP neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination