CN112216399A

CN112216399A - Food-borne disease pathogenic factor prediction method and system based on BP neural network

Info

Publication number: CN112216399A
Application number: CN202011076959.5A
Authority: CN
Inventors: 高飞; 张剑峰; 刘忠卫; 闫军
Original assignee: Heilongjiang Center For Disease Control And Prevention
Current assignee: Heilongjiang Center For Disease Control And Prevention
Priority date: 2020-10-10
Filing date: 2020-10-10
Publication date: 2021-01-12

Abstract

The invention discloses a food-borne disease pathogenic factor prediction method based on a BP neural network, which comprises the following steps: S1, collecting and sorting food-borne disease accident cases, establishing a food-borne disease sample analysis database, and recording the characteristic items contained in each sample; S2, determining a training set and a test set, and performing attribute selection and neuron definition. The advantages are as follows. A deep BP neural network model is established, and the network structure is improved by increasing the number of hidden layers, which optimizes the network computation complexity. An accurate analysis and prediction network model for the epidemiological pathogenic factors of food-borne diseases is established and updated iteratively in real time through a dynamic migration network with a self-learning function, which improves the execution efficiency and sensitivity of the discrimination network for predicting food-borne disease pathogenic factors. Missing data are preprocessed, and the data containing missing items are reconstructed and analyzed so that they can participate effectively in the network calculation.

Description

Food-borne disease pathogenic factor prediction method and system based on BP neural network

Technical field:

The invention belongs to the technical field of pathogenic factor prediction, and particularly relates to a method and system for predicting pathogenic factors of food-borne diseases based on a BP neural network.

Background art:

Food-borne diseases are defined as "diseases caused by the ingestion of contaminated food, including a wide range of diseases caused by parasites, chemicals and pathogenic bacteria that contaminate food at different stages of its production and preparation". Food-borne diseases cause great harm to public health, to the healthy development of the food industry and to social stability, and many countries currently carry out food-borne disease monitoring.

A food-borne disease monitoring system has several aims:
- to identify and control outbreaks of food-borne diseases and to analyze and determine their pathogenic factors;
- to identify susceptible populations, high-risk foods and poor food-handling practices;
- to determine the food-borne transmission routes of specific pathogenic bacteria;
- to evaluate the impact of food-borne diseases and reduce their harm;
- to draw up food safety assessment plans;
- to research traceability and early-warning strategies for food-borne disease outbreaks.

Each existing algorithm for analyzing and predicting food-borne disease pathogenic factors has drawbacks. Logistic regression is a traditional statistical method whose process is easy to understand, but it places high demands on the input data and requires the characteristic items of the analyzed data to be independent. The decision tree algorithm has no independence requirement on the input data, but its stability is poor and it handles missing data poorly. The Bayesian discriminant analysis method gives stable predictions of food-borne disease pathogenic factors, but its accuracy is low. The SVM algorithm is sensitive to missing data. The invention therefore provides a food-borne disease pathogenic factor prediction method and system based on a BP neural network to solve these problems.

Summary of the invention:

The object of the invention is to solve the above problems by providing a method and system for predicting pathogenic factors of food-borne diseases based on a BP neural network. The method and system address attribute selection and accurate neuron definition under existing small-sample, sparse-data conditions. They also address the low prediction accuracy, sensitivity and specificity for food-borne disease pathogenic factors when a large amount of data is missing.

In order to solve the above problems, the present invention provides a technical solution:

A food-borne disease pathogenic factor prediction method based on a BP neural network comprises the following steps:

S1, collecting and sorting food-borne disease accident cases, establishing a food-borne disease sample analysis database, and recording the characteristic items contained in each sample;

S2, determining a training set and a test set, and performing attribute selection and neuron definition;

S3, preprocessing the missing data: null values are represented by NaN, and the data containing NaN are processed before the network is trained, so that a high-accuracy deep BP neural network prediction model for food-borne disease pathogenic factors can be constructed;

S4, establishing a deep BP neural network system model and training the network with the training set data;

S5, inputting the test set data into the model and analyzing the sensitivity and specificity.

Preferably, the data acquisition in S1 is derived from food-borne disease accident analysis data generated by the center for disease prevention and control.

Preferably, the 2 neurons in the output layer of the neural network in S2 correspond to J7 "pathogenic factor" and J71 "classification", respectively. J7 "pathogenic factor" is a string variable, and J71 "classification" is an integer variable coded as 1 = chemical pollutant, 2 = pathogenic bacterium, 3 = virus, 4 = mycotoxin, 5 = parasite, 6 = toxic animal, 7 = toxic plant.

It can be seen that the sample set contains only one mycotoxin ("mould") and no parasitic pathogens, and that it is dominated by pathogenic bacteria. The remaining 5 categories can therefore be subdivided according to J7. The subdivision criterion is as follows: if J7 contains 4 or more samples of the same type, those samples form a separate class; otherwise they are not subdivided. A J70 attribute is then added, coded as follows:

0 = carbamates, 1 = general chemical pollutants, 2 = clenbuterol, 3 = barium salts, 4 = nitrites, 5 = organophosphates, 6 = spasmodics, 7 = histamine, 8 = Proteus, 9 = general pathogenic bacteria, 10 = Escherichia coli, 11 = Escherichia coli, 12 = Citrobacter freundii, 13 = Vibrio parahaemolyticus, 14 = Staphylococcus aureus, 15 = Bacillus cereus, 16 = Salmonella, 17 = Enterobacter cloacae, 18 = Shigella, 19 = norovirus, 20 = mould, 21 = general toxic animals, 22 = tetrodotoxin, 23 = general toxic plants, 24 = kidney bean, 25 = colatole, 26 = geloninum, 27 = 28, 28 = gyrocarpus, 29 = total theophylline, 30 = calamine.

Meanwhile, if two or three pathogenic agents appear together in J7 (for example "Proteus, Vibrio parahaemolyticus, pathogenic Escherichia coli"), the record is treated as two or three separate samples. Samples with an "unexplained cause" are discarded, which leaves a sample size of 849.
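The sample-preparation rules above can be sketched in Python. The rules are: one sample per agent when J7 lists several, records with an "unexplained cause" are discarded, and an agent gets its own J70 class only when it has at least 4 samples. This is a minimal sketch under stated assumptions. The record layout (dicts with "J7" and "J71" keys, agents separated by commas), the function names `expand_and_filter` and `subdivide`, and the fallback of rare agents to a per-category "general" class are all hypothetical illustrations, not the patent's own code.

```python
from collections import Counter

def expand_and_filter(records):
    """Split multi-agent J7 entries into one sample per agent and drop
    samples whose cause is unexplained (hypothetical record format)."""
    samples = []
    for rec in records:
        agents = [a.strip() for a in rec["J7"].split(",") if a.strip()]
        for agent in agents:
            if agent == "unexplained cause":
                continue  # discarded, as described for the 849-sample set
            samples.append({**rec, "J7": agent})
    return samples

def subdivide(samples, general_class, min_count=4):
    """Give an agent its own class if it has >= min_count samples;
    otherwise fall back to the general class of its J71 category."""
    counts = Counter(s["J7"] for s in samples)
    return [
        s["J7"] if counts[s["J7"]] >= min_count else general_class[s["J71"]]
        for s in samples
    ]

# Toy records for illustration only (not data from the patent).
records = [
    {"J7": "Proteus, Vibrio parahaemolyticus", "J71": 2},
    {"J7": "unexplained cause", "J71": 0},
    {"J7": "norovirus", "J71": 3},
] + [{"J7": "Salmonella", "J71": 2} for _ in range(4)]

general_class = {2: "general pathogenic bacteria", 3: "virus"}
samples = expand_and_filter(records)
labels = subdivide(samples, general_class)
print(len(samples), labels)
```

Here the two-agent record becomes two samples and the unexplained record is dropped. Only Salmonella reaches the 4-sample threshold, so it keeps its own label, while the rarer agents fall back to their general class.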

For the food-borne disease pathogenic factor prediction model herein, K = 30.

Output layer: 30 neurons, corresponding to the 30 different pathogenic agents in J70.

Preferably, the data in S3 are arranged as a matrix A in which each column is one sample and each row is one feature item. Suppose the first feature item of the second sample in A is missing. The NaN in A is replaced by the mean of the remaining (non-missing) values of the first feature row. An indicator row is then added directly below that row: a 0 in the indicator row means that the corresponding position in the row above was NaN, and a 1 means that it was not NaN.

The preprocessed data are obtained by the above treatment.
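The 'fixunknowns' input processing function named later in this description appears to be the MATLAB Neural Network Toolbox function of that name. That function applies this mean-replacement-plus-indicator-row treatment. A minimal NumPy sketch of the same idea follows, assuming the column-per-sample layout of A and adding an indicator row only below feature rows that actually contain NaN. The name `fix_unknowns` and the 0.0 fallback for a row with no known values are hypothetical choices.

```python
import numpy as np

def fix_unknowns(A):
    """Rows of A are feature items, columns are samples.
    Each NaN is replaced by the mean of the known values in its row, and
    an indicator row (1 = known, 0 = was NaN) is inserted directly below
    every row that contained at least one NaN."""
    A = np.asarray(A, dtype=float)
    out = []
    for row in A:
        known = ~np.isnan(row)
        if known.all():
            out.append(row)
            continue
        filled = row.copy()
        # Mean of the remaining (non-missing) data of this feature item;
        # 0.0 is an arbitrary fallback when the whole row is missing.
        filled[~known] = row[known].mean() if known.any() else 0.0
        out.append(filled)
        out.append(known.astype(float))
    return np.vstack(out)

# Feature 1 of sample 2 is missing, as in the text.
A = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, 6.0]])
print(fix_unknowns(A))
```

In this example the missing entry becomes 2.0 (the mean of 1 and 3). The indicator row [1, 0, 1] follows it, and the complete second feature row is passed through unchanged.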

Preferably, S4 includes the following steps:

1) constructing a food-borne disease pathogenic factor prediction model using a deep BP neural network;

2) building a mathematical model based on real sample data;

3) establishing a computer application model;

4) obtaining a pathogenic factor prediction result based on the computer application model.

Preferably, the deep BP neural network in the constructed prediction model mainly comprises two stages: a forward propagation stage and an error backward propagation stage;

Forward propagation stage: the propagation direction is input layer → hidden layer → output layer. The output value of each layer is calculated from its input value and then used as the input value of the next layer. This continues layer by layer until the actual output $y_i$, $i = 1, 2$, of the final output layer is obtained.

The output of the $i$-th node of the input layer is:

the output of the $j$-th node of hidden layer $l-1$ is:

the output of the $i$-th node of the output layer is:

Error back-propagation stage: consider training samples with input data $x^{(i)} = (x_1, x_2, \ldots, x_p)^{(i)}$ and output data $y^{(i)} = (y_1, y_2, \ldots, y_q)^{(i)}$, $i = 1, \ldots, N$. The output of the network net is $f_{net}(x \mid W, b)$, and the objective function is:

where $L(x)$ is an error function, for example the mean squared error. The goal of training is to minimize $J(W, b; x, y)$. Using gradient descent, the error is propagated back layer by layer in the reverse direction of the forward propagation stage, and the weights and thresholds of the neuron nodes in each layer are corrected:

Let the error term of layer $l$ represent the effect that the neurons of layer $l$ have on the final error; it can then be derived that:

Through this training the error is gradually reduced, so that the final output of the corrected network approaches the target output value.

In the above model, $n^l$ is the number of neurons in layer $l$, and $L$ is the number of hidden layers;

$x_j$, $j = 1, 2, \ldots, p$, is the $j$-th neuron node of the input layer, where $p$ is the number of input-layer neurons;

$y_i$, $i = 1, 2, \ldots, q$, is the $i$-th neuron node of the output layer, where $q$ is the number of output-layer neurons;

$w_{ij}^{(l)}$ represents the weight from the $j$-th neuron node of layer $l-1$ to the $i$-th neuron node of layer $l$;

$b_i^{(l)}$ represents the threshold of the $i$-th neuron node of layer $l$;

$f_{(l)}(x)$ represents the transfer function of the layer-$l$ neurons, and $g(x)$ represents the transfer function of the output layer.
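The formula images for both propagation stages are missing from the source. Below is one standard formulation consistent with the symbol definitions above. It writes the weight from node $j$ of layer $l-1$ to node $i$ of layer $l$ as $w_{ij}^{(l)}$ and the threshold of node $i$ of layer $l$ as $b_i^{(l)}$. The pre-activation $z^{(l)}$, layer output $a^{(l)}$, error term $\delta^{(l)}$, learning rate $\eta$, sample index $k$ and the averaging over $N$ samples are assumed notation. The error function $L(x)$ is written $\mathcal{L}$ to avoid a clash with the number of hidden layers $L$. The $\delta$ recursion and the updates are shown for a single sample, with $J$ taken as that sample's error; for the full objective they are averaged over the $N$ samples.

```latex
\begin{aligned}
a_j^{(0)} &= x_j, \qquad j = 1, \dots, p \quad (n^0 = p) \\
z_i^{(l)} &= \sum_{j=1}^{n^{l-1}} w_{ij}^{(l)}\, a_j^{(l-1)} + b_i^{(l)}, \qquad l = 1, \dots, L+1 \\
a_i^{(l)} &= f_{(l)}\!\left(z_i^{(l)}\right), \qquad l = 1, \dots, L \\
y_i &= g\!\left(z_i^{(L+1)}\right), \qquad i = 1, \dots, q \quad (n^{L+1} = q) \\
J(W, b; x, y) &= \frac{1}{N} \sum_{k=1}^{N} \mathcal{L}\!\left(y^{(k)}, f_{net}\!\left(x^{(k)} \mid W, b\right)\right) \\
\delta_i^{(L+1)} &= \frac{\partial J}{\partial z_i^{(L+1)}}, \qquad
\delta_j^{(l)} = f_{(l)}'\!\left(z_j^{(l)}\right) \sum_{i=1}^{n^{l+1}} w_{ij}^{(l+1)}\, \delta_i^{(l+1)}, \quad l = L, \dots, 1 \\
w_{ij}^{(l)} &\leftarrow w_{ij}^{(l)} - \eta\, \delta_i^{(l)}\, a_j^{(l-1)}, \qquad
b_i^{(l)} \leftarrow b_i^{(l)} - \eta\, \delta_i^{(l)}, \qquad l = 1, \dots, L+1
\end{aligned}
```

The recursion runs from the output layer back to the first hidden layer. This is the layer-by-layer reverse pass described in the text.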

Preferably, the mathematical model based on real sample data is built as follows. On the basis of deep neural network principles and the neurons defined in the previous steps, a deep BP neural network prediction model for food-borne disease pathogenic factors is constructed. The model comprises an input layer, L hidden layers and an output layer;

input layer: 96 neurons, one for each characteristic attribute;

hidden layers: L hidden layers are used, each containing n neurons;

input processing function: 'fixunknowns' is used to preprocess the input data;

transfer function: the odd-numbered hidden layers use a tan-sigmoid transfer function,

and the even-numbered hidden layers use a log-sigmoid transfer function;

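The transfer function of the output layer is the softmax function, i.e. for $K$ scalars $x_1, \ldots, x_K$,

$$y_k = \frac{e^{x_k}}{\sum_{j=1}^{K} e^{x_j}}, \qquad k = 1, \ldots, K,$$

where $y_1, \ldots, y_K$ satisfy

$$0 < y_k < 1, \qquad \sum_{k=1}^{K} y_k = 1.$$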

For the food-borne disease pathogenic factor prediction model of the invention, K = 2: if $y_1 > y_2$, then $y_1 = 1$ and $y_2 = 0$; if $y_1 < y_2$, then $y_1 = 0$ and $y_2 = 1$.

Network training function: the scaled conjugate gradient method.
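The network just described can be sketched compactly in NumPy. Hidden layers alternate tan-sigmoid (odd layers, `np.tanh`) and log-sigmoid (even layers, the logistic function), and the output layer is softmax. The predicted class is the largest softmax output, which generalizes the K = 2 rule to an argmax. Two substitutions are made and should be read as assumptions. First, plain batch gradient descent replaces the scaled conjugate gradient training function. Second, cross-entropy replaces the generic error function $L(x)$, because with softmax it gives the simple output error term (output − target). All function names, layer sizes and the learning rate are hypothetical and purely illustrative.

```python
import numpy as np

def tansig(z):
    return np.tanh(z)

def logsig(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Columns are samples; subtract the column max for numerical stability.
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def init_params(sizes, rng):
    """sizes = [p, n1, ..., nL, q]; small random weights, zero thresholds."""
    W = [rng.normal(0.0, np.sqrt(1.0 / m), size=(n, m))
         for m, n in zip(sizes[:-1], sizes[1:])]
    b = [np.zeros((n, 1)) for n in sizes[1:]]
    return W, b

def forward(X, W, b):
    """X holds one sample per column. Hidden layer l (1-based) uses tansig
    if l is odd and logsig if l is even; the output layer uses softmax."""
    acts = [X]
    for l, (Wl, bl) in enumerate(zip(W, b), start=1):
        z = Wl @ acts[-1] + bl
        if l == len(W):
            acts.append(softmax(z))
        elif l % 2 == 1:
            acts.append(tansig(z))
        else:
            acts.append(logsig(z))
    return acts

def train_step(X, Y, W, b, lr=0.1):
    """One batch gradient-descent step on mean cross-entropy.
    Y is one-hot with one column per sample. Returns the loss before the step."""
    acts = forward(X, W, b)
    loss = -np.mean(np.sum(Y * np.log(acts[-1] + 1e-12), axis=0))
    delta = (acts[-1] - Y) / X.shape[1]  # softmax + cross-entropy error term
    for l in range(len(W) - 1, -1, -1):
        grad_W = delta @ acts[l].T
        grad_b = delta.sum(axis=1, keepdims=True)
        if l > 0:
            a = acts[l]  # output of hidden layer l (1-based numbering)
            deriv = 1.0 - a**2 if l % 2 == 1 else a * (1.0 - a)
            delta = (W[l].T @ delta) * deriv  # uses W[l] before its update
        W[l] -= lr * grad_W
        b[l] -= lr * grad_b
    return loss

def predict(X, W, b):
    """Class index with the largest softmax output for each sample."""
    return forward(X, W, b)[-1].argmax(axis=0)

# Toy usage on random data (not the patent's samples): 4 inputs, 2 hidden layers, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 60))
Y = np.eye(3)[:, X[:3].argmax(axis=0)]
W, b = init_params([4, 8, 8, 3], rng)
first = train_step(X, Y, W, b, lr=0.3)
for _ in range(500):
    last = train_step(X, Y, W, b, lr=0.3)
print(first, last)
```

The text reports 96 inputs and 50 hidden layers of 200 neurons. A plain gradient-descent sketch such as this one would likely train such a deep sigmoid stack slowly, which is one reason a second-order-style method like scaled conjugate gradient may be preferred.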

Preferably, the computer application model is established by writing a program that sets the corresponding parameters according to the mathematical model and uses the BP neural network functions built into Python to carry out computer simulation modeling. This establishes a food-borne disease pathogenic factor prediction application model based on the deep neural network. The deep BP neural network is trained with the training set data and the trained network net is saved. The output of net is then tested with the test set data, and its accuracy, sensitivity and specificity are analyzed.

Preferably, in S5, according to the calculation formula:

where Ac, Se and Sp denote the accuracy, sensitivity and specificity on the training set, and TAc, TSe and TSp denote the accuracy, sensitivity and specificity on the test set.

From these indices, the system performance of the BP neural network food-borne disease pathogenic factor discrimination model can be obtained.
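The formula images for S5 are missing from the source. A sketch of the usual definitions follows: accuracy = (TP + TN) / (TP + TN + FP + FN), sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). Applying the same function to training-set predictions gives Ac, Se and Sp, and applying it to test-set predictions gives TAc, TSe and TSp. Treating each pathogenic factor one-vs-rest is an assumption, and the function names are hypothetical.

```python
def _ratio(num, den):
    return num / den if den else float("nan")

def confusion_counts(y_true, y_pred, positive):
    """One-vs-rest counts (TP, TN, FP, FN) for a single pathogenic-factor class."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p != positive:
            tn += 1
        elif p == positive:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def accuracy_sensitivity_specificity(y_true, y_pred, positive):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = _ratio(tp + tn, tp + tn + fp + fn)
    sensitivity = _ratio(tp, tp + fn)  # true-positive rate
    specificity = _ratio(tn, tn + fp)  # true-negative rate
    return accuracy, sensitivity, specificity

# Training predictions -> (Ac, Se, Sp); test predictions -> (TAc, TSe, TSp).
print(accuracy_sensitivity_specificity([1, 1, 0, 0, 1], [1, 0, 0, 1, 1], positive=1))
```

In the toy call there are 2 true positives, 1 true negative, 1 false positive and 1 false negative. This gives accuracy 0.6, sensitivity 2/3 and specificity 0.5.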

The ROC curves of the pathogenic factors predicted by the deep BP neural network show that the prediction model with 50 hidden layers of 200 neurons each gives the best prediction. The number of hidden layers of the deep BP neural network prediction model is therefore set to 50, and the number of neurons per hidden layer is set to 200.
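Candidate structures (numbers of hidden layers and neurons) compared by their ROC curves are often summarized by the area under the curve (AUC). A minimal sketch uses the rank (Mann-Whitney) formulation of AUC. It assumes a binary one-vs-rest label and a score such as the softmax output for that class; the configuration with the largest AUC would then be retained. The source does not state that AUC was the criterion used, and the function name is hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a random
    positive sample scores higher than a random negative one (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy scores for two hypothetical configurations (not results from the patent).
candidates = {
    "config_a": ([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]),
    "config_b": ([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]),
}
best = max(candidates, key=lambda k: roc_auc(*candidates[k]))
print(best)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a few hundred test samples. A sort-based rank computation would scale better.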

A food-borne disease pathogenic factor prediction system based on a BP neural network comprises a data acquisition module, a data storage module, a data normalization module, a data preprocessing module, a neuron training module, a BP neural network prediction module, a WEB server and mobile equipment;

the data acquisition module is used for acquiring food-borne disease data;

the data storage module is used for storing the data acquired by the data acquisition module;

the data normalization module is used for dividing food-borne disease data into a training sample set and a testing sample set and carrying out data normalization;

the data preprocessing module is used for processing data containing null values before training the network;

the neuron training module is used for training a neural network;

the BP neural network prediction module is used for constructing a neural network prediction model;

and the WEB server is used for storing the prediction result data and sending the prediction result to the mobile equipment.
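The data normalization module is described as splitting the data into training and test sets and normalizing them. The source specifies neither the split ratio nor the normalization method, so the following is an assumed sketch of one common approach. It uses a seeded random split followed by min-max scaling whose ranges are fitted on the training set only, so that no test information leaks into training. Here each row is one sample, and the 0.2 test ratio and the function names are hypothetical.

```python
import random

def split_train_test(samples, test_ratio=0.2, seed=0):
    """Seeded random split into (train, test); test_ratio is an assumption."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(len(samples) * test_ratio))
    test = [samples[i] for i in idx[:n_test]]
    train = [samples[i] for i in idx[n_test:]]
    return train, test

def fit_minmax(rows):
    """Per-feature (min, max) computed on the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]

def apply_minmax(rows, ranges):
    """Scale each feature to [0, 1] using training ranges; test values outside
    the training range map outside [0, 1]. Constant features map to 0.0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, ranges)]
        for row in rows
    ]

train_rows = [[0, 10], [5, 20], [10, 30]]
ranges = fit_minmax(train_rows)
print(apply_minmax([[5, 20]], ranges))
```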

The invention has the beneficial effects that:

The food-borne disease pathogenic factor prediction method and system based on a BP neural network provided by the invention have the following advantages:
- A deep BP neural network model is established. The network structure is improved by increasing the number of hidden layers, and the network computation complexity is optimized.
- An accurate analysis and prediction network model for the epidemiological pathogenic factors of food-borne diseases is established and updated iteratively in real time through a dynamic migration network with a self-learning function. This improves the execution efficiency and sensitivity of the discrimination network for predicting food-borne disease pathogenic factors.
- Missing data are preprocessed, and the data containing missing items are reconstructed and analyzed so that they participate effectively in the network calculation.
- Non-numerical characteristic item data are encoded, converting text data into numerical data for analysis.
- The characteristic items that influence the prediction of food-borne disease pathogenic factors are determined and encoded, which determines the neurons of the input layer.
- Self-learning is realized for pathogenic factors with only a small number of samples. As the sample size grows through future use, the prediction probability for these small-sample pathogenic factors increases.

Together these provide a theoretical and practical basis for on-site food-borne disease data collection, pathogenic factor analysis and prediction, laboratory testing support and auxiliary medical diagnosis.

Description of the drawings:

For ease of illustration, the invention is described in detail through the following detailed description and the accompanying drawings.

FIG. 1 is a schematic diagram of a prediction method according to the present invention;

FIG. 2 is a diagram of a deep BP neural network according to the present invention;

FIG. 3 is a schematic model diagram of the deep BP neural network of the present invention;

FIG. 4 is a schematic diagram of ROC curve of deep BP neural network prediction pathogenic factor of the present invention;

FIG. 5 is a diagram of a prediction system according to the present invention.

Detailed description of the embodiments:

As shown in FIGS. 1 to 5, this embodiment adopts the following technical solution: a food-borne disease pathogenic factor prediction method based on a BP neural network comprises the following steps:

The data collected in S1 are derived from food-borne disease accident analysis data produced by a center for disease control and prevention.

The 2 neurons in the output layer of the neural network in S2 correspond to J7 "pathogenic factor" and J71 "classification", respectively. J7 "pathogenic factor" is a string variable, and J71 "classification" is an integer variable coded as 1 = chemical pollutant, 2 = pathogenic bacterium, 3 = virus, 4 = mycotoxin, 5 = parasite, 6 = toxic animal, 7 = toxic plant.

Wherein, the data in S3

obtained by the above treatment

Step S4 comprises the following steps:

2) building a mathematical model based on real sample data;

3) establishing a computer application model;

The deep BP neural network in the constructed prediction model mainly comprises two stages: a forward propagation stage and an error backward propagation stage;

The output value of each layer is then used as the input value of the next layer, layer by layer, until the actual output $y_i$, $i = 1, 2$, of the final output layer is obtained.

The output of the $i$-th node of the input layer is:

the output of the $j$-th node of hidden layer $l-1$ is:

the output of the $i$-th node of the output layer is:

Error back-propagation stage: consider training samples with input data $x^{(i)} = (x_1, x_2, \ldots, x_p)^{(i)}$ and output data $y^{(i)} = (y_1, y_2, \ldots, y_q)^{(i)}$, $i = 1, \ldots, N$. The output of the network net is $f_{net}(x \mid W, b)$, and the objective function is:

Let the error term of layer $l$ represent the effect that the neurons of layer $l$ have on the final error.

The mathematical model based on real sample data is built as follows. On the basis of deep neural network principles and the neurons defined in the previous steps, a deep BP neural network prediction model for food-borne disease pathogenic factors is constructed. The model comprises an input layer, L hidden layers and an output layer;

input layer: 96 neurons, one for each of the following characteristic attributes:

"J31", "J32", "J33", "J431", "J432", "J531", "J532", "J533", "Q", "Q11", "Q13", "Q14", "Q15", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7", "Q8", "Q9", "Q10", "Q12", "Q16", "Q17", "Q18", "Q19", "Q20", "Q21", "Q22", "Q23", "Q24", "Q25", "Q26", "Z", "Z1", "Z21", "Z22", "Z23", "Z24", "Z3", "Z4", "Z5", "Z111", "Z112", "Z113", "Z114", "Z115", "Z116", "Z117", "Z118", "Z119", "Z1110", "Z1111", "Z1112", "H", "H1", "H2", "H3", "X", "X1", "X2", "X3", "X4", "M", "M1", "M2", "M3", "M4", "S", "S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8", "S9", "S10", "S11", "S12", "S13", "S14", "S15", "S161", "S162", "S163", "S164", "S165", "P", "P1", "P2", "P3", "P4", "P5";

hidden layers: L hidden layers are used, each containing n neurons; the specific values are given in the following table:

Number of neurons and layers in the hidden layer

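input processing function: 'fixunknowns' is used to preprocess the input data;

the transfer function of the even-numbered hidden layers is a log-sigmoid function;

the transfer function of the output layer is the softmax function $y_k = \dfrac{e^{x_k}}{\sum_{j=1}^{K} e^{x_j}}$, $k = 1, \ldots, K$, where $y_1, \ldots, y_K$ satisfy $0 < y_k < 1$ and $\sum_{k=1}^{K} y_k = 1$;

network training function: the scaled conjugate gradient method.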

The computer application model is established by writing a program that sets the corresponding parameters according to the mathematical model and uses the BP neural network functions built into Python to carry out computer simulation modeling. This establishes a food-borne disease pathogenic factor prediction application model based on the deep neural network. The deep BP neural network is trained with the training set data and the trained network net is saved. The output of net is then tested with the test set data, and its accuracy, sensitivity and specificity are analyzed.

In S5, the following calculation formulas are used:

where Ac, Se and Sp denote the accuracy, sensitivity and specificity on the training set, and TAc, TSe and TSp denote the accuracy, sensitivity and specificity on the test set;

from these indices, the system performance of the BP neural network food-borne disease pathogenic factor discrimination model is obtained, as shown in the following table:

the data acquisition module is used for acquiring food-borne disease data;

the neuron training module is used for training a neural network;

Specifically, the food-borne disease pathogenic factor prediction method and system based on a BP neural network work as follows:
- Data are collected: food-borne disease accident cases are collected and sorted, a food-borne disease sample analysis database is established, and the characteristic items contained in each sample are recorded.
- A training set and a test set are determined, and attribute selection and neuron definition are performed.
- Missing data are preprocessed: null values are represented by NaN, and the data containing NaN are processed before the network is trained.
- A deep BP neural network system model is established and trained with the training set data.
- The test set data are input into the model to analyze sensitivity and specificity.

The advantages are as follows. Establishing a deep BP neural network model and increasing the number of hidden layers improves the network structure and optimizes the network computation complexity. An accurate analysis and prediction network model for the epidemiological pathogenic factors of food-borne diseases is established and updated iteratively in real time through a dynamic migration network with a self-learning function, which improves the execution efficiency and sensitivity of the discrimination network for predicting food-borne disease pathogenic factors. Missing data are preprocessed, and the data containing missing items are reconstructed and analyzed so that they participate effectively in the network calculation. Non-numerical characteristic item data are encoded, converting text data into numerical data for analysis. The characteristic items that influence the prediction of food-borne disease pathogenic factors are determined and encoded, which determines the neurons of the input layer. Self-learning is realized for pathogenic factors with only a small number of samples, so that as the sample size grows through future use, the prediction probability for these small-sample pathogenic factors increases. Together these provide a theoretical and practical basis for on-site food-borne disease data collection, pathogenic factor analysis and prediction, laboratory testing support and auxiliary medical diagnosis.

While there have been shown and described what are at present considered to be the fundamental principles of the invention and its essential features and advantages, it will be understood by those skilled in the art that the invention is not limited by the embodiments described above, which are merely illustrative of the principles of the invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined by the appended claims and their equivalents.

Claims

1. A food-borne disease pathogenic factor prediction method based on a BP neural network is characterized in that: the prediction method comprises the following steps:

2. The method for predicting pathogenic factors of food-borne diseases based on the BP neural network as claimed in claim 1, wherein the data collected in S1 are derived from food-borne disease accident analysis data produced by a center for disease control and prevention.

3. The method for predicting pathogenic factors of food-borne diseases based on the BP neural network as claimed in claim 1, wherein the 2 neurons in the output layer of the neural network in S2 correspond to J7 "pathogenic factor" and J71 "classification", respectively; J7 "pathogenic factor" is a string variable, and J71 "classification" is an integer variable coded as 1 = chemical pollutant, 2 = pathogenic bacterium, 3 = virus, 4 = mycotoxin, 5 = parasite, 6 = toxic animal, 7 = toxic plant.

4. The method for predicting pathogenic factors of food-borne diseases based on the BP neural network as claimed in claim 1, wherein: data in the S3

obtained by the above treatment

5. The method for predicting pathogenic factors of food-borne diseases based on the BP neural network as claimed in claim 1, wherein step S4 comprises the following steps:

2) building a mathematical model based on real sample data;

3) establishing a computer application model;

6. The method for predicting pathogenic factors of food-borne diseases based on the BP neural network as claimed in claim 5, wherein: the deep BP neural network in the constructed prediction model mainly comprises two stages: a forward propagation stage and an error backward propagation stage;

The output value of each layer is then used as the input value of the next layer, layer by layer, until the actual output $y_i$, $i = 1, 2$, of the final output layer is obtained.

The output of the $i$-th node of the input layer is:

the output of the $j$-th node of hidden layer $l-1$ is:

the output of the $i$-th node of the output layer is:

Let the error term of layer $l$ represent the effect that the neurons of layer $l$ have on the final error.

$x_j$, $j = 1, 2, \ldots, p$, is the $j$-th neuron node of the input layer, where $p$ is the number of input-layer neurons;

7. The method for predicting pathogenic factors of food-borne diseases based on the BP neural network as claimed in claim 5, wherein the mathematical model based on real sample data is built by constructing, on the basis of deep neural network principles and the neurons defined in the previous steps, a deep BP neural network prediction model for food-borne disease pathogenic factors, the model comprising an input layer, L hidden layers and an output layer;

input layer: 96 neurons, one for each characteristic attribute;

hidden layers: L hidden layers are used, each containing n neurons;

input processing function: 'fixunknowns' is used to preprocess the input data;

the transfer function of the even-numbered hidden layers is a log-sigmoid function;

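the transfer function of the output layer is the softmax function, i.e. for $K$ scalars $x_1, \ldots, x_K$,

$$y_k = \frac{e^{x_k}}{\sum_{j=1}^{K} e^{x_j}}, \qquad k = 1, \ldots, K,$$

where $y_1, \ldots, y_K$ satisfy

$$0 < y_k < 1, \qquad \sum_{k=1}^{K} y_k = 1;$$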

network training function: the scaled conjugate gradient method.

8. The method for predicting pathogenic factors of food-borne diseases based on the BP neural network as claimed in claim 5, wherein the computer application model is established by writing a program that sets the corresponding parameters according to the mathematical model and uses the BP neural network functions built into Python to carry out computer simulation modeling; a food-borne disease pathogenic factor prediction application model based on the deep neural network is established; the deep BP neural network is trained with the training set data and the trained network net is saved; and the output of net is tested with the test set data and its accuracy, sensitivity and specificity are analyzed.

9. The method for predicting pathogenic factors of food-borne diseases based on the BP neural network as claimed in claim 5, wherein: in S5, according to the calculation formula:

10. A food-borne disease pathogenic factor prediction system based on a BP neural network is characterized in that: the system comprises a data acquisition module, a data storage module, a data normalization module, a data preprocessing module, a neuron training module, a BP neural network prediction module, a WEB server and mobile equipment;

the data acquisition module is used for acquiring food-borne disease data;

the neuron training module is used for training a neural network;