CN112017771B - Method and system for constructing disease prediction model based on semen routine inspection data

Info

Publication number
CN112017771B
CN112017771B CN202010900071.2A CN202010900071A CN112017771B CN 112017771 B CN112017771 B CN 112017771B CN 202010900071 A CN202010900071 A CN 202010900071A CN 112017771 B CN112017771 B CN 112017771B
Authority
CN
China
Prior art keywords
data
semen
disease
knowledge base
sample set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010900071.2A
Other languages
Chinese (zh)
Other versions
CN112017771A (en)
Inventor
杜乐
杜登斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuzheng Intelligent Technology Beijing Co ltd
Original Assignee
Wuzheng Intelligent Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuzheng Intelligent Technology Beijing Co ltd filed Critical Wuzheng Intelligent Technology Beijing Co ltd
Priority to CN202010900071.2A priority Critical patent/CN112017771B/en
Publication of CN112017771A publication Critical patent/CN112017771A/en
Application granted granted Critical
Publication of CN112017771B publication Critical patent/CN112017771B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroids
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention relates to a method and a system for constructing a disease prediction model based on semen routine inspection data. The method comprises the following steps: acquiring semen biochemical examination data, immunological examination data and vital sign information of a sample population to form a first sample set; performing data cleaning and standardization on the first sample set according to a disease knowledge base corresponding to the semen routine inspection data to form a second sample set; dividing the second sample set into a training set and a validation set, and then taking the training set as the input of a radial basis function neural network; and training the radial basis function neural network until the deviation between the output value and the true value is lower than a threshold value, thereby obtaining a disease prediction model. Using a sample set built from multiple data sources, the invention constructs a machine learning model based on radial basis functions (RBFs) to predict related diseases. The model can serve as a learning aid and reference for primary-care doctors, facilitates early self-checking and prevention by patients, and has value for wider adoption.

Description

Method and system for constructing disease prediction model based on semen routine inspection data
Technical Field
The invention relates to the technical fields of smart healthcare and medical informatics, in particular to a method and a system for constructing a disease prediction model, and more particularly to a method and a system for constructing a disease prediction model based on semen routine inspection data.
Background
Semen consists of sperm and seminal plasma: sperm accounts for 10 percent and seminal plasma for the rest. Besides water, fructose, proteins and fats, semen contains various enzymes and inorganic salts. The semen routine examination is a preliminary laboratory examination of the volume, properties and function of semen. It covers semen volume, color, viscosity, liquefaction time, sperm count, sperm motility, sperm morphology, semen cell examination and so on, and is mainly used to diagnose male reproductive capacity and diseases of the reproductive system.
Immunological examination can determine whether an autoimmune response is present, and chromosomal karyotyping can determine whether chromosomal abnormalities are present. Measuring serum FSH (follicle stimulating hormone), LH (luteinizing hormone), T (testosterone) and PRL (prolactin) is an important method for examining oligospermia and also helps to distinguish primary from secondary testicular failure.
At present, diagnosing semen-related diseases relies on doctors with rich experience and strong professional skills, together with multiple examinations, to arrive at an accurate diagnosis and treatment plan. When medical resources are scarce, a person being tested or a patient usually has to wait through a period of examinations before all results are available, so the timeliness of the examination data is uncertain. This can delay the optimal time for diagnosis and even lead to misdiagnosis, causing the patient psychological and financial loss.
On the other hand, primary medical institutions are in short supply, so the services that their equipment and the professional skills of primary-care staff can provide are limited and cannot meet public demand.
Disclosure of Invention
In order to relieve the strain on medical resources and the examination workload of primary medical institutions, and to support self-checking and prevention by patients and learning and reference by primary-care doctors, the invention provides a method for constructing a disease prediction model based on semen routine inspection data, which comprises the following steps: acquiring semen biochemical examination data, immunological examination data and vital sign information of a sample population to form a first sample set; performing data cleaning and standardization on the first sample set according to a disease knowledge base corresponding to the semen routine inspection data to form a second sample set; dividing the second sample set into a training set and a validation set, and then taking the training set as the input of a radial basis function neural network; and training the radial basis function neural network until the deviation between the output value and the true value is lower than a threshold value, thereby obtaining a disease prediction model.
In some embodiments of the present invention, performing data cleaning and standardization on the first sample set according to the disease knowledge base corresponding to the semen routine inspection data to form the second sample set includes the following step:
eliminating, according to the disease knowledge base, data in the semen biochemical examination data that do not conform to biological rules as well as contradictory data, normalizing the semen biochemical examination data, and mapping it onto [0, 1].
In some embodiments of the present invention, performing data cleaning and standardization on the first sample set according to the disease knowledge base corresponding to the semen routine inspection data to form the second sample set includes the following step:
eliminating, according to the disease knowledge base, data in the immunological examination data that do not conform to immunological rules as well as contradictory data, normalizing the immunological examination data, and mapping it onto [0, 1].
In some embodiments of the present invention, performing data cleaning and standardization on the first sample set according to the disease knowledge base corresponding to the semen routine inspection data to form the second sample set includes the following step:
performing semantic similarity calculation on the vital sign information according to the disease knowledge base to obtain the corresponding characteristic values of the vital sign information, and eliminating data with low correlation with semen-related diseases.
In the above embodiments, the second sample set includes the normalized semen biochemical examination data and immunological examination data, and the characteristic values of the vital sign information obtained by living body detection.
In another aspect of the invention, a disease prediction system based on semen routine inspection data is provided, which comprises an acquisition module, a storage module, a matching module, a calculation module and a prediction model. The acquisition module is used for acquiring semen biochemical examination data, immunological examination data and vital sign information of a person to be tested; the storage module is used for storing a disease knowledge base corresponding to the semen routine inspection data; the calculation module is used for matching the semen biochemical examination data, the immunological examination data and the vital sign information obtained by living body detection of the person to be tested against the disease knowledge base, and for normalizing the semen biochemical examination data and the immunological examination data to obtain a feature vector of the person to be tested; the prediction model is used for predicting the probability of disease for the person to be tested according to the feature vector.
In some embodiments of the present invention, the calculation module performs semantic similarity calculation on the sign information obtained by living body detection according to the disease knowledge base to obtain a first feature vector.
Further, the calculation module calculates the semantic similarity between the disease knowledge base and the sign information obtained by living body detection using the Euclidean distance to obtain a second feature vector, and obtains the feature vector of the person to be tested from the first feature vector and the second feature vector.
In some embodiments of the present invention, the prediction model includes a model constructed by the method for constructing a disease prediction model based on semen routine inspection data provided in the first aspect of the present invention.
Further, the predictive model includes a trained radial basis function neural network.
The beneficial effects of the invention are as follows:
1. Based on a data set constructed from multiple data sources, the invention cleans and normalizes the data set and then constructs a machine learning model using radial basis functions (RBFs); with this model, the probability that a person being tested suffers from a semen-related disease can be predicted rapidly. The method can serve as a learning aid and reference for primary-care doctors, facilitates early prediction and prevention for patients, and has value for wider adoption.
2. The invention applies different screening and cleaning methods to the different attributes of the various kinds of semen inspection data, which improves the validity and accuracy of the data and reduces the training error and training time of the model, so that the model is more robust.
Drawings
FIG. 1 is a basic flow chart of a method of constructing a disease prediction model based on semen routine inspection data in some embodiments of the invention;
FIG. 2 is a schematic structural diagram of a disease prediction system based on semen routine inspection data in some embodiments of the invention.
Detailed Description
The principles and features of the present invention are described below with reference to the drawings. The examples serve only to illustrate the invention and are not to be construed as limiting its scope.
The invention provides a method for constructing a disease prediction model based on semen routine inspection data, which comprises the following steps:
S101, acquiring semen biochemical examination data, immunological examination data and sign information obtained by living body detection of a sample population to form a first sample set;
S102, performing data cleaning and standardization on the first sample set according to a disease knowledge base corresponding to the semen routine inspection data to form a second sample set;
S103, dividing the second sample set into a training set and a validation set, and then taking the training set as the input of a radial basis function neural network;
S104, training the radial basis function neural network until the deviation between the output value and the true value is lower than a threshold value, thereby obtaining a disease prediction model.
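The split in step S103 can be sketched as follows. This is a minimal illustration, not the patent's procedure: the function name `split_samples`, the 80/20 ratio and the fixed random seed are all assumptions, since the text does not state how the second sample set is divided.

```python
import random


def split_samples(samples, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and return (training_set, validation_set).

    The fraction and the seed are illustrative assumptions; the source text
    only says that the second sample set is divided into two parts.
    """
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_validation = max(1, int(round(len(shuffled) * validation_fraction)))
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical second sample set with ten records.
second_sample_set = [{"id": i} for i in range(10)]
training_set, validation_set = split_samples(second_sample_set)
print(len(training_set), len(validation_set))
```

Only the training part would then be fed to the network in S103 and S104; the validation part is held back for checking the trained model.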
Specifically, the biochemical parameters of each index in the semen routine examination are described below. For example, under microscopic examination: 1) white blood cells (WBC) > 5/HPF are seen in genital tract inflammation (seminal vesiculitis, prostatitis), tuberculosis, tumors, etc.; 2) red blood cells (RBC) > 5/HPF are commonly found in seminal vesicle tuberculosis, prostate cancer, and the like. As further examples: 1. a pH below 7.0 is mostly seen in chronic infectious diseases, seminal vesicle hypofunction, congenital absence of the seminal vesicles, vas deferens obstruction and the like; 2. a pH above 8.0 is mostly seen in acute infectious diseases of the accessory gonads or epididymis; 3. a sperm motility rate below 35% is often a cause of male infertility and is mainly found in varicocele, non-specific infection of the reproductive system, pituitary dysfunction and the like.
The characteristic information of each type of feature in the semen routine examination includes the color, properties, smell, volume and so on of the semen. For example: 1. Abnormal semen color: yellow or brown purulent semen is common in seminal vesiculitis or prostatitis; bloody semen that is bright red, dark red or pink is mostly seen in seminal vesiculitis and prostatic tuberculosis, and more rarely in seminal vesicle tumors. 2. Abnormal semen volume: excessive volume is often seen in oligospermia and seminal vesiculitis, and also after prolonged abstinence; reduced volume is seen in oligospermia, testicular hypofunction, endocrine disturbance, seminal vesiculitis, prostatitis, genital system infection, etc.; absence of semen is commonly seen in azoospermia. 3. Abnormal semen liquefaction is usually found with infection or lesions of the prostate, such as lesions of the seminal vesicles and bulbourethral glands.
The sign information obtained by living body detection of the subject includes, for example, one or any combination of testicular distending pain, vas deferens pain, urgent urination, frequent urination, painful urination, high fever, chills, fatigue, soreness of the waist, spermatorrhea, premature ejaculation, thirst, emaciation, weakness, susceptibility to colds and the like. For example, if the semen is colorless, transparent and too thin, and the subject has urgent, frequent and painful urination, high fever and chills, prostatitis may be present; if the semen is thin and the subject is debilitated with soreness of the waist, oligospermia may be present; thin semen with testicular distending pain, vas deferens pain and low back pain suggests possible blood stasis. Preferably, keywords are extracted from these words or phrases and irrelevant stop words are removed; the result is the characteristic values of the sign information obtained by living body detection of the subject.
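The keyword extraction with stop-word removal described above could look roughly like this. The stop-word list, the punctuation-based phrase splitting and the function name `extract_sign_keywords` are hypothetical; the text does not specify a tokenizer or a stop-word list.

```python
import re

# Hypothetical stop-word list; the source does not specify one.
STOP_WORDS = {"and", "or", "the", "a", "of", "with", "in", "some"}


def extract_sign_keywords(description):
    """Split a free-text sign description into lower-case phrases
    and drop stop words from each phrase."""
    phrases = re.split(r"[,;.]", description.lower())
    keywords = []
    for phrase in phrases:
        words = [w for w in phrase.split() if w not in STOP_WORDS]
        if words:  # skip fragments that contained only stop words
            keywords.append(" ".join(words))
    return keywords


print(extract_sign_keywords("Testicular distending pain, frequent urination, and chills."))
# → ['testicular distending pain', 'frequent urination', 'chills']
```

For Chinese source text a word segmenter would be needed instead of whitespace splitting; that choice is outside what the text states.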
In step S102 of some embodiments of the present invention, performing data cleaning and standardization on the first sample set according to the disease knowledge base corresponding to the semen routine inspection data to form the second sample set includes the following step:
eliminating, according to the disease knowledge base, data in the semen biochemical examination data that do not conform to biological rules as well as contradictory data, normalizing the semen biochemical examination data, and mapping it onto [0, 1].
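A minimal sketch of this cleaning-then-normalizing step is given below. It assumes min-max scaling as the mapping onto [0, 1] and a simple plausibility interval as the "biological rule"; the patent names neither, and the helper names and the pH example are illustrative only.

```python
def drop_implausible(values, lower, upper):
    """Keep only readings inside a plausible interval.

    The interval stands in for a rule from the disease knowledge base
    (an assumption for this sketch).
    """
    return [v for v in values if lower <= v <= upper]


def min_max_normalize(values):
    """Map values linearly onto [0, 1]; a constant column maps to 0.0."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical pH readings; 15.3 lies outside the pH scale and is removed.
ph_readings = [7.2, 7.8, 15.3, 6.9, 7.5]
cleaned = drop_implausible(ph_readings, 0.0, 14.0)
normalized = min_max_normalize(cleaned)
print(cleaned)
print(normalized)
```

Cleaning before scaling matters here: a single implausible reading such as 15.3 would otherwise set the maximum and compress every valid reading toward 0.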
In step S102 of some embodiments of the present invention, performing data cleaning and standardization on the first sample set according to the disease knowledge base corresponding to the semen routine inspection data to form the second sample set includes the following step:
eliminating, according to the disease knowledge base, data in the immunological examination data that do not conform to immunological rules as well as contradictory data, normalizing the immunological examination data, and mapping it onto [0, 1].
Specifically, according to the "Clinical Diagnostics" standard, a standard library of biochemical parameters for each index of the semen routine examination, a library of semen trait characteristics and symptom information, and a knowledge base of the diseases they may correspond to are established through normalization. For example, the semen routine examination generally involves collecting semen and checking whether the semen volume, sperm motility rate, sperm count, amount of abnormal sperm, liquefaction time, pH, total sperm number, sperm motility duration, sperm climbing, red blood cells, white blood cells, etc. are abnormal, that is, whether each value is higher or lower than normal. Specifically:
the normal semen volume is 2 to 6 ml;
the normal liquefaction time is self-liquefaction at 37 °C within 5 to 25 minutes;
the normal pH is 7.2 to 7.8;
sperm motility rate (WHO standard): the lower reference limit for total motility (PR+NP) is 40% and the lower reference limit for progressively motile sperm (PR) is 32%; by the WHO standard, the combined motility rate of grade a, grade b and grade c sperm is at least 60%;
sperm motility (WHO standard): within 60 minutes after ejaculation, 50% or more of the sperm show forward motion (grade a + grade b), or 25% or more show rapid forward motion (grade a);
microscopy: 1) normal white blood cell (WBC) value < 5/HPF; 2) normal red blood cell (RBC) value < 5/HPF; 3) sperm density: normal sperm density is about 20 to 60 million per milliliter.
The "Clinical Diagnostics" reference above is only one example of a disease knowledge base corresponding to semen routine examination data and is not to be taken as a limitation of the present invention.
For example, the disease knowledge bases relevant to the present invention also include "Immunology", "Clinical Genitalia" and the like.
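To make the role of such a parameter library concrete, the sketch below stores two of the reference ranges quoted above (semen volume 2 to 6 ml, pH 7.2 to 7.8) and flags each measurement as low, normal or high. The dictionary layout, key names and function name are assumptions for illustration; the patent does not describe how the knowledge base is stored.

```python
# Reference ranges taken from the values quoted above; key names are illustrative.
REFERENCE_RANGES = {
    "volume_ml": (2.0, 6.0),
    "ph": (7.2, 7.8),
}


def flag_abnormal(measurements):
    """Compare each known measurement with its reference range.

    Returns {name: "low" | "normal" | "high"}; names without a stored
    range are ignored.
    """
    flags = {}
    for name, value in measurements.items():
        if name not in REFERENCE_RANGES:
            continue
        low, high = REFERENCE_RANGES[name]
        if value < low:
            flags[name] = "low"
        elif value > high:
            flags[name] = "high"
        else:
            flags[name] = "normal"
    return flags


print(flag_abnormal({"volume_ml": 1.5, "ph": 7.4}))
# → {'volume_ml': 'low', 'ph': 'normal'}
```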
In step S102 of some embodiments of the present invention, performing data cleaning and standardization on the first sample set according to the disease knowledge base corresponding to the semen routine inspection data to form the second sample set includes the following step:
performing semantic similarity calculation on the vital sign information according to the disease knowledge base to obtain the corresponding characteristic values of the vital sign information, and eliminating data with low correlation with semen-related diseases.
In the above embodiments, the second sample set includes the normalized semen biochemical examination data and immunological examination data, and the characteristic values of the vital sign information obtained by living body detection.
In another aspect of the present invention, a disease prediction system based on semen routine examination data is provided, which comprises an acquisition module 11, a storage module 12, a calculation module 13 and a prediction model 14. The acquisition module 11 is used for acquiring semen biochemical examination data, immunological examination data and vital sign information obtained by living body detection of a person to be tested; the storage module 12 is used for storing a disease knowledge base corresponding to the semen routine examination data; the calculation module 13 is configured to match the semen biochemical examination data, the immunological examination data and the vital sign information obtained by living body detection of the person to be tested against the disease knowledge base, and to normalize the semen biochemical examination data and the immunological examination data to obtain a feature vector of the person to be tested; the prediction model 14 is used for predicting the probability of disease for the person to be tested according to the feature vector.
In some embodiments of the present invention, the calculation module 13 performs semantic similarity calculation on the sign information obtained by living body detection according to the disease knowledge base to obtain a first feature vector.
Further, the calculation module 13 calculates the semantic similarity between the disease knowledge base and the sign information obtained by living body detection using the Euclidean distance to obtain a second feature vector, and obtains the feature vector of the person to be tested from the first feature vector and the second feature vector. Specifically, the characteristic information of each category of the subject's semen in the routine examination (color, properties, smell, volume, etc.) is acquired, together with the subject's symptom and sign information, such as testicular distending pain, vas deferens pain, and urgent, frequent or painful urination. Processing this information involves text feature extraction and semantic similarity calculation. Here, feature terms are selected with TF-IDF, and a semen-trait feature vector set and a symptom feature vector set are established.
The main idea of TF-IDF is as follows: if a word or phrase appears with a high frequency (TF) in one article and rarely appears in other articles, it is considered to discriminate well between categories and to be suitable for classification. The term frequency (TF) is the frequency with which a term (keyword) appears in a text. The raw count is usually normalized (typically the term count divided by the total number of terms in the article) to prevent a bias toward long documents. The formula is:
$$\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}}$$
where $n_{i,j}$ is the number of times the term $t_i$ appears in document $d_j$. Namely:
$$\mathrm{TF} = \frac{\text{number of times the term appears in the article}}{\text{total number of terms in the article}}$$
The fewer documents contain the term $t$, the larger the inverse document frequency (IDF), and the better the term discriminates between categories. The formula is:
$$\mathrm{idf}_{i} = \log\frac{|D|}{|\{j : t_i \in d_j\}|}$$
where $|D|$ is the total number of documents in the corpus and $|\{j : t_i \in d_j\}|$ is the number of documents containing the term $t_i$ (that is, the number of documents with $n_{i,j} \neq 0$). If the term does not occur in the corpus, the denominator is zero, so $1 + |\{j : t_i \in d_j\}|$ is typically used. Namely:
$$\mathrm{IDF} = \log\frac{\text{total number of documents in the corpus}}{\text{number of documents containing the term} + 1}$$
Adding 1 to the denominator prevents it from being 0.
A high term frequency within a particular document, combined with a low document frequency of that term across the whole document collection, yields a high TF-IDF weight. TF-IDF therefore tends to filter out common words and keep important ones. The formula is:
$$\mathrm{TF\text{-}IDF} = \mathrm{TF} \times \mathrm{IDF}$$
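The weighting described above can be sketched in a few lines. The function name `tf_idf`, the natural logarithm and the toy symptom documents are assumptions; the +1 in the denominator follows the text.

```python
import math


def tf_idf(documents):
    """Compute TF-IDF weights for a list of tokenized documents.

    TF  = term count / total terms in the document
    IDF = log(number of documents / (1 + documents containing the term))
    Returns one {term: weight} dict per document.
    """
    n_docs = len(documents)
    doc_freq = {}
    for tokens in documents:
        for term in set(tokens):
            doc_freq[term] = doc_freq.get(term, 0) + 1

    weights = []
    for tokens in documents:
        scores = {}
        if tokens:  # an empty document has no terms to weight
            for term in set(tokens):
                tf = tokens.count(term) / len(tokens)
                idf = math.log(n_docs / (1 + doc_freq[term]))
                scores[term] = tf * idf
        weights.append(scores)
    return weights


# Hypothetical symptom descriptions, already tokenized.
docs = [["pain", "urination"], ["urination", "chills"], ["urination", "fever"]]
weights = tf_idf(docs)
print(weights[0])
```

With this smoothing, a term that occurs in every document ("urination" here) receives a negative weight, while a term unique to one document ("pain") receives a positive one, which is the filtering effect the text describes.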
Meanwhile, the semantic similarity between the feature vector set to be identified and the feature vector sets in the database is calculated using cosine similarity.
Given two vectors in n-dimensional space, $A = (a_1, a_2, a_3, \dots, a_n)$ and $B = (b_1, b_2, b_3, \dots, b_n)$, the cosine of the angle between them is:
$$\cos\theta = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^{2}}\;\sqrt{\sum_{i=1}^{n} b_i^{2}}}$$
Here, one of the vectors A and B can be understood as the feature vector of the person being tested in the previous embodiment; the other is the corresponding matching feature vector in the prediction model.
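A short sketch of the cosine similarity between two feature vectors follows. Returning 0.0 for a zero vector is a convention chosen here, not something the text specifies, and the function name is illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for zero vectors (an assumption)
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # parallel vectors
```

Because the measure depends only on direction, two TF-IDF vectors with the same term proportions score close to 1 even if one description is much longer than the other.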
In some embodiments of the present invention, the prediction model includes a model constructed by the method for constructing a disease prediction model based on semen routine inspection data provided in the first aspect of the present invention.
Further, the predictive model includes a trained radial basis function neural network.
Specifically, the RBF network of the present invention maps the data non-linearly into a high-dimensional space through radial basis functions and then performs fitting or regression with a linear model in that space. The network has three layers. The first layer is the input layer, with N nodes (features, or data items). The second layer is the hidden layer, with M nodes in total; each node is an activation function that maps the input-layer data non-linearly into the high-dimensional space. The third layer is the output layer, which outputs a single value. Here, the output of the RBF neural network is a predicted value of the SYNTAX score, and the possible pathological changes underlying the subject's semen abnormality are estimated from the network output.
The specific steps are as follows. Let X be the input vector (the vector corresponding to the second sample set), Y the corresponding target output vector (the vector corresponding to the disorder or disease), and $D_j$ the width vector of the radial basis function. For the l-th input sample (l = 1, 2, ..., N), the parameters are expressed and calculated as follows:
1) Determine the parameters.
(1) Determine the input vector X:
$X = [x_1, x_2, \dots, x_n]^T$, where n is the number of input-layer units;
(2) Determine the output vector Y and the desired output vector O:
$Y = [y_1, y_2, \dots, y_q]^T$, where q is the number of output-layer units;
$O = [o_1, o_2, \dots, o_q]^T$;
(3) Initialize the connection weights from the hidden layer to the output layer:
$W_k = [w_{k1}, w_{k2}, \dots, w_{kp}]^T$, (k = 1, 2, ..., q);
where p is the number of hidden-layer units and q is the number of output-layer units.
The weights from the hidden layer to the output layer are initialized by analogy with the center initialization method, using $\mathrm{min}_k$, the minimum of all expected outputs of the k-th output neuron in the training set, and $\mathrm{max}_k$, the maximum of all expected outputs of the k-th output neuron in the training set.
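As an illustration only, one equal-interval scheme consistent with the roles of $\mathrm{min}_k$ and $\mathrm{max}_k$ is written out below. The exact expression, and in particular the denominator $p + 1$, is an assumption made for this sketch and is not taken from the text:

```latex
% Assumed equal-interval initialization of the hidden-to-output weights:
% the p weights of output neuron k are spread evenly inside (min_k, max_k).
w_{kj} = \mathrm{min}_k + j \, \frac{\mathrm{max}_k - \mathrm{min}_k}{p + 1},
\qquad j = 1, 2, \dots, p, \quad k = 1, 2, \dots, q
```

Under this assumption every initial weight lies strictly between the smallest and largest expected output of its output neuron.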
(4) Initialize the center parameters of the hidden-layer neurons, $C_j = [c_{j1}, c_{j2}, \dots, c_{jn}]^T$. The centers of different hidden-layer neurons should take different values, and the width associated with each center should be adjustable, so that different hidden-layer neurons respond most strongly to different input features. In practical applications, an input always lies within a certain range of values. Without loss of generality, the initial values of the center components of the hidden-layer neurons are spaced at equal intervals from small to large, so that weaker inputs produce a stronger response near the smaller centers. The spacing is determined by the number of hidden-layer neurons. The advantage of this approach is that a reasonable number of hidden-layer neurons can be found by trial and error while keeping the center initialization as reasonable as possible, so that different input features are reflected more clearly at different centers, which exploits the local character of the Gaussian kernel.
Based on the above, the initial values of the RBF neural network center parameters $c_{ji}$ are assigned at equal intervals between $\mathrm{min}_i$ and $\mathrm{max}_i$, where p is the total number of hidden-layer neurons (j = 1, 2, ..., p), $\mathrm{min}_i$ is the minimum of all inputs for the i-th feature in the training set, and $\mathrm{max}_i$ is the maximum of all inputs for the i-th feature in the training set.
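The equal-interval center initialization can be sketched as below. The specific rule used here, which places the p centers of each feature at the midpoints of p equal sub-intervals of [min_i, max_i], is an assumption chosen to match the description; the function name `init_centers` is likewise hypothetical.

```python
def init_centers(training_inputs, n_hidden):
    """Return n_hidden center vectors spaced at equal intervals per feature.

    Assumed rule (not confirmed by the text):
        c_ji = min_i + (max_i - min_i) / (2p) + (j - 1) * (max_i - min_i) / p
    for j = 1..p, where p = n_hidden.
    """
    n_features = len(training_inputs[0])
    mins = [min(row[i] for row in training_inputs) for i in range(n_features)]
    maxs = [max(row[i] for row in training_inputs) for i in range(n_features)]
    centers = []
    for j in range(1, n_hidden + 1):
        center = []
        for i in range(n_features):
            span = maxs[i] - mins[i]
            center.append(mins[i] + span / (2 * n_hidden) + (j - 1) * span / n_hidden)
        centers.append(center)
    return centers


# Two hypothetical training samples with two features each.
centers = init_centers([[0.0, 10.0], [1.0, 20.0]], n_hidden=2)
print(centers)
# → [[0.25, 12.5], [0.75, 17.5]]
```

Increasing `n_hidden` shrinks the spacing between neighboring centers, which is the sense in which the number of hidden neurons sets the interval size.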
(5) Initialize the width vector $D_j = [d_{j1}, d_{j2}, \dots, d_{jn}]^T$. The width vector determines the range over which a neuron responds to the input: the smaller the width, the narrower the activation function of the corresponding hidden-layer neuron, and the weaker that neuron's response to inputs lying near the centers of other neurons. The widths are scaled by a width adjustment coefficient $d_f$ whose value is less than 1, which makes it easier for each hidden-layer neuron to respond to local information and improves the local response capability of the RBF neural network.
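For illustration, one common way to combine a width adjustment coefficient with the training data is to scale the root-mean-square distance between the inputs and the center. The expression below is an assumption made for this sketch, with $x_{li}$ denoting the i-th feature of the l-th training sample (a symbol introduced here):

```latex
% Assumed width initialization: d_f times the RMS distance of feature i
% from the center component c_ji over the N training samples.
d_{ji} = d_f \sqrt{\frac{1}{N} \sum_{l=1}^{N} \left( x_{li} - c_{ji} \right)^{2}},
\qquad 0 < d_f < 1
```

A smaller $d_f$ shrinks every width proportionally, so each hidden neuron responds to a narrower neighborhood of its center.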
2) Calculate the output value $z_j$ of the j-th hidden-layer neuron.
$C_j$ is the center vector of the j-th hidden-layer neuron, made up of that neuron's center components for each input-layer unit, $C_j = [c_{j1}, c_{j2}, \dots, c_{jn}]^T$; $D_j$ is the width vector of the j-th hidden-layer neuron and corresponds to $C_j$, $D_j = [d_{j1}, d_{j2}, \dots, d_{jn}]^T$. The larger $D_j$ is, the larger the range of input vectors that the hidden-layer neuron responds to and the smoother the transition between neurons; $\|\cdot\|$ denotes the Euclidean norm.
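One form of the hidden-layer response consistent with these definitions, assuming a Gaussian kernel with component-wise scaling by the width vector (an assumption, since the expression itself is not given in the text), is:

```latex
% Assumed Gaussian response of the j-th hidden neuron; the division
% (X - C_j) / D_j is taken component by component.
z_j = \exp\!\left( - \left\| \frac{X - C_j}{D_j} \right\|^{2} \right)
    = \exp\!\left( - \sum_{i=1}^{n} \left( \frac{x_i - c_{ji}}{d_{ji}} \right)^{2} \right),
\qquad j = 1, 2, \dots, p
```

The response equals 1 when the input coincides with the center and decays toward 0 as the scaled distance grows.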
3) Calculate the output of the output layer neurons.
Y = [y_1, y_2, ..., y_q]^T, where w_kj is the adjustment weight between the k-th output layer neuron and the j-th hidden layer neuron.
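The expression for y_k is missing from this text. Assuming the standard linear output layer of an RBF network, y_k = sum over j of w_kj * z_j (an assumption), the step can be sketched as below. The function name output_layer is hypothetical.

```python
def output_layer(z, weights):
    """Compute Y = [y_1, ..., y_q] from the hidden layer outputs z.

    weights[k][j] is w_kj, the adjustment weight between output neuron k
    and hidden neuron j. Assumed form: y_k = sum_j w_kj * z_j.
    """
    return [sum(w_kj * z_j for w_kj, z_j in zip(row, z)) for row in weights]
```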
4) Iteratively calculate the weight parameters.
The weight parameters of the RBF neural network are trained by the gradient descent method. The center, width and adjustment weight parameters are adaptively tuned toward their optimal values through learning, and the iterative calculation is as follows:
where w_kj(t) is the adjustment weight between the k-th output neuron and the j-th hidden layer neuron at the t-th iteration, c_ji(t) is the center component of the j-th hidden layer neuron for the i-th input neuron at the t-th iteration, d_ji(t) is the width corresponding to the center c_ji(t), and η is the learning factor.
E is the RBF neural network evaluation function, where O_lk is the desired output value of the k-th output neuron for the l-th input sample, and y_lk is the network output value of the k-th output neuron for the l-th input sample.
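Neither the update rules nor the expression for E survive in this text. The sketch below assumes the usual sum-of-squares form E = 1/2 * sum over l and k of (O_lk - y_lk)^2 and shows one plain gradient descent step on the adjustment weights only; the centers and widths would be updated analogously from their own partial derivatives. All function names are hypothetical, and any extra terms the original rules may contain are not reproduced.

```python
def forward(weights, z):
    """Linear output layer: y_k = sum_j w_kj * z_j (assumed form)."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in weights]


def evaluation(weights, hidden, targets):
    """Assumed evaluation function E = 1/2 * sum_l sum_k (O_lk - y_lk)^2."""
    total = 0.0
    for z, o in zip(hidden, targets):
        y = forward(weights, z)
        total += sum((ok - yk) ** 2 for ok, yk in zip(o, y))
    return 0.5 * total


def update_weights(weights, hidden, targets, eta=0.1):
    """One gradient descent step: w_kj(t+1) = w_kj(t) - eta * dE/dw_kj.

    hidden[l] is the hidden layer output vector for sample l,
    targets[l] is the desired output vector O_l for sample l.
    """
    q, p = len(weights), len(weights[0])
    grads = [[0.0] * p for _ in range(q)]
    for z, o in zip(hidden, targets):
        y = forward(weights, z)
        for k in range(q):
            for j in range(p):
                # dE/dw_kj = -(O_lk - y_lk) * z_lj, summed over samples l
                grads[k][j] -= (o[k] - y[k]) * z[j]
    return [[weights[k][j] - eta * grads[k][j] for j in range(p)]
            for k in range(q)]
```

Repeating the step until E falls below a chosen threshold corresponds to training until the deviation between the output value and the true value is sufficiently small.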
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims (6)

1. A method for constructing a disease prediction model based on semen routine inspection data, characterized by comprising the following steps:
acquiring semen biochemical examination data, immunological examination data and vital sign information of a sample crowd to form a first sample set;
performing data cleaning and standardization on the first sample set according to a disease knowledge base corresponding to the semen routine inspection data to form a second sample set: according to the disease knowledge base, removing from the semen biochemical examination data any data that do not conform to biological rules and any mutually contradictory data, then normalizing the semen biochemical examination data and mapping them onto [0,1]; according to the disease knowledge base, removing from the immunological examination data any data that do not conform to immunological rules and any mutually contradictory data, then normalizing the immunological examination data and mapping them onto [0,1]; according to the disease knowledge base, performing semantic similarity calculation on the vital sign information of the living body detection to obtain the corresponding characteristic values, and removing data with low correlation with semen-related diseases;
dividing the second sample set into a training set and a verification set, and then taking the training set as the input of a radial basis function neural network;
and training the radial basis function neural network until the deviation between the output value and the true value is lower than a threshold value, and obtaining a disease prediction model.
2. The method of claim 1, wherein the second sample set comprises normalized semen biochemical test data and immunological test data, and characteristic values of vital sign information of a living body test.
3. A system for disease prediction based on semen routine inspection data, characterized by comprising an acquisition module, a storage module, a calculation module and a prediction model, wherein
the acquisition module is used for acquiring semen biochemical examination data, immunological examination data and vital sign information of living body detection of a person to be detected;
the storage module is used for storing a disease knowledge base corresponding to the semen routine inspection data;
the calculation module is used for matching the semen biochemical examination data, the immunological examination data and the vital sign information of the living body detection of the testee with the disease knowledge base, and normalizing the semen biochemical examination data and the immunological examination data to obtain a feature vector of the testee;
the prediction model is used for predicting the disease probability of a person to be tested according to the feature vector, and comprises a model constructed by the disease prediction model construction method based on semen routine inspection data according to any one of claims 1-2.
4. The system of claim 3, wherein the calculation module performs semantic similarity calculation on the vital sign information of the living body detection according to the disease knowledge base to obtain a first feature vector.
5. The system of claim 4, wherein the calculation module calculates the semantic similarity between the disease knowledge base and the vital sign information of the living body detection by Euclidean distance to obtain a second feature vector, and obtains the feature vector of the person to be tested from the first feature vector and the second feature vector.
6. The system of claim 3, wherein the prediction model comprises a trained radial basis function neural network.
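By way of illustration only, the mapping of examination data onto [0,1] and the division of the second sample set into a training set and a verification set, as recited in claim 1, might be sketched as follows. Min-max scaling, the 80/20 split and the fixed shuffle seed are assumptions, since the text does not specify the normalization formula or the split ratio, and the function names are hypothetical.

```python
import random


def min_max_normalize(column):
    """Map one column of examination values onto [0, 1] (assumed min-max scaling)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to 0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def split_samples(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples and divide them into a training set and a verification set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```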
CN202010900071.2A 2020-08-31 2020-08-31 Method and system for constructing disease prediction model based on semen routine inspection data Active CN112017771B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010900071.2A CN112017771B (en) 2020-08-31 2020-08-31 Method and system for constructing disease prediction model based on semen routine inspection data

Publications (2)

Publication Number Publication Date
CN112017771A CN112017771A (en) 2020-12-01
CN112017771B true CN112017771B (en) 2024-02-27

Family

ID=73515297

Country Status (1)

Country Link
CN (1) CN112017771B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112233750B (en) * 2020-10-20 2024-02-02 吾征智能技术(北京)有限公司 Information matching system based on hemoptysis characters and diseases
CN112908484A (en) * 2021-01-18 2021-06-04 吾征智能技术(北京)有限公司 System, equipment and storage medium for analyzing diseases by cross-modal fusion
CN113393934B (en) * 2021-06-07 2022-07-12 义金(杭州)健康科技有限公司 Health trend estimation method and prediction system based on vital sign big data

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1997005553A1 (en) * 1995-07-25 1997-02-13 Horus Therapeutics, Inc. Computer assisted methods for diagnosing diseases
WO1998024369A1 (en) * 1996-12-02 1998-06-11 The University Of Texas System Spectroscopic detection of cervical pre-cancer using radial basis function networks
US6090044A (en) * 1997-12-10 2000-07-18 Bishop; Jeffrey B. System for diagnosing medical conditions using a neural network
WO2005091203A2 (en) * 2004-03-12 2005-09-29 Aureon Laboratories, Inc. Systems and methods for treating, diagnosing and predicting the occurrence of a medical condition
CA2747184A1 (en) * 2010-08-06 2012-02-06 Miraculins Inc. Biomarkers for the diagnosis of prostate cancer in a non-hypertensive population
CN104008164A (en) * 2014-05-29 2014-08-27 华东师范大学 Generalized regression neural network based short-term diarrhea multi-step prediction method
CA2894317A1 (en) * 2015-06-15 2016-12-15 Deep Genomics Incorporated Systems and methods for classifying, prioritizing and interpreting genetic variants and therapies using a deep neural network
KR20180046432A (en) * 2016-10-27 2018-05-09 가톨릭대학교 산학협력단 Method and Apparatus for Classification and Prediction of Pathology Stage using Decision Tree for Treatment of Prostate Cancer
WO2018187952A1 (en) * 2017-04-12 2018-10-18 邹霞 Kernel discriminant analysis approximation method based on neural network
CN109036553A (en) * 2018-08-01 2018-12-18 北京理工大学 A kind of disease forecasting method based on automatic extraction Medical Technologist's knowledge
CN110459328A (en) * 2019-07-05 2019-11-15 梁俊 A kind of Clinical Decision Support Systems for assessing sudden cardiac arrest
CN110880369A (en) * 2019-10-08 2020-03-13 中国石油大学(华东) Gas marker detection method based on radial basis function neural network and application
KR102100699B1 (en) * 2019-07-01 2020-04-16 (주)제이엘케이인스펙션 Apparatus and method for constructing unified lesion learning model and apparatus and method for diagnosing lesion using the unified lesion learning model
CN111554401A (en) * 2020-03-26 2020-08-18 肾泰网健康科技(南京)有限公司 Method for constructing AI (artificial intelligence) chronic kidney disease screening model, and chronic kidney disease screening method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Application of multilayer perceptron and radial basis function neural networks in differentiating between chronic obstructive pulmonary and congestive heart failure diseases; Mehrabi, S, et al.; Expert Systems with Applications; Vol. 36 (No. 03); pp. 6956-6959 *
Prostate cancer diagnosis method based on GMM-RBF neural network; 崔少泽, et al.; 管理科学 (No. 01); pp. 33-47 *
Interpretation of systemic lupus erythematosus autoantibody profile data and disease model prediction; 彭玲, et al.; 检验医学与临床 (No. 05); pp. 635-638 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant