CN107845424B - Method and system for diagnostic information processing analysis - Google Patents

Publication number
CN107845424B
Authority
CN
China
Prior art keywords
label
information
sample
vector
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711128161.9A
Other languages
Chinese (zh)
Other versions
CN107845424A (en
Inventor
黄梦醒
韩惠蕊
张雨
冯文龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hainan University
Original Assignee
Hainan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hainan University
Priority to CN201711128161.9A
Publication of CN107845424A
Application granted
Publication of CN107845424B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a method and a system for processing and analyzing diagnostic information, and relates to the technical field of medical systems. The method comprises the following steps: establishing a feature space and a label space from a plurality of sample features and label information, and establishing a feature vector and a first label vector for each sample; calculating the number of occurrences of each label information; calculating the number of times every two label information appear together in one sample; calculating the similarity of every two label information and establishing a label-label similarity matrix; reconstructing the first label vector to obtain a second label vector, and calculating the occurrence probability of the label information of the target sample according to the second label vector; and selecting a preset number of label information as the recommended label information of the target sample. By using the features corresponding to patients' diseases as the feature space and the diseases as label information, the invention exploits the correlation between diseases and their complications to discover more potential diseases, thereby improving the precision of diagnosis decision support.

Description

Method and system for diagnostic information processing analysis
Technical Field
The invention relates to the technical field of medical systems, in particular to a method and a system for processing and analyzing diagnostic information.
Background
In the big data age, people have gradually accepted methods of using health big data to assist diagnosis and treatment. With the success of many industry applications in utilizing big data, the health services industry is beginning to utilize medical big data to improve service efficiency and quality.
Health information tools and machine learning techniques have been successfully used to help physicians diagnose diseases and develop treatment regimens more efficiently. Clinical decision support applications include systems that provide diagnostics, personalized drug assessment, treatment protocols, and related medical knowledge. Clinical decision support applications aim to provide medical personnel with professional knowledge, patient information and intelligent means to improve the efficiency and utility of medical personnel in making decisions. Through clinical decision support applications, medical negligence can be reduced and medical service quality can be improved. In the medical field, there is an increasing demand for high quality clinical decision support systems.
Clinicians distinguish between patients and make diagnoses based on their experience and knowledge. Therefore, if a clinician lacks extensive experience and accurate judgment, medical errors will inevitably result. The goal of building clinical decision support systems is to improve the accuracy and efficiency of clinicians through machine learning techniques. Such a system can extract patient features from personal health records, such as physiological data, electronic medical records, 3D images, radiological images, genome sequencing, and clinical and billing data, classify the patient according to these features, and provide corresponding clinical suggestions to doctors. Scoring criteria in medical scenarios and the complexity of the medical field are challenges for clinical decision support systems. Many clinical decision support systems have already been developed to assist physicians.
Because some diseases are accompanied by complications, it is common for a patient to have multiple diseases simultaneously. Recommending reference diseases to the clinician in such cases requires a more complex clinical decision support system. Analysis of real clinical diagnosis information shows that patients with multiple diseases account for a large part of all patients. The clinical decision support system therefore needs to recommend multiple reference diseases to the clinician, and the problem of recommending reference diseases is converted into a multi-label classification problem.
ML-kNN (a lazy multi-label classification method) is widely applied and studied because of its simple steps and good results. However, the algorithm estimates the likelihood of each label independently and therefore ignores the associations between labels. In actual disease diagnosis many labels are related, so ML-kNN is not effective in application scenarios where the labels are correlated.
Disclosure of Invention
The invention mainly aims to provide a method and a system for processing and analyzing diagnostic information, aiming at effectively improving the accuracy of diagnosing diseases by utilizing the correlation among the diseases.
To achieve the above object, the present invention provides a method for processing and analyzing diagnostic information, comprising the steps of:
establishing a feature space and a label space of multi-label information through a plurality of sample features and label information of a plurality of samples, and establishing a feature vector and a first label vector of each sample according to each sample feature and each label information;
calculating the occurrence frequency of each label information; calculating the times that every two label information simultaneously appear in one sample;
calculating the similarity of every two label information, and establishing a label-label similarity matrix;
reconstructing a first label vector of the sample through the label-label similarity matrix to obtain a second label vector, and calculating the label information occurrence probability of the target sample according to the second label vector;
and sorting the probability of the target sample label information in a descending order, and selecting a preset number of label information as the recommended label information of the target sample.
Preferably, the establishing a feature space and a label space of multi-label information by using a plurality of sample features and label information of a plurality of samples further comprises:
presetting $F = \{f_1, f_2, \ldots, f_b\}$ as the b-dimensional feature space of the multi-label information, and presetting $L = \{l_1, l_2, \ldots, l_q\}$ as the q-dimensional label space of the multi-label information;
presetting $T = \{(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\}$ as the multi-label sample set, wherein $X_i = (x_{i1}, x_{i2}, \ldots, x_{ib})$ is the b-dimensional feature vector of the i-th sample and $Y_i = (y_{i1}, y_{i2}, \ldots, y_{iq})$ is the label vector corresponding to sample $X_i$; if sample $X_i$ has label $l_j$, then $y_{ij} = 1$; otherwise $y_{ij} = 0$.
Preferably, calculating the number of occurrences of each label information includes: recording, for each label and each sample, whether the label occurs in the sample, and denoting the value as $r_{ik}$; if sample $X_k$ has label $l_i$, then $r_{ik} = 1$; otherwise $r_{ik} = 0$.
Preferably, the calculating the similarity between every two tags and the establishing the tag-tag similarity matrix further includes:
and calculating the similarity of every two labels by using a cosine similarity calculation method.
Preferably, calculating the similarity of every two labels by the cosine similarity calculation method includes:
calculating the similarity of every two labels by the formula
$$\mathrm{sim}(I_i, I_j) = \frac{\sum_{k \in P_{ij}} r_{ik}\, r_{jk}}{\sqrt{\sum_{k=1}^{n} r_{ik}^2}\,\sqrt{\sum_{k=1}^{n} r_{jk}^2}}$$
wherein $I_i = (r_{i1}, \ldots, r_{in})$ is the occurrence vector of label $l_i$ over the n samples; $P_{ij}$ is the set of samples that contain both label $l_i$ and label $l_j$; $\sum_{k \in P_{ij}} r_{ik} r_{jk}$ is the number of times label $l_i$ and label $l_j$ appear together in a sample $X_k$; and $\sum_{k=1}^{n} r_{ik}^2$ and $\sum_{k=1}^{n} r_{jk}^2$ are respectively the total number of occurrences of label $l_i$ and the total number of occurrences of label $l_j$.
Preferably, the calculating the similarity between every two label information, and the establishing a label-label similarity matrix includes:
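the label-label similarity matrix is
$$S = \begin{pmatrix} S_{11} & S_{12} & \cdots & S_{1q} \\ S_{21} & S_{22} & \cdots & S_{2q} \\ \vdots & \vdots & \ddots & \vdots \\ S_{q1} & S_{q2} & \cdots & S_{qq} \end{pmatrix}$$
wherein the element $S_{ij} = \mathrm{sim}(I_i, I_j)$ of the matrix represents the similarity of label $l_i$ and label $l_j$.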
Preferably, the label information matrix of the samples is reconstructed through the label-label similarity matrix as $Y = g(X)$,
wherein $g(X)$ is:
Figure BDA0001468824520000042
preferably, the calculating the probability of occurrence of the tag information of the target sample according to the second tag vector includes:
and calculating the occurrence probability of each label information in the target sample by using a lazy multi-label classification algorithm according to the second label vector.
The present invention also provides a diagnostic information processing and analyzing system, the system comprising:
a module for establishing a feature space and a label space of multi-label information according to the plurality of sample features and label information of the plurality of samples;
a module for establishing a feature vector and a first label vector for each sample based on each sample feature and each label information;
a module for calculating the number of occurrences of each label information;
a module for counting the number of times that every two label information simultaneously appear in one sample;
a module for calculating the similarity of every two label information and establishing a label-label similarity matrix;
a module for reconstructing the label information matrix of the samples through the label-label similarity matrix, reconstructing the first label vector of each sample to obtain a second label vector, and calculating the occurrence probability of the label information of a target sample according to the second label vector;
a module for sorting the occurrence probabilities of the target sample's label information in descending order; and
a module for selecting a preset number of label information as the recommended label information of the target sample.
Preferably, the system further comprises: the module for calculating the similarity of every two label information and establishing the label-label similarity matrix is a cosine similarity calculation module.
In the technical scheme of the invention, the features corresponding to patients' diseases form the feature space of the multi-label learning algorithm, and the diseases serve as its label information. By analyzing the label information, the number of times every two label information appear together in one patient is obtained; the label matrix is then reconstructed with the label similarity matrix to update the label vector of each patient; finally, the multi-label learning algorithm predicts the diseases that the target patient may have. The correlation between diseases and their complications is thus used effectively to discover more potential diseases, which improves the precision of diagnosis decision support.
Drawings
FIG. 1 is a flow chart illustrating a method of diagnostic information processing analysis according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, the present invention provides a method for processing and analyzing diagnostic information, comprising the steps of:
establishing a feature space and a label space of multi-label information through a plurality of sample features and label information of a plurality of samples, and establishing a feature vector and a first label vector of each sample according to each sample feature and each label information; calculating the occurrence frequency of each label information; calculating the times that every two label information simultaneously appear in one sample; calculating the similarity of every two label information, and establishing a label-label similarity matrix; reconstructing a first label vector of a sample through the label-label similarity matrix to obtain a second label vector, and calculating the occurrence probability of label information of a target sample according to the second label vector; and sorting the probability of the target sample label information in a descending order, and selecting a preset number of label information as the recommended label information of the target sample.
The principle of the invention is as follows: complications refer to the development of one disease causing another disease or condition. When a doctor diagnoses a patient, if the doctor has diagnosed a disease, the doctor considers whether the patient also suffers from complications of the disease to ensure that more diseases of the patient can be found. Meanwhile, the relation between a disease and other diseases can be reflected through the co-occurrence frequency of the disease and other diseases, so that the similarity between the diseases can be reflected by using the co-occurrence frequency between every two diseases, and the information of the diseases possibly suffered by the patient is finally recommended by using the calculated disease similarity and combining a lazy multi-label classification method.
In a particular embodiment, a multi-label data set (basic multi-label data set, denoted BMLDS) is built in advance from patient diagnostic records; the feature set and the label set of the samples are separated from the BMLDS and denoted F and L respectively, and the BMLDS is divided into a training set and a test set at a ratio of 7:3. The method of the invention then comprises:
S1, establishing a feature space and a label space of multi-label information through a plurality of sample features and label information of a plurality of samples:
presetting $F = \{f_1, f_2, \ldots, f_b\}$ as the b-dimensional feature space of the multi-label information, and presetting $L = \{l_1, l_2, \ldots, l_q\}$ as the q-dimensional label space of the multi-label information;
presetting $T = \{(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\}$ as the multi-label sample set, wherein $X_i = (x_{i1}, x_{i2}, \ldots, x_{ib})$ is the b-dimensional feature vector of the i-th sample and $Y_i = (y_{i1}, y_{i2}, \ldots, y_{iq})$ is the label vector corresponding to sample $X_i$; if sample $X_i$ has label $l_j$, then $y_{ij} = 1$; otherwise $y_{ij} = 0$.
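As an illustration of step S1, the following minimal sketch builds a label space and the first label vectors from per-patient disease sets. The function names and the toy records are hypothetical and not part of the patent; only the rule "1 if the sample has the label, otherwise 0" comes from the text.

```python
from typing import List, Sequence, Set, Tuple


def build_label_space(label_sets: Sequence[Set[str]]) -> List[str]:
    """Collect every distinct label into an ordered label space L = {l_1, ..., l_q}."""
    return sorted(set().union(*label_sets))


def to_label_vector(labels: Set[str], label_space: Sequence[str]) -> List[int]:
    """First label vector Y_i: y_ij = 1 if the sample has label l_j, otherwise 0."""
    return [1 if label in labels else 0 for label in label_space]


# Hypothetical toy records: (feature vector X_i, set of diagnosed diseases).
samples: List[Tuple[List[float], Set[str]]] = [
    ([64.0, 36.5], {"type 2 diabetes", "diabetic nephropathy"}),
    ([71.0, 36.8], {"cerebral infarction"}),
    ([58.0, 36.4], {"type 2 diabetes", "hyperlipidemia"}),
]

label_space = build_label_space([labels for _, labels in samples])
X = [features for features, _ in samples]
Y = [to_label_vector(labels, label_space) for _, labels in samples]
```

Sorting the label space is only a convenience so that the column order is reproducible; any fixed order would serve.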
S2, calculating the number of occurrences of each label information: recording, for each label and each sample, whether the label occurs in the sample, and denoting the value as $r_{ik}$; if sample $X_k$ has label $l_i$, then $r_{ik} = 1$; otherwise $r_{ik} = 0$.
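The occurrence counts of step S2, together with the pairwise co-occurrence counts mentioned in the method overview, can be sketched as follows. This is a minimal illustration that assumes the label vectors are stored as rows of 0/1 values; the helper names are hypothetical.

```python
from typing import List


def label_counts(Y: List[List[int]]) -> List[int]:
    """Number of samples in which each label occurs (column sums of Y)."""
    return [sum(column) for column in zip(*Y)]


def cooccurrence_counts(Y: List[List[int]]) -> List[List[int]]:
    """co[i][j] = number of samples that carry both label i and label j."""
    q = len(Y[0])
    co = [[0] * q for _ in range(q)]
    for row in Y:
        present = [j for j, value in enumerate(row) if value == 1]
        for i in present:
            for j in present:
                co[i][j] += 1
    return co


# Toy label matrix: 4 samples (rows), 3 labels (columns).
Y = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 1],
]
counts = label_counts(Y)
co = cooccurrence_counts(Y)
```

The diagonal of `co` equals `counts`, which gives a quick consistency check on real data.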
S3, calculating the similarity of every two labels, and establishing a label-label similarity matrix:
and calculating the similarity of every two labels by using a cosine similarity calculation method.
The available similarity calculation methods include the cosine similarity, the Pearson correlation coefficient, and the Jaccard similarity coefficient.
Specifically, calculating the similarity of every two tags by using a cosine similarity calculation method comprises the following steps:
calculating the similarity of every two labels by the formula
$$\mathrm{sim}(I_i, I_j) = \frac{\sum_{k \in P_{ij}} r_{ik}\, r_{jk}}{\sqrt{\sum_{k=1}^{n} r_{ik}^2}\,\sqrt{\sum_{k=1}^{n} r_{jk}^2}}$$
wherein $I_i = (r_{i1}, \ldots, r_{in})$ is the occurrence vector of label $l_i$ over the n samples; $P_{ij}$ is the set of samples that contain both label $l_i$ and label $l_j$; $\sum_{k \in P_{ij}} r_{ik} r_{jk}$ is the number of times label $l_i$ and label $l_j$ appear together in a sample $X_k$; and $\sum_{k=1}^{n} r_{ik}^2$ and $\sum_{k=1}^{n} r_{jk}^2$ are respectively the total number of occurrences of label $l_i$ and the total number of occurrences of label $l_j$;
The Pearson correlation coefficient is calculated as:
Figure BDA0001468824520000081
where I and J are the sample vectors corresponding to label $l_i$ and label $l_j$, respectively; the Pearson correlation coefficient is the cosine of the angle between the vectors I and J after both have been standardized.
The Jaccard similarity coefficient is calculated as:
Figure BDA0001468824520000082
where I and J are the sample vectors corresponding to label $l_i$ and label $l_j$, respectively.
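For the two alternative measures, the sketch below uses the textbook definitions of the Pearson correlation coefficient and the Jaccard coefficient on 0/1 occurrence vectors. The patent gives its own formulas only as images, so the notation and the handling of degenerate vectors (returning 0.0) are assumptions here.

```python
import math
from typing import Sequence


def pearson(I: Sequence[float], J: Sequence[float]) -> float:
    """Cosine of the angle between the mean-centered vectors I and J."""
    n = len(I)
    mean_i, mean_j = sum(I) / n, sum(J) / n
    di = [a - mean_i for a in I]
    dj = [b - mean_j for b in J]
    denom = math.sqrt(sum(a * a for a in di)) * math.sqrt(sum(b * b for b in dj))
    if denom == 0:
        return 0.0  # assumption: a constant vector is treated as uncorrelated
    return sum(a * b for a, b in zip(di, dj)) / denom


def jaccard(I: Sequence[int], J: Sequence[int]) -> float:
    """Samples carrying both labels divided by samples carrying at least one."""
    both = sum(1 for a, b in zip(I, J) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(I, J) if a == 1 or b == 1)
    return both / either if either else 0.0


# Occurrence vectors of two labels over four samples (toy values).
I = [1, 1, 1, 0]
J = [1, 0, 1, 0]
rho = pearson(I, J)
jac = jaccard(I, J)
```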
S4, calculating the similarity of every two label information, and establishing a label-label similarity matrix:
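the label-label similarity matrix is
$$S = \begin{pmatrix} S_{11} & S_{12} & \cdots & S_{1q} \\ S_{21} & S_{22} & \cdots & S_{2q} \\ \vdots & \vdots & \ddots & \vdots \\ S_{q1} & S_{q2} & \cdots & S_{qq} \end{pmatrix}$$
wherein the element $S_{ij} = \mathrm{sim}(I_i, I_j)$ of the matrix represents the similarity of label $l_i$ and label $l_j$.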
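A label-label similarity matrix based on cosine similarity might be computed as in the sketch below. It assumes the label matrix holds one row per sample and one 0/1 column per label, and it treats a label that never occurs as having similarity 0 to every label, which is a choice made here rather than something the patent states.

```python
import math
from typing import List


def cosine_label_similarity(Y: List[List[int]]) -> List[List[float]]:
    """S[i][j] = cosine of the angle between the occurrence vectors of labels i and j."""
    columns = [list(column) for column in zip(*Y)]  # one occurrence vector per label
    norms = [math.sqrt(sum(v * v for v in column)) for column in columns]
    q = len(columns)
    S = [[0.0] * q for _ in range(q)]
    for i in range(q):
        for j in range(q):
            if norms[i] == 0 or norms[j] == 0:
                continue  # assumption: unused labels are similar to nothing
            dot = sum(a * b for a, b in zip(columns[i], columns[j]))
            S[i][j] = dot / (norms[i] * norms[j])
    return S


# Toy label matrix: 4 samples (rows), 3 labels (columns).
Y = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 1],
]
S = cosine_label_similarity(Y)
```

Because the vectors are binary, the dot product is the co-occurrence count and each squared norm is the label's occurrence count, so the matrix can also be derived directly from those counts.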
S5, reconstructing the label information matrix of the samples through the label-label similarity matrix:
$Y = g(X)$, where $g(X)$ is:
Figure BDA0001468824520000084
S6, calculating the label information occurrence probability of the target sample according to the second label vector:
calculating the occurrence probability of each label information in the target sample by using the lazy multi-label classification algorithm according to the second label vector. Specifically, the probability of each label information appearing in the target sample lies in the range [0, 1].
The classification function of the lazy multi-label classification algorithm is as follows:
Figure BDA0001468824520000085
The number of times each label appears among the k nearest neighbor (kNN) samples of each sample is counted first, and the labels that may appear in an unlabeled sample are then estimated by maximizing the posterior probability. For a sample space X containing m samples, the label space is denoted as L. The event
Figure BDA0001468824520000091
represents that the i-th label information takes the value b, where b is 0 or 1; b = 0 indicates that the label does not appear and b = 1 indicates that the label appears. The event
Figure BDA0001468824520000092
indicates that exactly j of the k neighbors have label $l_i$. Whether label $l_i$ appears in the sample x is determined by comparing the magnitudes of the values of the classification function.
Here,
Figure BDA0001468824520000093
is calculated by the formula
Figure BDA0001468824520000094
where $y_j(l_i)$ indicates whether sample j has label $l_i$, and s ∈ (0, 1). Therefore,
Figure BDA0001468824520000095
Calculating the conditional probability
Figure BDA0001468824520000096
requires traversing the training sample set and counting, for each sample, how many of its k nearest neighbors contain label $l_i$. The array C[j] counts the samples for which label $l_i$ takes the value 1 and exactly j of the k nearest neighbors contain label $l_i$; the array C'[j] counts the samples for which label $l_i$ takes the value 0 and exactly j of the k nearest neighbors contain label $l_i$. p represents the number of labels. The conditional probabilities are calculated by the following equations:
Figure BDA0001468824520000097
Figure BDA0001468824520000098
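To make the counting and the maximum a posteriori decision concrete, here is a compact sketch of ML-kNN as it is commonly described in the literature: smoothed priors, the two count arrays, and a posterior per label. The patent's own equations are available only as images, so the exact smoothing terms, the squared Euclidean distance, and all names below are assumptions.

```python
from typing import List, Sequence


def nearest(X: Sequence[Sequence[float]], x: Sequence[float], k: int, skip: int = -1) -> List[int]:
    """Indices of the k samples in X closest to x (squared Euclidean distance)."""
    def dist(i: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(X[i], x))
    candidates = [i for i in range(len(X)) if i != skip]
    return sorted(candidates, key=dist)[:k]


class MLkNN:
    def __init__(self, k: int = 10, s: float = 1.0) -> None:
        self.k, self.s = k, s  # neighbor count and smoothing factor (assumed s > 0)

    def fit(self, X: List[List[float]], Y: List[List[int]]) -> "MLkNN":
        self.X, self.Y = X, Y
        m, q, k, s = len(X), len(Y[0]), self.k, self.s
        # Smoothed prior probability that label l is present.
        self.prior = [(s + sum(y[l] for y in Y)) / (2 * s + m) for l in range(q)]
        # c1[l][j]: samples with label l that have exactly j neighbors with label l.
        # c0[l][j]: the same count for samples without label l.
        c1 = [[0] * (k + 1) for _ in range(q)]
        c0 = [[0] * (k + 1) for _ in range(q)]
        for i in range(m):
            nn = nearest(X, X[i], k, skip=i)
            for l in range(q):
                j = sum(Y[n][l] for n in nn)
                if Y[i][l] == 1:
                    c1[l][j] += 1
                else:
                    c0[l][j] += 1
        # Smoothed likelihood of seeing j neighbors with l, given l present / absent.
        self.like1 = [[(s + c) / (s * (k + 1) + sum(row)) for c in row] for row in c1]
        self.like0 = [[(s + c) / (s * (k + 1) + sum(row)) for c in row] for row in c0]
        return self

    def predict_proba(self, x: Sequence[float]) -> List[float]:
        """Posterior probability of each label for the target sample x."""
        nn = nearest(self.X, x, self.k)
        probs = []
        for l, prior in enumerate(self.prior):
            j = sum(self.Y[n][l] for n in nn)
            present = prior * self.like1[l][j]
            absent = (1 - prior) * self.like0[l][j]
            probs.append(present / (present + absent))
        return probs


# Toy data: two well-separated clusters, each with its own label.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
Y = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
model = MLkNN(k=2, s=1.0).fit(X, Y)
probs = model.predict_proba([0.5, 0.5])
```

In the method described here, the label matrix passed to `fit` would be the reconstructed (second) label vectors rather than the original ones.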
S7, sorting the label information occurrence probabilities of the target sample in descending order, and selecting a preset number of label information as the recommended label information of the target sample.
The labels are arranged from largest to smallest probability of occurrence in the target sample, and the top N label information are selected as the recommended labels of the target sample. When a patient is seen, the N possible complication labels with the highest probability values are recommended; the specific value of N can be preset according to the disease.
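The ranking step can be illustrated with a few lines of code. The function name and the probability values are made up for the example.

```python
from typing import Dict, List


def recommend_top_n(probabilities: Dict[str, float], n: int) -> List[str]:
    """Return the n labels with the highest probability, highest first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ranked[:n]]


# Hypothetical per-label probabilities for one target patient.
probabilities = {
    "hyperlipidemia": 0.35,
    "diabetic nephropathy": 0.72,
    "osteoporosis": 0.10,
    "coronary heart disease": 0.48,
}
recommended = recommend_top_n(probabilities, n=2)
```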
In a specific embodiment, the patients are the samples of the multi-label learning algorithm, the features corresponding to each patient's diseases form the feature space, and the diseases suffered by all patients serve as the label information. The number of co-occurrences of every two label information is obtained by analyzing the label information; the more often two different labels appear in the same sample, the greater their correlation. The similarity between every two labels is calculated with the cosine similarity, and the label vector of each sample is recalculated according to the label similarities: if the vector value corresponding to a label is equal to or greater than 0.5, the value for that label is reset to 1; otherwise it remains 0. Reconstructing the label vectors in this way uncovers potential labels of the training samples and associates their labels with each other. The label matrix in the label space is reconstructed with the label similarity matrix to update the label vector of each sample, and the multi-label learning algorithm finally predicts potential labels for the target sample.
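The reconstruction rule can be sketched as below, with an important caveat: the function g(X) appears in the patent only as an image, so how the "vector value" of a missing label is derived from the similarity matrix is an assumption. This sketch takes it to be the largest similarity between that label and any label the sample already has; only the 0.5 threshold comes from the text.

```python
from typing import List


def reconstruct_labels(y: List[int], S: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Second label vector: keep existing labels and switch on similar ones.

    Assumption: the score of an absent label j is max(S[i][j]) over the labels i
    that are present in y; the patent's actual g(X) may differ.
    """
    present = [i for i, value in enumerate(y) if value == 1]
    second = []
    for j, value in enumerate(y):
        if value == 1:
            second.append(1)
            continue
        score = max((S[i][j] for i in present), default=0.0)
        second.append(1 if score >= threshold else 0)
    return second


# Hypothetical 3-label similarity matrix.
S = [
    [1.0, 0.8, 0.4],
    [0.8, 1.0, 0.5],
    [0.4, 0.5, 1.0],
]
second_a = reconstruct_labels([1, 0, 0], S)
second_b = reconstruct_labels([0, 1, 0], S)
```

A sample with no labels keeps an all-zero vector under this reading, since there is nothing to propagate from.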
In a specific example, 9 common diseases (type 2 diabetes, hyperlipidemia, fatty liver, hyperkalemia, hypoproteinemia, diabetic nephropathy, cerebral infarction, coronary heart disease, and osteoporosis) were selected. Patients from a hospital with one or more of these diseases were selected as experimental samples; their test reports and basic information were collected as sample features, yielding 459 patient samples, each with 5 items of basic patient information and 278 test items. Sex, age, body temperature, height and weight were extracted from each patient sample as the basic attributes of the patient. The value of sex is binary, for example 0 for male and 1 for female. Age, body temperature, height and weight are numerical and retain their actual values. For a test item with a numerical value, the value is set to 1 if the result is within the normal reference range and to the actual result if it is outside the normal range. For a test item whose result is a textual description, the distinct descriptions of the item are collected and arranged in an array; the value is set to 0 if the description equals the reference value, and otherwise to the position of the description in the array.
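The encoding rules for test items might look like the following sketch. The reference range, the vocabulary, the inclusive range boundaries, and the 1-based position (chosen so that it never collides with the value 0 reserved for the reference description) are all assumptions made for illustration.

```python
from typing import Sequence


def encode_numeric(value: float, low: float, high: float) -> float:
    """1 if the result lies within [low, high], otherwise the actual result."""
    return 1.0 if low <= value <= high else float(value)


def encode_text(value: str, reference: str, vocabulary: Sequence[str]) -> int:
    """0 if the description equals the reference, otherwise its 1-based position."""
    if value == reference:
        return 0
    return list(vocabulary).index(value) + 1


# Hypothetical test items and reference values.
glucose_normal = encode_numeric(5.0, low=3.9, high=6.1)
glucose_high = encode_numeric(7.8, low=3.9, high=6.1)
descriptions = ["negative", "trace", "positive"]
protein_ref = encode_text("negative", "negative", descriptions)
protein_pos = encode_text("positive", "negative", descriptions)
```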
The statistics of the features and of the diseases are listed in Table 1 and Table 2, respectively. Overall, 60% of patients are male and 40% are female. The average age, body temperature, height and weight of the patients were 64.64, 36.5, 167.81 and 67.75, respectively. According to the disease statistics, type 2 diabetes and cerebral infarction are the two most common of these 9 diseases. In fact, these diseases are also the most common diseases in the elderly population. 70% of the patients were randomly selected as training samples and the remaining 30% as test samples.
TABLE 1
Figure BDA0001468824520000111
TABLE 2
Figure BDA0001468824520000112
The evaluation criteria for the multi-label classification problem fall into two categories: (a) ranking-based evaluation, whose goal is to rank the relevant labels ahead of the irrelevant ones; and (b) binary prediction evaluation, whose goal is to make a strict yes/no classification for each target sample. Hamming Loss, accuracy, recall, and F1-score were used to evaluate the effect of the present invention.
Hamming Loss evaluates the average difference between the recommended labels of a test sample and its actual labels:
Figure BDA0001468824520000113
where $h(x_i)$ is the recommended label set of test sample $x_i$; p is the number of test samples; $Y_i$ is the actual label set of test sample $x_i$; and Δ is the symmetric difference.
The accuracy (precision) is defined as the ratio of the number of hit labels in the label recommendation list to the total length of the label recommendation list. That is, it represents the probability that a recommended label is a true label of the test sample. The accuracy is formulated as follows:
Figure BDA0001468824520000121
The recall is defined as the ratio of the number of hit labels to the number of true labels of the test sample. In other words, it represents the probability that a true label is recommended. The recall is formulated as follows:
Figure BDA0001468824520000122
F1-score considers both accuracy and recall, and its formula is as follows:
Figure BDA0001468824520000123
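The four measures can be sketched with set operations, as below. Because the patent's formulas are images, two details are assumptions: the Hamming Loss is normalized by the number of labels (the usual convention), and accuracy, recall and F1-score are pooled over all test samples rather than averaged per sample.

```python
from typing import List, Set, Tuple


def hamming_loss(recommended: List[Set[str]], actual: List[Set[str]], num_labels: int) -> float:
    """Mean size of the symmetric difference, normalized by the number of labels."""
    differing = sum(len(h ^ y) for h, y in zip(recommended, actual))
    return differing / (len(actual) * num_labels)


def precision_recall_f1(recommended: List[Set[str]], actual: List[Set[str]]) -> Tuple[float, float, float]:
    """Hits over recommended labels, hits over true labels, and their harmonic mean."""
    hits = sum(len(h & y) for h, y in zip(recommended, actual))
    n_recommended = sum(len(h) for h in recommended)
    n_actual = sum(len(y) for y in actual)
    precision = hits / n_recommended if n_recommended else 0.0
    recall = hits / n_actual if n_actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example with a 4-label space {a, b, c, d} and two test samples.
recommended = [{"a", "b"}, {"b", "c"}]
actual = [{"a"}, {"b", "c", "d"}]
loss = hamming_loss(recommended, actual, num_labels=4)
precision, recall, f1 = precision_recall_f1(recommended, actual)
```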
The effect of the method is analyzed by comparing the invention with two classical multi-label classification methods: the lazy multi-label classification method (ML-kNN) and BR-kNN, a multi-label classification method that combines the binary relevance (BR) method with the kNN method. Table 3 compares the invention with these classical algorithms. The number of nearest neighbors is set to 10 for all methods, a value that gives good stability and high accuracy, and the smoothing factor of both the invention and ML-kNN is set to 1. In all methods, the number of recommended labels is 2. Experiments were performed using 10-fold cross-validation, and the final results are the averages of the experimental results. As Table 3 shows, the invention achieves an accuracy of 0.236, a recall of 0.3793 and an F1-score of 0.2915, all better than the other two methods. Compared with the second-best method, ML-kNN, the accuracy, recall and F1-score of the invention are improved by 11%, 13% and 12%, respectively. The Hamming Loss of the invention is 0.2117, which is also better than that of the other two methods. Thus, the invention outperforms the other two methods.
TABLE 3
Figure BDA0001468824520000131
↓: the smaller the value, the better the effect; ↑: the larger the value, the better the effect
The present invention also provides a diagnostic information processing and analyzing system, the system comprising:
a module for establishing a feature space and a label space of multi-label information according to the plurality of sample features and label information of the plurality of samples;
a module for establishing a feature vector and a first label vector for each sample based on each sample feature and each label information;
a module for calculating the number of occurrences of each label information;
a module for counting the number of times that every two label information simultaneously appear in one sample;
a module for calculating the similarity of every two label information and establishing a label-label similarity matrix;
a module for reconstructing the label information matrix of the samples through the label-label similarity matrix, reconstructing the first label vector of each sample to obtain a second label vector, and calculating the occurrence probability of the label information of a target sample according to the second label vector;
a module for sorting the occurrence probabilities of the target sample's label information in descending order; and
a module for selecting a preset number of label information as the recommended label information of the target sample.
Preferably, the system further comprises: the module for calculating the similarity of every two label information and establishing the label-label similarity matrix is a module for calculating cosine similarity.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention.

Claims (9)

1. A method of diagnostic information processing analysis, comprising the steps of:
establishing a feature space and a label space of multi-label information through a plurality of sample features and label information of a plurality of samples, and establishing a feature vector and a first label vector of each sample according to each sample feature and each label information; the sample characteristics are the test report and basic information of the disease of the patient;
calculating the occurrence frequency of each label information; calculating the times that every two label information simultaneously appear in one sample; calculating the similarity of every two label information; establishing a label-label similarity matrix;
reconstructing a first label vector of the sample through the label-label similarity matrix to obtain a second label vector, and calculating the label information occurrence probability of the target sample according to the second label vector;
sorting the probability of the target sample label information in a descending order, and selecting a preset number of label information as the recommended label information of the target sample;
the establishing of the feature space and the label space of the multi-label information through the plurality of sample features and the label information of the plurality of samples further comprises:
presetting $F = \{f_1, f_2, \ldots, f_b\}$ as the b-dimensional feature space of the multi-label information, and presetting $L = \{l_1, l_2, \ldots, l_q\}$ as the q-dimensional label space of the multi-label information;
presetting $T = \{(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\}$ as the multi-label information set, wherein
[formula image FDA0003259557700000011]
is the b-dimensional feature vector of a sample, and
[formula image FDA0003259557700000012]
is the label vector corresponding to sample $X_i$; if the feature vector $X_i$ has the label $l_j$, then the label vector satisfies
[formula image FDA0003259557700000014]
and otherwise
[formula image FDA0003259557700000013].
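As an illustration of the label-vector construction recited in claim 1, the following minimal Python sketch builds one binary label vector per sample. The function name `build_label_vectors`, the toy label names, and the 0/1 encoding are assumptions made for illustration only; the exact vector definitions are given in the formula images of the claim, which are not reproduced in this text.

```python
from typing import List, Sequence, Set


def build_label_vectors(
    sample_labels: Sequence[Set[str]], label_space: Sequence[str]
) -> List[List[int]]:
    """Return one 0/1 vector per sample.

    Entry j of a vector is 1 if the sample carries label_space[j], else 0
    (assumed encoding, not taken from the claim's formula images).
    """
    return [
        [1 if label in labels else 0 for label in label_space]
        for labels in sample_labels
    ]


# Hypothetical toy data: three samples and a three-label space.
LABEL_SPACE = ["hypertension", "diabetes", "anemia"]
SAMPLE_LABELS = [
    {"hypertension", "diabetes"},
    {"diabetes"},
    {"anemia", "hypertension"},
]

Y = build_label_vectors(SAMPLE_LABELS, LABEL_SPACE)
for labels, vector in zip(SAMPLE_LABELS, Y):
    print(sorted(labels), vector)
```

Each row of `Y` plays the role of a first label vector; the feature vectors would be built separately from the test report and basic patient information.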
2. The method of claim 1, wherein said calculating the number of occurrences of each label information comprises: determining, from the occurrence of the label information in each sample, the samples corresponding to the label information, and denoting the value as $r_{ij}$; if the feature vector $X_k$ has the label $l_i$, then $r_{ij} = 1$; otherwise $r_{ij} = 0$.
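The counting steps (occurrences of each label, and co-occurrences of every two labels within one sample) might be sketched as follows. This is a minimal illustration that assumes binary label vectors as input; the function name `count_labels` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_labels(label_vectors):
    """Count label occurrences and pairwise co-occurrences.

    label_vectors: iterable of 0/1 lists, one per sample (assumed encoding).
    Returns (occurrences, co_occurrences), where co_occurrences is keyed by
    index pairs (i, j) with i < j.
    """
    occurrences = Counter()
    co_occurrences = Counter()
    for vector in label_vectors:
        # Indices of the labels present in this sample, in ascending order.
        present = [j for j, flag in enumerate(vector) if flag == 1]
        occurrences.update(present)
        co_occurrences.update(combinations(present, 2))
    return occurrences, co_occurrences


# Hypothetical label vectors for three samples over three labels.
label_vectors = [[1, 1, 0], [0, 1, 0], [1, 0, 1]]
occ, co = count_labels(label_vectors)
print(dict(occ))
print(dict(co))
```

Using `Counter` means a pair that never co-occurs simply reads as 0 rather than raising a `KeyError`.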
3. The method of claim 2, wherein the calculating the similarity between every two tags and establishing the tag-to-tag similarity matrix further comprises:
and calculating the similarity of every two labels by using a cosine similarity calculation method.
4. The method of claim 3, wherein the calculating the similarity of every two labels by using the cosine similarity calculation method comprises:
calculating the similarity of every two labels by the formula
[formula image FDA0003259557700000021]
wherein $P_{ij}$ is the set of samples that contain both label $l_i$ and label $l_j$;
[formula image FDA0003259557700000022]
is the number of times label $l_i$ and label $l_j$ appear together in sample $X_k$; and
[formula image FDA0003259557700000023]
and
[formula image FDA0003259557700000024]
are, respectively, the total number of occurrences of label $l_i$ and the total number of occurrences of label $l_j$.
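The formula image of claim 4 is not reproduced in this text. As a hedged reading only, assume that $r_{ik}$ is a binary indicator equal to 1 when sample $X_k$ carries label $l_i$ (this indexing is chosen here for illustration and may differ from the original). A standard cosine similarity consistent with the quantities named in the claim would then be:

```latex
\mathrm{sim}(l_i, l_j)
  = \frac{\sum_{k \in P_{ij}} r_{ik}\, r_{jk}}
         {\sqrt{\sum_{k=1}^{n} r_{ik}^{2}}\;\sqrt{\sum_{k=1}^{n} r_{jk}^{2}}}
```

Because each $r_{ik}$ is 0 or 1 under this assumption, the numerator equals the number of samples in which both labels appear, and each sum under a square root equals the total number of occurrences of the corresponding label.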
5. The method of claim 4, wherein the calculating the similarity between every two label information and establishing the label-label similarity matrix comprises:
the label-label similarity matrix is
[formula image FDA0003259557700000025]
wherein the element $S_{ij} = \mathrm{sim}(l_i, l_j)$ of the matrix represents the similarity of label $l_i$ and label $l_j$.
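A label-label similarity matrix of this kind could be computed as in the sketch below. It assumes a binary indicator matrix `R` of shape (q, n), where `R[i, k] = 1` if sample k carries label i, and uses ordinary cosine similarity between the rows; the function name and the toy matrix are hypothetical, and the exact matrix layout of the claim is in a formula image not reproduced here.

```python
import numpy as np


def label_similarity_matrix(R: np.ndarray) -> np.ndarray:
    """Cosine similarity between label indicator rows.

    R has shape (q, n); R[i, k] = 1 if sample k carries label i (assumed).
    Returns a symmetric (q, q) matrix S with S[i, j] = sim(l_i, l_j).
    """
    R = np.asarray(R, dtype=float)
    co_occurrence = R @ R.T                   # (q, q) co-occurrence counts
    norms = np.sqrt(np.diag(co_occurrence))   # sqrt of per-label occurrence counts
    denom = np.outer(norms, norms)
    S = np.zeros_like(co_occurrence)
    # Labels that never occur keep similarity 0 instead of dividing by zero.
    np.divide(co_occurrence, denom, out=S, where=denom > 0)
    return S


# Hypothetical indicator matrix: 3 labels (rows) over 3 samples (columns).
R = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 0, 1]])
S = label_similarity_matrix(R)
print(np.round(S, 3))
```

Computing `R @ R.T` once yields both the co-occurrence counts (off-diagonal) and the occurrence counts (diagonal), so no explicit pairwise loop is needed.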
6. The method of claim 5, wherein reconstructing the label information matrix of the sample through the label-label similarity matrix is: $Y = g(X)$,
wherein $g(X)$ is:
[formula image FDA0003259557700000031]
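The definition of $g(X)$ is contained in a formula image that is not reproduced in this text, so the following is only one plausible reading, not the claimed function: each first label vector is multiplied by the similarity matrix and rescaled by its row maximum, so that labels similar to the observed ones receive non-zero weight. The function name, the rescaling choice, and the toy values are all assumptions.

```python
import numpy as np


def reconstruct_label_vectors(Y, S):
    """Spread each 0/1 label vector over similar labels (assumed scheme).

    Y: (n, q) first label vectors; S: (q, q) label-label similarity matrix.
    Returns (n, q) second label vectors scaled so each non-empty row peaks at 1.
    """
    Y = np.asarray(Y, dtype=float)
    S = np.asarray(S, dtype=float)
    scores = Y @ S
    row_max = scores.max(axis=1, keepdims=True)
    out = np.zeros_like(scores)
    # Rows without any label stay all-zero instead of dividing by zero.
    np.divide(scores, row_max, out=out, where=row_max > 0)
    return out


# Hypothetical inputs: two samples, three labels.
S_demo = np.array([[1.0, 0.5, 0.0],
                   [0.5, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
Y_demo = np.array([[1, 0, 0],
                   [0, 0, 0]])
Y2 = reconstruct_label_vectors(Y_demo, S_demo)
print(Y2)
```

In this toy case the first sample, which carries only the first label, also receives a weight of 0.5 for the second label because the two labels are similar.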
7. The method of claim 1, wherein the calculating the probability of occurrence of the label information for the target sample according to the second label vector comprises:
and calculating the occurrence probability of each label information in the target sample by using a lazy multi-label classification algorithm according to the second label vector.
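The claim does not name a specific lazy multi-label classifier. The sketch below is a simplified k-nearest-neighbour stand-in, not the claimed algorithm: it averages the second label vectors of the k training samples closest to the target and then ranks the labels in descending order. A full lazy multi-label learner such as ML-kNN would instead estimate posterior probabilities from neighbour label counts. The function name, the Euclidean distance, and the parameters `k` and `top_m` are assumptions.

```python
import numpy as np


def recommend_labels(X_train, Y2_train, x_target, k=3, top_m=2):
    """Rank labels for a target sample from its k nearest neighbours.

    X_train: (n, b) feature vectors; Y2_train: (n, q) second label vectors.
    Returns the top_m (label_index, score) pairs, highest score first.
    """
    X_train = np.asarray(X_train, dtype=float)
    Y2_train = np.asarray(Y2_train, dtype=float)
    target = np.asarray(x_target, dtype=float)
    distances = np.linalg.norm(X_train - target, axis=1)
    nearest = np.argsort(distances, kind="stable")[:k]
    # Mean neighbour weight per label, used here as an occurrence score.
    scores = Y2_train[nearest].mean(axis=0)
    ranking = np.argsort(-scores, kind="stable")[:top_m]
    return [(int(j), float(scores[j])) for j in ranking]


# Hypothetical training data: four samples, two features, three labels.
X_train = [[0, 0], [0, 1], [5, 5], [6, 5]]
Y2_train = [[1.0, 0.5, 0.0],
            [1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0],
            [0.0, 1.0, 1.0]]
result = recommend_labels(X_train, Y2_train, [0, 0.2], k=2, top_m=2)
print(result)
```

The `top_m` argument corresponds to the preset number of recommended labels; the stable sort keeps ties in label-index order so that results are reproducible.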
8. A diagnostic information processing and analysis system, the system comprising:
a module for establishing a feature space and a label space of multi-label information according to the plurality of sample features and label information of the plurality of samples; the sample characteristics are the test report and basic information of the disease of the patient;
means for establishing a feature vector and a first label vector for the sample based on each sample feature and each label information;
a module for calculating the number of occurrences of each of the tag information;
a module for counting the number of times that every two tag information simultaneously appear in one sample;
a module for calculating the similarity of every two label information and establishing a label-label similarity matrix;
a module for reconstructing the label information matrix of the sample through the label-label similarity matrix, reconstructing the first label vector of the sample to obtain a second label vector, and calculating the occurrence probability of the label information of a target sample according to the second label vector;
means for sorting the probabilities of occurrence of the target sample label information in descending order;
a module for selecting a preset number of label information as the recommended label information of the target sample;
the establishing of the feature space and the label space of the multi-label information according to the plurality of sample features and the label information of the plurality of samples further comprises:
presetting $F = \{f_1, f_2, \ldots, f_b\}$ as the b-dimensional feature space of the multi-label information, and presetting $L = \{l_1, l_2, \ldots, l_q\}$ as the q-dimensional label space of the multi-label information;
presetting $T = \{(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\}$ as the multi-label information set, wherein
[formula image FDA0003259557700000041]
is the b-dimensional feature vector of a sample, and
[formula image FDA0003259557700000042]
is the label vector corresponding to sample $X_i$; if the feature vector $X_i$ has the label $l_j$, then the label vector satisfies
[formula image FDA0003259557700000043]
and otherwise
[formula image FDA0003259557700000044].
9. The system of claim 8, wherein the module for calculating the similarity of every two label information and establishing the label-label similarity matrix is a module for calculating cosine similarity.
CN201711128161.9A 2017-11-15 2017-11-15 Method and system for diagnostic information processing analysis Active CN107845424B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711128161.9A CN107845424B (en) 2017-11-15 2017-11-15 Method and system for diagnostic information processing analysis


Publications (2)

Publication Number Publication Date
CN107845424A CN107845424A (en) 2018-03-27
CN107845424B true CN107845424B (en) 2021-11-16

Family

ID=61678966

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711128161.9A Active CN107845424B (en) 2017-11-15 2017-11-15 Method and system for diagnostic information processing analysis

Country Status (1)

Country Link
CN (1) CN107845424B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108962383A (en) * 2018-06-05 2018-12-07 南京麦睿智能科技有限公司 Hospital's intelligence hospital guide's method and apparatus
CN109697629B (en) * 2018-11-15 2023-02-24 平安科技(深圳)有限公司 Product data pushing method and device, storage medium and computer equipment
CN113096795B (en) * 2019-12-23 2023-10-03 四川医枢科技有限责任公司 Multi-source data-aided clinical decision support system and method
CN111968740B (en) * 2020-09-03 2021-04-27 卫宁健康科技集团股份有限公司 Diagnostic label recommendation method and device, storage medium and electronic equipment
CN112117009A (en) * 2020-09-25 2020-12-22 北京百度网讯科技有限公司 Method, device, electronic equipment and medium for constructing label prediction model
CN112201354A (en) * 2020-10-20 2021-01-08 吾征智能技术(北京)有限公司 Disease information matching system based on blood lead value

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102298605A (en) * 2011-06-01 2011-12-28 清华大学 Image automatic annotation method and device based on digraph unequal probability random search
CN102799680A (en) * 2012-07-24 2012-11-28 华北电力大学(保定) XML (extensible markup language) document spectrum clustering method based on affinity propagation
KR101450784B1 (en) * 2013-07-02 2014-10-23 아주대학교산학협력단 Systematic identification method of novel drug indications using electronic medical records in network frame method
CN104615894A (en) * 2015-02-13 2015-05-13 上海中医药大学 Traditional Chinese medicine diagnosis method and system based on k-nearest neighbor labeled specific weight characteristics
CN105243300A (en) * 2015-08-31 2016-01-13 合肥工业大学 Approximation spectral clustering algorithm based method for predicting cancer metastasis and recurrence
CN105320967A (en) * 2015-11-04 2016-02-10 中科院成都信息技术股份有限公司 Multi-label AdaBoost integration method based on label correlation
CN105808931A (en) * 2016-03-03 2016-07-27 北京大学深圳研究生院 Knowledge graph based acupuncture and moxibustion decision support method and apparatus
CN106202916A (en) * 2016-07-04 2016-12-07 扬州大学 The layering multiple manifold setting up a kind of Alzheimer analyzes model
CN107133293A (en) * 2017-04-25 2017-09-05 中国科学院计算技术研究所 A kind of ML kNN improved methods and system classified suitable for multi-tag

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090105092A1 (en) * 2006-11-28 2009-04-23 The Trustees Of Columbia University In The City Of New York Viral database methods

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102298605A (en) * 2011-06-01 2011-12-28 清华大学 Image automatic annotation method and device based on digraph unequal probability random search
CN102298605B (en) * 2011-06-01 2013-04-17 清华大学 Image automatic annotation method and device based on digraph unequal probability random search
CN102799680A (en) * 2012-07-24 2012-11-28 华北电力大学(保定) XML (extensible markup language) document spectrum clustering method based on affinity propagation
KR101450784B1 (en) * 2013-07-02 2014-10-23 아주대학교산학협력단 Systematic identification method of novel drug indications using electronic medical records in network frame method
CN104615894A (en) * 2015-02-13 2015-05-13 上海中医药大学 Traditional Chinese medicine diagnosis method and system based on k-nearest neighbor labeled specific weight characteristics
CN105243300A (en) * 2015-08-31 2016-01-13 合肥工业大学 Approximation spectral clustering algorithm based method for predicting cancer metastasis and recurrence
CN105320967A (en) * 2015-11-04 2016-02-10 中科院成都信息技术股份有限公司 Multi-label AdaBoost integration method based on label correlation
CN105808931A (en) * 2016-03-03 2016-07-27 北京大学深圳研究生院 Knowledge graph based acupuncture and moxibustion decision support method and apparatus
CN106202916A (en) * 2016-07-04 2016-12-07 扬州大学 The layering multiple manifold setting up a kind of Alzheimer analyzes model
CN107133293A (en) * 2017-04-25 2017-09-05 中国科学院计算技术研究所 A kind of ML kNN improved methods and system classified suitable for multi-tag

Also Published As

Publication number Publication date
CN107845424A (en) 2018-03-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant