CN109935337B - Medical record searching method and system based on similarity measurement - Google Patents


Info

Publication number: CN109935337B
Authority: CN (China)
Legal status: Active (as listed by Google Patents; an assumption, not a legal conclusion)
Application number: CN201910137294.5A
Other languages: Chinese (zh)
Other versions: CN109935337A (en)
Inventors: 朱培栋, 张振宇, 王平, 熊荫乔, 刘欣, 郭敏捷, 冯璐, 郑昱, 李勇
Current assignee: Changsha University (as listed; may be inaccurate)
Original assignee: Changsha University
Events: application CN201910137294.5A filed by Changsha University (priority application); publication of CN109935337A; application granted; publication of CN109935337B; status active.


Abstract

The invention discloses a medical record searching method and system based on similarity measurement. The method comprises: constructing medical record groups (pairs of medical records) from a query medical record set A to obtain a medical record group set C; generating from C a medical record group data set D with similarity labels; and constructing a machine learning model, training it on D, inputting a target medical record together with all medical records in the query medical record set A into the trained model to obtain the similarity metric values between the target medical record and each medical record in A, and outputting the N medical records with the highest similarity metric values. The method makes full use of both the information contained in the medical records and the theoretical knowledge related to them. It can improve the precision of medical record similarity measurement and the accuracy of medical record ranking, leaves room for further precision gains, and offers high precision, broad applicability, good robustness, and potential for sustained optimization.

Description

Medical record searching method and system based on similarity measurement
Technical Field
The invention relates to similar medical record search in the medical field, and in particular to a medical record searching method and system based on similarity measurement, which can find similar medical records among heterogeneous medical information and rank them by similarity.
Background
A medical record is a document that records, according to prescribed standards, a patient's clinical manifestations and the diagnosis and treatment received, and mainly comprises: the patient's basic information, medical history, examination information, medical orders, diagnosis information, treatment scheme, disease feedback, and the like. A medical record describes the complete course of the patient's illness during a visit and stores that condition as data. Research and analysis that take medical records as their object are therefore of great significance, and so is the medical record search of the present invention. For example, when doctors in hospitals below the county level cannot confidently assess the condition of the patient they are treating, they can use the invention to search for similar medical records and consult the treatment schemes that experts applied in those records to assist the diagnosis and treatment of the current patient. Research on medical record search therefore has great theoretical and practical significance.
Similar case finding is mainly based on similarity ranking of cases. Similarity ranking of medical records plays a fundamental role in understanding the types of medical records, identifying the relationships between them, and predicting the course of the conditions they describe; it is the prerequisite and basis for applying medical records. Similarity ranking means that, for a given medical record, every medical record in the medical record library is compared with it and the records are then sorted by similarity. The key task is to determine the similarity value between two medical records, i.e., the similarity measure of medical records: the closer two cases are, the larger their similarity measure; the further apart they are, the smaller it is.
Existing methods for measuring medical record similarity fall into two categories: machine learning models based on medical record data, and traditional theoretical models based on theoretical knowledge. The traditional theoretical model starts from medical domain knowledge and judges the relative similarity of medical records through pathological analysis. It is highly interpretable and measures similarity precisely among a small number of medical records, but it demands considerable expertise, is limited by domain knowledge, and its precision is hard to improve. The machine learning model starts from the medical record data: it analyzes and learns from a large amount of medical record data whose relationships have already been established, and thereby learns the similarity relationships among records, which requires similarity labels. In practice, medical record similarity measurement therefore faces two problems: the precision of the models is hard to improve, and similarity labels for medical records are hard to obtain. These problems limit the precision of medical record similarity measurement and, in turn, the accuracy of medical record similarity ranking. A similarity measurement method that can acquire medical record similarity labels automatically is therefore particularly important, both theoretically and in practice.
Chinese patent publication No. CN104572675B discloses a system and method for retrieving similar medical records, in which a pathology-based medical record similarity calculation method is designed to retrieve similar medical records from a medical record library. Because this scheme calculates similarity from a pathological standpoint, it ignores the data information in the medical records, and its similarity measure cannot optimize itself based on errors. Moreover, it performs similarity retrieval on individual medical records, which cannot fully reflect the patient's condition, so the similarity between medical records captures only part of the similarity between the patients' conditions. Yanghe et al. disclosed, in Southeast National Defense Medicine, a similar medical record retrieval system based on a medical big data platform, which mainly uses natural language processing to retrieve similar medical records from a medical record library on that platform; like the scheme of CN104572675B, it has shortcomings in the accuracy of similarity measurement and in how completely the medical records reflect the patient's condition. The accuracy and completeness of medical record queries therefore still leave room for further optimization.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to provide a method that is well suited to medical record similarity measurement and addresses practical problems such as the scarcity of similarity labels for medical records and the incompleteness of theoretical knowledge about medical record similarity. The method makes full use of the information in the medical records themselves and of the related theoretical knowledge, improving the precision of medical record similarity measurement and the accuracy of medical record ranking; because the similarity measure is based on machine learning, it also leaves room for further precision gains, giving high precision, broad applicability, good robustness, and potential for sustained optimization.
In order to solve the technical problems, the invention adopts the technical scheme that:
A medical record searching method based on similarity measurement comprises the following steps:
1) constructing medical record groups from the query medical record set A to obtain a medical record group set C;
2) assigning similarity labels to the medical record groups in C to obtain a medical record group data set D with similarity labels;
3) constructing a machine learning model and training it on the labeled data set D, the training establishing a mapping from a medical record group to the similarity of that group;
4) inputting the target medical record together with all medical records in the query medical record set A into the trained model to obtain the similarity metric values between the target medical record and each medical record in A, and outputting the N medical records with the highest similarity metric values.
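Step 4) amounts to scoring the target record against every record in A and keeping the N best. The following is a minimal Python sketch, assuming each record is a numeric feature vector and that a caller-supplied `score` function stands in for the trained model; `search_top_n` and `score` are hypothetical names, not taken from the patent.

```python
from typing import Callable, List, Sequence, Tuple

Record = Sequence[float]  # hypothetical: a medical record as a numeric feature vector


def search_top_n(target: Record,
                 query_set: Sequence[Record],
                 score: Callable[[Record, Record], float],
                 n: int) -> List[Tuple[int, float]]:
    """Return (index, score) for the n records in query_set most similar to target.

    `score` stands in for the trained similarity model of step 3).
    """
    scored = [(idx, score(target, rec)) for idx, rec in enumerate(query_set)]
    # Highest similarity first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]
```

With a toy score such as negative L1 distance, `search_top_n([1.0], [[0.0], [1.0], [0.4], [0.9]], lambda a, b: -sum(abs(x - y) for x, y in zip(a, b)), 2)` ranks index 1 first and index 3 second.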
Preferably, the detailed steps of step 1) include:
1.1) for all medical records in the query medical record set A, enumerating all pairwise combinations to obtain a medical record group set B, each element of which is a medical record group consisting of two medical records;
1.2) randomly selecting some of the medical record groups in B to obtain a medical record group selection set $C_0$; for B, computing the values of each of several specified similarity indexes, and sorting the groups in descending order of each index's values to obtain an ordered medical record group set $B_i$ for each index; for each ordered set $B_i$, selecting groups according to a probability distribution based on the similarity index values to generate a medical record group selection set $C_i$. The specified similarity indexes include at least two of Euclidean distance, cosine distance, Jaccard distance, and adjusted cosine distance;
1.3) taking the union of the selection set $C_0$ and all selection sets $C_i$ to obtain the medical record group set C.
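Steps 1.1) to 1.3) can be sketched as follows. This is a sketch under assumptions: records are referenced by their index in A, "all pairwise combinations" is read as unordered pairs (similarity treated as symmetric), and the helper names and fixed seed are illustrative only.

```python
import random
from itertools import combinations
from typing import List, Sequence, Set, Tuple

Pair = Tuple[int, int]  # a medical record group: indices of two records in set A


def build_pair_set(num_records: int) -> List[Pair]:
    """Set B: every unordered pair (i, j), i < j, of records in query set A."""
    return list(combinations(range(num_records), 2))


def random_selection(pairs: Sequence[Pair], k: int, seed: int = 0) -> Set[Pair]:
    """Selection set C0: k pairs drawn uniformly at random without replacement."""
    rng = random.Random(seed)
    return set(rng.sample(list(pairs), k))


def merge_selections(*selections: Set[Pair]) -> Set[Pair]:
    """Set C: union of C0 and every index-specific selection set Ci."""
    merged: Set[Pair] = set()
    for selection in selections:
        merged |= selection
    return merged
```

Representing groups as sorted index tuples makes the union in step 1.3) a plain set union, so a pair chosen by several indexes appears in C only once.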
Preferably, when the selection set $C_i$ is generated based on the probability distribution of the similarity index values, the probability that each medical record group in the ordered set $B_i$ is randomly selected is given by formula (1):

$$P(SM_i)=\frac{f(SM_i)}{\sum_{j=1}^{m} f(SM_j)} \qquad (1)$$

In formula (1), $P(SM_i)$ is the probability that the $i$-th medical record group in the ordered set $B_i$ is selected, $SM_i$ is the similarity index value of the $i$-th group, $f(SM_j)$ is the normalized similarity index value of the $j$-th group, and $m$ is the number of medical record groups in $B_i$. $f(SM_j)$ is given by formula (2):

$$f(SM_j)=\frac{SM_j-SM_m}{SM_1-SM_m} \qquad (2)$$

In formula (2), $SM_1$ is the similarity index value of the 1st medical record group in the ordered set $B_i$, $SM_m$ is the similarity index value of the $m$-th group, and $m$ is the number of medical record groups in $B_i$.
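A sketch of this probability-based selection, assuming formula (2) is the min-max normalization $f(SM_j)=(SM_j-SM_m)/(SM_1-SM_m)$ and formula (1) divides each normalized value by their sum. Under that form the group with the smallest index value gets probability zero. Drawing without replacement by repeated weighted draws, and the uniform fallback when all values are equal, are assumptions; the patent does not state how many groups are drawn.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def selection_probabilities(sm_desc: Sequence[float]) -> List[float]:
    """Formulas (1)-(2): sm_desc holds SM_1 >= SM_2 >= ... >= SM_m for ordered set B_i."""
    sm_1, sm_m = sm_desc[0], sm_desc[-1]
    if sm_1 == sm_m:
        # Formula (2) divides by zero here; fall back to a uniform distribution.
        return [1.0 / len(sm_desc)] * len(sm_desc)
    normalized = [(sm - sm_m) / (sm_1 - sm_m) for sm in sm_desc]  # formula (2)
    total = sum(normalized)
    return [f / total for f in normalized]  # formula (1)


def sample_by_probability(groups: Sequence[T], probs: Sequence[float],
                          k: int, seed: int = 0) -> List[T]:
    """Draw up to k distinct groups, one weighted draw at a time."""
    rng = random.Random(seed)
    pool = list(zip(groups, probs))
    chosen: List[T] = []
    while pool and len(chosen) < k:
        weights = [p for _, p in pool]
        if sum(weights) <= 0.0:
            break  # only zero-probability groups remain
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(idx)[0])
    return chosen
```

For index values `[4.0, 2.0, 0.0]` the probabilities are 2/3, 1/3, and 0, so at most the first two groups can ever be selected.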
Preferably, the detailed steps of step 2) include:
2.1) representing the medical record group set C as a matrix b in which each row represents a medical record, the first n columns represent the n features of the medical record, and the last column s is the diagnostic information of the medical record;
2.2) determining the weight of each feature of the medical records from matrix b;
2.3) calculating the similarity value $s_{ij}$ of each medical record group in C from the similarity between the features of its two medical records and the corresponding feature weights, thereby obtaining the medical record group data set D with similarity labels.
Preferably, the detailed steps of step 2.2) include: calculating the original weight of each feature of the medical records according to formula (4), and normalizing the original weights of all features to obtain the final weight of each feature;

[Formula (4): image not reproduced]

In formula (4), $y_i'$ is the original weight of the $i$-th feature of a medical record; $\mathbf{x}_i$ is the feature vector formed by the $i$-th column of all medical records in matrix b; $\mathbf{s}$ is the diagnostic information vector formed by the diagnostic information s of all medical records in matrix b; and $\sigma_i$ is the variance of the $i$-th column over all medical records in matrix b.
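Formula (4) is not reproduced in this text, so the sketch below substitutes a stand-in raw weight: the absolute Pearson correlation between a feature column and the diagnosis column. This is an assumption, not the patent's formula (which also involves $\sigma_i$); only the subsequent normalization of the weights to unit sum follows step 2.2). The diagnosis is assumed to be numerically encoded, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation; returns 0.0 if either column is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def feature_weights(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Rows are medical records; the last column is the encoded diagnosis s."""
    n_features = len(matrix[0]) - 1
    diagnosis = [row[-1] for row in matrix]
    # Stand-in raw weight y_i' (NOT formula (4)): |corr(feature column i, diagnosis)|.
    raw = [abs(pearson([row[i] for row in matrix], diagnosis)) for i in range(n_features)]
    total = sum(raw)
    if total == 0.0:
        return [1.0 / n_features] * n_features
    # Normalize so the final weights sum to 1, as step 2.2) describes.
    return [w / total for w in raw]
```

A feature that tracks the diagnosis exactly receives all the weight, while a constant feature receives none: `feature_weights([[1, 5, 1], [2, 5, 2], [3, 5, 3]])` gives `[1.0, 0.0]`.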
Preferably, the similarity value $s_{ij}$ of each medical record group in the medical record group set C in step 2.3) is calculated by formula (6):

[Formula (6): image not reproduced]

In formula (6), $s_{ij}$ is the similarity of the medical record group consisting of medical records $i$ and $j$, $n$ is the total number of features, $a_i^x$ is the value of the $x$-th feature of medical record $i$, $a_j^x$ is the value of the $x$-th feature of medical record $j$, $y_x$ is the normalized weight of the $x$-th feature, $a_{\max}^x$ is the maximum value of the $x$-th feature, and $a_{\min}^x$ is the minimum value of the $x$-th feature.
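Formula (6) is likewise not reproduced. Its listed symbols (normalized weights, feature values, per-feature maximum and minimum) suggest a weighted, range-normalized comparison, so the sketch below assumes a Gower-style form $s_{ij}=\sum_{x=1}^{n} y_x\left(1-\frac{|a_i^x-a_j^x|}{a_{\max}^x-a_{\min}^x}\right)$, which equals 1 for identical records when the weights sum to 1. This form, and treating a constant feature as full agreement, are assumptions.

```python
from typing import List, Sequence, Tuple


def feature_ranges(records: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature maximum and minimum over the records of set C."""
    columns = list(zip(*records))
    return [max(c) for c in columns], [min(c) for c in columns]


def pair_similarity(rec_i: Sequence[float], rec_j: Sequence[float],
                    weights: Sequence[float],
                    f_max: Sequence[float], f_min: Sequence[float]) -> float:
    """Assumed Gower-style form of formula (6): weighted, range-normalized agreement."""
    total = 0.0
    for x, weight in enumerate(weights):
        span = f_max[x] - f_min[x]
        # A constant feature cannot tell records apart, so count it as full agreement.
        diff = 0.0 if span == 0 else abs(rec_i[x] - rec_j[x]) / span
        total += weight * (1.0 - diff)
    return total
```

Each label lies in [0, 1] under this form, which suits it as a regression target for the scoring loss of step 3.1).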
Preferably, the detailed step of constructing the machine learning model in step 3) includes:
3.1) designing three loss functions: a scoring loss function, a ranking loss function, and a ranking probability loss function. The scoring loss function takes the data of two medical records as input, outputs a similarity score, and is expressed by the absolute difference between the predicted score and the label score. The ranking loss function takes three medical records as input, one query medical record and two medical records to be ranked, and outputs the similarity scores of the query record with each of the two ranked records. The ranking probability loss function also takes a query medical record and two medical records to be ranked as input and outputs the probability that one ranked record scores higher than the other; it is expressed by the difference between the probability derived from the predicted scores and the probability derived from the label scores of the query record with the two ranked records;
3.2) constructing a neural network model consisting of an input layer, a hidden layer, and an output layer: the input layer feeds all dimensions of the medical records into the network, the hidden layer is a fully connected network that processes the medical record features, and the output layer outputs the similarity metric value between the two medical records;
3.3) using the scoring loss function, the ranking loss function, and the ranking probability loss function in turn as the loss function of the neural network model, and selecting the one that performs best;
3.4) selecting the activation function of the neural network model according to the selected loss function: a linear activation function when the scoring loss function is selected, the tanh function when the ranking loss function is selected, and the sigmoid function when the ranking probability loss function is selected.
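The network of step 3.2) and the activation rule of step 3.4) can be sketched as an untrained forward pass in plain Python. Assumptions not stated in the patent: the two records of a group are concatenated at the input, the hidden layer uses ReLU, there is a single hidden layer, and weights are drawn from a small Gaussian; `PairScorer` and the loss-type keys are hypothetical names.

```python
import math
import random
from typing import Callable, Dict, Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Step 3.4): output activation chosen by the selected loss function.
OUTPUT_ACTIVATION: Dict[str, Callable[[float], float]] = {
    "scoring": lambda z: z,           # scoring loss -> linear
    "ranking": math.tanh,             # ranking loss -> tanh
    "ranking_probability": _sigmoid,  # ranking probability loss -> sigmoid
}


class PairScorer:
    """Input layer -> one fully connected hidden layer -> single output unit."""

    def __init__(self, n_features: int, n_hidden: int, loss_type: str, seed: int = 0):
        rng = random.Random(seed)
        n_in = 2 * n_features  # assumption: the two records are concatenated
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.activation = OUTPUT_ACTIVATION[loss_type]

    def __call__(self, rec_a: Sequence[float], rec_b: Sequence[float]) -> float:
        x = list(rec_a) + list(rec_b)
        # Hidden layer with ReLU (assumed; the patent does not name it).
        hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        z = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return self.activation(z)
```

The output ranges follow the losses: unbounded for scoring, (-1, 1) for ranking, and (0, 1) for the ranking probability loss. Training (gradient updates) is omitted here.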
Preferably, when the three loss functions are designed in step 3.1), the scoring loss function is given by formula (7), the ranking loss function by formula (8), and the ranking probability loss function by formula (9);
[Formula (7): image not reproduced]

In formula (7), $L(\theta)$ is the loss value of the loss function, $t$ is the number of medical record groups, $\hat{s}(a_k, b_k)$ is the model's predicted score for medical records $a_k$ and $b_k$, and $s(a_k, b_k)$ is the label score of medical records $a_k$ and $b_k$;
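Step 3.1) describes the scoring loss as the absolute difference between predicted and label scores over the $t$ medical record groups. The sketch assumes the per-group errors are averaged (mean absolute error); whether formula (7) sums or averages is not recoverable from the text.

```python
from typing import Sequence


def scoring_loss(pred: Sequence[float], label: Sequence[float]) -> float:
    """Assumed form of formula (7): mean absolute error over the t record groups."""
    if len(pred) != len(label) or not pred:
        raise ValueError("need equally many (non-zero) predicted and label scores")
    return sum(abs(p - s) for p, s in zip(pred, label)) / len(pred)
```

Averaging rather than summing only rescales the gradient by $1/t$, so the choice mainly affects the learning-rate setting.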
[Formula (8): image not reproduced]

In formula (8), $L(\theta)$ is the loss value of the ranking loss function, $t$ is the number of medical record groups, $\hat{s}(q_i, d_i^1)$ and $\hat{s}(q_i, d_i^2)$ are the neural network model's predicted scores of query medical record $q_i$ with medical records $d_i^1$ and $d_i^2$, $s(q_i, d_i^1)$ and $s(q_i, d_i^2)$ are the corresponding label scores, and sign is the sign function;
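Formula (8) is not reproduced, but its symbols (two predicted scores, two label scores, and the sign function) indicate a pairwise ranking loss. The sketch assumes a margin hinge: the sign of the label difference gives the desired order, and the loss penalizes predicted differences that disagree with it or agree by less than a margin. The hinge form, the `margin` parameter, skipping tied labels, and averaging over $t$ are all assumptions.

```python
from typing import Sequence


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def ranking_loss(pred_1: Sequence[float], pred_2: Sequence[float],
                 label_1: Sequence[float], label_2: Sequence[float],
                 margin: float = 0.1) -> float:
    """Assumed hinge form of formula (8); suffixes 1/2 refer to d_i^1 / d_i^2."""
    t = len(pred_1)
    if t == 0:
        raise ValueError("need at least one record group")
    total = 0.0
    for p1, p2, s1, s2 in zip(pred_1, pred_2, label_1, label_2):
        direction = sign(s1 - s2)  # +1: d_i^1 should score higher; -1: lower
        if direction == 0:
            continue  # tied labels carry no ordering information
        # Penalize a predicted difference that disagrees with the label order
        # or agrees by less than the margin.
        total += max(0.0, margin - direction * (p1 - p2))
    return total / t
```

A nonzero margin prevents the trivial solution in which the model gives every record the same score.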
[Formula (9): image not reproduced]

In formula (9), $L(\theta)$ is the loss value of the ranking probability loss function, $t$ is the number of medical record groups, $\bar{P}_i$ is the probability, under the label scores, that the similarity between medical records $q_i$ and $d_i^1$ is greater than that between $q_i$ and $d_i^2$, and $P_i$ is the same probability under the model's predicted scores; $\bar{P}_i$ and $P_i$ are given by formulas (10) and (11), respectively;

[Formula (10): image not reproduced]

[Formula (11): image not reproduced]
In formulas (10) and (11), $s(q_i, d_i^1)$ and $s(q_i, d_i^2)$ are the label scores of query medical record $q_i$ with medical records $d_i^1$ and $d_i^2$, and $\hat{s}(q_i, d_i^1)$ and $\hat{s}(q_i, d_i^2)$ are the corresponding scores predicted by the neural network model.
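Formulas (9) to (11) are not reproduced. As a stand-in, the sketch uses the well-known RankNet-style pairwise loss: the label probability $\bar{P}_i$ is 1, 0.5, or 0 depending on which label score is larger, the predicted probability $P_i$ is the logistic function of the predicted score difference, and the loss is their binary cross-entropy averaged over the $t$ groups. This swapped-in form is an assumption, not the patent's formulas; the clipping constant `eps` is also an assumption.

```python
import math
from typing import Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ranking_probability_loss(pred_1: Sequence[float], pred_2: Sequence[float],
                             label_1: Sequence[float], label_2: Sequence[float],
                             eps: float = 1e-12) -> float:
    """RankNet-style stand-in for formulas (9)-(11)."""
    t = len(pred_1)
    if t == 0:
        raise ValueError("need at least one record group")
    total = 0.0
    for p1, p2, s1, s2 in zip(pred_1, pred_2, label_1, label_2):
        # Assumed label probability P-bar_i: 1, 0.5 or 0 by which label score is larger.
        p_bar = 0.5 * (1 + ((s1 > s2) - (s1 < s2)))
        # Assumed predicted probability P_i; clipped so the logarithms stay finite.
        p = min(max(_sigmoid(p1 - p2), eps), 1.0 - eps)
        total -= p_bar * math.log(p) + (1.0 - p_bar) * math.log(1.0 - p)
    return total / t
```

With equal predictions and tied labels the loss is $\ln 2$; predicting the correct order lowers it and predicting the wrong order raises it.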
The invention also provides a medical record searching system based on similarity measurement, comprising a computer device that is programmed to execute the steps of the above medical record searching method based on similarity measurement, or whose storage medium stores a computer program programmed to execute that method.
The invention also provides a computer-readable storage medium storing a computer program programmed to execute the above medical record searching method based on similarity measurement.
Compared with the prior art, the invention has the following advantages:
1. Medical record search based on similarity measurement is in high demand and of great significance in practice. Medical resources are unevenly distributed: most medical and expert resources are concentrated in a small number of large hospitals, while most hospitals below the county level have limited medical resources and a relatively low overall level of physician expertise. Yet these hospitals serve the majority of patients, so most patients cannot receive high-level medical services. The invention can alleviate this problem to some extent. A medical record library contains a large amount of medical record data recording how experts diagnosed and treated patients, and this data is itself a medical resource. When a doctor at a hospital below the county level cannot confidently assess the condition of a patient awaiting diagnosis, the doctor can perform a preliminary examination, enter the patient's basic information, symptoms, medical history, examination information, and so on into the system to form a preliminary medical record, and input it as a whole into the medical record searching system. The invention then outputs similar medical records from the library based on the similarity between medical records, so that the doctor can analyze how experts diagnosed and treated similar patients and use this as a reference in diagnosing and treating the current patient. The invention thus shares expert knowledge in the form of electronic data, assists medical treatment, and better serves the medical field.
2. The invention is well suited to medical record similarity measurement and addresses practical problems such as the scarcity of similarity labels for medical records and the incompleteness of theoretical knowledge about medical record similarity. It makes full use of the information in the medical records themselves and of the related theoretical knowledge, improving the precision of medical record similarity measurement and the accuracy of medical record ranking. Its machine-learning-based similarity measure also leaves room for further precision gains, giving high precision, broad applicability, good robustness, and potential for sustained optimization.
3. Machine learning methods are sensitive to the distribution of the training data. Since the distribution of the medical records is unknown, the invention selects medical record groups using a multi-index probability distribution method, which to some extent avoids the distribution bias that selecting groups under a single index would introduce.
4. Starting from the actual state of medical record data, the invention combines the traditional theoretical model with the machine learning model: the traditional theoretical model produces weak labels, and the machine learning model learns medical record similarity from them. This exploits the strengths of each model and improves the accuracy of medical record similarity measurement.
5. Compared with the traditional theoretical model, applying machine learning makes the precision of medical record similarity measurement easier to improve: optimizing the data, tuning parameters, and improving the learning method are all optimization routes unavailable to traditional similarity measurement methods. The invention thus provides a basis for continuously optimizing medical record similarity measurement and increases the potential for improving its precision.
Drawings
FIG. 1 is a schematic diagram of the basic principle of the method of the embodiment of the present invention.
FIG. 2 is a schematic diagram of a process of generating a medical record group set C according to an embodiment of the present invention.
FIG. 3 is a schematic structural diagram of the scoring loss function in the method according to the embodiment of the present invention.
FIG. 4 is a schematic structural diagram of the ranking loss function in the method according to the embodiment of the present invention.
FIG. 5 is a schematic structural diagram of the ranking probability loss function in the method according to the embodiment of the present invention.
FIG. 6 is a schematic structural diagram of the network model in the method according to the embodiment of the present invention.
Detailed Description
As shown in FIG. 1, the medical record searching method based on similarity measurement in this embodiment comprises the following steps:
1) constructing medical record groups from the query medical record set A to obtain a medical record group set C;
2) assigning similarity labels to the medical record groups in C to obtain a medical record group data set D with similarity labels;
3) constructing a machine learning model and training it on the labeled data set D, the training establishing a mapping from a medical record group to the similarity of that group;
4) inputting the target medical record together with all medical records in the query medical record set A into the trained model to obtain the similarity metric values between the target medical record and each medical record in A, and outputting the N medical records with the highest similarity metric values.
In practical applications, a medical record is a document that records, according to prescribed standards, a patient's clinical manifestations and the diagnosis and treatment received, and mainly comprises: the patient's basic information, medical history, examination information, medical orders, diagnosis information, treatment scheme, disease feedback, and the like. Medical record data also comes in various formats: formatted key-value data, text, images, audio, and so on. The medical record data used in this embodiment is formatted key-value medical record data obtained by collating the original medical record data.
Machine learning methods are sensitive to the distribution of the training data. Because the distribution of the medical records is unknown, this embodiment constructs the medical record groups for the query medical record set A using a multi-index probability distribution method to obtain the medical record group set C.
In this embodiment, the detailed steps of step 1) include:
1.1) for all medical records in the query medical record set A, enumerating all pairwise combinations to obtain a medical record group set B, each element of which is a medical record group consisting of two medical records;
1.2) randomly selecting some of the medical record groups in B to obtain a medical record group selection set $C_0$; for B, computing the values of each of several specified similarity indexes, and sorting the groups in descending order of each index's values to obtain an ordered medical record group set $B_i$ for each index; for each ordered set $B_i$, selecting groups according to a probability distribution based on the similarity index values to generate a medical record group selection set $C_i$;
1.3) taking the union of the selection set $C_0$ and all selection sets $C_i$ to obtain the medical record group set C.
In this embodiment, when the selection set $C_i$ is generated based on the probability distribution of the similarity index values, the probability that each medical record group in the ordered set $B_i$ is randomly selected is given by formula (1):

$$P(SM_i)=\frac{f(SM_i)}{\sum_{j=1}^{m} f(SM_j)} \qquad (1)$$

In formula (1), $P(SM_i)$ is the probability that the $i$-th medical record group in the ordered set $B_i$ is selected, $SM_i$ is the similarity index value of the $i$-th group, $f(SM_j)$ is the normalized similarity index value of the $j$-th group, and $m$ is the number of medical record groups in $B_i$. $f(SM_j)$ is given by formula (2):

$$f(SM_j)=\frac{SM_j-SM_m}{SM_1-SM_m} \qquad (2)$$

In formula (2), $SM_1$ is the similarity index value of the 1st medical record group in the ordered set $B_i$, $SM_m$ is the similarity index value of the $m$-th group, and $m$ is the number of medical record groups in $B_i$.
In this embodiment, the specified similarity indexes are Euclidean distance, cosine distance, Jaccard distance, and adjusted cosine distance; at least two of them may be used, and other similarity indexes may be added as needed.
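The four indexes can be sketched as plain functions over numeric feature vectors (Python 3.8+ for `math.dist`). Several definitions here are assumptions, since the patent does not spell them out: Jaccard distance is taken over the sets of non-zero feature positions, adjusted cosine subtracts each feature's mean over the data set before the cosine, and a zero vector is given cosine distance 1.0.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def euclidean_distance(a: Vec, b: Vec) -> float:
    return math.dist(a, b)


def cosine_distance(a: Vec, b: Vec) -> float:
    """1 - cosine similarity; defined as 1.0 if either vector is all zeros (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0.0 else 1.0 - dot / norm


def jaccard_distance(a: Vec, b: Vec) -> float:
    """Set-based Jaccard on the positions of non-zero features (assumed reading)."""
    set_a = {i for i, v in enumerate(a) if v != 0}
    set_b = {i for i, v in enumerate(b) if v != 0}
    union = set_a | set_b
    return 0.0 if not union else 1.0 - len(set_a & set_b) / len(union)


def adjusted_cosine_distance(a: Vec, b: Vec, column_means: Vec) -> float:
    """Cosine distance after subtracting each feature's data-set mean (assumed definition)."""
    return cosine_distance([x - m for x, m in zip(a, column_means)],
                           [y - m for y, m in zip(b, column_means)])
```

Each function maps a medical record group (a pair of records) to one index value, so sorting B by any of them yields the corresponding ordered set $B_i$.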
Euclidean distance: referring to Fig. 2, the Euclidean distance is computed as the similarity index value of every medical record group in set B. The groups are sorted in descending order of this value to obtain the first ordered set B_1. The elements of B_1 are then selected according to their similarity index values, denoted SM_1, SM_2, …, SM_m from largest to smallest. The probability of selecting a particular medical record group is given by formulas (1) and (2). Sampling according to this probability distribution yields the first medical record group selection set C_1.
Cosine distance: referring to Fig. 2, the cosine distance is computed as the similarity index value of every medical record group in set B. The groups are sorted in descending order of this value to obtain the second ordered set B_2. The elements of B_2 are selected according to their similarity index values, denoted SM_1, SM_2, …, SM_m from largest to smallest. The selection probabilities are given by formulas (1) and (2), and sampling yields the second medical record group selection set C_2.
Jaccard distance: referring to Fig. 2, the Jaccard distance is computed as the similarity index value of every medical record group in set B. The groups are sorted in descending order of this value to obtain the third ordered set B_3. The elements of B_3 are selected according to their similarity index values, denoted SM_1, SM_2, …, SM_m from largest to smallest. The selection probabilities are given by formulas (1) and (2), and sampling yields the third medical record group selection set C_3.
Adjusted cosine distance: referring to Fig. 2, the adjusted cosine distance is computed as the similarity index value of every medical record group in set B. The groups are sorted in descending order of this value to obtain the fourth ordered set B_4. The elements of B_4 are selected according to their similarity index values, denoted SM_1, SM_2, …, SM_m from largest to smallest. The selection probabilities are given by formulas (1) and (2), and sampling yields the fourth medical record group selection set C_4.
Referring to Fig. 2, the selection set C_0 and the first through fourth selection sets C_1, C_2, C_3 and C_4 are finally merged (set union) to obtain the medical record group set C. Set C is thus built from probability distributions over multiple indexes. This yields a large number of suitable medical record groups. It also, to some extent, avoids the distribution bias that the group data would show under any single index, thereby reducing data distribution error.
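Step 1) as a whole might look like the sketch below: pairing, the random set C_0, one probability-sampled set C_i per similarity index, and the final union. Several details are assumptions:

- the per-draw count `k`;
- sampling with replacement via `random.choices`;
- the min-max weighting used in place of formulas (1) and (2).

`metrics` can be any functions that score a pair of records.

```python
import itertools
import random

def build_group_set(records, metrics, k, seed=0):
    """Sketch of step 1): returns the medical record group set C.

    records: dict mapping record id -> feature vector
    metrics: list of callables metric(u, v) -> float, one per similarity index
    k: hypothetical number of draws for C_0 and for each C_i
    """
    rng = random.Random(seed)
    B = list(itertools.combinations(sorted(records), 2))    # 1.1) all pairs of records
    if not B:
        return set()
    C = set(rng.sample(B, min(k, len(B))))                   # 1.2) C_0: uniform random choice
    for metric in metrics:
        scores = {g: metric(records[g[0]], records[g[1]]) for g in B}
        ordered = sorted(B, key=scores.get, reverse=True)    # ordered set B_i, descending
        top, bottom = scores[ordered[0]], scores[ordered[-1]]
        if top == bottom:                                    # all values equal: uniform weights
            weights = [1.0] * len(ordered)
        else:                                                # ASSUMED min-max form of formula (2)
            weights = [(scores[g] - bottom) / (top - bottom) for g in ordered]
        C |= set(rng.choices(ordered, weights=weights, k=k)) # selection set C_i
    return C                                                 # 1.3) union of C_0 and all C_i
```

Duplicate draws collapse in the union, so C typically holds fewer than (number of indexes + 1) × k groups.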
In this embodiment, the detailed steps of step 2) include:
2.1) Represent the medical record group set C as the matrix b. Each row of b represents one medical record: the first n columns are the n features of the record, and the last column, s, is its diagnosis information;
2.2) Determine the weight of each medical record feature from matrix b;
2.3) Compute the similarity value s of each medical record group in set C from the similarity between the records' features and the weight of each feature. This yields the medical record group data set D with similarity labels.
An accurate description of medical records is the basis for ranking them by similarity. In this embodiment, each medical record is described as a vector of attribute features:
[Vector representation of medical record b_i]
Here b_i denotes medical record i and its first n components are the attribute features of the record. The last component, s_i, is the diagnosis information, a special feature of the record. The medical record group set matrix b is therefore given by formula (3):
[Formula (3)]
In formula (3), each row of matrix b represents one medical record: the first n columns are the n features of the record, and the last column, s, is its diagnosis information. For example, the first n entries of row 1 are the n features of that record, and its last entry, s_1, is its diagnosis information.
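For illustration, matrix b can be assembled from the distinct records that appear in set C. The input dictionaries are a hypothetical format. The diagnosis information s is assumed to be already encoded as a number so that it fits in the last column.

```python
import numpy as np

def group_set_matrix(group_set, features, diagnosis):
    """Stack every distinct record in the group set as one row of matrix b.

    features: dict id -> list of n numeric features (hypothetical input format)
    diagnosis: dict id -> numeric diagnosis code s (assumption: already encoded)
    Returns (ids, b) where b has shape (records, n + 1) and the last column is s.
    """
    ids = sorted({rid for pair in group_set for rid in pair})
    b = np.array([list(features[r]) + [diagnosis[r]] for r in ids], dtype=float)
    return ids, b
```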
In this embodiment, step 2.2) computes the original weight of each medical record feature with formula (4). It then normalizes the original weights of all features to obtain the final weight of each feature; the normalization is given by formula (5).
[Formula (4)]
In formula (4), y_i' is the original weight of the i-th feature of the medical records. The feature vector is formed by the i-th column of features of all medical records in matrix b. The diagnosis information feature vector is formed by the diagnosis information s of all medical records in matrix b. σ_i is the variance of the i-th column of features of all medical records in matrix b.
Assigning weights to multiple factors is widely needed but difficult. In medical record similarity measurement, the content corresponding to different data structures within the medical record data is hard to model, so weights are difficult to derive from a model. Typical subjective weighting methods, meanwhile, rely heavily on subjective factors and are prone to subjective error. Computing the original weight of each feature with formula (4) is an objective weighting method based on data stability, and it reduces weighting error to some extent.
[Formula (5)]
In formula (5), y_i' is the original weight of the i-th feature of the medical records and y_j' is the original weight of the j-th feature. n is the total number of all medical records in matrix b.
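The weighting can be sketched as two stages: a raw weight per feature column (formula (4)), then normalization (formula (5)). Formula (4) is not reproduced in this text. The default `raw_weight` below is therefore a hypothetical stand-in built only from the quantities the text names: the absolute covariance with the diagnosis column divided by the feature's variance. The normalization is assumed to make the weights sum to one. Pass a different `raw_weight` callable to use another formula.

```python
import numpy as np

def feature_weights(b, raw_weight=None):
    """Normalized feature weights from matrix b (last column = diagnosis s).

    raw_weight(x_col, s_col) plays the role of formula (4); the default is a
    HYPOTHETICAL placeholder, not the patent's formula.
    """
    X, s = b[:, :-1], b[:, -1]
    if raw_weight is None:
        def raw_weight(x, y):
            var = x.var()                          # sigma_i is described as a variance
            if var == 0:
                return 0.0                         # constant feature: no information
            return abs(np.cov(x, y, bias=True)[0, 1]) / var
    raw = np.array([raw_weight(X[:, i], s) for i in range(X.shape[1])])
    total = raw.sum()
    if total == 0:
        return np.full(raw.size, 1.0 / raw.size)   # fall back to equal weights
    return raw / total                             # ASSUMED formula (5): weights sum to one
```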
In this embodiment, step 2.3) computes the similarity value s of each medical record group in set C with formula (6):
[Formula (6)]
In formula (6), s_ij is the similarity of the medical record group formed by records i and j, and n is the total number of features. The per-feature terms are the value of the x-th feature of record i and the value of the x-th feature of record j. y_x is the normalized weight of the x-th feature. The range terms are the maximum and the minimum value of the x-th feature.
In this embodiment, consider the group data in set C = {(b1, b2), (b3, b4), …}. The similarity between records is characterized following the BM25 approach, which combines the similarity between features with the weight of each feature. The similarity value s of each group in C is computed with formula (6). This gives the labeled data set D = {(b1, b2, sm12), (b3, b4, sm34), …}, where sm12 is the similarity of the group formed by records b1 and b2, that is, the similarity between b1 and b2.
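Formula (6) is likewise an image. The sketch below assumes a range-normalized, weighted per-feature similarity. This is a Gower-style form that uses the weight y_x and the maximum and minimum of each feature, as the text describes. The sketch uses it to attach a label to each group, producing D. It is an illustration, not the patent's exact formula.

```python
import numpy as np

def weighted_similarity(u, v, weights, col_max, col_min):
    """HYPOTHETICAL stand-in for formula (6): sum over features x of
    y_x * (1 - |u_x - v_x| / (max_x - min_x))."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    span = np.asarray(col_max, float) - np.asarray(col_min, float)
    safe = np.where(span == 0, 1.0, span)                   # avoid division by zero
    per_feature = np.where(span == 0, 1.0, 1.0 - np.abs(u - v) / safe)
    return float(np.dot(weights, per_feature))

def label_groups(group_set, features, weights):
    """Build the labelled data set D = [(b_i, b_j, s_ij), ...] from group set C."""
    X = np.array(list(features.values()), dtype=float)
    col_max, col_min = X.max(axis=0), X.min(axis=0)         # per-feature max and min
    return [(i, j, weighted_similarity(features[i], features[j], weights, col_max, col_min))
            for i, j in sorted(group_set)]
```

With weights that sum to one, identical records score 1 and records at opposite ends of every feature range score 0.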
In this embodiment, the detailed steps of constructing the machine learning model in step 3) include:
3.1) Design three loss functions: a scoring loss function, a ranking loss function and a ranking probability loss function, where:
as shown in Fig. 3, the scoring loss function takes two medical records as input and outputs a similarity score. It is expressed as the absolute difference between the predicted score and the label score;
as shown in Fig. 4, the ranking loss function takes three medical records as input (one query record and two ranked records) and outputs similarity scores. It is expressed through the differences between the predicted scores and the label scores of the query record with the two ranked records;
as shown in Fig. 5, the ranking probability loss function takes three medical records as input (one query record and two ranked records) and outputs a score comparison probability. It is expressed as the discrepancy between two probabilities that the query record is more similar to one ranked record than to the other: one computed from the predicted scores and one from the label scores;
3.2) Construct a neural network model which, as shown in Fig. 6, consists of an input layer, a hidden layer and an output layer. The input layer feeds all dimensions of the medical records into the network. The hidden layer is a fully connected network that processes the features of the records. The output layer outputs the similarity metric value between the two medical records;
3.3) Use the scoring loss function, the ranking loss function and the ranking probability loss function in turn as the loss function of the neural network model, and select the one that performs best;
3.4) Select the activation function of the neural network model according to the chosen loss function: a linear activation function for the scoring loss function, tanh for the ranking loss function, and sigmoid for the ranking probability loss function.
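A forward pass of the three-layer network from step 3.2), with the output activation chosen as in step 3.4), might look like the following NumPy sketch. Several details are assumptions:

- the two records are concatenated at the input;
- the hidden width is hypothetical;
- the hidden layer uses ReLU;
- weights are randomly initialized.

Training is omitted.

```python
import numpy as np

# Output activation per loss function, as listed in step 3.4)
OUTPUT_ACTIVATIONS = {
    "scoring": lambda z: z,                                       # linear
    "ranking": np.tanh,                                           # tanh
    "ranking_probability": lambda z: 1.0 / (1.0 + np.exp(-z)),    # sigmoid
}

class SimilarityMLP:
    """Input layer -> one fully connected hidden layer -> one output unit.

    ASSUMPTIONS: records are concatenated at the input, the hidden layer uses
    ReLU with a hypothetical width, and the step 3.4) activation is applied
    at the output unit.
    """

    def __init__(self, n_features, hidden=16, loss="scoring", seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (2 * n_features, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, hidden)
        self.b2 = 0.0
        self.out = OUTPUT_ACTIVATIONS[loss]

    def __call__(self, rec_a, rec_b):
        x = np.concatenate([rec_a, rec_b])           # input layer: all dimensions of both records
        h = np.maximum(0.0, x @ self.W1 + self.b1)   # hidden fully connected layer (ReLU)
        return float(self.out(h @ self.W2 + self.b2))
```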
In this embodiment, the three loss functions designed in step 3.1) are defined as follows. The scoring loss function uses the similarity score of the medical records as the loss target and is given by formula (7). The ranking loss function uses the ordering relation of the similarity values and is given by formula (8). The ranking probability loss function uses the ordering probability of the similarity values and is given by formula (9).
[Formula (7)]
In formula (7), L(θ) is the loss value and t is the number of medical record groups. For each group, the formula compares the model's predicted score for the two medical records of the group with their label score.
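The scoring loss of formula (7) compares predicted and label scores with an absolute difference. This sketch averages over the t groups; whether formula (7) averages or sums cannot be recovered from the text.

```python
import numpy as np

def scoring_loss(pred, label):
    """Mean absolute difference between predicted and label scores over the
    t medical record groups (ASSUMPTION: averaged rather than summed)."""
    pred, label = np.asarray(pred, float), np.asarray(label, float)
    return float(np.mean(np.abs(pred - label)))
```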
[Formula (8)]
In formula (8), L(θ) is the loss value of the ranking loss function and t is the number of medical record groups. The remaining terms are:
- the neural network model's predicted score between query record q_i and the first ranked record;
- its predicted score between q_i and the second ranked record;
- the label score between q_i and the first ranked record;
- the label score between q_i and the second ranked record.
Sign is the sign function.
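For formula (8), the text establishes only that the loss compares the predicted scores of the two ranked records with the sign of their label-score difference. One hypothetical way to realize that is the hinge penalty sketched below; the `margin` parameter is also an assumption.

```python
import numpy as np

def ranking_loss(pred_a, pred_b, label_a, label_b, margin=0.0):
    """HYPOTHETICAL hinge form for triples (q_i, ranked record a, ranked record b).

    pred_a / pred_b: predicted similarity of q_i to records a and b;
    label_a / label_b: their label scores. Penalizes predicted differences whose
    sign disagrees with Sign(label_a - label_b), averaged over the t groups.
    """
    pa, pb, la, lb = (np.asarray(v, float) for v in (pred_a, pred_b, label_a, label_b))
    target = np.sign(la - lb)                               # +1, -1, or 0 for ties
    return float(np.mean(np.maximum(0.0, margin - target * (pa - pb))))
```

A correctly ordered prediction costs nothing, and a reversed one costs the size of its score gap.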
[Formula (9)]
In formula (9), L(θ) is the loss value of the ranking probability loss function and t is the number of medical record groups. The two probability terms are:
- the probability, under the label scores, that the similarity between query record q_i and the first ranked record is greater than that between q_i and the second ranked record;
- the same probability under the model's predicted scores.
These two probabilities are given by formulas (10) and (11), respectively:
[Formula (10)]
[Formula (11)]
In formulas (10) and (11), the terms are the label scores between q_i and the first and second ranked records, and the neural network model's predicted scores between q_i and the first and second ranked records.
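Formulas (9) to (11) read like a RankNet-style pairwise loss. The formulas appear only as images, so the sketch below makes two assumptions. First, formulas (10) and (11) are logistic functions of the score difference. Second, formula (9) is the cross-entropy between the label probability and the predicted probability.

```python
import numpy as np

def pairwise_probability(score_a, score_b):
    """Probability that q_i is more similar to record a than to record b
    (ASSUMPTION: logistic function of the score difference)."""
    diff = np.asarray(score_a, float) - np.asarray(score_b, float)
    return 1.0 / (1.0 + np.exp(-diff))

def ranking_probability_loss(pred_a, pred_b, label_a, label_b, eps=1e-12):
    """ASSUMED cross-entropy between the label-score probability and the
    predicted-score probability, averaged over the t groups."""
    p_bar = pairwise_probability(label_a, label_b)                      # from label scores
    p_hat = np.clip(pairwise_probability(pred_a, pred_b), eps, 1 - eps) # from model scores
    return float(np.mean(-p_bar * np.log(p_hat) - (1 - p_bar) * np.log(1 - p_hat)))
```

The logistic form matches the sigmoid output activation that step 3.4) pairs with this loss.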
The medical record group data set D with similarity labels is fed into the neural network model, which is trained on medical record similarity according to the designed model. The result is a neural network model that can measure the similarity of two medical records, i.e., a regressor. The regressor is a network with fixed parameters that determines the similarity value of two medical records. Given two records as input, it outputs their similarity value; for example, given records b1 and b2 it outputs the similarity value s(b1, b2).

On this basis, retrieval works as follows for a target medical record and the query medical record set A. Each medical record in A is input into the regressor together with the target record, giving the similarity values between the target record and all records in A. The N largest values identify the N medical records most similar to the target record, where N can be specified as required and may be one or more.

Note that the letters used above only distinguish the data sets and impose no specific limitation on the data sets themselves. This applies to the query medical record set A, the medical record group set B, the selection sets C_0, C_1, C_2, C_3 and C_4, the medical record group set C and the medical record group data set D.
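The retrieval step can be expressed as a top-N selection over regressor outputs. In this sketch `regressor` is any callable that takes two records and returns a similarity value (for example, a trained model); the function and parameter names are hypothetical.

```python
import heapq

def find_similar(target, query_set, regressor, n=5):
    """Score the target record against every record in query set A and return
    the N most similar (record_id, score) pairs, highest first.

    query_set: dict id -> record; regressor(a, b) -> similarity value.
    """
    scores = ((rid, regressor(target, rec)) for rid, rec in query_set.items())
    return heapq.nlargest(n, scores, key=lambda item: item[1])
```

`heapq.nlargest` avoids sorting the whole query set when N is much smaller than A.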
This embodiment takes full account of the practical conditions of medical record similarity measurement and describes the practical difficulties of the process in detail. It examines each stage of the measurement and identifies the technical difficulties and solutions of each stage. Finally, it integrates these analyses into a medical record similarity measurement method based on weakly supervised machine learning.

Weakly supervised machine learning builds a predictive machine learning model from weak supervision, integrating a theoretical model based on theoretical knowledge with a data-driven machine learning model. The method of this embodiment exploits two complementary strengths. It uses the theoretical model, which does not depend on labeled data, to create weak supervision labels. It then uses the machine learning model, which is not limited by domain knowledge, to perform weakly supervised learning. The method is therefore well suited to medical record similarity measurement. It addresses practical problems such as the lack of medical record similarity labels and incomplete theoretical knowledge of medical record similarity.
To further verify the medical record searching method based on similarity measurement of this embodiment, experiments were carried out on two data sets: real medical record data from the JK medical data center and the public data set Robust 04. The evaluation indexes are MAP, P@20 and nDCG@20:
- MAP is the mean average precision over all retrieved medical records;
- P@20 is the precision of the top 20 retrieved medical records;
- nDCG@20 is the normalized discounted cumulative gain of the top 20 retrieved medical records. Each record's contribution is weighted by its rank position, with higher-ranked records weighted more heavily.
Table 1 shows MAP, P@20 and nDCG@20 for each algorithm on the different data sets.
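For reference, the three evaluation indexes can be computed per query as below; MAP is the mean of `average_precision` over all queries. The code uses standard textbook definitions. The ideal ordering for nDCG is taken from the retrieved list itself, a common simplification. Relevance inputs are assumed to be 0/1 for P@k and AP.

```python
import math

def precision_at_k(relevances, k=20):
    """P@k: fraction of the top-k retrieved records that are relevant."""
    return sum(1 for r in relevances[:k] if r > 0) / k

def average_precision(relevances, total_relevant=None):
    """AP for one query; MAP is the mean of AP over all queries."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            hits += 1
            precision_sum += hits / rank           # precision at each relevant rank
    denom = total_relevant if total_relevant is not None else hits
    return precision_sum / denom if denom else 0.0

def ndcg_at_k(gains, k=20):
    """nDCG@k: DCG with a log2(rank + 1) discount, divided by the DCG of the
    ideal (descending) ordering of the same gains."""
    def dcg(values):
        return sum(g / math.log2(rank + 1) for rank, g in enumerate(values[:k], start=1))
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```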
Table 1: Precision comparison of medical record similarity measurement.
[Table 1]
As Table 1 shows, the medical record searching method based on similarity measurement of this embodiment (method) was compared with the existing theory-based model (BM25) and the SVM-based weakly supervised learning algorithm (rankkssvm). When the ranking probability loss function is used as the loss function, the method performs better on every evaluation index. The advantage is largest on nDCG@20 and relatively smaller on MAP. Because nDCG@20 gives more weight to top-ranked results, this indicates that the method is especially sensitive to medical record groups with high similarity.
Compared with Chinese patent publication CN104572675B, the medical record searching method based on similarity measurement of this embodiment measures medical record similarity from a data perspective using machine learning. It makes full use of the data in the medical records and provides error-based self-optimization. At the same time, by using the medical record as the retrieval object, it addresses the problem that the medical record in CN104572675B cannot fully reflect the patient's condition.

Yanghe et al. disclosed a similar medical record retrieval system based on a medical big data platform in Southeast National Defense Medicine. Like CN104572675B, that scheme has shortcomings in the accuracy of similarity measurement and in how completely the records reflect the patient's condition. The method of this embodiment addresses these problems to some extent.
In addition, the present embodiment further provides a medical record searching system based on similarity measurement, which includes a computer device programmed to execute the steps of the aforementioned medical record searching method based on similarity measurement according to the present embodiment. The present embodiment further provides a medical record searching system based on similarity measurement, which includes a computer device, where a storage medium of the computer device stores a computer program programmed to execute the aforementioned medical record searching method based on similarity measurement according to the present embodiment. The present embodiment also provides a computer-readable storage medium, in which a computer program is stored, which is programmed to execute the aforementioned medical record searching method based on similarity measurement of the present embodiment.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.

Claims (8)

1. A medical record searching method based on similarity measurement is characterized by comprising the following implementation steps:
1) constructing medical record groups from the query medical record set A to obtain the medical record group set C;
2) assigning similarity labels to the medical record groups in set C to obtain the medical record group data set D with similarity labels;
3) constructing a machine learning model, and completing the training of the machine learning model through a medical record group data set D with similar labels, wherein the machine learning model establishes a mapping relation between medical record groups and the similarity of the medical record groups through training;
4) inputting the target medical records and all medical records in the query medical record set A into a machine learning model together to obtain similarity metric values between the target medical records and all medical records in the query medical record set A, and selecting N medical records with the highest similarity metric values to output;
the detailed steps of the step 1) comprise:
1.1) forming all pairwise combinations of the medical records in the query medical record set A to obtain the medical record group set B, wherein the elements of set B are medical record groups, each consisting of two medical records;
1.2) randomly selecting some of the medical record groups in set B to obtain the medical record group selection set C_0; computing, for set B, the values of multiple specified similarity indexes and, for each index, sorting the groups in descending order of those values to obtain the ordered set B_i; and generating from each ordered set B_i a medical record group selection set C_i by sampling according to the probability distribution of the similarity index values, wherein the specified similarity indexes comprise at least two of the Euclidean distance, the cosine distance, the Jaccard distance and the adjusted cosine distance;
1.3) merging the selection set C_0 and all selection sets C_i (set union) to obtain the medical record group set C;
the detailed steps of the step 2) comprise:
2.1) representing the medical record group set C as the matrix b, wherein each row of matrix b represents one medical record, the first n columns represent the n features of the record, and the last column, s, is the diagnosis information of the record;
2.2) determining the weight of each medical record feature from matrix b;
2.3) computing the similarity value s of each medical record group in set C from the similarity between the records' features and the weight of each feature, thereby obtaining the medical record group data set D with similarity labels.
2. The method according to claim 1, wherein, when the selection set C_i is generated by sampling according to the probability distribution of a similarity index, the probability that each medical record group in the ordered set B_i is randomly selected is given by formula (1):
[Formula (1)]
in formula (1), P(SM_i) is the probability that the i-th medical record group in the ordered set B_i is selected, SM_i is the similarity index value of the i-th medical record group in B_i, f(SM_j) is the normalized similarity index value of the j-th medical record group in B_i, m is the number of medical record groups in B_i, and f(SM_j) is given by formula (2):
[Formula (2)]
in formula (2), SM_1 is the similarity index value of the 1st medical record group in the ordered set B_i, SM_m is the similarity index value of the m-th medical record group in B_i, and m is the number of medical record groups in B_i.
3. The medical record searching method based on similarity measurement according to claim 1, wherein the detailed steps of step 2.2) comprise: computing the original weight of each medical record feature with formula (4), and normalizing the original weights of all features to obtain the final weight of each feature;
[Formula (4)]
in formula (4), y_i' is the original weight of the i-th feature of the medical records; the feature vector is formed by the i-th column of features of all medical records in matrix b; the diagnosis information feature vector is formed by the diagnosis information s of all medical records in matrix b; and σ_i is the variance of the i-th column of features of all medical records in matrix b.
4. The medical record searching method based on similarity measurement according to claim 1, wherein the similarity value s of each medical record group in set C is computed in step 2.3) with formula (6):
[Formula (6)]
in formula (6), s_ij is the similarity of the medical record group formed by records i and j; n is the total number of features; the per-feature terms are the value of the x-th feature of record i and the value of the x-th feature of record j; y_x is the normalized weight of the x-th feature; and the range terms are the maximum and the minimum value of the x-th feature.
5. The medical record searching method based on similarity measurement according to claim 1, wherein the detailed steps of constructing the machine learning model in step 3) comprise:
3.1) designing three loss functions: a scoring loss function, a ranking loss function and a ranking probability loss function; the scoring loss function takes two medical records as input, outputs a similarity score, and is expressed as the absolute difference between the predicted score and the label score; the ranking loss function takes three medical records as input, one query record and two ranked records, and outputs similarity scores; the ranking probability loss function takes three medical records as input, one query record and two ranked records, outputs a score comparison probability, and is expressed as the discrepancy between the predicted and the label probabilities that the query record is more similar to one ranked record than to the other;
3.2) constructing a neural network model consisting of an input layer, a hidden layer and an output layer, wherein the input layer feeds all dimensions of the medical records into the network, the hidden layer is a fully connected network that processes the features of the records, and the output layer outputs the similarity metric value between the two medical records;
3.3) using the scoring loss function, the ranking loss function and the ranking probability loss function in turn as the loss function of the neural network model, and selecting the one that performs best;
3.4) selecting the activation function of the neural network model according to the chosen loss function: a linear activation function for the scoring loss function, tanh for the ranking loss function, and sigmoid for the ranking probability loss function.
6. The medical record searching method based on similarity measurement according to claim 5, wherein, when the three loss functions (the scoring loss function, the ranking loss function and the ranking probability loss function) are designed in step 3.1), the scoring loss function is expressed by formula (7), the ranking loss function is expressed by formula (8), and the ranking probability loss function is expressed by formula (9);
[Formula (7): image FDA0002812021140000031]
in formula (7), L(θ) is the loss value of the scoring loss function and t is the number of case groups; the remaining two terms are the model-predicted score value between the two medical records of a pair and the label score value between the same two medical records;
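Formula (7) survives only as an image. Step 3.1) does state that the scoring loss is the absolute value of the difference between the predicted and label score values. The sketch below assumes the per-pair absolute errors are simply summed over the t case groups; whether the original sums or averages cannot be recovered here. The function name `scoring_loss` is hypothetical.

```python
def scoring_loss(predicted, labels):
    """Sum of |predicted score - label score| over the case groups.

    predicted[i] and labels[i] are the model-predicted and label score
    values for the i-th pair of medical records (assumed layout).
    """
    if len(predicted) != len(labels):
        raise ValueError("need one label score per predicted score")
    return sum(abs(p - y) for p, y in zip(predicted, labels))
```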
[Formula (8): image FDA0002812021140000038]
in formula (8), L(θ) is the loss value of the ranking loss function and t is the number of case groups. The remaining terms are:
- the neural-network-predicted score values between medical record q_i and each of the two ranked medical records;
- the label score values between medical record q_i and each of the same two ranked medical records;
- sign, which denotes the sign function;
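Formula (8) is also unavailable in text form. Its listed ingredients are the two predicted scores, the two label scores and the sign function. One standard way to combine these is a pairwise hinge (margin ranking) loss, where the sign of the label difference decides which of the two ranked records should score higher. The sketch below uses that swapped-in technique, which is not necessarily formula (8). The names `ranking_loss` and `margin` are hypothetical.

```python
def sign(x):
    """Sign function: 1 for positive, -1 for negative, 0 for zero."""
    return (x > 0) - (x < 0)


def ranking_loss(pred_pairs, label_pairs, margin=0.1):
    """Pairwise hinge loss summed over the case groups.

    Each element of pred_pairs is (f1, f2): the predicted scores of query q_i
    against the two ranked records; label_pairs holds the matching label
    scores (y1, y2).
    """
    if len(pred_pairs) != len(label_pairs):
        raise ValueError("need one label pair per predicted pair")
    total = 0.0
    for (f1, f2), (y1, y2) in zip(pred_pairs, label_pairs):
        s = sign(y1 - y2)  # +1: record 1 should rank higher; -1: record 2
        if s == 0:
            continue  # tied labels carry no ordering information
        # Zero loss once the predicted gap has the right sign and exceeds margin.
        total += max(0.0, margin - s * (f1 - f2))
    return total
```

With the tanh output of step 3.4), scores lie in (-1, 1), so predicted gaps fall in (-2, 2). This range determines how large a margin is meaningful.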
[Formula (9): image FDA00028120211400000317]
in formula (9), L(θ) is the loss value of the ranking probability loss function and t is the number of case groups. The two probability terms are defined as follows:
- under the label score values, the probability that the similarity between medical record q_i and the first ranked medical record is greater than the similarity between medical record q_i and the second ranked medical record;
- the same probability under the model-predicted score values.
The functional expressions of these two probabilities are given by formulas (10) and (11), respectively;
[Formula (10): image FDA0002812021140000042]
[Formula (11): image FDA0002812021140000043]
in formulas (10) and (11), the terms used are the label score values between medical record q_i and each of the two ranked medical records, and the neural-network-predicted score values between medical record q_i and each of the same two ranked medical records.
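Formulas (9) to (11) are images as well. Their description matches the probabilistic pairwise setup popularized by RankNet, where both probabilities are logistic functions of a score difference and the loss is the cross-entropy between them. The sketch below assumes exactly that form, and the actual formulas may differ. The name `ranking_probability_loss` and the record labels d1 and d2 in the comments are hypothetical.

```python
import math


def logistic(z):
    """Numerically stable logistic (sigmoid) function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ranking_probability_loss(pred_pairs, label_pairs, eps=1e-12):
    """Cross-entropy between label-side and model-side ranking probabilities.

    For each case group, (f1, f2) are the predicted scores and (y1, y2) the
    label scores of query q_i against the two ranked records.
    """
    if len(pred_pairs) != len(label_pairs):
        raise ValueError("need one label pair per predicted pair")
    total = 0.0
    for (f1, f2), (y1, y2) in zip(pred_pairs, label_pairs):
        p_bar = logistic(y1 - y2)  # P(sim(q_i, d1) > sim(q_i, d2)) from labels
        p = logistic(f1 - f2)      # the same probability from model predictions
        p = min(max(p, eps), 1.0 - eps)  # keep both logarithms finite
        total += -(p_bar * math.log(p) + (1.0 - p_bar) * math.log(1.0 - p))
    return total
```

The clamp with `eps` only matters when a predicted gap is extreme. Without it, `math.log(0.0)` would raise `ValueError`.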
7. A medical record searching system based on similarity measurement, comprising a computer device, characterized in that: the computer device is programmed to perform the steps of the similarity metric-based medical record searching method according to any one of claims 1 to 6, or a storage medium of the computer device has stored therein a computer program programmed to perform the similarity metric-based medical record searching method according to any one of claims 1 to 6.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program programmed to execute the similarity metric-based medical record searching method according to any one of claims 1 to 6.
CN201910137294.5A 2019-02-25 2019-02-25 Medical record searching method and system based on similarity measurement Active CN109935337B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910137294.5A CN109935337B (en) 2019-02-25 2019-02-25 Medical record searching method and system based on similarity measurement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910137294.5A CN109935337B (en) 2019-02-25 2019-02-25 Medical record searching method and system based on similarity measurement

Publications (2)

Publication Number Publication Date
CN109935337A (en) 2019-06-25
CN109935337B (en) 2021-01-15

Family ID: 66985863

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910137294.5A Active CN109935337B (en) 2019-02-25 2019-02-25 Medical record searching method and system based on similarity measurement

Country Status (1)

Country Link
CN (1) CN109935337B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110647912A (en) * 2019-08-15 2020-01-03 深圳久凌软件技术有限公司 Fine-grained image recognition method and device, computer equipment and storage medium
CN111091907A (en) * 2019-11-15 2020-05-01 合肥工业大学 Health medical knowledge retrieval method and system based on similar case library
CN112069783A (en) * 2020-09-10 2020-12-11 卫宁健康科技集团股份有限公司 Medical record input method and input system thereof
CN112652393B (en) * 2020-12-31 2021-09-07 山东大学齐鲁医院 ERCP quality control method, system, storage medium and equipment based on deep learning
CN112836012B (en) * 2021-01-25 2023-05-12 中山大学 Similar patient retrieval method based on ordering learning
US20240029850A1 (en) * 2022-07-22 2024-01-25 Opeeka, Inc. Method and system utilizing machine learning to develop and improve care models for patients in an electronic patient system

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140089003A1 (en) * 2012-09-27 2014-03-27 University Of Utah Research Foundation Patient health record similarity measure
CN104572675A (en) * 2013-10-16 2015-04-29 中国人民解放军南京军区南京总医院 Similar medical history searching system and method
CN106951719A (en) * 2017-04-10 2017-07-14 荣科科技股份有限公司 The construction method and constructing system of clinical diagnosis model, clinical diagnosing system
CN107180155A (en) * 2017-04-17 2017-09-19 中国科学院计算技术研究所 A kind of disease forecasting method and system based on Manufacturing resource model
CN107193919A (en) * 2017-05-15 2017-09-22 清华大学深圳研究生院 The search method and system of a kind of electronic health record
CN107656952A (en) * 2016-12-30 2018-02-02 青岛中科慧康科技有限公司 The modeling method of parallel intelligent case recommended models
CN108831559A (en) * 2018-06-20 2018-11-16 清华大学 A kind of Chinese electronic health record text analyzing method and system
CN109119134A (en) * 2018-08-09 2019-01-01 脉景(杭州)健康管理有限公司 Medical history data processing method, medical data recommender system, equipment and medium

Also Published As

Publication number Publication date
CN109935337A (en) 2019-06-25

Similar Documents

Publication Publication Date Title
CN109935337B (en) Medical record searching method and system based on similarity measurement
Hand et al. A note on using the F-measure for evaluating record linkage algorithms
Akella et al. Machine learning algorithms for predicting coronary artery disease: efforts toward an open source solution
Marlin et al. Unsupervised pattern discovery in electronic health care data using probabilistic clustering models
Bashir et al. BagMOOV: A novel ensemble for heart disease prediction bootstrap aggregation with multi-objective optimized voting
Begum et al. A case‐based decision support system for individual stress diagnosis using fuzzy similarity matching
US8145582B2 (en) Synthetic events for real time patient analysis
Afshar et al. Taste: temporal and static tensor factorization for phenotyping electronic health records
CN108062978B (en) Method for predicting main adverse cardiovascular events of patients with acute coronary syndrome
CN116364299B (en) Disease diagnosis and treatment path clustering method and system based on heterogeneous information network
CN114220549A (en) Effective physiological feature selection and medical causal reasoning method based on interpretable machine learning
WO2021114635A1 (en) Patient grouping model constructing method, patient grouping method, and related device
Benhar et al. A systematic mapping study of data preparation in heart disease knowledge discovery
CN111091907A (en) Health medical knowledge retrieval method and system based on similar case library
Pargent et al. Predictive modeling with psychological panel data
Hu et al. Variable selection with missing data in both covariates and outcomes: Imputation and machine learning
CN114530248A (en) Method for determining risk pre-warning model of potentially inappropriate prescription for cardiovascular disease
US11537888B2 (en) Systems and methods for predicting pain level
Lin et al. Medical Concept Embedding with Variable Temporal Scopes for Patient Similarity.
Huang et al. Study on patient similarity measurement based on electronic medical records
CN117195027A (en) Cluster weighted clustering integration method based on member selection
US20130253892A1 (en) Creating synthetic events using genetic surprisal data representing a genetic sequence of an organism with an addition of context
CN110957046A (en) Medical health case knowledge matching method and system
CN110033862B (en) Traditional Chinese medicine quantitative diagnosis system based on weighted directed graph and storage medium
US20210133627A1 (en) Methods and systems for confirming an advisory interaction with an artificial intelligence platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant