CN113284627A - Medication recommendation method based on patient characterization learning - Google Patents
- Publication number
- CN113284627A (CN202110406631.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H70/00—ICT specially adapted for the handling or processing of medical references
- G16H70/40—ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention provides a medication recommendation method based on patient characterization learning, comprising the following steps: extracting data from the electronic medical record and representing the unstructured chief-complaint text in the data as structured data; performing characterization learning on the structured data with a stacked sparse autoencoder to obtain a low-dimensional symptom vector for the patient symptom data and a low-dimensional medication vector for the medication information data; analyzing these low-dimensional representations with a clustering algorithm to obtain the symptom characteristics and the medication characteristics of the patients in each cluster; performing canonical correlation analysis on the symptom characteristics and the medication characteristics of the patients in each cluster to obtain the association between them; and predicting the recommended medication with a distance-weighted-average K-nearest-neighbor algorithm according to that association. The method can accurately recommend medication for a patient from the electronic medical record and improves the working efficiency of doctors.
Description
Technical Field
The invention relates to the technical field of medical informatization, in particular to a medication recommendation method based on patient characterization learning.
Background
In recent years, with the continuous development of computer and information technology, China's medical informatization has gradually been built up and improved, and patients' treatment records have moved from paper to digital electronic medical records. China began building electronic medical records later than other countries. However, as the medical and health system receives more attention, the Chinese government has in recent years issued a number of policies to support the construction and development of medical informatization.
The electronic medical record is a key data source in medical informatization. It covers a large amount of medical and health information from all of a patient's visits, and therefore has great research value. First, for patients, mining the information in electronic medical records helps them understand their own health. The record contains the patient's past diagnoses and health conditions; if this information can be extracted and analyzed, it can provide a reference for, and predictions about, the patient's physical condition. In addition, by analyzing and mining electronic medical record data, similar patients can be found in large data sets, and the condition information of patients with similar symptoms can be used as a reference. Second, for doctors, mining the information in electronic medical records can improve medical efficiency. Using methods such as natural language processing and machine learning, a computer can process a large number of electronic medical records and, in particular, use the text in the records to assist medical staff in diagnosing and treating patients, thereby improving doctors' decision-making and the efficiency of treatment.
The electronic medical record contains not only structured data but also a large amount of unstructured image, signal and text information, and this unstructured data holds the most valuable information in the record. Current medication recommendation systems are generally limited to the numerical and structured data in a patient's electronic medical record. Recommending medication from structured data alone, however, yields low recommendation accuracy and makes it difficult to meet patients' individual medication needs. In addition, traditional manual feature extraction not only consumes a great deal of labor but also demands considerable professional knowledge.
Therefore, a medication recommendation method that addresses the insufficient use of unstructured information is needed.
Disclosure of Invention
The invention provides a medication recommendation method based on patient characterization learning, which aims to overcome the defects in the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme.
A medication recommendation method based on patient characterization learning, comprising:
extracting data from the electronic medical record, wherein the data comprises unstructured chief-complaint text information and structured data;
representing the unstructured chief-complaint text information in the data as structured data;
performing characterization learning on the structured data with a stacked sparse autoencoder to obtain a low-dimensional symptom vector representing the patient symptom data and a low-dimensional medication vector representing the medication information data;
analyzing the low-dimensional representation of the patient symptom data and the low-dimensional representation of the medication information data with a clustering algorithm to obtain the symptom characteristics and the medication characteristics of the patients in each cluster;
performing canonical correlation analysis on the symptom characteristics and the medication characteristics of the patients in each cluster to obtain the association between the symptom characteristics and the medication characteristics of the patients in each cluster;
and predicting the recommended medication with a distance-weighted-average K-nearest-neighbor algorithm according to the association.
Preferably, representing the unstructured chief-complaint text information in the data as structured data comprises the following steps:
performing word segmentation on the unstructured chief-complaint text: processing the text with a word segmentation tool, calculating mutual information values between words, identifying fixed collocations in the chief-complaint text from the mutual information values, and building a custom dictionary, thereby completing the word segmentation of the chief complaint;
and extracting information from the segmentation result: comparing each segmented word one by one with the standard texts in the hospital's symptom library; if a word matches, extraction is complete for that word; if it does not match, searching for the symptom text corresponding to the word using a search-engine-based word similarity calculation, to obtain structured data.
Preferably, searching for the symptom text corresponding to a segmented word using the search-engine-based word similarity calculation comprises the following steps:
for a processed word p and a standard text q in the symptom library, where q ∈ Q and Q is the set of symptom-library texts related to p, using a crawler to obtain the number of search results returned when each word is searched separately and when the two are searched together, denoted N(p), N(q) and N(p∧q);
calculating the word similarity according to the following formula (1):
and calculating in turn the similarity between the word p and every standard text in Q; if the maximum similarity exceeds a first set threshold, assigning the word p to the corresponding standard text to obtain structured data.
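The matching step described above can be sketched as follows. Formula (1) is not reproduced in this text, so the sketch plugs in a Jaccard-style ratio of hit counts purely as a stand-in; `jaccard_from_hits`, `match_symptom`, the `hit_count` callable and the fake counts are all hypothetical names and values introduced for illustration, not the patent's method.

```python
from typing import Callable, Iterable, Optional, Tuple


def jaccard_from_hits(n_p: int, n_q: int, n_pq: int) -> float:
    """Stand-in similarity from hit counts (an assumption, NOT formula (1))."""
    union = n_p + n_q - n_pq
    return n_pq / union if union > 0 else 0.0


def match_symptom(
    word: str,
    candidates: Iterable[str],
    hit_count: Callable[..., int],
    threshold: float,
    similarity: Callable[[int, int, int], float] = jaccard_from_hits,
) -> Tuple[Optional[str], float]:
    """Return the best-matching standard text if its similarity exceeds the threshold."""
    best_text, best_score = None, float("-inf")
    for q in candidates:
        # N(p), N(q) and N(p AND q) as returned by the (hypothetical) crawler
        score = similarity(hit_count(word), hit_count(q), hit_count(word, q))
        if score > best_score:
            best_text, best_score = q, score
    if best_text is not None and best_score > threshold:
        return best_text, best_score
    return None, best_score


# Fake hit counts standing in for crawled search-result numbers.
FAKE_HITS = {
    ("runny nose",): 1000,
    ("rhinorrhea",): 800,
    ("headache",): 5000,
    ("runny nose", "rhinorrhea"): 600,
    ("runny nose", "headache"): 100,
}


def fake_hit_count(*terms: str) -> int:
    return FAKE_HITS.get(tuple(terms), 0)


result = match_symptom("runny nose", ["rhinorrhea", "headache"], fake_hit_count, 0.3)
```

Passing the similarity as a parameter keeps the thresholding logic independent of whichever hit-count formula is actually used.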
Preferably, the stacked sparse autoencoder is formed by connecting two layers of simple autoencoders, and the hidden layer dimensions of the two layers of simple autoencoders are 8 dimensions and 4 dimensions respectively.
Preferably, performing characterization learning on the structured data with the stacked sparse autoencoder comprises the following:
the loss function of the stacked sparse autoencoder is the mean squared error, and the sparsity constraint is introduced by adding an L2 regularization term, as shown in the following formula (2):
the hidden-layer activation function is the ReLU function shown in the following formula (3):
f(x)=max(0,x) (3)
the reconstruction-layer activation function is the Softplus function shown in the following formula (4):
f(x)=log(1+e^x) (4)
where J is the loss function of the model, x_i is the i-th vector input to the model, N is the number of input data, f and g are the deep neural networks of the encoding stage and the decoding stage of the autoencoder, respectively, α is the regularization coefficient, and w denotes each parameter in the model.
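As an illustration of these definitions, the following NumPy sketch evaluates one plausible form of the loss, J = (1/N) Σ_i ||x_i − g(f(x_i))||² + α Σ w². Formula (2) itself is not reproduced in this text, so the exact scaling, and the choice to penalize only the weight matrices and not the biases, are assumptions. The 8- and 4-dimensional hidden layers follow the preferred embodiment; the input width of 12 and all function names are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(0.0, z)            # formula (3)


def softplus(z):
    return np.logaddexp(0.0, z)          # formula (4), log(1 + e^z), numerically stable


def autoencoder_loss(X, W_enc, b_enc, W_dec, b_dec, alpha):
    """Loss of one simple autoencoder layer; X has shape (N, d).

    Assumed form: mean squared reconstruction error plus alpha * sum of
    squared weights (biases are not penalized in this sketch).
    """
    H = relu(X @ W_enc + b_enc)                      # encoder f
    X_hat = softplus(H @ W_dec + b_dec)              # decoder g
    mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
    l2 = alpha * (np.sum(W_enc ** 2) + np.sum(W_dec ** 2))
    return mse + l2, H


rng = np.random.default_rng(0)
X = rng.random((5, 12))                              # 5 patients, 12 structured features

# First simple autoencoder: 12 -> 8 -> 12
W1e, b1e = rng.normal(scale=0.1, size=(12, 8)), np.zeros(8)
W1d, b1d = rng.normal(scale=0.1, size=(8, 12)), np.zeros(12)
loss1, H1 = autoencoder_loss(X, W1e, b1e, W1d, b1d, alpha=1e-3)

# Second simple autoencoder, stacked on the first code: 8 -> 4 -> 8
W2e, b2e = rng.normal(scale=0.1, size=(8, 4)), np.zeros(4)
W2d, b2d = rng.normal(scale=0.1, size=(4, 8)), np.zeros(8)
loss2, H2 = autoencoder_loss(H1, W2e, b2e, W2d, b2d, alpha=1e-3)
```

In a trained model, `H2` would play the role of the low-dimensional symptom or medication vector; here the weights are random, so only the shapes and the loss computation are meaningful.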
Preferably, analyzing the low-dimensional representation of the patient symptom data and the low-dimensional representation of the medication information data with a clustering algorithm to obtain the symptom characteristics and the medication characteristics of the patients in each cluster comprises:
taking the sum of squared errors (SSE) as the core index and the symptom vectors of all patients as the training set, and obtaining the optimal number of clusters with the heuristic elbow rule;
concatenating the low-dimensional symptom vector of the patient symptom data and the low-dimensional medication vector of the medication information data into a joint vector; treating the joint vectors of all patients as one cluster and dividing it into two clusters with the K-Means algorithm; calculating the SSE of the two clusters; and repeatedly dividing the cluster with the larger SSE into two clusters with the K-Means algorithm until the optimal number of clusters is reached;
and collecting statistics on the original data of the patients in each resulting cluster to obtain the symptom characteristics and the corresponding medication characteristics of the patients in each cluster.
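The splitting procedure above is a form of bisecting K-Means, which can be sketched with NumPy alone. The sketch assumes that, once more than two clusters exist, the cluster with the largest SSE among all current clusters is split next, and it uses a deterministic farthest-point initialisation for each 2-means split; both choices, and the toy data, are assumptions made for illustration.

```python
import numpy as np


def sse(points):
    """Sum of squared distances of the points to their mean."""
    return float(np.sum((points - points.mean(axis=0)) ** 2))


def two_means(points, n_iter=50):
    """Split points into two clusters with K-Means (K = 2).

    Assumes the points are not all identical.
    """
    c0 = points[np.argmax(np.linalg.norm(points - points.mean(axis=0), axis=1))]
    c1 = points[np.argmax(np.linalg.norm(points - c0, axis=1))]
    centers = np.stack([c0, c1])
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.stack([points[labels == j].mean(axis=0) for j in (0, 1)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def bisecting_kmeans(points, n_clusters):
    """Return a list of index arrays, one per cluster."""
    clusters = [np.arange(len(points))]
    while len(clusters) < n_clusters:
        # pick the cluster with the largest SSE and split it in two
        worst = max(range(len(clusters)), key=lambda i: sse(points[clusters[i]]))
        idx = clusters.pop(worst)
        labels = two_means(points[idx])
        clusters += [idx[labels == 0], idx[labels == 1]]
    return clusters


# Toy joint vectors: symptom part and medication part concatenated per patient.
symptom = np.array([[0], [0], [1], [10], [10], [11], [30], [30], [31]], dtype=float)
medication = np.array([[0], [1], [0], [0], [1], [0], [0], [1], [0]], dtype=float)
joint = np.hstack([symptom, medication])
clusters = bisecting_kmeans(joint, n_clusters=3)
```

Each returned index array can then be used to look up the original, pre-encoding records of the patients in that cluster for the statistics step.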
Preferably, obtaining the optimal number of clusters with the heuristic elbow rule, taking the sum of squared errors as the core index and the symptom vectors of all patients as the training set, specifically comprises:
clustering the symptom vectors of all patients under different numbers of clusters, with the sum of squared errors of the following formula (5) as the core index; calculating the SSE value with the symptom vector of each patient as a sample point; plotting the SSE value against the number of clusters; and taking the elbow of the curve, i.e. the number of clusters at the point of highest curvature, as the optimal number of clusters;
where u is a sample point, C is the set of clusters produced by the partition and its size is the number of clusters, C_i denotes the i-th cluster, and m_i is the mean of all samples in C_i.
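A minimal sketch of the elbow computation follows, assuming the usual definition SSE = Σ_i Σ_{u ∈ C_i} ||u − m_i||², which matches the symbol definitions above. The text picks the elbow by inspecting the plotted curve; the sketch substitutes the largest discrete second difference as a proxy for "highest curvature", which is an assumption and can disagree with visual inspection on some data. The K-Means implementation and its initialisation are illustrative, not the patent's.

```python
import numpy as np


def kmeans_sse(points, k, n_iter=100):
    """Run a simple K-Means and return SSE = sum_i sum_{u in C_i} ||u - m_i||^2."""
    # deterministic farthest-point initialisation (an illustrative choice)
    centers = [points[np.argmax(np.linalg.norm(points - points.mean(axis=0), axis=1))]]
    while len(centers) < k:
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.stack(centers)
    for _ in range(n_iter):
        labels = np.linalg.norm(points[:, None] - centers[None], axis=2).argmin(axis=1)
        new_centers = np.stack([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = np.linalg.norm(points[:, None] - centers[None], axis=2).argmin(axis=1)
    return float(np.sum((points - centers[labels]) ** 2))


def elbow(ks, sse_values):
    """Pick the k with the largest second difference of the SSE curve."""
    curvature = np.diff(np.asarray(sse_values, dtype=float), n=2)
    return ks[int(np.argmax(curvature)) + 1]


symptom_vectors = np.array(
    [[0, 0], [0, 1], [1, 0], [10, 0], [10, 1], [11, 0], [30, 0], [30, 1], [31, 0]],
    dtype=float,
)
ks = [1, 2, 3, 4, 5]
sse_curve = [kmeans_sse(symptom_vectors, k) for k in ks]
best_k = elbow(ks, sse_curve)
```

Plotting `sse_curve` against `ks` and inspecting it by eye, as the text describes, remains the reference procedure; the second-difference rule is only a convenient automation.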
Preferably, performing canonical correlation analysis on the symptom characteristics and the medication characteristics of the patients in each cluster to obtain the association between them comprises:
for the sample set of symptom characteristics X ∈ R^{r×n} and the sample set of medication characteristics Y ∈ R^{s×n} belonging to the same cluster, normalizing the data to zero mean and unit variance, where r and s are the dimensions of the symptom characteristics and the medication characteristics, respectively;
selecting several groups of linearly uncorrelated projection vectors for the two sample sets, the vectors in each group being a ∈ R^r and b ∈ R^s, and projecting X and Y onto X′ and Y′, respectively, i.e. X′ = a^T X, Y′ = b^T Y; the optimization objective is to maximize the correlation coefficient ρ between X′ and Y′; solving this constrained optimization problem with the Lagrangian function shown in the following formula (6) yields several groups of linear combinations and the corresponding correlation coefficients, which serve as the association between the symptom characteristics and the medication characteristics of the patients in each cluster, where X′ ∈ R^{v×n}, Y′ ∈ R^{v×n}, and v is the number of linear combinations:
where S_XY = cov(X, Y).
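The canonical correlation step can be sketched in NumPy. Instead of writing out the Lagrangian of formula (6), which is not reproduced in this text, the sketch uses the equivalent textbook route of whitening both covariance matrices and taking a singular value decomposition; the function names, the small eigenvalue floor `eps`, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def standardize(M):
    """Zero mean, unit variance per row (rows are features, columns are patients)."""
    return (M - M.mean(axis=1, keepdims=True)) / M.std(axis=1, ddof=1, keepdims=True)


def inv_sqrt(S, eps=1e-8):
    """Inverse matrix square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, eps))) @ vecs.T


def cca(X, Y, v):
    """X: (r, n), Y: (s, n). Returns A (r, v), B (s, v) and correlations rho (v,)."""
    Xs, Ys = standardize(X), standardize(Y)
    n = Xs.shape[1]
    S_xx, S_yy, S_xy = Xs @ Xs.T / (n - 1), Ys @ Ys.T / (n - 1), Xs @ Ys.T / (n - 1)
    K_x, K_y = inv_sqrt(S_xx), inv_sqrt(S_yy)
    U, rho, Vt = np.linalg.svd(K_x @ S_xy @ K_y)
    return K_x @ U[:, :v], K_y @ Vt.T[:, :v], rho[:v]


rng = np.random.default_rng(0)
z = rng.normal(size=200)                                   # shared latent factor
X = np.vstack([z + 0.1 * rng.normal(size=200), rng.normal(size=200)])
Y = np.vstack([rng.normal(size=200), z + 0.1 * rng.normal(size=200), rng.normal(size=200)])

A, B, rho = cca(X, Y, v=2)
X_proj = A.T @ standardize(X)                              # X' in R^{v x n}
Y_proj = B.T @ standardize(Y)                              # Y' in R^{v x n}
```

The columns of `A` and `B` correspond to the projection vectors a and b, and `rho` holds the correlation coefficient of each pair of projections in decreasing order.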
Preferably, predicting the recommended medication with the distance-weighted-average K-nearest-neighbor algorithm according to the association comprises:
using the K-nearest-neighbor algorithm to find the K neighboring groups of projection vectors in the cluster to which the sample belongs, with distances computed according to the following formula (7):
obtaining the k groups of projection vectors adjacent to the validation patient sample x_a, X′ = {x_1′, x_2′, ..., x_k′}, X′ ∈ R^{v×k}, where v, the number of linear combinations, is the dimension of the canonical correlation vectors; retrieving through X′ the corresponding original chief-complaint data X and medication data Y that have not been processed by the autoencoder; calculating the average medication value by inverse-distance weighting of the nearest neighbors according to the following formula (8); and determining whether the medication data Y is recommended according to the relationship between the average medication value and a second set threshold:
where the correlation coefficients are processed so that they sum to 1, each projection vector is assigned an inverse-distance weight, D_i is the distance between the patient sample x_a and the i-th projection vector, and Y_i is the i-th original medication data.
Preferably, determining whether the medication data Y is recommended according to the relationship between the average medication value and the second set threshold comprises: if the average medication value is greater than the second set threshold, recommending the medication data Y and setting the medication result to 1; otherwise, not recommending the medication data Y and setting the medication result to 0, wherein the second set threshold is obtained by training on a training set.
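The prediction step can be sketched as below. Formulas (7) and (8) are not reproduced in this text, so the sketch assumes a Euclidean distance weighted per dimension by the normalized correlation coefficients, and an average of the neighbors' medication data weighted by normalized inverse distances. Both forms, together with the function name `recommend` and the toy values, are assumptions consistent with the symbol descriptions rather than the patent's exact formulas.

```python
import numpy as np


def recommend(x_a, proj, med, rho, k, threshold, eps=1e-12):
    """Recommend medications for one patient.

    x_a:  (v,)   projected vector of the patient under evaluation
    proj: (m, v) projected vectors of the other patients in the same cluster
    med:  (m, d) original 0/1 medication data of those patients
    rho:  (v,)   canonical correlation coefficients
    """
    rho_norm = rho / rho.sum()                               # coefficients sum to 1
    dist = np.sqrt((((proj - x_a) ** 2) * rho_norm).sum(axis=1))  # assumed form of (7)
    nearest = np.argsort(dist)[:k]                           # the k nearest neighbors
    weights = 1.0 / (dist[nearest] + eps)                    # inverse-distance weights
    weights /= weights.sum()
    average = weights @ med[nearest]                         # assumed form of (8)
    return (average > threshold).astype(int), average


x_a = np.array([0.0, 0.0])
proj = np.array([[1.0, 0.0], [0.0, 2.0], [10.0, 10.0]])
med = np.array([[1, 0], [0, 1], [1, 1]], dtype=float)
rho = np.array([0.5, 0.5])

decision, average = recommend(x_a, proj, med, rho, k=2, threshold=0.5)
```

With these toy values the two nearest neighbors receive weights of about 2/3 and 1/3, so only the first medication exceeds the 0.5 threshold; in the method described above the threshold would be learned from a training set rather than fixed by hand.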
As can be seen from the technical scheme above, the medication recommendation method based on patient characterization learning performs feature extraction, structuring and dimensionality reduction on electronic medical record data, analyzes the representations of patient symptoms and medication information with a clustering algorithm, establishes the association between chief-complaint symptoms and medication with canonical correlation analysis, and predicts medication recommendations for patients with a distance-weighted-average K-nearest-neighbor algorithm, providing reference opinions for doctors during diagnosis.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flow chart of a medication recommendation method based on patient characterization learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a medication recommendation method based on patient characterization learning according to this embodiment;
FIG. 3 is a schematic diagram illustrating a medication recommendation method based on patient characterization learning according to this embodiment;
FIG. 4 is a diagram illustrating a structure of a stacked sparse self-encoder according to the present embodiment;
FIG. 5 is a schematic diagram of a sample information extraction;
FIG. 6 is a graph of symptom characteristics and medication characteristics results for patient clusters;
FIG. 7 is a graph showing how the accuracy, precision and F1 score of the method on the allergic rhinitis test set vary with the threshold.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For the convenience of understanding of the embodiments of the present invention, the following description will be further explained by taking specific embodiments as examples with reference to the drawings, and the embodiments of the present invention are not limited thereto.
Examples
Fig. 1 is a schematic flow chart of a medication recommendation method based on patient characterization learning according to an embodiment of the present invention, fig. 2 and fig. 3 are schematic diagrams of a medication recommendation method based on patient characterization learning according to an embodiment of the present invention, and referring to fig. 1, fig. 2 and fig. 3, the method includes the following steps:
S1: extracting data from the electronic medical record.
The extracted data includes unstructured complaint text information and structured data.
S2: representing the unstructured chief-complaint text information in the data as structured data.
Since the electronic medical record contains a large amount of unstructured text information, in order to mine the value of the unstructured text information, the unstructured text information needs to be converted into structured data which can be utilized by a computer.
There are now more sophisticated segmentation tools in the field of Chinese natural language processing, such as Jieba Chinese segmentation ("Jieba" Chinese segmentation), THULAC (THU Lexical Analyzer for Chinese, Qinghua university) and ICTCCLAS (Institute of Computing Technology, Chinese Lexical Analysis System, Chinese academy of sciences). However, these general word segmentation tools cannot segment the medical complaint text accurately because some medical words are professional. The content of the word segmentation of the unstructured complaint text information in this embodiment specifically includes: processing unstructured main complaint text information based on a word segmentation tool, calculating mutual information values among words, identifying fixed matched words in the main complaint text according to the mutual information values, and constructing a custom dictionary, thereby completing word segmentation work of the main complaint. Specifically, after removing data missing from the chief complaint information in the original data set, in this embodiment, firstly, a basic Jieba word segmentation tool is used to process the chief complaint information therein, an accurate mode is selected, and the text is most accurately segmented; then, the mutual information value of each adjacent word is calculated, the words are sorted from big to small, a proper fixed collocation new word is selected by setting a threshold value, the new word is added into a user-defined dictionary, and the main complaint text is segmented again. The operation process is as follows:
data cleaning is performed on the original electronic medical record data set, and the chief complaint text is segmented with the Jieba word segmentation tool, so that each chief complaint text yields a list of words;
nodes are constructed from the segmentation results using a 2-gram (bigram) model, the results are stored in a Trie (dictionary tree), and word frequencies are counted;
for two words X and Y, where P(X) and P(Y) are the probabilities of the two words and P(X, Y) is the probability that the two words appear adjacently, the mutual information value is calculated according to the following formula (1):
the mutual information values are sorted in descending order, a threshold is set, and suitable fixed-collocation new words are selected. A custom dictionary is built from the new words, and the chief complaint text is segmented again.
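As an illustration of the mutual information step, the following Python sketch scores adjacent word pairs in text that has already been segmented. It assumes the common pointwise form log2(P(X, Y) / (P(X) P(Y))) for the mutual information value, estimates the probabilities from raw counts, and uses a `Counter` in place of the Trie described above. The function names `adjacent_pmi` and `select_new_words` are hypothetical.

```python
import math
from collections import Counter


def adjacent_pmi(segmented_texts):
    """Score every adjacent word pair by pointwise mutual information.

    segmented_texts: list of word lists, one list per chief complaint.
    Assumption: MI(X, Y) = log2(P(X, Y) / (P(X) * P(Y))).
    """
    word_counts = Counter()
    pair_counts = Counter()
    for words in segmented_texts:
        word_counts.update(words)
        pair_counts.update(zip(words, words[1:]))  # 2-gram (bigram) counts
    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())
    scores = {}
    for (x, y), n_xy in pair_counts.items():
        p_x = word_counts[x] / total_words
        p_y = word_counts[y] / total_words
        p_xy = n_xy / total_pairs
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


def select_new_words(segmented_texts, threshold):
    """Return pairs whose score reaches the threshold, highest score first."""
    scores = adjacent_pmi(segmented_texts)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [pair for pair, score in ranked if score >= threshold]
```

The selected pairs would then be joined into single words and added to the custom dictionary before the text is segmented again.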
Information is then extracted from the segmentation result: each segmented word is compared one by one with the standard texts in the hospital's symptom library; if the word matches, extraction is complete; if it does not match, the symptom text corresponding to the word is found with a search-engine-based word similarity calculation, yielding structured data.
In this embodiment, the search-engine-based word similarity calculation uses Baidu search, and the specific operation steps are as follows:
for a processed word $p$ and a symptom library standard text $q \in Q$, where $Q$ is the set of texts in the symptom library related to $p$, a crawler obtains the number of search results returned by the page when the two words are searched separately and together, recorded as $N(p)$, $N(q)$ and $N(p \wedge q)$;
the word similarity is calculated according to the following formula (2):
the similarity between the word $p$ and every standard text in $Q$ is calculated in turn, and if the maximum similarity exceeds a first set threshold, the word $p$ is assigned to the corresponding standard text, yielding structured data.
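The matching procedure can be sketched in Python as follows. The similarity formula (2) is not shown in this text, so the sketch substitutes a Jaccard-style ratio of hit counts purely as an assumption. The callable `hit_count` is a hypothetical stand-in for the crawler that returns search result counts, and `map_to_standard` is a hypothetical name.

```python
def cooccurrence_similarity(n_p, n_q, n_pq):
    """Assumed stand-in for formula (2): shared hits over combined hits."""
    combined = n_p + n_q - n_pq
    return n_pq / combined if combined > 0 else 0.0


def map_to_standard(word, standard_texts, hit_count, threshold):
    """Map a segmented word to a standard symptom text, or None if no match.

    hit_count(a) returns the number of results for a alone;
    hit_count(a, b) returns the number of results for a and b together.
    """
    if word in standard_texts:
        return word  # exact match: extraction is complete
    best_text, best_sim = None, 0.0
    for q in standard_texts:
        sim = cooccurrence_similarity(
            hit_count(word), hit_count(q), hit_count(word, q)
        )
        if sim > best_sim:
            best_text, best_sim = q, sim
    # Accept only when the maximum similarity exceeds the first set threshold.
    return best_text if best_sim > threshold else None
```

Passing the hit counts in through a callable keeps the network access separate from the decision rule, so the rule can be checked with fixed counts.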
S3: Characterization learning is performed on the structured data with a stacked sparse self-encoder (autoencoder) to obtain a low-dimensional symptom vector representing the patient symptom data and a low-dimensional medication vector representing the medication information data.
The loss function of the stacked sparse self-encoder is the mean square error, and the sparsity constraint is introduced by adding an L2 regularization term, as shown in the following formula (3):
The hidden layer activation function is the ReLU function shown in the following formula (4):
$f(x)=\max(0,x)$ (4)
The reconstruction layer activation function is the Softplus function shown in the following formula (5):
$f(x)=\log(1+e^{x})$ (5)
where $J$ is the loss function of the model, $x_i$ is the $i$-th vector input to the model, $N$ is the number of input data, $f$ and $g$ are the deep neural networks of the encoding stage and the decoding stage of the self-encoder, respectively, $\alpha$ is the regularization coefficient, and $w$ denotes each parameter in the model.
The self-encoder reconstructs the original input data through the learning of its hidden layer and thereby learns a compressed low-dimensional representation of the input, with the error between the input data and the output data minimized. The stacked sparse self-encoder is an improved self-encoder model in which several self-encoders are stacked and learn layer by layer, so that more complex codes and deeper features of the original input data can be learned. A regularization term is added to suppress neurons and to prevent the network from memorizing the data and over-fitting. Fig. 4 is a schematic structural diagram of the stacked sparse self-encoder of this embodiment. Referring to Fig. 4, the stacked sparse self-encoder is formed by connecting two simple self-encoders whose hidden layers have 8 and 4 dimensions, respectively.
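The loss and activation functions can be illustrated with a small pure-Python sketch. It assumes the loss takes the form of the mean squared reconstruction error plus α times the sum of squared weights, which matches the symbol definitions but is not confirmed by the text, and it shows a single encoder and decoder layer rather than the two stacked layers of this embodiment. All function names are hypothetical.

```python
import math


def relu(x):
    """Hidden layer activation, formula (4)."""
    return max(0.0, x)


def softplus(x):
    """Reconstruction layer activation, formula (5), written to avoid overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dense(vec, weights, biases, activation):
    """One fully connected layer: activation(W @ vec + b)."""
    return [
        activation(sum(w * v for w, v in zip(row, vec)) + b)
        for row, b in zip(weights, biases)
    ]


def loss(inputs, encoder, decoder, alpha):
    """Assumed form: mean squared reconstruction error + alpha * sum(w^2)."""
    (w_enc, b_enc), (w_dec, b_dec) = encoder, decoder
    total_error = 0.0
    for x in inputs:
        hidden = dense(x, w_enc, b_enc, relu)            # f: encoding stage
        x_hat = dense(hidden, w_dec, b_dec, softplus)    # g: decoding stage
        total_error += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    penalty = sum(w * w for row in w_enc + w_dec for w in row)
    return total_error / len(inputs) + alpha * penalty
```

In practice the weights would be trained by gradient descent in a deep learning framework; the sketch only evaluates the objective for given weights.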
S4: The low-dimensional representations of the patient symptom data and of the medication information data are analyzed with a clustering algorithm to obtain the symptom characteristics and medication characteristics of the patients in each population cluster.
After processing by the self-encoder, cluster analysis is performed on the low-dimensional symptom data and the low-dimensional medication data, and the clustering result is used to build patient group portraits. The original data of the patients in each population cluster are then analyzed according to the clustering result. The analysis focuses on the symptom characteristics and medication characteristics of the patients in each cluster, and the group portraits also provide a reference and basis for the subsequent medication recommendation.
With the sum of squared errors (SSE) as the core index and the symptom vectors of all patients as the training set, the optimal number of clusters is obtained with the heuristic elbow rule, specifically as follows:
the symptom vectors of all patients are clustered with different numbers of clusters; with the sum of squared errors of the following formula (6) as the core index, the SSE value is calculated using each patient's symptom vector as a sample point; the SSE value is plotted against the number of clusters, and the elbow of the curve, i.e. the number of clusters at the point of highest curvature, is taken as the optimal number of clusters:
$$SSE = \sum_{i=1}^{c} \sum_{u \in C_i} \lVert u - m_i \rVert^{2} \qquad (6)$$
where $u$ is a sample point, $C$ is the set of clusters in the cluster partition, $c$ is the number of clusters in the partition, $C_i$ denotes the $i$-th cluster, and $m_i$ is the mean of all samples in $C_i$.
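A minimal Python sketch of the SSE index and of one way to locate the elbow automatically is given below. The text describes reading the elbow off the plotted curve; the largest second difference used here is only an assumed numerical proxy for the point of highest curvature, and it presumes consecutive cluster numbers. The function names are hypothetical.

```python
def sse(clusters):
    """Sum over clusters of squared distances from each sample to its cluster mean."""
    total = 0.0
    for points in clusters:
        dim = len(points[0])
        mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        total += sum((p[d] - mean[d]) ** 2 for p in points for d in range(dim))
    return total


def elbow(sse_by_k):
    """Return the cluster number where the SSE curve bends most sharply.

    sse_by_k: dict mapping a cluster number k to the SSE obtained with k clusters.
    The first and last k cannot be chosen because they have no neighbor on one side.
    """
    ks = sorted(sse_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = sse_by_k[prev_k] - 2 * sse_by_k[k] + sse_by_k[next_k]
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k
```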
The low-dimensional symptom vector of the patient symptom data and the low-dimensional medication vector of the medication information data are concatenated into a combined vector. The combined vectors of all patients initially form a single cluster, which is divided into two clusters with the K-Means clustering algorithm. The SSE value of each of the two clusters is calculated, and the cluster with the larger SSE value is again divided into two clusters with the K-Means algorithm; this is repeated until the optimal number of clusters is reached;
the original data of the patients in each resulting cluster are then summarized to obtain the symptom characteristics and the corresponding medication characteristics of the patients in each population cluster.
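The repeated two-way splitting described above can be sketched in plain Python as follows. This is an illustrative version under stated assumptions: the two starting centers are chosen deterministically (the first point and the point farthest from it), which the text does not specify, and the function names are hypothetical.

```python
def mean(points):
    return [sum(coords) / len(points) for coords in zip(*points)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_sse(points):
    center = mean(points)
    return sum(sq_dist(p, center) for p in points)


def two_means(points, iterations=20):
    """Split one cluster into two with K-Means (k = 2)."""
    first = points[0]
    second = max(points, key=lambda p: sq_dist(p, first))
    centers = [first, second]
    groups = None
    for _ in range(iterations):
        trial = ([], [])
        for p in points:
            nearer = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            trial[nearer].append(p)
        if not trial[0] or not trial[1]:
            break  # keep the last split in which both halves were non-empty
        groups = trial
        centers = [mean(trial[0]), mean(trial[1])]
    return list(groups) if groups else [list(points)]


def bisecting_kmeans(points, n_clusters):
    """Start with one cluster and keep splitting the cluster with the largest SSE."""
    clusters = [list(points)]
    while len(clusters) < n_clusters:
        worst = max(clusters, key=cluster_sse)
        parts = two_means(worst)
        if len(parts) < 2:
            break  # the cluster cannot be split any further
        clusters.remove(worst)
        clusters.extend(parts)
    return clusters
```

Here `points` would hold the combined symptom and medication vectors, and `n_clusters` the optimal cluster number found with the elbow rule.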
S5: Canonical correlation analysis is performed on the symptom characteristics and medication characteristics of the patients in each population cluster to obtain the association between the symptom characteristics and the medication characteristics of the patients in each cluster.
This specifically comprises the following steps:
the sample set of symptom characteristics $X \in \mathbb{R}^{r \times n}$ and the sample set of medication characteristics $Y \in \mathbb{R}^{s \times n}$ within the same population cluster are normalized to mean 0 and variance 1, where $r$ and $s$ are the dimensions of the symptom characteristics and the medication characteristics, respectively;
several sets of linearly uncorrelated projection vectors are selected for the two sample sets; with the vectors in each set denoted $a \in \mathbb{R}^{r}$ and $b \in \mathbb{R}^{s}$, $X$ and $Y$ are projected onto $X'$ and $Y'$, i.e. $X' = a^{T}X$, $Y' = b^{T}Y$. The optimization target is to maximize the correlation coefficient $\rho$ of $X'$ and $Y'$. Letting $S_{XY} = \mathrm{cov}(X, Y)$ (and likewise $S_{XX}$ and $S_{YY}$), the criterion function can be written as:
$$\rho = \frac{a^{T} S_{XY} b}{\sqrt{a^{T} S_{XX} a}\,\sqrt{b^{T} S_{YY} b}}$$
Because the value of this expression does not change when the numerator and the denominator are scaled by the same factor, the problem is converted into maximizing the numerator with the value of the denominator fixed, as shown in the following formula (7):
$$\max_{a,b}\; a^{T} S_{XY} b$$
$$\text{s.t.}\quad a^{T} S_{XX} a = 1,\quad b^{T} S_{YY} b = 1 \qquad (7)$$
the constrained optimization problem for the maximum correlation coefficient $\rho$ is solved with the Lagrangian function shown in the following formula (8), yielding several sets of linear combinations and the corresponding correlation coefficients as the association between the symptom characteristics and the medication characteristics of the patients in each population cluster; here $X' \in \mathbb{R}^{v \times n}$, $Y' \in \mathbb{R}^{v \times n}$, and $v$ is the number of linear combinations:
where $S_{XY} = \mathrm{cov}(X, Y)$.
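For orientation, the textbook Lagrangian treatment of a problem of the form (7) is sketched below. The multiplier names $\lambda$ and $\theta$ and the factors of one half are notational assumptions and may differ from formula (8), and $S_{XX}$ and $S_{YY}$ are assumed to be invertible.

```latex
\begin{aligned}
L(a, b, \lambda, \theta) &= a^{T} S_{XY} b
  - \frac{\lambda}{2}\left(a^{T} S_{XX} a - 1\right)
  - \frac{\theta}{2}\left(b^{T} S_{YY} b - 1\right) \\
\frac{\partial L}{\partial a} = 0 \;&\Rightarrow\; S_{XY}\, b = \lambda\, S_{XX}\, a \\
\frac{\partial L}{\partial b} = 0 \;&\Rightarrow\; S_{YX}\, a = \theta\, S_{YY}\, b \\
\lambda = \theta &= a^{T} S_{XY} b = \rho \\
S_{XX}^{-1} S_{XY} S_{YY}^{-1} S_{YX}\, a &= \rho^{2}\, a
\end{aligned}
```

Under these assumptions, the projection vectors $a$ are eigenvectors of $S_{XX}^{-1} S_{XY} S_{YY}^{-1} S_{YX}$, the square roots of the eigenvalues are the correlation coefficients, and the $v$ largest eigenvalues give the $v$ linear combinations.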
S6: According to the association, the recommended medication is predicted with a K-nearest-neighbor algorithm using distance-weighted averaging.
Using the K-nearest-neighbor algorithm, the k sets of projection vectors adjacent to the sample within the cluster to which it belongs are found according to the distance formula of the following formula (9):
The k sets of projection vectors $X'$ adjacent to the patient sample $x_a$ to be assessed are obtained, $X' = \{x_1', x_2', \ldots, x_k'\}$, $X' \in \mathbb{R}^{v \times k}$, where $v$ is the number of linear combinations, i.e. the dimension of the canonical correlation vector. Through $X'$, the corresponding original chief complaint data $X$ and medication data $Y$, which have not been processed by the self-encoder, are obtained, and the medication average is calculated by nearest-neighbor inverse distance weighting according to the following formula (10). Whether the medication data $Y$ is recommended is determined by the relationship between the medication average and a second set threshold:
where the weighting coefficients are normalized so that they sum to 1, each coefficient being the inverse-distance weight of the $i$-th projection vector, $D_i$ is the distance between the patient sample $x_a$ and the $i$-th projection vector, and $Y_i$ is the $i$-th original medication data.
The specific judging steps are as follows: it is determined whether the medication average is larger than the second set threshold; if so, the medication data $Y$ is recommended and the medication result is set to 1; if not, the medication data $Y$ is not recommended and the medication result is set to 0. The second set threshold is obtained by training on the training set.
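The prediction step can be sketched in Python as follows. Formulas (9) and (10) are not shown in this text, so the sketch assumes Euclidean distance and weights proportional to 1/D_i, normalized to sum to 1. The function name `recommend` and the small `eps` guard against division by zero are illustrative additions.

```python
import math


def recommend(query, neighbors, k, threshold):
    """Inverse-distance-weighted K-nearest-neighbor medication prediction.

    query: canonical correlation vector of the patient sample.
    neighbors: list of (projection_vector, medication_vector) pairs taken
        from the cluster to which the sample belongs.
    Returns (scores, decisions), one entry per drug.
    """
    nearest = sorted(neighbors, key=lambda item: math.dist(query, item[0]))[:k]
    eps = 1e-12  # guards against a neighbor that coincides with the query
    inverse = [1.0 / (math.dist(query, vec) + eps) for vec, _ in nearest]
    total = sum(inverse)
    weights = [value / total for value in inverse]  # normalized to sum to 1
    n_drugs = len(nearest[0][1])
    scores = [
        sum(w * meds[j] for w, (_, meds) in zip(weights, nearest))
        for j in range(n_drugs)
    ]
    # Strictly greater than the threshold, as in the judging step above.
    decisions = [1 if s > threshold else 0 for s in scores]
    return scores, decisions
```

Closer neighbors contribute more to each drug's score, so a drug prescribed to the most similar patients can be recommended even when more distant neighbors did not receive it.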
A concrete example based on real data of patients diagnosed with allergic rhinitis in the otorhinolaryngology outpatient department of a hospital in Beijing is as follows:
The electronic medical records used in this embodiment come from real data of patients diagnosed with allergic rhinitis in the otorhinolaryngology outpatient department of a hospital in Beijing, and consist of three parts: basic information, chief complaint information and outpatient information. The basic information includes the patient number, the number of visits and the patient's gender. The chief complaint information is a concise summary made by the doctor of the patient's symptoms and signs and of their nature, duration and severity. The outpatient information includes the visit time, the doctor's diagnosis and the medication orders. Allergic rhinitis is a common respiratory disease that is usually diagnosed from the patient's symptoms and medical history, with allergen testing performed only in rare cases. Therefore, in this electronic medical record data set, the chief complaint text receives the most attention, since it contains the key information on which the doctor bases the diagnosis and the medication orders.
Because the data set contains patients diagnosed with allergic rhinitis, the medical vocabulary used in the chief complaints is relatively fixed. After processing with the general word segmentation tool, the mutual information values between words are calculated, the fixed collocations in the chief complaint text are identified according to these values, and a custom dictionary is constructed, completing the segmentation of the chief complaints.
After word segmentation, information is extracted from the chief complaint text using the 16 symptom types given by the hospital as standard texts. The chief complaint is a doctor's concise description of the patient's symptoms and is highly specialized; however, because doctors' habits differ, descriptions of the same thing vary, sometimes slightly in wording and sometimes completely in expression. For example, "epistaxis" and "nasal bleeding" have the same meaning, as do "thin nasal discharge", "white nasal discharge" and "watery nasal discharge". To extract the information in the chief complaints more accurately and comprehensively, words with similar meanings are found through similarity calculation and classified together. Since the method deals with Chinese words, a word similarity calculation based on Baidu search is used, which treats the web as a corpus updated in real time and emphasizes the relatedness of word pairs. It relies mainly on the number of query results returned by the search engine.
The 16 symptom types given by the hospital are used as standard texts: nasal obstruction, watery nasal discharge, purulent nasal discharge, watery nasal discharge, nasal itching, nasal dryness, sneezing, post nasal surgery, headache, nasal hemorrhage, hyposmia, common cold, ear itching, postnasal drip, bloody nasal discharge, and itchy eyes. During information extraction, the segmented words of the chief complaint text are compared one by one: if a word matches one of the 16 symptom standards, extraction is complete; if it does not match, its similarity to each of the 16 symptom standards is calculated in turn, and if the maximum similarity exceeds the set threshold, the word is assigned to the corresponding standard. An example of information extraction is shown in Fig. 5. The patients' symptom information and medication information are all converted into structured information represented by the values 0 or 1. There are 3731 patient records in total, each containing 16-dimensional symptom information and 23-dimensional medication information.
All the structured chief complaint symptom and medication information is divided into a training set and a test set at a ratio of 8:2. Characterization learning is performed on the training set data with the stacked sparse self-encoder to obtain the low-dimensional symptom vector of the patient symptom data and the low-dimensional medication vector of the medication information data, where each patient's symptom vector is set to 4 dimensions and the medication vector to 5 dimensions. Cluster analysis is performed on the low-dimensional symptom data and the low-dimensional medication data, and the clustering result is used to build patient group portraits; 3 patient populations are obtained in total, and the symptom characteristics and medication characteristics of the 3 clusters are shown in Fig. 6. Finally, canonical correlation analysis is performed on the patient symptom data and medication data processed by the self-encoder to obtain the association between the patients' symptom characteristics and medication characteristics, and the first 3 pairs of canonical variables are taken to form a 3-dimensional canonical correlation vector.
Medication recommendation is then performed on the test set. The chief complaint of one patient is described as intermittent bilateral nasal obstruction, clear watery nasal discharge, nasal itching, continuous sneezing and hyposmia for several months. Through information extraction, the values of nasal obstruction, clear watery nasal discharge, nasal itching, sneezing and hyposmia are set to 1, and the values of the other symptoms are set to 0. The patient's actual medications are Renokote, Compound Xanthium sibiricum tablets, cis-Er Ning and Aiseiping. The symptom data are processed by the self-encoder to obtain the low-dimensional symptom vector (1.850080848, 2.189118862, 3.87420392, 2.021232367). Using the projection vectors obtained on the training set, the canonical correlation vector is calculated as (-3.701395327, -0.676103465, -3.337884659). The K-nearest-neighbor algorithm then finds the K sets of vectors in the training set results that are adjacent to this canonical correlation vector, from which the K sets of original sample information are obtained; here K is set to 24. The average of the 24 sets of medication data is calculated by nearest-neighbor inverse distance weighting, giving (0.6, 0.067, 1, 0, 0.2, 0.2, 0.533, 0, 0, 0, 0.133, 0.2, 0.4, 0, 0, 0, 0, 0, 0.067) as the medication scores of the test sample. If a drug's score is greater than or equal to the set threshold, the drug is recommended and its medication result is set to 1; otherwise it is not recommended and its medication result is set to 0. With the threshold set to 0.4, four drugs meet the condition, so the recommended drugs are Renokote, Compound Xanthium sibiricum tablets, cis-Er Ning and Aiseiping, which is consistent with the actual prescription.
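The thresholding in this example can be checked with a few lines of Python. The scores are copied from the example above; the mapping from positions to drug names is not given in the text, so only the positions of the recommended drugs are computed.

```python
# Weighted medication averages of the test sample, as listed in the example.
scores = [0.6, 0.067, 1, 0, 0.2, 0.2, 0.533, 0, 0, 0,
          0.133, 0.2, 0.4, 0, 0, 0, 0, 0, 0.067]
threshold = 0.4

# A drug is recommended (result 1) when its score is >= the threshold.
decisions = [1 if score >= threshold else 0 for score in scores]
recommended_positions = [i for i, d in enumerate(decisions) if d == 1]
print(recommended_positions)  # → [0, 2, 6, 12]
```

Four positions pass the threshold, matching the four recommended drugs in the example.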
The recommendations are compared with the actual results, and the variation of the accuracy (acc), precision (p) and F1 value on the test set with the threshold is shown in Fig. 7. At a threshold of 0.5, the test set accuracy is 90.15%, the precision is 84.86% and the F1 value is 0.4823, which shows that the method can accurately recommend most of the drugs for allergic rhinitis patients.
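The three evaluation measures can be computed as in the following Python sketch. The text does not state how the values are aggregated, so the sketch assumes that counts are pooled over all patient and drug entries. The function name `evaluate` is hypothetical.

```python
def evaluate(y_true, y_pred):
    """Accuracy, precision and F1 over pooled 0/1 medication results.

    y_true, y_pred: lists of rows, one row per patient, one 0/1 entry per drug.
    """
    tp = fp = tn = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1:
                fp += 1
            elif t == 1:
                fn += 1
            else:
                tn += 1
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, f1
```

Because most entries in a 23-dimensional medication vector are 0, accuracy can be high while F1 stays lower, since F1 also reflects drugs that were prescribed but not recommended.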
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A medication recommendation method based on patient characterization learning, comprising:
extracting data from the electronic medical record, wherein the data comprises unstructured complaint text information and structured data;
representing unstructured complaint text information in the data as structured data;
performing characterization learning on the structured data by adopting a stack sparse self-encoder to obtain a low-dimensional expression symptom vector of patient symptom data and a low-dimensional expression medication vector of medication information data;
analyzing the low-dimensional representation of the patient symptom data and the low-dimensional representation of the medication information data by using a clustering algorithm to obtain the symptom characteristics and the medication characteristics of the patients in each group cluster;
performing canonical correlation analysis on the symptom characteristics and the medication characteristics of the patients in each group cluster to obtain the association between the symptom characteristics and the medication characteristics of the patients in each group cluster;
and predicting recommended medication by adopting a weighted distance average K nearest neighbor algorithm according to the incidence relation.
2. The method according to claim 1, wherein the representing unstructured complaint text information in the data as structured data comprises:
performing word segmentation on the unstructured complaint text information: processing the unstructured main complaint text information based on a word segmentation tool, calculating mutual information values among words, identifying fixed matched words in the main complaint text according to the mutual information values, and constructing a self-defined dictionary so as to complete word segmentation work of the main complaint;
and extracting information of the result after word segmentation: and comparing the words after word segmentation processing with standard texts in a symptom library of a hospital one by one, if the words are matched, directly finishing extraction work, and if the words are not matched, searching the symptom word texts corresponding to the words after word segmentation based on a word similarity calculation method of a search engine to obtain structured data.
3. The method according to claim 2, wherein the search-engine-based word similarity calculation used to find the symptom text corresponding to the segmented word is a word similarity calculation based on Baidu search, and the specific operation steps are as follows:
for a processed word $p$ and a symptom library standard text $q \in Q$, where $Q$ is the set of texts in the symptom library related to $p$, a crawler obtains the number of search results returned by the page when the two words are searched separately and together, recorded as $N(p)$, $N(q)$ and $N(p \wedge q)$;
the word similarity is calculated according to the following formula (1):
the similarity between the word $p$ and every standard text in $Q$ is calculated in turn, and if the maximum similarity exceeds a first set threshold, the word $p$ is assigned to the corresponding standard text to obtain structured data.
4. The method according to claim 1, wherein the stacked sparse self-encoder is formed by connecting two layers of simple self-encoders, and hidden layer dimensions of the two layers of simple self-encoders are 8 dimensions and 4 dimensions respectively.
5. The method of claim 4, wherein the learning of the characterization of the structured data by using the stacked sparse self-encoder comprises:
the loss function of the stacked sparse self-encoder is mean square error, and the sparsity limit is introduced by adding an L2 regularization term, and the formula is shown as the following formula (2):
the hidden layer activation function is the ReLU function shown in the following formula (3):
$f(x)=\max(0,x)$ (3)
the reconstruction layer activation function is the Softplus function shown in the following formula (4):
$f(x)=\log(1+e^{x})$ (4)
where $J$ is the loss function of the model, $x_i$ is the $i$-th vector input to the model, $N$ is the number of input data, $f$ and $g$ are the deep neural networks of the encoding stage and the decoding stage of the self-encoder, respectively, $\alpha$ is the regularization coefficient, and $w$ denotes each parameter in the model.
6. The method of claim 1, wherein analyzing the low dimensional representation of the patient symptom data and the low dimensional representation of the medication information data using a clustering algorithm to obtain the symptom characteristic and the medication characteristic of the patient within each cluster of the population comprises:
taking the sum of squared errors SSE as a core index, taking symptom vectors of all patients as a training set, and obtaining the optimal clustering number by using a heuristic elbow rule;
combining symptom vectors expressed by patient symptom data in a low-dimensional manner and medication vectors expressed by medication information data in a low-dimensional manner to form a combined vector, taking the combined vector of all patients as a cluster, dividing the combined vector into two clusters by using a K-Means clustering algorithm, calculating SSE values of the two clusters, and continuously dividing a large cluster in the SSE values corresponding to the two clusters into the two clusters by using the K-Means algorithm until the optimal cluster number is reached;
and counting the obtained original data information of the patients in each cluster group to obtain the symptom characteristics and the corresponding medication characteristics of the patients in each cluster group.
7. The method according to claim 6, wherein the method uses a heuristic elbow rule to obtain the optimal cluster number by using the sum of squared errors as a core index and the symptom vectors of all patients as a training set, and specifically comprises:
clustering the symptom vectors of all patients with different numbers of clusters; with the sum of squared errors of the following formula (5) as the core index, calculating the SSE value using each patient's symptom vector as a sample point; plotting the SSE value against the number of clusters, and taking the elbow of the curve, i.e. the number of clusters at the point of highest curvature, as the optimal number of clusters:
$$SSE = \sum_{i=1}^{c} \sum_{u \in C_i} \lVert u - m_i \rVert^{2} \qquad (5)$$
where $u$ is a sample point, $C$ is the set of clusters in the cluster partition, $c$ is the number of clusters in the partition, $C_i$ denotes the $i$-th cluster, and $m_i$ is the mean of all samples in $C_i$.
8. The method of claim 1, wherein the performing canonical correlation analysis on the symptom characteristic and the medication characteristic of the patients in each cluster of groups to obtain the association between the symptom characteristic and the medication characteristic of the patients in each cluster of groups comprises:
normalizing the sample set of symptom characteristics $X \in \mathbb{R}^{r \times n}$ and the sample set of medication characteristics $Y \in \mathbb{R}^{s \times n}$ within the same group cluster to mean 0 and variance 1, wherein $r$ and $s$ are the dimensions of the symptom characteristics and the medication characteristics, respectively;
selecting several sets of linearly uncorrelated projection vectors for the two sample sets, with the vectors in each set denoted $a \in \mathbb{R}^{r}$ and $b \in \mathbb{R}^{s}$, and projecting $X$ and $Y$ onto $X'$ and $Y'$, i.e. $X' = a^{T}X$, $Y' = b^{T}Y$; the optimization target is to maximize the correlation coefficient $\rho$ of $X'$ and $Y'$; the constrained optimization problem for the maximum correlation coefficient $\rho$ is solved with the Lagrangian function shown in the following formula (6), yielding several sets of linear combinations and the corresponding correlation coefficients as the association between the symptom characteristics and the medication characteristics of the patients in each group cluster, where $X' \in \mathbb{R}^{v \times n}$, $Y' \in \mathbb{R}^{v \times n}$, and $v$ is the number of linear combinations:
wherein $S_{XY} = \mathrm{cov}(X, Y)$.
9. The method of claim 1, wherein said predicting recommended medication using a weighted distance average K-nearest neighbor algorithm based on said correlations comprises:
finding, with the K-nearest-neighbor algorithm, the k sets of projection vectors adjacent to the sample within the cluster to which it belongs according to the distance formula of the following formula (7):
obtaining the k sets of projection vectors $X'$ adjacent to the patient sample $x_a$ to be assessed, $X' = \{x_1', x_2', \ldots, x_k'\}$, $X' \in \mathbb{R}^{v \times k}$, where $v$ is the number of linear combinations, i.e. the dimension of the canonical correlation vector; obtaining through $X'$ the corresponding original chief complaint data $X$ and medication data $Y$ that have not been processed by the self-encoder; calculating the medication average by nearest-neighbor inverse distance weighting according to the following formula (8); and determining whether the medication data $Y$ is recommended according to the relationship between the medication average and a second set threshold:
10. The method of claim 1, wherein determining whether the medication data $Y$ is recommended according to the relationship between the medication average and the second set threshold includes: determining whether the medication average is larger than the second set threshold; if so, recommending the medication data $Y$ and setting the medication result to 1; if not, not recommending the medication data $Y$ and setting the medication result to 0; the second set threshold being obtained by training on a training set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110406631.3A CN113284627B (en) | 2021-04-15 | 2021-04-15 | Medication recommendation method based on patient characterization learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113284627A true CN113284627A (en) | 2021-08-20 |
CN113284627B CN113284627B (en) | 2024-05-17 |
Family
ID=77276853
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110406631.3A Active CN113284627B (en) | 2021-04-15 | 2021-04-15 | Medication recommendation method based on patient characterization learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113284627B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116344009A (en) * | 2023-05-22 | 2023-06-27 | 武汉盛博汇信息技术有限公司 | Online diagnosis notification method and device |
CN116884554A (en) * | 2023-09-06 | 2023-10-13 | 济宁蜗牛软件科技有限公司 | Electronic medical record classification management method and system |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107516110A (en) * | 2017-08-22 | 2017-12-26 | 华南理工大学 | A kind of medical question and answer Semantic Clustering method based on integrated convolutional encoding |
KR20200027091A (en) * | 2018-08-31 | 2020-03-12 | 주식회사 비플컨설팅 | A system that recommends diagnostic cases by deducing the degree of similarity using the artificial neural network technique for the patient's main symptom and diagnostic relationship |
CN110880361A (en) * | 2019-10-16 | 2020-03-13 | 平安科技(深圳)有限公司 | Personalized accurate medication recommendation method and device |
CN111462896A (en) * | 2020-03-31 | 2020-07-28 | 重庆大学 | Real-time intelligent auxiliary ICD coding system and method based on medical record |
CN111696678A (en) * | 2020-06-15 | 2020-09-22 | 中南大学 | Deep learning-based medication decision method and system |
KR20210009182A (en) * | 2019-07-16 | 2021-01-26 | (주)아이쿱 | Method for recommending diabetic medicine based on deep-learning |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116344009A (en) * | 2023-05-22 | 2023-06-27 | 武汉盛博汇信息技术有限公司 | Online diagnosis notification method and device |
CN116344009B (en) * | 2023-05-22 | 2023-08-15 | 武汉盛博汇信息技术有限公司 | Online diagnosis notification method and device |
CN116884554A (en) * | 2023-09-06 | 2023-10-13 | 济宁蜗牛软件科技有限公司 | Electronic medical record classification management method and system |
CN116884554B (en) * | 2023-09-06 | 2023-11-24 | 济宁蜗牛软件科技有限公司 | Electronic medical record classification management method and system |
Also Published As
Publication number | Publication date |
---|---|
CN113284627B (en) | 2024-05-17 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN111414393B (en) | Semantic similar case retrieval method and equipment based on medical knowledge graph | |
CN109460473B (en) | Electronic medical record multi-label classification method based on symptom extraction and feature representation | |
CN108399163B (en) | Text similarity measurement method combining word aggregation and word combination semantic features | |
Lin et al. | User-level psychological stress detection from social media using deep neural network | |
Ruan et al. | Representation learning for clinical time series prediction tasks in electronic health records | |
Fang et al. | Feature Selection Method Based on Class Discriminative Degree for Intelligent Medical Diagnosis. | |
CN109036577B (en) | Diabetes complication analysis method and device | |
CN108062978B (en) | Method for predicting main adverse cardiovascular events of patients with acute coronary syndrome | |
CN109378066A (en) | A kind of control method and control device for realizing disease forecasting based on feature vector | |
Pokharel et al. | Temporal tree representation for similarity computation between medical patients | |
Biswas et al. | Machine Learning‐Based Model to Predict Heart Disease in Early Stage Employing Different Feature Selection Techniques | |
CN113284627B (en) | Medication recommendation method based on patient characterization learning | |
Wang et al. | EHR2Vec: representation learning of medical concepts from temporal patterns of clinical notes based on self-attention mechanism | |
CN113658712A (en) | Doctor-patient matching method, device, equipment and storage medium | |
Gollapalli et al. | Text mining on hospital stay durations and management of sickle cell disease patients | |
US20220165430A1 (en) | Leveraging deep contextual representation, medical concept representation and term-occurrence statistics in precision medicine to rank clinical studies relevant to a patient | |
Kongburan et al. | Enhancing predictive power of cluster-boosted regression with text-based indexing | |
CN114822734A (en) | Traditional Chinese medical record analysis method based on cyclic convolution neural network | |
Permatasari et al. | Features Selection for Entity Resolution in Prostitution on Twitter | |
Barakat et al. | From Similarities to Probabilities: Feature Engineering for Predicting Drugs’ Adverse Reactions | |
Ibrahim et al. | FORMAT PROPOSED APPROACH FOR PREDICTING LIVER DISEASE | |
Shabbeer et al. | Prediction of Sudden Health Crises Owing to Congestive Heart Failure with Deep Learning Models. | |
CN117235487B (en) | Feature extraction method and system for predicting hospitalization event of asthma patient | |
CN117079821B (en) | Patient hospitalization event prediction method | |
CN116598004B (en) | Prevalence prediction method, prevalence prediction device, computer device, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||