CN113284627A - Medication recommendation method based on patient characterization learning - Google Patents

Info

Publication number
CN113284627A
Authority
CN
China
Prior art keywords
medication
data
symptom
cluster
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110406631.3A
Other languages
Chinese (zh)
Other versions
CN113284627B (en)
Inventor
朱振峰
徐慕豪
刘俊秀
葛欣宜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Peking University Third Hospital Peking University Third Clinical Medical College
Original Assignee
Beijing Jiaotong University
Peking University Third Hospital Peking University Third Clinical Medical College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University, Peking University Third Hospital Peking University Third Clinical Medical College filed Critical Beijing Jiaotong University
Priority to CN202110406631.3A priority Critical patent/CN113284627B/en
Publication of CN113284627A publication Critical patent/CN113284627A/en
Application granted granted Critical
Publication of CN113284627B publication Critical patent/CN113284627B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00ICT specially adapted for the handling or processing of medical references
    • G16H70/40ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Medicinal Chemistry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Toxicology (AREA)
  • Chemical & Material Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a medication recommendation method based on patient characterization learning, comprising the following steps: extracting data from electronic medical records and converting the unstructured chief-complaint text in the data into structured data; performing characterization learning on the structured data with a stacked sparse autoencoder to obtain a low-dimensional symptom vector for the patient symptom data and a low-dimensional medication vector for the medication data; analyzing these low-dimensional representations with a clustering algorithm to obtain the symptom features and medication features of the patients in each cluster; performing canonical correlation analysis on the symptom features and medication features of the patients in each cluster to obtain the association between them; and predicting the recommended medication from that association with a distance-weighted average K-nearest-neighbor algorithm. The method can accurately recommend medication for a patient on the basis of the electronic medical record and improves doctors' working efficiency.

Description

Medication recommendation method based on patient characterization learning
Technical Field
The invention relates to the technical field of medical informatization, in particular to a medication recommendation method based on patient characterization learning.
Background
In recent years, with the continuous development of computer and information technology, China's medical information industry has gradually been built up and improved, and patients' treatment records have changed from paper documents to digital electronic medical records. China started building electronic medical records later than many other countries. However, as the medical and health system has received more attention, the Chinese government has in recent years issued a number of policies to support the construction and development of medical informatization.
The electronic medical record is an important data source in medical informatization. It covers a large amount of medical and health information from all of a patient's medical visits and therefore has great research value. First, for patients, mining the information in electronic medical records helps them understand their own health. The record holds the patient's past diagnoses and health status; if this information can be extracted and analyzed, it can provide reference and prediction for the patient's physical condition. In addition, by analyzing and mining a patient's electronic medical record data, other similar patients can be found in the large data set, and the condition information of patients with similar symptoms can serve as a reference for the patient. Second, for doctors, mining the information in electronic medical records can improve medical efficiency. Computers process large numbers of electronic medical records with methods such as natural language processing and machine learning and, in particular, can use the text in the records to assist medical staff in diagnosing and treating patients, which improves doctors' decision-making ability and the efficiency of patient treatment.
The electronic medical record contains not only structured data but also a large amount of unstructured image, signal and text information, and this unstructured data holds the most valuable information in the record. Current medication recommendation systems are generally limited to the numerical and structured data in a patient's electronic medical record. Recommending medication from structured data alone, however, gives low recommendation accuracy and makes it difficult to meet patients' individual medication needs. In addition, traditional manual feature extraction consumes a great deal of labor and demands considerable professional knowledge.
Therefore, a medication recommendation method is needed that addresses the insufficient use of unstructured information.
Disclosure of Invention
The invention provides a medication recommendation method based on patient characterization learning, which aims to overcome the defects in the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme.
A medication recommendation method based on patient characterization learning, comprising:
extracting data from electronic medical records, the data comprising unstructured chief-complaint text and structured data;
converting the unstructured chief-complaint text in the data into structured data;
performing characterization learning on the structured data with a stacked sparse autoencoder to obtain a low-dimensional symptom vector for the patient symptom data and a low-dimensional medication vector for the medication data;
analyzing the low-dimensional representation of the patient symptom data and the low-dimensional representation of the medication data with a clustering algorithm to obtain the symptom features and medication features of the patients in each cluster;
performing canonical correlation analysis on the symptom features and medication features of the patients in each cluster to obtain the association between the symptom features and the medication features of the patients in each cluster;
and predicting the recommended medication from the association with a distance-weighted average K-nearest-neighbor algorithm.
Preferably, converting the unstructured chief-complaint text in the data into structured data comprises:
segmenting the unstructured chief-complaint text into words: processing the text with a word segmentation tool, calculating mutual information values between words, identifying fixed collocations in the chief-complaint text from the mutual information values, and building a custom dictionary, thereby completing the word segmentation of the chief complaint;
and extracting information from the segmentation result: comparing each segmented word with the standard texts in the hospital's symptom library one by one; if a word matches, extraction is complete for that word; if it does not match, searching for the symptom text corresponding to the word with a search-engine-based word similarity calculation method, thereby obtaining structured data.
Preferably, searching for the symptom text corresponding to a segmented word with the search-engine-based word similarity calculation method comprises the following specific steps:
for the processed word p and each symptom-library standard text $q \in Q$, where Q is the set of texts in the symptom library related to p, using a crawler to obtain the number of search results returned when the two terms are searched separately and together, denoted $N(p)$, $N(q)$ and $N(p \wedge q)$;
calculating the word similarity according to the following formula (1):
Figure BDA0003022526440000032
and calculating in turn the similarity between the word p and every standard text in Q; if the maximum similarity exceeds a first set threshold, assigning the word p to the corresponding standard text, thereby obtaining structured data.
Preferably, the stacked sparse autoencoder is formed by connecting two layers of simple autoencoders, and the hidden layer dimensions of the two layers of simple autoencoders are 8 dimensions and 4 dimensions respectively.
Preferably, performing characterization learning on the structured data with the stacked sparse autoencoder comprises:
the loss function of the stacked sparse autoencoder is the mean squared error, and the sparsity constraint is introduced by adding an L2 regularization term, as shown in the following formula (2):
$J=\frac{1}{N}\sum_{i=1}^{N}\lVert x_i-g(f(x_i))\rVert^{2}+\alpha\sum w^{2}$ (2)
the hidden-layer activation function is the ReLU function shown in the following formula (3):
f(x)=max(0,x) (3)
the reconstruction-layer activation function is the Softplus function shown in the following formula (4):
f(x)=log(1+e^x) (4)
where J is the loss function of the model, x_i is the i-th vector input to the model, N is the number of input data, f and g are the deep neural networks of the encoding stage and the decoding stage of the autoencoder, respectively, α is the regularization coefficient, and w denotes each parameter of the model.
Preferably, analyzing the low-dimensional representation of the patient symptom data and the low-dimensional representation of the medication data with a clustering algorithm to obtain the symptom features and medication features of the patients in each cluster comprises:
taking the sum of squared errors (SSE) as the core index and the symptom vectors of all patients as the training set, and obtaining the optimal number of clusters with the heuristic elbow rule;
concatenating each patient's low-dimensional symptom vector and low-dimensional medication vector into a combined vector, treating the combined vectors of all patients as a single cluster, dividing it into two clusters with the K-Means algorithm, calculating the SSE of each of the two clusters, and continuing to divide the cluster with the larger SSE into two clusters with the K-Means algorithm until the optimal number of clusters is reached;
and compiling statistics on the original data of the patients in each resulting cluster to obtain the symptom features and the corresponding medication features of the patients in each cluster.
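The repeated two-way splitting described above can be sketched as follows. This is a minimal illustration in plain Python, not the patented implementation: the helper names (`two_means`, `bisecting_kmeans`), the deterministic choice of initial centers and the toy points are all assumptions, and the sketch splits whichever current cluster has the largest SSE, which is the usual bisecting K-Means reading of the step.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cluster_sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    if not points:
        return 0.0
    centre = centroid(points)
    return sum(sq_dist(p, centre) for p in points)

def two_means(points, iterations=20):
    """Split one cluster in two with plain K-Means (K=2).

    Assumption: centers start at the first point and the point farthest
    from it, so the result is deterministic.
    """
    centers = [points[0], max(points, key=lambda p: sq_dist(p, points[0]))]
    groups = [list(points), []]
    for _ in range(iterations):
        new_groups = [[], []]
        for p in points:
            nearer = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            new_groups[nearer].append(p)
        if not new_groups[0] or not new_groups[1]:
            break  # keep the last valid split
        groups = new_groups
        centers = [centroid(g) for g in groups]
    return groups

def bisecting_kmeans(points, n_clusters):
    """Start from one cluster and keep splitting the one with the largest SSE."""
    clusters = [list(points)]
    while len(clusters) < n_clusters:
        worst = max(clusters, key=cluster_sse)
        if cluster_sse(worst) == 0.0:
            break  # nothing left to split
        clusters.remove(worst)
        clusters.extend(two_means(worst))
    return clusters

# Toy combined vectors (invented for illustration only).
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [20, 0], [21, 0]]
clusters = bisecting_kmeans(points, n_clusters=3)
```

In practice `n_clusters` would be the optimal number of clusters found with the elbow rule, and the points would be the concatenated symptom and medication vectors.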
Preferably, obtaining the optimal number of clusters with the heuristic elbow rule, with the sum of squared errors as the core index and the symptom vectors of all patients as the training set, specifically comprises:
taking the sum of squared errors of the following formula (5) as the core index, clustering the symptom vectors of all patients with different numbers of clusters, calculating the SSE obtained with each patient's symptom vector as a sample point, plotting the SSE against the number of clusters, and taking the number of clusters at the elbow of the curve, that is, the point of highest curvature, as the optimal number of clusters;
$SSE=\sum_{i=1}^{|C|}\sum_{u\in C_i}\lVert u-m_i\rVert^{2}$ (5)
where u is a selected sample point, C is the set of clusters in the partition, $|C|$ is the number of clusters in the partition, $C_i$ denotes the i-th cluster, and $m_i$ is the mean of all samples in $C_i$.
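As a rough illustration of the SSE index and the elbow rule, the sketch below computes the SSE of a given partition and picks an elbow from a table of SSE values. The text describes reading the elbow off a plot; the largest second difference used here is only an assumed numerical stand-in for "highest curvature", and the function names and all numbers are invented for the example.

```python
def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def sse(clusters):
    """Sum over clusters of squared distances from each sample to its cluster mean."""
    total = 0.0
    for cluster in clusters:
        m = mean(cluster)
        total += sum(sum((x - c) ** 2 for x, c in zip(u, m)) for u in cluster)
    return total

def elbow(sse_by_k):
    """Return the k with the largest second difference of the SSE curve.

    Assumption: the second difference approximates the curvature that the
    elbow rule inspects visually.
    """
    ks = sorted(sse_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = sse_by_k[prev_k] - 2 * sse_by_k[k] + sse_by_k[next_k]
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k

# Two toy clusters: each point lies at squared distance 1 from its cluster mean.
partition = [[[0, 0], [0, 2]], [[5, 5], [7, 5]]]
partition_sse = sse(partition)

# Invented SSE values for k = 1..5; the curve flattens after k = 3.
curve = {1: 100.0, 2: 60.0, 3: 12.0, 4: 10.0, 5: 9.0}
best_k = elbow(curve)
```

A real run would fill `curve` by clustering the patients' symptom vectors once per candidate cluster number and recording the SSE each time.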
Preferably, performing canonical correlation analysis on the symptom features and medication features of the patients in each cluster to obtain the association between the symptom features and the medication features of the patients in each cluster comprises:
for the sample set of symptom features $X\in R^{r\times n}$ and the sample set of medication features $Y\in R^{s\times n}$ belonging to the same cluster, standardizing the data to a mean of 0 and a variance of 1, where r and s are the dimensions of the symptom features and the medication features, respectively;
selecting several groups of linearly uncorrelated projection vectors for the two sample sets, the vectors $a\in R^{r}$ and $b\in R^{s}$ of each group projecting X and Y to X′ and Y′, respectively, that is, $X'=a^{T}X$ and $Y'=b^{T}Y$; the optimization objective is to maximize the correlation coefficient ρ between X′ and Y′, and this constrained optimization problem is solved with the Lagrangian function shown in the following formula (6), giving several groups of linear combinations and their correlation coefficients as the association between the symptom features and the medication features of the patients in each cluster, where $X'\in R^{v\times n}$, $Y'\in R^{v\times n}$, and v is the number of linear combinations:
Figure BDA0003022526440000052
wherein S isXY=cov(X,Y)。
Preferably, predicting the recommended medication from the association with the distance-weighted average K-nearest-neighbor algorithm comprises:
using the K-nearest-neighbor algorithm to find the distances to the K neighboring groups of projection vectors in the cluster to which the sample belongs, according to the distance formula of the following formula (7):
Figure BDA0003022526440000053
obtaining the k groups of projection vectors adjacent to the validation patient sample $x_a$, $X'=\{x_1', x_2', \ldots, x_k'\}$, $X'\in R^{v\times k}$, where v, the number of linear combinations, is the dimensionality of the canonical correlation vectors; obtaining from X′ the corresponding original chief-complaint data X and medication data Y, which have not been processed by the autoencoder; and calculating the average medication value by nearest-neighbor inverse distance weighting according to the following formula (8):
Figure BDA0003022526440000061
Whether the medication data Y is recommended is determined by the relationship between this average medication value and a second set threshold:
Figure BDA0003022526440000063
wherein,
Figure BDA0003022526440000064
indicating that the sum of the correlation coefficients after processing is 1,
Figure BDA0003022526440000065
represents the inverse-distance weight of the i-th projection vector, $D_i$ is the distance between the patient sample $x_a$ and the i-th projection vector, and $Y_i$ represents the i-th original medication data.
Preferably, determining whether the medication data Y is recommended according to the relationship between the average medication value and the second set threshold comprises: judging whether the average medication value is greater than the second set threshold; if so, the medication data Y is recommended and the medication result is set to 1; if not, the medication data Y is not recommended and the medication result is set to 0. The second set threshold is obtained by training on the training set.
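The weighting-and-threshold decision can be sketched as follows. This is a simplified illustration under stated assumptions: formula (7) is not reproduced in this text, so Euclidean distance is assumed; the weights are plain inverse distances normalized to sum to 1; the correlation-coefficient term mentioned in the text is left out; and the function name `recommend`, the `eps` guard and the toy data are all hypothetical.

```python
import math

def euclidean(a, b):
    """Assumed distance measure between two projection vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recommend(query, neighbours, k, threshold, eps=1e-9):
    """Return (average medication value, 0/1 recommendation) for one drug.

    neighbours: (projection_vector, medication_flag) pairs taken from the
    cluster the patient belongs to; medication_flag is 1 if the drug was used.
    """
    nearest = sorted(neighbours, key=lambda item: euclidean(query, item[0]))[:k]
    # Inverse-distance weights, normalized so that they sum to 1.
    inverse = [1.0 / (euclidean(query, vec) + eps) for vec, _ in nearest]
    total = sum(inverse)
    weights = [w / total for w in inverse]
    average = sum(w * flag for w, (_, flag) in zip(weights, nearest))
    return average, int(average > threshold)

# Toy data: the two closest neighbours used the drug, the third did not.
neighbours = [([1, 0], 1), ([0, 2], 1), ([4, 0], 0), ([10, 10], 0)]
average, decision = recommend([0, 0], neighbours, k=3, threshold=0.5)
```

With these toy values the three nearest neighbours lie at distances 1, 2 and 4, so the drug's weighted average is about 0.86 and it is recommended at a threshold of 0.5; in the method described above the threshold would instead be learned from the training set.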
As the technical scheme above shows, the medication recommendation method based on patient characterization learning performs feature extraction, structuring and dimensionality reduction on electronic medical record data, analyzes the representations of patient symptoms and medication information with a clustering algorithm, establishes the association between chief-complaint symptoms and medication with canonical correlation analysis, and predicts medication recommendations for patients with a distance-weighted average K-nearest-neighbor algorithm, providing reference opinions for doctors during diagnosis.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a medication recommendation method based on patient characterization learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a medication recommendation method based on patient characterization learning according to this embodiment;
FIG. 3 is a schematic diagram illustrating a medication recommendation method based on patient characterization learning according to this embodiment;
FIG. 4 is a diagram illustrating a structure of a stacked sparse self-encoder according to the present embodiment;
FIG. 5 is a schematic diagram of a sample information extraction;
FIG. 6 is a graph of symptom characteristics and medication characteristics results for patient clusters;
FIG. 7 is a graph showing how the accuracy, precision and F1 score of the method on the allergic rhinitis test set vary with the threshold.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
To facilitate understanding of the embodiments of the present invention, specific embodiments are further explained below with reference to the drawings; these embodiments do not limit the present invention.
Examples
Fig. 1 is a schematic flow chart of a medication recommendation method based on patient characterization learning according to an embodiment of the present invention, fig. 2 and fig. 3 are schematic diagrams of a medication recommendation method based on patient characterization learning according to an embodiment of the present invention, and referring to fig. 1, fig. 2 and fig. 3, the method includes the following steps:
S1: Data is extracted from the electronic medical record.
The extracted data includes unstructured chief-complaint text and structured data.
S2: The unstructured chief-complaint text in the data is converted into structured data.
Since the electronic medical record contains a large amount of unstructured text information, in order to mine the value of the unstructured text information, the unstructured text information needs to be converted into structured data which can be utilized by a computer.
Mature word segmentation tools already exist in the field of Chinese natural language processing, such as Jieba Chinese word segmentation, THULAC (THU Lexical Analyzer for Chinese, Tsinghua University) and ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System, Chinese Academy of Sciences). However, because many medical terms are highly specialized, these general-purpose tools cannot segment medical chief-complaint text accurately. In this embodiment, segmenting the unstructured chief-complaint text specifically includes: processing the text with a word segmentation tool, calculating mutual information values between words, identifying fixed collocations in the chief-complaint text from the mutual information values, and building a custom dictionary, thereby completing the word segmentation of the chief complaint. Specifically, after records with missing chief-complaint information are removed from the original data set, the basic Jieba tool is first used to process the chief-complaint text in precise mode, which aims at the most accurate segmentation of the text; then the mutual information value of each pair of adjacent words is calculated, the pairs are sorted in descending order, suitable fixed collocations are selected as new words by setting a threshold, the new words are added to a custom dictionary, and the chief-complaint text is segmented again. The procedure is as follows:
cleaning the original electronic medical record data set and segmenting the chief-complaint text with the Jieba tool, so that each chief-complaint text yields a list of words;
building nodes from the segmentation results with a 2-gram (bigram) model, storing the results in a Trie (dictionary tree), and counting word frequencies;
for two words X and Y, where P(X) and P(Y) are the probabilities of the two words and P(X, Y) is the probability that the two words appear adjacently, calculating the mutual information value according to the following formula (1):
$MI(X,Y)=\log\frac{P(X,Y)}{P(X)\,P(Y)}$ (1)
sorting the resulting mutual information values in descending order, setting a threshold, and selecting suitable fixed collocations as new words. The new words are used to build a custom dictionary, and the chief-complaint text is segmented again.
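The new-word discovery step can be sketched as follows, assuming the usual pointwise mutual information log(P(X,Y) / (P(X)P(Y))) estimated from raw counts. The sketch starts from text that has already been segmented (for example by Jieba) and replaces the Trie with plain counters for brevity; the function names, the English toy tokens and the threshold value are illustrative assumptions.

```python
import math
from collections import Counter

def adjacent_pmi(segmented_texts):
    """Return {(w1, w2): mutual information} for every pair of adjacent words."""
    word_counts = Counter()
    pair_counts = Counter()
    for words in segmented_texts:
        word_counts.update(words)
        pair_counts.update(zip(words, words[1:]))  # 2-gram counts
    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())
    scores = {}
    for (w1, w2), n in pair_counts.items():
        p_xy = n / total_pairs
        p_x = word_counts[w1] / total_words
        p_y = word_counts[w2] / total_words
        scores[(w1, w2)] = math.log(p_xy / (p_x * p_y))
    return scores

def new_word_candidates(segmented_texts, threshold):
    """Pairs whose score exceeds the threshold, sorted from high to low."""
    ranked = sorted(adjacent_pmi(segmented_texts).items(),
                    key=lambda kv: kv[1], reverse=True)
    return [pair for pair, score in ranked if score > threshold]

# Toy segmented chief complaints (invented; real input would be Chinese words).
texts = [
    ["nasal", "congestion", "three", "days"],
    ["nasal", "congestion", "runny-nose"],
    ["cough", "three", "days"],
]
scores = adjacent_pmi(texts)
candidates = new_word_candidates(texts, threshold=1.5)
```

Each selected pair would then be joined into one entry of the custom dictionary before the text is segmented again. Because this score tends to favor rare pairs, a minimum pair frequency is often applied as well; that refinement is not part of the description above.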
Information is then extracted from the segmentation result: each segmented word is compared with the standard texts in the hospital's symptom library one by one; if a word matches, extraction is complete for that word; if it does not match, the symptom text corresponding to the word is searched for with a search-engine-based word similarity calculation method, yielding structured data.
In this embodiment, the search-engine-based word similarity calculation uses Baidu search to find the symptom text corresponding to a segmented word. The specific steps are as follows:
for the processed word p and each symptom-library standard text $q \in Q$, where Q is the set of texts in the symptom library related to p, a crawler is used to obtain the number of search results returned when the two terms are searched separately and together, denoted $N(p)$, $N(q)$ and $N(p \wedge q)$;
the word similarity is calculated according to the following formula (2):
Figure BDA0003022526440000101
The similarity between the word p and every standard text in Q is calculated in turn; if the maximum similarity exceeds a first set threshold, the word p is assigned to the corresponding standard text, yielding structured data.
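The matching step can be sketched as below. The expression of the similarity formula is not reproduced in this text, so the sketch swaps in a Jaccard-style ratio of co-occurrence hits to union hits as an explicit assumption; the hit counts are invented numbers standing in for what a crawler would return, and the names `hit_similarity` and `match_symptom` are hypothetical.

```python
def hit_similarity(n_p, n_q, n_pq):
    """Assumed Jaccard-style score: co-occurrence hits over union hits."""
    union = n_p + n_q - n_pq
    return n_pq / union if union > 0 else 0.0

def match_symptom(word, standard_texts, hits, threshold):
    """Return the standard text most similar to `word`, or None.

    hits maps a term to N(term) and a (word, text) pair to the number of
    results returned when both are searched together.
    """
    best_text, best_score = None, 0.0
    for q in standard_texts:
        score = hit_similarity(hits[word], hits[q], hits[(word, q)])
        if score > best_score:
            best_text, best_score = q, score
    # Accept the best candidate only if it clears the first set threshold.
    return best_text if best_score > threshold else None

# Invented search-result counts for illustration only.
hits = {
    "stuffy nose": 1000,
    "nasal congestion": 1200,
    "cough": 5000,
    ("stuffy nose", "nasal congestion"): 800,
    ("stuffy nose", "cough"): 100,
}
matched = match_symptom("stuffy nose", ["nasal congestion", "cough"], hits, threshold=0.3)
```

Here "stuffy nose" scores 800/1400 against "nasal congestion" and 100/5900 against "cough", so it is mapped to "nasal congestion"; with a stricter threshold the word would be left unmatched.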
S3: Characterization learning is performed on the structured data with a stacked sparse autoencoder to obtain a low-dimensional symptom vector for the patient symptom data and a low-dimensional medication vector for the medication data.
The loss function of the stacked sparse autoencoder is the mean squared error, and the sparsity constraint is introduced by adding an L2 regularization term, as shown in the following formula (3):
Figure BDA0003022526440000102
the hidden layer activation function is the ReLU function shown in the following formula (4):
f(x) = max(0, x) (4)
the reconstruction layer activation function is the Softplus function shown in the following formula (5):
f(x) = log(1 + e^x) (5)
where J is the loss function of the model, x_i is the i-th input vector, N is the number of input samples, f and g are the deep neural networks of the encoding stage and the decoding stage of the self-encoder, respectively, α is the regularization coefficient, and w denotes the parameters of the model.
The self-encoder reconstructs the original input through its hidden layer and thereby learns a compressed low-dimensional representation of the input, minimizing the error between input and output. The stacked sparse self-encoder improves on the basic self-encoder: several self-encoders are stacked and learn layer by layer, so that more complex codes and deeper features of the original input can be learned. A regularization term is added to suppress neurons and to prevent the network from memorizing the data and overfitting. Fig. 4 is a schematic structural diagram of the stacked sparse self-encoder of this embodiment. Referring to Fig. 4, it is formed by connecting two simple self-encoders whose hidden layers have 8 and 4 dimensions, respectively.
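The structure described above can be sketched with NumPy as a forward pass plus loss. This is only an illustration under assumptions: the exact form of formula (3) is not recoverable here, so the loss is taken to be the mean squared reconstruction error plus α times the sum of squared weights; the 16-dimensional input, the initialization and all function names are hypothetical, and training is omitted.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softplus(x):
    # log(1 + e^x), computed in a numerically stable way
    return np.logaddexp(0.0, x)

def init_params(sizes, rng):
    """One (weight, bias) pair per layer, e.g. sizes [16, 8, 4]."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes, sizes[1:])]

def forward(x, enc, dec):
    """Encode with ReLU hidden layers, decode with Softplus layers."""
    h = x
    for w, b in enc:
        h = relu(h @ w + b)
    code = h                      # low-dimensional representation
    for w, b in dec:
        h = softplus(h @ w + b)
    return code, h

def loss(x, recon, params, alpha):
    """Assumed loss: mean squared error plus an L2 penalty on the weights."""
    mse = np.mean(np.sum((x - recon) ** 2, axis=1))
    l2 = sum(np.sum(w ** 2) for w, _ in params)
    return mse + alpha * l2

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(5, 16)).astype(float)   # 5 binary symptom rows
enc = init_params([16, 8, 4], rng)                   # hidden sizes 8 and 4
dec = init_params([4, 8, 16], rng)
code, recon = forward(x, enc, dec)
print(code.shape, loss(x, recon, enc + dec, alpha=0.01))
```

In a stacked setup each simple self-encoder would typically be trained in turn before the layers are joined; the sketch shows only the joined network.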
S4, a clustering algorithm is used to analyze the low-dimensional representation of the patient symptom data and the low-dimensional representation of the medication information data to obtain the symptom features and medication features of the patients in each group cluster.
After processing by the self-encoder, cluster analysis is performed on the low-dimensional symptom data and the low-dimensional medication data, and the clustering result is used to depict patient group portraits. Based on the clustering result, the original data of the patients in each group cluster are analyzed, mainly with respect to the symptom features and medication features of the patients in each cluster. The group portraits also provide a reference and basis for the subsequent medication recommendation.
With the sum of squared errors (SSE) as the core index and the symptom vectors of all patients as the training set, the optimal number of clusters is obtained with the heuristic elbow rule. The specific steps are as follows:
the symptom vectors of all patients are clustered with different numbers of clusters; with the sum of squared errors of the following formula (6) as the core index and each patient's symptom vector as a sample point, the SSE value is calculated for each setting; the SSE value is plotted against the number of clusters, and the elbow of the curve, i.e. the number of clusters at the position of highest curvature, is taken as the optimal number of clusters;
Figure BDA0003022526440000111
where u is a selected sample point, C is the set of clusters of the cluster partition (its size is the number of clusters), C_i denotes the i-th cluster, and m_i is the mean of all samples in C_i.
The low-dimensional symptom vector and the low-dimensional medication vector of each patient are concatenated into a combined vector. The combined vectors of all patients are treated as one cluster and split into two clusters with the K-Means clustering algorithm; the SSE values of the two clusters are calculated, and the cluster with the larger SSE is again split into two clusters with K-Means. This is repeated until the optimal number of clusters is reached;
the original data of the patients in each resulting cluster are then tallied to obtain the symptom features and the corresponding medication features of the patients in each group cluster.
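The SSE index and the repeated two-way splitting described above can be sketched as follows. This is a simplified illustration with hypothetical function names: the inner two-cluster K-Means is a bare Lloyd iteration, edge cases such as single-point clusters are not handled, and the SSE curve is computed with the same bisecting routine for brevity, which may differ from how the embodiment computes its elbow plot.

```python
import numpy as np

def sse(points):
    """Sum of squared distances from each point to the cluster mean."""
    if len(points) == 0:
        return 0.0
    return float(np.sum((points - points.mean(axis=0)) ** 2))

def two_means(points, rng, n_iter=50):
    """Plain K-Means with k = 2; returns a boolean mask for the first cluster."""
    centers = points[rng.choice(len(points), size=2, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        if labels.min() == labels.max():      # degenerate split, stop early
            break
        new_centers = np.array([points[labels == j].mean(axis=0) for j in (0, 1)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels == 0

def bisecting_kmeans(points, n_clusters, seed=0):
    """Start from one cluster and keep splitting the one with the largest SSE."""
    rng = np.random.default_rng(seed)
    clusters = [np.arange(len(points))]       # index arrays, one per cluster
    while len(clusters) < n_clusters:
        worst = max(range(len(clusters)), key=lambda i: sse(points[clusters[i]]))
        idx = clusters.pop(worst)
        mask = two_means(points[idx], rng)
        clusters += [idx[mask], idx[~mask]]
    return clusters

def sse_curve(points, max_k, seed=0):
    """Total within-cluster SSE for k = 1..max_k, to be inspected for an elbow."""
    return [sum(sse(points[c]) for c in bisecting_kmeans(points, k, seed))
            for k in range(1, max_k + 1)]
```

Plotting the values returned by `sse_curve` against k gives the curve whose elbow is read off as the cluster number.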
S5, canonical correlation analysis is performed on the symptom features and the medication features of the patients in each group cluster to obtain the association between the symptom features and the medication features of the patients in each group cluster.
The method specifically comprises the following steps:
for the sample set of symptom features X ∈ R^{r×n} and the sample set of medication features Y ∈ R^{s×n} within the same group cluster, the data are normalized to mean 0 and variance 1, where r and s denote the dimensions of the symptom features and the medication features, respectively;
several groups of linearly uncorrelated projection vectors are selected for the two sample sets; with the vectors of each group a ∈ R^r and b ∈ R^s, X and Y are projected to X′ and Y′, respectively, i.e. X′ = a^T X, Y′ = b^T Y; the optimization objective is to maximize the correlation coefficient ρ of X′ and Y′. Letting S_XY = cov(X, Y), the criterion function can be written as:
Figure BDA0003022526440000121
Observing this expression, the result does not change when the numerator and the denominator are scaled by the same factor, so the problem is converted into maximizing the numerator with the value of the denominator fixed, as shown in the following formula (7):
max_{a,b} a^T S_XY b
s.t. a^T S_XX a = 1, b^T S_YY b = 1 (7)
The constrained optimization problem of maximizing the correlation coefficient ρ is solved with the Lagrangian function shown in the following formula (8), yielding several groups of linear combinations and their corresponding correlation coefficients, which serve as the association between the symptom features and the medication features of the patients in each group cluster. At this point X′ ∈ R^{v×n}, Y′ ∈ R^{v×n}, and v is the number of linear combinations:
Figure BDA0003022526440000122
wherein S isXY=cov(X,Y)。
S6, according to the association, recommended medication is predicted with a distance-weighted-average K-nearest-neighbor algorithm.
Using the K-nearest-neighbor algorithm, the K groups of projection vectors nearest to the sample within the cluster to which it belongs are found, with distances computed according to the following formula (9):
Figure BDA0003022526440000131
The k groups of projection vectors adjacent to the patient sample x_a to be assessed are obtained, X′ = {x_1′, x_2′, ..., x_k′}, X′ ∈ R^{v×k}, where v, the number of linear combinations, is the dimension of the canonical correlation vector. From X′, the corresponding original chief complaint data X and medication data Y, which have not been processed by the self-encoder, are obtained, and the medication mean is calculated by nearest-neighbor inverse distance weighting according to the following formula (10):
Figure BDA0003022526440000132
According to the relationship between the medication mean
Figure BDA0003022526440000133
and a second set threshold, it is determined whether the medication data Y is recommended:
Figure BDA0003022526440000134
where
Figure BDA0003022526440000135
indicates that the coefficients sum to 1 after processing,
Figure BDA0003022526440000136
denotes the inverse-distance weight of the i-th projection vector, D_i is the distance between the patient sample x_a and the i-th projection vector, and Y_i denotes the i-th original medication data.
The specific judging steps are as follows: determine whether the medication mean
Figure BDA0003022526440000137
is greater than the second set threshold. If so, the medication data Y is recommended and the medication result is set to 1; if not, the medication data Y is not recommended and the medication result is set to 0. The second set threshold is obtained by training on the training set.
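The prediction step above can be sketched as follows. Formulas (9) and (10) are only available as images, so this sketch assumes Euclidean distance and weights proportional to 1/D_i that are normalized to sum to 1. The function name `recommend` and the small constant `eps`, which guards against a zero distance, are hypothetical.

```python
import math

def recommend(query, neighbor_vectors, neighbor_meds, k, threshold, eps=1e-12):
    """Inverse-distance-weighted K-nearest-neighbor medication recommendation.

    query            : canonical correlation vector of the patient sample
    neighbor_vectors : canonical correlation vectors of training patients
                       in the same cluster
    neighbor_meds    : their original 0/1 medication rows
    Returns (medication mean per drug, 0/1 recommendation per drug).
    """
    dists = [math.dist(query, p) for p in neighbor_vectors]
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    inv = [1.0 / (dists[i] + eps) for i in nearest]
    total = sum(inv)
    weights = [w / total for w in inv]          # weights sum to 1
    n_drugs = len(neighbor_meds[0])
    mean = [sum(w * neighbor_meds[i][j] for w, i in zip(weights, nearest))
            for j in range(n_drugs)]
    # Strict comparison, following the judging step described above.
    result = [1 if m > threshold else 0 for m in mean]
    return mean, result

mean, result = recommend((0.0, 0.0),
                         [(1.0, 0.0), (0.0, 2.0), (5.0, 5.0)],
                         [[1, 0], [0, 1], [1, 1]],
                         k=2, threshold=0.5)
print(mean, result)
```

In the small usage example the two nearest neighbors lie at distances 1 and 2, so their weights are 2/3 and 1/3 and only the first drug exceeds the threshold.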
A concrete example based on real data on allergic rhinitis diagnoses from the otorhinolaryngology outpatient department of a hospital in Beijing is given below:
the electronic medical records used in this embodiment come from real data on allergic rhinitis diagnoses in the otorhinolaryngology outpatient department of a hospital in Beijing and consist of three major parts: basic information, chief complaint information and outpatient information. The basic information includes the patient number, the number of visits and the patient's gender. The chief complaint information is a concise summary written by the doctor of the patient's symptoms and signs, their nature, duration and severity. The outpatient information includes the visit time, the diagnosis made by the doctor and the medication orders. Allergic rhinitis is a common respiratory disease that is usually diagnosed from the patient's symptoms and history, with allergen testing performed only in a minority of cases. Therefore, the part of the electronic medical record data set of greatest interest is the chief complaint text, which contains the key information the doctor uses to make a diagnosis and prescribe medication for the patient.
Because the data set consists of patients diagnosed with allergic rhinitis, the medical vocabulary used in its chief complaint information is relatively fixed. After processing with a general-purpose word segmentation tool, the mutual information values between words are calculated, the fixed collocations in the chief complaint text are identified from these values, and a custom dictionary is constructed, which completes the word segmentation of the chief complaint.
After word segmentation, information is extracted from the chief complaint text using the 16 symptom categories provided by the hospital as standard texts. The chief complaint information is the doctor's concise description of the patient's symptoms and is highly specialized, but because doctors have different habits, descriptions of the same thing can differ, either slightly in wording or completely in expression. For example, "epistaxis" and "nasal bleeding" have the same meaning, and "thin nasal discharge", "white nasal discharge" and "watery nasal discharge" have the same meaning. To extract the information in the chief complaint more accurately and comprehensively, words with similar meanings are identified through similarity calculation and grouped together. Since the method deals with Chinese words, a word similarity calculation method based on Baidu search is used, which treats the web as a corpus updated in real time and emphasizes the relatedness of word pairs. The method relies mainly on the number of query results returned by the search engine.
The 16 symptom categories provided by the hospital are used as standard texts: nasal obstruction, watery nasal discharge, purulent nasal discharge, watery nasal discharge, nasal itching, nasal dryness, sneezing, postnasal operation, headache, nasal hemorrhage, hyposmia, common cold, ear itching, postnasal drip, bloody nasal discharge, and itchy eyes. During information extraction, the words obtained from segmenting the chief complaint text are compared with the standards one by one. If a word matches one of the 16 symptom standards, extraction finishes directly; if not, the similarity between the word and each of the 16 symptom standards is calculated in turn, and if the maximum similarity exceeds a set threshold, the word is assigned to the corresponding standard. An example of information extraction is shown in Fig. 5. The patients' symptom information and medication information are all converted into structured information represented by the values 0 or 1. There are 3731 patient records in total, each containing 16-dimensional symptom information and 23-dimensional medication information.
All the structured chief complaint symptom and medication information is split 8:2 into a training set and a test set. Characterization learning is performed on the training set with a stacked sparse autoencoder to obtain a low-dimensional symptom vector representing the patient symptom data and a low-dimensional medication vector representing the medication information data, where the symptom vector of each patient is set to 4 dimensions and the medication vector to 5 dimensions. Cluster analysis is performed on the low-dimensional symptom data and the low-dimensional medication data, and the clustering result is used to depict patient group portraits. A total of 3 patient groups are obtained, and the symptom features and medication features of the 3 clusters are shown in Fig. 6. Finally, canonical correlation analysis is performed on the patient symptom data and medication data processed by the autoencoder to obtain the association between the patients' symptom features and medication features, and the first 3 pairs of canonical variables are taken to form a 3-dimensional canonical correlation vector.
Medication recommendation is then performed on the test set. The chief complaint of one patient reads: intermittent bilateral nasal obstruction, clear watery nasal discharge, nasal itching, continuous sneezing and hyposmia for several months. After information extraction, the symptoms nasal obstruction, clear watery nasal discharge, nasal itching, sneezing and hyposmia are set to 1 and all other symptoms to 0. The patient's actual medications are Renokote, Compound Xanthium sibiricum tablets, cis-Er Ning and Aiseiping. The symptom data are processed by the autoencoder to obtain the low-dimensional symptom vector (1.850080848, 2.189118862, 3.87420392, 2.021232367). Using the projection vectors obtained on the training set, the canonical correlation vector is calculated as (-3.701395327, -0.676103465, -3.337884659). The K-nearest-neighbor algorithm then finds the K groups of symptom vectors in the training set results that are adjacent to this canonical correlation vector, from which K groups of original sample information are obtained; here K is set to 24. The mean of the 24 groups of medication data is calculated by nearest-neighbor inverse distance weighting, giving (0.6, 0.067, 1, 0, 0.2, 0.2, 0.533, 0, 0, 0, 0.133, 0.2, 0.4, 0, 0, 0, 0, 0, 0.067) as the medication scores of the test sample. If the score of a drug is greater than or equal to the set threshold, the drug is recommended and its medication result is set to 1; otherwise it is not recommended and its medication result is set to 0. With the threshold set to 0.4, four drugs satisfy the condition, so the recommended drugs are "Renokote", "Compound Xanthium sibiricum tablets", "cis-Er Ning" and "Aiseiping", which agrees with the actual prescription.
The recommendations are compared with the actual prescriptions. Fig. 7 shows how the accuracy (acc), precision (p) and F1 value on the test set change with the threshold. At a threshold of 0.5, the accuracy on the test set is 90.15%, the precision is 84.86% and the F1 value is 0.4823, which shows that the method can accurately recommend most of the medicines for allergic rhinitis patients.
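The three evaluation measures can be computed from the 0/1 recommendation matrix and the 0/1 prescription matrix as sketched below. The text does not say how the measures are averaged over drugs, so this sketch assumes that all patient-drug entries are pooled (micro-averaging); the function name `evaluate` is hypothetical.

```python
def evaluate(y_true, y_pred):
    """Accuracy, precision and F1 over all patient-drug entries (pooled).

    y_true, y_pred: lists of equal-length 0/1 rows, one row per patient.
    """
    tp = fp = fn = tn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
            else:
                tn += 1
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, f1
```

Sweeping the threshold of the recommendation step and calling `evaluate` for each value yields curves of the kind described for Fig. 7.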
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A medication recommendation method based on patient characterization learning, comprising:
extracting data from the electronic medical record, wherein the data comprises unstructured complaint text information and structured data;
representing unstructured complaint text information in the data as structured data;
performing characterization learning on the structured data with a stacked sparse self-encoder to obtain a low-dimensional symptom vector representing patient symptom data and a low-dimensional medication vector representing medication information data;
analyzing the low-dimensional representation of the patient symptom data and the low-dimensional representation of the medication information data with a clustering algorithm to obtain the symptom features and the medication features of the patients in each group cluster;
performing canonical correlation analysis on the symptom features and the medication features of the patients in each group cluster to obtain the association between the symptom features and the medication features of the patients in each group cluster;
and predicting recommended medication with a distance-weighted-average K-nearest-neighbor algorithm according to the association.
2. The method according to claim 1, wherein the representing unstructured complaint text information in the data as structured data comprises:
performing word segmentation on the unstructured chief complaint text information: processing the unstructured chief complaint text information with a word segmentation tool, calculating mutual information values between words, identifying fixed collocations in the chief complaint text according to the mutual information values, and constructing a custom dictionary so as to complete the word segmentation of the chief complaint;
and extracting information from the segmentation result: comparing the segmented words one by one with standard texts in a symptom library of a hospital; if a word matches, finishing extraction directly; if it does not match, looking up the symptom text corresponding to the segmented word with a search-engine-based word similarity calculation method to obtain structured data.
3. The method according to claim 2, wherein the search-engine-based word similarity calculation method used to look up the symptom text corresponding to the segmented word is a word similarity calculation method based on Baidu search, and the specific operation steps are as follows:
for the processed word p and the symptom bank standard text q,
Figure FDA0003022526430000021
Q is the set of texts in the symptom library related to p; a crawler is used to obtain the number of search results returned by the page when the two words are searched separately and when they are searched together, recorded as N(p), N(q) and N(p ∧ q);
the word similarity is calculated according to the following formula (1):
Figure FDA0003022526430000022
and calculating in turn the similarity between the word p and every standard text in Q, and, if the maximum similarity exceeds a first set threshold, assigning the word p to the corresponding standard text to obtain structured data.
4. The method according to claim 1, wherein the stacked sparse self-encoder is formed by connecting two layers of simple self-encoders, and hidden layer dimensions of the two layers of simple self-encoders are 8 dimensions and 4 dimensions respectively.
5. The method of claim 4, wherein the learning of the characterization of the structured data by using the stacked sparse self-encoder comprises:
the loss function of the stacked sparse self-encoder is mean square error, and the sparsity limit is introduced by adding an L2 regularization term, and the formula is shown as the following formula (2):
Figure FDA0003022526430000023
the hidden layer activation function is the ReLU function shown in the following formula (3):
f(x) = max(0, x) (3)
the reconstruction layer activation function is the Softplus function shown in the following formula (4):
f(x) = log(1 + e^x) (4)
where J is the loss function of the model, x_i is the i-th input vector, N is the number of input samples, f and g are the deep neural networks of the encoding stage and the decoding stage of the self-encoder, respectively, α is the regularization coefficient, and w denotes the parameters of the model.
6. The method of claim 1, wherein analyzing the low dimensional representation of the patient symptom data and the low dimensional representation of the medication information data using a clustering algorithm to obtain the symptom characteristic and the medication characteristic of the patient within each cluster of the population comprises:
taking the sum of squared errors (SSE) as a core index and the symptom vectors of all patients as a training set, and obtaining the optimal number of clusters with a heuristic elbow rule;
concatenating the low-dimensional symptom vector and the low-dimensional medication vector of each patient into a combined vector, treating the combined vectors of all patients as one cluster, splitting it into two clusters with the K-Means clustering algorithm, calculating the SSE values of the two clusters, and repeatedly splitting the cluster with the larger SSE into two clusters with the K-Means algorithm until the optimal number of clusters is reached;
and tallying the original data of the patients in each resulting cluster to obtain the symptom features and the corresponding medication features of the patients in each group cluster.
7. The method according to claim 6, wherein obtaining the optimal number of clusters with the heuristic elbow rule, taking the sum of squared errors as the core index and the symptom vectors of all patients as the training set, specifically comprises:
clustering the symptom vectors of all patients with different numbers of clusters; taking the sum of squared errors of the following formula (5) as the core index and each patient's symptom vector as a sample point, calculating the SSE value for each setting; plotting the SSE value against the number of clusters; and taking the elbow of the curve, i.e. the number of clusters at the position of highest curvature, as the optimal number of clusters;
Figure FDA0003022526430000031
where u is a selected sample point, C is the set of clusters of the cluster partition (its size is the number of clusters), C_i denotes the i-th cluster, and m_i is the mean of all samples in C_i.
8. The method of claim 1, wherein the performing canonical correlation analysis on the symptom characteristic and the medication characteristic of the patients in each cluster of groups to obtain the association between the symptom characteristic and the medication characteristic of the patients in each cluster of groups comprises:
for the sample set of symptom features X ∈ R^{r×n} and the sample set of medication features Y ∈ R^{s×n} within the same group cluster, normalizing the data to mean 0 and variance 1, where r and s denote the dimensions of the symptom features and the medication features, respectively;
selecting several groups of linearly uncorrelated projection vectors for the two sample sets, with the vectors of each group a ∈ R^r and b ∈ R^s, and projecting X and Y to X′ and Y′, respectively, i.e. X′ = a^T X, Y′ = b^T Y; the optimization objective being to maximize the correlation coefficient ρ of X′ and Y′; solving the constrained optimization problem of maximizing ρ according to the Lagrangian function shown in the following formula (6) to obtain several groups of linear combinations and their corresponding correlation coefficients as the association between the symptom features and the medication features of the patients in each group cluster, where X′ ∈ R^{v×n}, Y′ ∈ R^{v×n}, and v is the number of linear combinations:
Figure FDA0003022526430000041
where S_XY = cov(X, Y).
9. The method of claim 1, wherein said predicting recommended medication using a weighted distance average K-nearest neighbor algorithm based on said correlations comprises:
finding, with the K-nearest-neighbor algorithm, the K groups of projection vectors nearest to the sample within the cluster to which it belongs, with distances computed according to the following formula (7):
Figure FDA0003022526430000042
obtaining the k groups of projection vectors adjacent to the patient sample x_a to be assessed, X′ = {x_1′, x_2′, ..., x_k′}, X′ ∈ R^{v×k}, where v, the number of linear combinations, is the dimension of the canonical correlation vector; obtaining from X′ the corresponding original chief complaint data X and medication data Y that have not been processed by the self-encoder; and calculating the medication mean by nearest-neighbor inverse distance weighting according to the following formula (8):
Figure FDA0003022526430000043
determining, according to the relationship between the medication mean
Figure FDA0003022526430000044
and a second set threshold, whether the medication data Y is recommended:
Figure FDA0003022526430000045
where
Figure FDA0003022526430000051
indicates that the coefficients sum to 1 after processing,
Figure FDA0003022526430000052
denotes the inverse-distance weight of the i-th projection vector, D_i is the distance between the patient sample x_a and the i-th projection vector, and Y_i denotes the i-th original medication data.
10. The method of claim 1, wherein said determining, according to the relationship between the medication mean
Figure FDA0003022526430000053
and a second set threshold, whether the medication data Y is recommended comprises: determining whether the medication mean
Figure FDA0003022526430000054
is greater than the second set threshold; if so, the medication data Y is recommended and the medication result is set to 1; if not, the medication data Y is not recommended and the medication result is set to 0; the second set threshold being obtained by training on a training set.
CN202110406631.3A 2021-04-15 2021-04-15 Medication recommendation method based on patient characterization learning Active CN113284627B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110406631.3A CN113284627B (en) 2021-04-15 2021-04-15 Medication recommendation method based on patient characterization learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110406631.3A CN113284627B (en) 2021-04-15 2021-04-15 Medication recommendation method based on patient characterization learning

Publications (2)

Publication Number Publication Date
CN113284627A true CN113284627A (en) 2021-08-20
CN113284627B CN113284627B (en) 2024-05-17

Family

ID=77276853

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110406631.3A Active CN113284627B (en) 2021-04-15 2021-04-15 Medication recommendation method based on patient characterization learning

Country Status (1)

Country Link
CN (1) CN113284627B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116344009A (en) * 2023-05-22 2023-06-27 武汉盛博汇信息技术有限公司 Online diagnosis notification method and device
CN116884554A (en) * 2023-09-06 2023-10-13 济宁蜗牛软件科技有限公司 Electronic medical record classification management method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107516110A (en) * 2017-08-22 2017-12-26 华南理工大学 A kind of medical question and answer Semantic Clustering method based on integrated convolutional encoding
KR20200027091A (en) * 2018-08-31 2020-03-12 주식회사 비플컨설팅 A system that recommends diagnostic cases by deducing the degree of similarity using the artificial neural network technique for the patient's main symptom and diagnostic relationship
CN110880361A (en) * 2019-10-16 2020-03-13 平安科技(深圳)有限公司 Personalized accurate medication recommendation method and device
CN111462896A (en) * 2020-03-31 2020-07-28 重庆大学 Real-time intelligent auxiliary ICD coding system and method based on medical record
CN111696678A (en) * 2020-06-15 2020-09-22 中南大学 Deep learning-based medication decision method and system
KR20210009182A (en) * 2019-07-16 2021-01-26 (주)아이쿱 Method for recommending diabetic medicine based on deep-learning

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116344009A (en) * 2023-05-22 2023-06-27 武汉盛博汇信息技术有限公司 Online diagnosis notification method and device
CN116344009B (en) * 2023-05-22 2023-08-15 武汉盛博汇信息技术有限公司 Online diagnosis notification method and device
CN116884554A (en) * 2023-09-06 2023-10-13 济宁蜗牛软件科技有限公司 Electronic medical record classification management method and system
CN116884554B (en) * 2023-09-06 2023-11-24 济宁蜗牛软件科技有限公司 Electronic medical record classification management method and system

Also Published As

Publication number Publication date
CN113284627B (en) 2024-05-17

Similar Documents

Publication Publication Date Title
CN111414393B (en) Semantic similar case retrieval method and equipment based on medical knowledge graph
CN109460473B (en) Electronic medical record multi-label classification method based on symptom extraction and feature representation
CN108399163B (en) Text similarity measurement method combining word aggregation and word combination semantic features
Lin et al. User-level psychological stress detection from social media using deep neural network
Ruan et al. Representation learning for clinical time series prediction tasks in electronic health records
Fang et al. Feature Selection Method Based on Class Discriminative Degree for Intelligent Medical Diagnosis.
CN109036577B (en) Diabetes complication analysis method and device
CN108062978B (en) Method for predicting main adverse cardiovascular events of patients with acute coronary syndrome
CN109378066A (en) A kind of control method and control device for realizing disease forecasting based on feature vector
Pokharel et al. Temporal tree representation for similarity computation between medical patients
Biswas et al. Machine Learning‐Based Model to Predict Heart Disease in Early Stage Employing Different Feature Selection Techniques
CN113284627B (en) Medication recommendation method based on patient characterization learning
Wang et al. EHR2Vec: representation learning of medical concepts from temporal patterns of clinical notes based on self-attention mechanism
CN113658712A (en) Doctor-patient matching method, device, equipment and storage medium
Gollapalli et al. Text mining on hospital stay durations and management of sickle cell disease patients
US20220165430A1 (en) Leveraging deep contextual representation, medical concept representation and term-occurrence statistics in precision medicine to rank clinical studies relevant to a patient
Kongburan et al. Enhancing predictive power of cluster-boosted regression with text-based indexing
CN114822734A (en) Traditional Chinese medical record analysis method based on cyclic convolution neural network
Permatasari et al. Features Selection for Entity Resolution in Prostitution on Twitter
Barakat et al. From Similarities to Probabilities: Feature Engineering for Predicting Drugs’ Adverse Reactions
Ibrahim et al. FORMAT PROPOSED APPROACH FOR PREDICTING LIVER DISEASE
Shabbeer et al. Prediction of Sudden Health Crises Owing to Congestive Heart Failure with Deep Learning Models.
CN117235487B (en) Feature extraction method and system for predicting hospitalization event of asthma patient
CN117079821B (en) Patient hospitalization event prediction method
CN116598004B (en) Prevalence prediction method, prevalence prediction device, computer device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant