CN113360643A - Electronic medical record data quality evaluation method based on short text classification - Google Patents
- Publication number
- CN113360643A (application CN202110587641.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- electronic medical
- medical record
- quality
- label
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Evolutionary Biology (AREA)
- Software Systems (AREA)
- Epidemiology (AREA)
- Medical Informatics (AREA)
- Primary Health Care (AREA)
- Public Health (AREA)
- Databases & Information Systems (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention discloses an electronic medical record data quality evaluation method based on short text classification, comprising the following steps: S1: data processing; S2: basis identification; S3: quality evaluation. The method splits the original text of an electronic medical record into shorter sentences, constructs a BiLSTM-Attention model to classify the short sentences, and finally evaluates the record according to whether the classification results are consistent with the diagnosis. The method does not require manual processing of the original text of the electronic medical record, which saves labor and time and reduces the demand for professional medical personnel. Meanwhile, the deep learning model can make full use of massive electronic medical record data to classify the split sentences effectively, so that a reasonable evaluation is made.
Description
Technical Field
The invention belongs to the technical field of electronic medical record data quality evaluation, and particularly relates to an electronic medical record data quality evaluation method based on short text classification.
Background
With the advent of the big data era, computer network technology is widely applied to the medical field, and various medical institutions collect massive electronic medical record data through an information management system to replace traditional handwritten paper medical records. The electronic medical record records the whole process of diagnosis and treatment of a patient by a doctor, contains information such as symptoms, signs, diagnosis, prescription and the like, and has great potential value in the fields of auxiliary diagnosis, risk prediction, medicine recommendation and the like. However, due to the limited data management level of the medical institution and the insufficient diagnosis and treatment capability of the doctor, a large amount of non-standard description texts exist in the electronic medical record data, so that the recorded information is inaccurate and incomplete, and the efficiency and the quality of medical research and product development are directly influenced. Therefore, data quality evaluation needs to be performed on the electronic medical records, and the electronic medical records with high quality are screened based on the data quality evaluation, so that interference of noise information and redundant information is reduced, which is of great significance for completing tasks such as medical data analysis, prediction model research, auxiliary system development and the like.
The existing electronic medical record data quality evaluation methods fall into two main types: manual evaluation, and the combination of information extraction and basis identification. In manual evaluation, professional medical personnel check each electronic medical record directly and rely on their clinical diagnosis and treatment experience to determine whether the record suffers from inaccurate description, incomplete diagnosis, insufficient basis and similar problems, and thus reach a reliable evaluation. The advantage of this method is that the evaluation results are stable and valid; the disadvantage is that the labor and time costs are very high. The method combining information extraction and basis identification first uses a question-answering system to extract the key information of a patient from the electronic medical record, then builds a basis identification model with machine learning algorithms such as logistic regression, decision trees and random forests, and evaluates the electronic medical record according to whether the key information provides a sufficient diagnostic basis. Its advantage is that a large number of electronic medical records can be processed efficiently, saving labor and time; its disadvantage is that the performance of the basis identification model depends heavily on the quality of the information extraction results. Information extraction requires medical experts to first design rules and formulate standards, then match structured data from the original text, and finally standardize the structured data, so the results obtained are highly uncertain. Because both methods have obvious shortcomings, data quality evaluation of electronic medical records remains a challenge.
In summary, the electronic medical record has a data quality problem, and an accurate and efficient data quality evaluation method is needed to solve the problem.
Disclosure of Invention
The invention aims to provide an electronic medical record data quality evaluation method based on short text classification, so as to solve the problems in the background technology.
To achieve this purpose, the invention provides the following technical solution: a method for evaluating the quality of electronic medical record data based on short text classification, comprising the following steps:
S1: data processing:
S1.1: classifying the electronic medical record data into history of present illness, physical examination, imaging examination and laboratory examination, and constructing corresponding data sets according to the diagnostic basis of different diseases;
S1.2: splitting each type of data into short text sequences: the original data are split at commas and periods to form short sentence samples, and short text data sets are constructed from the short sentence samples and their diagnosis results;
S1.3: removing short sentence samples that contain a diagnostic description, so that the corresponding diagnosis result is not directly revealed;
S2: basis identification:
S2.1: dividing the data set into a training set and a validation set at a ratio of 4:1, wherein the training set is used to train the model, and the model is optimized through a cross-entropy loss function based on the error between the predicted label and the true label; the cross-entropy loss is calculated as follows:
$$L = -\sum_{k} y_k \log \hat{y}_k$$
where $y_k$ is 1 if class $k$ is the true label and 0 otherwise, and $\hat{y}_k$ is the predicted probability of class $k$;
the validation set is used to verify the performance of the model, whose effectiveness is demonstrated by calculating the precision, the recall and the F1 score; the precision is calculated as follows:
$$P = \frac{TP}{TP + FP}$$
the recall is calculated as follows:
$$R = \frac{TP}{TP + FN}$$
the F1 score is calculated as follows:
$$F_1 = \frac{2 \cdot P \cdot R}{P + R}$$
where TP is the number of positive samples whose predicted label is positive, FN is the number of positive samples whose predicted label is negative, and FP is the number of negative samples whose predicted label is positive;
S2.2: respectively taking different types of data as the input of the models, and training different basis recognition models;
S2.3: the model processes the input in sequence: first, the words of the original text x are fed into the embedding layer (Embedding) to obtain the word vector representation e, calculated as follows:
$$e_i = \mathrm{Embed}(x_i)$$
then e is input into the bidirectional long short-term memory network (BiLSTM) to obtain the hidden state h, calculated as follows:
$$i_t = \sigma(W_i h_{t-1} + U_i x_t + b_i)$$
$$f_t = \sigma(W_f h_{t-1} + U_f x_t + b_f)$$
$$o_t = \sigma(W_o h_{t-1} + U_o x_t + b_o)$$
$$a_t = \tanh(W_a h_{t-1} + U_a x_t + b_a)$$
where t is the time step, i is the input gate, f is the forget gate, o is the output gate, a is the candidate cell state, c is the cell state, h is the hidden state, W, U and b are model parameters, and σ and tanh are activation functions; finally, the hidden state is input into the attention layer (Attention) to obtain the predicted label, calculated as follows:
$$s_i = v \tanh(h_i)$$
$$w_i = \mathrm{softmax}(s_i)$$
where w is the weight and v is a model parameter;
S2.4: the model outputs the probability that a short text is the diagnostic basis for each of the different diseases;
S3: quality evaluation: each electronic medical record is graded as clean data, high-quality data, low-quality data or noise data.
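Steps S1.2 and S1.3 can be illustrated with a minimal sketch. The function names, the inclusion of full-width Chinese punctuation, and the substring test used to detect diagnostic descriptions are all assumptions made for illustration; the source does not say how diagnostic descriptions are recognized.

```python
import re

# Separators named in S1.2: commas and periods. Including both the ASCII
# and the full-width Chinese forms is an assumption of this sketch.
# Caveat: splitting on "." also breaks decimals such as "38.5".
_SEPARATORS = re.compile(r"[,.，。]")


def split_short_sentences(text):
    """Split raw record text into non-empty short sentences (S1.2)."""
    return [part.strip() for part in _SEPARATORS.split(text) if part.strip()]


def drop_diagnosis_sentences(sentences, diagnosis_terms):
    """Remove short sentences that mention a diagnosis term (S1.3).

    Case-insensitive substring matching is an illustrative assumption.
    """
    return [
        s for s in sentences
        if not any(term.lower() in s.lower() for term in diagnosis_terms)
    ]


def build_short_text_dataset(text, diagnosis, diagnosis_terms):
    """Pair each remaining short sentence with the record's diagnosis."""
    kept = drop_diagnosis_sentences(split_short_sentences(text), diagnosis_terms)
    return [(sentence, diagnosis) for sentence in kept]
```

Each returned pair is one short sentence sample together with the diagnosis of the record it came from, which is the form of training sample that step S2 consumes.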
Preferably, the clean data in step S3 means that the predicted labels of all short sentences in the electronic medical record are consistent with the true label, indicating that the sample has a sufficient diagnostic basis.
Preferably, the high-quality data in step S3 means that the most frequent predicted label among the short sentences of the electronic medical record is the true label, indicating that the sample has a large amount of diagnostic basis and a small amount of noise information.
Preferably, the low-quality data in step S3 means that the most frequent predicted label among the short sentences of the electronic medical record is not the true label, indicating that the sample has a small amount of diagnostic basis and a large amount of noise information.
Preferably, the noise data in step S3 means that the predicted labels of all short sentences in the electronic medical record are inconsistent with the true label, indicating that the sample consists entirely of noise information.
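The four grades of step S3 can be sketched as a small decision function. This is an illustration only: the function name and the grade strings are hypothetical, "most predictive label" is read here as the most frequent predicted label, and the treatment of ties (a tie does not count as the true label being most frequent) is an assumption the source does not address.

```python
from collections import Counter


def grade_record(predicted_labels, true_label):
    """Grade one record from the predicted labels of its short sentences."""
    if not predicted_labels:
        raise ValueError("record has no short sentences")

    matches = sum(1 for label in predicted_labels if label == true_label)
    if matches == len(predicted_labels):
        return "clean"   # every short sentence agrees with the diagnosis
    if matches == 0:
        return "noise"   # no short sentence agrees with the diagnosis

    counts = Counter(predicted_labels)
    top = max(counts.values())
    most_frequent = [label for label, n in counts.items() if n == top]
    # Assumption: the true label must be the unique most frequent label.
    if most_frequent == [true_label]:
        return "high_quality"
    return "low_quality"
```

For example, a record whose short sentences are predicted as `["A", "A", "B"]` with true label `"A"` is graded `"high_quality"`, while `["A", "B", "B"]` is graded `"low_quality"`.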
Compared with the prior art, the method provided by the invention has the following advantages:
1) Labor and time costs are low. The manual evaluation method requires checking every electronic medical record, and the method combining information extraction and basis identification requires information extraction rules to be formulated; both methods consume a large amount of labor and time and place high demands on the medical personnel involved. Short text classification only needs to split the original data into short sentences and identify them with the short text classification model; the whole process is carried out by computer, which saves labor and time overall.
2) Short-sentence-level noise has little effect. For a model that performs basis identification directly on the whole electronic medical record, noise at the short-sentence level affects the overall judgment and causes strong interference. The short text classification model used by the invention identifies each short sentence sample independently, so even if some short sentences contain noise information, the data quality evaluation of the whole electronic medical record sample is hardly affected. Therefore, the method has a stronger anti-interference capability.
3) The evaluation result is stable and reliable. A single predicted label for a whole electronic medical record sample is subject to considerable chance and is not necessarily convincing. By contrast, a data quality evaluation based on the predicted labels of multiple short sentence samples is stable and reliable, and it also indicates how much noise the electronic medical record sample contains, which makes the method better suited to practical application scenarios.
Drawings
FIG. 1 is a schematic view of the overall process of the present invention;
FIG. 2 is a schematic diagram of the data processing of the present invention;
FIG. 3 is a schematic diagram of the structure of the basis identification model of the present invention;
FIG. 4 is a schematic diagram of the bidirectional long short-term memory network (BiLSTM) of the present invention;
FIG. 5 is a schematic diagram of the Attention layer Attention of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the present invention. The following detailed description of the embodiments presented in the figures is therefore not intended to limit the scope of the invention as claimed, but is merely representative of selected embodiments. All other embodiments that a person skilled in the art can obtain from the embodiments of the present invention without inventive effort fall within the scope of the present invention.
Referring to FIGS. 1-5, the present invention provides a technical solution: a method for evaluating the quality of electronic medical record data based on short text classification, comprising the following steps:
S1: data processing:
S1.1: classifying the electronic medical record data into history of present illness, physical examination, imaging examination and laboratory examination, and constructing corresponding data sets according to the diagnostic basis of different diseases;
S1.2: splitting each type of data into short text sequences: the original data are split at commas and periods to form short sentence samples, and short text data sets are constructed from the short sentence samples and their diagnosis results;
S1.3: removing short sentence samples that contain a diagnostic description, so that the corresponding diagnosis result is not directly revealed; for example, "consider multiple bronchiectasis of both lungs with infection" gives no diagnostic basis but states the diagnosis result directly, so this diagnostic description cannot be used as a reference for data quality evaluation;
S2: basis identification:
S2.1: dividing the data set into a training set and a validation set at a ratio of 4:1, wherein the training set is used to train the model, and the model is optimized through a cross-entropy loss function based on the error between the predicted label and the true label; the cross-entropy loss is calculated as follows:
$$L = -\sum_{k} y_k \log \hat{y}_k$$
where $y_k$ is 1 if class $k$ is the true label and 0 otherwise, and $\hat{y}_k$ is the predicted probability of class $k$;
the validation set is used to verify the performance of the model, whose effectiveness is demonstrated by calculating the precision, the recall and the F1 score; the precision is calculated as follows:
$$P = \frac{TP}{TP + FP}$$
the recall is calculated as follows:
$$R = \frac{TP}{TP + FN}$$
the F1 score is calculated as follows:
$$F_1 = \frac{2 \cdot P \cdot R}{P + R}$$
where TP is the number of positive samples whose predicted label is positive, FN is the number of positive samples whose predicted label is negative, and FP is the number of negative samples whose predicted label is positive;
S2.2: respectively taking different types of data as the input of the models, and training different basis recognition models;
S2.3: the model processes the input in sequence: first, the words of the original text x are fed into the embedding layer (Embedding) to obtain the word vector representation e, calculated as follows:
$$e_i = \mathrm{Embed}(x_i)$$
then e is input into the bidirectional long short-term memory network (BiLSTM), shown in FIG. 4, to obtain the hidden state h, calculated as follows:
$$i_t = \sigma(W_i h_{t-1} + U_i x_t + b_i)$$
$$f_t = \sigma(W_f h_{t-1} + U_f x_t + b_f)$$
$$o_t = \sigma(W_o h_{t-1} + U_o x_t + b_o)$$
$$a_t = \tanh(W_a h_{t-1} + U_a x_t + b_a)$$
where t is the time step, i is the input gate, f is the forget gate, o is the output gate, a is the candidate cell state, c is the cell state, h is the hidden state, W, U and b are model parameters, and σ and tanh are activation functions; finally, the hidden state is input into the attention layer (Attention), shown in FIG. 5, to obtain the predicted label, calculated as follows:
$$s_i = v \tanh(h_i)$$
$$w_i = \mathrm{softmax}(s_i)$$
where w is the weight and v is a model parameter;
S2.4: the model outputs the probability that a short text is the diagnostic basis for each of the different diseases, for example "interstitial lung disease: 0.8538, bronchiectasis: 0.0755, ...";
S3: quality evaluation: each electronic medical record is graded as clean data, high-quality data, low-quality data or noise data.
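The recurrence and attention scoring of step S2.3 can be sketched in plain Python. This is a sketch under stated assumptions, not the patented implementation: the cell-state and hidden-state updates (`c = f*c_prev + i*a`, `h = o*tanh(c)`), the concatenation of forward and backward states, and the attention-weighted sum of hidden states are the standard forms and are not written out in the text above. All function names and the `params` layout are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, h_prev, U, x, b):
    """Compute W·h_prev + U·x + b; matrices are lists of rows."""
    return [
        sum(w * h for w, h in zip(w_row, h_prev))
        + sum(u * v for u, v in zip(u_row, x))
        + b_k
        for w_row, u_row, b_k in zip(W, U, b)
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One time step; params maps "i", "f", "o", "a" to a (W, U, b) triple."""
    def gate(name, activation):
        W, U, b = params[name]
        return [activation(z) for z in affine(W, h_prev, U, x, b)]

    i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
    a = gate("a", math.tanh)
    # Assumed standard LSTM updates (not given explicitly in the source).
    c = [f_k * c_k + i_k * a_k for f_k, c_k, i_k, a_k in zip(f, c_prev, i, a)]
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c


def bilstm(xs, fwd_params, bwd_params, hidden_size):
    """Concatenate forward and backward hidden states at each position."""
    def run(seq, params):
        h, c = [0.0] * hidden_size, [0.0] * hidden_size
        states = []
        for x in seq:
            h, c = lstm_step(x, h, c, params)
            states.append(h)
        return states

    forward = run(xs, fwd_params)
    backward = run(xs[::-1], bwd_params)[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hs, v):
    """s_i = v · tanh(h_i), w = softmax(s); returns (weights, weighted sum)."""
    scores = [sum(v_k * math.tanh(h_k) for v_k, h_k in zip(v, h)) for h in hs]
    weights = softmax(scores)
    pooled = [sum(w * h[k] for w, h in zip(weights, hs)) for k in range(len(v))]
    return weights, pooled
```

In a full model, the pooled vector would be passed to a classification layer that produces the per-disease probabilities of step S2.4; that layer is omitted here because the source does not specify it.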
In this embodiment, the clean data in step S3 means that the predicted labels of all short sentences in the electronic medical record are consistent with the true label, indicating that the sample has a sufficient diagnostic basis.
In this embodiment, the high-quality data in step S3 means that the most frequent predicted label among the short sentences of the electronic medical record is the true label, indicating that the sample has a large amount of diagnostic basis and a small amount of noise information.
In this embodiment, the low-quality data in step S3 means that the most frequent predicted label among the short sentences of the electronic medical record is not the true label, indicating that the sample has a small amount of diagnostic basis and a large amount of noise information.
In this embodiment, the noise data in step S3 means that the predicted labels of all short sentences in the electronic medical record are inconsistent with the true label, indicating that the sample consists entirely of noise information.
High-quality electronic medical record data contain accurate and complete information, from which the disease of the patient can be effectively inferred. Low-quality electronic medical record data contain a large amount of erroneous and redundant information, and the clinical manifestations are often inconsistent with the diagnosis result. To distinguish the two, a basis identification model needs to be constructed, and the predicted label of the model compared with the true label of the electronic medical record. To evaluate the quality of electronic medical record data accurately and efficiently, the invention provides a method based on short text classification. The method does not require manual processing of the original text of the electronic medical record, which saves labor and time and reduces the demand for professional medical personnel. Meanwhile, the deep learning model can make full use of massive electronic medical record data to classify the split sentences effectively, so that a reasonable evaluation is made.
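Comparing predicted labels with true labels is also how the model is validated in step S2.1. The sketch below shows a 4:1 split and the precision, recall and F1 computation for one class treated as positive. The function names, the fixed random seed, and returning 0.0 when a denominator is zero are assumptions made for illustration.

```python
import random


def split_4_to_1(samples, seed=0):
    """Shuffle and split samples into training and validation sets at 4:1."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]


def precision_recall_f1(true_labels, predicted_labels, positive):
    """Compute precision, recall and F1 with `positive` as the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Returning 0.0 for an empty denominator is a convention chosen here.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1
```

With several diseases, the function can be called once per disease label to obtain per-class scores.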
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from its spirit or essential attributes. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in the claims shall not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this description is organized by embodiments, not every embodiment contains only a single independent technical solution. The description is written this way for clarity only; those skilled in the art should take the description as a whole, and the technical solutions in the embodiments may be appropriately combined to form other embodiments that those skilled in the art can understand.
Claims (5)
1. An electronic medical record data quality evaluation method based on short text classification, characterized by comprising the following steps:
S1: data processing:
S1.1: classifying the electronic medical record data into history of present illness, physical examination, imaging examination and laboratory examination, and constructing corresponding data sets according to the diagnostic basis of different diseases;
S1.2: splitting each type of data into short text sequences: the original data are split at commas and periods to form short sentence samples, and short text data sets are constructed from the short sentence samples and their diagnosis results;
S1.3: removing short sentence samples that contain a diagnostic description, so that the corresponding diagnosis result is not directly revealed;
S2: basis identification:
S2.1: dividing the data set into a training set and a validation set at a ratio of 4:1, wherein the training set is used to train the model, and the model is optimized through a cross-entropy loss function based on the error between the predicted label and the true label; the cross-entropy loss is calculated as follows:
$$L = -\sum_{k} y_k \log \hat{y}_k$$
where $y_k$ is 1 if class $k$ is the true label and 0 otherwise, and $\hat{y}_k$ is the predicted probability of class $k$;
the validation set is used to verify the performance of the model, whose effectiveness is demonstrated by calculating the precision, the recall and the F1 score; the precision is calculated as follows:
$$P = \frac{TP}{TP + FP}$$
the recall is calculated as follows:
$$R = \frac{TP}{TP + FN}$$
the F1 score is calculated as follows:
$$F_1 = \frac{2 \cdot P \cdot R}{P + R}$$
where TP is the number of positive samples whose predicted label is positive, FN is the number of positive samples whose predicted label is negative, and FP is the number of negative samples whose predicted label is positive;
S2.2: respectively taking different types of data as the input of the models, and training different basis recognition models;
S2.3: the model processes the input in sequence: first, the words of the original text x are fed into the embedding layer (Embedding) to obtain the word vector representation e, calculated as follows:
$$e_i = \mathrm{Embed}(x_i)$$
then e is input into the bidirectional long short-term memory network (BiLSTM) to obtain the hidden state h, calculated as follows:
$$i_t = \sigma(W_i h_{t-1} + U_i x_t + b_i)$$
$$f_t = \sigma(W_f h_{t-1} + U_f x_t + b_f)$$
$$o_t = \sigma(W_o h_{t-1} + U_o x_t + b_o)$$
$$a_t = \tanh(W_a h_{t-1} + U_a x_t + b_a)$$
where t is the time step, i is the input gate, f is the forget gate, o is the output gate, a is the candidate cell state, c is the cell state, h is the hidden state, W, U and b are model parameters, and σ and tanh are activation functions; finally, the hidden state is input into the attention layer (Attention) to obtain the predicted label, calculated as follows:
$$s_i = v \tanh(h_i)$$
$$w_i = \mathrm{softmax}(s_i)$$
where w is the weight and v is a model parameter;
S2.4: the model outputs the probability that a short text is the diagnostic basis for each of the different diseases;
S3: quality evaluation: each electronic medical record is graded as clean data, high-quality data, low-quality data or noise data.
2. The method for evaluating the quality of electronic medical record data based on short text classification as claimed in claim 1, wherein: the clean data in step S3 means that the predicted labels of all short sentences in the electronic medical record are consistent with the true label, indicating that the sample has a sufficient diagnostic basis.
3. The method for evaluating the quality of electronic medical record data based on short text classification as claimed in claim 1, wherein: the high-quality data in step S3 means that the most frequent predicted label among the short sentences of the electronic medical record is the true label, indicating that the sample has a large amount of diagnostic basis and a small amount of noise information.
4. The method for evaluating the quality of electronic medical record data based on short text classification as claimed in claim 1, wherein: the low-quality data in step S3 means that the most frequent predicted label among the short sentences of the electronic medical record is not the true label, indicating that the sample has a small amount of diagnostic basis and a large amount of noise information.
5. The method for evaluating the quality of electronic medical record data based on short text classification as claimed in claim 1, wherein: the noise data in step S3 means that the predicted labels of all short sentences in the electronic medical record are inconsistent with the true label, indicating that the sample consists entirely of noise information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110587641.1A CN113360643A (en) | 2021-05-27 | 2021-05-27 | Electronic medical record data quality evaluation method based on short text classification |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110587641.1A CN113360643A (en) | 2021-05-27 | 2021-05-27 | Electronic medical record data quality evaluation method based on short text classification |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113360643A (en) | 2021-09-07 |
Family
ID=77528021
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110587641.1A Pending CN113360643A (en) | 2021-05-27 | 2021-05-27 | Electronic medical record data quality evaluation method based on short text classification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113360643A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114493810A (en) * | 2022-04-14 | 2022-05-13 | Chengdu University of Information Technology | Internet of things data processing method, device and medium |
CN116719945A (en) * | 2023-08-08 | 2023-09-08 | Beijing Huimei Cloud Technology Co., Ltd. | Medical short text classification method and device, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109189767A (en) * | 2018-08-01 | 2019-01-11 | Beijing Sankuai Online Technology Co., Ltd. | Data processing method, device, electronic equipment and storage medium |
CN110444259A (en) * | 2019-06-06 | 2019-11-12 | Kunming University of Science and Technology | Traditional Chinese medical electronic case history entity relationship extracting method based on entity relationship mark strategy |
CN110569353A (en) * | 2019-07-03 | 2019-12-13 | Chongqing University | Attention mechanism-based Bi-LSTM label recommendation method |
CN111488739A (en) * | 2020-03-17 | 2020-08-04 | Tianjin University | Implicit discourse relation identification method based on multi-granularity generated image enhancement representation |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114493810A (en) * | 2022-04-14 | 2022-05-13 | Chengdu University of Information Technology | Internet of things data processing method, device and medium |
CN116719945A (en) * | 2023-08-08 | 2023-09-08 | Beijing Huimei Cloud Technology Co., Ltd. | Medical short text classification method and device, electronic equipment and storage medium |
CN116719945B (en) * | 2023-08-08 | 2023-10-24 | Beijing Huimei Cloud Technology Co., Ltd. | Medical short text classification method and device, electronic equipment and storage medium |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN111540468B (en) | ICD automatic coding method and system for visualizing diagnostic reasons | |
CN108831559B (en) | Chinese electronic medical record text analysis method and system | |
CN109697285B (en) | Hierarchical BiLSTM Chinese electronic medical record disease coding and labeling method for enhancing semantic representation | |
CN109460473B (en) | Electronic medical record multi-label classification method based on symptom extraction and feature representation | |
CN109935336B (en) | Intelligent auxiliary diagnosis system for respiratory diseases of children | |
CN110705293A (en) | Electronic medical record text named entity recognition method based on pre-training language model | |
CN109036577B (en) | Diabetes complication analysis method and device | |
CN106980608A (en) | A kind of Chinese electronic health record participle and name entity recognition method and system | |
CN106874643A (en) | Build the method and system that knowledge base realizes assisting in diagnosis and treatment automatically based on term vector | |
CN106407443A (en) | Structured medical data generation method and device | |
CN110532398B (en) | Automatic family map construction method based on multi-task joint neural network model | |
CN109492105B (en) | Text emotion classification method based on multi-feature ensemble learning | |
CN108091397A (en) | A kind of bleeding episode Forecasting Methodology for the Ischemic Heart Disease analyzed based on promotion-resampling and feature association | |
CN110600121B (en) | Knowledge graph-based primary etiology diagnosis method | |
CN112241457A (en) | Event detection method for event of affair knowledge graph fused with extension features | |
CN109003677B (en) | Structured analysis processing method for medical record data | |
CN114628008B (en) | Social user depression tendency detection method based on heterogeneous graph attention network | |
CN113360643A (en) | Electronic medical record data quality evaluation method based on short text classification | |
CN112530584A (en) | Medical diagnosis assisting method and system | |
CN112541066A (en) | Text-structured-based medical and technical report detection method and related equipment | |
CN114188022A (en) | Clinical children cough intelligent pre-diagnosis system based on textCNN model | |
CN112489740A (en) | Medical record detection method, training method of related model, related equipment and device | |
CN111524570B (en) | Ultrasonic follow-up patient screening method based on machine learning | |
CN114242194A (en) | Natural language processing device and method for medical image diagnosis report based on artificial intelligence | |
CN113342973A (en) | Diagnosis method of auxiliary diagnosis model based on disease two-classifier |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210907 |