CN112863628A - Electronic medical record data processing method and system - Google Patents


Info

Publication number
CN112863628A
Authority
CN
China
Prior art keywords
medical record
deep learning
data
documents
learning model
Prior art date
Legal status
Pending
Application number
CN202110281535.0A
Other languages
Chinese (zh)
Inventor
陈�峰
刘升平
梁家恩
Current Assignee
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Priority date
2021-03-16
Filing date
2021-03-16
Publication date
2021-05-28
Application filed by Unisound Intelligent Technology Co Ltd and Xiamen Yunzhixin Intelligent Technology Co Ltd
Priority to CN202110281535.0A
Publication of CN112863628A
Legal status: Pending

Classifications

    • G PHYSICS
        • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
            • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
                • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
                    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
                    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
                        • G06F16/35 Clustering; Classification
                            • G06F16/353 Clustering; Classification into predefined classes
            • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N3/00 Computing arrangements based on biological models
                    • G06N3/02 Neural networks
                        • G06N3/04 Architecture, e.g. interconnection topology
                            • G06N3/044 Recurrent networks, e.g. Hopfield networks
                            • G06N3/045 Combinations of networks
                        • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention relates to a method and system for processing electronic medical record data. The method comprises: mapping the names of medical record documents according to a preset mapping rule; and classifying, through a pre-trained deep learning model, the medical record documents that cannot be mapped according to the preset mapping rule. By combining rules with a model, the invention reduces the workload of manually checking each type of document; the saving is especially pronounced when a general hospital has many document types or when a specialist hospital uses unusual document names.

Description

Electronic medical record data processing method and system
Technical Field
The invention relates to the field of deep learning, and in particular to a method and system for processing electronic medical record data.
Background
Electronic medical records are unstructured, and because hospitals differ in the content structure and document naming of their medical records, the medical record documents of different hospitals must be mapped to a standard. The common practice is to map medical record content according to the document name, combined with titles or keywords in the content.
Because medical records differ between hospitals in content structure and naming, mapping them by file name or keywords alone cannot completely cover specialist hospitals, whose medical records are more finely subdivided, or long-tail medical records. Moreover, mapping the medical records of a new hospital requires a great deal of work to check the names and contents of all the different document types.
Disclosure of Invention
The invention provides an electronic medical record data processing method and system that solve the technical problem of the heavy workload involved in checking the names and contents of different types of documents.
The technical solution to this problem is as follows:
An electronic medical record data processing method comprises the following steps:
mapping the names of medical record documents according to a preset mapping rule; and
classifying, through a pre-trained deep learning model, the medical record documents that cannot be mapped according to the preset mapping rule.
The beneficial effects of the invention are as follows:
combining rules with a model reduces the workload of manually checking each type of document, and the saving is especially pronounced when a general hospital has many document types or when a specialist hospital uses unusual document names.
Further, the classification through the deep learning model specifically comprises:
encoding the medical record document, and mapping the encoding result to a predefined category space through the deep learning model to obtain the category of the medical record document.
The beneficial effect of this further solution is that medical record documents which the rule mapping cannot handle are encoded and then assigned a category by the deep learning model.
Further, the category space comprises N document categories and one "other" category, and the classification through the deep learning model specifically further comprises:
rejecting the medical record documents classified into the "other" category.
The beneficial effect of this further solution is that the configured "other" category identifies medical record documents whose type the deep learning model cannot determine.
Further, the method comprises: performing incremental training on the deep learning model when new electronic medical record data are added to its training data.
The beneficial effect of this further solution is that incremental training on newly added electronic medical record data enriches the model's training data.
Further, the incremental training of the deep learning model specifically comprises:
grouping the medical record documents in the training data, after the new electronic medical record data have been added, by document-name type and counting each group; feeding the medical record documents of each type into the deep learning model in batches for prediction, to obtain the predicted category and its probability for each medical record document in the batch; if the maximum predicted probability is greater than or equal to a preset threshold, changing the category of any medical record document whose prediction differs from the category with the maximum probability to that category, and then adding the documents to the training data; and if the maximum predicted probability is less than the preset threshold, randomly selecting part of the batch for manual labeling, adding the manually labeled documents to the training data, and continuing to train the model until the maximum predicted probability is greater than or equal to the preset threshold.
The beneficial effect of this further solution is that the medical records of a new hospital are tallied from the model's predictions, part of the medical record documents are labeled according to the tallies, and incremental training is performed, so that the model adapts to the new hospital's document types while the manual workload is reduced.
An electronic medical record data processing system, comprising:
a rule mapping module, configured to map the names of medical record documents according to a preset mapping rule; and
a classification module, configured to classify, through a pre-trained deep learning model, the medical record documents that cannot be mapped according to the preset mapping rule.
Further, the classification module is specifically configured to:
encode the medical record document, and map the encoding result to a predefined category space through the deep learning model to obtain the category of the medical record document.
Further, the category space comprises N document categories and one "other" category, and the classification module is further specifically configured to:
reject the medical record documents classified into the "other" category.
Further, the system comprises: a training module, configured to perform incremental training on the deep learning model when new electronic medical record data are added to the training data of the deep learning model.
Further, the training module is specifically configured to:
group the medical record documents in the training data, after the new electronic medical record data have been added, by document-name type and count each group; feed the medical record documents of each type into the deep learning model in batches for prediction, to obtain the predicted category and its probability for each medical record document in the batch; if the maximum predicted probability is greater than or equal to a preset threshold, change the category of any medical record document whose prediction differs from the category with the maximum probability to that category, and then add the documents to the training data; and if the maximum predicted probability is less than the preset threshold, randomly select part of the batch for manual labeling, add the manually labeled documents to the training data, and continue to train the model until the maximum predicted probability is greater than or equal to the preset threshold.
Drawings
FIG. 1 is a flowchart of an electronic medical record data processing method according to an embodiment of the present invention;
FIG. 2 is a structural diagram of the hierarchical LSTM.
Detailed Description
The principles and features of the invention are described below with reference to the drawings; the examples given serve only to illustrate the invention and are not intended to limit its scope.
As shown in FIG. 1, the electronic medical record data processing method provided by an embodiment of the present invention comprises:
S1, mapping the names of medical record documents according to a preset mapping rule.
Specifically, where the name of a medical record document is definitive, a mapping rule from common document names to standard document names can be established and applied as a rule. For example, some medical record documents are commonly used and important in the medical records of every hospital; their names are uniform and regular, with little ambiguity. One such document is the medical record front page, which different hospitals title with slightly different names. Such documents can be mapped directly from their document name to a custom standard document name; for example, either variant of the front-page name is mapped to the standard name "medical record front page".
Similarly, where a medical record document contains fields (nodes) that distinguish its type, a mapping rule from common field names to document names can be established and applied as a rule. For example, certain key fields uniquely identify a medical record document: the field "anesthesiologist's signature" corresponds one-to-one with the document "anesthesia consent form".
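The two rule types in step S1 can be sketched as two lookup tables consulted in turn. This is a minimal Python sketch, not the patent's implementation: the table entries, the `normalize` helper, and the choice to try document-name rules before field-name rules are all assumptions made for illustration.

```python
from typing import Iterable, Optional

# Hypothetical rule tables (illustrative entries only; a real deployment curates these).
# Document-name rules: variant document names -> one standard (custom) document name.
DOC_NAME_RULES = {
    "medical record front page": "Medical Record Front Page",
    "front page of medical record": "Medical Record Front Page",
}

# Field-name rules: a key field that uniquely identifies a document type -> document name.
FIELD_NAME_RULES = {
    "anesthesiologist signature": "Anesthesia Consent Form",
}


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial spelling differences still match."""
    return " ".join(text.lower().split())


def map_by_rules(doc_name: str, field_names: Iterable[str]) -> Optional[str]:
    """Return the standard document name, or None if no rule covers the document."""
    # Assumed order: document-name rules first.
    hit = DOC_NAME_RULES.get(normalize(doc_name))
    if hit is not None:
        return hit
    # Then field-name rules: any distinguishing field decides the document type.
    for field in field_names:
        hit = FIELD_NAME_RULES.get(normalize(field))
        if hit is not None:
            return hit
    return None


print(map_by_rules("Front Page of  Medical Record", []))  # Medical Record Front Page
```

A `None` result marks a document that falls outside the rule list and is therefore passed on to step S2.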
S2, classifying, through a pre-trained deep learning model, the medical record documents that cannot be mapped according to the preset mapping rule.
Specifically, with the common document name-to-document name mapping rules and field name-to-document name mapping rules established in step S1, any medical record document in the input electronic medical record data whose document name or field name appears in the rule list can be handled by rule mapping. If the document name or field name of the current input is not in the rule list, the medical record document is an input the rules cannot cover and must be classified by the deep learning model. The deep learning model may be the hierarchical LSTM shown in FIG. 2, or it may be replaced by another deep model, such as a CNN.
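The rule-first, model-fallback routing described here can be sketched as follows. The `MedicalRecordDocument` type, the `route` function, and the `classify` callback are hypothetical names; any trained classifier (for example the hierarchical LSTM) could be plugged in as `classify`, and a stub stands in for it below.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class MedicalRecordDocument:
    name: str                                             # document name as exported by the hospital
    field_names: List[str] = field(default_factory=list)  # field (node) names found in the document
    text: str = ""                                        # raw content, only needed on the model path


def route(
    doc: MedicalRecordDocument,
    doc_name_rules: Dict[str, str],
    field_name_rules: Dict[str, str],
    classify: Callable[[MedicalRecordDocument], str],
) -> Tuple[str, str]:
    """Return (category, source), where source is "rule" or "model"."""
    # S1: the document name or a distinguishing field is in the rule list.
    if doc.name in doc_name_rules:
        return doc_name_rules[doc.name], "rule"
    for name in doc.field_names:
        if name in field_name_rules:
            return field_name_rules[name], "rule"
    # Not covered by the rule list: defer to the deep learning model (S2).
    return classify(doc), "model"


def stub_classify(doc: MedicalRecordDocument) -> str:
    """Placeholder for the trained model; always answers "other"."""
    return "other"


rules = {"Medical record front page": "Medical Record Front Page"}
field_rules = {"Anesthesiologist's signature": "Anesthesia Consent Form"}
print(route(MedicalRecordDocument("Medical record front page"), rules, field_rules, stub_classify))
# ('Medical Record Front Page', 'rule')
```

Keeping the model behind a callback means the rule tables can grow per hospital without retraining, and the model only sees the long tail the rules miss.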
In the electronic medical record data processing method provided by this embodiment, combining rules with a model reduces the workload of manually checking each type of document, and the saving is especially pronounced when a general hospital has many document types or when a specialist hospital uses unusual document names.
Optionally, in this embodiment, step S2 specifically comprises:
S21, encoding the medical record document, and mapping the encoding result to a predefined category space through the deep learning model to obtain the category of the medical record document.
Referring to the document-level part of the structure diagram, encoding maps an object (such as a word in the text, or a sentence vector produced by a lower-level encoding) to another vector through a deep network (an LSTM or CNN); this is the encoding process.
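A minimal sketch of the two-level (word to sentence to document) encoding, in plain Python. To keep it self-contained, the per-level encoder is element-wise mean pooling standing in for the LSTM or CNN the text describes, so only the hierarchical structure is illustrated; the embedding table, the zero vector for unknown words, and the function names are assumptions.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Stand-in sequence encoder: element-wise mean (an LSTM or CNN would go here)."""
    if not vectors:
        raise ValueError("cannot encode an empty sequence")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def encode_document(
    sentences: List[List[str]],
    embed: Dict[str, Vector],
    encoder: Callable[[Sequence[Vector]], Vector] = mean_pool,
) -> Vector:
    """Two-level encoding: word vectors -> sentence vectors -> one document vector."""
    dim = len(next(iter(embed.values())))  # assumes a non-empty embedding table
    unk = [0.0] * dim  # unknown words map to a zero vector (an assumption)
    # Level 1: encode each non-empty sentence from its word vectors.
    sentence_vectors = [
        encoder([embed.get(word, unk) for word in sentence]) for sentence in sentences if sentence
    ]
    # Level 2: encode the sequence of sentence vectors into the document vector.
    return encoder(sentence_vectors)


embed = {"fever": [1.0, 0.0], "cough": [0.0, 1.0]}  # toy 2-d embeddings
print(encode_document([["fever", "cough"], ["fever"]], embed))  # [0.75, 0.25]
```

Swapping `encoder` for a recurrent or convolutional encoder keeps the same two-level shape while making the encoding order-sensitive.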
Classification likewise maps the encoding result, through a deep learning network, to a predefined category space whose categories are the document categories plus one "other" category. With N document categories, the category space has dimension N + 1, and each input is mapped to exactly one of these N + 1 categories.
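The N + 1 category space can be illustrated with a single linear layer followed by a softmax and an argmax. The weights, bias, and label names below are made up for the example, and the linear layer stands in for whatever output layer the trained network actually uses.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn N + 1 raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(
    doc_vector: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
    labels: Sequence[str],
) -> Tuple[str, float, List[float]]:
    """Map an encoded document to exactly one of the N + 1 categories."""
    if not (len(weights) == len(bias) == len(labels)):
        raise ValueError("need one weight row and one bias per category (N + 1 in total)")
    logits = [sum(w * x for w, x in zip(row, doc_vector)) + b for row, b in zip(weights, bias)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)  # argmax over the N + 1 categories
    return labels[best], probs[best], probs


# N = 2 document categories plus "other", so the category space has 3 dimensions.
labels = ["Medical Record Front Page", "Anesthesia Consent Form", "other"]
weights = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]  # made-up parameters
bias = [0.0, 0.0, 0.5]
label, p, probs = classify([1.0, 0.0], weights, bias, labels)
print(label)  # Medical Record Front Page
```

The full probability vector is returned as well, since the incremental-training step later compares predicted probabilities against a threshold.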
In this embodiment, medical record documents that the rule mapping cannot handle are encoded, and their category is then determined by the deep learning model.
Optionally, in this embodiment, step S2 specifically further comprises:
S22, rejecting the medical record documents classified into the "other" category.
Specifically, some medical record documents are not predefined standard medical record documents and fall outside the scope of the standard documents; all of these belong to the "other" category. Documents classified into the "other" category are not mapped, that is, they are rejected.
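Rejection then amounts to filtering out predictions of the "other" category. A minimal sketch, assuming predictions are keyed by a hypothetical document ID and that the extra category is labeled "other":

```python
from typing import Dict, List, Tuple

OTHER = "other"  # label of the (N + 1)-th category; the exact name is an assumption


def accept_or_reject(predictions: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Split model predictions into mapped documents and rejected ("other") documents."""
    accepted = {doc_id: label for doc_id, label in predictions.items() if label != OTHER}
    rejected = [doc_id for doc_id, label in predictions.items() if label == OTHER]
    return accepted, rejected


accepted, rejected = accept_or_reject(
    {"doc-1": "Anesthesia Consent Form", "doc-2": OTHER, "doc-3": "Medical Record Front Page"}
)
print(rejected)  # ['doc-2']
```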
In this embodiment, the configured "other" category identifies medical record documents whose type the deep learning model cannot determine.
Optionally, in this embodiment, the method further comprises:
S3, performing incremental training on the deep learning model when new electronic medical record data are added to its training data.
In this embodiment, incremental training on newly added electronic medical record data enriches the model's training data.
Optionally, in this embodiment, step S3 specifically comprises:
grouping the medical record documents in the training data, after the new electronic medical record data have been added, by document-name type and counting each group; feeding the medical record documents of each type into the deep learning model in batches for prediction, to obtain the predicted category and its probability for each medical record document in the batch; if the maximum predicted probability is greater than or equal to a preset threshold, changing the category of any medical record document whose prediction differs from the category with the maximum probability to that category, and then adding the documents to the training data; and if the maximum predicted probability is less than the preset threshold, randomly selecting part of the batch for manual labeling, adding the manually labeled documents to the training data, and continuing to train the model until the maximum predicted probability is greater than or equal to the preset threshold.
Specifically, because the content structure of a new hospital's medical records may not have appeared in the corpus used to train the model, incremental training on the new hospital's data is needed. First, the medical record documents are grouped by document name and the frequency of each group is counted. Documents of the same type are sent to the model in batches for prediction, and the prediction results, that is, the probability distribution of each document over the N + 1 categories, are tallied over each batch. A threshold is set: if the probability of a predicted category exceeds the threshold, the model's mapping is considered high-confidence, and the category with the maximum probability is taken as the predicted category for the current data. Documents whose types were predicted wrongly are directly relabeled with the category the model predicted for most of the batch and added to the training data. For data whose accuracy does not reach the threshold, a portion is randomly selected for manual labeling and added to the training corpus, and training of the model continues.
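One round of this procedure can be sketched as below. Because the text does not say exactly how the batch-level "maximum probability" is computed, this sketch assumes it is the highest mean class probability over the batch; the function name, the `predict` callback, the default threshold of 0.9, and the sample size are all hypothetical. Retraining on the returned labels and repeating the round until every batch clears the threshold is left to the caller.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def incremental_round(
    docs: Sequence[Tuple[str, str]],
    predict: Callable[[List[str]], List[List[float]]],
    labels: Sequence[str],
    threshold: float = 0.9,
    sample_size: int = 5,
    seed: int = 0,
) -> Tuple[Dict[str, str], List[str], Dict[str, int]]:
    """One round of auto-labeling for a new hospital's documents.

    docs: (doc_id, document name) pairs.
    predict: one probability distribution over the N + 1 labels per doc_id.
    Returns (auto-labeled data, doc_ids for manual labeling, per-name frequencies).
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for doc_id, name in docs:
        groups[name].append(doc_id)  # group documents by their name type
    frequencies = {name: len(ids) for name, ids in groups.items()}

    rng = random.Random(seed)
    auto_labeled: Dict[str, str] = {}
    to_annotate: List[str] = []
    for ids in groups.values():
        dists = predict(ids)  # one batch per name type
        # Batch-level score (an assumption): mean probability of each category.
        mean = [sum(d[k] for d in dists) / len(dists) for k in range(len(labels))]
        best = max(range(len(labels)), key=mean.__getitem__)
        if mean[best] >= threshold:
            # Confident: documents predicted differently are relabeled to the batch's category.
            for doc_id in ids:
                auto_labeled[doc_id] = labels[best]
        else:
            # Not confident: randomly sample part of the batch for manual labeling.
            to_annotate.extend(rng.sample(ids, min(sample_size, len(ids))))
    return auto_labeled, to_annotate, frequencies


labels = ["Medical Record Front Page", "Anesthesia Consent Form", "other"]
fake = {"d1": [0.95, 0.04, 0.01], "d2": [0.90, 0.05, 0.05],
        "d3": [0.40, 0.50, 0.10], "d4": [0.30, 0.30, 0.40]}
docs = [("d1", "Front page"), ("d2", "Front page"), ("d3", "Misc record"), ("d4", "Misc record")]
auto, manual, freq = incremental_round(docs, lambda ids: [fake[i] for i in ids], labels)
print(auto)  # {'d1': 'Medical Record Front Page', 'd2': 'Medical Record Front Page'}
```

Here the "Front page" batch clears the threshold and is labeled automatically, while the ambiguous "Misc record" batch is routed to manual labeling.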
In this embodiment, the medical records of a new hospital are tallied from the model's predictions, part of the medical record documents are labeled according to the tallies, and incremental training is performed, so that the model adapts to the new hospital's document types while the manual workload is reduced.
This embodiment further provides an electronic medical record data processing system, comprising:
a rule mapping module, configured to map the names of medical record documents according to a preset mapping rule; and
a classification module, configured to classify, through a pre-trained deep learning model, the medical record documents that cannot be mapped according to the preset mapping rule.
Optionally, in this embodiment, the classification module is specifically configured to:
encode the medical record document, and map the encoding result to a predefined category space through the deep learning model to obtain the category of the medical record document.
Optionally, in this embodiment, the category space comprises N document categories and one "other" category, and the classification module is further specifically configured to:
reject the medical record documents classified into the "other" category.
Optionally, in this embodiment, the system further comprises: a training module, configured to perform incremental training on the deep learning model when new electronic medical record data are added to the training data of the deep learning model.
Optionally, in this embodiment, the training module is specifically configured to:
group the medical record documents in the training data, after the new electronic medical record data have been added, by document-name type and count each group; feed the medical record documents of each type into the deep learning model in batches for prediction, to obtain the predicted category and its probability for each medical record document in the batch; if the maximum predicted probability is greater than or equal to a preset threshold, change the category of any medical record document whose prediction differs from the category with the maximum probability to that category, and then add the documents to the training data; and if the maximum predicted probability is less than the preset threshold, randomly select part of the batch for manual labeling, add the manually labeled documents to the training data, and continue to train the model until the maximum predicted probability is greater than or equal to the preset threshold.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An electronic medical record data processing method, characterized by comprising the following steps:
mapping the names of medical record documents according to a preset mapping rule; and
classifying, through a pre-trained deep learning model, the medical record documents that cannot be mapped according to the preset mapping rule.
2. The electronic medical record data processing method according to claim 1, wherein the classifying through the deep learning model specifically comprises:
encoding the medical record document, and mapping the encoding result to a predefined category space through the deep learning model to obtain the category of the medical record document.
3. The electronic medical record data processing method according to claim 2, wherein the category space comprises N document categories and one "other" category, and the classifying through the deep learning model further comprises:
rejecting the medical record documents classified into the "other" category.
4. The electronic medical record data processing method according to any one of claims 1 to 3, further comprising: performing incremental training on the deep learning model when new electronic medical record data are added to the training data of the deep learning model.
5. The electronic medical record data processing method according to claim 4, wherein the incremental training of the deep learning model specifically comprises:
grouping the medical record documents in the training data, after the new electronic medical record data have been added, by document-name type and counting each group; feeding the medical record documents of each type into the deep learning model in batches for prediction, to obtain the predicted category and its probability for each medical record document in the batch; if the maximum predicted probability is greater than or equal to a preset threshold, changing the category of any medical record document whose prediction differs from the category with the maximum probability to that category, and then adding the documents to the training data; and if the maximum predicted probability is less than the preset threshold, randomly selecting part of the batch for manual labeling, adding the manually labeled documents to the training data, and continuing to train the model until the maximum predicted probability is greater than or equal to the preset threshold.
6. An electronic medical record data processing system, comprising:
a rule mapping module, configured to map the names of medical record documents according to a preset mapping rule; and
a classification module, configured to classify, through a pre-trained deep learning model, the medical record documents that cannot be mapped according to the preset mapping rule.
7. The electronic medical record data processing system according to claim 6, wherein the classification module is specifically configured to:
encode the medical record document, and map the encoding result to a predefined category space through the deep learning model to obtain the category of the medical record document.
8. The electronic medical record data processing system according to claim 7, wherein the category space comprises N document categories and one "other" category, and the classification module is further configured to:
reject the medical record documents classified into the "other" category.
9. The electronic medical record data processing system according to any one of claims 6 to 8, further comprising: a training module, configured to perform incremental training on the deep learning model when new electronic medical record data are added to the training data of the deep learning model.
10. The electronic medical record data processing system according to claim 9, wherein the training module is specifically configured to:
group the medical record documents in the training data, after the new electronic medical record data have been added, by document-name type and count each group; feed the medical record documents of each type into the deep learning model in batches for prediction, to obtain the predicted category and its probability for each medical record document in the batch; if the maximum predicted probability is greater than or equal to a preset threshold, change the category of any medical record document whose prediction differs from the category with the maximum probability to that category, and then add the documents to the training data; and if the maximum predicted probability is less than the preset threshold, randomly select part of the batch for manual labeling, add the manually labeled documents to the training data, and continue to train the model until the maximum predicted probability is greater than or equal to the preset threshold.

Priority Applications (1)

Application number | Priority date | Filing date | Title
CN202110281535.0A, CN112863628A (en) | 2021-03-16 | 2021-03-16 | Electronic medical record data processing method and system

Applications Claiming Priority (1)

Application number | Priority date | Filing date | Title
CN202110281535.0A, CN112863628A (en) | 2021-03-16 | 2021-03-16 | Electronic medical record data processing method and system

Publications (1)

Publication number | Publication date
CN112863628A | 2021-05-28

Family

ID=75994708

Family Applications (1)

Application number | Status | Publication | Priority date | Filing date | Title
CN202110281535.0A | Pending | CN112863628A (en) | 2021-03-16 | 2021-03-16 | Electronic medical record data processing method and system

Country Status (1)

Country | Link
CN (1) | CN112863628A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110335653A (en) * | 2019-06-30 | 2019-10-15 | 浙江大学 (Zhejiang University) | Non-standard case history analytic method based on openEHR case history format
CN110727880A (en) * | 2019-10-18 | 2020-01-24 | 西安电子科技大学 (Xidian University) | Sensitive corpus detection method based on word bank and word vector model
CN111475804A (en) * | 2020-03-05 | 2020-07-31 | 浙江省北大信息技术高等研究院 (Zhejiang Province Peking University Advanced Institute of Information Technology) | Alarm prediction method and system
US20200364303A1 (en) * | 2019-05-15 | 2020-11-19 | Nvidia Corporation | Grammar transfer using one or more neural networks
CN112434159A (en) * | 2020-11-17 | 2021-03-02 | 东南大学 (Southeast University) | Method for classifying thesis multiple labels by using deep neural network

Similar Documents

Publication Publication Date Title
US8527436B2 (en) Automated parsing of e-mail messages
CN110851598B (en) Text classification method and device, terminal equipment and storage medium
WO2022142011A1 (en) Method and device for address recognition, computer device, and storage medium
CN111046035A (en) Data automation processing method, system, computer equipment and readable storage medium
US11269810B2 (en) Computerized methods of data compression and analysis
CN110597844B (en) Unified access method for heterogeneous database data and related equipment
CN101046858B (en) Electronic information comparing system and method and anti-garbage mail system
CN113407679A (en) Text topic mining method and device, electronic equipment and storage medium
CN109522396B (en) Knowledge processing method and system for national defense science and technology field
WO2023240878A1 (en) Resource recognition method and apparatus, and device and storage medium
CN115827819A (en) Intelligent question and answer processing method and device, electronic equipment and storage medium
US20230252140A1 (en) Methods and systems for identifying anomalous computer events to detect security incidents
CN115238071A (en) Data standard generation method, storage medium and system based on similar clustering and data exploration
CN113434672A (en) Text type intelligent identification method, device, equipment and medium
CN112863628A (en) Electronic medical record data processing method and system
CN109918638B (en) Network data monitoring method
CN111309911A (en) Case topic discovery method for judicial field
CN107491423B (en) Chinese document gene quantization and characterization method based on numerical value-character string mixed coding
CN113691548A (en) Data acquisition and classified storage method and system thereof
Situmeang Impact of text preprocessing on named entity recognition based on conditional random field in Indonesian text
US20220107919A1 (en) Computerized systems and methods of data compression
CN117235629B (en) Intention recognition method, system and computer equipment based on knowledge domain detection
CN114925185B (en) Interaction method, model training method, device, equipment and medium
CN112364642B (en) Text processing method and device
CN116932732A (en) Method, device, electronic equipment and storage medium for determining target keywords

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination