CN114446422A - Medical record marking method, system and corresponding equipment and storage medium - Google Patents


Info

Publication number
CN114446422A
Authority
CN
China
Prior art keywords
word
words
standard
diagnostic
diagnosis
Legal status
Pending
Application number
CN202111536210.9A
Other languages
Chinese (zh)
Inventor
赵建强
王梦迪
Current Assignee
Wanghai Kangxin Beijing Technology Co ltd
Original Assignee
Wanghai Kangxin Beijing Technology Co ltd
Priority date
2021-12-15
Filing date
2021-12-15
Publication date
2022-05-06
2021-12-15 Application filed by Wanghai Kangxin Beijing Technology Co ltd
2021-12-15 Priority to CN202111536210.9A
2022-05-06 Publication of CN114446422A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The application discloses a medical record marking method, a medical record marking system, corresponding equipment, and a storage medium. The method comprises: extracting diagnostic words from medical record information; for each diagnostic word, calculating relevance scores between the diagnostic word and all standard words, and recalling a preset number of the most relevant standard words according to the ranking of the relevance scores; calculating the text similarity between each diagnostic word and each of its recalled standard words; inputting each diagnostic word and each corresponding standard word whose text similarity is greater than or equal to a preset threshold, as a pair, into a trained semantic similarity model and ranking the pairs by semantic similarity; and selecting the standard word with the highest semantic similarity as the standard diagnostic word of the corresponding diagnostic word. The invention can greatly improve the accuracy and normalization of diagnostic words in medical records and thereby improve medical record quality.

Description

Medical record marking method, system and corresponding equipment and storage medium
Technical Field
The application relates to the field of electric digital data processing, in particular to a medical record marking method. The application also relates to a medical record marking system and a corresponding computer device and computer readable storage medium.
Background
When medical records are written, problems such as widespread copying, overly brief and irregular descriptions of disease, and differences in individual understanding prevent the records from faithfully and accurately reflecting patients' actual disease progression, treatment effects, and so on. Moreover, identical medical histories produced by copying also degrade the quality of medical records, creating a greater risk of medical disputes.
Disclosure of Invention
The invention provides a medical record marking method, a medical record marking system, corresponding equipment and a storage medium, which can greatly improve the accuracy and normalization of medical record diagnosis words and further improve the quality of medical records.
In a first aspect of the present invention, there is provided a method of medical record marking, the method comprising:
extracting diagnostic words from the medical record information;
for each diagnostic word, calculating relevance scores between the diagnostic word and all standard words, and recalling a preset number of the most relevant standard words according to the ranking of the relevance scores;
calculating the text similarity between each diagnostic word and each of its recalled standard words;
inputting each diagnostic word and each corresponding standard word whose text similarity is greater than or equal to a preset threshold, as a pair, into the trained semantic similarity model, and ranking by semantic similarity;
and selecting the standard word with the highest semantic similarity as the standard diagnostic word of the corresponding diagnostic word.
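The five steps above can be sketched as a small Python pipeline. This is a minimal sketch under stated assumptions, not the applicant's implementation: every stage (extraction, relevance recall, text similarity, semantic similarity) is passed in as a plain function, and the class name `Normalizer`, its field names, and the defaults `top_k=50` and `threshold=0.5` are hypothetical. A diagnostic word with no candidate at or above the threshold is returned as `None`; in the detailed description, that is where the entity-recognition and knowledge-graph branch would take over.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical stage signatures; none of these names come from the patent.
Extractor = Callable[[str], List[str]]        # record text -> diagnostic words
Recaller = Callable[[str, int], List[str]]    # (word, k) -> top-k standard words
TextSim = Callable[[str, str], float]         # higher means more similar
SemanticSim = Callable[[str, str], float]     # e.g. a trained BERT scorer


@dataclass
class Normalizer:
    extract: Extractor
    recall: Recaller
    text_sim: TextSim
    semantic_sim: SemanticSim
    top_k: int = 50          # assumed; the text suggests roughly 40 to 70
    threshold: float = 0.5   # assumed value for the "preset threshold"

    def normalize(self, record: str) -> Dict[str, Optional[str]]:
        result: Dict[str, Optional[str]] = {}
        for word in self.extract(record):
            # Step 2: recall the most relevant standard words.
            candidates = self.recall(word, self.top_k)
            # Step 3: keep only pairs whose text similarity clears the threshold.
            kept = [c for c in candidates if self.text_sim(word, c) >= self.threshold]
            if not kept:
                result[word] = None  # left for the fallback branch
                continue
            # Steps 4-5: rank the pairs by semantic similarity and take the best.
            ranked = sorted(kept, key=lambda c: self.semantic_sim(word, c), reverse=True)
            result[word] = ranked[0]
        return result
```

Keeping each stage injectable lets a BM25 recaller, an edit-distance similarity, and a BERT scorer be swapped in or tested independently with toy functions.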
In a second aspect of the present invention, there is provided a medical record marking system, comprising:
the diagnostic word extraction module is used for extracting diagnostic words in the medical record information;
the relevant standard word recalling module is used for calculating relevance scores between each diagnostic word and all the standard words, and recalling a preset number of the most relevant standard words according to the ranking of the relevance scores;
the text similarity calculation module is used for calculating the text similarity between each diagnostic word and each of its recalled standard words;
the semantic similarity sorting module is used for inputting each diagnostic word and each corresponding standard word whose text similarity is greater than or equal to a preset threshold, as a pair, into the trained semantic similarity model and ranking by semantic similarity;
and the standard word selecting module is used for selecting the standard word with the highest semantic similarity as the standard diagnostic word of the corresponding diagnostic word.
In a third aspect of the invention, a computer device is provided, comprising a processor, a memory and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to the first aspect of the invention or implements the functions of the system according to the second aspect of the invention.
According to a fourth aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method according to the first aspect of the present invention or performs the functions of the system according to the second aspect of the present invention.
According to the invention, diagnostic words are extracted from the medical record information; for each diagnostic word, relevance scores against all standard words are calculated and a preset number of the most relevant standard words are recalled according to the ranking of the scores; the text similarity between each diagnostic word and each of its recalled standard words is calculated; each diagnostic word and each corresponding standard word whose text similarity is greater than or equal to a preset threshold are input as a pair into a trained semantic similarity model and ranked by semantic similarity; and the standard word with the highest semantic similarity is selected as the standard diagnostic word of the corresponding diagnostic word. In this way, the accuracy and normalization of diagnostic words in medical records are improved automatically and substantially, medical record quality is improved, the difficulty of manual medical record quality control is avoided or reduced, and the workload of quality control personnel is reduced. Tests show that, after standardization, the accuracy of the diagnostic words in medical records can exceed 97%.
Other features and advantages of the present invention will become more apparent from the detailed description of the embodiments of the present invention when taken in conjunction with the accompanying drawings.
Drawings
FIG. 1 is a flow chart of one embodiment of a method according to the present invention;
FIG. 2 is a block diagram of one embodiment of a system according to the present invention.
For the sake of clarity, the figures are schematic and simplified drawings, which only show details which are necessary for understanding the invention and other details are omitted.
Detailed Description
Embodiments and examples of the present invention will be described in detail below with reference to the accompanying drawings.
The scope of applicability of the present invention will become apparent from the detailed description given hereinafter. It should be understood, however, that the detailed description and the specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only.
FIG. 1 is a flow chart of a preferred embodiment of a method for marking a medical record according to the present invention.
In step S102, diagnostic words are extracted from the medical record information. For the plain-text content of a medical record (also called an electronic medical record) written by a doctor, the content can first be classified by natural language processing into categories such as physical signs, disease diagnosis, and operations; diagnostic words can then be extracted from the disease diagnosis and/or operation parts, for example by a trained natural language processing model. The natural language processing model is trained on diagnostic word annotations made by professional physicians. For example, if the medical record information contains "suggestive of liver fat infiltration, liver cyst", the extracted diagnostic words may be "liver fat infiltration" and "liver cyst".
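The trained extraction model is not specified further, so the following is only a rule-based stand-in for the final splitting step, shown on the example sentence above. It assumes the diagnosis part has already been isolated, that one of a few hypothetical cue phrases (such as `提示`, "suggestive of") introduces the list, and that diagnoses are separated by commas, Chinese enumeration commas, semicolons, or the word "and". The function name `extract_diagnostic_words` is illustrative.

```python
import re
from typing import List

# Hypothetical cue phrases that introduce a list of diagnoses.
CUE_PREFIXES = ("提示", "suggestive of", "prompt for", "diagnosis:")
# Separators: ASCII/full-width commas, the enumeration comma, semicolons, or "and".
SEPARATORS = re.compile(r"\s*(?:[,，、;；]|\band\b)\s*")


def extract_diagnostic_words(text: str) -> List[str]:
    """Rule-based stand-in for the trained extractor: strip a cue, split the list."""
    cleaned = text.strip()
    lowered = cleaned.lower()
    for cue in CUE_PREFIXES:
        if lowered.startswith(cue):
            cleaned = cleaned[len(cue):]
            break
    return [part.strip() for part in SEPARATORS.split(cleaned) if part.strip()]
```

For `"提示肝脂肪浸润，肝囊肿"` this yields `["肝脂肪浸润", "肝囊肿"]`. Rules like these break on nested or negated findings, which is why the description relies on a model trained on physician annotations instead.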
In step S104, for each disease diagnosis word or surgical operation word, relevance scores between it and all standard words in a pre-established standard word library are calculated, and a predetermined number of the most relevant standard words are recalled according to the ranking of the relevance scores. The standard words are terms from ICD-10 and ICD-9 (International Classification of Diseases, 10th and 9th revisions). The number of recalled standard words is a trade-off between processing speed and accuracy and may be, for example, between 40 and 70, such as 50, 55, or 60.
The relevance score between a diagnostic word and a standard word can be calculated using the conventional BM25 algorithm. BM25 is commonly used to compute search relevance scores: the query $Q$ is segmented into morphemes $q_i$; then, for each search result $D$, a relevance score between each morpheme $q_i$ and $D$ is calculated; finally, the relevance scores of the $q_i$ with respect to $D$ are weighted and summed to obtain the relevance score between $Q$ and $D$. The general formula of the BM25 algorithm is:
$$\mathrm{Score}(Q, D) = \sum_{i=1}^{n} W_i \cdot R(q_i, D)$$
where $n$ is the number of morphemes in $Q$, $W_i$ is the weight of morpheme $q_i$, and $R(q_i, D)$ is the relevance score between $q_i$ and $D$.
Since the BM25 algorithm is well known, it is not described further here.
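For reference, a widely used BM25 formulation writes $\mathrm{Score}(Q, D) = \sum_i W_i \cdot R(q_i, D)$ and, treating each standard word as a document $D$ in a collection of $N$ standard words, commonly takes the weight and per-morpheme relevance as below. This is one standard variant (the "+1" inside the logarithm, as in Apache Lucene, keeps the weight non-negative); the source does not say which variant or parameter values it uses. Here $n(q_i)$ is the number of standard words containing $q_i$, $f_i$ the frequency of $q_i$ in $D$, $|D|$ the length of $D$ in tokens, $\mathrm{avgdl}$ the average length, and $k_1$, $b$ free parameters ($k_1$ between 1.2 and 2.0 and $b = 0.75$ are typical defaults).

```latex
W_i = \mathrm{IDF}(q_i) = \ln\left(1 + \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}\right)

R(q_i, D) = \frac{f_i\,(k_1 + 1)}{f_i + k_1\left(1 - b + b\,\dfrac{|D|}{\mathrm{avgdl}}\right)}
```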
After the relevance scores between each diagnostic word and all standard words are calculated with the BM25 algorithm, the standard words are ranked by relevance score and the most relevant ones (for example, 50) are recalled.
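The recall step can be sketched with a small, dependency-free BM25 scorer that indexes the standard word library and returns the top-k candidates for a diagnostic word. This is a minimal sketch assuming the Lucene-style IDF with "+1", `k1=1.5`, and `b=0.75`; the class name `BM25Recaller` and the helper `char_tokens` are hypothetical. Because the text notes that BM25 depends on word segmentation, the tokenizer is a parameter: whitespace splitting for English, and single characters as a crude stand-in for a real Chinese segmenter.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def char_tokens(text: str) -> List[str]:
    """Character-level tokens: a crude stand-in for a Chinese word segmenter."""
    return [ch for ch in text if not ch.isspace()]


class BM25Recaller:
    def __init__(self, standard_words: Sequence[str],
                 tokenize: Callable[[str], List[str]] = str.split,
                 k1: float = 1.5, b: float = 0.75) -> None:
        self.words = list(standard_words)
        self.tokenize = tokenize
        self.k1, self.b = k1, b
        # Each standard word is treated as one "document".
        self.docs = [Counter(tokenize(w)) for w in self.words]
        self.lengths = [sum(d.values()) for d in self.docs]
        self.avgdl = sum(self.lengths) / len(self.docs)
        n_docs = len(self.docs)
        df = Counter(t for d in self.docs for t in d)  # document frequency per token
        self.idf = {t: math.log(1 + (n_docs - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        """Sum of W_i * R(q_i, D) over the query's tokens for standard word i."""
        doc, dl = self.docs[i], self.lengths[i]
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        total = 0.0
        for q in self.tokenize(query):
            f = doc[q]  # Counter returns 0 for absent tokens
            if f:
                total += self.idf[q] * f * (self.k1 + 1) / (f + norm)
        return total

    def recall(self, query: str, top_k: int = 50) -> List[Tuple[str, float]]:
        scored = [(w, self.score(query, i)) for i, w in enumerate(self.words)]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # stable for ties
        return scored[:top_k]
```

For example, `BM25Recaller(["肝囊肿", "肾囊肿", "脂肪肝"], tokenize=char_tokens).recall("肝囊肿", top_k=1)` returns `肝囊肿` first. With character tokens, `肾囊肿` (renal cyst) still scores highly against `肝囊肿` (liver cyst), which illustrates why the later text-similarity and semantic-similarity stages are needed.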
In step S106, the text similarity between each diagnostic word and each corresponding recalled standard word is calculated.
In an embodiment, the text similarity may be determined by the Levenshtein distance (also known as the edit distance, i.e., the minimum number of editing operations required to transform one string into another). The edit distance between each recalled standard word and the corresponding diagnostic word is calculated one by one, and a distance threshold is set to check the results of the BM25 recall: if the edit distance is greater than or equal to the set distance threshold, the corresponding recalled standard word is approved; otherwise, if the edit distance is smaller than the set distance threshold, the corresponding recalled standard word is not approved.
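The edit distance can be computed with the classic two-row dynamic programme, sketched below. The `levenshtein_similarity` helper, which normalizes the distance to 1 - d / max(|a|, |b|) so that higher means more similar, is an assumption not stated in the source; it is one common way to obtain a score that can be compared against a "greater than or equal to" threshold as in step S108. Both function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (or match)
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for entirely different ones."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))
```

Operating on characters suits Chinese terms directly: `levenshtein("肝囊肿", "肾囊肿")` is 1, giving a normalized similarity of about 0.67.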
In other embodiments, the text similarity may also be determined by other string similarity algorithms such as cosine similarity, matrix similarity, and the like.
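As an illustration of the cosine alternative, the sketch below represents each string as a bag of character bigrams and takes the cosine of the two count vectors. The choice of character bigrams (`n=2`) and the function names are assumptions; the source only names cosine similarity as an option.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Counts of overlapping character n-grams, ignoring whitespace."""
    text = "".join(text.split())
    if len(text) < n:
        return Counter([text]) if text else Counter()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: str, b: str, n: int = 2) -> float:
    va, vb = char_ngrams(a, n), char_ngrams(b, n)
    dot = sum(count * vb[gram] for gram, count in va.items())  # vb[...] is 0 if absent
    norm_sq = sum(v * v for v in va.values()) * sum(v * v for v in vb.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0
```

Unlike edit distance, this measure ignores the order of the n-grams, so it is more tolerant of reordered qualifiers in a diagnosis.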
In step S108, it is determined whether the calculated text similarity is greater than or equal to the predetermined threshold. If so, the process proceeds to step S110; otherwise (the text similarity is smaller than the predetermined threshold), the process proceeds to step S120.
In step S110, each diagnostic word and its corresponding standard words are input in pairs into the trained semantic similarity model, and the standard words are ranked by semantic similarity.
In an embodiment, a BERT model trained on a large amount of medical data (e.g., millions of records) may be used to determine the semantic similarity between a diagnostic word and the corresponding standard word. The BERT (Bidirectional Encoder Representations from Transformers) model may be any of various well-known BERT-style models, such as ERNIE (Wenxin), ALBERT, etc. The similarity labels used for training are given by specialists. For example, the similarity between "brittle diabetes mellitus" and "type 1 diabetes mellitus with early microalbuminuria" is 0.3, and the similarity between "upper limb fracture" and "open upper limb fracture" is 0.5.
In step S112, the standard word with the highest semantic similarity is selected as the standard diagnostic word of the corresponding diagnostic word.
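Steps S110 and S112 amount to scoring every (diagnostic word, standard word) pair in one batch and taking the top-ranked candidate. The sketch below assumes a hypothetical `PairScorer` interface (a batch of pairs in, one score per pair out), which a fine-tuned BERT cross-encoder could implement; no specific model or library API is implied. The Jaccard scorer in the usage note is only a toy stand-in so the code runs without a model.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical interface: one similarity score per (diagnostic word, standard word) pair.
PairScorer = Callable[[Sequence[Tuple[str, str]]], List[float]]


def rank_and_select(diagnostic_word: str, candidates: Sequence[str],
                    scorer: PairScorer) -> Tuple[Optional[str], List[Tuple[str, float]]]:
    """Score all pairs in one batch, rank descending, and return (best, ranking)."""
    pairs = [(diagnostic_word, c) for c in candidates]
    scores = scorer(pairs) if pairs else []
    if len(scores) != len(pairs):
        raise ValueError("scorer must return exactly one score per pair")
    # sorted() stays stable with reverse=True, so ties keep their recall order.
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    best = ranked[0][0] if ranked else None
    return best, ranked
```

Usage with a toy word-overlap (Jaccard) scorer:

```python
def toy_scorer(pairs):
    return [len(set(a.split()) & set(b.split())) / len(set(a.split()) | set(b.split()))
            for a, b in pairs]

best, ranked = rank_and_select("upper limb fracture",
                               ["open upper limb fracture", "lower limb fracture", "upper limb fracture"],
                               toy_scorer)
```

Batching all pairs at once matches how transformer models are usually run, since one forward pass over a padded batch is far cheaper than one call per pair.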
In step S120, the diagnosis words having the text similarity smaller than the predetermined threshold are input into the trained medical entity recognition model to recognize the medical entities in the corresponding diagnosis words.
In an embodiment, a BERT model trained on a large amount of data annotated with medical entities may be used as the medical entity recognition model to identify the medical entities in the corresponding diagnostic words. For example, given the text "after consultation, admitted to our hospital with a diagnosis of femoral neck fracture", the annotator labels the entity "femoral neck" as a body part and the entity "fracture" as a clinical finding. Again, the BERT model may be any of various well-known BERT-style models, such as ERNIE (Wenxin), ALBERT, and the like.
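A token-classification model of this kind typically emits one tag per token, and the entities are recovered by decoding those tags. The sketch below assumes the common BIO scheme with hypothetical labels such as `B-PART`/`I-PART` (body part) and `B-FINDING`/`I-FINDING` (clinical finding); the source does not state its tag set. It decodes tags into `(entity_text, entity_type)` spans.

```python
from typing import List, Optional, Tuple


def decode_bio(tokens: List[str], tags: List[str], joiner: str = "") -> List[Tuple[str, str]]:
    """Turn per-token BIO tags (B-X, I-X, O) into (entity_text, entity_type) spans."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        if current and current_type is not None:
            entities.append((joiner.join(current), current_type))

    for token, tag in zip(tokens, tags):
        # A B- tag, or an I- tag of a different type, starts a new entity.
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            flush()
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)
        else:  # "O" closes any open entity
            flush()
            current, current_type = [], None
    flush()
    return entities
```

With character-level tokens for `股骨颈骨折` and tags `B-PART I-PART I-PART B-FINDING I-FINDING`, this yields `[("股骨颈", "PART"), ("骨折", "FINDING")]`, matching the femoral neck / fracture example; for English word tokens, pass `joiner=" "`.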
In step S122, the corresponding standard words are recalled from a pre-constructed knowledge graph according to the identified medical entities. The knowledge graph is constructed by professional medical personnel from a large amount of medical data and contains medical entities, entity attributes, relationships between entities, and the like; the standard words are obtained by reasoning over the relationships between entities in the knowledge graph. Since the knowledge graph itself is not the subject of the invention, its detailed description is omitted here. After step S122, the process proceeds to step S110.
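Because the knowledge graph is not described, the following is only a simplified illustration of entity-based recall: a hypothetical toy graph maps each standard word to `(relation, entity)` edges, and standard words are ranked by how many of the recognized entities they link to. Real graph reasoning over multi-hop relations would be considerably richer; the names `KnowledgeGraph` and `recall_from_graph` and the sample edges are assumptions.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical toy graph: standard word -> set of (relation, entity) edges.
KnowledgeGraph = Dict[str, Set[Tuple[str, str]]]


def recall_from_graph(graph: KnowledgeGraph, entities: Iterable[str],
                      min_overlap: int = 1) -> List[str]:
    """Return standard words linked to the recognized entities, most links first."""
    wanted = set(entities)
    scored = []
    for standard_word, edges in graph.items():
        linked = {entity for _, entity in edges}
        overlap = len(wanted & linked)
        if overlap >= min_overlap:
            scored.append((overlap, standard_word))
    # Sort by overlap descending, then alphabetically for a deterministic order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored]


graph: KnowledgeGraph = {
    "femoral neck fracture": {("site", "femoral neck"), ("finding", "fracture")},
    "femoral shaft fracture": {("site", "femoral shaft"), ("finding", "fracture")},
    "liver cyst": {("site", "liver"), ("finding", "cyst")},
}
```

`recall_from_graph(graph, ["femoral neck", "fracture"])` returns `["femoral neck fracture", "femoral shaft fracture"]`; these candidates would then go to the semantic similarity ranking of step S110.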
In an embodiment, after one or more standard diagnostic words are selected, they may be converted into standard ICD (International Classification of Diseases) codes, which are then used to update the corresponding existing ICD codes in the medical record, such as the ICD codes for the admission diagnosis, the principal discharge diagnosis, other discharge diagnoses, and surgeries and procedures, so that the ICD codes of the medical record are more accurate.
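The update can be sketched as a lookup from standard diagnostic word to code, applied field by field to a copy of the record's existing codes. The field names, the function name, and the three-entry mapping are hypothetical. The codes are ICD-10 examples used only for illustration (S72.0 fracture of neck of femur, S72.9 fracture of femur, part unspecified, K76.0 fatty liver); a real system would use the full code table of whichever ICD edition the hospital applies.

```python
from typing import Dict, Mapping


def update_icd_codes(record_codes: Mapping[str, str],
                     standard_words: Mapping[str, str],
                     term_to_icd: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of record_codes with codes replaced for normalized fields."""
    updated = dict(record_codes)  # never mutate the caller's record
    for field, word in standard_words.items():
        code = term_to_icd.get(word)
        if code is not None:  # keep the existing code if the term is unmapped
            updated[field] = code
    return updated


term_to_icd = {"femoral neck fracture": "S72.0", "fatty liver": "K76.0"}
record = {"admission_diagnosis": "S72.9", "discharge_main_diagnosis": "S72.9"}
```

Calling `update_icd_codes(record, {"discharge_main_diagnosis": "femoral neck fracture"}, term_to_icd)` replaces only the principal discharge diagnosis with the more specific S72.0 and leaves the original record unchanged.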
The BM25 algorithm depends heavily on the accuracy of the word segmentation tool and generally cannot capture the relationship between synonyms, so normalization using BM25 alone is less accurate. The invention standardizes medical record diagnostic words by combining the BM25 algorithm, a knowledge graph, and the deep learning language model BERT, and greatly improves the effectiveness of medical record quality control through both text similarity and semantic similarity.
FIG. 2 shows a block diagram of a preferred embodiment of a medical record marking system according to the present invention, the system comprising:
a diagnostic word extraction module 202, configured to extract diagnostic words in the medical record information;
a related standard word recalling module 204, configured to calculate, for each diagnostic word, a relevance score between the diagnostic word and all standard words, and recall a predetermined number of most relevant standard words according to the ranking of the relevance scores;
a text similarity calculation module 206, configured to calculate a text similarity between each diagnosis word and each corresponding recalled standard word;
the semantic similarity sorting module 208 is configured to input the diagnostic words and the corresponding standard words, of which the text similarity is greater than or equal to a predetermined threshold, into the trained semantic similarity model in pairs and perform semantic similarity sorting;
the standard word selecting module 210 is configured to select a standard word with the highest semantic similarity as a standard diagnostic word of the corresponding diagnostic word.
In another embodiment, the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method embodiment or other corresponding method embodiments described in conjunction with fig. 1 or implements the functions of the system embodiment or other corresponding system embodiments described in conjunction with fig. 2, and is not described herein again.
In another embodiment, the present invention provides a computer device, including a processor, a memory, and a computer program stored in the memory and capable of running on the processor, where the processor implements the steps of the method embodiment or other corresponding method embodiments described in conjunction with fig. 1 or implements the functions of the system embodiment or other corresponding system embodiments described in conjunction with fig. 2 when executing the computer program, and details of the steps are not repeated herein.
The various embodiments described herein, or certain features, structures, or characteristics thereof, may be combined as suitable in one or more embodiments of the invention. Additionally, in some cases, the order of steps depicted in the flowcharts and/or in the pipelined process may be modified, as appropriate, and need not be performed exactly in the order depicted. In addition, various aspects of the invention may be implemented using software, hardware, firmware, or a combination thereof, and/or other computer implemented modules or devices that perform the described functions. Software implementations of the present invention may include executable code stored in a computer readable medium and executed by one or more processors. The computer-readable medium may include a computer hard drive, ROM, RAM, flash memory, portable computer storage media such as CD-ROM, DVD-ROM, flash drives, and/or other devices with a Universal Serial Bus (USB) interface, and/or any other suitable tangible or non-transitory computer-readable medium or computer memory on which executable code may be stored and executed by a processor. The present invention may be used in conjunction with any suitable operating system.
As used herein, the singular forms "a", "an" and "the" include plural references (i.e., have the meaning "at least one"), unless the context clearly dictates otherwise. It will be further understood that the terms "has," "includes" and/or "including," when used in this specification, specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
The foregoing describes some preferred embodiments of the present invention, but it should be emphasized that the invention is not limited to these embodiments, but can be implemented in other ways within the scope of the inventive subject matter. Various modifications and alterations of this invention will become apparent to those skilled in the art without departing from the spirit and scope of this invention.

Claims (10)

1. A method for marking a medical record, the method comprising:
extracting diagnostic words from the medical record information;
for each diagnostic word, calculating relevance scores between the diagnostic word and all standard words, and recalling a preset number of the most relevant standard words according to the ranking of the relevance scores;
calculating the text similarity between each diagnostic word and each of its recalled standard words;
inputting each diagnostic word and each corresponding standard word whose text similarity is greater than or equal to a preset threshold, as a pair, into the trained semantic similarity model and ranking by semantic similarity;
and selecting the standard word with the highest semantic similarity as the standard diagnostic word of the corresponding diagnostic word.
2. The method of claim 1, further comprising:
inputting the diagnosis words with the text similarity smaller than a preset threshold value into the trained medical entity recognition model to recognize the medical entities in the corresponding diagnosis words;
recalling corresponding standard words from the knowledge graph according to the identified medical entities based on the pre-constructed knowledge graph;
inputting each diagnostic word whose text similarity is smaller than the preset threshold, paired with each standard word recalled from the knowledge graph, into the trained semantic similarity model and ranking by semantic similarity.
3. The method of claim 1, further comprising:
determining the ICD code of the medical record according to one or more standard diagnostic words.
4. The method of claim 1, wherein the relevance score is calculated using the BM25 algorithm.
5. The method of claim 1, wherein the text similarity is an edit distance.
6. The method of claim 1, wherein the trained semantic similarity model is a trained BERT model.
7. The method of claim 1, wherein the trained medical entity recognition model is a BERT model trained using data labeling medical entities.
8. A medical record marking system, comprising:
the diagnostic word extraction module is used for extracting diagnostic words in the medical record information;
the relevant standard word recalling module is used for calculating relevance scores between each diagnostic word and all the standard words, and recalling a preset number of the most relevant standard words according to the ranking of the relevance scores;
the text similarity calculation module is used for calculating the text similarity between each diagnostic word and each of its recalled standard words;
the semantic similarity sorting module is used for inputting each diagnostic word and each corresponding standard word whose text similarity is greater than or equal to a preset threshold, as a pair, into the trained semantic similarity model and ranking by semantic similarity;
and the standard word selecting module is used for selecting the standard word with the highest semantic similarity as the standard diagnostic word of the corresponding diagnostic word.
9. A computer device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, wherein the processor when executing the computer program realizes the steps of the method according to any of the claims 1-7 or the functions of the system according to claim 8.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7 or the functions of the system according to claim 8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111536210.9A CN114446422A (en) 2021-12-15 2021-12-15 Medical record marking method, system and corresponding equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111536210.9A CN114446422A (en) 2021-12-15 2021-12-15 Medical record marking method, system and corresponding equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114446422A true CN114446422A (en) 2022-05-06

Family

ID=81363551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111536210.9A Pending CN114446422A (en) 2021-12-15 2021-12-15 Medical record marking method, system and corresponding equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114446422A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115910213A (en) * 2022-10-26 2023-04-04 广州金域医学检验中心有限公司 Method, device, equipment and medium for screening human phenotype ontology
CN115910213B (en) * 2022-10-26 2023-12-29 广州金域医学检验中心有限公司 Screening method, device, equipment and medium for human phenotype ontology

Similar Documents

Publication Publication Date Title
US8886514B2 (en) Means and a method for training a statistical machine translation system utilizing a posterior probability in an N-best translation list
US8538745B2 (en) Creating a terms dictionary with named entities or terminologies included in text data
CN106844351B (en) Medical institution organization entity identification method and device oriented to multiple data sources
CN108021553A (en) Word treatment method, device and the computer equipment of disease term
EP4026047A1 (en) Automated information extraction and enrichment in pathology report using natural language processing
JP6767042B2 (en) Scenario passage classifier, scenario classifier, and computer programs for it
CN111627512A (en) Recommendation method and device for similar medical records, electronic equipment and storage medium
CN110096572B (en) Sample generation method, device and computer readable medium
US20140255886A1 (en) Systems and Methods for Content Scoring of Spoken Responses
CN114996388A (en) Intelligent matching method and system for diagnosis name standardization
CN111814463B (en) International disease classification code recommendation method and system, corresponding equipment and storage medium
CN112037909B (en) Diagnostic information review system
CN111506673A (en) Medical record classification code determination method and device
CN109299467B (en) Medical text recognition method and device and sentence recognition model training method and device
CN111597789A (en) Electronic medical record text evaluation method and equipment
CN115983233A (en) Electronic medical record duplication rate estimation method based on data stream matching
CN114446422A (en) Medical record marking method, system and corresponding equipment and storage medium
Sedghi et al. Mining clinical text for stroke prediction
CN111177309A (en) Medical record data processing method and device
JP4979637B2 (en) Compound word break estimation device, method, and program for estimating compound word break position
CN113111660A (en) Data processing method, device, equipment and storage medium
CN117422074A (en) Method, device, equipment and medium for standardizing clinical information text
CN111368547A (en) Entity identification method, device, equipment and storage medium based on semantic analysis
JP2017021523A (en) Term meaning code determination device, method and program
JP5239161B2 (en) Language analysis system, language analysis method, and computer program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination