CN113590842A

CN113590842A - Medical term standardization method and system

Info

Publication number: CN113590842A
Application number: CN202110897072.0A
Authority: CN
Inventors: 施淼元; 杨一帆; 李茂龙
Original assignee: Sipic Technology Co Ltd
Current assignee: Sipic Technology Co Ltd
Priority date: 2021-08-05
Filing date: 2021-08-05
Publication date: 2021-11-02

Abstract

An embodiment of the invention provides a medical term standardization method. The method comprises: an identification step of inputting a medical phrase text to be standardized into a named entity recognition model and outputting a recognition result, wherein the recognition result comprises an overall medical classification result and a fine-grained classification result; an alignment step of aligning the recognition result with the standardized term names in a knowledge graph that carries medical category and hierarchical structure information; and a generation step of generating a standardized medical term from the alignment result. An embodiment of the invention also provides a medical term standardization system. The embodiments use fine-grained medical knowledge to represent and explain complex medical terms, can distinguish similar but different diseases or symptoms according to the upper and lower levels of classification in the graph, improve the accuracy of medical term standardization, and yield standardized results that are interpretable.

Description

Medical term standardization method and system

Technical Field

The invention relates to the field of natural language processing, in particular to a medical term standardization method and system.

Background

In both Chinese and English, natural language processing in the medical field faces highly specialized vocabulary, a large number of terms, and frequent cases of several words sharing one meaning or one word carrying several meanings. Owing to differences between medical systems, different clinicians and health care institutions often use different clinical terms for the same thing. Existing authoritative medical terminology standards include SNOMED CT (clinical medical terminology, in English), UMLS (Unified Medical Language System), ICD-10 (the 10th revision of the International Classification of Diseases), and ICD-11 (the 11th revision of the International Classification of Diseases). SNOMED CT has 350k concepts and 1.12M clinical description terms. UMLS collects on the order of a hundred existing health and biomedical terminology standards and integrates them into a Metathesaurus containing more than 3 million concepts and more than 14 million concept names. For the development of medical technology and the unification of medical information, a fully unified medical terminology system plays an important role in the standardization and digitization of clinical medical information.

Common normalization or noun alignment methods are based on matching. Matching-based methods generally extract statistical and representational features of medical terms.

1. Statistical features are extracted by methods such as co-occurrence matrices, term frequency, document frequency, TF-IDF, and Bayesian methods.

2. Representational features can be obtained by computing word vectors or semantic vectors of the medical terms with a neural network.
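To make the matching-based baseline concrete, the following is a minimal sketch of TF-IDF matching with cosine similarity. It is a generic illustration, not any specific prior-art system: the candidate list `STANDARD_TERMS`, the whitespace tokenization, and the smoothed IDF formula are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (dict: token -> weight) per text."""
    tokenized = [t.lower().split() for t in texts]
    n_docs = len(tokenized)
    # Document frequency: in how many texts each token appears.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            tok: (c / len(toks)) * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0)
            for tok, c in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical list of standardized names to match against.
STANDARD_TERMS = ["lower abdomen pain", "upper abdomen pain", "lower gingival ulcer"]


def match(query):
    """Rank the standardized names by similarity to the query text."""
    vectors = tfidf_vectors(STANDARD_TERMS + [query])
    query_vec = vectors[-1]
    scored = [(cosine(query_vec, v), term) for v, term in zip(vectors, STANDARD_TERMS)]
    return sorted(scored, reverse=True)


for score, term in match("low abdomen pain"):
    print(f"{score:.3f}  {term}")
```

The output is only a ranked list of scores. Nothing in it says which part of the phrase was treated as a body site or why one candidate outranks another, which is the interpretability gap discussed in this section.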

In the process of implementing the invention, the inventor finds that at least the following problems exist in the related art:

Medical terms are highly specialized, and several words often share one meaning while one word may carry several meanings. Moreover, results concerning clinical symptoms are expected to be error-free, rigorous, and to some degree interpretable for medical staff and patients. A simple matching-based method cannot meet these requirements.

1. The precision of matching-based algorithms can hardly meet the data requirements of clinical medicine; errors occur frequently during standardization, which is unacceptable in medical diagnostic reports.

2. Because the standardized result comes from a match, medical staff who see it may not know whether it is correct, why it is correct, or why it is wrong. Matching-based standardization inherently lacks an explanation of its results, and such an explanation is important to medical staff and patients in clinical medicine.

Disclosure of Invention

The invention aims to solve at least the following problems in the prior art: standardization by matching is inaccurate because the vocabulary is highly specialized, the terms are numerous, and synonymy and polysemy are common; and the matching results lack interpretability, which causes difficulty for medical staff and patients.

In a first aspect, an embodiment of the present invention provides a method for normalizing medical terms, including:

an identification step: inputting a medical phrase text to be standardized into a named entity recognition model, and outputting a recognition result, wherein the recognition result comprises an overall medical classification result and a fine-grained classification result;

an alignment step: performing an alignment process on the recognition result and a standardized term name in a knowledge graph by using the knowledge graph with medical category and hierarchy structure information;

a generation step: generating a standardized medical term using the alignment process result.

In a second aspect, an embodiment of the present invention provides a medical term normalization system, including:

an identification program module: configured to input a medical phrase text to be standardized into a named entity recognition model and to output a recognition result, wherein the recognition result comprises an overall medical classification result and a fine-grained classification result;

an alignment program module: configured to align the recognition result with the standardized term names in a knowledge graph by using the knowledge graph, which carries medical category and hierarchical structure information;

a generation program module: configured to generate a standardized medical term using the alignment process result.

In a third aspect, an electronic device is provided, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the medical term normalization method of any of the embodiments of the present invention.

In a fourth aspect, an embodiment of the present invention provides a storage medium, on which a computer program is stored, wherein the program is executed by a processor to implement the steps of the medical term normalization method according to any embodiment of the present invention.

The beneficial effects of the embodiments of the invention are as follows. Using medical knowledge and standard disease classification, medical terms are split into fine-grained components by named entity recognition and then organized into a graph by knowledge graph technology. Fine-grained medical knowledge is used to represent and explain complex medical terms, similar but different diseases or symptoms can be distinguished according to the upper and lower levels of classification in the graph, the accuracy of medical term standardization is improved, and the standardized result is interpretable.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a flow chart of a method for normalizing medical terms provided by an embodiment of the invention;

FIG. 2 is an overall flow chart of a medical term normalization method provided by an embodiment of the invention;

FIG. 3 is a schematic structural diagram of a medical term normalization system according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 is a flowchart of a method for normalizing medical terms according to an embodiment of the present invention, which includes the following steps:

S11: an identification step: inputting a medical phrase text to be standardized into a named entity recognition model, and outputting a recognition result, wherein the recognition result comprises an overall medical classification result and a fine-grained classification result;

S12: an alignment step: performing an alignment process on the recognition result and a standardized term name in a knowledge graph by using the knowledge graph with medical category and hierarchy structure information;

S13: a generation step: generating a standardized medical term using the alignment process result.

In general, a person skilled in the art would try to solve the standardization problem by word matching and, on encountering the errors described above, would patch the approach by adding the corresponding terms. However, such an approach is neither knowledge-driven nor interpretable, and it can never cover every case. This embodiment instead analyzes medical terminology knowledge and builds it into a graph: professional, standard medical knowledge is embedded in a knowledge graph, and the medical knowledge is organized with a fine-grained knowledge graph.

For step S11, when a medical phrase is received, its text is first input into the named entity recognition model to determine the recognition result, which includes the overall medical classification result and the fine-grained classification result.

The categories of the named entity recognition model are trained with reference to the international disease classification standard; if medicine progresses and the classification changes, the model can be retrained on the new classification. For example, the named entity recognition corpus is built from the categories and common medical terms in the international disease classification standard ICD-11. Unlike conventional named entity recognition, which recognizes only the term as a whole, this model outputs both the overall result for the term and its fine-grained classification results. For example, conventional named entity recognition would classify "upper labial sulcus malignant tumor" simply as a disease, whereas the named entity recognition model of this method classifies it as "overall: disease, site: upper labial sulcus, property: malignant, disease: tumor". In this way, all the information contained in a term is fully extracted.
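The shape of such a recognition result can be sketched as a small data structure. The following is only an illustration: `RecognitionResult`, `TOY_LEXICON`, and `recognize` are hypothetical names, and the dictionary lookup is a stand-in for a trained named entity recognition model, which the source does not specify. The example term is written here as "upper labial sulcus malignant tumor".

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RecognitionResult:
    """Overall class of a term plus its fine-grained (label, text) spans."""
    overall: str
    fine_grained: List[Tuple[str, str]] = field(default_factory=list)


# Hypothetical lexicon standing in for a trained model: span text -> label.
TOY_LEXICON = {
    "upper labial sulcus": "site",
    "malignant": "property",
    "tumor": "disease",
}


def recognize(text: str) -> RecognitionResult:
    """Toy stand-in for the named entity recognition model."""
    found = []
    for surface, label in TOY_LEXICON.items():
        position = text.find(surface)
        if position >= 0:
            found.append((position, label, surface))
    found.sort()  # keep spans in the order they appear in the text
    fine = [(label, surface) for _, label, surface in found]
    # Assumption: a term containing a disease span is a disease overall.
    overall = "disease" if any(label == "disease" for label, _ in fine) else "unknown"
    return RecognitionResult(overall=overall, fine_grained=fine)


print(recognize("upper labial sulcus malignant tumor"))
```

Keeping the overall label and the labelled spans side by side is what later allows each span to be aligned and explained separately.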

As an embodiment, the named entity recognition model is trained on a medical term training corpus annotated with reference overall medical classification results and reference fine-grained classification results, and the training includes:

inputting the medical term training corpus into the named entity recognition model to obtain a predicted overall medical classification result and a predicted fine-grained classification result;

training the named entity recognition model based on the error between the reference results (overall and fine-grained) and the predicted results (overall and fine-grained), so that the predicted overall medical classification result and the predicted fine-grained classification result approach the reference overall medical classification result and the reference fine-grained classification result.

In this embodiment, a medical term training corpus is prepared before training. The corpus may be a collection of medical terms compiled in routine practice, and each entry carries a corresponding reference overall medical classification result and reference fine-grained classification result, which may be determined manually.

Once the training corpus is prepared, the named entity recognition model is trained using the error between its predictions and the reference results, so that the predictions gradually approach the prepared reference results, which completes the training.
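The source does not say how the error between prediction and reference is computed. One common choice, assumed here purely for illustration, is to add a cross-entropy term for the overall label to the mean cross-entropy over the fine-grained spans. The names `cross_entropy` and `joint_error` and all probabilities below are hypothetical.

```python
import math


def cross_entropy(predicted: dict, reference: str) -> float:
    """Negative log-probability that the prediction assigns to the reference label."""
    return -math.log(max(predicted.get(reference, 0.0), 1e-12))


def joint_error(pred_overall, ref_overall, pred_spans, ref_spans):
    """Overall-label error plus mean fine-grained error (an assumed combination)."""
    overall_loss = cross_entropy(pred_overall, ref_overall)
    span_losses = [cross_entropy(p, r) for p, r in zip(pred_spans, ref_spans)]
    fine_loss = sum(span_losses) / max(len(span_losses), 1)
    return overall_loss + fine_loss


# Reference annotation for one training term (overall label, then one label per span).
reference_overall = "disease"
reference_spans = ["site", "property", "disease"]

# Made-up predictions early and late in training.
early = joint_error(
    {"disease": 0.5, "symptom": 0.5}, reference_overall,
    [{"site": 0.4, "disease": 0.6}, {"property": 0.5, "site": 0.5},
     {"disease": 0.7, "property": 0.3}],
    reference_spans,
)
late = joint_error(
    {"disease": 0.95, "symptom": 0.05}, reference_overall,
    [{"site": 0.9, "disease": 0.1}, {"property": 0.9, "site": 0.1},
     {"disease": 0.95, "property": 0.05}],
    reference_spans,
)
print(late < early)
```

Training then amounts to adjusting the model parameters so that this error shrinks, which is what "the predictions approach the reference results" means in practice.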

For step S12, take a simple example: if the input medical phrase is "low abdomen pain", the named entity recognition model of step S11 yields "overall: disease, site: low abdomen, property: pain". The wording of the site is colloquial, and colloquial wording should be avoided in clinical practice. The recognition result is therefore aligned with the standardized term names using the knowledge graph; for example, the colloquial site is aligned to the standard medical site "lower abdomen".
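The alignment of each fine-grained span can be sketched as a lookup keyed by label and surface form. This is a simplified illustration using the "low abdomen pain" input: the `ALIASES` table and the `align` function are hypothetical, and a real system would resolve names through the knowledge graph rather than a flat dictionary.

```python
# Hypothetical alias table: (label, surface form) -> standardized name.
ALIASES = {
    ("site", "low abdomen"): "lower abdomen",
    ("site", "lower abdomen"): "lower abdomen",
    ("property", "pain"): "pain",
}


def align(fine_grained):
    """Map each (label, surface) span to (label, surface, standard name or None)."""
    aligned = []
    for label, surface in fine_grained:
        standard = ALIASES.get((label, surface.lower()))
        aligned.append((label, surface, standard))
    return aligned


spans = [("site", "low abdomen"), ("property", "pain")]
for label, surface, standard in align(spans):
    print(f"{label}: {surface} -> {standard}")
```

Keying on the label as well as the text means the same word can align differently depending on whether it was recognized as a site, a property, or a disease. A span with no entry comes back as `None`, so it can be reported rather than silently guessed.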

As one embodiment, the knowledge-graph is constructed from medical category and hierarchy information extracted from disease classification criteria and disease data.

In this embodiment, the disease classification standard may be the International Classification of Diseases (ICD). If medicine progresses and the classification standard is updated, the updated standard may be used instead; this is not limited herein.

For example, following the classification of ICD-11, a total of 28 classes are defined, such as body parts and disease names. For these 28 classes, corresponding disease data, such as pathology books, are obtained. The knowledge graph has both a graph structure and a hierarchical structure: for example, the lower abdomen belongs to the abdomen, the abdomen belongs to the trunk, glioblastoma belongs to brain glioma, and brain glioma is associated with the brain. The knowledge graph is constructed in this way.
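The hierarchy described above can be sketched as a list of triples with a helper that walks the "belongs to" chain. The relation names `belongs_to` and `associated_with` and the functions `parent_of` and `ancestors` are illustrative assumptions; the source does not prescribe a storage format.

```python
# (head, relation, tail) triples taken from the examples in the text.
TRIPLES = [
    ("lower abdomen", "belongs_to", "abdomen"),
    ("abdomen", "belongs_to", "trunk"),
    ("glioblastoma", "belongs_to", "brain glioma"),
    ("brain glioma", "associated_with", "brain"),
]


def parent_of(node):
    """Return the direct 'belongs_to' parent of a node, or None."""
    for head, relation, tail in TRIPLES:
        if head == node and relation == "belongs_to":
            return tail
    return None


def ancestors(node):
    """Follow 'belongs_to' edges upward and return the chain of ancestors."""
    chain = []
    current = parent_of(node)
    while current is not None:
        chain.append(current)
        current = parent_of(current)
    return chain


print(ancestors("lower abdomen"))
print(ancestors("glioblastoma"))
```

Storing typed edges keeps the hierarchical relation ("belongs to") separate from cross-links such as a disease being associated with a site.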

For step S13, if the clinical text to be standardized is "low abdomen pain", the standardized result is "lower abdominal pain". A more standardized medical text is thus obtained, which helps medical staff understand the patient's condition clearly.

According to this embodiment, medical terms are split into fine-grained components using medical knowledge and standard disease classification, and are then organized into a graph by knowledge graph technology. Fine-grained medical knowledge is used to represent and explain complex medical terms, and similar but different diseases or symptoms can be distinguished according to the upper and lower levels of classification in the graph. The accuracy of medical phrase standardization is thereby improved.

For example, "upper labial sulcus malignant tumor" and "lower labial sulcus benign tumor" can be split into "site: upper labial sulcus, property: malignant, disease: tumor" and "site: lower labial sulcus, property: benign, disease: tumor". From the links in the medical knowledge graph, it is known that the upper labial sulcus and the lower labial sulcus are different body sites that both belong to the lips, and that malignant and benign are two different properties of a tumor. In this way, it can be accurately determined that "upper labial sulcus malignant tumor" and "lower labial sulcus benign tumor" cannot be normalized to each other, and the explanation can be given: the site differs and the property differs.
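The comparison in this example can be sketched as a slot-by-slot check that records a reason for every mismatch. The `PARENT` table and `explain_difference` are hypothetical names; only the lips relationship stated in the text is encoded.

```python
# Hypothetical parent table: both sites belong to the lips.
PARENT = {
    "upper labial sulcus": "lips",
    "lower labial sulcus": "lips",
}


def explain_difference(a: dict, b: dict) -> list:
    """Return human-readable reasons why two fine-grained results differ."""
    reasons = []
    for slot in sorted(set(a) | set(b)):
        value_a, value_b = a.get(slot), b.get(slot)
        if value_a == value_b:
            continue
        note = f"{slot}: {value_a!r} vs {value_b!r}"
        parent_a, parent_b = PARENT.get(value_a), PARENT.get(value_b)
        if parent_a is not None and parent_a == parent_b:
            note += f" (different children of {parent_a!r})"
        reasons.append(note)
    return reasons


term_a = {"site": "upper labial sulcus", "property": "malignant", "disease": "tumor"}
term_b = {"site": "lower labial sulcus", "property": "benign", "disease": "tumor"}

for reason in explain_difference(term_a, term_b):
    print(reason)
```

An empty list would mean that no slot rules out treating the two terms as the same; a non-empty list is both the decision and its explanation.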

As an embodiment, after the generating step, the method further comprises:

an output step: outputting the standardized medical expression and the reasoning process in the aligning step.

In this embodiment, a standardization report is produced, and the inference results are output as the basis for the standardization. For example, when "lower gingival pus" is standardized as "lower gingival ulcer", the steps "lower side" → "lower side" and "pus discharge" → "ulcer" are shown, together with the conclusion that the orientation is lower, the site is the gum, and the disease is an ulcer. If a clinical text cannot be standardized, or can only be partially standardized, the reasons are also given, so that medical staff can understand the text and the reasoning process and can edit the result if the standardization is wrong.
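A report of this kind can be sketched as a function that turns aligned spans into a standardized string, a reasoning trace, and a list of unresolved spans. The function name `build_report`, the report fields, and the English surface forms are illustrative assumptions, not the format used by the source.

```python
def build_report(aligned):
    """Build a report from (label, surface, standard_or_None) triples."""
    parts, trace, unresolved = [], [], []
    for label, surface, standard in aligned:
        if standard is None:
            # Keep the original wording and say why it was not standardized.
            unresolved.append(f'{label}: no standard term found for "{surface}"')
            parts.append(surface)
        else:
            trace.append(f'{label}: "{surface}" -> "{standard}"')
            parts.append(standard)
    return {
        "standardized": " ".join(parts),
        "trace": trace,
        "unresolved": unresolved,
        "complete": not unresolved,
    }


aligned = [
    ("orientation", "lower", "lower"),
    ("site", "gingival", "gingival"),
    ("disease", "pus", "ulcer"),
]
report = build_report(aligned)
print(report["standardized"])
for line in report["trace"]:
    print(line)
```

Because every output word is tied to one trace line, a reader can see which step produced it, and a partially standardized text still carries an explicit reason for each span left unchanged.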

In one embodiment, modifications to the reasoning process made by medical staff and/or patients are received, and the alignment result is re-determined based on the modified reasoning process to generate a modified standardized medical term.

In this embodiment, after medical staff edit the content, the incorrect part of the reasoning is corrected based on their edits, so that the correct standardized medical term is generated.
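Re-determining the result after an edit can be sketched as replacing individual reasoning steps and regenerating the term. Here `apply_corrections` is a hypothetical helper, and the initial wrong alignment to "abscess" is invented only to have something to correct.

```python
def apply_corrections(steps, corrections):
    """Apply user edits (step index -> corrected standard name) and regenerate the term."""
    revised = [
        (label, surface, corrections.get(index, standard))
        for index, (label, surface, standard) in enumerate(steps)
    ]
    term = " ".join(
        standard if standard is not None else surface
        for _, surface, standard in revised
    )
    return revised, term


# Reasoning steps as (label, surface, standard); the last one is assumed to be wrong.
steps = [
    ("orientation", "lower", "lower"),
    ("site", "gingival", "gingival"),
    ("disease", "pus", "abscess"),
]

# A medical staff member corrects step 2.
revised, term = apply_corrections(steps, {2: "ulcer"})
print(term)
```

The original steps are left untouched, so the system can keep both the first inference and the corrected one for later review.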

As an embodiment, the medical phrase includes clinical medical terms used by medical staff and/or patients' descriptions of their conditions.

In this embodiment, medical phrase standardization helps not only medical staff but also patients. For example, ordinary people usually describe only their symptoms and do not know which department to register with. Hand pain, for instance, can take several forms: "bone pain", "joint pain", or "numbness of the hand muscles". The first two usually call for registration with orthopedics, whereas numbness may be related to the cervical spine and call for neurology. In such cases the method is applied to the patient's description of the condition to clearly distinguish the corresponding medical terms, so that the patient is guided to register according to the standardized condition. The method can thus serve electronic medical records, registration guidance and assisted diagnosis, medical examination, and similar tasks, which broadens its application scenarios.

The overall flow of the method is shown in FIG. 2:

1. Acquire reference medical knowledge as the basis for standardization: the classification in the international disease classification standard ICD-11 and published books on infectious diseases. All of this medical knowledge follows the standards required by the National Health Commission and is professional and authoritative.

2. Extract information from the reference knowledge, namely medical category and hierarchy information.

3. Build a knowledge graph from the reference medical knowledge. Following the classification of ICD-11, a total of 28 classes are defined, such as body parts and disease names. The knowledge graph has both a graph structure and a hierarchical structure: for example, the lower abdomen belongs to the abdomen, the abdomen belongs to the trunk, glioblastoma belongs to brain glioma, and brain glioma is associated with the brain.

4. Build the named entity recognition corpus from the categories and common medical terms in the international disease classification standard ICD-11. Unlike conventional named entity recognition, which recognizes only the term as a whole, the model outputs both the overall result for the term and its fine-grained classification results. For example, conventional named entity recognition would classify "upper labial sulcus malignant tumor" simply as a disease, whereas the model classifies it as "overall: disease, site: upper labial sulcus, property: malignant, disease: tumor". In this way, all the information contained in a term is fully extracted.

5. Feed the clinical text to be standardized into the model. According to the recognition result, the text is recognized at fine granularity as site, disease, symptom, property, time, etiology, and so on.

6. Based on the fine-grained recognition results, align the text to be standardized with the standardized disease names; for example, a colloquial site such as "low abdomen" is aligned to the standard medical site "lower abdomen". If a clinical site does not exist or is miswritten, an explanation is attached to the final standardization result. For example, if the clinical text to be standardized is "low abdomen pain", the standardized result is "lower abdominal pain".

7. Produce a standardization report and output the inference results as the basis for the standardization. For example, when "lower gingival pus" is standardized as "lower gingival ulcer", the report shows that the orientation is lower, the site is the gum, and the disease is an ulcer. If a clinical text cannot be standardized, or can only be partially standardized, the reasons are also given, and the text can be edited by medical staff.

8. Store the standardized result in an electronic medical record or a document.
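Steps 5 to 8 can be strung together in a compact end-to-end sketch. Everything here is a toy stand-in under stated assumptions: `LEXICON` replaces both the trained model and the knowledge graph, `recognize` uses greedy longest match instead of a learned tagger, the output is a plain join of the aligned words, and serializing to JSON stands in for storing the result in a medical record.

```python
import json

# Hypothetical lexicon: surface form -> (label, standardized name).
LEXICON = {
    "low abdomen": ("site", "lower abdomen"),
    "lower abdomen": ("site", "lower abdomen"),
    "pain": ("property", "pain"),
}


def recognize(text):
    """Step 5 (toy): split text into (label, surface) spans by greedy longest match."""
    spans, rest = [], text.lower().strip()
    while rest:
        for surface in sorted(LEXICON, key=len, reverse=True):
            if rest == surface or rest.startswith(surface + " "):
                spans.append((LEXICON[surface][0], surface))
                rest = rest[len(surface):].strip()
                break
        else:
            # No lexicon entry starts here: keep the word and mark it unknown.
            word, _, rest = rest.partition(" ")
            spans.append(("unknown", word))
            rest = rest.strip()
    return spans


def standardize(text):
    """Steps 6 and 7 (toy): align each span and record the reasoning."""
    parts, trace, unresolved = [], [], []
    for label, surface in recognize(text):
        if label == "unknown":
            unresolved.append(surface)
            parts.append(surface)
        else:
            standard = LEXICON[surface][1]
            trace.append(f"{label}: {surface} -> {standard}")
            parts.append(standard)
    return {
        "input": text,
        "standardized": " ".join(parts),
        "trace": trace,
        "unresolved": unresolved,
    }


# Step 8 (toy): serialize the result so it can be stored with a record.
record = json.dumps(standardize("low abdomen pain"), ensure_ascii=False)
print(record)
```

Unknown words pass through unchanged and are listed under `unresolved`, which mirrors the requirement that partially standardized texts come with reasons rather than silent guesses.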

Existing matching-based methods in the industry treat medical text as ordinary natural language text. They ignore the characteristics of medical text, such as its specialized and knowledge-intensive nature, and therefore often perform poorly.

Medical knowledge graphs are often difficult to construct, and among the main reasons are a lack of medical knowledge and the difficulty of using English-language medical standards. This method constructs a standardized medical classification and hierarchy graph by extracting information from national standards and professional materials. The graph can be applied to medical term standardization, and other tasks such as electronic medical records, diagnosis guidance and assistance, and medical examination can also make use of the standardized knowledge.

Even if matching-based methods can gain some accuracy through iteration, they are not interpretable once an error occurs. Faced with a wrong standardized result, medical staff or patients can hardly understand why the match is wrong; and for unusual medical terms, it may even be unclear whether the standardized result is correct.

Standardized results based on a fine-grained knowledge graph tell medical staff or patients the inference logic behind the standardization. When a text cannot be standardized, or can only be partially standardized, the reasons can be given, which makes it easier to edit and revise the clinical text.

Standardization based on a fine-grained knowledge graph is knowledge-driven and professional, and can support interdisciplinary cooperation in more complex medical scenarios and fields.

Fig. 3 is a schematic structural diagram of a medical term normalization system according to an embodiment of the present invention, which can execute the medical term normalization method according to any of the above embodiments and is configured in a terminal.

The present embodiment provides a medical term normalization system 10, which includes: an identification program module 11, an alignment program module 12 and a generation program module 13.

The identification program module 11 is configured to input a medical phrase text to be standardized into a named entity recognition model and to output a recognition result, where the recognition result includes an overall medical classification result and a fine-grained classification result; the alignment program module 12 is configured to align the recognition result with the standardized term names in a knowledge graph by using the knowledge graph, which carries medical category and hierarchical structure information; the generation program module 13 is configured to generate a standardized medical term using the alignment process result.

After the aligning program module, the system further comprises:

an output program module: for outputting the standardized medical expression and the reasoning process in the aligning step.

The embodiment of the invention also provides a nonvolatile computer storage medium, wherein the computer storage medium stores computer executable instructions which can execute the medical term standardization method in any method embodiment;

as one embodiment, a non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:

A non-volatile computer-readable storage medium may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present invention. One or more program instructions are stored in the non-volatile computer-readable storage medium and, when executed by a processor, perform the medical term normalization method in any of the method embodiments described above.

The non-volatile computer-readable storage medium may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the device, and the like. Further, the non-volatile computer-readable storage medium may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the non-transitory computer readable storage medium optionally includes memory located remotely from the processor, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

An embodiment of the present invention further provides an electronic device, which includes: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the medical term normalization method of any of the embodiments of the present invention.

The electronic device of the embodiments of the present application exists in various forms, including but not limited to:

(1) Mobile communication devices, which are characterized by mobile communication capabilities and are primarily aimed at providing voice and data communication. Such terminals include smart phones, multimedia phones, feature phones, and low-end phones, among others.

(2) Ultra-mobile personal computer devices, which belong to the category of personal computers, have computing and processing functions, and generally also have mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as tablet computers.

(3) Portable entertainment devices, which can display and play multimedia content. Such devices include audio and video players, handheld game consoles, e-book readers, smart toys, and portable in-vehicle navigation devices.

(4) Other electronic devices with data processing capabilities.

As used herein, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method of medical phrase normalization, comprising:

2. The method of claim 1, wherein after the generating step, the method further comprises:

3. The method of claim 1, wherein the knowledge-graph is constructed from medical category and hierarchy information extracted from disease classification criteria and disease data.

4. The method of claim 1, wherein the named entity recognition model is trained from a medical term corpus with baseline global medical classification results and baseline fine-grained classification results, comprising:

5. The method of claim 1, wherein the medical phrase comprises: clinical medical terms used by medical staff and/or patients' descriptions of their conditions.

6. The method of claim 2, wherein the outputting step further comprises:

receiving modifications to the reasoning process made by medical staff and/or patients, and re-determining the alignment process result based on the modified reasoning process to generate a modified standardized medical term.

7. A medical phrase normalization system, comprising:

8. The system of claim 7, wherein after the alignment program module, the system further comprises:

9. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any of claims 1-6.

10. A storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.