CN110991170A

CN110991170A - Chinese disease name intelligent standardization method and system based on electronic medical record information

Info

Publication number: CN110991170A
Application number: CN201911232227.8A
Authority: CN
Inventors: 邓柯; 李祺; 刘军
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2019-12-05
Filing date: 2019-12-05
Publication date: 2020-04-10
Anticipated expiration: 2039-12-05
Also published as: CN110991170B

Abstract

The invention provides a Chinese disease name intelligent standardization method and system based on electronic medical record information, wherein the method comprises the following steps: acquiring an electronic medical record to be processed, and extracting a disease name from the electronic medical record; inputting the extracted disease name into a preset standardized probability model to obtain a standardized code corresponding to the disease name; the generation mode of the standardized probability model is as follows: acquiring a plurality of historical electronic medical records, extracting disease names and disease codes corresponding to the disease names from the historical electronic medical records, and generating a mapping list of the disease names and the disease codes; selecting at least two classification algorithms from preset classification algorithms to establish at least two prediction models, and selecting data in a disease name and disease code mapping list to train the at least two prediction models respectively; and carrying out model averaging on the at least two prediction models to obtain a standardized probability model. The method can more accurately obtain the standardized codes corresponding to the disease names of the electronic medical records.

Description

Chinese disease name intelligent standardization method and system based on electronic medical record information

Technical Field

The invention relates to the technical field of intelligent standardization of Chinese disease names of electronic medical record information, in particular to an intelligent standardization method and system of Chinese disease names based on electronic medical record information.

Background

Systematic research on electronic medical records helps people understand how various diseases arise and spread in the population, and thereby helps improve the public health environment. In such research, identifying disease names is an important step. To facilitate communication among regions, the World Health Organization classifies diseases according to characteristics such as cause, pathology, clinical manifestation, and anatomical location, and has established an internationally unified disease classification method. The 10th revision of the International Classification of Diseases and Related Health Problems (ICD-10) is currently in use, and this coding system encodes diseases by combining letters and numbers. However, when electronic medical records are actually entered, the disease name a doctor types into the system is often inconsistent with the standard disease code, owing to doctors' differing naming habits, limited entry time, and similar factors. This inconsistency does not greatly hinder communication among medical workers, but it causes great trouble for researchers working with electronic medical records. The main reason is that disease coding standards are not completely unified: although medical institutions in China generally use ICD-10 as the basic disease coding system, different institutions often modify and extend the international ICD-10 standard to varying degrees according to their own business requirements, so that the disease coding systems of different regions contain codes that do not agree. How to accurately identify the standard disease code corresponding to a non-standard disease name is therefore a technical problem that urgently needs to be solved.

Disclosure of Invention

In view of the above, there is a need for a method and a system for intelligent standardization of disease names based on electronic medical record information, which can output standardized codes corresponding to disease names of electronic medical records more accurately.

The first aspect of the present application provides a method for intelligently standardizing Chinese disease names based on electronic medical record information, the method comprising:

acquiring an electronic medical record to be processed, and extracting a disease name from the electronic medical record;

inputting the extracted disease name into a preset standardized probability model to obtain a standardized code corresponding to the disease name;

the generation mode of the standardized probability model is as follows:

acquiring a plurality of historical electronic medical records, extracting disease names and disease codes corresponding to the disease names from the historical electronic medical records, and generating a mapping list of the disease names and the disease codes;

selecting at least two classification algorithms from preset classification algorithms to establish at least two prediction models, and selecting data in the disease name and disease code mapping list to train the at least two prediction models respectively;

and carrying out model averaging on the at least two prediction models to obtain a standardized probability model.

Preferably, the training method of the prediction model comprises:

acquiring a plurality of historical electronic medical records, wherein the plurality of historical electronic medical records comprise disease names and disease codes corresponding to the disease names;

acquiring disease names and disease codes corresponding to the disease names from the historical electronic medical records, and generating a mapping list of the disease names and the disease codes;

performing cross validation on the data in the mapping list a preset number of times, and dividing the data into a training set and a verification set in each round of cross validation;

establishing a prediction model based on a classification algorithm, and training the prediction model by using the training set;

predicting the disease names in the verification set by using the trained prediction model, and comparing the predicted disease codes with the disease codes corresponding to the disease names in the verification set;

if the comparison shows that the predicted disease code is inconsistent with the disease code corresponding to the disease name in the verification set, correcting the disease name in the verification set and substituting it into the prediction model again for prediction.

Preferably, the preset classification algorithm includes: naive Bayes algorithm, multi-classification support vector machine algorithm, logistic regression classification algorithm, decision tree classification algorithm, neural network algorithm.

Preferably, the step of performing model averaging on the at least two prediction models to obtain a standardized probability model comprises:

counting the prediction accuracy of the prediction models based on different classification algorithms;

assigning preset weight values according to the prediction accuracy of the different prediction models, wherein a prediction model with a higher prediction accuracy is given a higher weight value;

wherein the standardized probability model comprises the prediction models based on the different classification algorithms and a weight corresponding to each prediction model.

Preferably, the steps further comprise:

obtaining a prediction result of the standardized probability model, and establishing a mapping rule database and an error matching data list according to the prediction result;

the mapping rule database stores the mapping relation between the acquired disease name in the electronic medical record and the disease code;

the error matching data list stores the disease names and disease codes, acquired from the electronic medical record, for which the disease name and the disease code do not match.

Preferably, the step of obtaining the prediction result of the standardized probability model and establishing the mapping rule database and the error matching data list according to the prediction result comprises:

comparing the prediction probability of the disease name under its standardized code, as predicted by the standardized probability model, with a preset probability threshold;

if the prediction probability of the disease name under its standardized code is greater than the preset probability threshold, storing the mapping relation between the disease name and the disease code acquired from the electronic medical record in the mapping rule database;

if the prediction probability of the disease name under its standardized code is smaller than the preset probability threshold, storing the mapping relation between the disease name and the disease code acquired from the electronic medical record in the error matching data list.

Preferably, the steps further comprise:

receiving a correction instruction, and correcting the data in the error matching data list, wherein the corrected content comprises one or two of the following contents: correcting the description mode of the disease name, and correcting the disease code which is not matched with the disease name.

Preferably, the method further comprises:

searching for a standardized disease name corresponding to the standardized code according to the predicted standardized code;

outputting the standardized disease name.

A second aspect of the present application provides an intelligent standardization system for Chinese disease names based on electronic medical record information, the system comprising:

the acquisition module is used for acquiring an electronic medical record to be processed and extracting a disease name from the electronic medical record;

the prediction module is used for inputting the extracted disease name into a preset standardized probability model to obtain a standardized code corresponding to the disease name;

the generation mode of the standardized probability model is as follows:

The Chinese disease name intelligent standardization method based on the electronic medical record information adopts different classification algorithms to create a plurality of prediction models, carries out model averaging on the prediction models to obtain a standardized probability model, and outputs a standardized disease code to an input disease name to be standardized by using the standardized probability model.

Drawings

Fig. 1 is an application environment diagram of the intelligent standardization method for Chinese disease names based on electronic medical record information according to an embodiment of the present invention.

Fig. 2 is a flowchart of a method for generating a standardized probability model according to an embodiment of the present invention.

Fig. 3 is a flowchart of an intelligent standardization method for Chinese disease names based on electronic medical record information according to an embodiment of the present invention.

Fig. 4 is a schematic structural diagram of an intelligent standardization system for Chinese disease names based on electronic medical record information according to an embodiment of the present invention.

Fig. 5 is a schematic diagram of a computer device according to an embodiment of the invention.

Detailed Description

In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.

In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention, and the described embodiments are merely a subset of the embodiments of the present invention, rather than a complete embodiment. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.

The Chinese disease name intelligent standardization method based on electronic medical record information is applied to a user terminal 1, and the user terminal 1 and a computer device 2 are in communication connection through a network. The network may be a wired network or a wireless network, such as radio, Wireless Fidelity (Wi-Fi), cellular, satellite, or broadcast.

The user terminal 1 may be an electronic device on which a program implementing the intelligent standardization method for Chinese disease names based on electronic medical record information is installed, including but not limited to a smartphone, a tablet computer, a laptop computer, a desktop computer, and the like.

The computer device 2 may be an electronic device storing an electronic medical record, such as a personal computer, a server, and the like, wherein the server may be a single server, a server cluster, a cloud server, or the like.

Please refer to fig. 2, which is a flowchart illustrating a method for generating a standardized probability model according to an embodiment of the present invention. The order of the steps in the flowchart may be changed and some steps may be omitted according to different needs.

Step S11, acquiring a plurality of historical electronic medical records, extracting disease names and disease codes corresponding to the disease names from the historical electronic medical records, and generating a mapping list of the disease names and the disease codes.

In an embodiment of the present invention, the method for acquiring an electronic medical record may be implemented by retrieving medical record information in a medical record repository of a hospital, where the medical record information includes outpatient medical record information and inpatient medical record information.

In an embodiment of the present invention, after the disease name in the medical record information and the disease code corresponding to the disease name are extracted, they are further preprocessed, where the preprocessing includes removing redundant spaces, punctuation, and extraneous characters from the disease name and the disease code. The preprocessed disease names and their corresponding disease codes are then stored together to obtain the mapping list of disease names and disease codes.
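The preprocessing described above could be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the function names `clean_name`, `clean_code`, and `build_mapping_list` are hypothetical, and the exact cleaning rules (Unicode NFKC normalization, keeping only letters and digits in names, keeping only letters, digits, and the dot in codes, skipping pairs that become empty) are assumptions, since the source only says that redundant spaces, punctuation, and characters are removed.

```python
import re
import unicodedata

# Assumed rule: anything that is not a letter or digit (CJK included) is noise.
_NAME_NOISE = re.compile(r"[^\w]|_")
# Assumed rule: a disease code keeps only upper-case letters, digits and dots.
_CODE_NOISE = re.compile(r"[^A-Z0-9.]")


def clean_name(raw: str) -> str:
    """Normalize full-width variants, then drop spaces and punctuation."""
    return _NAME_NOISE.sub("", unicodedata.normalize("NFKC", raw))


def clean_code(raw: str) -> str:
    """Normalize, upper-case, and drop everything except letters, digits, dots."""
    return _CODE_NOISE.sub("", unicodedata.normalize("NFKC", raw).upper())


def build_mapping_list(records):
    """Turn (raw_name, raw_code) records into a list of cleaned (name, code) pairs.

    Pairs whose name or code is empty after cleaning are skipped (an assumption).
    """
    mapping = []
    for raw_name, raw_code in records:
        name, code = clean_name(raw_name), clean_code(raw_code)
        if name and code:
            mapping.append((name, code))
    return mapping
```

Repeated pairs are deliberately kept here, on the assumption that how often a name appears under a code is useful to the classifiers trained in step S12.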

Step S12, selecting at least two classification algorithms from preset classification algorithms to establish at least two prediction models, and selecting data in the disease name and disease code mapping list to train the at least two prediction models respectively.

In an embodiment of the present invention, the classification algorithm may include: naive Bayes algorithm, multi-classification support vector machine algorithm, logistic regression classification algorithm, decision tree classification algorithm, neural network algorithm.

The training method of the prediction model can comprise the following steps:

performing cross validation on the data in the mapping list a preset number of times, and dividing the data into a training set and a verification set in each round of cross validation, where in one embodiment the preset number of times may be 5, and in another embodiment it may be 10;

establishing a prediction model based on a classification algorithm, and training the prediction model based on the classification algorithm by using the training set.
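The repeated division into a training set and a verification set can be illustrated with a short sketch. It assumes that the "preset number of times" corresponds to the number of folds k in ordinary k-fold cross validation (5 or 10 in the embodiments above); the function name `k_fold_splits` and the fixed shuffling seed are hypothetical choices, not details taken from the source.

```python
import random


def k_fold_splits(pairs, k=5, seed=0):
    """Yield (training_set, verification_set) for each of k cross-validation rounds."""
    if not 2 <= k <= len(pairs):
        raise ValueError("k must be between 2 and the number of samples")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle of a copy
    folds = [shuffled[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i in range(k):
        verification = folds[i]
        training = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield training, verification
```

Each (name, code) pair lands in the verification set exactly once, so accuracy counted over all rounds covers the whole mapping list.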

In one embodiment, the at least two classification prediction models include a first classification prediction model based on a logistic regression classification algorithm, a second classification prediction model based on a multi-classification support vector machine algorithm, a third classification prediction model based on a naive Bayes algorithm, a fourth classification prediction model based on a decision tree classification algorithm, and a fifth classification prediction model based on a neural network algorithm.

For example, in one embodiment, the at least two classification prediction models are the first classification prediction model and the second classification prediction model. To train the first prediction model, the data in the disease name and disease code mapping list are divided into a training set and a verification set, and the data in the training set are fed to a logistic regression classification algorithm to construct the first prediction model, where the disease names and the disease codes in the training set are in one-to-one correspondence. The logistic regression classification algorithm uses a logistic function as the link function to convert a predicted value on the real number domain into a value between 0 and 1, so that it can give the probability distribution of a sample over the different categories and thereby achieve classification. That is, the logistic regression classification algorithm calculates the probability distribution of a disease name over the different disease codes. For example, for the disease name typhoid, the first prediction model gives a probability of 99% under the standardized code A01.101, a probability of 0.2% under the standardized code A01.002, and a probability of 0.12% under the standardized code A01.003.
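How a real-valued score becomes a probability can be shown with a small sketch. The `logistic` function below is the standard logistic function named in the text. The `softmax` function is its usual multi-class generalization and is an assumption made here for illustration, since the source does not state how probabilities over many codes are normalized. The scores used in any call are hypothetical and would in practice come from a trained model.

```python
import math


def logistic(z: float) -> float:
    """Map a real-valued score to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative scores
    return e / (1.0 + e)


def softmax(scores: dict) -> dict:
    """Turn per-code real-valued scores into a probability distribution."""
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {code: math.exp(s - peak) for code, s in scores.items()}
    total = sum(exps.values())
    return {code: e / total for code, e in exps.items()}
```

For instance, `softmax({"A01.101": 6.0, "A01.002": 0.0, "A01.003": -0.5})` places most of the probability mass on A01.101 while the three values still sum to 1.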

To train the second prediction model, the data in the disease name and disease code mapping list are divided into a training set and a verification set, and the data in the training set are fed to a multi-classification support vector machine algorithm to construct the second prediction model. The multi-classification support vector machine algorithm classifies samples into multiple classes by constructing a plurality of decision boundaries in sequence, in a one-versus-one manner: a standard support vector machine is applied to every pair of the m classes, so that m(m-1)/2 binary classifications are performed in total, and the class of a sample is the one that scores highest across all of the pairwise decisions. The standard disease name corresponding to the disease name can be predicted through the multi-classification support vector machine algorithm.
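The one-versus-one voting scheme can be sketched as follows. One binary decision is made for each unordered pair of classes, which gives m(m-1)/2 decisions for m classes, and the class with the most votes wins. The callable `pairwise_decide` is a hypothetical stand-in for a trained binary support vector machine; training the machines themselves is outside this sketch.

```python
from collections import Counter
from itertools import combinations


def count_pairwise_classifiers(m: int) -> int:
    """Number of unordered class pairs, i.e. binary classifiers needed."""
    return m * (m - 1) // 2


def one_vs_one_predict(sample, classes, pairwise_decide):
    """Classify `sample` by majority vote over all pairs of classes.

    `pairwise_decide(sample, a, b)` must return either `a` or `b`.
    Returns the winning class and the full vote tally.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_decide(sample, a, b)] += 1
    # On a tie, the class that received its first vote earliest is returned.
    return votes.most_common(1)[0][0], votes
```

With five candidate codes this makes ten pairwise decisions per sample, which is why the scheme is usually applied to a shortlist of candidate codes rather than to an entire coding system at once.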

In other embodiments, the data in the disease name and disease code mapping list are used to construct a third prediction model based on a naive Bayes algorithm, a fourth prediction model based on a decision tree classification algorithm, and a fifth prediction model based on a neural network algorithm; these methods are prior art and are not described again here.

Step S13, carrying out model averaging on the at least two prediction models to obtain a standardized probability model.

In an embodiment of the present invention, the step of performing model averaging on the at least two prediction models to obtain a standardized probability model includes:

For example, the standardized probability model includes five prediction models based on different classification algorithms, together with the weight of each prediction model. The five prediction models are a first classification prediction model based on a logistic regression classification algorithm, a second classification prediction model based on a multi-classification support vector machine algorithm, a third classification prediction model based on a naive Bayes algorithm, a fourth classification prediction model based on a decision tree classification algorithm, and a fifth classification prediction model based on a neural network algorithm. The prediction accuracy of each prediction model is obtained by counting the verification results on the verification set in step S12, and a preset weight value is assigned to each of the five prediction models according to its prediction accuracy. For example, the first classification prediction model based on the logistic regression classification algorithm has an accuracy of 93.2% and a weight value of 0.2; the second classification prediction model based on the multi-classification support vector machine algorithm has an accuracy of 98.7% and a weight value of 0.28; the third classification prediction model based on the naive Bayes algorithm has an accuracy of 99.2% and a weight value of 0.32; the fourth classification prediction model based on the decision tree classification algorithm has an accuracy of 90.1% and a weight value of 0.1; and the fifth classification prediction model based on the neural network algorithm has an accuracy of 92.1% and a weight value of 0.1.
When any disease name is input into the standardized probability model, each of the five prediction models calculates the probability that the disease name belongs to a given standard disease code, and the probabilities calculated by the five prediction models are then combined by a weighted average to give the probability that the disease name belongs to that standard disease code.
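The weighted average over the five prediction models can be sketched as below. The weights are the ones quoted in the example above; the function name `model_average` and the representation of each model's output as a dictionary from code to probability are assumptions made for illustration. The source does not give a formula for deriving weights from accuracies, so the weights are simply taken as preset inputs.

```python
# Preset weights from the example: logistic regression, support vector
# machine, naive Bayes, decision tree, neural network.
WEIGHTS = [0.2, 0.28, 0.32, 0.1, 0.1]


def model_average(per_model_probs, weights):
    """Combine per-model probability distributions by a weighted average.

    `per_model_probs` is a list of {code: probability} dicts, one per model.
    A code missing from a model's dict is treated as probability 0.
    """
    if len(per_model_probs) != len(weights):
        raise ValueError("one weight per model is required")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    combined = {}
    for probs, weight in zip(per_model_probs, weights):
        for code, p in probs.items():
            combined[code] = combined.get(code, 0.0) + weight * p
    return {code: value / total_weight for code, value in combined.items()}
```

Dividing by the total weight keeps the result a probability distribution even if the preset weights do not sum exactly to 1.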

In another embodiment of the present invention, the steps further comprise:

The step of obtaining the prediction result of the standardized probability model and establishing the mapping rule database and the error matching data list according to the prediction result may include:

The steps further include receiving a correction instruction to correct the data in the error matching data list, wherein the correction includes one or both of the following: correcting the description of the disease name, and correcting the disease code that does not match the disease name.

For example, the disease name acquired from an electronic medical record is paratyphoid B, and prediction is performed through the standardized probability model. The predicted probability of the disease name under its standardized code A01.201 is 99%. This prediction probability is compared with a preset probability threshold, for example 95%. Because the prediction probability is greater than the preset probability threshold, the acquired disease name and disease code are stored in the mapping rule database. The data in the mapping rule database are thus the correspondences between original disease names and disease codes for which the standardized probability model outputs the standardized code with a probability above the preset probability threshold; these correspondences can be used for training or optimizing the prediction models. As another example, the disease name acquired from an electronic medical record is abnormal paratyphoid, and the standardized probability model gives a probability of 30% for the disease name under the standardized code A01.201, which is smaller than the preset probability threshold. A lookup of the electronic medical record shows that the disease name is abnormal paratyphoid and that the disease code corresponding to it is A01.201. The disease name and the disease code acquired from the electronic medical record are therefore stored in the error matching data list and then corrected, where the correction comprises standardizing the description of the disease name or correcting the wrong disease code; for example, abnormal paratyphoid is changed to paratyphoid B.
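The threshold comparison in this example can be sketched as a small routing function. The name `triage` and the callable `predict_probs` (standing in for the standardized probability model) are hypothetical. The source specifies only the "greater than" and "smaller than" cases, so sending a probability exactly equal to the threshold to the error matching data list is an assumption.

```python
def triage(records, predict_probs, threshold=0.95):
    """Split (name, code) records into mapping rules and mismatches.

    `predict_probs(name)` returns a {code: probability} dict for the name.
    Each output entry is (name, code, probability of the recorded code).
    """
    mapping_rules, mismatches = [], []
    for name, code in records:
        probability = predict_probs(name).get(code, 0.0)
        if probability > threshold:
            mapping_rules.append((name, code, probability))
        else:
            # Assumed: ties with the threshold are also sent for correction.
            mismatches.append((name, code, probability))
    return mapping_rules, mismatches
```

Entries in the second list are the ones that would await a correction instruction, either to the wording of the disease name or to the recorded code.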

The above is the method for establishing the standardized probability model, which can be completed offline.

Please refer to fig. 3, which is a flowchart of the intelligent standardization method for Chinese disease names based on electronic medical record information according to an embodiment of the present invention. The order of the steps in the flowchart may be changed and some steps may be omitted according to different needs.

Step S21, acquiring the electronic medical record to be processed, and extracting the disease name from the electronic medical record.

In an embodiment of the present invention, the electronic medical record may be acquired by retrieving medical record information from a medical record repository of a hospital, where the medical record information includes outpatient medical record information and inpatient medical record information. The disease name in the electronic medical record is extracted and then preprocessed, where the preprocessing includes removing redundant spaces, punctuation, and extraneous characters from the disease name.

Step S22, inputting the extracted disease name into a preset standardized probability model to obtain a standardized code corresponding to the disease name.

The standardized probability model is the one generated in steps S11 to S13.

For example, the standardized probability model includes five prediction models based on different classification algorithms, together with the weight of each prediction model. The five prediction models are: a first classification prediction model based on a logistic regression classification algorithm, a second classification prediction model based on a multi-classification support vector machine algorithm, a third classification prediction model based on a naive Bayes algorithm, a fourth classification prediction model based on a decision tree classification algorithm, and a fifth classification prediction model based on a neural network algorithm. A preset weight value is assigned to each of the five prediction models according to its prediction accuracy; for example, the weight value of the first classification prediction model based on the logistic regression classification algorithm is 0.2, the weight value of the second classification prediction model based on the multi-classification support vector machine algorithm is 0.28, the weight value of the third classification prediction model based on the naive Bayes algorithm is 0.32, the weight value of the fourth classification prediction model based on the decision tree classification algorithm is 0.1, and the weight value of the fifth classification prediction model based on the neural network algorithm is 0.1. The disease name obtained in step S21 is input into the standardized probability model, each of the five prediction models calculates the probability that the disease name belongs to a given standard disease code, and the probabilities calculated by the five prediction models are combined by a weighted average to give the probability that the disease name belongs to that standard disease code.
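Step S22 can be illustrated end to end with the sketch below. It assumes each prediction model is a callable that returns a dictionary from code to probability, and that the standardized code output for a disease name is the code with the highest weighted-average probability; the source states that a standardized code is obtained but does not spell out this selection rule. The function name `standardize` is hypothetical.

```python
def standardize(name, models, weights):
    """Return (code, probability) for the most probable standardized code.

    `models` is a list of callables, each mapping a disease name to a
    {code: probability} dict; `weights` holds one preset weight per model.
    """
    if not models or len(models) != len(weights):
        raise ValueError("need at least one model and one weight per model")
    combined = {}
    for model, weight in zip(models, weights):
        for code, p in model(name).items():
            combined[code] = combined.get(code, 0.0) + weight * p
    if not combined:
        raise ValueError("no candidate codes were produced")
    total_weight = sum(weights)
    best = max(combined, key=combined.get)  # assumed rule: highest probability wins
    return best, combined[best] / total_weight
```

The returned probability is the value that would be compared with the preset probability threshold when deciding whether a recorded name and code pair is trustworthy.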

The intelligent standardization method for Chinese disease names based on electronic medical record information has been described in detail above with reference to fig. 3. The functional modules of the software system implementing the method, and the hardware device architecture implementing the method, are described below with reference to figs. 4 to 5.

It is to be understood that the embodiments are illustrative only and that the scope of the claims is not limited to this configuration.

In some embodiments, the Chinese disease name intelligent standardization system 10 based on electronic medical record information runs in a computer device. The computer device is connected with a plurality of user terminals through a network. The Chinese disease name intelligent standardization system 10 based on electronic medical record information can comprise a plurality of functional modules consisting of program code segments. The program code of each program segment in the system 10 can be stored in the memory of the computer device and executed by at least one processor to realize the intelligent standardization function for Chinese disease names based on electronic medical record information.

In this embodiment, the intelligent standardization system 10 for Chinese disease names based on electronic medical record information can be divided into a plurality of functional modules according to the functions it performs. Referring to fig. 4, the functional modules may include: an acquisition module 101 and a prediction module 102. A module as referred to herein is a series of computer program segments that is stored in memory, can be executed by at least one processor, and performs a fixed function. The functions of the modules are described in detail below.

The acquisition module 101 is configured to acquire an electronic medical record to be processed, and extract a disease name from the electronic medical record.

The prediction module 102 is configured to input the extracted disease name into a preset standardized probability model, so as to obtain a standardized code corresponding to the disease name.
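The division into two modules could look roughly like the sketch below. The class names mirror modules 101 and 102, but the record layout (a mapping with a field called `diagnosis`), the whitespace cleanup, and the rule of returning the most probable code are all assumptions made for illustration rather than details taken from the source.

```python
from typing import Callable, Dict, Mapping


class AcquisitionModule:
    """Sketch of module 101: extracts the disease name from a record."""

    def __init__(self, name_field: str = "diagnosis"):  # hypothetical field name
        self.name_field = name_field

    def extract(self, record: Mapping[str, str]) -> str:
        # Collapse redundant whitespace; fuller preprocessing is omitted here.
        return " ".join(record[self.name_field].split())


class PredictionModule:
    """Sketch of module 102: maps a disease name to a standardized code."""

    def __init__(self, probability_model: Callable[[str], Dict[str, float]]):
        # The callable stands in for the standardized probability model.
        self.probability_model = probability_model

    def predict(self, name: str) -> str:
        probs = self.probability_model(name)
        return max(probs, key=probs.get)  # assumed rule: most probable code
```

Keeping the probability model behind a plain callable means the prediction module does not need to know which classification algorithms were averaged to build it.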

In some embodiments, the prediction module 102 is also used to train the standardized probability model.

Acquiring a plurality of historical electronic medical records, extracting disease names and disease codes corresponding to the disease names from the historical electronic medical records, and generating a mapping list of the disease names and the disease codes.

Selecting at least two classification algorithms from preset classification algorithms to establish at least two prediction models, and selecting the data in the disease name and disease code mapping list to train the at least two prediction models respectively.

The training method of the prediction model can comprise the following steps:

In another embodiment of the present invention, the steps further comprise:

FIG. 5 is a diagram of a computer device according to a preferred embodiment of the present invention.

The user terminal 1 comprises a memory 20, a processor 30, and a computer program 40 stored in the memory 20 and executable on the processor 30, such as a Chinese disease name intelligent standardization program based on electronic medical record information. The processor 30 executes the computer program 40 to implement the steps in the above-mentioned embodiment of the intelligent standardization method for Chinese disease names based on electronic medical record information, or the processor 30 executes the computer program 40 to implement the functions of each module/unit in the above-mentioned embodiment of the intelligent standardization system for Chinese disease names based on electronic medical record information, for example, the module 101 and the module 102 in fig. 4.

Illustratively, the computer program 40 may be partitioned into one or more modules/units that are stored in the memory 20 and executed by the processor 30 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and these segments describe the execution process of the computer program 40 in the user terminal 1. For example, the computer program 40 may be divided into the acquisition module 101 and the prediction module 102 shown in FIG. 4. The functions of these modules are detailed in the third embodiment.

The user terminal 1 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. A person skilled in the art will appreciate that the schematic diagram is only an example of the user terminal 1 and does not limit it: the user terminal 1 may comprise more or fewer components than those shown, may combine certain components, or may use different components. For example, the user terminal 1 may further comprise input and output devices, network access devices, buses, and the like.

The processor 30 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like. A general-purpose processor may be a microprocessor or any conventional processor. The processor 30 is the control center of the user terminal 1 and connects the various parts of the entire user terminal 1 through various interfaces and lines.

The memory 20 may be used to store the computer program 40 and/or the modules/units. The processor 30 implements the various functions of the user terminal 1 by running or executing the computer program and/or the modules/units stored in the memory 20 and by calling data stored in the memory 20. The memory 20 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data created through use of the user terminal 1 (such as audio data or a phonebook), and the like. In addition, the memory 20 may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage device.

The integrated modules/units of the user terminal 1 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on this understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be appropriately expanded or restricted according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunication signals.

In the embodiments provided in the present invention, it should be understood that the disclosed computer apparatus and method can be implemented in other ways. The above-described embodiments of the computer apparatus are merely illustrative; for example, the division into units is only one division by logical function, and other divisions are possible in an actual implementation.

In addition, the functional units in the embodiments of the present invention may be integrated into the same processing unit, each unit may exist alone physically, or two or more units may be integrated into the same unit. The integrated unit can be realized in the form of hardware, or in the form of hardware plus software functional modules.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. The units or computer means recited in the computer means claims may also be implemented by the same unit or computer means, either in software or in hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims

1. An intelligent standardization method for Chinese disease names based on electronic medical record information is characterized by comprising the following steps:

the generation mode of the standardized probability model is as follows:

2. The intelligent standardization method for Chinese disease names based on electronic medical record information as claimed in claim 1, wherein the training method for the prediction model comprises:

3. The intelligent standardization method for Chinese disease names based on electronic medical record information as claimed in claim 2, wherein the preset classification algorithms comprise: a naive Bayes algorithm, a multi-classification support vector machine algorithm, a logistic regression classification algorithm, a decision tree classification algorithm, and a neural network algorithm.

4. The intelligent standardization method for Chinese disease names based on electronic medical record information as claimed in claim 3, wherein the step of performing model averaging on the at least two prediction models to obtain a standardized probability model comprises:

5. The intelligent standardization method of Chinese disease names based on electronic medical record information as claimed in claim 4, wherein said steps further comprise:

6. The intelligent standardization method of Chinese disease names based on electronic medical record information as claimed in claim 5, wherein the step of obtaining the prediction result of the standardized probability model and establishing the mapping rule database and the list of mismatching data according to the prediction result comprises:

7. The intelligent standardization method of Chinese disease names based on electronic medical record information as claimed in claim 6, wherein said steps further comprise:

8. The intelligent standardization method for Chinese disease names based on electronic medical record information as claimed in claim 1, wherein the method further comprises:

outputting the standardized disease name.

9. An intelligent Chinese disease name standardization system based on electronic medical record information is characterized by comprising the following components:

the generation mode of the standardized probability model is as follows:

10. The system of claim 9, wherein the step of performing model averaging on the at least two prediction models to obtain a standardized probability model comprises:

wherein the standardized probability model comprises the prediction models based on the different classification algorithms and weights of the different prediction models.