CN109637605B - Electronic medical record structuring method and computer-readable storage medium - Google Patents
- Publication number
- CN109637605B (application CN201811513668.0A)
- Authority
- CN
- China
- Prior art keywords
- attribute
- knowledge base
- keywords
- medical record
- text
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Landscapes
- Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Epidemiology (AREA)
- Medical Informatics (AREA)
- Primary Health Care (AREA)
- Public Health (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention provides an electronic medical record structuring method and a computer-readable storage medium. The method comprises the following steps:

- loading a first medical knowledge base;
- splitting a first electronic medical record into sentences at special symbols to obtain a plurality of text sentences;
- matching each text sentence against the attributes in the first medical knowledge base using a matching scoring algorithm;
- storing the matching results.

The method and the storage medium solve the problem in the related art that electronic medical records cannot be completely structured, and so achieve complete structuring of electronic medical records.
Description
Technical Field
The invention relates to the medical field, and in particular to an electronic medical record structuring method and a computer-readable storage medium.
Background
As medical systems become electronic, networked and intelligent, patients' medical data are stored in electronic medical records. These records contain comprehensive information such as chief complaints, medical history, examinations, diagnoses, treatment plans and treatments. In the context of big data, the raw data open new possibilities for medical decision-making. Information can be mined from historical records, rules can be extracted, and intelligent systems can be designed to further improve the level and quality of medical care.
However, an electronic medical record database usually stores the original text entered by the doctor. This text is written according to prescribed templates, but it still keeps the freedom and flexibility of natural language. Such data are therefore only semi-structured rather than fully structured, which makes them unsuitable for more advanced research tasks and intelligent medical projects. This creates a need to structure the original text data.
Natural language can be expressed in many ways and medical terminology is highly specialized, so structuring electronic medical record text is difficult, and related research in China is still limited. Current domestic work on structuring electronic medical records mainly uses positive and negative semantics to judge whether disease information is present or absent. This approach handles disease information that can be labelled with binary logic, but it cannot extract information such as numerical values or disease severity. Moreover, existing studies offer no solution for extracting the body site at which a patient's disease information occurs. This incomplete information extraction limits medical research, the development of intelligent diagnostic decision systems, and related work.
The invention aims to extract complete information from electronic medical records for different types of disease information and medical treatment information, and so to achieve complete structuring of electronic medical record text.
Disclosure of Invention
The invention provides an electronic medical record structuring method and a computer-readable storage medium. They solve at least the problem in the related art that electronic medical records cannot be completely structured.
In a first aspect, an embodiment of the present invention provides an electronic medical record structuring method. The method comprises the following steps:

- loading a first medical knowledge base;
- splitting a first electronic medical record into sentences at special symbols to obtain a plurality of text sentences;
- matching each text sentence against the attributes in the first medical knowledge base using a matching scoring algorithm;
- storing the matching results.
In a second aspect, an embodiment of the present invention provides a computer-readable storage medium, on which computer program instructions are stored, which, when executed by a processor, implement the method of the first aspect.
The embodiments of the invention provide an electronic medical record structuring method and a computer-readable storage medium that work as follows:

- a first medical knowledge base is loaded;
- a first electronic medical record is split into sentences at special symbols to obtain a plurality of text sentences;
- each text sentence is matched against the attributes in the first medical knowledge base using a matching scoring algorithm;
- the matching results are stored.

This solves the problem in the related art that electronic medical records cannot be completely structured, and achieves complete structuring of electronic medical records.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention and do not constitute a limitation of the invention. In the drawings:
FIG. 1 is a flow chart of a method for structuring an electronic medical record according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a hardware structure of an electronic medical record structuring device according to an embodiment of the invention;
FIG. 3 is a flow chart of a method for structuring an electronic medical record according to a preferred embodiment of the present invention;
FIG. 4 is a schematic diagram of an example of a first medical knowledge base structure in the field of prosthodontics in accordance with a preferred embodiment of the present invention;
FIG. 5 is a schematic diagram of an example of an electronic medical record in accordance with a preferred embodiment of the present invention;
FIG. 6 is a diagram illustrating the results of structured matching of electronic medical records according to the preferred embodiment of the present invention;
FIG. 7 is a statistical chart of the matching frequency of each attribute in the structured matching results of the electronic medical record according to the preferred embodiment of the present invention.
Detailed Description
Features and exemplary embodiments of various aspects of the present invention will be described in detail below, and in order to make objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present invention by illustrating examples of the present invention.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
This embodiment provides an electronic medical record structuring method. FIG. 1 is a flowchart of the method according to an embodiment of the present invention. As shown in FIG. 1, the flow comprises the following steps:
Step S101: load a first medical knowledge base;
Step S102: split the first electronic medical record into sentences at special symbols to obtain a plurality of text sentences;
Step S103: match each of the text sentences against the attributes in the first medical knowledge base using a matching scoring algorithm;
Step S104: store the matching results.
Through these steps, the matching scoring algorithm matches text sentences to the attributes in the first medical knowledge base. The matched keywords cover disease information labelled with binary logic. They also cover information such as numerical values and disease severity. This solves the problem in the related art that electronic medical records cannot be completely structured, and achieves complete structuring of electronic medical records.
Optionally, the first medical knowledge base comprises a plurality of parts. Each part comprises one or more attributes, and each attribute has one or more corresponding keywords. Each attribute comprises at least an attribute name, an attribute value and a location, and each keyword also carries a score.

In the first medical knowledge base, the basic unit is the attribute, which has three components:

- the attribute name, which may be a symptom, a physical feature or a treatment for a disease;
- the attribute value, which may be the presence or absence and severity of a symptom, the specific manifestation of a physical feature, or the specific method of a treatment;
- the location, which may be the body part to which the attribute applies.

A set of attributes belongs to one part (e.g., examination or treatment plan), and the parts together make up the entire knowledge base.
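To make this structure concrete, the knowledge base could be modelled as shown below. This is a minimal Python sketch under stated assumptions and not the patent's implementation. The class names `Keyword`, `Attribute`, `Part` and `KnowledgeBase`, their fields, and the English sample entries are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class Keyword:
    text: str     # surface form looked up in a text sentence
    score: float  # e.g. positive for medical terms, negative for negation words


@dataclass
class Attribute:
    name: str                                # symptom, physical feature or treatment
    name_keywords: List[Keyword]             # keywords signalling the attribute name
    value_options: Dict[str, List[Keyword]]  # attribute value option -> its keywords
    # body-part options for the "location" component (may be empty)
    location_options: Dict[str, List[Keyword]] = field(default_factory=dict)


@dataclass
class Part:
    title: str  # e.g. "examination", "treatment plan"
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class KnowledgeBase:
    parts: List[Part] = field(default_factory=list)

    def all_attributes(self) -> Iterator[Tuple[str, Attribute]]:
        """Yield (part title, attribute) pairs across the whole knowledge base."""
        for part in self.parts:
            for attribute in part.attributes:
                yield part.title, attribute


# Hypothetical one-attribute knowledge base for illustration.
kb = KnowledgeBase(parts=[
    Part("examination", [
        Attribute(
            name="tooth mobility",
            name_keywords=[Keyword("mobility", 1.0), Keyword("loose", 1.0)],
            value_options={"mild": [Keyword("slight", 1.0)],
                           "severe": [Keyword("marked", 1.0)]},
        ),
    ]),
])
```

Keeping the keywords next to each attribute name and value option mirrors how the first knowledge base is described: the second knowledge base's attributes plus keyword lists with scores.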
Medical diagnosis and treatment are complex. This embodiment therefore improves the first medical knowledge base in several ways, so that medical knowledge is described in detail and as much information from the original medical record as possible is kept during structuring:

a) expanding the attribute value types;
b) adding a "location" to each attribute to describe the body part concerned;
c) adding a description of time-sequence information;
d) classifying the attributes based on medical knowledge to form a hierarchical representation of medical knowledge.

These improvements are described in detail as follows:
a) The attribute value types of the first medical knowledge base include real-number, Boolean and discrete categorical types, among others. An attribute value can be obtained by judgment (yes/no), single selection, a number, multiple selection, or a combination of these. This range of forms can represent the values of the various attributes that occur in medicine.
b) Most attributes in the first medical knowledge base relate to a specific body part, such as the site where disease information occurs or the site where a medical measure is applied. This embodiment therefore adds a corresponding body-part description to the attributes. Adding "location" also requires the structuring method to extract "location" information, which is explained further in this embodiment.
c) Medical practice is a procedural activity rather than a static combination of medical measures. In particular, the measures in a treatment plan and its treatments, prescribed for a patient's condition, must be carried out in a certain order. To preserve these sequential dependencies, time-series information is added to the first medical knowledge base. For example, an attribute that needs to express a time series can be given two members, step and substep, which describe the order in which the attribute occurs during treatment.
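The step/substep idea can be illustrated with a small Python sketch. It assumes that each time-ordered attribute carries an integer `step` and an optional integer `substep`, and that treatment order is obtained by sorting on that pair. The class `TimedAttribute`, the function `in_treatment_order` and the sample plan are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimedAttribute:
    name: str
    value: str
    step: int                      # position of the operation in the treatment plan
    substep: Optional[int] = None  # finer ordering inside a step, if any


def in_treatment_order(records: List[TimedAttribute]) -> List[TimedAttribute]:
    """Sort by step, then substep; a record without a substep comes first in its step."""
    return sorted(records, key=lambda r: (r.step, -1 if r.substep is None else r.substep))


plan = [
    TimedAttribute("restoration", "crown", step=2),
    TimedAttribute("endodontics", "root canal therapy", step=1, substep=2),
    TimedAttribute("extraction", "remove old crown", step=1, substep=1),
]
print([r.name for r in in_treatment_order(plan)])  # ['extraction', 'endodontics', 'restoration']
```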
d) Based on medical considerations, the first medical knowledge base in this embodiment is divided into eight parts:

- chief complaint;
- follow-up visit;
- present medical history;
- past medical history;
- examination;
- diagnosis;
- treatment plan;
- treatment.

Within each part, the attributes of the specific medical field being described are designed and organized hierarchically. For example, in the field of dental restoration, the examination part includes examination results for both the teeth and the oral cavity. The oral examination results are further divided into two sub-parts according to whether they relate to a tooth position. Each sub-part contains multiple attributes that describe in detail the disease information found in the various examinations.
The first medical knowledge base can thus represent the original medical record text in structured form reasonably well.
Optionally, the special symbols comprise at least one of the following: Chinese and English commas, periods, line feeds and tab characters.
Optionally, before the first medical knowledge base is loaded, the method further comprises:

- loading a second medical knowledge base;
- extracting keywords and their scores according to the second medical knowledge base and a second electronic medical record;
- constructing the first medical knowledge base from the second medical knowledge base and the extracted keywords and scores.

In each embodiment, the structure of the first medical knowledge base must follow a corresponding specification. In this embodiment, that specification is provided by the second medical knowledge base, which acts as a template for the first. Like the first medical knowledge base, the second medical knowledge base includes a plurality of parts, each comprising one or more attributes. Each attribute comprises at least an attribute name, an attribute value and a location. Unlike the first medical knowledge base, however, the second medical knowledge base has no keywords for its attributes and no keyword scores; these are extracted from the second electronic medical record. The first medical knowledge base is constructed by adding one or more keywords and their scores to each attribute of the second medical knowledge base.
Optionally, extracting the keywords and their scores according to the second medical knowledge base and the second electronic medical record comprises two steps. First, the text sentences in the second electronic medical record are segmented into words according to the attribute names and attribute values, giving a plurality of keywords; the synonyms and near-synonyms of these keywords are also taken as keywords. Second, each keyword is given a score according to three factors:

- its importance (whether it is a common word);
- its polarity (whether it is a negation word);
- its weight in a logical relationship (AND, OR, NOT).

Optionally, matching each text sentence against the attributes in the first medical knowledge base using the matching scoring algorithm comprises three steps:

1. The keywords of all attributes, with their scores, are matched against each text sentence to obtain the sentence's total score for every attribute.
2. For each attribute whose total score exceeds a preset threshold, the keywords and scores of that attribute's values and locations are matched against the text sentence to obtain an attribute value score and a location score.
3. The attribute value and location with the highest scores, together with the corresponding attribute, are taken as the matching result of the text sentence.

In this way, the matching scoring algorithm matches text sentences to attributes.
Optionally, the matching result comprises the text sentence together with its corresponding attribute, attribute value, location and part, and its position in the first electronic medical record. When the results are stored, the result of each text sentence can be saved as one row of data. The results of all text sentences are arranged in order of time sequence and part, and saved in CSV format for later querying and processing.
Optionally, the method further comprises extracting and saving the text sentences that are not correctly matched by any attribute. These include text sentences that match an attribute but no attribute value. Saving them shows how well the text sentences are matched. For each part, the unmatched sentences can be saved as follows:

- a text sentence that matches no attribute: the text sentence, its start position, its end position, the medical record folder number and the medical record number;
- a text sentence that matches an attribute but no attribute value: the text sentence, the matched attribute, its start position, its end position, the medical record folder number and the medical record number.

The preferred storage format is .xls.
After the text sentences that are not correctly matched by any attribute are extracted, they can be segmented into words, sorted and screened manually. This reveals missing keywords or flawed attribute classifications in the second medical knowledge base. The second medical knowledge base is then optimized iteratively by adding or deleting keywords, adjusting their scores and similar operations. This further improves the matching rate and accuracy of the second medical knowledge base on the text sentences of electronic medical records.
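A minimal Python sketch of sorting out the two kinds of unmatched sentences is shown below. It assumes each sentence's result is held in a record whose `attribute` or `value` is `None` when matching failed. The class `SentenceResult`, the function `collect_unmatched` and the sample data are hypothetical. Writing the rows to an `.xls` file would need a third-party spreadsheet library, so that step is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SentenceResult:
    text: str
    start: int                       # start offset of the sentence in the record
    end: int                         # end offset of the sentence in the record
    folder_no: str                   # medical record folder number
    record_no: str                   # medical record number
    attribute: Optional[str] = None  # None: no attribute matched
    value: Optional[str] = None      # None: no attribute value matched


def collect_unmatched(results: List[SentenceResult]) -> Tuple[list, list]:
    """Split unmatched sentences into 'no attribute' rows and 'attribute but no value' rows."""
    no_attribute, no_value = [], []
    for r in results:
        if r.attribute is None:
            no_attribute.append((r.text, r.start, r.end, r.folder_no, r.record_no))
        elif r.value is None:
            no_value.append((r.text, r.attribute, r.start, r.end, r.folder_no, r.record_no))
    return no_attribute, no_value


results = [
    SentenceResult("36 loose", 0, 8, "F01", "R001", "tooth mobility", "yes"),
    SentenceResult("see attached film", 9, 26, "F01", "R001"),
    SentenceResult("gingiva odd", 27, 38, "F01", "R001", "gingival condition"),
]
missing_attr, missing_value = collect_unmatched(results)
```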
From the description of the above embodiments, those skilled in the art will clearly understand that the method can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although the former is preferable in many cases. On this understanding, the technical solution of the present invention may be embodied as a software product. The software product is stored in a storage medium (e.g., ROM/RAM, magnetic disk or optical disc) and includes instructions that cause a terminal device (e.g., a mobile phone, computer, server or network device) to execute the methods of the embodiments of the present invention.
The electronic medical record structuring method described with reference to FIG. 1 can be implemented by an electronic medical record structuring device. FIG. 2 is a schematic diagram of the hardware structure of the electronic medical record structuring device according to an embodiment of the present invention.
The electronic medical record structuring device may comprise a processor 21 and a memory 22 storing computer program instructions.
Specifically, the processor 21 may include a Central Processing Unit (CPU), or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more Integrated circuits implementing the embodiments of the present invention.
Memory 22 may include mass storage for data or instructions. By way of example, and not limitation, memory 22 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disc, a magneto-optical disc, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 22 may include removable or non-removable (fixed) media, where appropriate. Memory 22 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, memory 22 is a non-volatile solid-state memory. In a particular embodiment, memory 22 includes read-only memory (ROM). Where appropriate, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), flash memory, or a combination of two or more of these.
The processor 21 reads and executes the computer program instructions stored in the memory 22 to implement any one of the electronic medical record structuring methods in the above embodiments.
In one example, the electronic medical record structuring device may further include a communication interface 23 and a bus 20. As shown in FIG. 2, the processor 21, the memory 22 and the communication interface 23 are connected via the bus 20 and communicate with one another.
The communication interface 23 is mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiments of the present invention.
Bus 20 includes hardware, software, or both, coupling the components of the electronic medical record structuring device to one another. By way of example, and not limitation, the bus may include any of the following, or a combination of two or more of these:

- an Accelerated Graphics Port (AGP) or other graphics bus;
- an Enhanced Industry Standard Architecture (EISA) bus;
- a Front Side Bus (FSB);
- a HyperTransport (HT) interconnect;
- an Industry Standard Architecture (ISA) bus;
- an InfiniBand interconnect;
- a Low Pin Count (LPC) bus;
- a memory bus;
- a Micro Channel Architecture (MCA) bus;
- a Peripheral Component Interconnect (PCI) bus;
- a PCI Express (PCIe) bus;
- a Serial Advanced Technology Attachment (SATA) bus;
- a Video Electronics Standards Association Local Bus (VLB);
- another suitable bus.

Bus 20 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the invention, any suitable bus or interconnect is contemplated by the invention.
The electronic medical record structuring device can execute the electronic medical record structuring method in the embodiment of the invention based on the acquired data, thereby realizing the electronic medical record structuring method described in conjunction with fig. 1.
In addition, in combination with the electronic medical record structuring method of the foregoing embodiments, an embodiment of the present invention may provide a computer-readable storage medium for its implementation. The computer-readable storage medium stores computer program instructions. When executed by a processor, these instructions implement any of the electronic medical record structuring methods in the above embodiments.
To make the description of the embodiments clearer, a preferred embodiment is described below.
This preferred embodiment provides an electronic medical record structuring method. FIG. 3 is a flowchart of the method according to the preferred embodiment of the present invention. As shown in FIG. 3, the flow comprises the following steps:
Step 1: Construct the first medical knowledge base.
In this preferred embodiment, constructing the first medical knowledge base from the second medical knowledge base comprises the following steps:
1. Table 1 defines the format of the second medical knowledge base, and Table 2 describes the "requirements" in Table 1 in detail. FIG. 4 shows an example of the structure of the first medical knowledge base in the field of dental restoration.
TABLE 1 Second medical knowledge base format
TABLE 2 Detailed description of the "requirements" in Table 1
| Requirement | Description |
|---|---|
| Single selection | Default "unknown"; if the attribute name score ≥ 1, select the option with the highest score |
| Single selection | Select the option with the highest score |
| Multiple selection | Select all options with a score ≥ 1 |
| Judgment | Default "none"; if the attribute name score ≥ 1 and no negative word appears, select "yes" |
| Number | Select the option (unit) with the highest score and find the number preceding the unit in the sentence |
| Time | Select the option (unit) with the highest score and find the time expression preceding the unit in the sentence |
| Single selection/number | If the attribute name score ≥ 1, select the option with the highest score, or find the number |
2. Manually segment all the phrases in FIG. 5 into words, and screen sampled medical records to find frequently occurring keywords. These include synonyms, near-synonyms, abbreviations, typos and similar variants.
3. Add each keyword to the second medical knowledge base after the object it matches; this forms the first medical knowledge base. Assign each keyword a score according to its importance and part of speech. For example, professional terms receive a positive score, negation words a negative score, and common words a score of 0. AND, OR and NOT relationships are also implemented through scores. A score of 1 or more counts as a successful match, so two keywords that must appear together can each be given a score of 0.5. The resulting format is shown in Table 3.
TABLE 3 First medical knowledge base format
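The score-based logic can be checked with a few lines of Python. This is a sketch under two assumptions: a keyword counts as present if it occurs as a plain substring of the sentence, and a match needs a total score of at least 1. The names `keyword_score` and `is_match` and the English sample keywords are hypothetical.

```python
MATCH_THRESHOLD = 1.0  # "a score of 1 or more is specified as a successful match"


def keyword_score(sentence: str, keywords: dict) -> float:
    """Sum the scores of all keywords that occur in the sentence (plain substring test)."""
    return sum(score for word, score in keywords.items() if word in sentence)


def is_match(sentence: str, keywords: dict) -> bool:
    return keyword_score(sentence, keywords) >= MATCH_THRESHOLD


# AND: each word alone scores 0.5, so only their co-occurrence reaches 1.
and_keywords = {"crown": 0.5, "loose": 0.5}
# OR: each synonym scores 1 on its own.
or_keywords = {"caries": 1.0, "decay": 1.0}
# NOT: a negation word with a negative score cancels the positive term.
not_keywords = {"pain": 1.0, "denies": -1.0}

print(is_match("crown loose", and_keywords), is_match("crown intact", and_keywords))  # True False
print(is_match("tooth decay", or_keywords), is_match("denies pain", not_keywords))    # True False
```

Common words with a score of 0 can sit in the same dictionaries without changing any total, which is why a single additive score is enough to express all three relationships here.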
Step 2: Split the first electronic medical record into sentences.
In most cases, a short clause (delimited by commas) in an electronic medical record corresponds to one "attribute-attribute value" pair. The electronic medical record is therefore split at punctuation marks.
1. Split the entire medical record text into sentences at Chinese and English commas, periods, line feeds and tab characters.
2. Handle special cases of splitting (e.g., decimal points and serial numbers).
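The splitting step could be sketched as follows. This is an illustrative Python version, not the patent's implementation, and the regular expressions are assumptions. Commas (Chinese and English), the Chinese full stop, line feeds and tabs always split. An English period splits unless it is a decimal point (digits on both sides) or the dot of a serial number such as "1." at the start of a clause.

```python
import re
from typing import List

# Chinese/English commas, Chinese full stop, line feed and tab always split.
PRIMARY = re.compile(r"[,，。\n\t]")
# An English period splits only if it is not a decimal point (digit on both sides).
SECONDARY = re.compile(r"(?<!\d)\.|\.(?!\d)")
# A leading serial number such as "1." is kept attached to its sentence.
LEADING_SERIAL = re.compile(r"^\s*\d+\.")


def split_sentences(text: str) -> List[str]:
    sentences = []
    for chunk in PRIMARY.split(text):
        m = LEADING_SERIAL.match(chunk)
        prefix, rest = (m.group(), chunk[m.end():]) if m else ("", chunk)
        parts = SECONDARY.split(rest)
        parts[0] = prefix + parts[0]  # re-attach the serial number (or decimal head)
        sentences.extend(p.strip() for p in parts if p.strip())
    return sentences


record = "Tooth 36 loose，probing depth 3.5 mm。\n1. Remove old crown\n2. Root canal therapy"
print(split_sentences(record))
# ['Tooth 36 loose', 'probing depth 3.5 mm', '1. Remove old crown', '2. Root canal therapy']
```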
Step 3: Define the structured format.
1. The basic target format of the structuring contains the following fields: text sentence, attribute, attribute value, location, part, and the corresponding position of the text in the electronic medical record. For attributes that require a time series, the target format contains: text sentence, attribute, attribute value, location, step, substep, part, and the corresponding position of the text in the electronic medical record.
2. The result for each text sentence is stored as one row. The rows for the whole medical record file are arranged by sentence and saved in CSV format.
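One way to write such rows is with Python's standard `csv` module, as sketched below. The column names in `FIELDS` are hypothetical English labels for the target format above. Ordering by the sentence's start offset is an assumption standing in for "arranged by sentence".

```python
import csv
import io
from typing import Dict, Iterable, TextIO

# Column order follows the target format of Step 3 (the column names are illustrative).
FIELDS = ["sentence", "attribute", "value", "location", "step", "substep", "part", "start", "end"]


def write_results(rows: Iterable[Dict], stream: TextIO) -> None:
    """Write one matching result per line, ordered by position in the record."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, restval="")  # missing fields -> ""
    writer.writeheader()
    for row in sorted(rows, key=lambda r: r["start"]):
        writer.writerow(row)


buf = io.StringIO()
write_results([
    {"sentence": "probing depth 4 mm", "attribute": "pocket depth", "value": 4.0,
     "part": "examination", "start": 9, "end": 27},
    {"sentence": "36 loose", "attribute": "tooth mobility", "value": "yes",
     "location": "36", "part": "examination", "start": 0, "end": 8},
], buf)
print(buf.getvalue())
```

When writing to a real file instead of `io.StringIO`, open it with `newline=""`, as the `csv` documentation recommends.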
Step 4: Match the text sentences against the first medical knowledge base.
1. For each text sentence, traverse all attributes. For each attribute, initialize to 0 the matching score of the attribute name and the matching score of every option of the attribute value.
2. Match the attribute name, attribute value and location of the attribute. The matching proceeds as follows:
a) Attribute name matching
Match the keyword group of the attribute name against the text sentence, and add up the scores of the matched keywords. Positive keywords add to the score and negative keywords subtract from it. The result is the total keyword score of the attribute name. If the total exceeds a given threshold, the text sentence is considered to match the attribute name, and attribute value matching is then performed.
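Attribute name matching over all attributes could be sketched as follows. The sketch assumes a threshold of 1, compared with `>=` to agree with the "score of 1 or more" rule in Step 1. It also assumes keywords are matched as plain substrings. The names `name_score` and `candidate_attributes` and the sample attributes are hypothetical. Because every qualifying attribute is returned, this also covers the case of one sentence matching several attributes.

```python
from typing import Dict, List, Tuple

NAME_THRESHOLD = 1.0  # assumed value; the text only says "a certain threshold"


def name_score(sentence: str, keywords: Dict[str, float]) -> float:
    """Accumulate keyword scores: positive keywords add, negative keywords subtract."""
    return sum(score for word, score in keywords.items() if word in sentence)


def candidate_attributes(sentence: str,
                         attributes: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return every attribute whose name score reaches the threshold, with its score."""
    hits = []
    for name, keywords in attributes.items():
        score = name_score(sentence, keywords)  # each attribute starts from 0
        if score >= NAME_THRESHOLD:
            hits.append((name, score))
    return hits


attributes = {
    "tooth mobility": {"mobility": 1.0, "loose": 1.0},
    "pain": {"pain": 1.0, "denies": -1.0},
}
print(candidate_attributes("36 loose", attributes))     # [('tooth mobility', 1.0)]
print(candidate_attributes("denies pain", attributes))  # []
```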
b) Option-type attribute value matching
For each option of the attribute value, match its keyword group against the text sentence and add up the scores of the matched keywords to obtain the option's total score. The attribute value is then chosen according to the attribute type:

- single-choice attribute: the option with the highest cumulative score is taken as the attribute value;
- multiple-choice attribute: all options whose cumulative scores exceed a given threshold are taken as the attribute value;
- judgment-type attribute: the attribute value is considered matched if the option's cumulative score exceeds a given threshold.
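The three option-type rules can be sketched in Python as follows. The threshold of 1 and the `>=` comparison are assumptions. Returning `None` for a single-choice attribute when no option scores above 0 is also an assumption, not stated in the text. The names `option_scores` and `match_value`, the `kind` strings and the sample options are hypothetical.

```python
from typing import Dict, List, Union

VALUE_THRESHOLD = 1.0  # assumed value for "a certain threshold"


def option_scores(sentence: str, options: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Cumulative keyword score of every attribute value option."""
    return {opt: sum(s for w, s in kws.items() if w in sentence) for opt, kws in options.items()}


def match_value(sentence: str, options: Dict[str, Dict[str, float]],
                kind: str) -> Union[str, List[str], bool, None]:
    scores = option_scores(sentence, options)
    if kind == "single":
        # highest cumulative score wins; None when nothing scored is our assumption
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else None
    if kind == "multiple":
        return [opt for opt, s in scores.items() if s >= VALUE_THRESHOLD]
    if kind == "judgment":
        return any(s >= VALUE_THRESHOLD for s in scores.values())
    raise ValueError(f"unknown attribute kind: {kind}")


severity = {"mild": {"slight": 1.0, "mild": 1.0}, "severe": {"marked": 1.0, "severe": 1.0}}
prostheses = {"crown": {"crown": 1.0}, "bridge": {"bridge": 1.0}, "implant": {"implant": 1.0}}
print(match_value("slight swelling", severity, "single"))                       # mild
print(match_value("crown or bridge", prostheses, "multiple"))                   # ['crown', 'bridge']
print(match_value("swelling present", {"yes": {"swelling": 1.0}}, "judgment"))  # True
```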
c) Numerical attribute value matching
Examine each character of the text sentence in turn to find the continuous character strings that express numerical values. Convert such a string to a numeric type and use it as the attribute value.
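The character-by-character scan could be sketched as shown below. It assumes that numbers are written with ASCII digits and at most one decimal point. The helper `number_before_unit` follows the "Number" row of Table 2 (the number preceding the unit). Both function names are hypothetical.

```python
from typing import List, Optional


def extract_numbers(sentence: str) -> List[float]:
    """Scan characters one by one and convert each run of ASCII digits
    (with at most one decimal point) into a float."""
    numbers, current = [], ""
    for ch in sentence + " ":  # trailing space flushes the last run
        if "0" <= ch <= "9" or (ch == "." and current and "." not in current):
            current += ch
        elif current:
            numbers.append(float(current.rstrip(".")))
            current = ""
    return numbers


def number_before_unit(sentence: str, unit: str) -> Optional[float]:
    """Hypothetical helper: the last number that appears before the unit word."""
    idx = sentence.find(unit)
    if idx == -1:
        return None
    numbers = extract_numbers(sentence[:idx])
    return numbers[-1] if numbers else None


print(extract_numbers("probing depth 3.5 mm"))          # [3.5]
print(extract_numbers("BP 120/80"))                     # [120.0, 80.0]
print(number_before_unit("probing depth 4 mm", "mm"))   # 4.0
```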
d) Location matching
If the attribute relates to a tooth position, a regular expression matches the tooth position in the text sentence (identified by three consecutive '/' characters), and the result is used as the attribute's location. If the location has multiple options, each option is matched in the same way as option-type attribute values. The option whose cumulative score meets the requirement for the location value is then selected as the location.
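The tooth-position regular expression is not given, so the Python sketch below rests on an explicitly hypothetical notation. It assumes a token made of four quadrant fields separated by three '/' characters, each field holding tooth numbers 1 to 8 or left blank (for example "6/ / /" or "76//45/"). A token must contain at least one tooth number to count. `TOOTH_POSITION` and `match_tooth_position` are illustrative names.

```python
import re
from typing import Optional

# Hypothetical notation: four quadrant fields separated by three "/" characters,
# e.g. "6/ / /" or "76//45/"; fields hold tooth numbers 1-8 and may be blank.
TOOTH_POSITION = re.compile(r"[1-8 ]*/[1-8 ]*/[1-8 ]*/[1-8 ]*")


def match_tooth_position(sentence: str) -> Optional[str]:
    """Return the first tooth-position token containing at least one tooth number."""
    for m in TOOTH_POSITION.finditer(sentence):
        token = m.group().strip()
        if any(ch.isdigit() for ch in token):
            return token
    return None


print(match_tooth_position("crown on 6/ / / planned"))  # 6/ / /
print(match_tooth_position("date 2018/12/05"))          # None (only two '/')
```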
3. Using the score-matching criteria for the different requirements listed in Table 3, determine whether the "attribute-attribute value" pair meets the requirements. If it does, store the text sentence and the corresponding "attribute-attribute value" pair in the format described in step 3; if not, proceed to match the next attribute.
4. When several attributes are matched successfully, save each matched result.
5. Extract time-sequence information from the text.
Because the different operations in the treatment plan section are ordered, this order needs to be represented in the structured result. For each text sentence in the treatment plan section, the step sequence number at the beginning of the sentence is taken as the operation order of the attributes matched in that sentence.
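Reading the step sequence number at the start of a treatment plan sentence could be sketched as follows, assuming numbers are written in forms such as "1.", "2、" or "（3）". The pattern and the `step_order` helper are hypothetical; the negative lookahead keeps a leading decimal such as "3.5mm" from being read as step 3.

```python
import re
from typing import Optional

# Assumed forms: "1." "1．" "2、" "(3)" "（3）", optionally preceded by spaces.
# The lookahead (?!\d) stops a leading decimal such as "3.5mm" matching as step 3.
STEP_NUMBER = re.compile(r"^\s*[(（]?(\d+)\s*[)）.．、](?!\d)")


def step_order(sentence: str) -> Optional[int]:
    """Return the step number at the start of a treatment-plan sentence, if any."""
    match = STEP_NUMBER.match(sentence)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(step_order("2、拔除右下8"))    # 2
    print(step_order("（3）全冠修复"))   # 3
    print(step_order("建议定期复查"))    # None
```

Every attribute matched in a sentence would then carry `step_order(sentence)` as its operation order.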
Because each step may also offer several alternative treatment options, these alternatives also need to be reflected in the structured result. For each text sentence, check whether it contains a word expressing an "or" relation; if it does, split the sentence at that word and perform attribute matching on each part separately.
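Splitting a sentence on words that express an "or" relation can be sketched with `re.split`. The word list (或者, 或是, 或) is an assumption, ordered longest first so that 或者 is not cut after 或. A plain substring split like this would also cut unrelated words containing 或 (such as 或许, "perhaps"), which a segmentation-based implementation would avoid.

```python
import re
from typing import List, Sequence

# Hypothetical "or" words; longer words first so the regex alternation
# prefers "或者" over "或" at the same position.
OR_WORDS = ("或者", "或是", "或")


def split_alternatives(sentence: str, or_words: Sequence[str] = OR_WORDS) -> List[str]:
    """Split a sentence into its alternatives; each part is then matched separately."""
    pattern = "|".join(re.escape(word) for word in or_words)
    parts = [part.strip() for part in re.split(pattern, sentence)]
    return [part for part in parts if part]


if __name__ == "__main__":
    print(split_alternatives("右下6全冠修复或者种植修复"))  # ['右下6全冠修复', '种植修复']
```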
6. The matching-scoring algorithm allows the information in each text sentence to be extracted thoroughly. In most cases one text sentence corresponds to one attribute; when a text sentence corresponds to several attributes, each of them can still be matched by the same algorithm logic. Because the medical knowledge base of the invention contains value descriptions of several types, such as Boolean, real-valued and categorical, and because words with affirmative and negative semantics are included in the keyword groups, the matching algorithm can not only correctly identify whether disease information is affirmed or negated, but can also extract specific numerical information about it (describing disease severity, measured values and the like), which other current structuring methods cannot achieve.
Using keyword groups to match text sentences allows many types of information in a sentence to be recognized, including affirmative and negative semantics, the different options of an attribute value, and numerical values, which greatly broadens the applicability of the method.
Step 5: Store the text sentences that were not completely matched during the matching process.
1. Each incompletely matched sentence is stored with the following fields: text sentence, matched attributes, text start position, text end position, medical record folder number, and medical record file number.
2. Sentences that were not successfully matched, collected from all medical record files, are arranged one sentence per row and saved in xls format.
3. The matching status of every text sentence in each medical record file is checked. If a text sentence does not satisfy the condition for a successful match, it is saved in the xls table of the corresponding section.
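The storage of incompletely matched sentences can be sketched as below. The embodiment saves xls tables; writing genuine .xls files would need a third-party package, so this sketch substitutes CSV files written with Python's standard `csv` module, one file per record section. The `UnmatchedSentence` class, the file naming scheme and the column labels are hypothetical.

```python
import csv
import os
from dataclasses import astuple, dataclass
from typing import Dict, Iterable, List


@dataclass
class UnmatchedSentence:
    """One incompletely matched sentence, with the fields listed in item 1."""
    sentence: str
    matched_attributes: str  # e.g. a ';'-joined list of attribute names (assumed)
    start: int               # start offset of the sentence in the record
    end: int                 # end offset of the sentence in the record
    folder_no: str           # medical record folder number
    file_no: str             # medical record file number
    section: str             # record section, used to choose the output table


HEADER = ["text sentence", "matched attributes", "start", "end",
          "folder no.", "file no."]


def save_unmatched(records: Iterable[UnmatchedSentence], out_dir: str) -> List[str]:
    """Write one table per section, one sentence per row; return the file paths."""
    by_section: Dict[str, List[UnmatchedSentence]] = {}
    for record in records:
        by_section.setdefault(record.section, []).append(record)

    paths = []
    for section, rows in by_section.items():
        path = os.path.join(out_dir, f"unmatched_{section}.csv")
        # utf-8-sig writes a BOM, which helps spreadsheet programs detect UTF-8.
        with open(path, "w", newline="", encoding="utf-8-sig") as f:
            writer = csv.writer(f)
            writer.writerow(HEADER)
            for row in rows:
                writer.writerow(astuple(row)[:6])  # drop the section field
        paths.append(path)
    return paths
```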
Structured result analysis
In this embodiment, a toolset for structuring medical record text was developed in Python based on the electronic medical record structuring method described above, and was used to structure more than three thousand electronic medical record texts. The results are presented and statistically analyzed below.
The medical record texts processed in this embodiment are records related to dentition defects from a department of prosthodontics. The medical knowledge base was compiled from knowledge in the field of prosthodontics, and part of it is shown in Fig. 4. An example medical record text is shown in Fig. 5, and its structured result in Fig. 6.
The structured results show that the method achieves the following beneficial effects:
1. Position information in the medical record text, such as the upper and lower jaws and tooth positions, can be identified accurately.
2. Attributes and their corresponding values in the medical record text can be labeled effectively, and attribute values of different types can be recognized.
3. The order of the different treatment measures in the text can be extracted effectively.
Compared with existing medical record structuring methods, the method of this embodiment builds a more comprehensive first medical knowledge base, which fits the medical record text more closely and allows its information to be extracted more completely. Existing methods, such as those based on affirmative/negative semantics, can often only judge from the text whether a medical term described in the knowledge base is affirmed or negated, and cannot give fuller information about the attribute (such as the site or degree of onset).
The first medical knowledge base used in this example contains 12 sections with 389 attributes in total. Attribute values are of types such as multiple-choice, single-choice, judgment and numerical, and attribute position values are of types such as single-choice and tooth position. Fig. 7 shows frequency statistics for some attributes in the structured results of the example; differences in the values of these attributes are not reflected in these statistics. As Fig. 7 shows, the frequency of occurrence of different attributes varies considerably across the three thousand medical records. Besides reflecting the common conditions in these records, such statistics provide a statistical means of understanding the diseases.
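One possible in-memory representation of the first medical knowledge base, following the fields named in claim 1 (attribute name, attribute value, position, value type/mode, and keyword scores) and the value and position types listed above. The class and enum names are hypothetical, and `PositionMode.NONE` is an assumed marker for attributes that carry no position.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple

Keywords = List[Tuple[str, float]]  # (keyword, score) pairs


class ValueMode(Enum):
    JUDGMENT = "judgment"
    SINGLE_CHOICE = "single choice"
    MULTIPLE_CHOICE = "multiple choice"
    NUMERIC = "numeric"


class PositionMode(Enum):
    NONE = "none"  # assumed: the attribute has no position
    SINGLE_CHOICE = "single choice"
    TOOTH_POSITION = "tooth position"


@dataclass
class Attribute:
    name: str
    value_mode: ValueMode
    position_mode: PositionMode = PositionMode.NONE
    name_keywords: Keywords = field(default_factory=list)
    value_options: Dict[str, Keywords] = field(default_factory=dict)
    position_options: Dict[str, Keywords] = field(default_factory=dict)


@dataclass
class KnowledgeBase:
    sections: Dict[str, List[Attribute]] = field(default_factory=dict)

    def attribute_count(self) -> int:
        """Total number of attributes across all sections."""
        return sum(len(attrs) for attrs in self.sections.values())


if __name__ == "__main__":
    kb = KnowledgeBase({
        "检查": [Attribute("牙齿松动", ValueMode.SINGLE_CHOICE, PositionMode.TOOTH_POSITION)],
        "治疗计划": [Attribute("种植修复", ValueMode.JUDGMENT),
                     Attribute("缺隙宽度", ValueMode.NUMERIC)],
    })
    print(kb.attribute_count())  # 3
```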
The electronic medical record structuring method provided by this embodiment can complete the structuring task required by the first medical knowledge base. Its effectiveness can be measured by randomly sampling a number of medical records, manually annotating them against the first medical knowledge base, and using the manual annotations as the reference standard for evaluating the structured results produced by the method.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (8)
1. An electronic medical record structuring method, characterized by comprising the following steps:
loading a first medical knowledge base, wherein the first medical knowledge base comprises a plurality of parts, each part comprising one or more attributes and one or more keywords corresponding to the attributes; each attribute includes at least an attribute name, an attribute value, a position, and a description of the value type/mode; and each keyword further includes a score for the keyword;
dividing a first electronic medical record into sentences according to special symbols to obtain a plurality of text sentences;
matching, using a matching-scoring algorithm, each of the plurality of text sentences against the attributes in the first medical knowledge base, comprising:
matching each text sentence against the attribute-name keywords of each attribute and their scores, to obtain a total attribute-name score of the text sentence for each attribute;
for each text sentence whose total attribute-name score for an attribute is higher than a preset threshold, matching the text sentence against the keywords, and their scores, of the attribute values and positions of that attribute, to obtain the attribute value scores and position scores of the text sentence for that attribute;
taking the attribute, together with the attribute value having the highest attribute value score and the position having the highest position score, as the matching result of the text sentence;
and storing the matching result.
2. The method of claim 1, wherein the type of the attribute value comprises at least one of: real number, Boolean, and discrete categorical; and the attribute value is selected in at least one of the following modes: judgment, single choice, numeric, and multiple choice.
3. The method of claim 1, wherein the special symbols comprise at least one of: Chinese and English commas, periods, line feeds, and tabs.
4. The method of claim 1, wherein prior to loading the first medical knowledge base, the method further comprises:
loading a second medical knowledge base, wherein the second medical knowledge base comprises a plurality of parts, each part comprising one or more attributes; each attribute includes at least an attribute name, an attribute value, and a position;
extracting keywords and their scores according to the second medical knowledge base and a second electronic medical record;
and constructing the first medical knowledge base according to the second medical knowledge base, the extracted keywords, and the scores of the keywords.
5. The method of claim 4, wherein extracting the keywords and their scores according to the second medical knowledge base and the second electronic medical record comprises:
performing word segmentation on the second electronic medical record according to the attribute names and attribute values to obtain a plurality of keywords, and also taking near-synonyms and synonyms of these keywords as keywords;
and assigning different scores to the keywords according to their importance, negation, and the weight of logical relationships.
6. The method of claim 1, wherein the matching result comprises: the text sentence; the attribute, attribute value, and position corresponding to the text sentence; the part to which it belongs; and the location of the text sentence in the first electronic medical record.
7. The method according to claim 4 or 5, further comprising:
extracting and storing text sentences that are not correctly matched by any attribute;
performing word segmentation, sorting, and manual screening on the extracted text sentences, and comparing them to find deficiencies in the keywords or attribute classification of the second medical knowledge base;
and adding or deleting keywords or adjusting their scores, so as to iteratively optimize the second medical knowledge base.
8. A computer-readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1-7.
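The sentence-splitting step of claims 1 and 3 could be sketched as follows. The delimiter set is an assumed subset of the symbols in claim 3: the English period is omitted because it also occurs inside decimals such as "4.5mm", which would otherwise be cut in two. Character offsets are kept so that a matching result can record where the sentence sits in the record, as claim 6 requires. `split_sentences` is a hypothetical name.

```python
import re
from typing import List, Tuple

# Assumed delimiters: Chinese and English commas, the Chinese full stop,
# line feeds and tabs (the English period is deliberately left out).
DELIMITERS = re.compile(r"[，,。\n\t]")


def split_sentences(record: str) -> List[Tuple[str, int, int]]:
    """Split a record into (sentence, start, end) triples, end exclusive."""
    sentences = []
    start = 0
    for match in DELIMITERS.finditer(record):
        if record[start:match.start()].strip():  # skip empty or blank pieces
            sentences.append((record[start:match.start()], start, match.start()))
        start = match.end()
    if record[start:].strip():
        sentences.append((record[start:], start, len(record)))
    return sentences


if __name__ == "__main__":
    for sentence in split_sentences("右下6缺失，探诊深度4.5mm。\n建议种植修复"):
        print(sentence)
    # ('右下6缺失', 0, 5)
    # ('探诊深度4.5mm', 6, 15)
    # ('建议种植修复', 17, 23)
```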
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811513668.0A CN109637605B (en) | 2018-12-11 | 2018-12-11 | Electronic medical record structuring method and computer-readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109637605A | 2019-04-16 |
CN109637605B | 2022-05-10 |
Family
ID=66072953
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811513668.0A Active CN109637605B (en) | 2018-12-11 | 2018-12-11 | Electronic medical record structuring method and computer-readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109637605B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110277149A (en) * | 2019-06-28 | 2019-09-24 | 北京百度网讯科技有限公司 | Processing method, device and the equipment of electronic health record |
CN110704632A (en) * | 2019-08-26 | 2020-01-17 | 南京医渡云医学技术有限公司 | Method and device for processing clinical data, readable medium and electronic equipment |
TWI750513B (en) * | 2019-10-05 | 2021-12-21 | 業務人資訊有限公司 | Insurance claim and underwriting assistance system and implementation method thereof |
CN111192646A (en) * | 2019-12-30 | 2020-05-22 | 北京爱医生智慧医疗科技有限公司 | Method and device for extracting physical sign information in electronic medical record |
CN112101034B (en) * | 2020-09-09 | 2024-02-27 | 沈阳东软智能医疗科技研究院有限公司 | Method and device for judging attribute of medical entity and related product |
CN112883712B (en) * | 2021-02-05 | 2023-05-02 | 中国人民解放军南部战区总医院 | Intelligent input method and device for electronic medical record |
CN113988082A (en) * | 2021-10-28 | 2022-01-28 | 泰康保险集团股份有限公司 | Text processing method and device, electronic equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001101184A (en) * | 1999-10-01 | 2001-04-13 | Nippon Telegr & Teleph Corp <Ntt> | Method and device for generating structurized document and storage medium with structurized document generation program stored therein |
CN102298588A (en) * | 2010-06-25 | 2011-12-28 | 株式会社理光 | Method and device for extracting object from non-structured document |
CN107578798A (en) * | 2017-10-26 | 2018-01-12 | 北京康夫子科技有限公司 | The processing method and system of electronic health record |
CN108009157A (en) * | 2017-12-27 | 2018-05-08 | 北京嘉和美康信息技术有限公司 | A kind of sentence classifying method and device |
CN108711443A (en) * | 2018-05-07 | 2018-10-26 | 成都智信电子技术有限公司 | The text data analysis method and device of electronic health record |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1614587A (en) * | 2003-11-07 | 2005-05-11 | 杨立伟 | Method for digesting Chinese document automatically |
CN103020453B (en) * | 2012-12-15 | 2015-12-02 | 中国科学院深圳先进技术研究院 | Based on the structured electronic patient record generation method of ontology |
CN106095913A (en) * | 2016-06-08 | 2016-11-09 | 广州同构医疗科技有限公司 | A kind of electronic health record text structure method |
CN106897568A (en) * | 2017-02-28 | 2017-06-27 | 北京大数医达科技有限公司 | The treating method and apparatus of case history structuring |
CN107085655B (en) * | 2017-04-07 | 2020-11-24 | 江西中医药大学 | Traditional Chinese medicine data processing method and system based on attribute constraint concept lattice |
CN107908768A (en) * | 2017-09-30 | 2018-04-13 | 北京颐圣智能科技有限公司 | Method, apparatus, computer equipment and the storage medium of electronic health record processing |
CN108182972B (en) * | 2017-12-15 | 2021-07-20 | 中电科软件信息服务有限公司 | Intelligent coding method and system for Chinese disease diagnosis based on word segmentation network |
Non-Patent Citations (1)
Title |
---|
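Research on an ontology-based clinical medical case knowledge base; Zhou Jun (周钧); China Master's Theses Full-text Database, Information Science and Technology Series; 2013-03-15; pp. 19-32 * |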
Also Published As
Publication number | Publication date |
---|---|
CN109637605A (en) | 2019-04-16 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |