CN109637605B - Electronic medical record structuring method and computer-readable storage medium - Google Patents

Publication number
CN109637605B
Authority
CN
China
Prior art keywords
attribute
knowledge base
keywords
medical record
attributes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811513668.0A
Other languages
Chinese (zh)
Other versions
CN109637605A (en
Inventor
文再文
陈青筱
谢屿
张嘉琦
刘普凡
刘德斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Peking University School of Stomatology
Original Assignee
Peking University
Peking University School of Stomatology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University and Peking University School of Stomatology
Priority to CN201811513668.0A
Publication of CN109637605A
Application granted
Publication of CN109637605B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Primary Health Care (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Epidemiology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention provides an electronic medical record structuring method and a computer-readable storage medium. The method comprises the following steps: loading a first medical knowledge base; splitting a first electronic medical record into sentences at special symbols to obtain a plurality of text sentences; matching each of the plurality of text sentences against attributes in the first medical knowledge base using a matching scoring algorithm; and storing the matching result. The invention solves the problem in the related art that electronic medical records cannot be fully structured, and achieves full structuring of the electronic medical record.

Description

Electronic medical record structuring method and computer-readable storage medium

Technical Field

The present invention relates to the medical field and, in particular, to an electronic medical record structuring method and a computer-readable storage medium.

Background

As medical systems become electronic, networked, and intelligent, patients' medical data are stored in electronic medical records, which cover the chief complaint, medical history, examination, diagnosis, treatment plan, disposal, and other information. In the context of big data, these raw data open new possibilities for medical diagnostic decision-making, prompting efforts to mine information and extract rules from medical record data and to design intelligent systems that further improve the level and quality of medical care.

However, electronic medical record databases usually store the original text entered by doctors. Although this text is written according to specified templates, it retains some of the freedom and flexibility of natural language. Such data are therefore only semi-structured rather than fully structured, and are not suitable for deeper research tasks or intelligent medical projects. This creates the need to structure the raw text data.

Because natural language expression is diverse and medical terminology is specialized, structuring electronic medical record text is difficult, and domestic research on the problem is still insufficient. Existing domestic work on structuring mainly uses positive and negative semantics to make affirmative or negative judgments about disease information in the record. This approach handles disease information expressed in binary logic, but it cannot extract information such as numerical values or the degree of a disease. In addition, existing research offers no solution for identifying the body site at which a patient's disease information occurs. This incomplete information extraction limits medical research and the development of intelligent diagnostic decision systems.

The purpose of the present invention is to extract complete information from electronic medical records for different types of disease information and medical treatment information, so as to fully structure the text of the electronic medical record.

Summary of the Invention

The present invention provides an electronic medical record structuring method and a computer-readable storage medium, so as to at least solve the problem in the related art that electronic medical records cannot be fully structured.

In a first aspect, an embodiment of the present invention provides an electronic medical record structuring method, including: loading a first medical knowledge base; splitting a first electronic medical record into sentences at special symbols to obtain a plurality of text sentences; using a matching scoring algorithm to match each of the plurality of text sentences against attributes in the first medical knowledge base; and saving the matching result.

In a second aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer program instructions that, when executed by a processor, implement the method of the first aspect.

With the electronic medical record structuring method and computer-readable storage medium provided by the embodiments of the present invention, a first medical knowledge base is loaded; a first electronic medical record is split into sentences at special symbols to obtain a plurality of text sentences; a matching scoring algorithm matches each text sentence against attributes in the first medical knowledge base; and the matching result is saved. This solves the problem in the related art that electronic medical records cannot be fully structured, and achieves full structuring of the electronic medical record.

Description of Drawings

The accompanying drawings described here provide a further understanding of the present invention and form a part of this application. The exemplary embodiments of the invention and their descriptions are used to explain the invention and do not unduly limit it. In the drawings:

FIG. 1 is a flowchart of an electronic medical record structuring method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the hardware structure of an electronic medical record structuring device according to an embodiment of the present invention;

FIG. 3 is a flowchart of an electronic medical record structuring method according to a preferred embodiment of the present invention;

FIG. 4 is a schematic diagram of an example structure of the first medical knowledge base in the field of prosthodontics according to a preferred embodiment of the present invention;

FIG. 5 is a schematic diagram of an example electronic medical record according to a preferred embodiment of the present invention;

FIG. 6 is a schematic diagram of a structured matching result for an electronic medical record according to a preferred embodiment of the present invention;

FIG. 7 is a statistical chart of the matching frequency of attributes in the structured matching results of electronic medical records according to a preferred embodiment of the present invention.

Detailed Description

The features and exemplary embodiments of various aspects of the present invention are described in detail below. To make the objects, technical solutions, and advantages of the invention clearer, the invention is further described in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the invention, not to limit it. Those skilled in the art can practice the invention without some of these specific details. The following description of the embodiments is intended only to provide a better understanding of the invention by showing examples of it.

It should be noted that relational terms such as "first" and "second" are used here only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants of them are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element introduced by the phrase "including ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

This embodiment provides an electronic medical record structuring method. FIG. 1 is a flowchart of the method according to an embodiment of the present invention. As shown in FIG. 1, the process includes the following steps:

Step S101: load the first medical knowledge base;

Step S102: split the first electronic medical record into sentences at special symbols to obtain a plurality of text sentences;

Step S103: use the matching scoring algorithm to match each of the plurality of text sentences against the attributes in the first medical knowledge base;

Step S104: save the matching result.

Through the above steps, the matching scoring algorithm matches text sentences against the attributes in the first medical knowledge base. The matched keywords can cover not only disease information expressed in binary logic but also information such as numerical values and the degree of a disease. This solves the problem in the related art that electronic medical records cannot be fully structured, and achieves full structuring of the electronic medical record.

Optionally, the first medical knowledge base includes multiple sections. Each section includes one or more attributes and one or more keywords corresponding to each attribute. Each attribute includes at least an attribute name, an attribute value, and a position, and each keyword also carries a score. For example, the basic unit of the first medical knowledge base is an attribute consisting of three parts: attribute name, attribute value, and position. The attribute name may be a symptom of a disease, a physical characteristic, a treatment, and so on. The corresponding attribute value may be the presence or absence and severity of the symptom, the specific manifestation of the physical characteristic, the specific method of treatment, and so on. The position may be the body part to which the attribute applies. A group of attributes belongs to a section (such as examination or treatment plan), and the sections together make up the whole knowledge base.

Because medical diagnosis and treatment are themselves complex, and in order to describe medical knowledge in detail and preserve as much of the original record's information as possible during structuring, the first medical knowledge base in this embodiment can be improved in the following respects: a) extending the types of attribute values; b) adding a "position" to each attribute to describe the body part concerned; c) adding a description of time-sequence information; d) classifying attributes on the basis of medical knowledge to form a hierarchical representation of medical knowledge.

The details are as follows:

a) Attribute value types in the first medical knowledge base include real, Boolean, and discrete categorical types, and attribute values can be assigned by yes/no judgment, single choice, number, multiple choice, and combinations of these. This variety of forms can express the values of the various attributes that arise in medicine.

b) Most attributes in the first medical knowledge base concern a specific body part, such as the site where a disease finding occurs or where a medical measure is applied, so this embodiment adds a description of the corresponding body part to each attribute. Adding the "position" description in turn requires the structuring method to extract "position" information, which is described further in this embodiment.

c) Medical practice is a process rather than a simple static combination of medical measures. This is especially true of the treatment plan and disposal measures drawn up for a patient's condition, where different measures follow one another in order. To preserve these sequential dependencies, a description of time-sequence information is added to the first medical knowledge base. For example, two members, step and substep, can be added to attributes that need to express a time sequence; they describe the order in which the attribute appears in the treatment process and thereby serialize the attributes.

d) On medical grounds, the first medical knowledge base in this embodiment is divided into eight sections: chief complaint, follow-up visit, history of present illness, past history, examination, diagnosis, treatment plan, and disposal. In each section, attributes are designed and graded for the specific medical field to be described. For example, in prosthodontics, the examination section includes the examination results for the teeth and for the oral cavity; the oral cavity results are divided into two subsections according to whether they relate to a tooth position; and each of these parts includes a number of attributes that describe in detail the disease information found in the various examinations.

The first medical knowledge base described above can represent the original medical record text in a suitably structured form.
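The knowledge base layout described above (sections containing attributes, each attribute with a name, candidate values, positions, scored keywords, and optional step/substep ordering) might be modeled roughly as follows. This is only an illustrative sketch: the class names, field names, and the sample prosthodontic entries are assumptions made for the example and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Keyword:
    text: str     # surface form searched for in a sentence
    score: float  # weight contributed when the keyword is found


@dataclass
class Attribute:
    name: str        # e.g. a symptom, physical characteristic, or treatment
    value_type: str  # e.g. "judgment", "single", "multiple", "number"
    name_keywords: List[Keyword] = field(default_factory=list)
    # candidate attribute values, each with the keywords that indicate it
    values: Dict[str, List[Keyword]] = field(default_factory=dict)
    # candidate body positions, each with the keywords that indicate it
    positions: Dict[str, List[Keyword]] = field(default_factory=dict)
    # optional ordering within the treatment process (improvement c)
    step: Optional[int] = None
    substep: Optional[int] = None


@dataclass
class Section:
    name: str
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class KnowledgeBase:
    sections: List[Section] = field(default_factory=list)

    def attribute_count(self) -> int:
        return sum(len(section.attributes) for section in self.sections)


# Hypothetical entries, invented purely for illustration.
kb = KnowledgeBase(sections=[
    Section(name="examination", attributes=[
        Attribute(
            name="tooth mobility",
            value_type="single",
            name_keywords=[Keyword("松动", 1.0)],
            values={"mild": [Keyword("轻度", 1.0)],
                    "severe": [Keyword("重度", 1.0)]},
            positions={"tooth 16": [Keyword("16", 1.0)]},
        ),
    ]),
    Section(name="treatment plan", attributes=[
        Attribute(name="crown restoration", value_type="judgment",
                  name_keywords=[Keyword("全冠", 1.0)], step=1, substep=1),
    ]),
])
```

Keeping `step` and `substep` optional reflects the text's statement that only attributes needing a time sequence carry them.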

Optionally, the special symbols include at least one of the following: Chinese and English commas, periods, line breaks, and tabs.
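Sentence splitting at these symbols could be sketched as below. The function name and the decision to return character offsets are assumptions for illustration (the offsets are convenient because later steps record where each sentence sits in the record). The sketch treats both Chinese and English full stops as delimiters, which is one reading of the text; note that splitting on the English period would also break decimal numbers such as "3.5", so a real implementation would likely need extra handling.

```python
import re

# Chinese and English commas and full stops, line breaks, and tabs.
SPLIT_PATTERN = re.compile(r"[,，.。\n\r\t]+")


def split_sentences(record_text):
    """Return (sentence, start, end) tuples for each non-empty piece."""
    sentences = []
    pos = 0
    for match in SPLIT_PATTERN.finditer(record_text):
        piece = record_text[pos:match.start()]
        if piece.strip():
            sentences.append((piece, pos, match.start()))
        pos = match.end()
    tail = record_text[pos:]
    if tail.strip():
        sentences.append((tail, pos, len(record_text)))
    return sentences


# Hypothetical record fragment, invented for illustration.
sample = "牙龈红肿，探诊出血。\n16松动"
pieces = split_sentences(sample)
```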

Optionally, before loading the first medical knowledge base, the method further includes: loading a second medical knowledge base; extracting keywords and their scores according to the second medical knowledge base and a second electronic medical record; and building the first medical knowledge base from the second medical knowledge base and the extracted keywords and scores. In each embodiment, the structure of the first medical knowledge base needs a corresponding specification. This embodiment provides the second medical knowledge base, which serves as a specification template for the first. Like the first medical knowledge base, the second includes multiple sections, each with one or more attributes, and each attribute includes at least an attribute name, an attribute value, and a position. Unlike the first, the second medical knowledge base contains no keywords corresponding to the attributes and no keyword scores. The keywords and their scores are extracted from the second electronic medical record. The first medical knowledge base is built by adding one or more keywords and their scores to each attribute of the second medical knowledge base.

Optionally, extracting the keywords and keyword scores according to the second medical knowledge base and the second electronic medical record includes: segmenting the text sentences of the second electronic medical record into words according to attribute names and attribute values to obtain multiple keywords, and also treating near-synonyms and synonyms of each keyword as keywords; and assigning each keyword a score according to weights for its importance (whether it is a common word), its negativity (whether it is a negation word), and its logical relationship (AND, OR, NOT).

Optionally, using the matching scoring algorithm to match each of the plurality of text sentences against the attributes in the first medical knowledge base includes: matching each text sentence against the keywords and scores of all attributes to obtain the sentence's total score for each attribute; for each attribute whose total score exceeds a preset threshold, matching the sentence against the keywords and scores within that attribute to obtain attribute value scores and position scores for the attribute values and positions in the sentence; and taking the attribute value and position with the highest scores, together with the corresponding attribute, as the matching result for the sentence. This matching scoring algorithm matches text sentences to attributes.
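The two-stage matching described above might look roughly like the following sketch. It assumes that a keyword "matches" when it occurs as a substring of the sentence and that scores are simply summed; the patent does not spell out either detail, and the dictionary layout, function names, threshold value, and sample attributes are all hypothetical.

```python
def keyword_score(sentence, keywords):
    """Sum the scores of all (keyword, score) pairs found in the sentence."""
    return sum(score for word, score in keywords if word in sentence)


def best_option(sentence, options):
    """Return the (option, score) pair with the highest score, or (None, 0.0)."""
    best, best_score = None, 0.0
    for option, keywords in options.items():
        score = keyword_score(sentence, keywords)
        if score > best_score:
            best, best_score = option, score
    return best, best_score


def match_sentence(sentence, attributes, threshold):
    """Stage 1: score every attribute. Stage 2: score values and positions
    only for attributes whose total score exceeds the threshold."""
    candidates = []
    for attr in attributes:
        total = keyword_score(sentence, attr["keywords"])
        if total <= threshold:
            continue
        value, value_score = best_option(sentence, attr["values"])
        position, position_score = best_option(sentence, attr["positions"])
        candidates.append(
            (value_score + position_score, total, attr["name"], value, position)
        )
    if not candidates:
        return None
    _, _, name, value, position = max(candidates, key=lambda c: (c[0], c[1]))
    return {"attribute": name, "value": value, "position": position}


# Hypothetical attributes, invented for illustration.
attributes = [
    {"name": "tooth mobility",
     "keywords": [("松动", 1.0)],
     "values": {"mild": [("轻度", 1.0)], "severe": [("重度", 1.0)]},
     "positions": {"tooth 16": [("16", 1.0)]}},
    {"name": "gingival bleeding",
     "keywords": [("出血", 1.0)],
     "values": {"yes": [("出血", 1.0)]},
     "positions": {}},
]

result = match_sentence("16松动重度", attributes, threshold=0.5)
```

Substring matching is the simplest choice, but it can confuse keywords that contain one another, so keyword sets would need to be designed with that in mind.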

Optionally, the matching result includes: the text sentence, the attribute, attribute value, position, and section corresponding to the sentence, and the location of the sentence in the first electronic medical record. When saving, the matching result for each text sentence can be stored as one row of data; the results for all sentences are arranged in order by time sequence and by the section to which each sentence belongs, and saved in .csv format for later querying and processing.
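Saving one row per sentence could be sketched with Python's standard `csv` module as follows. The column names, the section ordering list (taken from the eight sections named earlier), and the use of the sentence start offset as a stand-in for "time sequence" are assumptions for illustration.

```python
import csv
import io

# The eight sections named in the description, in the order listed there.
SECTION_ORDER = [
    "chief complaint", "follow-up visit", "history of present illness",
    "past history", "examination", "diagnosis", "treatment plan", "disposal",
]
FIELDS = ["sentence", "attribute", "value", "position", "section", "start", "end"]


def write_matches(rows, stream):
    """Write one CSV row per matched sentence, ordered by section and offset."""
    ordered = sorted(
        rows, key=lambda r: (SECTION_ORDER.index(r["section"]), r["start"])
    )
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(ordered)


# Hypothetical matching results, invented for illustration.
rows = [
    {"sentence": "全冠修复", "attribute": "crown restoration", "value": "yes",
     "position": "tooth 16", "section": "treatment plan", "start": 20, "end": 24},
    {"sentence": "16松动重度", "attribute": "tooth mobility", "value": "severe",
     "position": "tooth 16", "section": "examination", "start": 5, "end": 11},
]

buffer = io.StringIO()
write_matches(rows, buffer)
csv_text = buffer.getvalue()
```

When writing to a real file, the stream would typically be opened with `open(path, "w", newline="", encoding="utf-8")` so that the `csv` module controls line endings.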

Optionally, the method further includes: extracting and saving text sentences that are not correctly matched by any attribute (including sentences that match an attribute but no attribute value). This shows how well the text sentences are matched. For each section, a sentence that matches no attribute can be saved as: text sentence, text start position, text end position, medical record folder number, medical record number. A sentence that matches an attribute but no attribute value can be saved as: text sentence, matched attribute, text start position, text end position, medical record folder number, medical record number. The preferred storage format is .xls.
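Separating the two kinds of unmatched sentences into the row layouts listed above could be sketched like this. The result-dictionary keys (`folder_id`, `record_id`, and so on) are hypothetical names chosen for the example, and writing the rows to an .xls file is left out because it would require a third-party spreadsheet library.

```python
def partition_unmatched(results):
    """Split results into rows with no attribute and rows with no value."""
    no_attribute, no_value = [], []
    for r in results:
        location = [r["start"], r["end"], r["folder_id"], r["record_id"]]
        if r.get("attribute") is None:
            # sentence, start, end, folder number, record number
            no_attribute.append([r["sentence"]] + location)
        elif r.get("value") is None:
            # sentence, matched attribute, start, end, folder number, record number
            no_value.append([r["sentence"], r["attribute"]] + location)
    return no_attribute, no_value


# Hypothetical results, invented for illustration.
results = [
    {"sentence": "患者今日复诊", "attribute": None, "value": None,
     "start": 0, "end": 6, "folder_id": 3, "record_id": 17},
    {"sentence": "16松动", "attribute": "tooth mobility", "value": None,
     "start": 7, "end": 11, "folder_id": 3, "record_id": 17},
    {"sentence": "探诊出血", "attribute": "gingival bleeding", "value": "yes",
     "start": 12, "end": 16, "folder_id": 3, "record_id": 17},
]

no_attribute, no_value = partition_unmatched(results)
```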

After the text sentences that are not correctly matched by any attribute have been extracted, they can also be processed by word segmentation, sorting, manual screening, and similar steps to find deficiencies in the keywords or attribute classification of the second medical knowledge base. Adding, deleting, or adjusting the scores of keywords then iteratively optimizes the second medical knowledge base, further improving its matching rate and accuracy on the text sentences of electronic medical records.

From the description of the above embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is preferable. On this understanding, the essence of the technical solution of the present invention, or the part that contributes over the prior art, can be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.

The electronic medical record structuring method described with reference to FIG. 1 can be implemented by an electronic medical record structuring device. FIG. 2 is a schematic diagram of the hardware structure of the electronic medical record structuring device provided by an embodiment of the present invention.

The electronic medical record structuring device may include a processor 21 and a memory 22 storing computer program instructions.

Specifically, the processor 21 may include a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present invention.

The memory 22 may include mass storage for data or instructions. By way of example and not limitation, the memory 22 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disc, a magneto-optical disc, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Where appropriate, the memory 22 may include removable or non-removable (fixed) media, and may be internal or external to the data processing device. In particular embodiments, the memory 22 is non-volatile solid-state memory. In particular embodiments, the memory 22 includes read-only memory (ROM). Where appropriate, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), flash memory, or a combination of two or more of these.

The processor 21 reads and executes the computer program instructions stored in the memory 22 to implement any of the electronic medical record structuring methods in the above embodiments.

In one example, the electronic medical record structuring device may also include a communication interface 23 and a bus 20. As shown in FIG. 2, the processor 21, the memory 22, and the communication interface 23 are connected by the bus 20 and communicate with one another over it.

The communication interface 23 is mainly used for communication between the modules, apparatuses, units, and/or devices in the embodiments of the present invention.

The bus 20 includes hardware, software, or both, and couples the components of the electronic medical record structuring device to one another. By way of example and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a front-side bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a low-pin-count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-X) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local (VLB) bus, another suitable bus, or a combination of two or more of these. Where appropriate, the bus 20 may include one or more buses. Although the embodiments of the present invention describe and show a particular bus, the invention contemplates any suitable bus or interconnect.

Based on the acquired data, the electronic medical record structuring device can execute the electronic medical record structuring method of the embodiments of the present invention, thereby implementing the method described with reference to FIG. 1.

In addition, the electronic medical record structuring method of the above embodiments can be implemented by a computer-readable storage medium provided by an embodiment of the present invention. The computer-readable storage medium stores computer program instructions which, when executed by a processor, implement any of the electronic medical record structuring methods in the above embodiments.

To describe the embodiments of the present invention more clearly, a preferred embodiment is described and illustrated below.

This preferred embodiment provides an electronic medical record structuring method. FIG. 3 is a flowchart of the electronic medical record structuring method according to the preferred embodiment of the present invention. As shown in FIG. 3, the flow includes the following steps:

Step 1: Build the first medical knowledge base.

In this preferred embodiment, the first medical knowledge base is built on the basis of a second medical knowledge base, through the following steps:

1. Table 1 defines the format of the second medical knowledge base, Table 2 explains the "Requirement" entries of Table 1 in detail, and FIG. 4 is a schematic diagram of an example structure of the first medical knowledge base for the field of prosthodontics.

Table 1: A format for the second medical knowledge base

[Table 1 appears as an image (DEST_PATH_IMAGE002A) in the original publication.]

Table 2: Detailed explanation of the "Requirement" entries in Table 1

Requirement | Description
Single choice | Defaults to "unknown"; when the attribute-name score >= 1, select the option with the highest score
Single choice * | Select the option with the highest score
Multiple choice | Select all options with a score >= 1
Judgment | Defaults to "no"; when the attribute-name score >= 1 and no negation word appears, select "yes"
Number | Select the option (unit) with the highest score, then find the numeral preceding the unit in the sentence
Time | Select the option (unit) with the highest score, then find the time word preceding the unit in the sentence
Single choice/number | When the attribute-name score >= 1, select the option with the highest score, or find the number

2. Manually segment all the phrases in FIG. 5 into words, and screen a sample of medical records to collect the keywords that occur frequently (including synonyms, near-synonyms, abbreviations and short forms, common misspellings, and so on);

3. Append the keywords to each matching object in the second medical knowledge base to form the first medical knowledge base, and assign each keyword a score according to its importance and part of speech (for example, a positive score for technical terms, a negative score for negation words, and 0 for common words). Scores are also used to express AND/OR/NOT relationships. For example, because a total score greater than or equal to 1 counts as a successful match, two keywords that must both appear can each be given a score of 0.5. See Table 3.

Table 3: A format for the first medical knowledge base

[Table 3 appears as an image (DEST_PATH_IMAGE004A) in the original publication.]
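The score-based encoding of AND/OR/NOT described in item 3 can be sketched roughly as follows. This is only an illustration: the keyword table, its scores, and the function names are hypothetical and are not taken from Table 3; only the rule that a total of at least 1 counts as a match comes from the text.

```python
# Hypothetical keyword table for one matching object: keyword -> score.
# Two keywords at 0.5 each behave like AND (both are needed to reach 1.0),
# a negative score behaves like NOT, and a common word scores 0.
KEYWORDS = {
    "牙体": 0.5,   # "tooth structure"
    "缺损": 0.5,   # "defect"
    "无": -1.0,    # negation word
    "可见": 0.0,   # common word, "visible"
}
THRESHOLD = 1.0  # the text states that a score >= 1 counts as a match


def keyword_score(sentence, keywords):
    """Sum the scores of every keyword that occurs in the sentence."""
    return sum(score for word, score in keywords.items() if word in sentence)


def is_match(sentence, keywords, threshold=THRESHOLD):
    """A sentence matches when its total keyword score reaches the threshold."""
    return keyword_score(sentence, keywords) >= threshold
```

Under the same convention, an OR relationship would presumably be expressed by giving each alternative keyword a score of 1 on its own.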

Step 2: Split the first electronic medical record into sentences.

In most cases, one short clause of an electronic medical record (delimited by commas) corresponds to one "attribute-attribute value" pair. The record is therefore split at punctuation marks.

1. Split the entire medical record text into sentences at Chinese and English commas, periods, line breaks, and tabs.

2. Handle the special cases of splitting (such as decimal points, serial numbers, and so on).
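A minimal sketch of this splitting step is given below. The delimiter set comes from the text; the handling of special cases is an assumption made for illustration, namely that an English period directly after a digit (as in the decimal "2.5" or the serial number "1.") is not treated as a delimiter. The function name and the returned offsets are likewise hypothetical.

```python
import re

# Delimiters named in the text: Chinese and English commas and periods,
# line breaks, and tabs.
# Assumption: a '.' that directly follows a digit is kept, which protects
# decimals ("2.5") and leading serial numbers ("1.").
_DELIMITERS = re.compile(r"[，。,\n\t]|(?<!\d)\.")


def split_sentences(text):
    """Return (sentence, start, end) tuples with offsets into the original text."""
    sentences = []
    start = 0
    for m in _DELIMITERS.finditer(text):
        piece = text[start:m.start()]
        if piece.strip():
            sentences.append((piece, start, m.start()))
        start = m.end()
    if text[start:].strip():
        sentences.append((text[start:], start, len(text)))
    return sentences
```

Keeping the start and end offsets alongside each sentence makes it straightforward to record where the sentence sits in the original record, which the structured format requires.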

Step 3: Define the structured format.

1. The basic target format of a structured record is: text sentence, attribute, attribute value, position, part, and the corresponding position of the text in the electronic medical record. For attributes that also need time-sequence information, the target format is: text sentence, attribute, attribute value, position, step, substep, part, and the corresponding position of the text in the electronic medical record.

2. Each such record forms one row; the whole medical record file is arranged sentence by sentence and saved in .csv format.

Step 4: Match the text sentences against the first medical knowledge base.

1. For each text sentence, iterate over all attributes. For each attribute, initialize the matching score of the attribute name and the matching score of each attribute-value option to 0.

2. Match the attribute's name, value, and position. The matching process is as follows:

a) Attribute name matching

The keyword group of the attribute name is matched against the text sentence. Each keyword that matches contributes its score to a running total (positive keywords add, negative keywords subtract), which gives the total score over all keywords of the attribute name. If the total exceeds a given threshold, the text sentence is considered to match the attribute name, and attribute value matching proceeds.

b) Option-type attribute value matching

For each option of the attribute value, the corresponding keyword group is matched against the text sentence, and the scores of the matching keywords are accumulated to give the option's total score. For a single-choice attribute, the option with the highest accumulated score is taken as the attribute value. For a multiple-choice attribute, every option whose accumulated score exceeds a given threshold is taken as an attribute value. For a judgment attribute, the attribute value is considered matched if the accumulated score of the option exceeds a given threshold.
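The three selection modes just described might look like this in code. It is a sketch under stated assumptions: the option tables are made up, the mode names `"single"`, `"multiple"`, and `"judgment"` are hypothetical labels, and the threshold test uses `>= 1` as in Table 2 rather than a strict "exceeds".

```python
def option_scores(sentence, options):
    """Score every option; `options` maps option name -> {keyword: score}."""
    return {
        name: sum(score for kw, score in keywords.items() if kw in sentence)
        for name, keywords in options.items()
    }


def select_value(sentence, options, kind, threshold=1.0):
    """Pick the attribute value according to the selection mode `kind`."""
    scores = option_scores(sentence, options)
    if kind == "single":
        # Highest accumulated score wins; ties go to the first option listed.
        return max(scores, key=scores.get)
    if kind == "multiple":
        return [name for name, s in scores.items() if s >= threshold]
    if kind == "judgment":
        return any(s >= threshold for s in scores.values())
    raise ValueError(f"unknown selection mode: {kind}")


# Hypothetical options for a "crown material" attribute.
MATERIAL_OPTIONS = {
    "metal": {"金属": 1.0},
    "ceramic": {"全瓷": 1.0, "瓷": 0.5},
}
```

Note that this sketch always returns some option in single-choice mode; the default of "unknown" from Table 2 would be applied by the caller when the attribute name itself has not matched.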

c) Numeric attribute value matching

Each character of the text sentence is examined in turn to find the contiguous substrings that express a numeric value; these are converted to a numeric type and used as the attribute value.
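The character-by-character scan can be sketched as below. The sketch assumes that numbers are written with ASCII digits and at most one decimal point; Chinese numerals and other notations, which real records may contain, are not handled. The function name is hypothetical.

```python
def extract_numbers(sentence):
    """Scan character by character and collect runs of digits with at most one '.'."""
    numbers = []
    current = ""
    for ch in sentence + " ":  # the trailing space flushes a run at the end
        is_digit = ch in "0123456789"
        is_decimal_point = ch == "." and current and "." not in current
        if is_digit or is_decimal_point:
            current += ch
        elif current:
            # A run such as "2." (digit followed by a sentence period) is
            # read as 2 by dropping the trailing point.
            numbers.append(float(current.rstrip(".")))
            current = ""
    return numbers
```

Which of the extracted numbers belongs to the attribute (for example, the one immediately before the matched unit, as Table 2 describes) would be decided by the surrounding matching logic.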

d) Position matching

If the attribute relates to tooth position, a regular expression is used to match the tooth position in the text sentence (identified by three consecutive '/' characters), and the match becomes the value of the attribute's position. If the position has several options, each option is matched in the same way as an option-type attribute value, and the option whose accumulated score meets the requirement for that position type is selected as the position value.
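The text does not give the regular expression itself, so the following is only a guess at its shape. It assumes that a tooth position is written as four quadrant fields of digits separated by exactly three '/' characters (for example "6/5//7", where a field may be empty); the actual notation in the records may differ, and the pattern would need to be adapted accordingly.

```python
import re

# Assumption: four digit fields (each possibly empty) separated by three '/'.
TOOTH_POSITION = re.compile(r"\d*/\d*/\d*/\d*")


def find_tooth_position(sentence):
    """Return the first tooth-position string in the sentence, or None."""
    m = TOOTH_POSITION.search(sentence)
    return m.group(0) if m else None
```

Requiring all three slashes keeps ordinary fractions such as "1/2" from being mistaken for a tooth position.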

3. Using the score criteria for the different requirements shown in Table 3, determine whether the "attribute-attribute value" pair meets the requirement. If it does, save the text sentence together with the pair in the format defined in Step 3; if not, move on to the next attribute.

4. When several attributes match the same sentence, every successful match is saved.
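The overall control flow of items 1 to 4 (try every attribute for every sentence, keep every successful match) can be sketched as follows. The attribute records, their field names, and the simplified single-choice value selection are assumptions for illustration; position matching and the other value types are left out to keep the loop visible.

```python
def score(sentence, keywords):
    """Sum the scores of the keywords that occur in the sentence."""
    return sum(s for kw, s in keywords.items() if kw in sentence)


def match_all(sentences, attributes, threshold=1.0):
    """Return (matched, unmatched): all successful matches, and sentences with none."""
    matched, unmatched = [], []
    for sentence in sentences:
        hits = []
        for attr in attributes:
            if score(sentence, attr["name_keywords"]) < threshold:
                continue  # attribute name not matched; try the next attribute
            value_scores = {v: score(sentence, kws)
                            for v, kws in attr["values"].items()}
            best = max(value_scores, key=value_scores.get)
            if value_scores[best] >= threshold:
                hits.append((sentence, attr["name"], best))
        if hits:
            matched.extend(hits)  # every successful match is kept
        else:
            unmatched.append(sentence)
    return matched, unmatched


# Hypothetical two-attribute knowledge base.
ATTRIBUTES = [
    {"name": "tooth_missing", "name_keywords": {"缺失": 1.0},
     "values": {"yes": {"缺失": 1.0}}},
    {"name": "mobility", "name_keywords": {"松动": 1.0},
     "values": {"mild": {"轻度": 1.0}, "severe": {"重度": 1.0}}},
]
```

Sentences that end up in the `unmatched` list correspond to the ones that Step 5 saves for later review.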

5. Extract the time-sequence information from the text.

The operations in the treatment plan part are ordered, and this order must be reflected in the structured result. For each text sentence in the treatment plan part, the step number at the beginning of the sentence is taken as the operation order of the attribute matched by that sentence.

A single step may also offer several alternative options, which must likewise be reflected in the structured result. Each such text sentence is checked for words that express an "or" relationship; if any are found, the sentence is split at those words and attribute matching is performed on each part separately.
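A rough sketch of this step and substep extraction follows. The serial-number pattern, the list of "or" words, and the numbering of alternatives as substeps 1, 2, ... are all assumptions made for illustration; the text only says that a leading step number is read and that the sentence is split at words expressing "or".

```python
import re

# Assumption: a step number is a run of digits at the start of the sentence,
# followed by '.', '、' or a closing parenthesis.
_STEP = re.compile(r"^\s*(\d+)\s*[.、)）]\s*")
# Hypothetical list of words expressing "or"; the longer word is tried first.
_OR_WORDS = ("或者", "或")
_OR_SPLIT = re.compile("|".join(re.escape(w) for w in _OR_WORDS))


def parse_plan_sentence(sentence):
    """Return (step, [(substep, text), ...]) for one treatment-plan sentence."""
    m = _STEP.match(sentence)
    step = int(m.group(1)) if m else None
    body = sentence[m.end():] if m else sentence
    parts = [p for p in _OR_SPLIT.split(body) if p.strip()]
    return step, list(enumerate(parts, start=1))
```

Each `(substep, text)` pair would then go through attribute matching on its own, with `step` and `substep` filling the corresponding columns of the time-sequence format from Step 3.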

6. Matching with the scoring algorithm extracts the information in a text sentence fairly thoroughly. In most cases one text sentence corresponds to one attribute; when a sentence corresponds to several attributes, the logic of the algorithm allows all of them to be matched. Because the medical knowledge base of the present invention describes values of several types (Boolean, real-valued, categorical, and others) and its keyword groups include words of both affirmative and negative polarity, the matching algorithm can correctly identify whether a piece of disease information is affirmed or negated, and can also extract specific numeric information about the disease (indicating severity, measured values, and so on), which other current structuring methods cannot achieve.

Matching text sentences with keyword groups can identify several types of information in a sentence, including affirmative or negative polarity, the different options of an attribute value, and numeric values, which greatly broadens the applicability of the method.

Step 5: Save the text sentences that were not fully matched during matching.

1. The format of the not-fully-matched file is: text sentence, matched attribute*, text start position, text end position, medical record folder number, medical record file number.

2. Each such record forms one row; the unmatched sentences from all medical record files are arranged sentence by sentence and saved in .xls format.

3. For every text sentence in every medical record file, check its matching status. If the sentence does not meet the conditions for a successful match, save it to the .xls table of the corresponding part.

Analysis of structuring results

In this embodiment, a set of tools for structuring medical record text was developed in Python based on the above method, and more than three thousand electronic medical record texts were structured. The results are presented and analyzed below.

The medical record texts processed in this embodiment come from records of dentition defects in a department of prosthodontics, and the medical knowledge base was compiled from knowledge of the prosthodontics field; part of the knowledge base is shown in FIG. 4. An example of medical record text is shown in FIG. 5, and the structured result is shown in FIG. 6.

Judging from the structured results, the method achieves the following beneficial effects:

1. It accurately identifies the position information that appears in the medical record text, including positions given as words, such as "maxilla" and "mandible", as well as tooth position information.

2. It effectively labels the attributes in the medical record text and their attribute values, and recognizes attribute values of different types.

3. It effectively extracts the order of the different treatment measures in the text.

Compared with several existing medical record structuring methods, the method of the embodiments of the present invention builds a more comprehensive first medical knowledge base that fits the medical record text more closely and extracts the information in it more completely. Existing methods, such as structuring methods based on affirmative/negative semantics, can usually only give a yes/no judgment on the medical terms described in the knowledge base, and cannot attach fuller information to the attribute (such as the location or severity of the condition).

The first medical knowledge base used in this embodiment contains 12 parts and 389 attributes in total. Attribute values are of types such as multiple choice, single choice, judgment, and numeric; attribute positions are of types such as single choice and tooth position. FIG. 7 shows frequency statistics for some of the attributes in the structured results of this example; statistics on the differences between the values of an attribute are not reflected in the data. As FIG. 7 shows, the frequencies of different attributes differ widely across the more than three thousand records, which reflects some of the common conditions in the records and also offers a statistical way of understanding those conditions.

A number of medical records were randomly sampled and manually annotated against the first medical knowledge base, and the annotations were used as the standard for evaluating the structured results produced by the method. The evaluation shows that the electronic medical record structuring method of the embodiments of the present invention can complete the structuring task required by the first medical knowledge base.

The above are only preferred embodiments of the present invention and are not intended to limit it. Those skilled in the art may make various modifications and changes to the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (8)

1. An electronic medical record structuring method is characterized by comprising the following steps:
loading a first medical knowledge base, wherein the first medical knowledge base comprises a plurality of parts, and each part comprises one or more attributes and one or more keywords corresponding to the attributes; each attribute includes at least: attribute name, attribute value, position and value type/mode description, each keyword further includes: a score for the keyword;
dividing a first electronic medical record into sentences according to special symbols to obtain a plurality of text sentences;
matching, using a matching scoring algorithm, attributes in the first medical knowledge base for each of the plurality of text sentences, comprising:
matching each text sentence against the keywords of the attribute name of each attribute, and the scores of those keywords, to obtain, for each text sentence, a total attribute-name score for each attribute;
for each text sentence whose total attribute-name score for an attribute is higher than a preset threshold, matching the text sentence against the keywords, and their scores, of the attribute values and positions of that attribute, to obtain attribute value scores and position scores for that attribute in the text sentence;
taking the attribute value with the highest attribute value score and the position with the highest position score, together with the corresponding attribute, as the matching result of the text sentence;
and storing the matching result.
2. The method of claim 1, wherein the type of the attribute value comprises at least one of: real number type, Boolean type, discrete classification type; and the attribute value is selected in at least one of the following manners: judgment, single choice, number, multiple choice.
3. The method of claim 1, wherein the special symbols comprise at least one of: Chinese and English commas, periods, line breaks, and tabs.
4. The method of claim 1, wherein prior to loading the first medical knowledge base, the method further comprises:
loading a second medical knowledge base, wherein the second medical knowledge base comprises a plurality of parts, and each part comprises one or more attributes; each attribute includes at least: attribute name, attribute value and location;
extracting keywords and scores thereof according to the second medical knowledge base and the second electronic medical record;
and constructing the first medical knowledge base according to the second medical knowledge base, the extracted keywords and the scores of the keywords.
5. The method of claim 4, wherein extracting keywords and their scores according to the second medical knowledge base and the second electronic medical record comprises:
performing word segmentation on the second electronic medical record according to the attribute names and the attribute values to obtain a plurality of keywords, and also taking the near-synonyms and synonyms of the keywords as keywords;
assigning different scores to the keywords according to their importance, their negativity, and the weights of their logical relationships.
6. The method of claim 1, wherein the matching result comprises: the text sentence, and the corresponding attribute, attribute value, position, belonging part and position of the text sentence in the first electronic medical record.
7. The method according to claim 4 or 5, characterized in that the method further comprises:
extracting and storing text sentences which are not correctly matched by any attribute;
performing word segmentation, sequencing and manual screening on the extracted text sentences, and comparing to find the defects of keyword or attribute classification in the second medical knowledge base;
and performing addition/deletion/score adjustment operation on the keywords to realize iterative optimization of the second medical knowledge base.
8. A computer-readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1-7.
CN201811513668.0A 2018-12-11 2018-12-11 Electronic medical record structuring method and computer-readable storage medium Active CN109637605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811513668.0A CN109637605B (en) 2018-12-11 2018-12-11 Electronic medical record structuring method and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811513668.0A CN109637605B (en) 2018-12-11 2018-12-11 Electronic medical record structuring method and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN109637605A CN109637605A (en) 2019-04-16
CN109637605B true CN109637605B (en) 2022-05-10

Family

ID=66072953

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811513668.0A Active CN109637605B (en) 2018-12-11 2018-12-11 Electronic medical record structuring method and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN109637605B (en)

Also Published As

Publication number Publication date
CN109637605A (en) 2019-04-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant