CN114464283A - Manual labeling processing method, device, processor and storage medium based on ICD-10 depression diagnosis and treatment standard interview text

Publication number
CN114464283A
Authority
CN
China
Prior art keywords
labeling
icd
text information
depression
manual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210125772.2A
Other languages
Chinese (zh)
Inventor
沈一峰
魏宇梅
盛钦润
李华芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Mental Health Center Shanghai Psychological Counselling Training Center
Original Assignee
Shanghai Mental Health Center Shanghai Psychological Counselling Training Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention relates to a method for manual labeling processing of interview text information based on the ICD-10 depression diagnosis and treatment standard, comprising the following steps: constructing a domain dictionary by a statistical method, and labeling a Chinese electronic medical record named entity recognition corpus according to a defined labeling specification; and recognizing named entities in Chinese electronic medical records. The invention also relates to a device, a processor and a computer-readable storage medium for manual labeling processing of interview text information based on the ICD-10 depression diagnosis and treatment standard. The method, device, processor and computer-readable storage medium of the invention optimize the user's labeling experience and improve labeling efficiency; to improve the accuracy of the labeling results, review and editing steps are added; and deployment and operation in a variety of system environments are supported.

Description

Manual labeling processing method, device, processor and storage medium based on ICD-10 depression diagnosis and treatment standard interview text

Technical Field

The invention relates to the field of artificial intelligence, in particular to natural language processing, and specifically to a method, a device, a processor and a computer-readable storage medium for manual labeling processing of interview text information based on the ICD-10 depression diagnosis and treatment standard.

Background

Current general-purpose text labeling tools do not support labeling tasks in the mental health field:

For example, interview medical records can include text, doctor-patient dialogue transcripts and other digitized information. However, interview records also contain a large amount of semi-structured or unstructured free text, and the useful information scattered through this free, disorganized text cannot be used quickly and effectively by a computer. To convert unstructured interview medical record data into a structured form that a computer can process, text mining with natural language processing techniques is essential. Although the natural-language text in interview medical records contains a wealth of information on depression symptoms and related medical knowledge, the specialization and complexity of the field mean that it is often phrased differently from general text, so general-domain methods cannot be applied directly.

Named entity recognition (manual labeling), as a key technology of text data mining, has therefore long been a foundation and focus of natural language processing research in the mental health field. Named entities in the general domain fall into three major categories: entity, time and number. For named entities in a specific domain (for example, depression diagnosis), the entity types are defined according to the characteristics of that domain. Because this research is a new attempt with no prior experience to draw on, the manual labeling problem must be solved first.

Summary of the Invention

The purpose of the present invention is to overcome the shortcomings of the prior art described above and to provide a method, a device, a processor and a computer-readable storage medium for manual labeling processing of interview text information based on the ICD-10 depression diagnosis and treatment standard, offering high accuracy, low cost and a relatively wide range of application.

To achieve this purpose, the method, device, processor and computer-readable storage medium of the present invention are as follows:

The method for manual labeling processing of interview text information based on the ICD-10 depression diagnosis and treatment standard is mainly characterized in that it comprises the following steps:

(1) constructing a domain dictionary by a statistical method of exact matching against manual labels, and labeling a Chinese electronic medical record named entity recognition corpus according to a defined labeling specification;

(2) performing manual labeling and automatic labeling at the same time to recognize named entities in Chinese electronic medical records.

Preferably, the construction of the domain dictionary in step (1) is specifically:

obtaining keywords from Chinese electronic medical records, or obtaining dictionary keywords from external professional resources, to construct the domain dictionary.

Preferably, the manual labeling in step (2) specifically comprises the following steps:

(2.1) logging in to the depression intelligent diagnosis and case generation system and filling in the patient consultation information;

(2.2) uploading the recording file;

(2.3) performing speech recognition to convert the recording file into text content;

(2.4) labeling the text content;

(2.5) generating a diagnosis report.

Preferably, the deep learning in step (2) is specifically:

using deep learning with domain preprocessing, namely pre-training character embeddings on a domain corpus and applying domain preprocessing to the relevant entity recognition model.

Preferably, the method further comprises the following step:

(3) constructing a manual labeling model, recognizing medical entities in interview electronic medical records with the manual labeling model, and displaying and analyzing the medical entity recognition results for psychiatric medical records.

The device for manual labeling processing of interview text information based on the ICD-10 depression diagnosis and treatment standard is mainly characterized in that it comprises:

a processor configured to execute computer-executable instructions; and

a memory storing one or more computer-executable instructions which, when executed by the processor, implement the steps of the above method for manual labeling processing of interview text information based on the ICD-10 depression diagnosis and treatment standard.

The processor for manual labeling processing of interview text information based on the ICD-10 depression diagnosis and treatment standard is mainly characterized in that it is configured to execute computer-executable instructions which, when executed by the processor, implement the steps of the above method.

The computer-readable storage medium is mainly characterized in that a computer program is stored on it, and the computer program can be executed by a processor to implement the steps of the above method.

With the method, device, processor and computer-readable storage medium of the present invention for manual labeling processing of interview text information based on the ICD-10 depression diagnosis and treatment standard, each labeling result needs to be displayed visually in real time, and each labeling operation needs to support selection from a large number of entity types. Relation labeling between two entities and event extraction among multiple entities are supported, and nested entity labels are visualized in real time. Text can be labeled iteratively: the labels on already-labeled text are recognized and imported into the database. This optimizes the user's labeling experience and improves labeling efficiency. To improve the accuracy of the labeling results, review and editing steps are added, and the system can be deployed and run in a variety of system environments.
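The relation labeling and the review and editing step described above can be illustrated with a small data model. This is only a sketch under stated assumptions: the class names `Entity`, `Relation` and `Record`, the status values "draft" and "reviewed", and the rule that any edit returns a record to draft are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    id: int
    start: int   # character offset where the entity begins
    end: int     # character offset just past the entity
    label: str


@dataclass
class Relation:
    head: int    # id of the first entity
    tail: int    # id of the second entity
    label: str


@dataclass
class Record:
    text: str
    entities: list[Entity] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)
    status: str = "draft"   # hypothetical review states: "draft" or "reviewed"

    def add_relation(self, head: int, tail: int, label: str) -> None:
        # A relation may only connect entities that exist in this record.
        ids = {e.id for e in self.entities}
        if head not in ids or tail not in ids:
            raise KeyError("both entities must exist before they can be related")
        self.relations.append(Relation(head, tail, label))
        self.status = "draft"   # assumed rule: any edit sends the record back for review

    def approve(self) -> None:
        self.status = "reviewed"


record = Record(
    text="insomnia for 2 weeks",
    entities=[Entity(1, 0, 8, "SYMPTOM"), Entity(2, 13, 20, "DURATION")],
)
record.add_relation(1, 2, "lasts_for")
record.approve()
```

Keeping the review status on the record itself makes it easy to list everything that still awaits a second reader, which is one plausible way to realize the added review step.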

Brief Description of the Drawings

FIG. 1 is a schematic flowchart of the method of the present invention for manual labeling processing of interview text information based on the ICD-10 depression diagnosis and treatment standard.

FIG. 2 is a schematic diagram of manual labeling in the method of the present invention.

FIG. 3 is a schematic diagram of a consultation in an embodiment of the method of the present invention.

FIG. 4 is a schematic diagram of the electronic medical record processing result with manually labeled tags in the method of the present invention.

Detailed Description

To describe the technical content of the present invention more clearly, it is further described below with reference to specific embodiments.

The method of the present invention for manual labeling processing of interview text information based on the ICD-10 depression diagnosis and treatment standard comprises the following steps:

(1) constructing a domain dictionary by a statistical method of exact matching against manual labels, and labeling a Chinese electronic medical record named entity recognition corpus according to a defined labeling specification;

(2) performing manual labeling and automatic labeling at the same time to recognize named entities in Chinese electronic medical records.

As a preferred embodiment of the present invention, the construction of the domain dictionary in step (1) is specifically:

obtaining keywords from Chinese electronic medical records, or obtaining dictionary keywords from external professional resources, to construct the domain dictionary.

As a preferred embodiment of the present invention, the manual labeling in step (2) specifically comprises the following steps:

(2.1) logging in to the depression intelligent diagnosis and case generation system and filling in the patient consultation information;

(2.2) uploading the recording file;

(2.3) performing speech recognition to convert the recording file into text content;

(2.4) labeling the text content;

(2.5) generating a diagnosis report.

As a preferred embodiment of the present invention, the deep learning in step (2) is specifically:

using deep learning with domain preprocessing, namely pre-training character embeddings on a domain corpus and applying domain preprocessing to the relevant entity recognition model.

As a preferred embodiment of the present invention, the method further comprises the following step:

(3) constructing a manual labeling model, recognizing medical entities in interview electronic medical records with the manual labeling model, and displaying and analyzing the medical entity recognition results for psychiatric medical records.

The device of the present invention for manual labeling processing of interview text information based on the ICD-10 depression diagnosis and treatment standard comprises:

a processor configured to execute computer-executable instructions; and

a memory storing one or more computer-executable instructions which, when executed by the processor, implement the steps of the above method for manual labeling processing of interview text information based on the ICD-10 depression diagnosis and treatment standard.

The processor of the present invention for manual labeling processing of interview text information based on the ICD-10 depression diagnosis and treatment standard is configured to execute computer-executable instructions which, when executed by the processor, implement the steps of the above method.

The computer-readable storage medium of the present invention has a computer program stored on it, and the computer program can be executed by a processor to implement the steps of the above method.

Because the text labeling tools on the market cannot meet the labeling requirements of real projects in the depression field, this project independently developed a web-based text labeling tool for building a high-quality corpus. The tool needs to support basic labeling functions such as entity labeling, ICD-10 indicators, event extraction and text classification. The labeling specification must be customizable and text must support iterative labeling. The tool must handle labeling tasks with a large number of entity types and be extensible to nested entity labeling, standard-name labeling, and pre-labeling based on dictionary matching and regular-expression matching. Provided the labeling functions are met, the labeling experience is optimized as far as possible to reduce the user's workload and cost while ensuring the accuracy of the labeling results. The labeling tool runs normally on the mainstream operating systems Windows, Linux and Mac.
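Two of the requirements above, pre-labeling by regular-expression matching and support for nested entities, can be illustrated with a short sketch. It is not the tool's actual implementation: the `Span` structure, the function names, the English sample text and the two example rules are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int
    end: int
    label: str
    source: str  # where the label came from, e.g. "regex", "dictionary" or "manual"


def regex_preannotate(text: str, rules: dict[str, str]) -> list[Span]:
    """Apply every rule (label -> regex pattern) and return the spans it produces."""
    spans = []
    for label, pattern in rules.items():
        for m in re.finditer(pattern, text):
            spans.append(Span(m.start(), m.end(), label, "regex"))
    # Sort by start offset; for equal starts put the longer (outer) span first.
    return sorted(spans, key=lambda s: (s.start, -s.end))


def is_nested(inner: Span, outer: Span) -> bool:
    """True when `inner` lies entirely within `outer` and is a different span."""
    return outer.start <= inner.start and inner.end <= outer.end and inner != outer


# Hypothetical rules: a duration expression and the bare number inside it.
rules = {
    "DURATION": r"\d+ (?:days|weeks|months)",
    "NUMBER": r"\d+",
}
spans = regex_preannotate("insomnia for 2 weeks", rules)
```

Here the NUMBER span for "2" sits inside the DURATION span for "2 weeks", so both can be kept and shown as nested labels instead of one overwriting the other.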

In a specific embodiment, the present invention provides a dedicated manual-labeling ICD-10 diagnosis and treatment model for the mental health field, which readily supports innovative research.

The invention adds a manually labeled ICD-10 tag system to the depression intelligent diagnosis and case generation system and performs structured analysis of the interview results. Through a web visual interface, the user can attach tags to the transcript and edit the resulting tags.

The present invention solves the above technical problems by the following technical means. Considering the linguistic characteristics of Chinese interview medical record text, and the fact that there is currently no unified labeling specification or public labeled corpus for research, this patent aims to improve the performance of Chinese medical entity recognition models when only a small labeled corpus is available. The work of this patent can be summarized in the following three parts:

(1) The domain dictionary is built in two ways: using statistical methods to obtain keywords from Chinese electronic medical records, and using external professional resources to obtain dictionary keywords. At the same time, the Chinese electronic medical record named entity recognition corpus is labeled according to the defined labeling specification. The statistical method here is exact matching: labels are assigned manually and the results are stored automatically. Taking doctor-patient interview results in the system as an example, suppose the doctor selects the keyword "睡不着觉" ("can't sleep") and tags it "失眠" ("insomnia"). Under exact matching, this stores "睡不着觉" as a keyword for the symptom "失眠". Phrases that fully contain the keyword, with the word order unchanged, are then matched automatically, for example "我睡不着觉" ("I can't sleep") and "我经常睡不着觉" ("I often can't sleep").
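The exact-matching behaviour in this example can be sketched as follows. This is a minimal illustration, not the patented implementation: the `DomainDictionary` class and its `add` and `match` methods are hypothetical names, and plain substring search is assumed as the matching rule.

```python
from dataclasses import dataclass, field


@dataclass
class DomainDictionary:
    """Maps exact keyword strings to symptom labels (hypothetical structure)."""
    entries: dict[str, str] = field(default_factory=dict)

    def add(self, keyword: str, label: str) -> None:
        # Store the keyword a clinician selected together with the label assigned to it.
        self.entries[keyword] = label

    def match(self, text: str) -> list[tuple[int, int, str, str]]:
        """Return (start, end, keyword, label) for every exact occurrence in text."""
        hits = []
        for keyword, label in self.entries.items():
            start = text.find(keyword)
            while start != -1:
                hits.append((start, start + len(keyword), keyword, label))
                start = text.find(keyword, start + 1)
        return sorted(hits)


dictionary = DomainDictionary()
dictionary.add("睡不着觉", "失眠")  # "can't sleep" -> "insomnia"

hits = dictionary.match("我经常睡不着觉")   # "I often can't sleep"
reordered = dictionary.match("觉着不睡")    # same characters, different order
```

Because the search is a literal substring match, a phrase that contains the keyword intact is labeled, while a phrase with the characters in a different order is not, which mirrors the word-order constraint stated above.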

(2) Research on named entity recognition in Chinese electronic medical records, covering two broad approaches: manual labeling and automatic labeling. The manual-labeling-based approach combines conditional random fields (CRF) with the domain dictionary and uses a two-pass process, pre-labeling followed by secondary labeling at a different granularity, to improve named entity recognition. The deep-learning-based approach uses two networks, BLSTM-CRF and Transformer-CRF, and two domain preprocessing methods, pre-training character embeddings on a domain corpus and fine-tuning the relevant entity recognition model, so that deep learning can be better applied to recognizing diagnosis and treatment standard entities in medical text.
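One common way to combine a dictionary with a CRF is to turn dictionary hits into character-level tags that the sequence model can use as features or as a first-pass labeling. The sketch below shows only that conversion. It assumes the BIO tagging scheme and a longest-match-wins rule for overlapping hits, neither of which is specified in the patent, and the function name `bio_tags` is hypothetical.

```python
def bio_tags(text: str, spans: list[tuple[int, int, str]]) -> list[str]:
    """Convert (start, end, label) spans into one BIO tag per character."""
    tags = ["O"] * len(text)
    # Process longer spans first so a coarse match is not split by a shorter one.
    for start, end, label in sorted(spans, key=lambda s: s[0] - s[1]):
        if any(tag != "O" for tag in tags[start:end]):
            continue  # overlaps a span that was already tagged
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Pre-labeling "我经常睡不着觉" with one dictionary hit covering characters 3..6.
tags = bio_tags("我经常睡不着觉", [(3, 7, "INSOMNIA")])
```

The resulting tag sequence has one entry per character, which matches the character-level input that character-embedding models for Chinese typically consume.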

Manual and automatic labeling are performed at the same time. In the depression interview medical record and auxiliary diagnosis and treatment system, the steps for manual labeling and upload are as follows, all carried out in the depression intelligent diagnosis and case generation system:

a. log in;

b. fill in the patient consultation information and click Save;

c. click Audio to upload a recording;

d. select the recording file and open it;

e. after the file is uploaded, click Recording Recognition;

f. after waiting a few minutes, click Get Text and save;

g. click Label in the consultation list to start labeling;

h. perform the labeling, save, and generate the diagnosis report.
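The operating steps above form an ordered workflow, which can be sketched as a small state machine. This is an illustration only: the names `Stage` and `Session` are hypothetical, the eight UI steps are grouped into six stages for brevity, and the rule that stages cannot be skipped is an assumption.

```python
from enum import Enum


class Stage(Enum):
    LOGGED_IN = 1            # step a
    CONSULTATION_SAVED = 2   # step b
    AUDIO_UPLOADED = 3       # steps c and d
    TRANSCRIBED = 4          # steps e and f
    LABELED = 5              # steps g and h (labeling saved)
    REPORT_GENERATED = 6     # step h (diagnosis report)


class Session:
    """Tracks how far one consultation has progressed through the workflow."""

    def __init__(self) -> None:
        self.stage = Stage.LOGGED_IN

    def advance(self, target: Stage) -> None:
        # Assumed rule: each stage must follow directly from the previous one.
        if target.value != self.stage.value + 1:
            raise ValueError(f"cannot go from {self.stage.name} to {target.name}")
        self.stage = target


session = Session()
session.advance(Stage.CONSULTATION_SAVED)
session.advance(Stage.AUDIO_UPLOADED)
```

Rejecting out-of-order transitions makes explicit, for instance, that labeling cannot start before the recording has been transcribed.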

(3) Other applications of the manual labeling model. Two aspects are studied: the general applicability of the manual labeling model to medical entity recognition in interview electronic medical records, and the display and analysis of medical entity recognition results on real hospital psychiatric medical records.

Testers submitted a test report covering the test and development process, and developers confirmed the test results: ASR recognition accuracy is above 90%, the manual tag labeling function works, and medical record reports can be generated. All bugs were fixed and, once fixed, resubmitted for confirmation. Testers then ran regression checks on the submitted fixes and confirmed that all system bugs had been cleared.
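The text does not say how the ASR accuracy figure was measured. One common convention for Chinese speech recognition is character accuracy, defined as one minus the character error rate, and the sketch below computes it with a standard Levenshtein edit distance. The choice of metric and the function names are assumptions, not details from the patent.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_accuracy(ref: str, hyp: str) -> float:
    """Character accuracy = 1 - (edit distance / reference length), floored at 0."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


# One wrong character out of ten in the recognized text.
accuracy = char_accuracy("abcdefghij", "abcdefghiX")
```

Under this definition a transcript would meet a 90% threshold when at most one character in ten needs to be inserted, deleted or substituted to recover the reference.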

For the specific implementation of this embodiment, refer to the relevant descriptions in the foregoing embodiments; details are not repeated here.

It can be understood that the same or similar parts of the above embodiments may be cross-referenced, and content not described in detail in one embodiment may be found in the same or similar content of other embodiments.

It should be noted that, in the description of the present invention, the terms "first", "second", etc. are used for descriptive purposes only and should not be construed as indicating or implying relative importance. In addition, unless otherwise specified, "a plurality of" means at least two.

Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment or portion of code comprising one or more executable instructions for implementing a specific logical function or step of the process. The scope of the preferred embodiments of the present invention includes alternative implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as will be understood by those skilled in the art to which the embodiments of the present invention belong.

It should be understood that the various parts of the present invention may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and so on.

Those of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, may each exist physically on their own, or two or more units may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.

In this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

With the method, device, processor and computer-readable storage medium of the present invention for manual labeling processing of interview text information based on the ICD-10 depression diagnosis and treatment standard, each labeling result needs to be displayed visually in real time, and each labeling operation needs to support selection from a large number of entity types. Relation labeling between two entities and event extraction among multiple entities are supported, and nested entity labels are visualized in real time. Text can be labeled iteratively: the labels on already-labeled text are recognized and imported into the database. This optimizes the user's labeling experience and improves labeling efficiency. To improve the accuracy of the labeling results, review and editing steps are added, and the system can be deployed and run in a variety of system environments.

In this specification, the invention has been described with reference to specific embodiments thereof. However, it will be evident that various modifications and changes can still be made without departing from the spirit and scope of the invention. Accordingly, the specification and drawings are to be regarded as illustrative rather than restrictive.

Claims (8)

1. A method for manual labeling processing based on ICD-10 depression diagnosis and treatment standard interview text information, characterized in that the method comprises the following steps:
(1) constructing a domain dictionary by statistical exact matching against manual labels, and labeling a named entity recognition corpus of Chinese electronic medical records according to a defined labeling specification;
(2) performing manual labeling and automatic labeling at the same time to identify the named entities of the Chinese electronic medical records.
2. The method for manual labeling processing based on ICD-10 depression diagnosis and treatment standard interview text information according to claim 1, wherein constructing the domain dictionary in step (1) is specifically:
obtaining keywords from the Chinese electronic medical records, or obtaining dictionary keywords from external professional resources, to construct the domain dictionary.
3. The method for manual labeling processing based on ICD-10 depression diagnosis and treatment standard interview text information according to claim 1, wherein the manual labeling in step (2) specifically comprises the following steps:
(2.1) logging in to an intelligent depression diagnosis and case generation system and filling in the patient's inquiry information;
(2.2) uploading the recording file;
(2.3) performing speech recognition on the recording to convert the recording file into text content;
(2.4) labeling the text content;
(2.5) generating a diagnosis report.
4. The method for manual labeling processing based on ICD-10 depression diagnosis and treatment standard interview text information according to claim 1, wherein the deep learning in step (2) is specifically:
pre-training word embeddings on a domain corpus, and applying domain-specific preprocessing to the deep-learning-based entity recognition model.
5. The method for manual labeling processing based on ICD-10 depression diagnosis and treatment standard interview text information according to claim 1, wherein the method further comprises the following step:
(3) constructing a manual labeling model, identifying the medical entities of the interview electronic medical record through the manual labeling model, and displaying and analyzing the identification results for the medical entities of the psychiatric medical record.
6. A device for manual labeling processing based on ICD-10 depression diagnosis and treatment standard interview text information, characterized in that the device comprises:
a processor configured to execute computer-executable instructions;
a memory storing one or more computer-executable instructions that, when executed by the processor, perform the steps of the method for manual labeling processing based on ICD-10 depression diagnosis and treatment standard interview text information according to any one of claims 1 to 5.
7. A processor for manual labeling processing based on ICD-10 depression diagnosis and treatment standard interview text information, characterized in that the processor is configured to execute computer-executable instructions which, when executed by the processor, implement the steps of the method for manual labeling processing based on ICD-10 depression diagnosis and treatment standard interview text information according to any one of claims 1 to 5.
8. A computer-readable storage medium, characterized in that a computer program is stored thereon, the computer program being executable by a processor to perform the steps of the method for manual labeling processing based on ICD-10 depression diagnosis and treatment standard interview text information according to any one of claims 1 to 5.
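
As a non-limiting illustration of claims 1 and 2, the sketch below builds a domain dictionary from manual labels by counting and then labels new text automatically by exact matching. It is one possible reading under stated assumptions, not the claimed implementation: the function names `build_domain_dictionary` and `auto_label`, the `min_count` threshold, the longest-match-first rule, the labels `Symptom` and `Duration`, and the sample sentence are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def build_domain_dictionary(
    manual_labels: Iterable[Tuple[str, str]], min_count: int = 1
) -> Dict[str, str]:
    """Map each manually labeled surface string to its most frequent type.

    manual_labels holds (surface, entity_type) pairs taken from hand-labeled
    text. Surfaces whose top type was seen fewer than min_count times are
    dropped (the threshold is an assumption of this sketch).
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for surface, entity_type in manual_labels:
        if surface:  # ignore empty strings so matching always advances
            counts[surface][entity_type] += 1
    dictionary = {}
    for surface, type_counts in counts.items():
        entity_type, n = type_counts.most_common(1)[0]
        if n >= min_count:
            dictionary[surface] = entity_type
    return dictionary


def auto_label(text: str, dictionary: Dict[str, str]) -> List[Tuple[int, int, str]]:
    """Label text by exact dictionary matching, longest term first.

    Returns (start, end, entity_type) spans with an exclusive end offset.
    """
    terms = sorted((t for t in dictionary if t), key=len, reverse=True)
    spans = []
    i = 0
    while i < len(text):
        for term in terms:
            if text.startswith(term, i):
                spans.append((i, i + len(term), dictionary[term]))
                i += len(term)
                break
        else:
            i += 1  # no dictionary term starts here; move one character on
    return spans


# Hypothetical manual labels and interview text, for illustration only.
manual = [
    ("情绪低落", "Symptom"),
    ("情绪低落", "Symptom"),
    ("失眠", "Symptom"),
    ("两周", "Duration"),
]
domain_dict = build_domain_dictionary(manual)
spans = auto_label("患者情绪低落两周，伴失眠", domain_dict)
```

Spans produced this way can be shown to the annotator as suggestions, so that manual and automatic labeling proceed side by side as in step (2).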

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210125772.2A CN114464283A (en) 2022-02-10 2022-02-10 Manual labeling processing method, device, processor and storage medium based on ICD-10 depression diagnosis and treatment standard interview text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210125772.2A CN114464283A (en) 2022-02-10 2022-02-10 Manual labeling processing method, device, processor and storage medium based on ICD-10 depression diagnosis and treatment standard interview text

Publications (1)

Publication Number Publication Date
CN114464283A (en) 2022-05-10

Family

ID=81414421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210125772.2A Pending CN114464283A (en) 2022-02-10 2022-02-10 Manual labeling processing method, device, processor and storage medium based on ICD-10 depression diagnosis and treatment standard interview text

Country Status (1)

Country Link
CN (1) CN114464283A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117033660A (en) * 2023-08-12 2023-11-10 安徽理工大学 A domain dictionary construction method for behavioral characteristics of depression

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107527073A (en) * 2017-09-05 2017-12-29 中南大学 The recognition methods of entity is named in electronic health record
CN108628824A (en) * 2018-04-08 2018-10-09 上海熙业信息科技有限公司 A kind of entity recognition method based on Chinese electronic health record
CN111881294A (en) * 2020-07-30 2020-11-03 本识科技(深圳)有限公司 Corpus labeling system, corpus labeling method and storage medium
CN112001177A (en) * 2020-08-24 2020-11-27 浪潮云信息技术股份公司 Electronic medical record named entity identification method and system integrating deep learning and rules
CN112712118A (en) * 2020-12-29 2021-04-27 银江股份有限公司 Medical text data oriented filtering method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
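Gong Lejun (龚乐君) et al.: "Manual labeling processing method based on ICD-10 depression diagnosis and treatment standard interview text" (基于ICD-10抑郁症诊疗标准访谈文本的手工标注处理方法), Chinese Journal of Engineering (工程科学学报), vol. 42, no. 04, 30 April 2020 (2020-04-30), pages 1 *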

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220510