CN116738998A - Medical dialogue multi-granularity semantic annotation system and method based on Web - Google Patents


Info

Publication number
CN116738998A
Authority
CN
China
Prior art keywords
labeling
dialogue
module
sentence
annotation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310462367.4A
Other languages
Chinese (zh)
Inventor
胡玥
张兴盛
梅明阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS
Priority to CN202310462367.4A
Publication of CN116738998A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H80/00 ICT specially adapted for facilitating communication between medical practitioners or patients, e.g. for collaborative diagnosis, therapy or health monitoring
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention relates to a Web-based multi-granularity semantic annotation system and method for medical dialogues. The system comprises a file management module, a dialogue display module, a sentence module and an annotation module. The file management module manages files; the dialogue display module displays the dialogue sentences together with the role of each speaker; the sentence module displays the sentence currently selected by the annotator for annotation, together with its source; and the annotation module provides annotation functions configured according to the designed annotation specification and annotates with a multi-level annotation framework. The system is designed around the data characteristics and application requirements of the medical dialogue field to be efficient and simple to use. It supports display of multi-turn, multi-role data, free selection of text spans at multiple granularities, and hierarchical annotation of complex semantic data, and it produces annotations that are more standardized and complete. The annotation workflow designed by the invention reduces the difficulty of data annotation and improves annotation efficiency.

Description

Medical dialogue multi-granularity semantic annotation system and method based on Web
Technical Field
The invention belongs to the field of information technology, relates to text annotation and offline manual dialogue annotation techniques, and in particular relates to a Web-based multi-granularity semantic annotation system and method for medical dialogues.
Background
With the rapid development of artificial intelligence in recent years, the dialogue and question-answering capabilities of natural language processing have matured, and conversational AI has gradually become a focus in the field of intelligent speech and semantics. Conversational AI uses speech and semantic technology to understand human intent so that it can execute tasks or give answers. Combined with various industries, conversational AI has given rise to many new business needs, and online medical consultation platforms are one of them. As these platforms develop rapidly, such applications show great potential for improving the quality of medical services while reducing cost.
Typically, in an online medical consultation, the patient first provides a brief summary of his or her health condition (the self-report), and a doctor is then assigned to communicate with the patient and learn more about the condition. After sufficient inquiry, the doctor can make a diagnosis and give further medical advice.
Recently, researchers have focused on developing automated methods to support online medical consultation services. Research topics include medical named entity recognition, drug recommendation, text-based automatic diagnosis, health question answering, and medical report generation. Despite progress in supporting automated medical consultation from these different angles, a large gap remains between existing work and practical application, mainly because there is no large medical dialogue corpus with fine-grained semantic annotation that can serve the needs of multiple automated consultation services. Compared with other natural language processing tasks, the medical field involves more specialized knowledge and its dialogues are highly colloquial, so the annotation specification followed during semantic annotation is more complex. For example, dialogue state tracking in ordinary task-oriented dialogue only requires "slot-value" annotation, whereas the medical field usually involves multi-level semantic annotation such as "stage/service-slot-value", and additionally requires annotation of conditional semantics and negative semantics. There is therefore a need for a data annotation system that meets the semantic annotation requirements of complex medical dialogues. The system should also be simple and easy to use, which lowers the learning cost for annotators, lets them become productive sooner, reduces operating errors, and improves both data quality and annotation speed.
Conventional semantic annotation often relies on editors such as Excel or Notepad++. The annotator opens the text to be annotated in the editor, locates the words to be annotated in the original text according to the annotation specification, enters the corresponding labels, annotates each data item in turn, and finally saves the annotated file. Annotating with such an editor requires the annotator to type both the position of the text and its label by hand, which is time-consuming, tiring, and error-prone. This approach is also generally suitable only for simple annotation specifications; for a complex annotation scheme no clear annotation workflow can be defined, so the results become disordered and unusable.
With the development of deep learning in artificial intelligence, training neural network models usually requires large amounts of annotated data. Dedicated data annotation software has therefore appeared in recent years. Compared with a conventional text editor, such software provides annotation modules designed specifically for text data, which improves annotation efficiency and accuracy. However, developing the software is costly, adapting it to a changed annotation specification often takes considerable manpower, and its compatibility and portability are poor. Most current annotation software targets single-sentence text only, and little of it targets dialogue data. Compared with ordinary single-sentence text, dialogue data samples additionally require handling of dialogue roles, selection of specific sentences, and similar needs. Moreover, annotation specifications for medical dialogue semantics are usually revised quickly, so the annotation system must adapt rapidly to a new specification in order to keep large-scale annotation going. For these reasons, existing data annotation software can hardly meet the annotation needs of the medical dialogue field.
In summary, existing annotation systems adapt poorly to dialogue data samples, are not designed around an analysis of medical dialogue data, lack a clearly designed annotation workflow, and cannot meet the requirements of complex annotation specifications. Current text annotation systems rely mainly either on conventional text editors or on purpose-built annotation software, and they have the following main problems in use:
1. Annotation systems based on conventional text editors support only simple text annotation. Because they are not adapted to a specific annotation workflow, wrong and missing annotations are common, and complex annotation specifications are difficult to satisfy.
2. Annotation systems based on text annotation software are not designed for dialogue data samples. They do not accommodate features such as dialogue roles and selection among multiple dialogue turns, and cannot present multi-turn, multi-role dialogue data clearly to the annotator. Existing software is also costly to develop and has poor compatibility and portability.
3. Existing medical dialogue annotation systems usually follow the annotation specifications of conventional task-oriented dialogue. Their form is limited, their annotation scheme lacks hierarchical and fine-grained annotation, they omit conditional and negative semantics, and they handle multi-level and complex labels poorly.
Disclosure of Invention
In view of the above problems, the invention provides a Web-based multi-granularity semantic annotation system and method for medical dialogues. The data annotation system is designed and developed for the Web, is mainly intended for multi-turn, multi-role dialogue text, and includes a hierarchical, multi-granularity annotation module that lets annotators annotate medical dialogue data comprehensively. The system remains simple and easy to use while meeting the requirements of complex semantic annotation specifications.
The technical solution adopted by the invention is as follows:
A Web-based multi-granularity semantic annotation system for medical dialogues comprises a file management module, a dialogue display module, a sentence module and an annotation module;
the file management module is used for managing files;
the dialogue display module is used for displaying dialogue sentences and the role information of the corresponding speakers;
the sentence module is used for displaying the sentence currently selected by the annotator for annotation, together with its source;
the annotation module is used for providing annotation functions according to the designed annotation specification, and for annotating with a multi-level annotation framework of "intent-stage-key-slot value".
Further, the file management module comprises a current file name display area, an "open dialogue file" button, an "open annotated file" button and a "download annotated file" button. The "open dialogue file" button invokes the file manager of the annotator's local system so that the annotator can select a file to annotate; the opened file is displayed in the dialogue display module. The "open annotated file" button lets the annotator pause and save intermediate annotation results, and later reopen annotated data to review, modify and adjust it. After the annotator finishes annotating a data sample, the "download annotated file" button saves the annotated semantic information to a JSON file for subsequent processing.
Further, the dialogue display module displays dialogue data interactively, showing each dialogue role and its sentence in order. The dialogue display module also provides a sentence selection function: the annotator can freely select a sentence to annotate, the selected sentence is highlighted with a dashed frame, and it is simultaneously shown in the sentence module so that the annotator can annotate it. The dialogue display module further indicates whether each sentence has been annotated: once the annotator completes the annotation of a sentence in the annotation module, the sentence is shown as annotated in the display module, which helps the annotator track progress and prevents missed annotations.
Further, within the sentence displayed by the sentence module, the annotator can freely select the range of text to annotate, thereby choosing the annotation granularity, and then click "add annotation panel" to enter the annotation module and annotate the semantics of the selected text. For the same dialogue sentence, the annotator can select and annotate different text ranges multiple times. The sentence module provides "previous" and "next" buttons so that the annotator can conveniently switch the sentence to be annotated.
Further, the annotation module comprises two function options, State and Precondition, used for annotating negative semantics and conditional semantics respectively. The annotation module is extensible: different annotation schemes can be configured for different dialogue roles to meet a variety of semantic annotation needs. The annotation module also checks annotation completeness, and issues a reminder if the current annotation is semantically incomplete or does not conform to the specification.
Further, the file management module is located at the top of the system page, the dialogue display module on the left side, the sentence module at the upper right, and the annotation module at the lower right.
A Web-based multi-granularity semantic annotation method for medical dialogues, which uses the above system for annotation, comprises the following steps:
clicking the "open dialogue file" option in the file management module and selecting a local file to be annotated;
displaying the opened file in the dialogue display module, where the annotator clicks each dialogue sentence in turn to make it the sentence to be annotated and annotates its semantics;
selecting, by the annotator, the word, phrase or clause to be annotated and clicking "add panel" at the prompt for the selected text, whereupon the annotation panel corresponding to the speaker's role is expanded so that the annotator can annotate the semantics of the selected text in the annotation module;
generating, by the annotation module, a hierarchical annotation framework according to the preset annotation specification, so that the annotator can annotate the selected text hierarchically according to the knowledge acquired in prior training.
Further, after the annotation of one text range is completed, the annotator can select another text range in the sentence module and annotate it, so that multiple semantic contents of different granularities in the same dialogue sentence are annotated.
Further, for multi-turn dialogue data, after finishing the annotation of one dialogue sentence the annotator can select the next one, either by clicking it in the dialogue display module or by clicking "next" in the sentence module, which automatically selects the following sentence. After all sentences of the multi-turn dialogue are annotated, the annotator can click "download annotated file" in the file management module to save the result. The annotation system automatically checks whether every sentence has been annotated; if unannotated sentences exist it issues a prompt to prevent missed annotations, and if all sentences are annotated it saves the annotated data in the local directory from which the file to be annotated was opened.
The beneficial effects and advantages of the invention are as follows: the annotation system is designed and developed around the data characteristics and application requirements of the medical dialogue field to be efficient and simple to use; it supports display of multi-turn, multi-role data, free selection of text spans at multiple granularities, and hierarchical annotation of complex semantic data, and it produces annotations that are more standardized and complete. The annotation workflow designed by the invention reduces the difficulty of data annotation and improves annotation efficiency.
Drawings
FIG. 1 shows the operation page of the medical dialogue annotation system.
FIG. 2 shows the display effect of the dialogue display module of the medical dialogue annotation system.
FIG. 3 is a hierarchical association diagram of the modules of the multi-granularity semantic annotation system.
FIG. 4 is a schematic diagram of the multi-turn, multi-granularity semantic annotation process.
Detailed Description
The invention is described in further detail below with reference to examples and the drawings, so that its objects, features and advantages can be understood more clearly.
The medical dialogue multi-granularity semantic annotation system is built with HTML, CSS and the Vue framework, and mainly comprises a file management module, a dialogue display module, a sentence module and an annotation module. A simple, reasonable and efficient annotation workflow is designed around the characteristics of the medical dialogue field, and the annotation system is developed according to that workflow. The page layout of the system is shown in FIG. 1.
1. File management module:
The file management module is located at the top of the system page (module a in FIG. 1), and the annotator uses it to manage files. The module includes a current file name display area, an "open dialogue file" button, an "open annotated file" button and a "download annotated file" button, where:
1) Open dialogue file: this button invokes the file manager of the annotator's local system so that the annotator can select a file to annotate; the opened file is displayed in the dialogue display module;
2) Open annotated file: medical dialogue samples usually contain many dialogue turns, so this button lets the annotator pause and save intermediate annotation results, and later reopen the annotated data to review, modify and adjust it;
3) Download annotated file: after the annotator finishes annotating a data sample, the annotated semantic information is saved to a JSON file for subsequent processing.
2. Dialogue display module:
The dialogue display module is located on the left side of the system page (module b in FIG. 1) and is mainly used for displaying dialogue sentences and the role information of the corresponding speakers. It displays dialogue data interactively, showing each dialogue role (doctor or patient) and its sentence in order. The module also provides a sentence selection function: the annotator can freely select a sentence to annotate, the selected sentence is highlighted with a dashed frame in the display module, and it is simultaneously shown in the sentence module so that the annotator can annotate it. The module further indicates whether each sentence has been annotated: once the annotator completes the annotation of a sentence in the annotation module, the sentence is shown as annotated in the display module, which helps the annotator track progress and prevents missed annotations. FIG. 2 shows the display effect of the dialogue display module of the medical dialogue annotation system.
3. Sentence module:
The sentence module is located at the upper right of the system page (module c in FIG. 1) and is mainly used for displaying the sentence currently selected by the annotator, together with its source. Within the displayed sentence, the annotator can freely select the range of text to annotate, thereby choosing the annotation granularity, and then click "add annotation panel" to enter the annotation module and annotate the semantics of the selected text. For the same dialogue sentence, the annotator can select and annotate different text ranges multiple times. The module also provides "previous" and "next" buttons so that the annotator can conveniently switch the sentence to be annotated.
4. Annotation module:
The annotation module is located at the lower right of the system page (module d in FIG. 1). It provides annotation functions according to the designed annotation specification. The current system is configured according to a hierarchical, multi-granularity semantic annotation specification designed around the characteristics of the medical dialogue field; it mainly comprises a multi-level annotation framework of "intent-stage-key-slot value", and adds two function options, State and Precondition, for annotating negative semantics and conditional semantics respectively. The annotation module is highly extensible: different annotation schemes can be configured for different dialogue roles and the annotation specification can be adjusted quickly, meeting a variety of semantic annotation needs. The module also checks annotation completeness; if the current annotation is semantically incomplete or does not conform to the specification, the system issues a corresponding reminder.
The multi-level annotation framework of "intent-stage-key-slot value" is specified as follows:
1) Intent: annotates the speaking intent of the text segment to be annotated, that is, its syntactic information. There are five candidate intents:
Inform: generally a declarative sentence, in which the patient actively tells the doctor about his or her symptoms, or the doctor actively tells the patient a diagnosis or advice;
Inquire: used for asking the other party, generally a question, in which the patient asks the doctor about his or her own condition, or the doctor asks the patient about medical history;
Reply: used for answering the other party's question;
Chat: used for identifying dialogue unrelated to the medical consultation and inquiry process;
Other: used for identifying situations other than the above four types.
2) Stage: annotates semantic content at the dialogue level by determining which stage of the dialogue flow the selected text segment belongs to. Four stages are defined: the disease diagnosis stage, in which the patient reports symptoms and the doctor makes a diagnosis; the history inquiry stage, in which the doctor asks about past medical history; the treatment stage, in which the doctor gives a treatment plan and lifestyle advice based on the patient's description, or the two parties discuss a particular treatment plan or piece of lifestyle advice; and the other stage, covering anything that does not belong to the three stages above.
3) Key: determines the category of the content in the selected text segment. The disease diagnosis stage has six categories: basic information, symptom, science popularization, disease, emotional interaction, and other. The history inquiry stage has ten categories: disease, emotional interaction, medication, examination, surgery, allergy history, offline consultation, procedure, science popularization, and other. The treatment stage has ten categories: emotional interaction, medication, examination, surgery, offline consultation, operation, life advice, reply, science popularization, and other.
4) Slot-slot value: annotates the specific content. Corresponding slots are defined for each category, and the annotation of specific content is completed by filling the corresponding content, namely the slot values, into the slots.
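The four-level framework above can be sketched as a small data structure with a consistency check. This is only an illustrative sketch in Python, not the system's Vue implementation: the intent, stage and category names are English renderings of the lists in the description, while the identifiers `INTENTS`, `STAGE_KEYS`, `Annotation`, `problems` and the slot name `symptom name` are hypothetical.

```python
from dataclasses import dataclass, field

# Five candidate intents from the description.
INTENTS = ("inform", "inquire", "reply", "chat", "other")

# Stage -> key categories, following the lists in the description.
STAGE_KEYS = {
    "diagnosis": ("basic information", "symptom", "science popularization",
                  "disease", "emotional interaction", "other"),
    "history": ("disease", "emotional interaction", "medication", "examination",
                "surgery", "allergy history", "offline consultation",
                "procedure", "science popularization", "other"),
    "treatment": ("emotional interaction", "medication", "examination", "surgery",
                  "offline consultation", "operation", "life advice", "reply",
                  "science popularization", "other"),
    "other": (),  # the description lists no categories for this stage
}


@dataclass
class Annotation:
    intent: str
    stage: str
    key: str
    slots: dict = field(default_factory=dict)  # slot name -> slot value

    def problems(self):
        """Return a list of problems; an empty list means the labels are consistent."""
        found = []
        if self.intent not in INTENTS:
            found.append(f"unknown intent: {self.intent}")
        if self.stage not in STAGE_KEYS:
            found.append(f"unknown stage: {self.stage}")
        elif STAGE_KEYS[self.stage] and self.key not in STAGE_KEYS[self.stage]:
            found.append(f"key {self.key!r} is not defined for stage {self.stage!r}")
        if not self.slots:
            found.append("no slot value filled in")
        return found


ok = Annotation("inform", "diagnosis", "symptom", {"symptom name": "headache"})
bad = Annotation("inform", "diagnosis", "life advice", {})
print(ok.problems())   # consistent annotation: empty list
print(bad.problems())  # wrong key for the stage, and no slot value
```

Keeping the stage-to-category table as plain data is one way to reflect the stated goal that the annotation specification can be adjusted quickly.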
The two function options State and Precondition are specified as follows:
1) State: used for judging whether the specific content of the filled slot-slot value is affirmed. It has three options, Yes, No and Uncertain, which are chosen in two kinds of cases:
(a) For fact-type information: mark Yes if the fact is determined to exist, No if it is determined not to exist, and Uncertain if its existence cannot be determined;
(b) For knowledge-type information: mark Yes if the knowledge is determined to be correct, No if it is determined to be incorrect, and Uncertain if its correctness cannot be determined.
2) Precondition: used for annotating cases in which the utterance contains a precondition; the precondition is added after the slot-slot value has been annotated. For example, in "Patients allergic to medicine X must not take it; taking medicine X may cause allergic reaction Y", the cause of symptom Y is "taking medicine X", and the precondition for this information is "allergic to medicine X".
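As a rough illustration of how the two options could be attached to a filled slot, the sketch below models the three State values and an optional Precondition. It is written in Python for brevity and is an assumption throughout: the class and field names, the slot names `cause` and `symptom`, and the default of Yes are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, asdict
from enum import Enum
from typing import Optional


class State(Enum):
    YES = "Yes"              # the fact exists / the knowledge is correct
    NO = "No"                # the fact does not exist / the knowledge is incorrect
    UNCERTAIN = "Uncertain"  # cannot be determined


@dataclass
class SlotAnnotation:
    slot: str
    value: str
    state: State = State.YES            # assumed default, for illustration only
    precondition: Optional[str] = None  # filled in only when the utterance states one

    def to_record(self):
        """Return a plain dict suitable for JSON serialization."""
        record = asdict(self)
        record["state"] = self.state.value
        return record


# The medicine X example from the description: the cause holds under a precondition.
example = SlotAnnotation(slot="cause", value="taking medicine X",
                         precondition="allergic to medicine X")
# Negative semantics: the patient denies having a fever.
denied = SlotAnnotation(slot="symptom", value="fever", state=State.NO)

print(example.to_record())
print(denied.to_record())
```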
The hierarchical logic of the main functions of the system is shown in FIG. 3, which shows the main functions of each of the four modules.
Based on the functions of the four modules, the system provides a reasonable and efficient annotation method, whose flow is shown in FIG. 4. A complete annotation pass is described below from the annotator's point of view. After opening the medical dialogue annotation system, the annotator proceeds as follows:
Step 1: open the data file to be annotated. The annotator clicks the "open dialogue file" option in the file management module and selects a local file to annotate. The file must be arranged in the data format required by the system. The example here is a JSON data sample file, whose format requirements mainly concern the data keys (e.g. "dialog", "turn", "role", "sentence") and the hierarchical logic of the data. The format requirements can be changed quickly as needed.
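A minimal sketch of what such an input file and its validation might look like is given below. Only the key names "dialog", "turn", "role" and "sentence" come from the description; the nesting (a "dialog" list of utterance objects), the sample sentences, and the names `SAMPLE`, `REQUIRED_KEYS` and `load_dialogue` are assumptions made for illustration.

```python
import json

# Hypothetical sample file: the key names follow the description, the layout is assumed.
SAMPLE = """
{
  "dialog": [
    {"turn": 1, "role": "patient", "sentence": "I have had a headache for three days."},
    {"turn": 2, "role": "doctor", "sentence": "Do you also have a fever?"}
  ]
}
"""

REQUIRED_KEYS = ("turn", "role", "sentence")


def load_dialogue(text):
    """Parse a dialogue file and return its utterances ordered by turn."""
    data = json.loads(text)
    utterances = data["dialog"]
    for item in utterances:
        missing = [key for key in REQUIRED_KEYS if key not in item]
        if missing:
            raise ValueError(f"utterance is missing keys: {missing}")
    return sorted(utterances, key=lambda item: item["turn"])


# Print the dialogue in display order: turn number, speaker role, sentence.
for item in load_dialogue(SAMPLE):
    print(f'{item["turn"]:>2} [{item["role"]}] {item["sentence"]}')
```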
Step 2: select the sentence to be annotated. The opened file is displayed in the dialogue display module. The annotator clicks each dialogue sentence in turn to annotate its semantics, and each clicked sentence is highlighted with a dashed frame in the dialogue display module.
Step 3: define the annotation range. The dialogue sentence selected in the previous step is displayed as text in the sentence module, and the interface shows its source (patient or doctor) and the original sentence. Because a dialogue sentence generally contains several semantic segments, the system lets the annotator freely delimit the annotation range and select spans of different granularities, which realizes multi-granularity semantic annotation. The annotator selects the word, phrase or clause to be annotated, and the system automatically computes the position of the selection within the sentence, that is, its start and end positions in the original sentence. Once the text has been selected, the system prompts the annotator to click "add panel" and expands the annotation panel corresponding to the speaker's role, so that the annotator can annotate the semantics of the selected text in the annotation module.
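The position calculation in this step can be illustrated with a short sketch. In a browser the offsets would normally come from the text selection itself; the Python version below instead searches for the selected string, which is a simplification. The function name `locate_span`, the `search_from` parameter and the end-exclusive convention are assumptions, since the description does not say whether the end position is inclusive.

```python
def locate_span(sentence, selected, search_from=0):
    """Return (start, end) character offsets of `selected` in `sentence`.

    `end` is exclusive, so sentence[start:end] == selected.
    `search_from` lets the caller skip earlier occurrences of the same text.
    """
    if not selected:
        raise ValueError("empty selection")
    start = sentence.find(selected, search_from)
    if start == -1:
        raise ValueError("selected text does not occur in the sentence")
    return start, start + len(selected)


sentence = "I have had a headache for three days."

# A word-level span and a phrase-level span in the same sentence (two granularities).
word_start, word_end = locate_span(sentence, "headache")
phrase_start, phrase_end = locate_span(sentence, "a headache for three days")

print(sentence[word_start:word_end])
print(sentence[phrase_start:phrase_end])
```

Storing offsets rather than only the selected string keeps repeated words in one sentence distinguishable.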
The annotation panel can be preset per role. For example, for the role "doctor" the panel includes key information, severity, follow-up handling and other items; for the role "patient" the panel includes subject, purpose, current state and other items.
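One simple way to express such role-specific presets is a lookup table from role to panel fields, sketched below in Python. The field names are the examples given in the text and are not exhaustive (the description says each panel includes other items as well); `PANELS` and `panel_for` are hypothetical names.

```python
# Role -> fields shown on the annotation panel (partial, from the examples in the text).
PANELS = {
    "doctor": ["key information", "severity", "follow-up handling"],
    "patient": ["subject", "purpose", "current state"],
}


def panel_for(role):
    """Return a copy of the field list configured for the given speaker role."""
    try:
        return list(PANELS[role])
    except KeyError:
        raise ValueError(f"no annotation panel configured for role {role!r}") from None


print(panel_for("doctor"))
print(panel_for("patient"))
```

Under this layout, supporting a new role or a revised specification only requires editing the table.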
Step 4: Hierarchical semantic annotation. After the annotator clicks "add panel" for the selected annotation range, the system generates a hierarchical annotation frame in the annotation module according to the preset annotation specification, and the annotator annotates the selected text level by level according to the knowledge gained in prior training.
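One way to picture role-dependent panels is a template table keyed by role. The field names below are the examples given in the description for "doctor" and "patient"; the flat field list, the name `new_panel` and the returned dictionary layout are assumptions, because the description does not spell out how the hierarchical frame is nested.

```python
# Field names follow the examples in the description; the flat layout is an assumption.
PANEL_TEMPLATES: dict[str, list[str]] = {
    "doctor": ["key information", "severity", "follow-up processing"],
    "patient": ["subject", "purpose", "current state"],
}


def new_panel(role: str, text: str, start: int, end: int) -> dict:
    """Create an empty annotation panel for the selected text, based on the speaker's role."""
    if role not in PANEL_TEMPLATES:
        raise ValueError(f"no annotation panel is preset for role {role!r}")
    return {
        "role": role,
        "text": text,
        "range": [start, end],
        # Every field starts empty and is filled in by the annotator.
        "fields": {name: None for name in PANEL_TEMPLATES[role]},
    }


panel = new_panel("doctor", "twice a day", 15, 26)
```

Adjusting the annotation specification then amounts to editing the template table rather than the panel logic.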
Step 5: Select a different text range. In medical dialogue data, a dialogue sentence usually contains several distinct semantic segments, so different text ranges must be selected and annotated at different granularities. After finishing one text range, the annotator can therefore select another text range in the sentence module and annotate it. In other words, steps 3 and 4 are repeated to annotate multiple pieces of semantic content of different granularities in the same dialogue sentence.
Step 6: Move through the dialogue turns. For multi-turn dialogue data, after finishing one dialogue sentence the annotator selects the next one, either by clicking it in the dialogue display module or by clicking "next" in the sentence module, which selects the next dialogue sentence automatically. In other words, steps 2, 3, 4 and 5 are repeated to annotate the whole multi-turn dialogue.
Step 7: Save the completed annotation file. After the sentences of the multi-turn dialogue have been annotated, the annotator clicks "download annotated file" in the file management module to save the result. The annotation system automatically checks whether all sentences have been annotated: if any dialogue sentence is unannotated, the system issues a prompt to prevent missed annotations; if all sentences are complete, the system saves the annotated data in the local directory from which the file to be annotated was opened.
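The completeness check in step 7 can be sketched as below. It assumes that each sentence entry stores its annotations in a list under the key "actions" and that an empty or missing list means "not yet annotated"; the function names and the prompt wording are hypothetical.

```python
import json


def unannotated_indices(dialogue: list[dict]) -> list[int]:
    """Indices of sentences with no annotation (missing or empty "actions" list, by assumption)."""
    return [index for index, entry in enumerate(dialogue) if not entry.get("actions")]


def export_annotations(dialogue: list[dict]) -> tuple[bool, str]:
    """Return (True, JSON text) when every sentence is annotated, else (False, prompt)."""
    missing = unannotated_indices(dialogue)
    if missing:
        return False, f"{len(missing)} sentence(s) not annotated yet: {missing}"
    return True, json.dumps({"dialog": dialogue}, ensure_ascii=False, indent=2)


# Hypothetical dialogue: the second sentence has not been annotated.
dialogue = [
    {"role": "patient", "sentence": "I have a headache.", "actions": [{"text": "headache"}]},
    {"role": "doctor", "sentence": "Since when?", "actions": []},
]
ok, message = export_annotations(dialogue)
```

Returning the prompt instead of raising an error mirrors the described behavior, where the system reminds the annotator rather than aborting.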
The data format stored after annotating a specific JSON data sample file uses the following fields: actions holds the content annotated by the user; text is the word, phrase or clause selected by the user; range gives the start and end positions of the selected part in the original sentence; content is the dialogue act, such as inquiry or casual chat; slot is the slot, such as name, cause or usage; and slot_value is the value corresponding to the slot.
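A record using these field names might be assembled as in the following sketch. The field names are the ones listed in the description; how they nest inside a sentence entry, the helper name `make_action` and all example values are assumptions for illustration only.

```python
import json


def make_action(sentence: str, start: int, end: int,
                content: str, slot: str, slot_value: str) -> dict:
    """Build one annotated action; the text is derived from the range so the two cannot disagree."""
    if not 0 <= start < end <= len(sentence):
        raise ValueError("range must lie inside the sentence")
    return {
        "text": sentence[start:end],
        "range": [start, end],
        "content": content,
        "slot": slot,
        "slot_value": slot_value,
    }


# Hypothetical doctor sentence with one annotated fragment.
sentence = "What caused the headache?"
start = sentence.find("headache")
record = {
    "role": "doctor",
    "sentence": sentence,
    "actions": [make_action(sentence, start, start + len("headache"),
                            "inquiry", "cause", "headache")],
}
serialized = json.dumps(record, ensure_ascii=False, indent=2)
```

Because "actions" is a list, several fragments of different granularity from the same sentence can be stored side by side.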
In summary, the invention provides a Web-based multi-granularity semantic annotation method and system for medical dialogue. "Web-based" here means specifically that the corresponding web pages are built with HTML, CSS and the Vue framework; the front end is responsible for displaying dialogue information, and the back end is responsible for processing annotation information.
The key points of the invention include:
1. The dialogue display module designed for the system clearly displays data samples of multi-turn, multi-role dialogues, and provides annotation progress markers and automatic detection of annotation completion, which prevents missed annotations.
2. The sentence module developed for the system lets the annotator freely divide a sentence into different text fragments to achieve multi-granularity semantic annotation, so that the semantic information of the text can be annotated completely and annotation quality improves.
3. The annotation module developed for the system supports hierarchical, complex semantic annotation specifications, can load different annotation templates for different dialogue roles, and can be adapted quickly and conveniently when the annotation specification changes.
4. The design is sound and efficient: the four main modules break a complex semantic annotation task into simpler ones, so annotators can annotate conveniently and quickly and the annotation efficiency for medical dialogue data improves. The system is also highly portable across data text formats.
Another embodiment of the invention provides a computer device (computer, server, smartphone, etc.) comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for implementing the functions of the respective modules in the system of the invention.
Another embodiment of the present invention provides a computer readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) storing a computer program which, when executed by a computer, performs the corresponding functions of the various modules in the system of the present invention.
The above-disclosed embodiments of the present invention are intended to aid in understanding the contents of the present invention and to enable the same to be carried into practice, and it will be understood by those of ordinary skill in the art that various alternatives, variations and modifications are possible without departing from the spirit and scope of the invention. The invention should not be limited to what has been disclosed in the examples of the specification, but rather by the scope of the invention as defined in the claims.

Claims (10)

1. A Web-based multi-granularity semantic annotation system for medical dialogue, characterized by comprising a file management module, a dialogue display module, a sentence module and an annotation module;
the file management module is used for managing files;
the dialogue display module is used for displaying dialogue sentences and the role information of the corresponding speakers;
the sentence module is used for displaying the sentence to be annotated that the annotator has currently selected, together with its source;
the annotation module is used for providing annotation functions according to the designed annotation specification, and annotation is performed with a multi-level annotation frame containing intention-stage-key-slot values.
2. The system of claim 1, wherein the file management module comprises a current file name display area, an open dialogue file button, an open annotated file button, and a download annotated file button; the open dialogue file button calls the local file management system of the annotator so that the annotator can conveniently select a file to be annotated, and the opened file is displayed mainly in the dialogue display module; the open annotated file button allows the annotator to pause and save interim annotation results, and to review, modify and adjust annotated data; after the annotator completes the annotation of the data samples, the download annotated file button saves the annotated semantic information into a JSON file for convenient subsequent processing.
3. The system according to claim 1, wherein the dialogue display module displays dialogue-type data interactively, presenting the dialogue roles and their corresponding sentences in order; the dialogue display module further comprises a sentence selection function, whereby the annotator can freely select a sentence for annotation, the selected sentence is highlighted with a dotted-line box and is simultaneously displayed in the sentence module so that the annotator can conveniently annotate the current sentence; the dialogue display module further comprises a function for indicating whether a sentence has been annotated, whereby, once the annotator has finished annotating a sentence in the annotation module, the sentence is shown as annotated in the dialogue display module, helping the annotator track annotation progress and preventing missed annotations.
4. The system of claim 1, wherein, in the sentence displayed by the sentence module, the annotator can freely select the range of text to be annotated, thereby determining different annotation granularities, and, by clicking "add annotation panel", enters the annotation module to annotate the semantics of the currently selected text; for the same dialogue sentence, the annotator can select and annotate different text ranges multiple times; the sentence module is provided with buttons for selecting the previous sentence and the next sentence, so that the annotator can conveniently switch the sentence to be annotated.
5. The system of claim 1, wherein the annotation module comprises two functional options, State and Precondition, for annotating negative semantics and conditional semantics, respectively; the annotation module is extensible, so that different annotation specification systems can conveniently be set for different dialogue roles to meet the requirements of various kinds of semantic annotation; the annotation module further has an annotation completion detection function, which issues a corresponding reminder if the current annotation is semantically incomplete or does not conform to the specification.
6. The system of claim 1, wherein the file management module is located at the top of the system page, the dialogue display module is located on the left side of the system page, the sentence module is located on the upper right side of the system page, and the annotation module is located on the lower right side of the system page.
7. A Web-based multi-granularity semantic annotation method for medical dialogue, characterized in that annotation is performed with the system of any one of claims 1 to 6, and the method comprises the following steps:
clicking the option for opening a dialogue file in the file management module to select a file to be annotated from the local machine;
displaying the opened file to be annotated in the dialogue display module, wherein the annotator clicks each dialogue sentence in turn as the sentence to be annotated and annotates the semantics of each;
selecting, by the annotator, the word, phrase or clause to be annotated, and clicking "add panel" when prompted for the selected text, whereupon the annotation panel corresponding to the role is expanded so that the annotator can annotate the semantics of the selected text in the annotation module;
generating a hierarchical annotation frame through the annotation module according to a preset annotation specification, so that the annotator can perform hierarchical semantic annotation of the selected text according to the knowledge gained in prior training.
8. The method of claim 7, wherein, after the annotation of one text range is completed, the annotator can select another text range in the sentence module for annotation, so as to annotate multiple pieces of semantic content of different granularities in the same sentence.
9. The method of claim 7, wherein, for multi-turn dialogue data, after the annotator completes the annotation of one dialogue sentence, the annotator can select the next dialogue sentence for annotation, either by clicking it in the dialogue display module or by clicking "next" in the sentence module, which automatically selects the next dialogue sentence; after the sentences of the multi-turn dialogue have been annotated, the annotator can click "download annotated file" in the file management module to save the annotated file, and the annotation system automatically detects whether all sentences have been annotated; if an unannotated dialogue sentence exists, the annotation system issues a prompt so as to prevent missed annotations, and if all sentences have been annotated, the annotated data is saved in the local directory from which the file to be annotated was opened.
10. A computer device comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for implementing the functions of the file management module, the dialogue display module, the sentence module and the annotation module in the system as claimed in any one of claims 1 to 6.
CN202310462367.4A 2023-04-26 2023-04-26 Medical dialogue multi-granularity semantic annotation system and method based on Web Pending CN116738998A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310462367.4A CN116738998A (en) 2023-04-26 2023-04-26 Medical dialogue multi-granularity semantic annotation system and method based on Web

Publications (1)

Publication Number Publication Date
CN116738998A true CN116738998A (en) 2023-09-12

Family

ID=87908743

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310462367.4A Pending CN116738998A (en) 2023-04-26 2023-04-26 Medical dialogue multi-granularity semantic annotation system and method based on Web

Country Status (1)

Country Link
CN (1) CN116738998A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116992861A (en) * 2023-09-25 2023-11-03 四川健康久远科技有限公司 Intelligent medical service processing method and system based on data processing
CN116992861B (en) * 2023-09-25 2023-12-08 四川健康久远科技有限公司 Intelligent medical service processing method and system based on data processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination