CN116738998A - Medical dialogue multi-granularity semantic annotation system and method based on Web
- Publication number
- CN116738998A CN202310462367.4A
- Authority
- CN
- China
- Prior art keywords
- labeling
- dialogue
- module
- sentence
- annotation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H80/00—ICT specially adapted for facilitating communication between medical practitioners or patients, e.g. for collaborative diagnosis, therapy or health monitoring
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to a Web-based multi-granularity semantic annotation system and method for medical dialogue. The system comprises a file management module, a dialogue display module, a sentence module and an annotation module. The file management module is used for managing files; the dialogue display module is used for displaying dialogue sentences and the role information of the corresponding speakers; the sentence module is used for displaying the sentence currently selected by the annotator for annotation, together with its source; the annotation module provides annotation functions configured according to the designed annotation specification and uses a multi-level annotation framework. The invention designs and develops an efficient, simple annotation system tailored to the data characteristics and application requirements of the medical dialogue field. The system supports multi-turn, multi-role data display, free selection of text spans at multiple granularities, and hierarchical annotation of complex semantic data, and it makes annotation more standardized and complete. The annotation workflow designed by the invention reduces the difficulty of data annotation and improves annotation efficiency.
Description
Technical Field
The invention belongs to the field of information technology, relates to text annotation technology and offline manual dialogue annotation technology, and in particular relates to a Web-based multi-granularity semantic annotation system and method for medical dialogue.
Background
With the rapid development of artificial intelligence in recent years, the dialogue and question-answering capabilities of natural language processing have matured, and conversational AI has gradually become a focus in the field of intelligent speech and semantics. Conversational AI uses speech and semantic technology to understand human intent and then executes a task or gives an answer. Combined with various industries, conversational AI has given rise to many new business needs, and online medical consultation platforms are one of them. As these platforms develop rapidly, such applications show great potential for improving the quality of medical services while reducing cost.
Typically, in an online medical consultation, the patient first provides a brief summary of his or her health status, i.e., a self-report. A doctor is then assigned to communicate with the patient and learn more about the patient's health status. After sufficient inquiry, the doctor can make a diagnosis and provide further medical advice.
Recently, researchers have focused on developing automated methods to support online medical consultation services. Research topics include medical named entity recognition, drug recommendation, text-based automatic diagnosis, health question answering, and medical report generation. Although progress has been made in supporting automated medical consultation from these different angles, a large gap remains between existing work and practical application. The main reason is the lack of large medical dialogue corpora with fine-grained semantic annotation that can support the needs of multiple automated medical consultation services. Compared with other natural language processing tasks, the medical field involves more specialized knowledge and its dialogue is highly colloquial, so the annotation specification followed during semantic annotation is more complex. For example, dialogue state tracking in ordinary task-oriented dialogue only requires annotating 'slot-value' pairs, whereas the medical field usually involves multi-level semantic annotation such as 'stage/service-slot-value', and additionally requires annotating conditional semantics and negative semantics. There is therefore a need for a data annotation system that meets the semantic annotation requirements of complex medical dialogue. The system should also be simple and easy to use, which lowers the learning cost for annotators, lets them become productive quickly, reduces operating errors, and improves both data quality and annotation speed.
Conventional semantic annotation often relies on editors such as Excel and Notepad++. The annotator first opens the text to be annotated in the editor, then locates the words to be annotated in the original text according to the annotation specification and types in the corresponding labels, completes each data item in turn, and finally saves the annotated data file. Annotating with a text editor requires the annotator to type in both the position of the text and its label by hand, which is time-consuming, tiring, and error-prone. Moreover, this approach is generally suitable only for simple annotation specifications; for a complex annotation scheme no clear annotation workflow can be designed, so the annotation results become disorganized and unusable.
With the development of deep learning in artificial intelligence, training neural network models usually requires a large amount of annotated data. In recent years, dedicated data annotation software has therefore appeared. Compared with a traditional text editor, such software provides an annotation module designed specifically for text data, which improves annotation efficiency and accuracy. However, developing this software is costly, changes to the annotation specification often require a lot of manual effort, and the software has poor compatibility and portability. Most current annotation software targets single-sentence text only, and little of it handles dialogue data formats. Compared with ordinary single-sentence text, dialogue data samples require support for dialogue roles, selection of specific sentences, and similar needs. In addition, annotation specifications for medical dialogue semantics are generally revised frequently, so the annotation system must adapt quickly to a new specification in order to keep large-scale data annotation going. For these reasons, existing data annotation software struggles to meet the annotation needs of the medical dialogue field.
In general, existing annotation systems are poorly suited to dialogue data samples, are not designed around an analysis of medical dialogue data, lack a clearly designed annotation workflow, and cannot meet the requirements of complex annotation specifications. Current text data annotation systems rely either on traditional text editors or on purpose-built annotation software, and they mainly have the following problems in use:
1. Annotation systems based on traditional text editors support only simple text annotation. Because they are not adapted to a specific annotation workflow, wrong and missing annotations are common, and complex annotation specifications are difficult to satisfy;
2. Annotation systems based on text annotation software are not designed for dialogue data samples. They do not accommodate features such as dialogue roles and sentence selection across multiple turns, and they cannot clearly present multi-turn, multi-role dialogue data to the annotator. In addition, existing software is costly to develop and has poor compatibility and portability;
3. Existing medical dialogue data annotation systems are usually developed by following the annotation specifications of traditional task-oriented dialogue. They take a single form, lack hierarchical and fine-grained annotation of data samples, omit conditional semantics and negative semantics, and have difficulty handling multi-level and complex labels.
Disclosure of Invention
To address the above problems, the invention provides a Web-based multi-granularity semantic annotation system and method for medical dialogue. The data annotation system is designed and developed for the Web, is mainly intended for multi-turn, multi-role dialogue text data, and provides a hierarchical, multi-granularity annotation module so that annotators can annotate medical dialogue data comprehensively. The system remains simple and easy to use while meeting the requirements of complex semantic annotation specifications.
The technical scheme adopted by the invention is as follows:
A Web-based multi-granularity semantic annotation system for medical dialogue comprises a file management module, a dialogue display module, a sentence module and an annotation module;
the file management module is used for managing files;
the dialogue display module is used for displaying dialogue sentences and the role information of the corresponding speakers;
the sentence module is used for displaying the sentence currently selected by the annotator for annotation, together with its source;
the annotation module provides annotation functions configured according to the designed annotation specification and performs annotation with a multi-level annotation framework of 'intent-stage-key-slot-value'.
Further, the file management module comprises a display area for the current file name, an open-dialogue-file button, an open-annotated-file button and a download-annotated-file button. The open-dialogue-file button invokes the file manager on the annotator's local machine so that the annotator can select a file to annotate; the opened file is displayed mainly in the dialogue display module. The open-annotated-file button makes it convenient for the annotator to pause and save intermediate annotation results and to review, modify and adjust data that has already been annotated. After the annotator finishes annotating a data sample, the download-annotated-file button saves the annotated semantic information to a JSON file for subsequent processing.
Further, the dialogue display module displays dialogue data interactively, showing each dialogue role and its corresponding sentence in order. The dialogue display module also provides a sentence selection function: the annotator can freely select a sentence to annotate, the selected sentence is highlighted with a dashed box, and the sentence is simultaneously shown in the sentence module for annotation. The dialogue display module further indicates whether each sentence has been annotated: once the annotator completes the annotation of a sentence in the annotation module, the sentence is marked as annotated in the dialogue display module, which helps the annotator track progress and prevents missed annotations.
Further, within the sentence displayed by the sentence module, the annotator can freely select the range of text to annotate, thereby choosing the annotation granularity, and then click 'add annotation panel' to enter the annotation module and annotate the semantics of the selected text. For the same dialogue sentence, the annotator can select and annotate different text ranges multiple times. The sentence module provides a 'previous' button and a 'next' button so that the annotator can conveniently switch between sentences to annotate.
Further, the annotation module comprises two function options, State and Precondition, used for annotating negative semantics and conditional semantics respectively. The annotation module is extensible: different annotation schemes can be configured for different dialogue roles to meet a variety of semantic annotation needs. The annotation module also checks annotation completeness: if the current annotation is semantically incomplete or does not conform to the specification, a corresponding reminder is given.
Further, the file management module is located at the top of the system page, the dialogue display module on the left side, the sentence module at the upper right, and the annotation module at the lower right.
A Web-based multi-granularity semantic annotation method for medical dialogue uses the above system for annotation and comprises the following steps:
clicking the 'open dialogue file' option in the file management module to select a file to annotate from the local machine;
displaying the opened file in the dialogue display module, where the annotator clicks each dialogue sentence in turn to make it the sentence to annotate and annotates its semantics;
the annotator selects the word, phrase or clause to annotate and clicks 'add panel' as prompted for the selected text, and the system expands the corresponding annotation panel according to the role, so that the annotator can annotate the semantics of the selected text in the annotation module;
the annotation module generates a hierarchical annotation framework according to a preset annotation specification, and the annotator performs hierarchical semantic annotation on the selected text according to the knowledge acquired in prior training.
Further, after annotating one text range, the annotator can select another text range in the sentence module and annotate it, so that multiple pieces of semantic content of different granularities in the same dialogue sentence are annotated.
Further, for multi-turn dialogue data, after finishing one dialogue sentence the annotator can move on to the next, either by clicking it in the dialogue display module or by clicking 'next' in the sentence module, which selects the next dialogue sentence automatically. After the multi-turn dialogue has been annotated, the annotator can click 'download annotated file' in the file management module to save the result. The system automatically checks whether every sentence has been annotated: if unannotated dialogue sentences remain, the system points them out to prevent missed annotations; if all sentences have been annotated, the annotated data is saved in the local directory from which the file to be annotated was opened.
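The check performed before download can be illustrated with a short sketch. It is only a minimal illustration under stated assumptions: the patent does not specify any data structure, so the per-sentence fields `turn`, `sentence` and `annotations`, and the function names, are hypothetical.

```python
from typing import Dict, List


def find_unannotated(sentences: List[Dict]) -> List[int]:
    """Return the turn numbers of sentences that carry no annotation yet."""
    return [s["turn"] for s in sentences if not s.get("annotations")]


def can_download(sentences: List[Dict]) -> bool:
    """Allow saving only when every sentence has at least one annotation."""
    return not find_unannotated(sentences)


# Hypothetical sample data: turn 2 has not been annotated yet.
dialogue = [
    {"turn": 1, "sentence": "I have had a cough for three days.",
     "annotations": [{"intent": "Inform"}]},
    {"turn": 2, "sentence": "Do you have a fever?", "annotations": []},
]

missing = find_unannotated(dialogue)
if missing:
    # Stand-in for the reminder the system would show to the annotator.
    print("Unannotated turns:", missing)
```

In this sketch the download would be refused until `find_unannotated` returns an empty list, which mirrors the described behavior of prompting the annotator about missed sentences.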
The beneficial effects and advantages of the invention are as follows: an efficient, simple annotation system is designed and developed according to the data characteristics and application requirements of the medical dialogue field. The system supports multi-turn, multi-role data display, free selection of text spans at multiple granularities, and hierarchical annotation of complex semantic data, and it makes annotation more standardized and complete. The annotation workflow designed by the invention reduces the difficulty of data annotation and improves annotation efficiency.
Drawings
FIG. 1 shows the operation page of the medical dialogue annotation system.
FIG. 2 shows the display effect of the dialogue display module of the medical dialogue annotation system.
FIG. 3 is a hierarchical association diagram of the modules of the multi-granularity semantic annotation system.
FIG. 4 is a schematic diagram of the multi-turn, multi-granularity semantic annotation process.
Detailed Description
The present invention will be further described in detail with reference to the following examples and drawings, so that the above objects, features and advantages of the present invention can be more clearly understood.
The medical dialogue multi-granularity semantic annotation system is built with HTML, CSS and the Vue framework and mainly comprises a file management module, a dialogue display module, a sentence module and an annotation module. A simple, reasonable and efficient annotation workflow was designed around the characteristics of the medical dialogue field, and the annotation system was developed according to this workflow. The page layout of the system is shown in FIG. 1.
1. File management module:
The file management module is located at the top of the system page (module a in FIG. 1), and the annotator uses it to manage files. The module includes a display area for the current file name, an open-dialogue-file button, an open-annotated-file button and a download-annotated-file button, where:
1) Open dialogue file: this button invokes the file manager on the annotator's local machine so that the annotator can select a file to annotate; the opened file is displayed mainly in the dialogue display module;
2) Open annotated file: medical dialogue samples usually contain many dialogue turns, so this button makes it convenient for the annotator to pause and save intermediate annotation results and to review, modify and adjust data that has already been annotated;
3) Download annotated file: after the annotator finishes annotating a data sample, the annotated semantic information is saved to a JSON file for subsequent processing.
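The save-and-resume cycle behind the "download annotated file" and "open annotated file" buttons can be sketched as a JSON round trip. This is a minimal sketch, not the system's actual code: the system itself is a Vue web page, and the record layout and function names below are assumptions made for illustration.

```python
import json
from typing import Dict, List


def dump_annotated(dialogue: List[Dict]) -> str:
    """Serialize the annotated dialogue to JSON text.

    ensure_ascii=False keeps non-ASCII text (e.g. Chinese) readable in the file.
    """
    return json.dumps(dialogue, ensure_ascii=False, indent=2)


def load_annotated(text: str) -> List[Dict]:
    """Restore a previously saved annotated dialogue for review or modification."""
    return json.loads(text)


# Hypothetical record layout; the field names are illustrative only.
saved = dump_annotated([
    {"turn": 1, "role": "patient",
     "sentence": "I have had a cough for three days.",
     "annotations": [{"intent": "Inform", "stage": "disease diagnosis"}]},
])
restored = load_annotated(saved)
```

Because the round trip is lossless, an annotator could stop partway through a long dialogue, save the file, and continue later from the restored state.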
2. Dialogue display module:
The dialogue display module is located on the left side of the system page (module b in FIG. 1) and is mainly used for displaying dialogue sentences and the role information of the corresponding speakers. It displays dialogue data interactively, showing each dialogue role (doctor or patient) and its corresponding sentence in order. The module also provides a sentence selection function: the annotator can freely select a sentence to annotate, the selected sentence is highlighted in the display module with a dashed box, and the sentence is simultaneously shown in the sentence module for annotation. The module further indicates whether each sentence has been annotated: once the annotator completes the annotation of a sentence in the annotation module, the sentence is marked as annotated in the display module, which helps the annotator track progress and prevents missed annotations. FIG. 2 shows the display effect of the dialogue display module of the medical dialogue annotation system.
3. Sentence module:
The sentence module is located at the upper right of the system page (module c in FIG. 1) and is mainly used for displaying the sentence currently selected by the annotator for annotation, together with its source. Within the displayed sentence, the annotator can freely select the range of text to annotate, thereby choosing the annotation granularity, and then click 'add annotation panel' to enter the annotation module and annotate the semantics of the selected text. For the same dialogue sentence, the annotator can select and annotate different text ranges multiple times. The module also provides a 'previous' button and a 'next' button so that the annotator can conveniently switch between sentences to annotate.
4. Annotation module:
The annotation module is located at the lower right of the system page (module d in FIG. 1) and provides annotation functions configured according to the designed annotation specification. The current system is configured according to a hierarchical, multi-granularity semantic annotation specification designed around the characteristics of the medical dialogue field. It mainly comprises the multi-level annotation framework 'intent-stage-key-slot-value', plus two function options, State and Precondition, for annotating negative semantics and conditional semantics respectively. The annotation module is highly extensible: different annotation schemes can be configured for different dialogue roles and the annotation specification can be adjusted quickly, which meets a variety of semantic annotation needs. The module also checks annotation completeness: if the current annotation is semantically incomplete or does not conform to the specification, the system gives a corresponding reminder.
The multi-level annotation framework 'intent-stage-key-slot-value' specifically comprises:
1) Intent: annotates the speaking intent of the text segment to be annotated, i.e., its syntactic information. There are five candidate intents:
Inform: generally a declarative sentence, including the patient actively telling the doctor about his or her symptoms, or the doctor actively telling the patient a diagnosis or advice;
Inquire: used for asking the other party, generally a question, including the patient asking the doctor about his or her own condition, or the doctor asking the patient about medical history;
Reply: used for answering the other party's question;
Chat: used for identifying dialogue unrelated to the medical consultation and inquiry process;
Other: used for identifying situations other than the above four types.
2) Stage: annotates semantic content at the dialogue level by determining which stage of the dialogue flow the selected text segment belongs to. Four stages are defined: the disease diagnosis stage, in which the patient reports symptoms and the doctor makes a diagnosis; the history inquiry stage, in which the doctor asks about past medical history; the treatment stage, in which the doctor gives a treatment plan and lifestyle advice based on the patient's description, or the doctor and patient interact regarding a particular treatment plan; and the other stage, covering anything that does not belong to the three stages above.
3) Key: determines the category of the content contained in the selected text segment. The disease diagnosis stage has six categories: basic information, symptoms, science popularization, diseases, emotional interactions, and others. The history inquiry stage has ten categories: diseases, emotional interactions, medications, examinations, surgery, allergy history, offline consultation, procedures, science popularization, and others. The treatment stage has ten categories: emotional interactions, medications, examinations, surgery, offline consultation, operations, lifestyle advice, replies, science popularization, and others.
4) Slot-value: refers to the annotation of specific content. Corresponding slots are defined for each category, and the annotator fills the corresponding content, i.e., the slot value, into each slot to complete the annotation of specific content.
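The four levels listed above lend themselves to a simple lookup-based completeness check. The following is a minimal sketch under stated assumptions: the English label strings follow the translated text and are illustrative, the function name `check_annotation` is hypothetical, and since the text lists no key categories for the "other" stage, the sketch accepts any key there.

```python
from typing import Dict, List, Set

INTENTS: Set[str] = {"Inform", "Inquire", "Reply", "Chat", "Other"}

# Key categories allowed in each stage, as enumerated in the text.
STAGE_KEYS: Dict[str, Set[str]] = {
    "disease diagnosis": {
        "basic information", "symptoms", "science popularization",
        "diseases", "emotional interactions", "others",
    },
    "history inquiry": {
        "diseases", "emotional interactions", "medications", "examinations",
        "surgery", "allergy history", "offline consultation", "procedures",
        "science popularization", "others",
    },
    "treatment": {
        "emotional interactions", "medications", "examinations", "surgery",
        "offline consultation", "operations", "lifestyle advice", "replies",
        "science popularization", "others",
    },
    "other": set(),  # assumption: no key restriction for the "other" stage
}


def check_annotation(intent: str, stage: str, key: str,
                     slots: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the annotation is complete."""
    problems: List[str] = []
    if intent not in INTENTS:
        problems.append(f"unknown intent: {intent}")
    if stage not in STAGE_KEYS:
        problems.append(f"unknown stage: {stage}")
    elif STAGE_KEYS[stage] and key not in STAGE_KEYS[stage]:
        problems.append(f"key '{key}' is not defined for stage '{stage}'")
    if not any(value.strip() for value in slots.values()):
        problems.append("no slot value has been filled in")
    return problems
```

A check of this kind is one plausible way to drive the reminder for incomplete or non-conforming annotations; the slot names passed in `slots` would come from the configured annotation specification.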
The two function options State and Precondition are defined as follows:
1) State: used for indicating whether the specific content of the filled slot-value pair is affirmed. It has three options, Yes, No and Uncertain, chosen according to two cases:
(a) For information about facts: mark Yes if the fact is determined to exist; mark No if the fact is determined not to exist; mark Uncertain if its existence cannot be determined;
(b) For information about knowledge: mark Yes if the knowledge is determined to be correct; mark No if it is determined to be incorrect; mark Uncertain if its correctness cannot be determined.
2) Precondition: used for annotating utterances that contain a precondition; the precondition is added after the slot-value pair has been annotated. For example, in the sentence "Patients allergic to drug X must not take it; taking drug X may cause allergic reaction Y", the cause of symptom "Y" is "taking drug X", and the precondition for this information is "allergic to drug X".
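The drug X example can be written down as a small record to show how State and Precondition attach to a slot-value pair. This is only an illustrative sketch: the class and field names (`SlotAnnotation`, `slot`, `value`, `state`, `precondition`) and the slot names are assumptions, not part of the described system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    """The three State options named in the text."""
    YES = "Yes"
    NO = "No"
    UNCERTAIN = "Uncertain"


@dataclass
class SlotAnnotation:
    slot: str                            # hypothetical slot name
    value: str                           # the slot value filled in by the annotator
    state: State = State.YES             # negative semantics are expressed via State.NO
    precondition: Optional[str] = None   # conditional semantics, added after the slot value


# "Taking drug X may cause allergic reaction Y" holds only for patients allergic to drug X.
example = SlotAnnotation(
    slot="cause of symptom Y",
    value="taking drug X",
    state=State.YES,
    precondition="allergic to drug X",
)

# Negative semantics, e.g. a patient stating that there is no fever.
negated = SlotAnnotation(slot="symptom", value="fever", state=State.NO)
```

Keeping the precondition optional reflects the description that it is a supplement added only when the utterance actually contains a condition.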
The hierarchical logic of the main functional design of the medical dialogue multi-granularity semantic annotation system is shown in FIG. 3, which shows the main functions of each of the four modules.
Based on the functions of the four modules, a reasonable and efficient annotation method is designed; its flow is shown in FIG. 4. A complete annotation pass is described below from the annotator's point of view. After opening the medical dialogue data annotation system, the annotator proceeds as follows:
Step 1: open the data file to be annotated. The annotator first clicks the 'open dialogue file' option in the file management module and selects a file to annotate from the local machine. The file must be organized in the data format the system requires. Taking a specific JSON data sample file as an example, the format requirements mainly concern the data keys (e.g. "dialog", "turn", "role", "sentence") and the hierarchical logic of the data. The format requirements can be changed quickly as needed.
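As an illustration of such a format, the sketch below parses a small dialogue file and checks its keys. The key names "dialog", "turn", "role" and "sentence" come from the text, but the nesting shown here (a "dialog" list of per-turn objects) and the sample sentences are assumptions; the actual hierarchy is not given.

```python
import json
from typing import Dict, List

# Hypothetical input file content; only the key names are taken from the text.
sample = """
{
  "dialog": [
    {"turn": 1, "role": "patient", "sentence": "I have had a cough for three days."},
    {"turn": 2, "role": "doctor", "sentence": "Do you have a fever?"}
  ]
}
"""

REQUIRED_KEYS = ("turn", "role", "sentence")


def load_dialog(text: str) -> List[Dict]:
    """Parse a dialogue file and verify that each turn has the expected keys."""
    data = json.loads(text)
    turns = data["dialog"]
    for index, turn in enumerate(turns):
        missing = [key for key in REQUIRED_KEYS if key not in turn]
        if missing:
            raise ValueError(f"turn at index {index} is missing keys: {missing}")
    return turns


turns = load_dialog(sample)
```

Keeping the required keys in one tuple makes the format easy to change, in line with the statement that the format requirements can be adjusted as needed.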
Step 2: select the sentence to be annotated. The opened file is displayed in the dialogue display module of the system. The annotator clicks each dialogue sentence in turn to annotate its semantics, and the dialogue display module highlights the clicked sentence with a dashed box.
Step 3: define the annotation range. The dialogue sentence selected in the previous step is displayed as text in the sentence module, together with its source (patient or doctor) and the original sentence. Because a dialogue sentence generally contains several semantic segments, the system lets the annotator freely divide the sentence and select annotation ranges of different granularities, thereby realizing multi-granularity semantic annotation of the text. The annotator selects the word, phrase or clause to be annotated, and the system automatically calculates the position of the selected part in the sentence, that is, its start position and end position in the original sentence. After the text is selected, the system prompts the annotator to click "add panel", and then expands the annotation panel corresponding to the speaker's role, so that the annotator can annotate the semantics of the selected text in the annotation module.
The annotation panel can be preset for each role. For example, for the role "doctor" the panel includes key information, severity, follow-up processing and other contents; for the role "patient" it includes subject, purpose, current state and other contents.
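Step 3 combines two small operations: recording where the selected text sits in the sentence, and choosing a panel by role. A minimal Python sketch follows. The half-open [start, end) convention, the function names and the `PANEL_TEMPLATES` table are assumptions; the panel fields listed are only those named in the text, which states that each panel has further contents.

```python
from typing import Any, Dict, List

# Panel fields named in the text; the real panels contain further contents.
PANEL_TEMPLATES: Dict[str, List[str]] = {
    "doctor": ["key information", "severity", "follow-up processing"],
    "patient": ["subject", "purpose", "current state"],
}


def make_selection(sentence: str, start: int, end: int) -> Dict[str, Any]:
    """Return the selected text with its [start, end) position in the sentence."""
    if not 0 <= start < end <= len(sentence):
        raise ValueError("selection must be a non-empty span inside the sentence")
    return {"text": sentence[start:end], "range": [start, end]}


def panel_for_role(role: str) -> List[str]:
    """Return a copy of the preset panel fields for a dialogue role."""
    if role not in PANEL_TEMPLATES:
        raise ValueError(f"no annotation panel preset for role {role!r}")
    return list(PANEL_TEMPLATES[role])


# Hypothetical usage: a patient sentence with the word "headache" selected.
sentence = "I have had a headache for three days."
start = sentence.index("headache")
selection = make_selection(sentence, start, start + len("headache"))
panel = panel_for_role("patient")
```

Working from offsets rather than searching for the selected string keeps the position unambiguous when the same word occurs more than once in a sentence.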
Step 4: hierarchical semantic annotation. After the annotator selects an annotation range and clicks "add panel", the annotation module generates a hierarchical annotation frame according to the preset annotation specification, and the annotator annotates the selected text hierarchically according to the knowledge acquired in prior training.
Step 5: select different text ranges. In medical dialogue data a dialogue sentence usually contains several different semantic segments, so different text ranges must be selected for these segments and annotated at different granularities. After finishing the annotation of one text range, the annotator can select another text range in the sentence module and annotate it. That is, steps 3 and 4 are repeated to annotate the multiple semantic contents of one dialogue sentence at different granularities.
Step 6: move through the multi-round dialogue. For multi-round dialogue data, after finishing one dialogue sentence the annotator selects the next one, either by clicking it in the dialogue display module or by clicking "next sentence" in the sentence module, which selects the next dialogue sentence automatically. That is, steps 2, 3, 4 and 5 are repeated to annotate the whole multi-round dialogue.
Step 7: save the completed annotation file. After all sentences of the multi-round dialogue have been annotated, the annotator clicks "download annotated file" in the file management module to save the file. The system automatically checks whether every sentence has been annotated: if any dialogue sentence is unannotated, the system prompts the annotator so that no sentence is missed; if all sentences are annotated, the system saves the annotated data in the local directory from which the file to be annotated was opened.
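The completeness check in Step 7 can be sketched as follows in Python. This is an illustration only: it assumes each annotated sentence carries a non-empty "actions" list, and it raises an error where the real system would show a prompt. The function names are hypothetical.

```python
import json
from typing import Any, Dict, List


def unannotated_turns(dialog: List[Dict[str, Any]]) -> List[Any]:
    """Return the turn identifiers of sentences that carry no annotation yet."""
    return [item["turn"] for item in dialog if not item.get("actions")]


def export_if_complete(dialog: List[Dict[str, Any]]) -> str:
    """Serialize the dialogue only when every sentence has been annotated."""
    missing = unannotated_turns(dialog)
    if missing:
        # Stand-in for the on-page prompt described in the text.
        raise ValueError(f"unannotated sentences at turns: {missing}")
    return json.dumps({"dialog": dialog}, ensure_ascii=False, indent=2)


# Hypothetical dialogue in which the second sentence has not been annotated.
dialog = [
    {"turn": 1, "role": "patient", "sentence": "My head hurts.",
     "actions": [{"text": "head hurts", "range": [3, 13]}]},
    {"turn": 2, "role": "doctor", "sentence": "Since when?", "actions": []},
]
```

Calling `unannotated_turns(dialog)` on this sample reports turn 2, so the export is refused until that sentence is annotated.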
The following presents the data format stored after a specific Json data sample file has been annotated:
wherein "actions" represents the content annotated by the user; "text" represents the word, phrase or clause selected by the user; "range" represents the start and end positions of the selected part in the original sentence; "content" represents the dialogue act, such as inquiry or small talk; "slot" represents the slot, such as name, cause or usage; and "slot_value" represents the value corresponding to the slot.
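To make the field descriptions concrete, the Python sketch below builds annotation records with the six field names given above and attaches them to one sentence. The nesting (an "actions" list stored on each sentence, one record per selected range), the helper `add_action` and the sample sentence are assumptions for illustration, not the stored format of the invention.

```python
import json
from typing import Any, Dict


def add_action(turn: Dict[str, Any], selected: str, content: str,
               slot: str, slot_value: str) -> Dict[str, Any]:
    """Append one annotation record for the first occurrence of the selected text."""
    start = turn["sentence"].index(selected)  # raises ValueError if not present
    action = {
        "text": selected,
        "range": [start, start + len(selected)],
        "content": content,
        "slot": slot,
        "slot_value": slot_value,
    }
    turn.setdefault("actions", []).append(action)
    return action


# Hypothetical sentence annotated at two granularities: a word and a clause.
turn = {"turn": 1, "role": "patient",
        "sentence": "Did I get this rash because I took drug X?"}
add_action(turn, "rash", "inquiry", "name", "rash")
add_action(turn, "I took drug X", "inquiry", "cause", "taking drug X")

stored = json.dumps(turn, ensure_ascii=False, indent=2)
```

Storing one record per selected range is what allows the same sentence to hold several annotations of different granularity.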
In summary, the invention provides a Web-based medical dialogue multi-granularity semantic annotation method and system. "Web-based" here means that the corresponding web pages are built with HTML, CSS and the Vue framework; the front end is responsible for displaying dialogue information, and the back end is responsible for processing annotation information.
The key points of the invention include:
1. The dialogue display module of the system clearly displays data samples of multi-round, multi-role dialogues, and provides annotation progress indicators and automatic detection of annotation completion, so that missed annotations are avoided.
2. The sentence module of the system lets the annotator freely divide a sentence into different text fragments to achieve multi-granularity semantic annotation, so that the semantic information of the text can be annotated completely and annotation quality is improved.
3. The annotation module of the system meets the requirements of hierarchical, complex semantic annotation specifications, can load different annotation templates for different dialogue roles, and can be adapted quickly and conveniently when the annotation specification changes.
4. The method is reasonable and efficient in design: complex semantic annotation tasks are simplified by the four modules, so that annotators can annotate semantics conveniently and quickly, which improves the annotation efficiency for medical dialogue data. The system is also highly portable across data text formats.
Another embodiment of the invention provides a computer device (computer, server, smartphone, etc.) comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for implementing the functions of the respective modules in the system of the invention.
Another embodiment of the present invention provides a computer readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) storing a computer program which, when executed by a computer, performs the corresponding functions of the various modules in the system of the present invention.
The above-disclosed embodiments of the present invention are intended to aid in understanding the contents of the present invention and to enable the same to be carried into practice, and it will be understood by those of ordinary skill in the art that various alternatives, variations and modifications are possible without departing from the spirit and scope of the invention. The invention should not be limited to what has been disclosed in the examples of the specification, but rather by the scope of the invention as defined in the claims.
Claims (10)
1. A Web-based medical dialogue multi-granularity semantic annotation system, characterized by comprising a file management module, a dialogue display module, a sentence module and an annotation module;
the file management module is used for managing files;
the dialogue display module is used for displaying dialogue sentences and the role information of the corresponding speakers;
the sentence module is used for displaying the sentence currently selected by the annotator for annotation, together with its source;
the annotation module is used for setting annotation functions according to a designed annotation specification, and annotates using a multi-level annotation frame containing intention-stage-key-slot values.
2. The system of claim 1, wherein the file management module includes a current filename display area, an "open dialogue file" button, an "open annotated file" button and a "download annotated file" button; the "open dialogue file" button calls the annotator's local file manager so that the annotator can select a file to annotate, and the opened file is displayed in the dialogue display module; the "open annotated file" button lets the annotator pause and save interim annotation results, and review, modify and adjust annotated data; after the annotator finishes annotating a data sample, the "download annotated file" button saves the annotated semantic information to a Json file for subsequent processing.
3. The system according to claim 1, wherein the dialogue display module presents dialogue-type data interactively, in order of dialogue role and corresponding sentence; the dialogue display module includes a sentence selection function, so that the annotator can freely select a sentence for annotation; the selected sentence is highlighted with a dashed box and is simultaneously displayed in the sentence module for annotation; the dialogue display module further includes a function for indicating whether a sentence has been annotated: once the annotator finishes annotating a sentence in the annotation module, the sentence is shown as annotated in the dialogue display module, which helps the annotator track annotation progress and prevents missed annotations.
4. The system of claim 1, wherein, in the sentence displayed by the sentence module, the annotator can freely select the range of text to annotate, thereby determining the annotation granularity, and click "add annotation panel" to enter the annotation module and annotate the semantics of the selected text; for the same dialogue sentence, the annotator can select different text ranges multiple times; the sentence module is provided with "previous sentence" and "next sentence" buttons so that the annotator can conveniently switch the sentence to be annotated.
5. The system of claim 1, wherein the annotation module comprises two functional options, State and Precondition, for annotating negation semantics and conditional semantics respectively; the annotation module is extensible and allows different annotation specification systems to be set for different dialogue roles, meeting the needs of various semantic annotations; the annotation module also has an annotation completeness detection function, and issues a corresponding reminder if the current annotation is semantically incomplete or does not conform to the specification.
6. The system of claim 1, wherein the file management module is located at the top of the system page, the dialogue display module is located on the left side of the system page, the sentence module is located at the upper right of the system page, and the annotation module is located at the lower right of the system page.
7. A Web-based medical dialogue multi-granularity semantic annotation method, characterized in that the system of any one of claims 1 to 6 is used for annotation, and the method comprises the following steps:
clicking the "open dialogue file" option in the file management module to select a file to be annotated from local storage;
displaying the opened file in the dialogue display module, the annotator clicking each dialogue sentence in turn as the sentence to be annotated and annotating its semantics;
the annotator selecting the word, phrase or clause to be annotated and, when prompted for the selected text, clicking "add panel", whereupon the annotation panel corresponding to the speaker's role is expanded so that the annotator can annotate the semantics of the selected text in the annotation module;
generating a hierarchical annotation frame in the annotation module according to a preset annotation specification, so that the annotator annotates the selected text hierarchically according to the knowledge acquired in prior training.
8. The method of claim 7, wherein after the annotation of one text range is completed, the annotator can select another text range in the sentence module and annotate it, so as to annotate the multiple semantic contents of the same sentence at different granularities.
9. The method of claim 7, wherein, for multi-round dialogue data, after completing the annotation of one dialogue sentence the annotator selects the next dialogue sentence for annotation, either by clicking it in the dialogue display module or by clicking "next sentence" in the sentence module, which selects the next dialogue sentence automatically; after the sentences of the multi-round dialogue have been annotated, the annotator clicks "download annotated file" in the file management module to save the annotated file; the system automatically detects whether all sentences have been annotated; if any dialogue sentence is unannotated, the system prompts the annotator so as to prevent missed annotations; if all sentences have been annotated, the annotated data is saved in the local directory from which the file to be annotated was opened.
10. A computer device comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions to implement the functions of the file management module, the dialogue display module, the sentence module and the annotation module in a system as claimed in any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310462367.4A CN116738998A (en) | 2023-04-26 | 2023-04-26 | Medical dialogue multi-granularity semantic annotation system and method based on Web |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116738998A true CN116738998A (en) | 2023-09-12 |
Family
ID=87908743
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310462367.4A Pending CN116738998A (en) | 2023-04-26 | 2023-04-26 | Medical dialogue multi-granularity semantic annotation system and method based on Web |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116738998A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116992861A (en) * | 2023-09-25 | 2023-11-03 | 四川健康久远科技有限公司 | Intelligent medical service processing method and system based on data processing |
CN116992861B (en) * | 2023-09-25 | 2023-12-08 | 四川健康久远科技有限公司 | Intelligent medical service processing method and system based on data processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||