CN116860979A

CN116860979A - Medical text labeling method and device based on label knowledge base

Info

Publication number: CN116860979A
Application number: CN202311126960.8A
Authority: CN
Inventors: 黄主斌; 王春旭; 贺晓培
Original assignee: Shanghai Clinbrain Information Technology Co Ltd
Current assignee: Shanghai Clinbrain Information Technology Co Ltd
Priority date: 2023-09-04
Filing date: 2023-09-04
Publication date: 2023-10-10
Anticipated expiration: 2043-09-04
Also published as: CN116860979B

Abstract

The application provides a medical text labeling method and device based on a label knowledge base. An acquired medical text to be labeled is automatically pre-labeled using a pre-established label knowledge base, yielding a pre-labeled text that carries first label annotations and/or first relation annotations. The pre-labeled text can then be manually reviewed and adjusted, which generates second label annotations and/or second relation annotations. Finally, the labeled text is obtained from the first label annotations and/or first relation annotations of the pre-labeled text together with the second label annotations and/or second relation annotations generated by the adjustment operations. By combining knowledge-base-driven pre-labeling with manual review, the method and device complete medical text labeling quickly and with high quality, remove the need to manually label every word and sentence of the text one by one, greatly improve the reading speed and efficiency of researchers, and help avoid omissions.

Description

Medical text labeling method and device based on label knowledge base

Technical Field

The application relates to the technical field of medical big data, in particular to a medical text labeling technology based on a label knowledge base.

Background

The rapid growth of medical text data brings both great opportunities and great challenges to the whole industry. Most medical text data is semi-structured or unstructured, and scientific research applications can be built on it only after it has been converted into structured data that a computer can process. Text labeling is the basis of this structuring. The annotated corpus obtained through text labeling is a very important resource: it underlies related research such as named entity recognition and automatic relation extraction, and training a natural language processing model for medical text in particular requires a sufficient number of high-quality, manually labeled medical data samples.

Traditional medical data labeling relies on purely manual annotation by a single person, which is time-consuming and labor-intensive. The quality of the labeled samples depends entirely on the skill and care of the individual annotator, so the data is prone to mislabeling and omissions.

Disclosure of Invention

The application aims to provide a medical text labeling method and device based on a label knowledge base, which aims to reduce the workload of manual labeling of medical data texts and improve labeling efficiency and accuracy.

To achieve the above object, according to one aspect of the present application, some embodiments provide a medical text labeling method based on a label knowledge base, the method including: acquiring a medical text to be labeled; automatically pre-labeling the medical text to be labeled based on a preset label knowledge base to obtain a pre-labeled text, wherein the pre-labeled text carries a first label annotation and/or a first relation annotation; obtaining a labeled text according to a confirmation operation on the pre-labeled text; or, acquiring an adjustment operation on the pre-labeled text and generating a second label annotation and/or a second relation annotation; and obtaining the labeled text according to the first label annotation and/or first relation annotation of the pre-labeled text and the second label annotation and/or second relation annotation generated by the adjustment operation.

Optionally, on the basis of the foregoing embodiment, the method further includes: pre-constructing a tag knowledge base; the construction method comprises the following steps: generating a label name based on a label name value set by a user, and extracting a first-level labeling rule based on at least one labeling sample and/or label labeling setting set by the user; generating a relationship name based on a relationship name value set by a user, and extracting a first-level relationship annotation rule based on at least one relationship annotation sample and/or relationship annotation setting set by the user; and constructing the tag knowledge base based on the set tag name, the primary labeling rule, the set relationship name and the primary relationship labeling rule.

Optionally, on the basis of the foregoing embodiment, the automatically pre-labeling the medical text to be labeled based on the preset label knowledge base, and obtaining the pre-labeled text includes: importing the medical text to be annotated, and segmenting the medical text to be annotated based on a regular expression to obtain a segmented text; and matching the word segmentation text with the first-level labeling rules and/or the first-level relation labeling rules of the tag knowledge base to generate a pre-labeling text with a first tag label and/or a first relation label.

Optionally, on the basis of the above embodiment, the adjustment operation includes: adding a new second label annotation and/or a new second relation annotation to the pre-labeled text; and/or deleting or replacing the first label annotation and/or the first relation annotation to generate a second label annotation and/or a second relation annotation; and/or adjusting the range of the first label annotation to generate the second label annotation; and/or adjusting the start point or end point of the first relation annotation to generate the second relation annotation.

Optionally, on the basis of the foregoing embodiment, adding a new second label annotation to the pre-labeled text includes: determining the text content and range selected by the user with the mouse, and acquiring the start position and end position of the selected text in the current text; providing the first label options so that the selected text can be given a second label annotation, and storing the annotation in the array corresponding to that label; and dividing and rendering the whole document according to the start and end positions in the array.

Optionally, on the basis of the foregoing embodiment, adding a new second relation annotation to the pre-labeled text includes: selecting a first label annotation or a second label annotation and dragging a relation line with the mouse, wherein the start point of the relation line is the midpoint of the selected label annotation and the end point is the current position of the mouse; when the mouse passes over another first or second label annotation, judging whether a relation exists between the two labels; and if so, highlighting the end label and connecting the two labels to generate a relation line, wherein the relation line displays the relation name.

Optionally, on the basis of the foregoing embodiment, the method further includes: judging whether the two labels are on different rows; if so, adding a first mark point and a second mark point that mark the association between the start point and the end point of the relation line, wherein the first mark point is at the far end of the relation line on the start row and the second mark point is at the leftmost side of the relation line on the end row; and acquiring the number of relation lines and the height information between the label on the start row and the label on the end row, and drawing the cross-row relation line between the two labels.

Optionally, on the basis of the foregoing embodiment, the method further includes: and updating the tag knowledge base according to a second tag label and/or a second relation label generated by the adjustment operation of the pre-labeled text.

Optionally, on the basis of the foregoing embodiment, the updating the tag knowledge base according to the second tag label and/or the second relationship label generated by the adjusting operation on the pre-labeled text includes:

generating a second-level labeling rule and/or a second-level relationship labeling rule according to the second label labeling and/or the second relationship labeling; counting the times and/or proportions of the second label and/or the second relation label; when the preset times and/or the preset proportion are reached, upgrading the secondary labeling rules and/or the secondary relation labeling rules into primary labeling rules and/or primary relation labeling rules, and automatically pre-labeling the medical texts to be labeled by utilizing the primary labeling rules and/or the primary relation labeling rules in the label knowledge base to obtain pre-labeled texts; or setting the credibility of the secondary labeling rules and/or the secondary relation labeling rules, when the credibility reaches a preset value, automatically pre-labeling the medical text to be labeled together with the primary labeling rules and/or the primary relation labeling rules to obtain a pre-labeled text.

According to another aspect of the present application, there is also provided a medical text labeling apparatus based on a tag knowledge base, including: the acquisition module is used for acquiring the medical text to be marked; the automatic pre-labeling module is used for automatically pre-labeling the medical text to be labeled based on a preset label knowledge base to obtain a pre-labeled text, wherein the pre-labeled text is provided with a first label and/or a first relation label; the confirmation module is used for obtaining the marked text according to the confirmation operation of the pre-marked text; the active labeling module is used for acquiring adjustment operation of the pre-labeled text and generating a second label and/or a second relation label; the processing module is used for obtaining the marked text according to the first label mark and/or the first relation mark of the pre-marked text and the second label mark and/or the second relation mark generated by the adjustment operation of the pre-marked text.

In this technical solution, the acquired medical text to be labeled is automatically pre-labeled using a pre-established label knowledge base, yielding a pre-labeled text that carries first label annotations and/or first relation annotations. The pre-labeled text can be manually reviewed and adjusted to generate second label annotations and/or second relation annotations. Finally, the labeled text is obtained from the first label annotations and/or first relation annotations of the pre-labeled text together with the second label annotations and/or second relation annotations generated by the adjustment operations. By combining knowledge-base-driven pre-labeling with manual review, the method and device complete medical text labeling quickly and with high quality, remove the need to manually label every word and sentence of the text one by one, greatly improve the reading speed and efficiency of researchers, and help avoid omissions.

Drawings

FIG. 1 is a schematic flow chart of a medical text labeling method based on a label knowledge base according to an embodiment of the present application;

FIG. 2 is a flowchart illustrating a method for constructing a tag knowledge base according to an embodiment of the present application;

FIG. 3 is a flowchart of a method for performing a second label labeling operation on the pre-labeled text according to an embodiment of the present application;

FIG. 4 is a schematic block diagram of a medical text labeling device based on a tag knowledge base according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

Referring to FIG. 1, an embodiment of the present application provides a medical text labeling method based on a label knowledge base, the method comprising:

Step S101: acquiring a medical text to be labeled. The medical text refers to documents generated during a patient's treatment that record the treatment process, medication, and operation records, such as medical record documents, prescriptions, and examination reports. Medical record documents are the most important of these and include emergency medical records, inpatient medical records, clinical medical records of the various departments, and the like.

Step S102: automatically pre-labeling the medical text to be labeled based on a preset label knowledge base to obtain a pre-labeled text, wherein the pre-labeled text is provided with a first label and/or a first relation label;

Step S103: obtaining a labeled text according to a confirmation operation on the pre-labeled text;

or,

Step S104: acquiring an adjustment operation on the pre-labeled text, and generating a second label annotation and/or a second relation annotation;

Step S105: obtaining the labeled text according to the first label annotation and/or first relation annotation of the pre-labeled text and the second label annotation and/or second relation annotation generated by the adjustment operation.

As an alternative embodiment, the method further comprises: pre-constructing a tag knowledge base; the method for constructing the tag knowledge base comprises the following steps: generating a label name based on a label name value set by a user, and extracting a first-level labeling rule based on at least one labeling sample and/or label labeling setting set by the user; generating a relationship name based on a relationship name value set by a user, and extracting a first-level relationship annotation rule based on at least one relationship annotation sample and/or relationship annotation setting set by the user; and constructing the tag knowledge base based on the set tag name, the primary labeling rule, the set relationship name and the primary relationship labeling rule.

Referring to FIG. 2, as a specific implementation of the foregoing embodiment, the label knowledge base is generated from the label names and attributes set by the user, and the construction method includes:

Step S201: generating a label name based on a label name value set by the user, and extracting a first-level labeling rule based on at least one labeling sample and/or label labeling setting set by the user.

For example, if the user sets the label name "time node" with the labeling samples "3-4 days" and "6 hours", the labeling rule that can be extracted is "× time unit", where "×" is a numerical placeholder that may stand for a specific integer or floating-point value, and the time unit may come from a set such as "year, month, day, hour, minute, second" and may further include a set of time words such as "today, tomorrow, yesterday, morning, noon, evening".

As another example, if the user sets the label name "part" with the labeling samples "head", "left occipital region", and "injury site", the first-level labeling rules that can be extracted are "body part" and "orientation # part word". Here "body part" covers head, eye, ear, mouth, nose, finger, upper arm, forearm, and so on; the set may be a preset body-part set, or may be obtained by querying a medical dictionary for the words and characters of all body parts. "#" is a character placeholder that may contain a body part or other characters; "orientation" may be a preset set of orientation words such as "upper, lower, left, right, front, rear"; and "part word" may be a preset set of part words such as "part, location, point, end".

The label labeling setting means that the user directly writes a labeling rule expression to generate the first-level labeling rule for the label. The first-level labeling rule may also include label attributes, such as the label color, the labeling shortcut key, the label size, and the display position.

Step S202: generating a relationship name based on a relationship name value set by a user, and extracting a first-level relationship labeling rule based on at least one relationship labeling sample and/or relationship labeling setting set by the user, wherein the first-level relationship labeling rule comprises a starting point label, an end point label, a connection rule and the like, and the connection rule comprises a nearest connection rule, a left connection rule, a right connection rule and the like. The first-level relationship labeling rule can also comprise relationship attributes, and the relationship attributes can also comprise set relationship colors, relationship labeling shortcut keys, relationship sizes, relationship line thicknesses, display positions and the like.

Step S203: and generating a tag knowledge base based on the tag setting and the first-level labeling rules, the relationship setting and the first-level relationship labeling rules.
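Steps S201 to S203 can be illustrated with a minimal Python sketch. It assumes that a first-level labeling rule is stored as a compiled regular expression and that the knowledge base is a plain in-memory object; the class names, the English word lists, the color value, and the relation name "duration" are all hypothetical and do not come from the application.

```python
import re
from dataclasses import dataclass, field

# Assumed vocabularies; the text only gives examples of each set.
TIME_UNITS = ["years", "months", "days", "hours", "minutes", "seconds"]
ORIENTATIONS = ["upper", "lower", "left", "right", "front", "rear"]
BODY_PARTS = ["head", "eye", "ear", "mouth", "nose", "finger", "forearm"]


@dataclass
class LabelRule:
    label: str               # label name set by the user
    pattern: re.Pattern      # first-level labeling rule as a regex
    level: int = 1           # 1 = first-level rule
    color: str = "#ffcc00"   # example label attribute (hypothetical value)


@dataclass
class RelationRule:
    name: str                     # relation name set by the user
    start_label: str              # label at the start of the relation
    end_label: str                # label at the end of the relation
    connection: str = "nearest"   # "nearest", "left" or "right"


@dataclass
class LabelKnowledgeBase:
    label_rules: list = field(default_factory=list)
    relation_rules: list = field(default_factory=list)


def alternation(words):
    # Longest words first so a longer word is preferred over a shorter one.
    ordered = sorted(words, key=len, reverse=True)
    return "|".join(re.escape(w) for w in ordered)


def build_knowledge_base():
    # "x time unit": a number or a range "a-b" followed by a time unit.
    number = r"\d+(?:\.\d+)?"
    time_rule = re.compile(
        rf"{number}(?:-{number})?\s*(?:{alternation(TIME_UNITS)})"
    )
    # "body part", optionally preceded by an orientation word.
    part_rule = re.compile(
        rf"(?:(?:{alternation(ORIENTATIONS)})\s+)?(?:{alternation(BODY_PARTS)})"
    )
    kb = LabelKnowledgeBase()
    kb.label_rules.append(LabelRule("time node", time_rule))
    kb.label_rules.append(LabelRule("part", part_rule))
    # Hypothetical relation between the two labels above.
    kb.relation_rules.append(RelationRule("duration", "part", "time node"))
    return kb
```

With these assumptions, the samples "3-4 days" and "6 hours" from the example both match the extracted "time node" rule.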

As an optional embodiment, automatically pre-labeling the medical text to be labeled based on the preset label knowledge base to obtain the pre-labeled text includes: importing the medical text to be labeled, and segmenting it based on a regular expression to obtain a segmented text; and matching the segmented text against the first-level labeling rules and/or the first-level relation labeling rules of the label knowledge base to generate a pre-labeled text with the corresponding first label annotations and/or first relation annotations. This removes the need to manually label every word and sentence of the text one by one, which saves labor and helps avoid omissions.
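The pre-labeling step can be sketched as follows in Python. The sketch assumes that segmentation splits on sentence punctuation and that each first-level rule is a regular expression keyed by label name; the rule patterns, the English sample text, and the annotation dictionary layout are illustrative assumptions rather than the application's actual format.

```python
import re

# Hypothetical first-level rules: label name -> compiled pattern.
RULES = {
    "time node": re.compile(r"\d+(?:\.\d+)?(?:-\d+(?:\.\d+)?)?\s*(?:days|hours)"),
    "part": re.compile(r"(?:left |right )?(?:head|arm|knee)"),
}

# Sentence-ending punctuation; a "." only counts when followed by
# whitespace or the end of the text, so decimals such as "2.5" survive.
SENTENCE_END = re.compile(r"[;!?。；！？]|\.(?=\s|$)")


def segment(text):
    """Split text into (offset, segment) pairs at sentence punctuation."""
    segments = []
    start = 0
    for match in SENTENCE_END.finditer(text):
        end = match.end()
        if text[start:end].strip():
            segments.append((start, text[start:end]))
        start = end
    if text[start:].strip():
        segments.append((start, text[start:]))
    return segments


def pre_label(text, rules=RULES):
    """Match every segment against every rule and collect first annotations."""
    annotations = []
    for offset, seg in segment(text):
        for label, pattern in rules.items():
            for match in pattern.finditer(seg):
                annotations.append({
                    "label": label,
                    "start": offset + match.start(),  # position in the full text
                    "end": offset + match.end(),
                    "text": match.group(),
                    "source": "first",                # produced by pre-labeling
                })
    annotations.sort(key=lambda a: a["start"])
    return annotations
```

Each annotation keeps its absolute start and end positions, so the later review and rendering steps can work on the original text without re-running the segmentation.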

As an alternative embodiment, the adjustment operation includes: adding a new second label annotation and/or a new second relation annotation to the pre-labeled text; and/or deleting or replacing the first label annotation and/or the first relation annotation to generate a second label annotation and/or a second relation annotation;

and/or adjusting the range of the first label annotation to generate the second label annotation;

and/or adjusting the start point or end point of the first relation annotation to generate the second relation annotation.

As a specific implementation of this embodiment, the adjustment operation is a manual review and modification of the pre-labeled text. It includes modifying the labeling range of a label (for example, the pre-labeled range is "3-4 days" and the adjusted range is "3-4 days before admission"), adding, deleting, and replacing labels, modifying the direction of a relation annotation, and adding, deleting, and replacing relation annotations, so as to generate the final labeled text. Specifically, when a label is deleted, the relation annotations attached to it are deleted as well; when a label is added, the corresponding relation annotations are generated according to the label knowledge base; and when a label is changed, the original relation annotations are deleted and the relation annotations corresponding to the new label are added. Compared with labeling manually from scratch, one item at a time, this greatly saves labor, improves labeling efficiency, helps avoid omissions, and improves labeling quality. Combining automatic pre-labeling with manual review and modification also suits the characteristics of medical text, which is highly specialized, dense in valuable data, and somewhat flexible in expression.
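The cascade between labels and relation annotations described above might look like the following Python sketch. It assumes labels and relations are plain dictionaries with integer ids, and that a new label's relations are proposed with a nearest-connection rule; the field names and the function names are hypothetical.

```python
def delete_label(labels, relations, label_id):
    """Delete a label together with every relation that starts or ends at it."""
    kept_labels = [lab for lab in labels if lab["id"] != label_id]
    kept_relations = [
        rel for rel in relations
        if label_id not in (rel["start"], rel["end"])
    ]
    return kept_labels, kept_relations


def suggest_relations(labels, new_label, relation_rules):
    """Propose relations for a newly added label from the knowledge base.

    Only rules whose start label matches the new label are considered, and
    the end label is chosen by distance (a nearest-connection rule).
    """
    proposals = []
    for rule in relation_rules:
        if rule["start_label"] != new_label["label"]:
            continue
        candidates = [
            lab for lab in labels
            if lab["label"] == rule["end_label"] and lab["id"] != new_label["id"]
        ]
        if candidates:
            nearest = min(
                candidates,
                key=lambda lab: abs(lab["start"] - new_label["start"]),
            )
            proposals.append(
                {"name": rule["name"], "start": new_label["id"], "end": nearest["id"]}
            )
    return proposals
```

Replacing a label can then be expressed as a deletion of its old relations followed by a call to `suggest_relations` with the changed label.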

The pre-labeling produces the first annotations (first label annotations and/or first relation annotations). Any modification of a first annotation falls within the scope of the second annotations (second label annotations and/or second relation annotations). A deletion that generates a second annotation can be regarded as changing the pre-labeled first annotation into a blank second annotation.

Specifically, deleting or replacing the first label annotation and/or first relation annotation to generate the second label annotation and/or second relation annotation includes: selecting the first label annotation and/or first relation annotation and clicking delete, which turns the text originally carrying the first annotation into a second annotation with no label; or clicking replace and choosing from the first label options to replace the first label annotation, with the first relation annotation replaced accordingly, wherein the replacement label is one of the first label options in the label knowledge base and the replacement relation is one of the first relation options in the label knowledge base.

Specifically, adjusting the range of the first label annotation to generate the second label annotation includes: selecting the start position and/or end position of the text content and range covered by the first label annotation; dragging the start position and/or end position with the mouse to shorten or extend the line segment from the start point to the middle point, thereby adjusting the selected text content and range; and re-labeling the adjusted content and range with the first label to obtain the second label annotation. For the pre-labeled text, any adjustment of a first label annotation means that the adjusted content forms a second label annotation.

Specifically, adjusting the start point or end point of the first relation annotation to generate the second relation annotation includes: selecting the start position and/or end position of the text content and range covered by the first relation annotation; dragging the start position and/or end position with the mouse to shorten or extend the line segment from the start point to the middle point, thereby adjusting the selected text content and range; and re-labeling the adjusted content and range with the first relation to obtain the second relation annotation. For the pre-labeled text, any adjustment of a first relation annotation means that the adjusted content forms a second relation annotation.
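As a small illustration of the range adjustment, the Python sketch below derives a second label annotation from a first one by moving its start and/or end position. The dictionary layout and the `source` field are assumptions made for the example; the mouse interaction itself is not modeled.

```python
def adjust_range(annotation, new_start=None, new_end=None):
    """Return a second annotation with an adjusted span.

    The first annotation is left unchanged; positions that are not given
    keep their original value.
    """
    start = annotation["start"] if new_start is None else new_start
    end = annotation["end"] if new_end is None else new_end
    if start >= end:
        raise ValueError("the start position must come before the end position")
    # Copy the annotation and mark the copy as a manual (second) annotation.
    return {**annotation, "start": start, "end": end, "source": "second"}
```

For the example in the text, extending the end of the pre-labeled span "3-4 days" yields the second annotation "3-4 days before admission" while the label itself stays the same.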

Referring to FIG. 3, as an alternative embodiment, adding a new second label annotation to the pre-labeled text includes:

Step S301: determining the text content and range selected by the user with the mouse, and acquiring the start position and end position of the selected text in the current text;

Step S302: providing the first label options so that the selected text can be given a second label annotation, and storing the annotation in the array corresponding to that label. Automatic pre-labeling produces first label annotations, while manual adjustment produces second label annotations; although the labels (that is, the first label options) are the same, the labeling rules differ, and the second label annotations are more flexible and closer to actual usage.

Step S303: dividing and rendering the whole document according to the start and end positions in the array.

As a specific implementation of the foregoing embodiment, a new second label annotation is added to the pre-labeled text as follows. The content and range selected with the mouse can be obtained with the native JavaScript method window.getSelection(), which gives the start and end of the selected text within the current text. A label drop-down box then pops up for selection; alternatively, labeling is completed with a shortcut key or by choosing one of the labels displayed in the function area of the labeling page. The annotation is stored in the array corresponding to the label. When rendering, the whole document is divided according to the start and end positions in the array: where a label is present, the label name and label color configured for the label are applied and the label is displayed together with the labeled text; otherwise the plain text is displayed.
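The division step used for rendering can be sketched in Python, independent of the browser code that the text describes. The sketch assumes the start and end offsets have already been obtained from the selection and that annotations do not overlap; overlapping annotations are simply skipped here, which is a simplification and not something the application specifies.

```python
def split_for_render(text, annotations):
    """Divide the text into (chunk, label) pairs; label is None for plain text."""
    chunks = []
    cursor = 0
    for ann in sorted(annotations, key=lambda a: a["start"]):
        if ann["start"] < cursor:
            continue  # overlapping annotation: skipped in this sketch
        if ann["start"] > cursor:
            # Unlabeled text between the previous annotation and this one.
            chunks.append((text[cursor:ann["start"]], None))
        chunks.append((text[ann["start"]:ann["end"]], ann["label"]))
        cursor = ann["end"]
    if cursor < len(text):
        chunks.append((text[cursor:], None))
    return chunks
```

A renderer can then walk the chunks in order, drawing labeled chunks with the label name and color and everything else as plain text.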

As an alternative embodiment, adding a new second relation annotation to the pre-labeled text includes: selecting a first label annotation or a second label annotation and dragging a relation line with the mouse, wherein the start point of the relation line is the midpoint of the selected label annotation and the end point is the current position of the mouse; when the mouse passes over another first or second label annotation, judging whether a relation exists between the two labels; and if so, highlighting the end label and connecting the two labels to generate a relation line, wherein the relation line displays the relation name. Further, the method includes: judging whether the two labels are on different rows; if so, adding a first mark point and a second mark point that mark the association between the start point and the end point of the relation line, wherein the first mark point is at the far end of the relation line on the start row and the second mark point is at the leftmost side of the relation line on the end row; and acquiring the number of relation lines and the height information between the label on the start row and the label on the end row, and drawing the cross-row relation line between the two labels. The first and second mark points may be any one or a combination of symbols, graphics, characters, and numbers, or take other forms; they serve only as markers and are not specifically limited. The first and second mark points of the same association may be set to the same mark, and the mark points of different relation lines may be set to different marks, for example to distinguish line numbers. The main purpose is to make associations quicker and easier to identify and so improve labeling efficiency.

As a specific implementation of the foregoing embodiment, a new second relation annotation is added to the pre-labeled text as follows. A label is selected and a straight line is drawn with SVG, its start point being the midpoint of the selected label and its end point the current position of the mouse; as the mouse moves, the line is redrawn. When the mouse passes over another label, it is judged whether a relation exists between the two labels, and if so the end label is highlighted. When the user releases the mouse, the temporary line is cleared, a relation line with an arrow is drawn from the start label to the end label, and the relation name is displayed in the middle of the relation line. Drawing the relation takes two steps. In the first step, it is judged whether the two labels are on the same row; if they are on different rows, two marks need to be added between the start point and the end point. The marks may be any one or a combination of symbols, graphics, characters, and numbers, or take other forms; they serve only as markers and are not specifically limited.

For example, the first mark point and the second mark point of the same association may be set to the same mark, such as the same shape "Δ", the same symbol "#", the same character "one", or the same number "1", or to a combination of symbols, graphics, characters, and numbers, such as "# group 1" or "# 2". The mark may also state which pair of relations it belongs to, using the order in which the relation between the two labels appears in the text; for example, "group 6" denotes the 6th group of relations in the text. In addition, the mark points of different relation lines may be set to different marks: for example, the first and second mark points of the first relation line may be set to "#", those of the second relation line to "#", and those of the third relation line to "&" or other symbols, so that associations can be identified more quickly and easily and labeling efficiency improves. The first mark point and the second mark point of the same association may also be set to different marks: for example, the first mark point may be set to "×1", "Δ1", or "1" and the second mark point to "×1", "Δ1", or "11". Further variants are not enumerated here, and any variation of the above mark forms is regarded as falling within the protection scope of the present application.

Specifically, the first point (first mark point) is at the far end of the start row, and the second point (second mark point) is at the leftmost side of the end row. Alternatively, the first point may be at the leftmost side of the start row and the second point at the rightmost side of the end row. Further, the first point may be on the left side of the first letter selected by the relation range on the start row, and the second point on the right side of the first following letter selected by the relation range on the end row.
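The same-row check and the placement of the two mark points can be sketched in Python as a pure geometry helper. The sketch assumes each label is described by its row index and horizontal extent, and it uses the first placement described above (far end of the start row, leftmost side of the end row); the field names, the row width, and the marker text "#1" are hypothetical.

```python
def relation_segments(start, end, row_width, marker="#1"):
    """Return the horizontal segments needed to draw one relation line.

    `start` and `end` are dicts with 'row', 'x0' and 'x1' describing the
    label box on its row. A relation on one row is a single segment between
    the label midpoints. A cross-row relation is split into two segments,
    each ending at a mark point carrying the same marker.
    """
    start_mid = (start["x0"] + start["x1"]) / 2
    end_mid = (end["x0"] + end["x1"]) / 2
    if start["row"] == end["row"]:
        return [{"row": start["row"], "from": start_mid, "to": end_mid, "marker": None}]
    return [
        # First mark point: far end of the start row.
        {"row": start["row"], "from": start_mid, "to": row_width, "marker": marker},
        # Second mark point: leftmost side of the end row.
        {"row": end["row"], "from": 0, "to": end_mid, "marker": marker},
    ]
```

Giving both segments the same marker lets a reader pair the two halves of a cross-row relation at a glance.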

Further, the height is calculated by an algorithm: the number of relation lines between the start label and the end label is counted and multiplied by a fixed height to give the height of each line, and the lines are then redrawn.
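One possible reading of this height calculation is sketched below in Python. It assumes that "relation lines between the start label and the end label" means existing lines whose two endpoints both lie inside the new line's span, and that the new line is drawn one step above them; the step size of 12 and the use of horizontal positions as endpoints are assumptions for illustration.

```python
LINE_STEP = 12  # hypothetical fixed height per relation line, in pixels


def relation_line_height(existing, start_pos, end_pos, step=LINE_STEP):
    """Height at which to draw a new relation line.

    `existing` is a list of (start, end) positions of lines already drawn.
    Lines lying entirely within the new span are counted, and the new line
    is placed one step above them.
    """
    low, high = sorted((start_pos, end_pos))
    nested = sum(
        1 for a, b in existing
        if low <= min(a, b) and max(a, b) <= high
    )
    return (nested + 1) * step
```

Under this reading, a line that spans no other relation sits at one step, and each enclosed relation raises it by one further step so the lines do not overlap.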

As an alternative embodiment, the method further comprises: and updating the tag knowledge base according to a second tag label and/or a second relation label generated by the adjustment operation of the pre-labeled text.

Specifically, in addition to the first-level labeling rules and/or first-level relationship labeling rules, the label knowledge base may retain the second label annotations and/or second relationship annotations generated by adjustment operations, as well as the second-level labeling rules and/or second-level relationship labeling rules generated from them, and may keep counts of them. However, when the label knowledge base is used to pre-label the medical text to be labeled, the second-level labeling rules and/or second-level relationship labeling rules are not used for pre-labeling for the time being, until they are promoted to the first level. Alternatively, a confidence level may be set for the second-level rules; once the confidence level reaches a certain value, the second-level rules may also be used for pre-labeling, and the annotations they produce are likewise first label annotations.

As an optional embodiment, the updating the tag knowledge base according to the second tag label and/or the second relationship label generated by the adjusting operation on the pre-labeled text includes: generating a second-level labeling rule and/or a second-level relationship labeling rule according to the second label labeling and/or the second relationship labeling; counting the times and/or proportions of the second label and/or the second relation label; when the preset times and/or the preset proportion are reached, upgrading the secondary labeling rules and/or the secondary relation labeling rules into primary labeling rules and/or primary relation labeling rules, and automatically pre-labeling the medical texts to be labeled by utilizing the primary labeling rules and/or the primary relation labeling rules in the label knowledge base to obtain pre-labeled texts; or setting the credibility of the secondary labeling rules and/or the secondary relation labeling rules, when the credibility reaches a preset value, automatically pre-labeling the medical text to be labeled together with the primary labeling rules and/or the primary relation labeling rules to obtain a pre-labeled text.

As a specific implementation of the foregoing embodiment, updating the tag knowledge base according to the second label annotation and/or second relationship annotation generated by the adjustment operation on the pre-labeled text (that is, updating the tag knowledge base based on operations on the labeled content in the pre-labeled text) includes: obtaining the operation performed on the labeled content in the pre-labeled text, and generating a secondary labeling rule and/or secondary relationship labeling rule from the parts of that operation that differ from the primary labeling rules and/or primary relationship labeling rules in the tag knowledge base. A primary rule is a higher-level rule with broader applicability, whereas a secondary rule is specific to each adjustment operation and is therefore more individualized. For example, a primary rule may treat "3-4 days" as a time node, while secondary rules treat "3-4 days before admission", "3-4 days before onset", and "3-4 days after onset" as distinct time nodes. The specific differing rules of each operation are thus stored as secondary rules of the tag knowledge base, which continually improves labeling accuracy, meets personalized labeling requirements, and improves labeling efficiency. The number of times each secondary rule is recorded may also be counted; when a certain count is reached, or when the proportion of occurrences of the secondary rule exceeds a preset value, the secondary rule is upgraded to a primary rule, achieving faster and more accurate labeling.
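The counting and upgrading of secondary rules can be sketched as below. This is a minimal illustration, not the implementation of the application: the class name, the method names, and the threshold values are hypothetical, rules are represented as plain strings, and the sketch requires both the preset count and the preset proportion to be reached, which is only one of the combinations the "and/or" wording permits.

```python
from collections import Counter


class TagKnowledgeBase:
    """Holds primary rules plus counted secondary rules (illustrative only)."""

    def __init__(self, primary_rules, min_count=3, min_ratio=0.5):
        self.primary = set(primary_rules)
        self.secondary = Counter()  # secondary rule -> times recorded
        self.min_count = min_count  # assumed preset number of times
        self.min_ratio = min_ratio  # assumed preset proportion

    def record_adjustment(self, rule):
        """Store the rule derived from one manual adjustment, then promote."""
        if rule in self.primary:
            return
        self.secondary[rule] += 1
        self._promote()

    def _promote(self):
        total = sum(self.secondary.values())
        for rule, count in list(self.secondary.items()):
            if count >= self.min_count and count / total >= self.min_ratio:
                self.primary.add(rule)    # upgrade to a primary rule
                del self.secondary[rule]  # no longer tracked as secondary

    def active_rules(self):
        """Only primary rules take part in automatic pre-labeling."""
        return set(self.primary)
```

With `min_count=2`, recording "3-4 days before admission" twice among three adjustments promotes it, while a rule recorded once stays secondary and is not used for pre-labeling.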

As an optional embodiment, the application also provides a method for online collaborative labeling by multiple users, which specifically includes the following. Based on the HTML5 WebSocket full-duplex communication protocol, when a user enters the labeling platform page, a request to establish a WebSocket connection is sent to the server through JavaScript; once the connection is established, the client and the server can exchange data directly over the TCP connection. When the user labels text or draws a relationship line, the data is sent to the server through the send() method; the server processes the request and responds, and other users receive the data returned by the server (including information such as the operating user, time, content, and position) through the onmessage event. A prompt indicating which user performed the operation pops up at the designated position and the content of the operation is rendered, thereby completing instant communication and multi-user collaboration.
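The server-side relay step of this collaboration scheme can be sketched independently of any particular WebSocket library. In the sketch below, each connected user is represented by a plain callable that delivers one text frame, standing in for a WebSocket connection; the class name `AnnotationHub`, its methods, and the JSON field names are assumptions made for illustration and are not taken from the application.

```python
import json
import time


class AnnotationHub:
    """Relays each user's annotation event to every other connected user."""

    def __init__(self):
        self._clients = {}  # user name -> callable that delivers a text frame

    def connect(self, user, deliver):
        self._clients[user] = deliver

    def disconnect(self, user):
        self._clients.pop(user, None)

    def handle_message(self, sender, raw):
        """Process one frame sent by `sender` and relay it to the others."""
        event = json.loads(raw)
        payload = json.dumps({
            "user": sender,                 # operating user
            "time": time.time(),            # server-side timestamp
            "content": event["content"],    # label or relation that was drawn
            "position": event["position"],  # where to render the prompt
        })
        for user, deliver in self._clients.items():
            if user != sender:
                deliver(payload)
```

In a real deployment, `deliver` would wrap the send operation of the chosen WebSocket server library, and the receiving page would render the payload in its onmessage handler.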

According to the technical scheme, the label knowledge base is established, and the scheme of pre-labeling and manual auditing operation is adopted, so that the rapid high-quality labeling work of medical texts is completed, the reading speed and efficiency of scientific researchers are greatly improved, and omission is avoided.

As shown in fig. 4, according to another aspect of the present application, an embodiment of the present application further provides a medical text labeling apparatus based on a tag knowledge base, including:

the acquisition module is used for acquiring the medical text to be marked;

the automatic pre-labeling module is used for automatically pre-labeling the medical text to be labeled based on a preset label knowledge base to obtain a pre-labeled text, wherein the pre-labeled text is provided with a first label and/or a first relation label;

the confirmation module is used for obtaining the marked text according to the confirmation operation of the pre-marked text;

the active labeling module is used for acquiring adjustment operation of the pre-labeled text and generating a second label and/or a second relation label;

the processing module is used for obtaining the marked text according to the first label mark and/or the first relation mark of the pre-marked text and the second label mark and/or the second relation mark generated by the adjustment operation of the pre-marked text.
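How the five modules could cooperate is sketched below under simplifying assumptions: a label rule is a regular expression keyed by tag name, an annotation is a character span, and relationship labels are omitted. All names (`Annotation`, `pre_label`, `apply_adjustments`, `annotate`) and the sample rule are hypothetical and serve only to illustrate the data flow.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int  # index of the first character of the span
    end: int    # index one past the last character of the span
    tag: str    # label name


def pre_label(text, rules):
    """Automatic pre-labeling module: `rules` maps tag name -> regex."""
    found = []
    for tag, pattern in rules.items():
        for match in re.finditer(pattern, text):
            found.append(Annotation(match.start(), match.end(), tag))
    return sorted(found, key=lambda a: (a.start, a.end))


def apply_adjustments(first, removed=(), added=()):
    """Active labeling and processing modules: merge reviewer edits.

    `removed` holds first annotations the reviewer deleted; `added` holds
    second annotations the reviewer created.
    """
    dropped = set(removed)
    kept = [a for a in first if a not in dropped]
    return sorted(kept + list(added), key=lambda a: (a.start, a.end))


def annotate(text, rules, removed=(), added=()):
    """Acquisition through processing: return the final annotated spans."""
    first = pre_label(text, rules)
    if not removed and not added:
        return first  # confirmation module: pre-labels accepted unchanged
    return apply_adjustments(first, removed, added)
```

Calling `annotate` with no adjustments corresponds to the confirmation path, while passing `removed` and `added` corresponds to the adjustment path in which first and second annotations are combined.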

For specific limitations of the medical text labeling device based on the tag knowledge base, reference may be made to the limitations of the medical text labeling method based on the tag knowledge base above, which are not repeated here. The modules/units in the medical text labeling device based on the tag knowledge base may be implemented in whole or in part by software, hardware, or a combination thereof. The modules/units may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.

In one embodiment, the present application provides a computer device comprising: the device comprises a communication interface, a processor, a memory and a bus, wherein the communication interface, the processor and the memory are mutually connected through the bus;

the memory stores computer program instructions that, when executed, cause the processor to perform the steps of the medical text labeling method based on a tag knowledge base.

The computer equipment provided by the embodiment of the application can be a server, a client or other computer network communication equipment; fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present application.

The computer device includes a processor 501, a memory 502, a bus 505, and an interface 504. The processor 501 is connected to the memory 502 and the interface 504; the bus 505 is connected to the processor 501, the memory 502, and the interface 504, respectively; and the interface 504 is used to receive or transmit data. The processor 501 may be a single-core or multi-core central processing unit, an application-specific integrated circuit, or one or more integrated circuits configured to implement embodiments of the present application. The memory 502 may be a random access memory (RAM) or a non-volatile memory, such as at least one hard disk memory. The memory 502 is used to store computer-executable instructions. Specifically, the program 503 may be included in the computer-executable instructions.

In this embodiment, when the processor 501 invokes the program 503, the management server in fig. 5 may be caused to perform the operation of the medical text labeling method based on the label knowledge base, which is not described herein.

It should be appreciated that the processor provided by the above embodiment of the present application may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.

It should also be understood that the number of processors in the computer device in the above embodiment of the present application may be one or more, and may be adjusted according to the actual application scenario, which is merely illustrative and not limiting. The number of the memories in the embodiment of the present application may be one or more, and may be adjusted according to the actual application scenario, which is only illustrative and not limiting.

It should be further noted that, when the computer device includes a processor (or a processing unit) and a memory, the processor in the present application may be integrated with the memory, or the processor and the memory may be connected through an interface, which may be adjusted according to an actual application scenario, and is not limited.

The present application provides a chip system comprising a processor for supporting a computer device (client or server) to implement the functions of the controller involved in the above method, e.g. to process data and/or information involved in the above method. In one possible design, the chip system further includes memory to hold the necessary program instructions and data. The chip system can be composed of chips, and can also comprise chips and other discrete devices.

In another possible design, when the chip system is a chip in a user equipment, an access network, or the like, the chip comprises a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, pins, or circuitry. The processing unit may execute the computer-executable instructions stored in a storage unit to cause the chip in the client, the management server, or the like to perform the steps of the medical text labeling method based on a label knowledge base. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the client or the management server, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).

It should be appreciated that the methods and/or embodiments of the present application may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. The above-described functions defined in the method of the application are performed when the computer program is executed by a processing unit.

It is to be appreciated that the controllers or processors referred to in the above embodiments of the present application may be central processing units (CPU), or may be other general-purpose processors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general-purpose processor may be a microprocessor or any conventional processor.

It should be further understood that the number of processors or controllers in the computer device or the chip system and the like in the above embodiment of the present application may be one or more, and may be adjusted according to the actual application scenario, which is merely illustrative and not limiting. The number of the memories in the embodiment of the present application may be one or more, and may be adjusted according to the actual application scenario, which is only illustrative and not limiting.

The computer readable medium according to the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowchart or block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of devices, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

As another aspect, the embodiment of the present application also provides a computer-readable medium that may be contained in the apparatus described in the above embodiment; or may be present alone without being fitted into the device. The computer readable medium carries one or more computer readable instructions executable by a processor to perform the steps of the methods and/or aspects of the various embodiments of the application described above. The computer may be a computer device (client or server or other computer network communication device) as described above.

In one exemplary configuration of the application, the terminal, the devices of the services network each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include both permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

In addition, the embodiment of the application also provides a computer program stored in the computer device, which causes the computer device to execute the method described above.

It should be noted that the present application may be implemented in software and/or a combination of software and hardware, e.g., using Application Specific Integrated Circuits (ASIC), a general purpose computer or any other similar hardware device. In some embodiments, the software program of the present application may be executed by a processor to implement the above steps or functions. Likewise, the software programs of the present application (including associated data structures) may be stored on a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. In addition, some steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.

It will be evident to those skilled in the art that the application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, the terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. As used in this embodiment of the application, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

In the description of the present application, unless otherwise indicated, "/" means that the objects before and after it are in an "or" relationship; for example, A/B may represent A or B. The term "and/or" in the present application merely describes an association between the associated objects and indicates that three relationships may exist; for example, A and/or B may indicate three cases: A alone, both A and B, and B alone, where A and B may each be singular or plural. The word "if" as used herein may be interpreted as "when", "upon", "in response to a determination", or "in response to detection", depending on the context. Similarly, the phrase "if determined" or "if detected (stated condition or event)" may be interpreted as "when determined", "in response to determination", "when detected (stated condition or event)", or "in response to detection (stated condition or event)", depending on the context.

The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application.

Claims

1. A medical text labeling method based on a label knowledge base, the method comprising:

acquiring a medical text to be marked;

automatically pre-labeling the medical text to be labeled based on a preset label knowledge base to obtain a pre-labeled text, wherein the pre-labeled text is provided with a first label and/or a first relation label; the tag knowledge base comprises a first-level labeling rule and/or a first-level relation labeling rule, a second tag label and/or a second relation label generated based on adjustment operation, and a second-level labeling rule and/or a second-level relation labeling rule;

obtaining a marked text according to the confirmation operation of the pre-marked text; or,

acquiring an adjustment operation on the pre-labeled text, and generating a second label and/or a second relation label;

obtaining a marked text according to a first tag mark and/or a first relation mark of the pre-marked text and a second tag mark and/or a second relation mark generated by the adjustment operation of the pre-marked text;

updating the tag knowledge base according to a second tag label and/or a second relationship label generated by the adjustment operation of the pre-labeled text;

generating a second-level labeling rule and/or a second-level relationship labeling rule according to the second label labeling and/or the second relationship labeling;

counting the times and/or proportions of the second label and/or the second relation label; when the preset times and/or the preset proportion are reached, upgrading the secondary labeling rules and/or the secondary relation labeling rules into primary labeling rules and/or primary relation labeling rules, and automatically pre-labeling the medical texts to be labeled by utilizing the primary labeling rules and/or the primary relation labeling rules in the label knowledge base to obtain pre-labeled texts; or setting the credibility of the secondary labeling rules and/or the secondary relation labeling rules, when the credibility reaches a preset value, automatically pre-labeling the medical text to be labeled together with the primary labeling rules and/or the primary relation labeling rules to obtain a pre-labeled text.

2. The method according to claim 1, wherein the method further comprises: pre-constructing a tag knowledge base; the construction method comprises the following steps:

generating a label name based on a label name value set by a user, and extracting a first-level labeling rule based on at least one labeling sample and/or label labeling setting set by the user;

generating a relationship name based on a relationship name value set by a user, and extracting a first-level relationship annotation rule based on at least one relationship annotation sample and/or relationship annotation setting set by the user;

and constructing the tag knowledge base based on the set tag name, the primary labeling rule, the set relationship name and the primary relationship labeling rule.

3. The method according to claim 2, wherein automatically pre-labeling the medical text to be labeled based on a preset label knowledge base, and obtaining pre-labeled text comprises:

importing the medical text to be annotated, and segmenting the medical text to be annotated based on a regular expression to obtain a segmented text;

and matching the word segmentation text with the first-level labeling rules and/or the first-level relation labeling rules of the tag knowledge base to generate a pre-labeling text with a first tag label and/or a first relation label.

4. The method of claim 1, wherein the adjusting operation comprises:

performing a second label marking and/or a second relation marking new operation on the pre-marked text;

and/or deleting or replacing the first label and/or the first relation label to generate a second label and/or a second relation label;

and/or, adjusting the range of the first label to generate the second label.

5. The method of claim 4, wherein performing a second label-tagged augmentation operation on the pre-tagged text comprises:

acquiring a starting point position and an end point position of a selected word in a current text by determining the text content and the range selected by a user through a mouse;

providing a first label option to carry out second label labeling on the selected characters, and storing the second label option into an array corresponding to the first label;

and dividing and rendering the whole document according to the starting point position and the end point position of the array.

6. The method of claim 4, wherein performing a second relationship annotation on the pre-annotated text comprises:

selecting a first label or a second label, and drawing a relation line with the mouse, wherein the starting point of the relation line is the midpoint of the selected first label or second label, and the ending point of the relation line is the current position of the mouse;

judging whether a relation exists between the two labels when the mouse passes through other first label labels or other second label labels, if so, highlighting the end label, and connecting the two labels to generate a relation line, wherein the relation line displays the label relation name.

7. The method of claim 6, wherein the method further comprises:

further judging whether the two labels cross the rows or not;

if the line is crossed, adding a first mark point and a second mark point for marking the corresponding association relation between the starting point and the end point of the relation line, wherein the first mark point is positioned at the rearmost side of the relation line of the starting point line, and the second mark point is positioned at the leftmost side of the relation line of the end point line;

and acquiring the number of relationship lines and the height information from the labels of the starting line to the middle of the labels of the ending line, and drawing the relationship lines between the two labels across the lines.

8. A medical text labeling device based on a tag knowledge base, comprising:

the acquisition module is used for acquiring the medical text to be marked;