CN107515851A - Apparatus and method for coreference resolution, information extraction and similar document retrieval - Google Patents
- Publication number
- CN107515851A (application CN201610428860.4A)
- Authority
- CN
- China
- Prior art keywords
- medical
- medical entity
- entity
- diagnostic
- document
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H15/00—ICT specially adapted for medical reports, e.g. generation or transmission thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Primary Health Care (AREA)
- Medical Informatics (AREA)
- Public Health (AREA)
- Epidemiology (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Medical Treatment And Welfare Office Work (AREA)
- Measuring And Recording Apparatus For Diagnosis (AREA)
Abstract
The invention discloses apparatuses and methods for coreference resolution, information extraction, and similar document retrieval. The apparatus for coreference resolution includes: a unit configured to acquire a first medical entity and a second medical entity from an input medical document; a unit configured to detect, from the medical document, a diagnostic status of the first medical entity, at least one attribute of the first medical entity, a diagnostic status of the second medical entity, and at least one attribute of the second medical entity; a unit configured to determine compatibility between the first medical entity and the second medical entity based on the detected diagnostic statuses and attributes; and a unit configured to determine, based on the determined compatibility, whether the first medical entity and the second medical entity represent the same medical object. With the present invention, the accuracy of coreference resolution can be improved.
Description
Technical Field
The present invention relates to Natural Language Processing (NLP), and more particularly to an apparatus and method for, for example, coreference resolution, information extraction, and similar document retrieval.
Background
The use and management of electronic medical documents is becoming increasingly widespread. Based on the management of electronic medical documents, many applications that benefit physicians may be developed, such as similar medical document retrieval, diagnostic support, and the like. Such applications are implemented by applying a text information extraction technique to a medical document. An expression obtained via the text information extraction technique is referred to as a medical entity. Generally, the medical entities in one medical document may represent several different medical objects, wherein a medical object may be a concrete physical object (such as an abnormality diagnosed from an examination result) or an abstract medical concept (such as a disease judged by a doctor). For example, some medical entities may represent abnormalities (e.g., tumors) diagnosed from examination results, some medical entities may represent a patient's disease, and so forth. Moreover, for one medical object (e.g., a tumor) in a medical document, a physician may record it using several different expressions (i.e., medical entities).
Thus, there is a need for techniques that can determine whether different expressions (i.e., two different medical entities) represent the same medical object. Coreference resolution is one such technique. US patent US8457950 discloses a method for coreference resolution comprising: calculating a similarity measure for two candidate entities in a document based on a similarity measure of word features between the two candidate entities and contexts of the two candidate entities; and determining the two candidate entities to be coreferent in a case where the similarity measure of the two candidate entities is greater than or equal to a predetermined threshold.
However, in a medical document, some word features that are seemingly the same or similar to each other may not actually represent the same medical object. For example, the word feature "tumor" in a medical document may represent a different abnormality, since different abnormalities may be described by using the same expression (i.e., word feature). In addition, some word features that are not superficially similar to each other may actually represent the same medical object. For example, the word features "pelvic fracture" and "fragility fracture" in the medical document may represent the same abnormality, since different aspects of an abnormality may be described by using different expressions (i.e., word features) so that other readers will not be confused. Therefore, the accuracy of coreference resolution using only similarity measures of word features will be low.
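The weakness described above can be illustrated with a minimal sketch. It is not the method of US8457950: it assumes that "word features" are simply lower-cased whitespace tokens, uses Jaccard overlap as the similarity measure, and ignores context entirely. The function names and the threshold of 0.5 are hypothetical.

```python
def word_features(expression: str) -> set[str]:
    """Lower-cased whitespace tokens stand in for word features (an assumption)."""
    return set(expression.lower().split())


def jaccard(a: str, b: str) -> float:
    """Jaccard overlap of the two token sets; 0.0 when both sets are empty."""
    fa, fb = word_features(a), word_features(b)
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0


def naive_coreferent(a: str, b: str, threshold: float = 0.5) -> bool:
    """Declare two expressions coreferent when their overlap reaches the threshold."""
    return jaccard(a, b) >= threshold
```

Under these assumptions, two mentions of "tumor" that refer to different abnormalities are always merged, while "pelvic fracture" and "fragility fracture" share only one of three distinct tokens and are kept apart even when they describe the same abnormality.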
Disclosure of Invention
Accordingly, the present invention is directed to solving the problems described in the background above.
According to an aspect of the present invention, there is provided an apparatus for coreference resolution, the apparatus comprising: an acquisition unit configured to acquire a first medical entity and a second medical entity from an input medical document; a diagnostic feature detection unit configured to detect, from the medical document, a diagnostic status of the first medical entity, at least one attribute of the first medical entity, a diagnostic status of the second medical entity, and at least one attribute of the second medical entity; a compatibility determination unit configured to determine compatibility between the first medical entity and the second medical entity based on the detected diagnostic statuses and attributes; and a coreference resolution unit configured to determine, based on the determined compatibility, whether the first medical entity and the second medical entity represent the same medical object. Here, the diagnostic status represents the location of a medical entity in a diagnostic process in the medical document; the attributes represent diagnostic items of a medical entity in the medical document; and the compatibility represents the likelihood that the first medical entity and the second medical entity represent the same medical object.
With the present invention, the accuracy of coreference resolution can be improved.
Other characteristic features and advantages of the present invention will become apparent from the following description of exemplary embodiments with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a block diagram schematically showing a hardware configuration capable of implementing the technique according to the embodiment of the present invention.
Fig. 2 is a block diagram illustrating a configuration of an apparatus for coreference resolution according to a first embodiment of the present invention.
Fig. 3 shows an exemplary input medical document of the invention according to the first embodiment in fig. 2.
Fig. 4 is a flowchart schematically showing the procedure of the overall process according to the first embodiment in fig. 2.
Fig. 5 is a block diagram illustrating a configuration of an apparatus for coreference resolution according to a second embodiment of the present invention.
Fig. 6 is a flowchart schematically showing the procedure of the overall process according to the second embodiment in fig. 5.
Fig. 7 is another block diagram illustrating the configuration of an apparatus for coreference resolution according to the second embodiment of the present invention.
Fig. 8 is a flowchart schematically showing the procedure of the overall process according to the second embodiment in fig. 7.
Fig. 9 is a block diagram illustrating a configuration of an apparatus for information extraction according to a third embodiment of the present invention.
Fig. 10 is a flowchart schematically showing the procedure of the overall process according to the third embodiment in fig. 9.
Fig. 11 is another flowchart schematically showing the procedure of the overall process according to the third embodiment in fig. 9.
Fig. 12 is a block diagram illustrating the configuration of an apparatus for similar document retrieval according to a fourth embodiment of the present invention.
Fig. 13 is a flowchart schematically showing the procedure of the overall process according to the fourth embodiment in fig. 12.
FIG. 14 illustrates an arrangement of an exemplary similar document retrieval system according to the present invention.
Detailed Description
Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings. It should be noted that the following description is merely illustrative and exemplary in nature and is in no way intended to limit the invention, its application, or uses. The relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in the embodiments do not limit the scope of the present invention unless it is specifically stated otherwise. Additionally, techniques, methods, and apparatus known to those skilled in the art may not be discussed in detail but are intended to be part of the specification where appropriate.
Note that in the drawings, like reference numerals and letters refer to like items; thus, once an item is defined in one figure, it need not be discussed again for subsequent figures.
In medical diagnosis, a complete diagnostic process may include several diagnostic states, such as: a discovery state that identifies medical findings (e.g., normal findings and abnormal findings) from examination results, wherein the examination results may be, for example, those of Computed Tomography (CT), Magnetic Resonance Imaging (MRI), blood tests, and the like; a description state that describes detailed information of the medical findings; a suspicion state that makes a preliminary suspicion based on the medical findings; a comparison state that compares the current condition with a previous examination result; a cause state that analyzes the cause of the current condition; a judgment state that makes a final judgment; a delay judgment state that records information for which a final judgment cannot yet be made; a treatment state that suggests a treatment (e.g., surgery or medication); a request state that requests further examination for more information; and other diagnostic states. The diagnostic state represents the location of a medical finding in the diagnostic process. In the field of NLP, the above medical findings may also be regarded as medical entities, wherein a medical entity (i.e., a medical finding) is an expression or record corresponding to a medical object. A medical object may be a concrete physical object (such as an abnormality diagnosed from an examination result) or an abstract medical concept (such as a disease judged by a doctor).
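As an illustration only, the diagnostic states listed above could be represented as an enumeration. The class name, member names, and string values are hypothetical; the description does not prescribe any particular representation, and the discovery (finding) state is named `DISCOVERY` here.

```python
from enum import Enum


class DiagnosticState(Enum):
    """Position of a medical entity within the diagnostic process."""

    DISCOVERY = "discovery"            # identifies findings from examination results
    DESCRIPTION = "description"        # describes detailed information of a finding
    SUSPICION = "suspicion"            # preliminary suspicion based on findings
    COMPARISON = "comparison"          # compares with a previous examination result
    CAUSE = "cause"                    # analyzes the cause of the current condition
    JUDGMENT = "judgment"              # final judgment
    DELAY_JUDGMENT = "delay judgment"  # no final judgment can be made yet
    TREATMENT = "treatment"            # suggests a treatment
    REQUEST = "request"                # requests further examination
    OTHER = "other"                    # any other diagnostic state
```

A closed enumeration keeps later comparisons between two entities' states explicit, while `OTHER` leaves room for states that the list above does not name.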
When writing medical documents, the sequence of the above diagnostic states is not fixed for one diagnostic process, and not all of the above diagnostic states are necessary. In addition, in one medical document, the medical findings (i.e., medical entities) of the patient diagnosed from the examination results may be recorded, the doctor's consideration and/or judgment of the patient may be recorded, and other necessary treatments and/or examinations of the patient may be recorded. That is, in one medical document, contents corresponding to any diagnosis state may be recorded.
Furthermore, in one diagnostic state, more than one different expression may be recorded to describe detailed information (such as different aspects, etc.) of one medical object. For example, in the discovery state, more than one different expression may be recorded to describe the details of a tumor. Furthermore, a medical object can also be recorded several times in different diagnostic states. For example, in the discovery state, one expression may be recorded to describe details of one tumor, in the judgment state, a different expression may be recorded to describe the level of the tumor, and then, for further diagnostic states, another different expression may be recorded to describe other requests for subsequent examination of the tumor. That is, in a medical document, a medical object can be described by using several different expressions (i.e. several different medical entities), whether in one diagnostic state or among different diagnostic states.
As described above, the structure of a medical document is complex. However, the inventors found that, owing to the conventions by which medical documents are written, there is some compatibility between medical entities, and in a case where medical entities are compatible with each other, they should represent the same medical object. In one example, among different diagnostic states, in a case where a diagnostic process is not complete, a medical object that has been described in a previous diagnostic state will still be described in subsequent diagnostic states, and therefore the descriptions (i.e., medical entities) recorded among the different diagnostic states are compatible and represent the same medical object. In another example, within one diagnostic state, in a case where the description of a medical object is incomplete, other descriptions of the medical object will continue to be recorded in that diagnostic state, so the descriptions (i.e., medical entities) recorded in that diagnostic state are compatible and represent the same medical object. Of course, there are also cases in which a medical object is described in only a few diagnostic states rather than throughout the entire diagnostic process, and/or cases in which an incomplete description of a medical object in one diagnostic state is not continued in that diagnostic state.
However, the present invention is not intended to process the entire diagnostic process in one medical document, but is merely intended to identify compatibility between two medical entities in an input medical document, where the input medical document may be a portion of a medical document or an entire medical document. That is, in the present invention, in the case where the description of the current medical entity is incomplete, or the diagnosis state of the current medical entity is not the final diagnosis state, other medical entities may be compatible with the current medical entity.
(Hardware configuration)
First, a hardware configuration capable of implementing the technology described hereinafter will be described with reference to fig. 1. Fig. 1 is a block diagram schematically illustrating a hardware configuration 100 capable of implementing techniques according to an embodiment of the present invention.
The hardware configuration 100 may include, for example, a Central Processing Unit (CPU) 110, a Random Access Memory (RAM) 120, a Read Only Memory (ROM) 130, a hard disk 140, an input device 150, an output device 160, a network interface 170, and a system bus 180. Further, the hardware configuration 100 may be implemented by, for example, a Personal Digital Assistant (PDA), a mobile phone, a notebook computer, a desktop computer, a tablet computer, or other suitable electronic devices.
The CPU 110 may be any suitable programmable control device and may execute various functions to be described hereinafter by executing various application programs stored in the ROM 130 or the hard disk 140. The RAM 120 is used to temporarily store programs or data loaded from the ROM 130 or the hard disk 140, and also serves as the work area in which the CPU 110 executes various programs (for example, to perform the disclosed techniques described in detail below with reference to figs. 2 to 14) and other available functions. The hard disk 140 may store various information such as an Operating System (OS), various applications, control programs, and data pre-stored or preset by a manufacturer or a user, wherein the data may be, for example, historical medical documents, Thresholds (TH), rules, models, and the like, which will be described below.
In one embodiment, the input device 150 may be an input interface and may, for example, receive an image of a medical document output from an image acquisition device, where the image acquisition device may be, for example, a camera, a digital camera, or other suitable electronic device. The output device 160 may be an output interface, and may output the processing result to a subsequent operation to be described later.
In another embodiment, the input device 150 may enable a user to interact with an electronic device implementing the hardware configuration 100. For example, a user may input a medical entity, a medical document, or an image of a medical document through the input device 150. The input device 150 can take various forms, such as buttons, a keyboard, a dial, a click wheel, or a touch screen. The output device 160 may include a Cathode Ray Tube (CRT) or a liquid crystal display, and may display the processing results to a user. Additionally, if the electronic device implementing the hardware configuration 100 is a device such as a smartphone, PDA, tablet computer, or other suitable electronic device, the input device 150 and the output device 160 may be integrated into a single unit. Furthermore, if the electronic device implementing the hardware configuration 100 is a device such as a conventional mobile phone, laptop, desktop, or other suitable electronic device, the input device 150 and the output device 160 may be incorporated into the device as separate components.
The network interface 170 provides an interface for connecting an electronic device implementing the hardware configuration 100, such as the electronic device 1410 shown in fig. 14, to a network, such as the network 1420 shown in fig. 14. For example, an electronic device implementing the hardware configuration 100 may perform data communication, via the network interface 170, with other electronic devices connected to the network (e.g., the server 1430 shown in fig. 14). Alternatively, an electronic device implementing the hardware configuration 100 may be provided with a wireless interface for wireless data communication. The system bus 180 may provide a data transfer path for transferring data among the CPU 110, the RAM 120, the ROM 130, the hard disk 140, the input device 150, the output device 160, the network interface 170, and the like. Although referred to as a bus, the system bus 180 is not limited to any particular data transfer technique.
Alternatively, software that realizes the same functions as the above-described hardware configuration may be used.
In an example of an embodiment of the present invention (e.g., coreference resolution), a program of the embodiment, which will be described later in detail with reference to figs. 4, 6, and 8, may be installed in advance on the hard disk 140, and when the CPU 110 needs to execute the program of the embodiment, the program is loaded from the hard disk 140 into the RAM 120. In another example, the program of the embodiment may be recorded in the ROM 130 as a part of the memory map and directly executed by the CPU 110. In addition, the programs of other embodiments (such as information extraction and similar document retrieval), which will be described later in detail with reference to figs. 10 to 11 and fig. 13, may also be stored and executed in the same manner.
The hardware configuration 100 described above is merely exemplary and is in no way intended to limit the present invention, its applications, or uses. For simplicity, only one hardware configuration is shown in fig. 1. However, a plurality of hardware configurations can also be used as necessary.
(Configuration of apparatus for coreference resolution using compatibility)
Next, a configuration for coreference resolution will be described with reference to fig. 2. Fig. 2 is a block diagram illustrating the configuration of an apparatus for coreference resolution 200 according to the first embodiment of the present invention.
The blocks shown in fig. 2 are implemented by the CPU 110 described above with reference to fig. 1, which executes a program loaded into the RAM 120 and cooperates with the respective hardware shown in fig. 1. Some or all of these blocks may instead be implemented by dedicated hardware.
As shown in fig. 2, the apparatus for coreference resolution 200 according to the first embodiment of the present invention includes: an acquisition unit 210, a diagnostic feature detection unit 220, a compatibility determination unit 230, and a coreference resolution unit 240.
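The data flow among the units of fig. 2 can be outlined as a structural sketch. It assumes that the diagnostic feature detection unit 220 and the compatibility determination unit 230 are supplied as plain callables, and that the coreference resolution unit 240 decides by comparing the compatibility with a threshold; the names `Features` and `resolve`, the callable signatures, and the threshold comparison are all hypothetical placeholders rather than the disclosed implementation.

```python
from typing import Callable, NamedTuple


class Features(NamedTuple):
    """Diagnostic features detected for one medical entity (hypothetical shape)."""

    state: str
    attributes: dict


def resolve(
    document: str,
    first: str,
    second: str,
    detect: Callable[[str, str], Features],
    compatibility: Callable[[Features, Features], float],
    threshold: float = 0.5,
) -> bool:
    """Return True when the two entities are judged to denote the same object."""
    # The acquisition unit 210 corresponds to receiving document, first, second.
    first_features = detect(document, first)     # role of unit 220
    second_features = detect(document, second)   # role of unit 220
    score = compatibility(first_features, second_features)  # role of unit 230
    return score >= threshold                    # role of unit 240 (assumed rule)
```

Passing the detection and compatibility steps in as callables keeps the sketch neutral about whether they are rule-based or model-based.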
First, the input device 150 shown in fig. 1 receives a medical document entered by a user (e.g., a physician), where the input may be a portion of a medical document or an entire medical document. Further, the input device 150 receives the first medical entity and the second medical entity selected by the user from the medical document. Second, the input device 150 transmits the received medical document, the first medical entity, and the second medical entity to the acquisition unit 210 via the system bus 180.
As described above, the first medical entity and the second medical entity are selected by the user. Alternatively, they may be extracted from the received medical document by the CPU 110 using existing text information extraction techniques. For example, a plurality of medical entities may be extracted from the received medical document by the CPU 110, and then any two of the extracted medical entities may be regarded as the first medical entity and the second medical entity.
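When the entities are extracted automatically, "any two of the medical entities" amounts to enumerating unordered pairs. A minimal sketch, assuming the extracted entities are already available as a list (the function name and the placeholder identifiers are hypothetical):

```python
from itertools import combinations


def candidate_pairs(entities):
    """Return every unordered pair of extracted medical entities."""
    return list(combinations(entities, 2))


# Placeholder identifiers stand in for entities extracted from a document.
extracted = ["entity_1", "entity_2", "entity_3", "entity_4"]
pairs = candidate_pairs(extracted)
```

For n extracted entities this yields n(n-1)/2 candidate pairs, each of which can then be treated as a first and second medical entity.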
In addition, the above medical document is in a text format. However, the medical document may also be in an image format. For example, the input device 150 may receive images of medical documents input by a user or output from an image acquisition device. After the input device 150 receives the image of the medical document, the input device 150 transmits the received image to the CPU 110 via the system bus 180. The CPU 110 then converts the medical document from the image format to a text format, for example, using existing Optical Character Recognition (OCR) techniques. The first and second medical entities may then be selected from the converted medical document by the user through the input device 150, or may be extracted from the converted medical document by the CPU 110 using a text information extraction technique.
In order to make the invention easier to understand, fig. 3 shows an exemplary input medical document, which is a portion of a medical document recorded, for example, in Japanese. As shown in fig. 3, the terms enclosed by dotted ellipses (i.e., "node" 310, "node" 320, "node" 330, and "node" 340) are the above-mentioned medical entities, and any two of these terms may be regarded as the above-mentioned first and second medical entities. Here, each enclosed term is the Japanese word meaning "node".
Returning now to fig. 2, first, the acquisition unit 210 acquires the first medical entity, the second medical entity and the medical document from the input device 150 via the system bus 180.
Next, the diagnostic feature detection unit 220 detects, from the medical document, the diagnostic status of the first medical entity, at least one attribute of the first medical entity, the diagnostic status of the second medical entity, and at least one attribute of the second medical entity. Here, the diagnostic status represents the location of a medical entity in a diagnostic process in the medical document, the attributes represent diagnostic items of the medical entity in the medical document, and the diagnostic items represent items considered by the doctor in making a diagnosis.
As described above, the diagnostic status is at least one of the above-described diagnostic states (e.g., a discovery state, a description state, a suspicion state, a comparison state, a cause state, a delay judgment state, a treatment state, and a request state). An attribute includes at least a type, which corresponds to the diagnostic item, and a value, which corresponds to the parameter of the diagnostic item. In addition, the type of an attribute includes at least one of the following: examination indices (e.g., size, shape, location, level, and numerical indices), medical vocabulary (e.g., examination, disease, treatment, drug), diagnostic assertions (e.g., polarity, cause), and the like. Here, the polarity may be, for example, negative or positive, and the cause indicates that the abnormality is caused by, for example, lifestyle or trauma.
Taking the first medical entity as an example, in order to identify which diagnostic state the first medical entity belongs to, first, the diagnostic feature detection unit 220 may extract predetermined contents related to the first medical entity from the medical document. The predetermined contents may be predefined by the manufacturer or the user, for example, according to the actual application or experience, or according to the description conventions of the respective diagnostic states. The predetermined contents are contents in the medical document that can support making a medical diagnosis. Second, the diagnostic feature detection unit 220 may identify the diagnostic state of the first medical entity by analyzing the extracted contents. For example, the diagnostic state of the first medical entity may be identified by analyzing the extracted contents according to a pre-generated rule, or by classifying the extracted contents according to a pre-generated model. The pre-generated rules and the pre-generated models may be generated based on a plurality of expression samples of the respective diagnostic states.
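The rule-based alternative mentioned above can be sketched as an ordered list of cue phrases per diagnostic state. The cue phrases, the English wording, the first-match-wins ordering, and the names `RULES` and `identify_state` are all assumptions made for illustration; actual rules would be generated from expression samples of each state.

```python
# Ordered (state, cue phrases) rules; the first rule with a matching cue wins.
RULES = [
    ("request", ("recommend further", "follow-up examination")),
    ("treatment", ("surgery", "medication")),
    ("comparison", ("compared with", "previous examination")),
    ("suspicion", ("suspected", "possibly")),
    ("discovery", ("is found", "is observed", "is seen")),
]


def identify_state(context: str, rules=RULES, default: str = "other") -> str:
    """Return the diagnostic state whose cue phrase first matches the context."""
    text = context.lower()
    for state, cues in rules:
        if any(cue in text for cue in cues):
            return state
    return default
```

For instance, a context such as "A node 2.5 cm in diameter is found in the right lung S4." matches the cue "is found" and is labelled as the discovery state. A pre-generated model could replace `identify_state` without changing how its result is used.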
Taking the medical entity "node" 310 shown in fig. 3 as an example, the contents extracted by the diagnostic feature detection unit 220 are four Japanese terms: one meaning "end" (rendered here as "distal"), one meaning "diameter", the unit of length "cm", and "められます", which means "found". The diagnostic state of the medical entity "node" 310 identified by the diagnostic feature detection unit 220 is the discovery state.
Then, in order to identify what information is recorded for the first medical entity, the diagnostic feature detection unit 220 may also extract attributes (e.g., types of attributes and/or values of attributes) of the first medical entity according to existing NLP techniques (e.g., template-based information extraction or training language corpus-based information extraction), as an example. Continuing with the example of the medical entity "humics" 310 shown in fig. 3, the attribute of the medical entity "humics" 310 extracted by the diagnostic feature detection unit 220 is "location: right lung S4 "," size: 2.5cm "and" polarity: positive ". Where the extracted "position", "size", and "polarity" are the types of attributes, and the extracted "right lung S4", "2.5 cm", and "positive" are the values of the attributes.
As described above, the diagnostic state and the attributes of the medical entity are both detected by one unit, i.e., the diagnostic feature detection unit 220. However, the diagnostic state and the attributes of the medical entity may also be detected by different units. As shown in fig. 2, as an alternative solution, the diagnostic state of the medical entity may be identified by the diagnostic status identification unit 221, and the attributes of the medical entity may be extracted by the attribute extraction unit 222.
Returning now to fig. 2, after the diagnostic feature detection unit 220 detects the diagnostic state of the first medical entity, the attributes of the first medical entity, the diagnostic state of the second medical entity, and the attributes of the second medical entity, the compatibility determination unit 230 determines the compatibility between the first medical entity and the second medical entity based on the detected diagnostic states and attributes. Here, compatibility represents the likelihood that one medical entity and another medical entity represent the same medical object. As a preferred solution, the compatibility determination unit 230 may further include a compatibility factor determination unit 231 and a compatibility judgment unit 232.
First, the compatibility factor determination unit 231 determines a compatibility factor between the first medical entity and the second medical entity based on the detected diagnostic states and attributes, wherein the compatibility factor represents a semantic conflict between the first medical entity and the second medical entity. As an example, the compatibility factor includes a conflict of semantic values and a conflict of semantic sequences, each evaluated among the diagnostic state of the first medical entity, the attributes of the first medical entity, the diagnostic state of the second medical entity, and the attributes of the second medical entity.
For example, in case the first medical entity and the second medical entity are in the same diagnostic state, the semantic values between the first medical entity and the second medical entity do not conflict if the types of the attributes of the first medical entity are different from the types of the attributes of the second medical entity and the distance between the first medical entity and the second medical entity is small. Alternatively, in case the first medical entity and the second medical entity are in different diagnostic states, the semantic values between the first medical entity and the second medical entity do not conflict if the types of certain attributes of the first medical entity are the same as the types of certain attributes of the second medical entity, the values of these attributes are the same, and the distance between the first medical entity and the second medical entity is small.
Furthermore, in case the diagnostic state of the first medical entity is connected in a consecutive manner with the diagnostic state of the second medical entity or in case the attributes of the first medical entity are connected in a consecutive manner with the attributes of the second medical entity, the semantic sequence between the first medical entity and the second medical entity is not in conflict if the obtained semantic meaning is not in conflict (i.e. the consecutive connections comply with the writing criteria of the medical document).
Then, after the compatibility factor determination unit 231 determines the compatibility factor between the first medical entity and the second medical entity, the compatibility judgment unit 232 judges the compatibility between the first medical entity and the second medical entity based on the determined compatibility factor. For example, the compatibility judgment unit 232 judges that the first medical entity is compatible with the second medical entity in case neither the semantic values nor the semantic sequences conflict.
In one example, the compatibility factor determination unit 231 may determine the compatibility factor according to the following steps:
First, the compatibility factor determination unit 231 calculates at least one of the following features:
1) A distance between the first medical entity and the second medical entity, wherein the distance may be determined based on, for example, the sentence distance between the first medical entity and the second medical entity. The smaller the distance, the more likely the first medical entity and the second medical entity represent the same medical object. For example, if a sentence related to the first medical entity and a sentence related to the second medical entity are adjacent to each other (i.e., the distance is zero), the first medical entity and the second medical entity may represent the same medical object.
2) A sequence between a diagnostic state of a first medical entity and a diagnostic state of a second medical entity.
3) A distance between the diagnostic state of the first medical entity and the diagnostic state of the second medical entity, wherein the distance may be determined based on, for example, a distance of a sentence between the diagnostic state of the first medical entity and the diagnostic state of the second medical entity. The smaller the distance, the more likely the first medical entity and the second medical entity represent the same medical object. For example, if the diagnostic state of a first medical entity and the diagnostic state of a second medical entity are adjacent to each other (i.e. the distance is zero), the first medical entity and the second medical entity may represent the same medical object.
4) A type of attribute of the first medical entity and the second medical entity.
5) A sequence between a type of an attribute of the first medical entity and a type of an attribute of the second medical entity.
6) The values of the attributes whose type belongs to both the first medical entity and the second medical entity.
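As an illustrative, non-limiting sketch, features 1), 4), 5) and 6) above, together with a flag indicating whether the two diagnostic states are the same, may be calculated as follows. The class `MedicalEntity`, the function `compute_features`, and the sentence indices are hypothetical; the attribute values follow the example of the medical entities 310 and 320 in fig. 3.

```python
from dataclasses import dataclass, field


@dataclass
class MedicalEntity:
    sentence_index: int   # index of the sentence that mentions the entity
    state: str            # detected diagnostic state
    attributes: dict = field(default_factory=dict)  # attribute type -> value


def compute_features(first, second):
    """Calculate pairwise features for two medical entities."""
    shared = [t for t in first.attributes if t in second.attributes]
    gap = abs(first.sentence_index - second.sentence_index) - 1
    return {
        # adjacent sentences give 0; n sentences in between give n
        "sentence_distance": max(gap, 0),
        "same_state": first.state == second.state,
        "shared_types": shared,
        "type_sequence": list(first.attributes) + list(second.attributes),
        "shared_values_equal": {
            t: first.attributes[t] == second.attributes[t] for t in shared
        },
    }


entity_310 = MedicalEntity(0, "found", {
    "position": "right lung S4", "size": "2.5 cm", "polarity": "positive"})
entity_320 = MedicalEntity(1, "found", {
    "shape": "irregular", "polarity": "positive"})
features = compute_features(entity_310, entity_320)
```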
Next, the compatibility factor determination unit 231 determines the conflict of semantic values and the conflict of semantic sequences based on the calculated features and predetermined rules.
From the writing criteria of the medical document, it can be seen that the medical entities in the medical document conform to certain rules. In one example, in case two medical entities are in the same diagnostic state, if the two medical entities are compatible (i.e., the two medical entities represent the same medical object), the types of the attributes of the two medical entities (except for the polarity type described above) should be different (i.e., the contents of the two medical entities describe different aspects of the same medical object). In another example, in case two medical entities are in different diagnostic states, if the two medical entities are compatible (i.e., the two medical entities represent the same medical object), the types of certain attributes of the two medical entities should be the same and the values of these attributes should be the same (e.g., the positions of the two medical entities should be the same). Thus, the predetermined rule may comprise at least a conflict condition of the features when the first medical entity and the second medical entity are in the same diagnostic state and a conflict condition of the features when the first medical entity and the second medical entity are in different diagnostic states. Further, the predetermined rule may be set in advance by the manufacturer or the user based on, for example, experience or statistical training.
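As an illustrative, non-limiting sketch, the two conflict conditions for semantic values described above may be expressed as follows. The function name `value_conflicts` and the cut-off `max_distance` for a "small" sentence distance are hypothetical, and treating a large distance as a conflict by itself is an interpretation made for this sketch.

```python
def value_conflicts(state_a, attrs_a, state_b, attrs_b,
                    sentence_distance, max_distance=1):
    """Return True if the semantic values of two medical entities conflict.

    attrs_a and attrs_b map attribute type -> value.
    """
    if sentence_distance > max_distance:
        return True  # the distance is not small (assumed cut-off)
    shared = set(attrs_a) & set(attrs_b)
    if state_a == state_b:
        # Same state: the entities should describe different aspects,
        # so any shared type other than polarity is a conflict.
        return bool(shared - {"polarity"})
    # Different states: some types must be shared and their values equal.
    return not shared or any(attrs_a[t] != attrs_b[t] for t in shared)


no_conflict = value_conflicts(
    "found", {"position": "right lung S4", "polarity": "positive"},
    "found", {"shape": "irregular", "polarity": "positive"},
    sentence_distance=0,
)
```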
Take the medical entity 310 and the medical entity 320 shown in fig. 3 as an example, wherein the medical entity 310 is regarded as the first medical entity and the medical entity 320 is regarded as the second medical entity. As described above, the diagnostic feature detection unit 220 detects the diagnostic state and the attributes of the medical entity 310 and of the medical entity 320, and thus the compatibility factor determination unit 231 can easily determine that the medical entity 310 and the medical entity 320 are in the same diagnostic state. The attributes of the medical entity 310 extracted by the diagnostic feature detection unit 220 are "position: right lung S4", "size: diameter 2.5 cm", and "polarity: positive"; the attributes of the medical entity 320 extracted by the diagnostic feature detection unit 220 are "shape: irregular" and "polarity: positive".
Based on the attributes of the medical entity 310 and the medical entity 320, it is easy to find that the two medical entities share one attribute type (i.e., "polarity") and that the values of this shared type are the same (i.e., "positive"). Further, as shown in fig. 3, the sentence relating to the medical entity 310 and the sentence relating to the medical entity 320 are adjacent to each other, and thus the distance between the medical entity 310 and the medical entity 320 is calculated as, for example, zero. Therefore, according to the predetermined rule, since the types of the attributes of the medical entity 310 other than the polarity type differ from those of the medical entity 320, and the distance between the medical entity 310 and the medical entity 320 is very small, the compatibility factor determination unit 231 determines that the semantic values between the medical entity 310 and the medical entity 320 do not conflict. In addition, if the types of the attributes of the medical entity 310 and the medical entity 320 are connected in a consecutive manner (i.e., "position" → "size" → "polarity" → "shape" → "polarity"), the obtained semantic meaning does not conflict (i.e., the consecutive connection conforms to the writing criteria of the medical document). Therefore, according to the predetermined rule, the compatibility factor determination unit 231 determines that the semantic sequences between the medical entity 310 and the medical entity 320 do not conflict.
Then, since neither the semantic sequences nor the semantic values between the medical entity 310 and the medical entity 320 conflict, the compatibility judgment unit 232 judges that the medical entity 310 is compatible with the medical entity 320.
Further, take the medical entity 310 and the medical entity 330 shown in fig. 3 as another example, wherein the medical entity 310 is regarded as the first medical entity and the medical entity 330 is regarded as the second medical entity. Based on the detected diagnostic states of the medical entity 310 and the medical entity 330, the compatibility factor determination unit 231 can easily determine that the medical entity 310 and the medical entity 330 are in the same diagnostic state. The attributes of the medical entity 330 extracted by the diagnostic feature detection unit 220 are "position: mediastinum", "size: 1 cm", and "polarity: positive".
Based on the attributes of the medical entity 310 and the medical entity 330, it is easy to find that the two medical entities share three attribute types (i.e., "position", "size", and "polarity"). Furthermore, as shown in fig. 3, there are three sentences between the sentence relating to the medical entity 310 and the sentence relating to the medical entity 330, and thus the distance between the medical entity 310 and the medical entity 330 is calculated as, for example, 3. Therefore, according to the predetermined rule, since the medical entity 310 and the medical entity 330 share the types of several attributes other than the polarity type (for example, "position" and "size"), and the distance between the medical entity 310 and the medical entity 330 is not small, the compatibility factor determination unit 231 determines that the semantic values between the medical entity 310 and the medical entity 330 conflict. In addition, if the types of the attributes of the medical entity 310 and the medical entity 330 are connected in a consecutive manner (i.e., "position" → "size" → "polarity" → "position" → "size" → "polarity"), the obtained semantic meaning conflicts (i.e., the consecutive connection does not conform to the writing criteria of the medical document). Therefore, according to the predetermined rule, the compatibility factor determination unit 231 determines that the semantic sequences between the medical entity 310 and the medical entity 330 conflict.
Then, since both the semantic sequences and the semantic values between the medical entity 310 and the medical entity 330 conflict, the compatibility judgment unit 232 judges that the medical entity 310 is incompatible with the medical entity 330.
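As an illustrative, non-limiting sketch, the two semantic sequence checks described above for fig. 3 may be reproduced as follows. The heuristic used here, namely that a repeated attribute type other than polarity indicates that the description has started over on another medical object, is an assumption of this sketch standing in for the writing criteria of the medical document; the function name `sequence_conflicts` is hypothetical.

```python
def sequence_conflicts(types_first, types_second, exempt=("polarity",)):
    """Return True if the concatenated type sequence repeats a type.

    Types listed in `exempt` may repeat without causing a conflict.
    """
    seen = set()
    for attr_type in list(types_first) + list(types_second):
        if attr_type in exempt:
            continue
        if attr_type in seen:
            return True  # e.g. "position" appears a second time
        seen.add(attr_type)
    return False


types_310 = ["position", "size", "polarity"]
conflict_310_320 = sequence_conflicts(types_310, ["shape", "polarity"])
conflict_310_330 = sequence_conflicts(types_310, ["position", "size", "polarity"])
```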
Further, take the medical entity 310 and the medical entity 340 shown in fig. 3 as another example, wherein the medical entity 310 is regarded as the first medical entity and the medical entity 340 is regarded as the second medical entity. Based on the detected diagnostic states of the medical entity 310 and the medical entity 340, the compatibility factor determination unit 231 can easily determine that the medical entity 310 and the medical entity 340 are in different diagnostic states, wherein the medical entity 310 is in the found state and the medical entity 340 is in the comparison state. The attributes of the medical entity 340 extracted by the diagnostic feature detection unit 220 are "position: mediastinum", "object: previous time", and "trend: increase".
According to the attributes of the medical entity 310 and the medical entity 340, it is easy to find that the two medical entities share one attribute type (i.e., "position") and that the values of this shared type are different. Furthermore, as shown in fig. 3, there are five sentences between the sentence relating to the medical entity 310 and the sentence relating to the medical entity 340, and therefore the distance between the medical entity 310 and the medical entity 340 is calculated as, for example, 5. Therefore, according to the predetermined rule, since the values of the attribute of the shared type differ between the medical entity 310 and the medical entity 340, and the distance between the medical entity 310 and the medical entity 340 is large, the compatibility factor determination unit 231 determines that the semantic values between the medical entity 310 and the medical entity 340 conflict. Moreover, if the diagnostic states of the medical entity 310 and the medical entity 340 are connected in a consecutive manner (i.e., "found state" → "comparison state"), the obtained semantic meaning does not conflict (i.e., the consecutive connection conforms to the writing criteria of the medical document, or the distance between the diagnostic state of the medical entity 310 and the diagnostic state of the medical entity 340 is zero). Therefore, according to the predetermined rule, the compatibility factor determination unit 231 determines that the semantic sequences between the medical entity 310 and the medical entity 340 do not conflict.
Then, since the semantic values between the medical entity 310 and the medical entity 340 conflict, the compatibility judgment unit 232 judges that the medical entity 310 is incompatible with the medical entity 340.
Further, take the medical entity 330 and the medical entity 340 shown in fig. 3 as another example, wherein the medical entity 330 is regarded as the first medical entity and the medical entity 340 is regarded as the second medical entity. Based on the detected diagnostic states of the medical entity 330 and the medical entity 340, the compatibility factor determination unit 231 can easily determine that the medical entity 330 and the medical entity 340 are in different diagnostic states, wherein the medical entity 330 is in the found state and the medical entity 340 is in the comparison state.
Based on the attributes of the medical entity 330 and the medical entity 340, it is easy to find that the two medical entities share one attribute type (i.e., "position") and that the values of this shared type are the same (i.e., "mediastinum"). Furthermore, as shown in fig. 3, there is one sentence between the sentence relating to the medical entity 330 and the sentence relating to the medical entity 340, and thus the distance between the medical entity 330 and the medical entity 340 is calculated as, for example, 1. Therefore, according to the predetermined rule, since the values of the attribute of the shared type are the same in the medical entity 330 and the medical entity 340, and the distance between the medical entity 330 and the medical entity 340 is small, the compatibility factor determination unit 231 determines that the semantic values between the medical entity 330 and the medical entity 340 do not conflict. Moreover, if the diagnostic states of the medical entity 330 and the medical entity 340 are connected in a consecutive manner (i.e., "found state" → "comparison state"), the obtained semantic meaning does not conflict (i.e., the consecutive connection conforms to the writing criteria of the medical document, or the distance between the diagnostic state of the medical entity 330 and the diagnostic state of the medical entity 340 is zero). Therefore, according to the predetermined rule, the compatibility factor determination unit 231 determines that the semantic sequences between the medical entity 330 and the medical entity 340 do not conflict.
Then, since neither the semantic sequences nor the semantic values between the medical entity 330 and the medical entity 340 conflict, the compatibility judgment unit 232 judges that the medical entity 330 is compatible with the medical entity 340.
In addition, the above examples determine the compatibility between the first medical entity and the second medical entity by determining, according to a predetermined rule, whether the calculated features conflict. However, the compatibility between the first medical entity and the second medical entity may also be determined by calculating, according to a predetermined rule, a compatibility score based on the degree of conflict of the calculated features. In one example, in case the first medical entity and the second medical entity are in the same diagnostic state, the compatibility factor determination unit 231 will calculate the compatibility score between them according to the following formula:
wherein the four ratio terms represent, in order: the ratio of the number of types of the same attribute in the first and second medical entities to the total number of types of the attributes; the ratio of the number of types of the same attribute with different values in the first and second medical entities to the total number of types of the attributes; the ratio of the number of types of attributes with semantic sequence anomalies in the first and second medical entities to the total number of types of the attributes; and the ratio of the sentence distance between the first medical entity and the second medical entity to the total number of sentences in the diagnostic state; and $W_{\text{type}}$, $W_{\text{value}}$, $W_{\text{sequence}}$, and $W_{\text{sentence distance}}$ represent predetermined weights that are preset by the manufacturer or the user based on, for example, experience.
Then, in case the calculated compatibility score is greater than or equal to a predetermined threshold, which may be set in advance by the manufacturer or the user based on experience, the compatibility determination unit 232 will determine that the first medical entity is compatible with the second medical entity.
In another example, in case the first medical entity and the second medical entity are in different diagnostic states, the compatibility factor determination unit 231 will calculate the compatibility score between them according to the following formula:
wherein the four ratio terms represent, in order: the ratio of the number of types of the same attribute with different values in the first and second medical entities to the total number of types of the attributes; the ratio of the number of states with semantic sequence anomalies in the first and second medical entities to the total number of states in the overall diagnostic process; the ratio of the sentence distance between the first medical entity and the second medical entity to the total number of sentences in the overall diagnostic process; and the ratio of the state distance between the first medical entity and the second medical entity to the total number of states in the overall diagnostic process; and $W_{\text{value}}$, $W_{\text{sequence}}$, $W_{\text{sentence distance}}$, and $W_{\text{state distance}}$ represent predetermined weights that are preset by the manufacturer or the user based on, for example, experience.
Then, in case the calculated compatibility score is greater than or equal to another predetermined threshold, which may be set in advance by the manufacturer or the user based on experience, the compatibility determination unit 232 will determine that the first medical entity is compatible with the second medical entity.
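As an illustrative, non-limiting sketch, the two score calculations and threshold comparisons described above may be expressed as follows. The combining form used here (one minus the weighted sum of the ratios) is an assumption made for illustration, chosen so that a higher score means less conflict, consistent with the greater-than-or-equal comparison against the threshold. The function names, weight values, ratio values, and threshold are hypothetical.

```python
# Hypothetical weights; each set sums to 1 so the score stays in [0, 1].
SAME_STATE_WEIGHTS = {
    "type": 0.25, "value": 0.25, "sequence": 0.25, "sentence_distance": 0.25}
DIFF_STATE_WEIGHTS = {
    "value": 0.25, "sequence": 0.25, "sentence_distance": 0.25,
    "state_distance": 0.25}


def compatibility_score(ratios, weights):
    """Assumed form: 1 - sum(weight * ratio); each ratio acts as a penalty."""
    if set(ratios) != set(weights):
        raise ValueError("ratios and weights must use the same keys")
    return 1.0 - sum(weights[key] * ratios[key] for key in ratios)


def is_compatible(score, threshold):
    """Compatible when the score is greater than or equal to the threshold."""
    return score >= threshold


same_score = compatibility_score(
    {"type": 0.5, "value": 0.0, "sequence": 0.0, "sentence_distance": 0.0},
    SAME_STATE_WEIGHTS)
diff_score = compatibility_score(
    {"value": 1.0, "sequence": 0.0, "sentence_distance": 0.5,
     "state_distance": 0.0},
    DIFF_STATE_WEIGHTS)
```

Separate thresholds may be supplied for the same-state and different-state cases, as in the description above.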
As described above, the compatibility factor determination unit 231 determines the compatibility factor based on the predetermined rule. However, as an alternative solution, the compatibility factor determination unit 231 may also determine the compatibility factor according to the following steps:
First, the compatibility factor determination unit 231 calculates at least one of the above features.
Secondly, the compatibility factor determining unit 231 determines the conflict of semantic values and the conflict of semantic sequences based on a pre-generated model; wherein the pre-generated model comprises at least: a model for features when the first medical entity and the second medical entity are in the same diagnostic state, and a model for features when the first medical entity and the second medical entity are in different diagnostic states. Further, the pre-generated model may be generated in advance by the manufacturer, for example, based on statistical training. Since the operation of determining the compatibility factor based on the pre-generated model is similar to the above-described operation of determining the compatibility factor based on the predetermined rule, a detailed description will not be repeated here.
Returning now to fig. 2, after the compatibility determination unit 230 determines the compatibility between the first medical entity and the second medical entity, the coreference resolution unit 240 determines whether the first medical entity and the second medical entity represent the same medical object based on the determined compatibility. In case the first medical entity is determined to be compatible with the second medical entity, the coreference resolution unit 240 determines that the first medical entity and the second medical entity represent the same medical object.
Referring to the above examples, as shown in fig. 3, the medical entity 310 and the medical entity 320 are compatible with each other, the medical entity 310 and the medical entity 330 are not compatible with each other, the medical entity 310 and the medical entity 340 are not compatible with each other, and the medical entity 330 and the medical entity 340 are compatible with each other. The coreference resolution unit 240 therefore determines that the medical entity 310 and the medical entity 320 represent one and the same medical object, and that the medical entity 330 and the medical entity 340 represent another medical object.
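As an illustrative, non-limiting sketch, the pairwise results above may be collected into groups of co-referent medical entities as follows. Merging pairs transitively with a union-find structure is an assumption of this sketch (the embodiment itself decides each pair separately), and the function name `group_coreferent` is hypothetical; the entities are identified by their reference numerals in fig. 3.

```python
def group_coreferent(entities, compatible_pairs):
    """Group entities so that each compatible pair ends up in one group."""
    parent = {entity: entity for entity in entities}

    def find(entity):
        # Follow parent links to the group representative (path halving).
        while parent[entity] != entity:
            parent[entity] = parent[parent[entity]]
            entity = parent[entity]
        return entity

    for first, second in compatible_pairs:
        parent[find(first)] = find(second)

    groups = {}
    for entity in entities:
        groups.setdefault(find(entity), []).append(entity)
    return sorted(groups.values())


groups = group_coreferent([310, 320, 330, 340], [(310, 320), (330, 340)])
```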
As described above, the present invention determines whether two medical entities are co-referent based on the compatibility between the two medical entities. Compatibility may be interpreted as follows: a set of features that meets certain constraints (e.g., the writing criteria of a medical document) should not conflict with one another; otherwise, the set of features is not compatible. That is, two medical entities are compatible (i.e., they represent the same medical object) when the features related to the two medical entities do not conflict. Furthermore, since the present invention semantically determines conflicts between the features related to two medical entities based on the writing criteria of medical documents, it can also accurately process medical entities whose descriptions appear dissimilar to each other. Therefore, according to the present invention, the accuracy of coreference resolution is improved.
(Overall processing)
The overall process performed by the configuration of the first embodiment in fig. 2 will be described with reference to fig. 4. Fig. 4 is a flowchart 400 schematically showing the procedure of the overall process according to the first embodiment in fig. 2.
As described above, first, the input device 150 shown in fig. 1 will receive a medical document input by a user or output from an image acquisition device, wherein the medical document may be a portion of the medical document or the entire medical document. In addition, in the case where the medical document is in an image format, the CPU110 receives the medical document from the input device 150 via the system bus 180, and first converts the medical document from the image format into a text format.
Further, the input device 150 will receive the first medical entity and the second medical entity selected by the user from the medical document or extracted by the CPU110 by using existing text information extraction techniques. Next, the input device 150 transmits the received medical document, the first medical entity, and the second medical entity to the acquisition unit 210 shown in fig. 2 via the system bus 180.
Then, as shown in fig. 4, in the acquisition step S410, the acquisition unit 210 acquires the first medical entity, the second medical entity, and the medical document from the input device 150 via the system bus 180.
In the diagnostic feature detection step S420, the diagnostic feature detection unit 220 will detect the diagnostic status of the first medical entity, the at least one attribute of the first medical entity, the diagnostic status of the second medical entity and the at least one attribute of the second medical entity from the medical document.
In the compatibility factor determining step S430, the compatibility determining unit 230 will determine a compatibility factor between the first medical entity and the second medical entity based on the detected diagnostic status and attribute.
In the compatibility determination step S440, the compatibility determination unit 230 will determine the compatibility between the first medical entity and the second medical entity based on the determined compatibility factor.
In case the first medical entity is compatible with the second medical entity, the coreference resolution unit 240 will determine that the first medical entity and the second medical entity represent the same medical object in the coreference resolution step S450. Otherwise, in the coreference resolution step S460, the coreference resolution unit 240 will determine that the first medical entity and the second medical entity do not represent the same medical object.
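As an illustrative, non-limiting sketch, the flow of steps S420 to S450/S460 for an already acquired pair (step S410) may be expressed as follows. The function `resolve_coreference` and its callable parameters `detect`, `determine_factor`, and `judge` are hypothetical stand-ins for the units 220, 231, and 232; their internals are left to the caller.

```python
def resolve_coreference(first, second, document,
                        detect, determine_factor, judge):
    """Return True if the two entities are resolved as the same object."""
    features_first = detect(first, document)    # S420: state and attributes
    features_second = detect(second, document)  # S420
    factor = determine_factor(features_first, features_second)  # S430
    compatible = judge(factor)                  # S440
    return bool(compatible)                     # True: S450, False: S460


# Toy stand-ins, used only to show the call sequence.
same = resolve_coreference(
    "entity A", "entity B", "document text",
    detect=lambda entity, doc: {"name": entity},
    determine_factor=lambda a, b: {"value_conflict": False,
                                   "sequence_conflict": False},
    judge=lambda f: not (f["value_conflict"] or f["sequence_conflict"]),
)
```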
Finally, the coreference resolution unit 240 transmits the coreference resolution results to the output device 160 shown in fig. 1 via the system bus 180 for displaying the processing results to the user or for subsequent operations such as information extraction, which will be described later.
(Configuration of apparatus for coreference resolution using similarity measure and compatibility)
As described in the first embodiment, the apparatus for coreference resolution 200 shown in fig. 2 determines whether a first medical entity and a second medical entity represent the same medical object only by using compatibility between the first medical entity and the second medical entity. However, the invention may also perform coreference resolution by using similarity measures and compatibilities between the first medical entity and the second medical entity. In this embodiment, next, with reference to fig. 5 and 7, the configuration of an apparatus for coreference resolution that performs coreference resolution by using similarity measure and compatibility will be described. The apparatus for coreference resolution of this embodiment has the same hardware configuration as described in fig. 1.
Fig. 5 is a block diagram illustrating a configuration of an apparatus 500 for coreference resolution according to a second embodiment of the present invention.
The blocks shown in fig. 5 are implemented as the CPU110 described above with reference to fig. 1 and used to execute the program loaded to the RAM 120 and to cooperate with the respective hardware shown in fig. 1. Some or all of these blocks may be implemented by dedicated hardware.
Comparing fig. 5 with fig. 2, there are two main differences in the apparatus 500 for coreference resolution shown in fig. 5: first, the apparatus 500 for coreference resolution further comprises a first similarity measure determination unit 510, the first similarity measure determination unit 510 being configured to determine a first similarity measure between a first medical entity and a second medical entity based on a similarity measure of word features between the first and second medical entities in the medical document and a similarity measure of contexts of the first and second medical entities in the medical document. Wherein the measure of similarity between the first medical entity and the second medical entity represents a degree of similarity between the first medical entity and the second medical entity. For example, if the similarity is very high, it means that the first medical entity and the second medical entity use almost the same word features. If the degree of similarity is high, it means that the first medical entity and the second medical entity use very similar word features, or some word features used in the first medical entity and the second medical entity are different, or are replaced by synonyms, such as word features "failure" and "failure". In addition, a similarity measure may be determined by using, for example, the above-mentioned US patent US 8457950.
Second, the diagnostic feature detection unit 220 detects the diagnostic status of the first medical entity, the attribute of the first medical entity, the diagnostic status of the second medical entity and the attribute of the second medical entity only when the first similarity measure determination unit 510 determines that the first similarity measure between the first medical entity and the second medical entity is greater than or equal to a threshold value (i.e., TH shown in fig. 6), where the threshold value may be predefined by a manufacturer or a user according to, for example, the actual application or experience. That is, in this embodiment, the invention may use the compatibility between two medical entities to correct misjudgments (false positives) caused by performing coreference resolution using the similarity measure alone. Here, a misjudgment means that two medical entities are determined to be coreferent because they are identical or similar on the surface, although they do not actually refer to the same medical object. Therefore, the accuracy of the coreference resolution will be improved.
Since other detailed descriptions for the acquisition unit 210, the diagnostic feature detection unit 220, the compatibility determination unit 230, and the coreference resolution unit 240 shown in fig. 5 are similar to the corresponding units shown in fig. 2, the detailed descriptions will not be repeated here.
Next, the overall process performed by the configuration of the second embodiment in fig. 5 will be described with reference to fig. 6. Fig. 6 is a flowchart 600 schematically showing the procedure of the overall process according to the second embodiment in fig. 5.
Comparing fig. 6 with fig. 4, there are the following main differences in the flow chart 600 shown in fig. 6:
after the acquisition unit 210 acquires the first medical entity, the second medical entity and the medical document from the input device 150 via the system bus 180 in the acquisition step S410, in a first similarity measure determination step S610, the first similarity measure determination unit 510 determines a first similarity measure between the first medical entity and the second medical entity based on a similarity measure of word features between the first and second medical entities in the medical document and a similarity measure of contexts of the first and second medical entities in the medical document. And in case the first similarity measure determination unit 510 determines that the first similarity measure between the first medical entity and the second medical entity is greater than or equal to the threshold value in S610 (i.e. TH shown in fig. 6), the procedure will go to step S420; otherwise, the process will be terminated.
Since other detailed descriptions for steps S410 to S460 shown in fig. 6 are similar to the corresponding steps shown in fig. 4, the detailed description will not be repeated here.
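The gating order of flowchart 600 can be sketched as follows. This is a minimal illustration only: the function name, the callable parameters, and the toy similarity and compatibility stand-ins are assumptions and do not come from the patent, which leaves the similarity computation to known techniques.

```python
from typing import Callable, Optional


def resolve_with_similarity_gate(
    first: str,
    second: str,
    first_similarity: Callable[[str, str], float],
    compatible: Callable[[str, str], bool],
    th: float,
) -> Optional[bool]:
    """Return None when the process terminates at S610, else the later decision."""
    # S610: first similarity measure (word features plus context in the source).
    if first_similarity(first, second) < th:
        return None  # terminated: diagnostic features are never detected
    # S420 onward: detect diagnostic features, determine compatibility, resolve.
    return compatible(first, second)


# Toy stand-ins (assumptions): case-insensitive match and a fixed verdict table.
def toy_similarity(a: str, b: str) -> float:
    return 1.0 if a.lower() == b.lower() else 0.0


incompatible_pairs = {("mass", "Mass")}


def toy_compatible(a: str, b: str) -> bool:
    return (a, b) not in incompatible_pairs


gated_out = resolve_with_similarity_gate("mass", "cyst", toy_similarity, toy_compatible, th=0.5)
corrected = resolve_with_similarity_gate("mass", "Mass", toy_similarity, toy_compatible, th=0.5)
accepted = resolve_with_similarity_gate("mass", "mass", toy_similarity, toy_compatible, th=0.5)
```

In this sketch the pair that looks alike on the surface but is marked incompatible is rejected by the compatibility stage, which mirrors the correction of surface-level misjudgments described for fig. 5.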
As described above, the embodiment shown in fig. 5 uses compatibility to correct erroneous judgments caused by performing coreference resolution by using a similarity measure. Furthermore, there are other solutions to perform coreference resolution by using similarity measures and compatibility. Fig. 7 is another block diagram illustrating the configuration of an apparatus 700 for coreference resolution according to the second embodiment of the present invention.
The blocks shown in fig. 7 are implemented by the CPU 110 described above with reference to fig. 1, which executes the program loaded into the RAM 120 and cooperates with the respective hardware shown in fig. 1. Some or all of these blocks may be implemented by dedicated hardware.
Comparing fig. 7 with fig. 2, there are two main differences in the apparatus 700 for coreference resolution shown in fig. 7:
first, the apparatus 700 for coreference resolution further comprises a second similarity measure determination unit 710, the second similarity measure determination unit 710 being configured to determine a second similarity measure between the first medical entity and the second medical entity based on the similarity measure of the word features between the first and second medical entities in the medical document and the similarity measure of the context of the first and second medical entities in the medical document. Therein, a similarity measure between a first medical entity and a second medical entity may be determined by using, for example, the above-mentioned US patent US 8457950.
Second, the coreference resolution unit 240 will determine whether the first medical entity and the second medical entity represent the same medical object based on the determined second similarity measure and the determined compatibility. For example, the coreference resolution unit 240 determines that the first medical entity and the second medical entity represent the same medical object in case both the determined second similarity measure and the determined compatibility are greater than or equal to a first threshold (TH1) or in case a weighted sum of the determined second similarity measure and the determined compatibility is greater than or equal to a second threshold (TH 2). TH1 and TH2 may be predefined by the manufacturer or user, among others, depending on, for example, the actual application or experience.
As described in the first embodiment, the determined compatibility between the two medical entities may be recorded as a compatibility score, or may be recorded as compatible or incompatible. When recorded as compatible, the determined compatibility may be regarded as 1, and when recorded as incompatible, it may be regarded as 0. In addition, both the weight of the determined second similarity measure and the weight of the determined compatibility may be set to 1, in which case the above-mentioned weighted sum is simply the sum of the determined second similarity measure and the determined compatibility.
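The two example decision rules of the coreference resolution unit 240 in fig. 7 can be written out as a short sketch. The function names, the default weights, and the numeric thresholds used below are illustrative assumptions; the patent only states that TH1, TH2 and the weights are predefined.

```python
def both_reach_th1(similarity: float, compatibility: float, th1: float) -> bool:
    """Rule 1: the second similarity measure and the compatibility both reach TH1."""
    return similarity >= th1 and compatibility >= th1


def weighted_sum_reaches_th2(
    similarity: float,
    compatibility: float,
    th2: float,
    w_sim: float = 1.0,   # weight of the second similarity measure
    w_comp: float = 1.0,  # weight of the compatibility
) -> bool:
    """Rule 2: the weighted sum reaches TH2 (a plain sum when both weights are 1)."""
    return w_sim * similarity + w_comp * compatibility >= th2


# Compatibility recorded as compatible -> 1.0, incompatible -> 0.0.
rule1_compatible = both_reach_th1(0.8, 1.0, th1=0.7)
rule1_incompatible = both_reach_th1(0.8, 0.0, th1=0.7)
rule2_compatible = weighted_sum_reaches_th2(0.8, 1.0, th2=1.5)
rule2_incompatible = weighted_sum_reaches_th2(0.8, 0.0, th2=1.5)
```

With binary compatibility and unit weights, setting TH2 above 1 means a high similarity alone can never produce a coreference decision, while a lower TH2 lets a very high similarity outweigh an incompatible verdict.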
Since other detailed descriptions for the acquisition unit 210, the diagnostic feature detection unit 220, and the compatibility determination unit 230 shown in fig. 7 are similar to the corresponding units shown in fig. 2, the detailed descriptions will not be repeated here. In this embodiment, for certain medical entities whose descriptions do not comply with the standards of medical documents, the present invention may use the similarity measure between the medical entities to correct misjudgments caused by performing coreference resolution using the compatibility alone. Therefore, the accuracy of the coreference resolution will be improved.
Next, the overall process performed by the configuration of the second embodiment in fig. 7 will be described with reference to fig. 8, and fig. 8 is a flowchart 800 schematically showing the procedure of the overall process according to the second embodiment in fig. 7.
Compared to fig. 4, there are two main differences in the flow chart 800 shown in fig. 8:
first, in addition to determining the compatibility between the first medical entity and the second medical entity, the flow chart 800 further comprises a second similarity measure determining step S810. In this step, the second similarity measure determination unit 710 determines a second similarity measure between the first medical entity and the second medical entity based on the similarity measure of the word features between the first and second medical entities in the medical document and the similarity measure of the contexts of the first and second medical entities in the medical document.
Secondly, after the compatibility determination unit 230 determines the compatibility in steps S430 to S440 and the second similarity measure determination unit 710 determines the second similarity measure in step S810, in a coreference resolution step S820 the coreference resolution unit 240 determines whether the first medical entity and the second medical entity represent the same medical object based on the determined second similarity measure and the determined compatibility.
Since other detailed descriptions for steps S410 to S440 shown in fig. 8 are similar to the corresponding steps shown in fig. 4, the detailed description will not be repeated here.
(construction of apparatus for information extraction)
As described in the first and second embodiments, the apparatus for coreference resolution 200 shown in fig. 2, the apparatus for coreference resolution 500 shown in fig. 5, and the apparatus for coreference resolution 700 shown in fig. 7 may be used for information extraction. In this embodiment, next, a configuration of an apparatus for information extraction to which the above-described apparatus for coreference resolution 200, 500, or 700 is applied will be described with reference to fig. 9. The apparatus for information extraction of this embodiment has the same hardware configuration as that described in fig. 1.
Fig. 9 is a block diagram illustrating a structure of an apparatus 900 for information extraction according to a third embodiment of the present invention.
The blocks shown in fig. 9 are implemented by the CPU 110 described above with reference to fig. 1, which executes the program loaded into the RAM 120 and cooperates with the respective hardware shown in fig. 1. Some or all of these blocks may be implemented by dedicated hardware.
As shown in fig. 9, an apparatus 900 for information extraction according to a third embodiment of the present invention includes: the apparatus for coreference resolution 200, 500 or 700, the obtaining unit 910, the medical entity extracting unit 920 and the medical entity merging unit 930 described above.
First, as described in the first embodiment, the input device 150 shown in fig. 1 will receive a medical document input by a user or output from an image acquisition apparatus, wherein the medical document may be a part of the medical document or the entire medical document. In addition, in the case where the medical document is in an image format, the CPU110 receives the medical document from the input device 150 via the system bus 180, and first converts the medical document from the image format into a text format.
Next, as shown in fig. 9, the obtaining unit 910 obtains the medical document from the input device 150 or the CPU110 via the system bus 180.
Third, the medical entity extracting unit 920 extracts at least two medical entities from the medical document obtained by the obtaining unit 910. The medical entity extracting unit 920 may extract the medical entity using, for example, an existing text information extracting technique.
The means for coreference resolution, which may be the means 200, 500 or 700 described above, will then determine whether any two of the medical entities represent the same medical object in accordance with the description above with reference to fig. 2 to 8.
Finally, the medical entity merging unit 930 will merge the diagnostic states and attributes of the medical entities determined by the means for coreference resolution to be coreferent with each other. The medical entity merging unit 930 then transmits the merged result to the output device 160 shown in fig. 1 via the system bus 180 for displaying the processed result to the user or for subsequent operations, such as the similar document retrieval to be described later.
Thus, according to the information extraction embodiment, each of the medical objects in the obtained medical document can be obtained as one piece of description data.
In order to obtain more accurate description data for the medical object, as a preferred solution, the medical entity merging unit 930 may merge the diagnostic states and attributes of the medical entities that are coreferent with each other based on the order in which the medical entities appear in the obtained medical document. Furthermore, the medical entity merging unit 930 may implement the merging in various ways, as will be described below with reference to fig. 10 to 11.
(overall process)
The overall process performed by the configuration of the third embodiment in fig. 9 will be described with reference to fig. 10 to 11. Fig. 10 is a flowchart 1000 schematically showing the procedure of the overall process according to the third embodiment in fig. 9.
As described above, first, the input device 150 shown in fig. 1 will receive a medical document input by a user or output from an image acquisition device, wherein the medical document may be a portion of the medical document or the entire medical document. In addition, in the case where the medical document is in an image format, the CPU110 receives the medical document from the input device 150 via the system bus 180, and first converts the medical document from the image format into a text format.
Then, as shown in fig. 10, in the obtaining step S1010, the obtaining unit 910 obtains the medical document from the input device 150 or the CPU110 via the system bus 180.
In the medical entity extraction step S1020, the medical entity extraction unit 920 will extract at least two medical entities from the medical document obtained by the obtaining unit 910.
In the coreference resolution steps S1030 to S1040, first in step S1030, the apparatus for coreference resolution, which may be the above-described apparatus 200, 500 or 700, will select the extracted medical entity that first appears in the obtained medical document as the first-appearing medical entity. Then, in step S1040, the means for coreference resolution, which may be the means 200, 500 or 700 described above, determines a medical entity that is coreferenced with the first-occurring medical entity according to the description described above with reference to fig. 2 to 8.
In the medical entity merging step S1050, the medical entity merging unit 930 will merge the diagnostic status and attributes of the first appearing medical entity and the medical entities that are co-referred to with the first appearing medical entity.
Then, in step S1060, the medical entity merging unit 930 will determine whether any extracted medical entity remains. In the case where any medical entity remains, the process will repeat the operations performed in steps S1030 to S1050 for the remaining medical entities; otherwise, the process will be terminated and the description data for the respective medical objects will be displayed to the user or used for subsequent operations.
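The loop of steps S1030 to S1060 in flowchart 1000 can be sketched as below. The `Entity` and `Description` types, the `coreferent` callable, and the rule that a later mention overrides an earlier attribute value are all assumptions made for illustration; the patent does not fix a data layout or a conflict policy.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Entity:
    text: str
    position: int                      # order of appearance in the document
    status: str                        # diagnostic status
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Description:
    statuses: List[str]
    attributes: Dict[str, str]


def merge_by_first_occurrence(
    entities: List[Entity],
    coreferent: Callable[[Entity, Entity], bool],
) -> List[Description]:
    remaining = sorted(entities, key=lambda e: e.position)
    descriptions: List[Description] = []
    while remaining:                                   # S1060: anything left?
        first = remaining[0]                           # S1030: first-appearing entity
        group = [first] + [e for e in remaining[1:] if coreferent(first, e)]  # S1040
        merged = Description([], {})
        for e in group:                                # S1050: merge in document order
            if e.status not in merged.statuses:
                merged.statuses.append(e.status)
            merged.attributes.update(e.attributes)     # assumption: later value wins
        descriptions.append(merged)
        used = {id(e) for e in group}
        remaining = [e for e in remaining if id(e) not in used]
    return descriptions


# Toy data; "same surface text" stands in for the real coreference decision.
doc_entities = [
    Entity("mass", 0, "finding", {"size": "2 cm"}),
    Entity("cyst", 1, "finding", {"location": "liver"}),
    Entity("mass", 2, "diagnosis", {"size": "3 cm"}),
]
descriptions = merge_by_first_occurrence(doc_entities, lambda a, b: a.text == b.text)
```

Each returned `Description` corresponds to one piece of description data for one medical object.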
An alternative solution will be described below with reference to fig. 11. Fig. 11 is another flowchart 1100 schematically showing the procedure of the overall process according to the third embodiment in fig. 9.
As shown in fig. 11, since detailed descriptions for steps S1010 to S1030 and S1060 shown in fig. 11 are similar to the corresponding steps shown in fig. 10, the detailed descriptions will not be repeated here.
After the first-occurring medical entity is selected in step S1030, in step S1110, the means for coreference resolution, which may be the means 200, 500 or 700 described above, will determine the medical entity that is closest to and coreferent with the first-occurring medical entity according to the description above with reference to fig. 2 to 8.
In step S1120, the medical entity merging unit 930 will merge the diagnostic status and attributes of the first-occurring medical entity and the medical entity determined in step S1110.
In step S1130, for the merged medical entity, the means for coreference resolution, which may be the means 200, 500 or 700 described above, will determine whether there is still a medical entity that is closest to and coreferent with the merged medical entity according to the description above with reference to fig. 2 to 8. In case such a medical entity still exists, the process will go to step S1140; otherwise, the process goes to step S1060.
In step S1140, the medical entity merging unit 930 merges the diagnostic status and attributes of the medical entity determined in step S1130 into the merged medical entity. The process will then repeat the operations performed in steps S1130 to S1140 until there is no medical entity that is closest to and coreferent with the merged medical entity.
Those skilled in the art will appreciate that the merge operations described above with reference to fig. 10-11 are illustrative only and not limiting.
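The alternative of flowchart 1100, in which coreference is re-checked against the merged entity after every merge, can be sketched as follows. The `Mention` type, the use of position difference as "closest", the choice to move the merged entity to the position of the latest absorbed mention, and the toy coreference test are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Mention:
    text: str
    position: int
    statuses: List[str]
    attributes: Dict[str, str] = field(default_factory=dict)


Coref = Callable[[Mention, Mention], bool]


def closest_coreferent(anchor: Mention, candidates: List[Mention], coreferent: Coref) -> Optional[Mention]:
    matches = [c for c in candidates if coreferent(anchor, c)]
    return min(matches, key=lambda c: abs(c.position - anchor.position), default=None)


def absorb(merged: Mention, other: Mention) -> Mention:
    statuses = merged.statuses + [s for s in other.statuses if s not in merged.statuses]
    # Assumption: the merged entity takes the position of the latest mention.
    return Mention(merged.text, other.position, statuses, {**merged.attributes, **other.attributes})


def merge_by_nearest(mentions: List[Mention], coreferent: Coref) -> List[Mention]:
    remaining = sorted(mentions, key=lambda m: m.position)
    merged_entities: List[Mention] = []
    while remaining:                                   # S1060
        merged = remaining.pop(0)                      # S1030
        while True:
            nearest = closest_coreferent(merged, remaining, coreferent)  # S1110 / S1130
            if nearest is None:
                break
            merged = absorb(merged, nearest)           # S1120 / S1140
            remaining = [m for m in remaining if m is not nearest]
        merged_entities.append(merged)
    return merged_entities


def toy_coreferent(a: Mention, b: Mention) -> bool:
    # Same surface text and no conflicting value for any shared attribute.
    shared = a.attributes.keys() & b.attributes.keys()
    return a.text == b.text and all(a.attributes[k] == b.attributes[k] for k in shared)


mentions = [
    Mention("mass", 0, ["finding"], {"size": "2 cm"}),
    Mention("mass", 5, ["finding"], {"location": "liver"}),
    Mention("mass", 9, ["diagnosis"], {"location": "kidney"}),
]
merged_entities = merge_by_nearest(mentions, toy_coreferent)
```

In this toy run the third mention is coreferent with the first mention taken alone, but not with the merged entity, because the merged entity has already acquired a conflicting location. Checking against the accumulated description is what distinguishes this variant from the one in fig. 10.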
Taking the above-mentioned medical entities 310 to 340 as an example, as described above, the medical entities 310 and 320 represent one same medical object, and the medical entities 330 and 340 represent another same medical object. Thus, the diagnostic states and attributes of the medical entity 310 and the medical entity 320 may be merged into one piece of description data for that medical object, with the corresponding description data shown, for example, in table 1 below. Furthermore, the diagnostic states and attributes of the medical entity 330 and the medical entity 340 may be merged into another piece of description data for the other medical object, with the corresponding description data shown, for example, in table 2 below.
TABLE 1
TABLE 2
(construction of apparatus for similar document retrieval)
As described in the third embodiment, the apparatus 900 for information extraction shown in fig. 9 can be used for similar document retrieval. Here, two documents being the same document means that the two documents have almost the same content, and two documents being similar documents means that there is a significant overlap of conceptual content between the two documents. In this embodiment, the configuration of an apparatus for similar document retrieval to which the above-described apparatus 900 for information extraction is applied will be described with reference to fig. 12. The apparatus for similar document retrieval of this embodiment has the same hardware configuration as that described in fig. 1.
Fig. 12 is a block diagram illustrating the configuration of an apparatus 1200 for similar document retrieval according to a fourth embodiment of the present invention.
The blocks shown in fig. 12 are implemented by the CPU 110 described above with reference to fig. 1, which executes the program loaded into the RAM 120 and cooperates with the respective hardware shown in fig. 1. Some or all of these blocks may be implemented by dedicated hardware.
As shown in fig. 12, an apparatus 1200 for similar document retrieval according to the fourth embodiment of the present invention includes: the above-described apparatus for information extraction 900, the similarity measure calculation unit 1210, and the similar document retrieval unit 1220.
When a user (e.g., a doctor) is writing or reading a medical document, he/she often wants to retrieve some historic medical documents for reference that are similar to the medical document he/she is writing or reading.
Thus, first, the input device 150 shown in fig. 1 receives a medical document written or read by a user, wherein the medical document may be directly input by the user or may be output from the image acquisition device, and the medical document may be a part of the medical document or the entire medical document.
For the medical document received by the input device 150, the means 900 for information extraction will extract the diagnostic status and attributes of the merged medical entities (i.e., the above-mentioned description data for the respective medical objects) from the obtained medical document according to the above description with reference to fig. 9 to 11.
Meanwhile, in one example, the CPU 110 shown in fig. 1 will obtain historic medical documents stored in the ROM 130 or the hard disk 140. In another example, the CPU 110 will obtain historic medical documents stored in a server connected via a network to the apparatus 1200 for similar document retrieval, as described in detail below with reference to fig. 14. Then, for each of the historic medical documents, the means 900 for information extraction will extract the diagnostic status and attributes of the merged medical entities from the corresponding historic medical document according to the description above with reference to fig. 9 to 11.
Then, as shown in fig. 12, the similarity measure calculation unit 1210 will calculate a similarity measure between the diagnostic statuses and attributes of the merged medical entities extracted from the obtained medical document and the diagnostic statuses and attributes of the merged medical entities extracted from each historic medical document. The similarity measure may be, for example, an edit distance.
Finally, the similar document retrieving unit 1220 is to retrieve at least one medical document similar to the obtained medical document from the historical medical documents based on the calculated similarity measure. In one example, the similar document retrieving unit 1220 may retrieve one medical document corresponding to the largest similarity measure as a final processing result. In another example, the similar document retrieving unit 1220 may retrieve some of the medical documents ranked based on the calculated value of the similarity measure.
Then, the similar document retrieving unit 1220 transfers the retrieved similar medical documents to the output device 160 shown in fig. 1 via the system bus 180 for displaying the processing results to the user or for subsequent operations such as diagnosis support or the like.
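The calculation and retrieval stages described above can be sketched with an edit distance, which the description names as one possible similarity measure. Everything else here is an assumption for illustration: the description data is flattened into a sequence of feature strings, the distance is normalized into a similarity in [0, 1], and the document identifiers and feature strings are invented toy data.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed normalization: 1.0 for identical sequences, 0.0 for fully different."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def retrieve(query: Sequence[str], history: Dict[str, Sequence[str]], top_k: int = 1) -> List[Tuple[str, float]]:
    """Rank historic documents by similarity to the query and keep the best top_k."""
    scored = [(doc_id, similarity(query, feats)) for doc_id, feats in history.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy description data: one string per merged status or attribute.
query_features = ["mass:finding", "size=2 cm", "location=liver"]
history_features = {
    "doc_a": ["mass:finding", "size=2 cm", "location=liver"],
    "doc_b": ["mass:finding", "size=3 cm", "location=kidney"],
    "doc_c": ["cyst:diagnosis"],
}
best = retrieve(query_features, history_features, top_k=1)
ranked = retrieve(query_features, history_features, top_k=2)
```

`top_k=1` corresponds to retrieving the single document with the largest similarity measure, and a larger `top_k` corresponds to retrieving several documents ranked by the calculated value.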
Therefore, according to this similar document retrieval embodiment, historic medical documents that are superficially similar to, but not actually similar to, the medical documents input by the user will not be retrieved, and historic medical documents that are superficially dissimilar to, but actually similar to, the medical documents input by the user will be retrieved. That is, according to the present invention, historic medical documents having significant conceptual content overlapping with the medical documents input by the user and/or historic medical documents describing similar subjects to the medical documents input by the user will be retrieved. Therefore, the accuracy of similar document retrieval will be improved.
(overall process)
The overall process performed by the configuration of the fourth embodiment in fig. 12 will be described with reference to fig. 13. Fig. 13 is a flowchart 1300 schematically showing the procedure of the overall process according to the fourth embodiment in fig. 12.
As described above, in the information extraction step S1310, the apparatus 900 for information extraction extracts, according to the description above with reference to fig. 9 to 11, the diagnostic status and attributes of the merged medical entities from the medical document input by the user and from each of the historic medical documents stored in the ROM 130, the hard disk 140, or the server.
In the similarity measure calculation step S1320, the similarity measure calculation unit 1210 calculates a similarity measure between the diagnostic statuses and attributes of the merged medical entities extracted from the obtained medical document and the diagnostic statuses and attributes of the merged medical entities extracted from each historic medical document.
In the similar document retrieving step S1330, the similar document retrieving unit 1220 retrieves at least one medical document similar to the obtained medical document from the historic medical documents based on the calculated similarity measure.
Then, the similar document retrieving unit 1220 transfers the retrieved similar medical documents to the output device 160 shown in fig. 1 via the system bus 180 for displaying the processing results to the user or for subsequent operations.
(similar document retrieval system)
As described in the fourth embodiment, the historic medical documents can be stored in a server connected to the apparatus 1200 for similar document retrieval via a network. In this embodiment, next, an exemplary similar document retrieval system 1400 to which the above-described apparatus 1200 for similar document retrieval is applied will be described with reference to fig. 14. FIG. 14 illustrates an arrangement of an exemplary similar document retrieval system 1400 in accordance with the present invention.
As shown in fig. 14, the similar document retrieval system 1400 includes an electronic device 1410 that is the apparatus 1200 for similar document retrieval described above, and a server 1430 for storing historical medical documents and other historical data. The electronic device 1410 of this embodiment may have the same hardware configuration as described in fig. 1 and the same configuration as described in fig. 12.
The electronic device 1410 is configured to obtain historic medical documents from the server 1430 via a network 1420. The electronic device 1410 is configured to retrieve, from the obtained historic medical documents, medical documents similar to the medical document input by the user, according to the above description with reference to fig. 12 to 13. Further, the electronic device 1410 may be a device such as a mobile phone, a personal digital assistant (PDA), a laptop, a desktop, a tablet computer, etc., or another suitable personal device. As shown in fig. 14, the electronic device 1410 is, for example, a notebook computer (i.e., a personal computer).
All of the above units are exemplary and/or preferred modules for implementing the processes described in this disclosure. These units may be hardware units, such as field programmable gate arrays (FPGAs), digital signal processors, or application specific integrated circuits, and/or software modules, such as computer readable programs. The units for performing the various steps are not described exhaustively above. However, where there is a step of performing a specific process, there may be a corresponding functional module or unit (implemented by hardware and/or software) for implementing that process. All combinations of the described steps and of the units corresponding to these steps are included in the disclosure of the present application, as long as the technical solutions they constitute are complete and applicable.
Further, if the apparatus 200, 500 or 700 for coreference resolution shown in fig. 2, fig. 5 or fig. 7, the apparatus 900 for information extraction shown in fig. 9, or the apparatus 1200 for similar document retrieval shown in fig. 12, each of which is constituted by various units, is partially or entirely constructed of software, it may be stored in the hard disk 140 shown in fig. 1. On the other hand, if the apparatus 200, 500 or 700 for coreference resolution, the apparatus 900 for information extraction, or the apparatus 1200 for similar document retrieval is partially or entirely constructed of hardware or firmware, it may also be incorporated as a functional module into an electronic device (e.g., a computer), as long as there is a need for coreference resolution, information extraction, or similar document retrieval in the electronic device.
The method and apparatus of the present invention can be implemented in a number of ways. For example, the methods and apparatus of the present invention can be implemented in software, hardware, firmware, or any combination thereof. The above-described order for the steps of the method is intended to be illustrative only, and the steps of the method of the present invention are not limited to the order specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present invention may also be embodied as a program recorded in a recording medium, including machine-readable instructions for implementing a method according to the present invention. Therefore, the present invention also covers a recording medium storing a program for implementing the method according to the present invention.
While some specific embodiments of the present invention have been shown in detail by way of example, it should be understood by those skilled in the art that the foregoing examples are intended to be illustrative only and are not intended to limit the scope of the invention. It will be appreciated by those skilled in the art that modifications can be made to the above-described embodiments without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.
Claims (21)
1. An apparatus for coreference resolution, the apparatus for coreference resolution comprising:
an acquisition unit configured to acquire a first medical entity and a second medical entity from an input medical document;
a diagnostic feature detection unit configured to detect, from the medical document, a diagnostic status of the first medical entity, a diagnostic status of the second medical entity, at least one attribute of the first medical entity and at least one attribute of the second medical entity;
a compatibility determination unit configured to determine compatibility between the first medical entity and the second medical entity based on the detected diagnostic status and attributes; and
a coreference resolution unit configured to determine, based on the determined compatibility, whether the first medical entity and the second medical entity represent a same medical object, wherein,
the diagnostic status represents a location of the medical entity in a diagnostic process in the medical document;
the attributes represent diagnostic items of the medical entity in the medical document; and
the compatibility represents the possibility that the first medical entity and the second medical entity represent the same medical object.
2. The apparatus for coreference resolution as claimed in claim 1, wherein for a medical entity, the diagnostic feature detection unit:
extracts predefined content related to the medical entity from the medical document; and identifies a diagnostic status of the medical entity by analyzing the extracted content.
3. The apparatus for coreference resolution as claimed in claim 1, wherein the compatibility determining unit comprises:
a compatibility factor determination unit configured to determine a compatibility factor between the first medical entity and the second medical entity based on the detected diagnostic status and attributes, wherein the compatibility factor represents a semantic conflict between the first medical entity and the second medical entity; and
a compatibility determination unit configured to determine compatibility between the first medical entity and the second medical entity based on the determined compatibility factor.
4. The apparatus for coreference resolution as claimed in claim 3, wherein the compatibility factors comprise:
a conflict of semantic values among the diagnostic status of the first medical entity, the attribute of the first medical entity, the diagnostic status of the second medical entity and the attribute of the second medical entity, and
a conflict of semantic sequences among the diagnostic status of the first medical entity, the attribute of the first medical entity, the diagnostic status of the second medical entity, and the attribute of the second medical entity.
5. The apparatus for coreference resolution as claimed in claim 4, wherein the compatibility factor determining unit:
calculates at least one of the following features:
a distance between the first medical entity and the second medical entity;
a sequence between a diagnostic status of the first medical entity and a diagnostic status of the second medical entity;
a distance between a diagnostic status of the first medical entity and a diagnostic status of the second medical entity;
a type of attribute of the first medical entity and the second medical entity;
a sequence between a type of attribute of the first medical entity and a type of attribute of the second medical entity;
values of the attributes whose type belongs to both the first medical entity and the second medical entity, and
determines the conflict of the semantic values and the conflict of the semantic sequences based on the calculated features and a predetermined rule.
6. The apparatus for coreference resolution as claimed in claim 4, wherein the compatibility factor determination unit:
calculates at least one of the following features:
a distance between the first medical entity and the second medical entity;
a sequence between the diagnostic status of the first medical entity and the diagnostic status of the second medical entity;
a distance between the diagnostic status of the first medical entity and the diagnostic status of the second medical entity;
a type of attribute of the first medical entity and the second medical entity;
a sequence between a type of attribute of the first medical entity and a type of attribute of the second medical entity; and
values of attributes whose type belongs to both the first medical entity and the second medical entity; and
determines the conflict of semantic values and the conflict of semantic sequence based on the calculated features and a pre-generated model.
7. The apparatus for coreference resolution as claimed in claim 4, wherein the compatibility determining unit determines that the first medical entity is compatible with the second medical entity if the semantic values and the semantic sequences do not conflict.
8. The apparatus for coreference resolution as claimed in claim 1, wherein the coreference resolution unit determines that the first medical entity and the second medical entity represent the same medical object if the determined compatibility indicates that the first medical entity and the second medical entity are compatible.
9. The apparatus for coreference resolution as claimed in claim 1, further comprising:
a first similarity measure determination unit configured to determine a first similarity measure between the first medical entity and the second medical entity based on a similarity measure of word features between the first medical entity and the second medical entity in the medical document and based on a similarity measure of contexts of the first medical entity and the second medical entity in the medical document, wherein,
in case the first similarity measure determination unit determines that the first similarity measure between the first medical entity and the second medical entity is larger than or equal to a threshold value, the diagnostic feature detection unit detects a diagnostic status of the first medical entity, an attribute of the first medical entity, a diagnostic status of the second medical entity and an attribute of the second medical entity.
10. The apparatus for coreference resolution as claimed in claim 1, further comprising:
a second similarity measure determination unit configured to determine a second similarity measure between the first medical entity and the second medical entity based on a similarity measure of word features between the first medical entity and the second medical entity in the medical document and based on a similarity measure of contexts of the first medical entity and the second medical entity in the medical document, wherein,
the coreference resolution unit determines whether the first medical entity and the second medical entity represent the same medical object based on the determined second similarity measure and the determined compatibility.
11. An apparatus for information extraction, the apparatus for information extraction comprising:
an obtaining unit configured to obtain a medical document;
a medical entity extraction unit configured to extract at least two medical entities from the obtained medical document;
the apparatus for coreference resolution as claimed in any one of claims 1-10, configured to determine whether any two of the medical entities represent a same medical object; and
a medical entity merging unit configured to merge the diagnostic statuses and attributes of medical entities that corefer with each other.
12. The apparatus for information extraction according to claim 11, wherein the medical entity merging unit merges the diagnostic statuses and attributes of medical entities that corefer with each other based on the order in which the medical entities appear in the obtained medical document.
13. An apparatus for similar document retrieval, the apparatus for similar document retrieval comprising:
the apparatus for information extraction according to any one of claims 11 to 12, configured to extract a merged diagnostic status and attributes of a medical entity from the obtained medical documents and to extract a merged diagnostic status and attributes of a medical entity from historical medical documents;
a similarity measure calculation unit configured to calculate a similarity measure between the merged diagnostic status and attributes of the medical entities extracted from the obtained medical documents and the merged diagnostic status and attributes of the medical entities extracted from the historical medical documents; and
a similar document retrieving unit configured to retrieve at least one medical document similar to the obtained medical document from the historical medical documents based on the calculated similarity measure.
14. A similar document retrieval system comprising at least one server and an apparatus for similar document retrieval connected to the server via a network, wherein
the server is configured to store historical medical documents; and
the apparatus for similar document retrieval comprises:
the apparatus for information extraction according to any one of claims 11 to 12, configured to extract a merged diagnostic status and attributes of a medical entity from the obtained medical documents and to extract a merged diagnostic status and attributes of a medical entity from the historical medical documents;
a similarity measure calculation unit configured to calculate a similarity measure between the merged diagnostic status and attributes of the medical entities from the obtained medical documents and the merged diagnostic status and attributes of the medical entities from the historical medical documents; and
a similar document retrieving unit configured to retrieve at least one medical document similar to the obtained medical document from the historical medical documents based on the calculated similarity measure.
15. A method for coreference resolution, the method for coreference resolution comprising:
an acquisition step of acquiring a first medical entity and a second medical entity from an input medical document;
a diagnostic feature detection step of detecting, from the medical document, a diagnostic status of the first medical entity, at least one attribute of the first medical entity, a diagnostic status of the second medical entity, and at least one attribute of the second medical entity;
a compatibility determination step of determining compatibility between the first medical entity and the second medical entity based on the detected diagnostic status and attributes; and
a coreference resolution step of determining, based on the determined compatibility, whether the first medical entity and the second medical entity represent the same medical object, wherein,
the diagnostic status represents a location of the medical entity in a diagnostic process in the medical document;
the attributes represent diagnostic items of the medical entity in the medical document; and
the compatibility represents the possibility that the medical entity and the other medical entity represent the same medical object.
16. The method for coreference resolution as claimed in claim 15, wherein the compatibility determining step comprises:
a compatibility factor determination step of determining a compatibility factor between the first medical entity and the second medical entity based on the detected diagnostic status and attributes, wherein the compatibility factor represents a semantic conflict between the first medical entity and the second medical entity; and
a compatibility determination step of determining compatibility between the first medical entity and the second medical entity based on the determined compatibility factor.
17. The method for coreference resolution as claimed in claim 15, wherein in the coreference resolution step, it is determined that the first medical entity and the second medical entity represent the same medical object if the determined compatibility indicates that the first medical entity and the second medical entity are compatible.
18. The method for coreference resolution as claimed in claim 15, further comprising:
a first similarity measure determining step of determining a first similarity measure between the first medical entity and the second medical entity based on a similarity measure of word features between the first medical entity and the second medical entity in the medical document and based on a similarity measure of contexts of the first medical entity and the second medical entity in the medical document, wherein,
in case it is determined in the first similarity measure determining step that the first similarity measure between the first medical entity and the second medical entity is larger than or equal to a threshold value, in the diagnostic feature detecting step a diagnostic status of the first medical entity, an attribute of the first medical entity, a diagnostic status of the second medical entity and an attribute of the second medical entity are detected.
19. The method for coreference resolution as claimed in claim 15, further comprising:
a second similarity measure determining step of determining a second similarity measure between the first medical entity and the second medical entity based on a similarity measure of word features between the first medical entity and the second medical entity in the medical document and based on a similarity measure of contexts of the first medical entity and the second medical entity in the medical document, wherein,
in the coreference resolution step, it is determined whether the first medical entity and the second medical entity represent the same medical object based on the determined second similarity measure and the determined compatibility.
20. A method for information extraction, the method for information extraction comprising:
an obtaining step of obtaining a medical document;
a medical entity extraction step of extracting at least two medical entities from the obtained medical document;
a coreference resolution step of determining whether any two of the medical entities represent the same medical object by using the method for coreference resolution according to any one of claims 15 to 19; and
a medical entity merging step of merging the diagnostic statuses and attributes of the medical entities that corefer with each other.
21. A method for similar document retrieval, the method for similar document retrieval comprising:
an information extraction step of extracting a merged diagnostic state and attributes of the medical entity from the obtained medical document and extracting a merged diagnostic state and attributes of the medical entity from the historical medical document by using the method for information extraction according to claim 20;
a similarity measure calculation step of calculating a similarity measure between the merged diagnostic status and attributes of the medical entities from the obtained medical documents and the merged diagnostic status and attributes of the medical entities from the historical medical documents; and
a similar document retrieval step of retrieving at least one medical document similar to the obtained medical document from the historical medical documents based on the calculated similarity measure.
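The coreference and merging steps recited in method claims 15 to 20 can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the patent: the `MedicalEntity` class, the `STATUS_ORDER` table, the two conflict rules, the token-overlap similarity, and the threshold of 0.3 are all hypothetical stand-ins for the predetermined rule, the word-feature similarity measure, and the threshold that the claims leave unspecified.

```python
from dataclasses import dataclass, field

# Hypothetical ordering of diagnostic statuses along a diagnostic process.
STATUS_ORDER = {"finding": 0, "diagnosis": 1, "treatment": 2}


@dataclass
class MedicalEntity:
    text: str        # surface form of the mention
    position: int    # order of appearance in the document
    status: str      # diagnostic status, a key of STATUS_ORDER
    attributes: dict = field(default_factory=dict)  # attribute type -> value


def value_conflict(a, b):
    """Assumed rule: an attribute type shared by both entities has different values."""
    shared = a.attributes.keys() & b.attributes.keys()
    return any(a.attributes[t] != b.attributes[t] for t in shared)


def sequence_conflict(a, b):
    """Assumed rule: the later mention carries an earlier diagnostic status."""
    first, second = sorted((a, b), key=lambda e: e.position)
    return STATUS_ORDER[second.status] < STATUS_ORDER[first.status]


def compatible(a, b):
    """Compatible when neither semantic values nor semantic sequence conflict."""
    return not value_conflict(a, b) and not sequence_conflict(a, b)


def word_similarity(a, b):
    """Jaccard overlap of lowercase tokens, a stand-in for the word-feature similarity."""
    ta, tb = set(a.text.lower().split()), set(b.text.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def corefer(a, b, threshold=0.3):
    """Similarity gate first, then the compatibility check."""
    return word_similarity(a, b) >= threshold and compatible(a, b)


def merge(entities):
    """Merge statuses and attributes of coreferent entities in order of appearance."""
    merged = {"statuses": [], "attributes": {}}
    for e in sorted(entities, key=lambda e: e.position):
        if e.status not in merged["statuses"]:
            merged["statuses"].append(e.status)
        merged["attributes"].update(e.attributes)
    return merged


# Illustrative mentions (invented for this sketch).
e1 = MedicalEntity("mass in left lung", 0, "finding",
                   {"size": "2 cm", "location": "left lung"})
e2 = MedicalEntity("the lung mass", 1, "diagnosis",
                   {"location": "left lung", "shape": "round"})
e3 = MedicalEntity("mass in right lung", 2, "finding",
                   {"location": "right lung"})

print(corefer(e1, e2), corefer(e1, e3))
print(merge([e1, e2]))
```

In this sketch, `e1` and `e3` share most of their words but disagree on the value of the shared `location` attribute, so the compatibility check rejects a pairing that surface similarity alone would accept. A learned model, as in claim 6, could replace the two hand-written conflict rules without changing the surrounding flow.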
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610428860.4A (CN107515851B) | 2016-06-16 | 2016-06-16 | Apparatus and method for coreference resolution, information extraction and similar document retrieval |
JP2018562274A (JP6972029B2) | 2016-06-16 | 2017-06-15 | Equipment for co-reference analysis, information extraction and similar document retrieval, similar document retrieval system and information processing method |
PCT/JP2017/022114 (WO2017217489A1) | 2016-06-16 | 2017-06-15 | Apparatuses and methods for co-reference resolution, information extraction and similar document retrieval |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107515851A | 2017-12-26 |
CN107515851B | 2021-09-10 |
Family
ID=59270075
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201610428860.4A (CN107515851B) | Expired - Fee Related | 2016-06-16 | 2016-06-16 | Apparatus and method for coreference resolution, information extraction and similar document retrieval |
Country Status (3)
Country | Link |
---|---|
JP (1) | JP6972029B2 (en) |
CN (1) | CN107515851B (en) |
WO (1) | WO2017217489A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109284497B (en) * | 2017-07-20 | 2021-01-12 | 京东方科技集团股份有限公司 | Method and apparatus for identifying medical entities in medical text in natural language |
US11573994B2 (en) | 2020-04-14 | 2023-02-07 | International Business Machines Corporation | Encoding entity representations for cross-document coreference |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080313111A1 (en) * | 2007-06-14 | 2008-12-18 | Microsoft Corporation | Large scale item representation matching |
CN101499062A (en) * | 2008-01-29 | 2009-08-05 | 国际商业机器公司 | Method and equipment for collecting entity alias |
CN101796508A (en) * | 2007-08-31 | 2010-08-04 | 微软公司 | Coreference resolution in an ambiguity-sensitive natural language processing system |
US7813916B2 (en) * | 2003-11-18 | 2010-10-12 | University Of Utah | Acquisition and application of contextual role knowledge for coreference resolution |
US20130046758A1 (en) * | 2010-04-27 | 2013-02-21 | Snu R&Db Foundation | Terminology-based system for supporting data object definition |
CN103294764A (en) * | 2012-02-29 | 2013-09-11 | 国际商业机器公司 | Method and system for extracting information from electronic documents |
CN103577491A (en) * | 2012-08-09 | 2014-02-12 | 佳能株式会社 | Method and device for representing functional entities and carrying out disambiguation on functional entities |
CN103778346A (en) * | 2014-02-18 | 2014-05-07 | 中国科学院上海技术物理研究所 | Medical information processing method and device |
CN104572904A (en) * | 2014-12-25 | 2015-04-29 | 微梦创科网络科技(中国)有限公司 | Method and device for determining relevance level between tags |
CN105184074A (en) * | 2015-09-01 | 2015-12-23 | 哈尔滨工程大学 | Multi-modal medical image data model based medical data extraction and parallel loading method |
CN105260457A (en) * | 2015-10-14 | 2016-01-20 | 南京大学 | Coreference resolution-oriented multi-semantic web entity contrast table automatic generation method |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5699789B2 (en) * | 2011-05-10 | 2015-04-15 | ソニー株式会社 | Information processing apparatus, information processing method, program, and information processing system |
US8457950B1 (en) | 2012-11-01 | 2013-06-04 | Digital Reasoning Systems, Inc. | System and method for coreference resolution |
- 2016-06-16: CN application CN201610428860.4A (patent CN107515851B), status: Expired - Fee Related
- 2017-06-15: WO application PCT/JP2017/022114 (publication WO2017217489A1), status: Application Filing
- 2017-06-15: JP application JP2018562274A (patent JP6972029B2), status: Active
Non-Patent Citations (5)
Title |
---|
HONG-JIE DAI: "Coreference resolution of medical concepts in discharge summaries by exploiting contextual information", 《JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION》 * |
PRATEEK JINDAL: "End-to-end coreference resolution for clinical narratives", 《PROCEEDINGS OF THE 23RD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE》 * |
RODERICK Y SON: "Inter-document coreference resolution of abnormal findings in radiology documents", 《STUDIES IN HEALTH TECHNOLOGY AND INFORMATICS》 * |
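朱倩: "Research on key techniques of fine-grained relation extraction for free text", 《China Doctoral Dissertations Full-text Database, Information Science and Technology》 * |
郎君: "Coreference resolution integrating multiple kinds of background semantic knowledge", 《Journal of Chinese Information Processing》 * |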
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109359184A (en) * | 2018-10-16 | 2019-02-19 | 苏州大学 | English event coreference resolution method and system |
CN109359184B (en) * | 2018-10-16 | 2020-08-18 | 苏州大学 | English event coreference resolution method and system |
CN111950281A (en) * | 2020-07-02 | 2020-11-17 | 中国科学院软件研究所 | Demand entity co-reference detection method and device based on deep learning and context semantics |
Also Published As
Publication number | Publication date |
---|---|
CN107515851B (en) | 2021-09-10 |
JP2019522274A (en) | 2019-08-08 |
JP6972029B2 (en) | 2021-11-24 |
WO2017217489A1 (en) | 2017-12-21 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Gonzalez et al. | Disease staging and prognosis in smokers using deep learning in chest computed tomography | |
US9760689B2 (en) | Computer-aided diagnosis method and apparatus | |
Sada et al. | Validation of case finding algorithms for hepatocellular cancer from administrative data and electronic health records using natural language processing | |
CN107515851B (en) | Apparatus and method for coreference resolution, information extraction and similar document retrieval | |
CN111696642A (en) | System and method for generating a description of an abnormality in a medical image | |
JP5736007B2 (en) | Apparatus, system, method and program for generating inspection report | |
JP5646128B2 (en) | Medical image retrieval system | |
Kumar et al. | Deep Transfer Learning-based COVID-19 prediction using Chest X-rays | |
US10909129B2 (en) | Automated identification of salient finding codes in structured and narrative reports | |
US9904966B2 (en) | Using image references in radiology reports to support report-to-image navigation | |
US20150227714A1 (en) | Medical information analysis apparatus and medical information analysis method | |
US20190108175A1 (en) | Automated contextual determination of icd code relevance for ranking and efficient consumption | |
CN107239722B (en) | Method and device for extracting diagnosis object from medical document | |
CN115410717B (en) | Model training method, data retrieval method, image data retrieval method and device | |
WO2021008601A1 (en) | Method for testing medical data | |
Li et al. | Artificial intelligence can increase the detection rate of colorectal polyps and adenomas: a systematic review and meta-analysis | |
US10235360B2 (en) | Generation of pictorial reporting diagrams of lesions in anatomical structures | |
Bandos et al. | Evaluation of diagnostic accuracy in free-response detection-localization tasks using ROC tools | |
WO2017164203A1 (en) | Methods and apparatuses for segmenting text | |
Arias-Londoño et al. | Analysis of the Clever Hans effect in COVID-19 detection using Chest X-Ray images and Bayesian Deep Learning | |
US20200075141A1 (en) | Medical information management apparatus and medical information management system | |
Park et al. | Deep learning-enabled detection of pneumoperitoneum in supine and erect abdominal radiography: modeling using transfer learning and semi-supervised learning | |
BR112020023361A2 (en) | method and system | |
Weissenbacher et al. | Detecting goals of care conversations in clinical notes with active learning | |
Moosavi et al. | Segmentation and classification of lungs CT-scan for detecting COVID-19 abnormalities by deep learning technique: U-Net model |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20210910 |