CN113657112A - Method and device for reading article - Google Patents

Method and device for reading article

Info

Publication number
CN113657112A
CN113657112A
Authority
CN
China
Prior art keywords
entities
article
text
entity
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110949956.6A
Other languages
Chinese (zh)
Inventor
顾大中
梁建增
李亚东
石秋慧
王洪彬
葛雍龙
张学诚
王增彦
彭辉
刘亚蓉
李腾
杨吟泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ant Shengxin (Shanghai) Information Technology Co.,Ltd.
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202110949956.6A priority Critical patent/CN113657112A/en
Publication of CN113657112A publication Critical patent/CN113657112A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Abstract

The embodiments of this specification provide a method and an apparatus for interpreting an article. The method includes: extracting a plurality of corresponding entities from an article according to a plurality of entity types for describing the article; performing relation extraction using the extracted entities and the text of the article to obtain association relationships between the entities; constructing a multi-tuple forest for describing the article based on the association relationships between the entities, the multi-tuple forest including one or more tree structures that take the entities as nodes and the association relationships between the entities as edges; and inputting the multi-tuple forest into an interpretation rule engine to obtain an interpretation result. The method structures the article text into a multi-tuple forest suited to the complexity of the text, so that the multi-tuple forest describes the article content accurately and completely; the multi-tuple forest is then input into an interpretation rule engine built around the multi-tuple forest structure, so that an interpretation result can be obtained accurately, realizing accurate and efficient article interpretation.

Description

Method and device for reading article
Technical Field
The embodiments of this specification relate to the field of computer technology, and in particular to a method for interpreting an article. One or more embodiments of this specification further relate to an apparatus for interpreting an article, a computing device, and a computer-readable storage medium.
Background
In different scenarios there are many types of articles that convey, for example, work reports, product notifications, contract terms, and health notifications. Because such articles often contain a great deal of specialized knowledge, readers sometimes have difficulty understanding their true meaning; as a result, readers may misunderstand the content and then make decisions that do not conform to the article, purchase the wrong product, and so on. It is therefore necessary to help people interpret articles.
At present, conventional article interpretation is generally performed offline with the help of experts. With the development of the internet industry, some enterprises have begun to offer online article interpretation, but this is still essentially manual interpretation by experts. When expert experience is limited or the volume of articles is large, an accurate and efficient interpretation service cannot be provided.
Disclosure of Invention
In view of the above, the embodiments of the present disclosure provide a method for interpreting an article. One or more embodiments of the present disclosure also relate to an apparatus for interpreting an article, a computing device, and a computer-readable storage medium to solve technical problems in the prior art.
According to a first aspect of the embodiments of this specification, there is provided a method of interpreting an article, including: extracting a plurality of corresponding entities from an article according to a plurality of entity types for describing the article; performing relation extraction using the extracted entities and the text of the article to obtain association relationships between the entities; constructing a multi-tuple forest for describing the article based on the association relationships between the entities, the multi-tuple forest including one or more tree structures that take the entities as nodes and the association relationships between the entities as edges; and inputting the multi-tuple forest into an interpretation rule engine to obtain an interpretation result.
Optionally, performing relation extraction using the extracted entities and the text of the article to obtain the association relationships between the entities includes: constructing a graph network based on text and entities using the extracted entities and the text of the article; performing feature extraction on the entity nodes in the graph network using a graph convolution algorithm to obtain vectors of the entity nodes; and obtaining the association relationships between the entities by calculating vector similarities between the entity nodes in the graph network.
Optionally, constructing a graph network based on text and entities using the extracted entities and the text of the article includes: encoding each sentence and each entity in the text of the article as an independent node to obtain sentence nodes and entity nodes of the graph network; and completing construction of the graph network by establishing an edge between each pair of sentence nodes, an edge between each pair of entity nodes in the same sentence, and an edge between each entity node and the sentence node to which it belongs.
Optionally, obtaining the association relationships between entities by calculating vector similarities between entity nodes in the graph network includes: calculating the vector similarity between every two entity nodes; and comparing the vector similarity between every two entity nodes with a preset similarity threshold to determine the entity nodes that have an association relationship.
Optionally, before extracting the entities respectively corresponding to the plurality of entity types, the method further includes: filtering irrelevant information from the initial text of the article to obtain a filtered text as the text of the article.
Optionally, filtering irrelevant information from the initial text of the article to obtain the filtered text as the text of the article includes: converting the initial text of the article into a sentence sequence; encoding each sentence with a sentence-level language model to obtain an initial semantic vector of each sentence; inputting the initial semantic vector of each sentence into a chapter-level language model to classify sentences as relevant or irrelevant information, obtaining an updated semantic vector of each sentence and a sentence classification probability sequence, where the updated semantic vector fuses semantic information of the full sentence sequence; passing the sentence classification probability sequence through a fully connected network to obtain an optimized classification result; and filtering out irrelevant information according to the classification result to obtain the filtered text.
Optionally, the article is a medical report, and the plurality of entity types include: orientation, organ, tissue, index, and index value.
Optionally, extracting a plurality of corresponding entities from the article according to a plurality of entity types for describing the article includes: inputting the text of the article into a language model for text semantic extraction to obtain, for each input character, a vector fused with full-text semantic information; passing the vector of each character through a fully connected network to obtain the label corresponding to each character; extracting a plurality of mentions corresponding to the plurality of entity types according to the labels respectively corresponding to the characters; and linking the plurality of mentions to the corresponding entities in an existing knowledge base to extract the plurality of corresponding entities.
Optionally, before linking the plurality of mentions to the corresponding entities in the existing knowledge base, the method further includes: judging whether each extracted mention has a corresponding negative meaning description in the text of the article; and if so, adding the negative meaning description to the corresponding mention.
Optionally, judging whether each extracted mention has a corresponding negative meaning description in the text of the article includes: inputting the position of each mention in the text of the article, the mention text, and the text of the article into a language model for binary classification of whether the mention is negated; and determining whether each mention has a corresponding negative meaning description based on the result of the binary classification.
According to a second aspect of the embodiments of this specification, there is provided an apparatus for interpreting an article, including: an entity extraction module configured to extract a plurality of corresponding entities from an article according to a plurality of entity types for describing the article; a relationship extraction module configured to perform relation extraction using the extracted entities and the text of the article to obtain association relationships between the entities; a forest construction module configured to construct a multi-tuple forest for describing the article based on the association relationships between the entities, the multi-tuple forest including one or more tree structures that take the entities as nodes and the association relationships between the entities as edges; and an engine interpretation module configured to input the multi-tuple forest into an interpretation rule engine to obtain an interpretation result.
According to a third aspect of the embodiments of this specification, there is provided a computing device, including a memory and a processor, where the memory stores computer-executable instructions and the processor executes the computer-executable instructions to: extract a plurality of corresponding entities from an article according to a plurality of entity types for describing the article; perform relation extraction using the extracted entities and the text of the article to obtain association relationships between the entities; construct a multi-tuple forest for describing the article based on the association relationships between the entities, the multi-tuple forest including one or more tree structures that take the entities as nodes and the association relationships between the entities as edges; and input the multi-tuple forest into an interpretation rule engine to obtain an interpretation result.
According to a fourth aspect of the embodiments of this specification, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the method of interpreting an article described in any embodiment of this specification.
One embodiment of this specification provides a method for interpreting an article, in which a plurality of corresponding entities are extracted from the article according to a plurality of entity types for describing the article; association relationships between the entities are obtained using the extracted entities and the text of the article; a multi-tuple forest for describing the article is constructed based on the association relationships between the entities, the multi-tuple forest including one or more tree structures with the entities as nodes and the association relationships between the entities as edges; and the multi-tuple forest is input into an interpretation rule engine to obtain an interpretation result. The method thus provides a multi-tuple forest structure suited to the complexity of article text: the article text can be structured into a multi-tuple forest that describes the article content accurately and completely, and the multi-tuple forest is then input into an interpretation rule engine built around the multi-tuple forest structure for interpretation, so that the interpretation result can be obtained accurately, realizing accurate and efficient article interpretation.
Drawings
FIG. 1 is a flow chart of a method of interpreting an article provided by an embodiment of the present description;
FIG. 2 is a schematic diagram of an interface interaction provided by one embodiment of the present description;
FIG. 3 is a process diagram of filtering irrelevant information according to an embodiment of the present disclosure;
FIG. 4 is a process diagram of named entity identification provided by one embodiment of the present description;
FIG. 5 is a process diagram of negation detection provided by one embodiment of the present description;
FIG. 6 is a graph network schematic provided by one embodiment of the present description;
FIG. 7 is a process flow diagram of a method for interpreting an article according to an embodiment of the present description;
FIG. 8 is a call link diagram provided by one embodiment of the present description;
FIG. 9 is a schematic diagram of a five-tuple forest provided by one embodiment of the present description;
FIG. 10 is a schematic diagram of an apparatus for interpreting an article according to an embodiment of the present disclosure;
FIG. 11 is a schematic diagram of an apparatus for interpreting an article according to another embodiment of the present disclosure;
FIG. 12 is a block diagram of a computing device according to an embodiment of the present disclosure.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This description may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, as those skilled in the art will be able to make and use the present disclosure without departing from the spirit and scope of the present disclosure.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used in one or more embodiments herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, "first" can also be referred to as "second" and, similarly, "second" can also be referred to as "first" without departing from the scope of one or more embodiments of the present specification. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
First, the terms used in one or more embodiments of this specification are explained.
Mention: a word actually used in the text of an article that points to an entity corresponding to an entity type. An entity may have one or more mentions with the same meaning. For example, the mentions "tumor", "mass", and "nodule" all point to the entity "mass".
Entity type: a type of entity used to describe the content and structure of an article.
Entity: an item used to describe the content and structure of an article in an existing knowledge base; it is the standard name in the existing knowledge base.
Medical report: a single examination or test report, such as an ultrasound report or a routine blood test report, or a comprehensive report such as a full physical examination report.
OCR: Optical Character Recognition, a technology that converts a picture of a paper document into text.
NER: Named Entity Recognition, a technology that recognizes entity terms (orientation, organ, tissue, index, index value, etc.) in text.
Entity linking: associating a mention identified by NER with the corresponding entity in an existing knowledge base.
Relation extraction: identifying relationships between entities in a piece of text (e.g., an index describes a tissue, a tissue is located in an organ).
Negation detection: recognizing whether an entity that does appear in a piece of text is semantically affirmed or negated. For example, in "I ate an apple this morning", "apple" is a semantically affirmed entity; in "I ate a pear this morning and did not eat an apple", the entity "apple" is semantically negated.
Quintuple (five-tuple): the five entity types of orientation, organ, tissue, index, and index value.
In the present specification, a method of interpreting an article is provided, and the present specification relates to an apparatus for interpreting an article, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.
Fig. 1 shows a flowchart of a method for interpreting an article, including steps 102-108, according to an embodiment of the present description.
Step 102: Extract a plurality of corresponding entities from the article according to a plurality of entity types for describing the article.
Step 104: Perform relation extraction using the extracted entities and the text of the article to obtain association relationships between the entities.
Step 106: Construct a multi-tuple forest for describing the article based on the association relationships between the entities, the multi-tuple forest including one or more tree structures that take the entities as nodes and the association relationships between the entities as edges.
It can be understood that the hierarchical (parent-child) relationships between entities can be derived from the association relationships between them. For example, an entity node of a higher-level type is the parent node of a lower-level entity node with which it has an association relationship. One or more tree structures can therefore be constructed according to the association relationships between the entities.
Step 108: Input the multi-tuple forest into an interpretation rule engine to obtain an interpretation result.
For example, the interpretation rule engine may be a program designed according to a plurality of preset interpretation rules and used to recognize the structure of the multi-tuple forest. Each tree structure in the multi-tuple forest may include one or more paths from the root node to a leaf node. Correspondingly, an interpretation rule corresponds to one path from a root node to a leaf node, and each path can yield a corresponding interpretation result. By matching each root-to-leaf path in the multi-tuple forest against the preset interpretation rules, the matching interpretation rules can be found, and the interpretation results corresponding to the matching rules are then determined.
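Purely as an illustration (not the embodiments' actual implementation), the following minimal Python sketch shows one way such path-based matching could work. The `Node` structure, the rule representation as a dictionary keyed by paths, and the example tree are assumptions of this sketch; the example rule text is taken from the worked example given later in this description.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str                      # entity name, e.g. "breast"
    children: list = field(default_factory=list)

def root_to_leaf_paths(root):
    """Enumerate every root-to-leaf path in one tree of the multi-tuple forest."""
    if not root.children:
        return [[root.name]]
    paths = []
    for child in root.children:
        for sub in root_to_leaf_paths(child):
            paths.append([root.name] + sub)
    return paths

def interpret(forest, rules):
    """Match every path of every tree against the preset interpretation rules.

    `rules` maps a path (tuple of entity names) to an interpretation result.
    """
    results = []
    for tree in forest:
        for path in root_to_leaf_paths(tree):
            result = rules.get(tuple(path))
            if result:
                results.append(result)
    return results

# Hypothetical rule and forest, for illustration only.
rules = {("right", "breast", "hypoechoic area", "boundary", "unclear"):
         "risk in breast underwriting"}
tree = Node("right", [Node("breast", [Node("hypoechoic area",
            [Node("boundary", [Node("unclear")])])])])
print(interpret([tree], rules))   # ['risk in breast underwriting']
```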
Because the method extracts a plurality of corresponding entities from an article according to a plurality of entity types for describing the article, obtains the association relationships between the entities using the extracted entities and the text of the article, constructs a multi-tuple forest for describing the article based on those association relationships (the multi-tuple forest including one or more tree structures with the entities as nodes and the association relationships as edges), and inputs the multi-tuple forest into an interpretation rule engine to obtain an interpretation result, it provides a multi-tuple forest structure suited to the complexity of article text. A plain-text article can thus be structured into a multi-tuple forest that accurately describes the article content, and the multi-tuple forest can then be interpreted by an interpretation rule engine built around the multi-tuple forest structure, so that the interpretation result is obtained accurately and an accurate, efficient article interpretation is provided.
For example, the method provided by the embodiments of this specification may be applied to the interpretation of medical reports. After a medical report is interpreted, scenarios such as intelligent underwriting, intelligent claims settlement, and medical service recommendation can be further supported; for example, the user can be given corresponding underwriting results such as a risk-level prompt. As another example, the basic insurance capability of the health center, pricing center, and underwriting center is an abstraction of the health notification in the application process, and the core models are the health notification document and the health notification contract. The health notification belongs to the insurance product and defines the content that the user needs to actively declare when applying for insurance. The notification contract is the contract signed by the user at the time of application and carries the user's declared conditions as important reference content when the user files a claim. Because a health notification often contains a great deal of specialized medical knowledge, users sometimes have difficulty understanding its true meaning. To prevent a user from mistakenly buying a product that does not match their health notification, the method provided by the embodiments of this specification can be applied to intelligently interpret the user's health notification and provide a corresponding interpretation result. For example, the interpretation result may include risk prompts and other information that remind the user of the key clauses requiring attention, or automatically judge whether the user's declaration is compliant, and may provide corresponding medical explanations so that a user without a medical background can fully understand the key content. Taking the interface interaction diagram shown in fig. 2 as an example, the left diagram is an upload page on which the user can photograph a physical examination report and upload it; the right diagram is an interpretation-result page that presents medical explanations and suggestions derived from the content of the report.
In terms of execution environment, the method provided by the embodiments of this specification may be applied on a server side or on a local user side. For example, when applied on the server side, the server can receive an article image or text uploaded by the user side, interpret the article, and feed the interpretation result back to the user side, thereby providing the user with online article interpretation.
To make the interpretation of the article more accurate and efficient, in one or more embodiments of this specification, before extracting the entities corresponding to the plurality of entity types, the method further includes: filtering irrelevant information from the initial text of the article to obtain a filtered text as the text of the article. With this embodiment, after the irrelevant information is filtered out, only article content containing relevant information is interpreted, making the interpretation more accurate and efficient.
The specific means of filtering irrelevant information may adopt any feasible implementation. For example, irrelevant information can be filtered out by detecting formats, detecting keywords, and so on. To make the filtering more accurate, in one or more embodiments of this specification, filtering irrelevant information from the initial text of the article to obtain the filtered text as the text of the article includes: converting the initial text of the article into a sentence sequence; encoding each sentence with a sentence-level language model to obtain an initial semantic vector of each sentence; inputting the initial semantic vector of each sentence into a chapter-level language model to classify sentences as relevant or irrelevant information, obtaining an updated semantic vector of each sentence and a sentence classification probability sequence, where the updated semantic vector fuses semantic information of the full sentence sequence; passing the sentence classification probability sequence through a fully connected network to obtain an optimized classification result; and filtering out irrelevant information according to the classification result to obtain the filtered text. In this embodiment, more accurate filtering of irrelevant information is achieved with a classification technique based on deep-learning language models.
Specifically, as shown in the processing diagram of irrelevant-information filtering in fig. 3, the sentence-level language model and the chapter-level language model in this embodiment may both be BERT models. For an input sentence sequence, each sentence is first encoded by the sentence-level BERT model to obtain a semantic vector of that sentence (at this point each vector contains only the semantic information of its own sentence). The sentence semantic vectors are then input into the chapter-level BERT model to obtain a new semantic vector for each sentence; the new vector of a sentence now incorporates, to some extent, semantic information of the whole sentence sequence. Finally, the resulting sentence classification probability sequence is passed through a CRF layer to optimize the classification result from the perspective of label-sequence plausibility.
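For illustration only, the following Python sketch outlines this two-level classification pipeline using the Hugging Face transformers library for the sentence-level encoder. The chapter-level model is approximated here by a small Transformer encoder over sentence vectors, the CRF layer is replaced by a simple argmax for brevity, and the model checkpoint, dimensions, and label convention are assumptions rather than details of the embodiments.

```python
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")   # assumed checkpoint
sent_bert = BertModel.from_pretrained("bert-base-chinese")

# Chapter-level model: a small Transformer encoder over sentence vectors
# (a stand-in for the chapter-level BERT described above).
chapter_layer = nn.TransformerEncoderLayer(d_model=768, nhead=8, batch_first=True)
chapter_model = nn.TransformerEncoder(chapter_layer, num_layers=2)
classifier = nn.Linear(768, 2)     # 2 classes: relevant / irrelevant

def filter_irrelevant(sentences):
    with torch.no_grad():
        # 1. Sentence-level encoding: [CLS] vector of each sentence.
        vecs = []
        for s in sentences:
            enc = tokenizer(s, return_tensors="pt", truncation=True, max_length=128)
            vecs.append(sent_bert(**enc).last_hidden_state[:, 0])   # (1, 768)
        sent_vecs = torch.cat(vecs).unsqueeze(0)                    # (1, n, 768)

        # 2. Chapter-level encoding: each sentence vector now sees the full sequence.
        updated = chapter_model(sent_vecs)                          # (1, n, 768)

        # 3. Classification (a CRF layer would further refine this label sequence).
        probs = classifier(updated).softmax(-1)[0]                  # (n, 2)
    labels = probs.argmax(-1)                                       # 1 = relevant (assumed)
    return [s for s, l in zip(sentences, labels) if l == 1]
```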
The method provided by the embodiments of this specification does not limit the specific implementation of entity extraction. For example, extracting a plurality of corresponding entities from the article according to a plurality of entity types for describing the article includes: inputting the text of the article into a language model for text semantic extraction to obtain, for each input character, a vector fused with full-text semantic information; passing the vector of each character through a fully connected network to obtain the label corresponding to each character; extracting a plurality of mentions corresponding to the plurality of entity types according to the labels of the characters; and linking the plurality of mentions to the corresponding entities in an existing knowledge base to extract the plurality of corresponding entities. This embodiment uses deep-learning-based NER (Named Entity Recognition) and extracts the required information through semantic features; compared with extraction techniques that rely on vocabularies, generalization is greatly enhanced, and the method is not strictly limited by vocabularies or labeled data formulated by experts.
The method provided in the embodiments of this specification does not limit which NLP (natural language processing) model is used as the language model. For example, the language model in this embodiment may be a BERT model. Training of the BERT model is mainly divided into two stages: a pre-training stage and a fine-tuning stage. The pre-training stage trains the model on a large data set with certain pre-training tasks. The fine-tuning stage adapts the model to downstream tasks such as text classification, part-of-speech tagging, or question answering; BERT can be fine-tuned for different tasks without changing its structure. For example, as shown in the processing diagram of named entity recognition in fig. 4, in this embodiment the input of the BERT model is the characters of the text, and the output may be a vector, fused with full-text semantic information, for each character. After the vector of each character is passed through a fully connected layer and a CRF layer, the NER label sequence, i.e., the label corresponding to each character, is obtained. A plurality of mentions corresponding to the plurality of entity types can then be extracted from the labels of the characters.
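As a rough illustration (not the embodiments' actual implementation), the sketch below shows how per-character BERT vectors could be turned into labels and mentions. The BIO label set, the linear layer standing in for the fully connected network, and the omission of CRF decoding are simplifying assumptions of this sketch.

```python
import torch
import torch.nn as nn
from transformers import BertTokenizerFast, BertModel

LABELS = ["O", "B-ORIENTATION", "I-ORIENTATION", "B-ORGAN", "I-ORGAN",
          "B-TISSUE", "I-TISSUE", "B-INDEX", "I-INDEX",
          "B-INDEX_VALUE", "I-INDEX_VALUE"]        # assumed BIO label set

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")
head = nn.Linear(768, len(LABELS))                  # stand-in for the fully connected network

def extract_mentions(text):
    enc = tokenizer(text, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0]
    with torch.no_grad():
        vectors = bert(**enc).last_hidden_state[0]  # one vector per token
        label_ids = head(vectors).argmax(-1)        # a CRF would decode this sequence
    mentions, current = [], None
    for (start, end), lid in zip(offsets.tolist(), label_ids.tolist()):
        label = LABELS[lid]
        if label.startswith("B-"):
            current = {"type": label[2:], "start": start, "end": end}
            mentions.append(current)
        elif label.startswith("I-") and current and current["type"] == label[2:]:
            current["end"] = end
        else:
            current = None
    return [{**m, "text": text[m["start"]:m["end"]]} for m in mentions]
```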
Linking the plurality of mentions to the corresponding entities in the existing knowledge base (entity linking) may adopt any feasible implementation. For example, when the knowledge graph contains only a small number of entities and there are essentially no cases of different entities sharing the same name, entity linking can be done by HA3 search and literal (string) similarity calculation. As another example, when the knowledge graph grows and contains many ambiguous entities, entity linking can be performed with deep learning.
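A minimal sketch of the literal-similarity route is shown below, assuming a toy knowledge base of standard entity names with alias lists; the data and the 0.6 threshold are invented purely for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical knowledge base: standard entity name -> known aliases/mentions.
KNOWLEDGE_BASE = {
    "mass":            ["tumor", "mass", "nodule"],
    "hypoechoic area": ["hypoechoic area", "low echo zone"],
    "breast":          ["breast", "mammary gland"],
}

def link_mention(mention, kb=KNOWLEDGE_BASE, threshold=0.6):
    """Link a mention to the most literally similar entity, or None if nothing matches."""
    best_entity, best_score = None, 0.0
    for entity, aliases in kb.items():
        for alias in aliases:
            score = SequenceMatcher(None, mention.lower(), alias.lower()).ratio()
            if score > best_score:
                best_entity, best_score = entity, score
    return best_entity if best_score >= threshold else None

print(link_mention("nodule"))          # -> "mass"
print(link_mention("low echo zone"))   # -> "hypoechoic area"
```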
To make the extracted mentions more accurate, in one or more embodiments of this specification, before linking the plurality of mentions to the corresponding entities in the existing knowledge base, the method further includes: judging whether each extracted mention has a corresponding negative meaning description in the text of the article, and if so, adding the negative meaning description to the corresponding mention. In this embodiment, negation detection is performed on the previously extracted mentions and any negative meaning description is attached to the corresponding mention, making the extracted mentions more accurate.
The specific implementation of negation detection is not limited; for example, it may be done by sensitive-word detection. As another example, to make negation detection more accurate, one or more embodiments of this specification perform negation detection with a deep-learning method, so that the extracted mentions are more accurate. For example, judging whether each extracted mention has a corresponding negative meaning description in the text of the article includes: inputting the position of each mention in the text of the article, the mention text, and the text of the article into a language model for binary classification of whether the mention is negated; and determining whether each mention has a corresponding negative meaning description based on the result of the binary classification. For example, as shown in the processing diagram of negation detection in fig. 5, a BERT model may be used as the language model that performs the negation binary classification. As shown in fig. 5, the NER result and the original text of the article are input into the BERT model, and the output of the BERT model is passed through a fully connected network to obtain the probability that the mention is negated.
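For illustration only, here is one simple way such a mention-level negation classifier could be wired up. Encoding the mention and the report as a text pair and pooling the [CLS] vector are assumptions of this sketch; the position feature described above is omitted for brevity, and the classifier head is untrained.

```python
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")
neg_head = nn.Linear(768, 1)    # stand-in for the fully connected network

def negation_probability(mention, text):
    """Probability that `mention` is negated in `text` (text-pair classification)."""
    enc = tokenizer(mention, text, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        cls_vec = bert(**enc).last_hidden_state[:, 0]   # [CLS] vector of the pair
        prob = torch.sigmoid(neg_head(cls_vec)).item()
    return prob

report = ("One nodule on the top of the left breast, size 3 x 3 mm, "
          "smooth border, no cyst seen.")
print(negation_probability("cyst", report))   # untrained, so the value is arbitrary
```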
The embodiments of this specification do not limit the specific implementation of relation extraction. For example, one or more embodiments of this specification use a relation-extraction technique based on deep learning, which greatly enhances generalization and avoids the problem of insufficient generalization in relation-extraction capability. Specifically, performing relation extraction using the extracted entities and the text of the article to obtain the association relationships between the entities includes: constructing a graph network based on text and entities using the extracted entities and the text of the article; performing feature extraction on the entity nodes in the graph network with a graph convolution algorithm to obtain vectors of the entity nodes; and obtaining the association relationships between the entities by calculating vector similarities between the entity nodes in the graph network. In this embodiment, features of the entity nodes are extracted with a graph convolution algorithm, so that node features are propagated and updated along the structure of the graph network, yielding more accurate node vectors, and the association relationships between entities are then determined accurately based on vector similarity.
The embodiments of this specification do not limit the specific implementation of constructing the graph network. For example, constructing a graph network based on text and entities using the extracted entities and the text of the article includes: encoding each sentence and each entity in the text of the article as an independent node to obtain sentence nodes and entity nodes of the graph network; and completing construction of the graph network by establishing an edge between each pair of sentence nodes, an edge between each pair of entity nodes in the same sentence, and an edge between each entity node and the sentence node to which it belongs.
Because of the complexity of an article, related entities are often located in different sentences, so a graph network constructed from both sentences and entities can be used; a graph convolution algorithm then extracts features of the entity nodes in the graph network to obtain vectors of the entity nodes, and the association relationships between entities are obtained by calculating vector similarities between the entity nodes. Each sentence and each entity is treated as a node and may be encoded, for example, by a BERT model to obtain the node's vector representation. An edge connects any two sentence nodes; if an entity belongs to a sentence, an edge connects that entity and the sentence; and an edge is also built between two entities in the same sentence. For example, from the article content "One nodule on the top of the left breast, size 3 x 3 mm, smooth border, no cyst seen. There is a low echo zone below, and the border is unclear.", a graph network as shown in fig. 6 can be constructed.
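A small sketch of this graph construction using the networkx library is shown below; the sentence splitting and the hard-coded entity mentions are assumptions made only to keep the example short, standing in for the output of the earlier NER step.

```python
import itertools
import networkx as nx

report = ("One nodule on the top of the left breast, size 3 x 3 mm, "
          "smooth border, no cyst seen. "
          "There is a low echo zone below, and the border is unclear.")
sentences = [s.strip() + "." for s in report.split(".") if s.strip()]
# Entity mentions per sentence, assumed to come from the NER step above.
entities = [
    ["left", "breast", "nodule", "size", "3 x 3 mm", "border", "smooth", "cyst"],
    ["low echo zone", "border", "unclear"],
]

g = nx.Graph()
for i, sent in enumerate(sentences):
    g.add_node(("sent", i), text=sent)                       # one node per sentence
for i, ents in enumerate(entities):
    for e in ents:
        g.add_node(("ent", i, e), text=e)                    # one node per entity mention
        g.add_edge(("ent", i, e), ("sent", i))               # entity <-> its sentence
    for a, b in itertools.combinations(ents, 2):             # entities in the same sentence
        g.add_edge(("ent", i, a), ("ent", i, b))
for i, j in itertools.combinations(range(len(sentences)), 2):
    g.add_edge(("sent", i), ("sent", j))                     # every pair of sentences

print(g.number_of_nodes(), g.number_of_edges())
```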
The embodiments of this specification do not limit how the vector similarity is used to determine the association relationships between entities. For example, to obtain the association relationships accurately and efficiently, in one or more embodiments of this specification the entity nodes that have an association relationship are determined by comparison with a preset similarity threshold. That is, obtaining the association relationships between entities by calculating vector similarities between entity nodes in the graph network includes: calculating the vector similarity between every two entity nodes; and comparing the vector similarity between every two entity nodes with a preset similarity threshold to determine the entity nodes that have an association relationship. For example, if the vector similarity is higher than the preset similarity threshold (the specific threshold may be set according to manual experience), the two entities are considered to have an association relationship.
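The following numpy sketch illustrates the idea of one graph-convolution propagation step followed by threshold-based similarity comparison. The single untrained GCN layer, random initial features, the tiny adjacency matrix, and the 0.8 threshold are all simplifying assumptions for illustration only.

```python
import numpy as np

def gcn_layer(adj, features, weight):
    """One graph-convolution step: H' = ReLU(D^-1/2 (A + I) D^-1/2 · H · W)."""
    a_hat = adj + np.eye(adj.shape[0])                  # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ a_hat @ d_inv_sqrt @ features @ weight, 0)

def related_pairs(adj, features, entity_idx, threshold=0.8):
    """Entity-node pairs whose propagated vectors are similar enough to be associated."""
    rng = np.random.default_rng(0)
    weight = rng.standard_normal((features.shape[1], 64)) * 0.1   # untrained weights
    h = gcn_layer(adj, features, weight)
    pairs = []
    for i in entity_idx:
        for j in entity_idx:
            if i < j:
                sim = h[i] @ h[j] / (np.linalg.norm(h[i]) * np.linalg.norm(h[j]) + 1e-9)
                if sim > threshold:
                    pairs.append((i, j, float(sim)))
    return pairs

# Tiny example: nodes 0-1 are sentences, nodes 2-4 are entities; edges are assumed.
adj = np.zeros((5, 5))
for a, b in [(0, 1), (2, 0), (3, 0), (4, 1), (2, 3)]:
    adj[a, b] = adj[b, a] = 1
features = np.random.default_rng(1).standard_normal((5, 16))
print(related_pairs(adj, features, entity_idx=[2, 3, 4]))
```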
The method for interpreting an article provided by this specification is further described below with reference to fig. 7, taking its application to medical reports as an example. In this embodiment, the plurality of entity types may include: orientation, organ, tissue, index, and index value. Fig. 7 shows a flowchart of the processing procedure of a method for interpreting an article according to an embodiment of this specification, with specific steps 702 to 720.
Step 702: a picture of a medical report uploaded by a user is received.
Step 704: the photographs of the medical report are subjected to OCR (optical character recognition) processing to obtain initial text.
For example, photographs are converted into sentence text by OCR systems.
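As a minimal illustration of this step (not the OCR system actually used in the embodiments), a photo could be converted to text with the open-source pytesseract wrapper around Tesseract; the file name and the Chinese language pack are assumptions of this sketch.

```python
from PIL import Image
import pytesseract

# Convert the uploaded photo of the medical report into raw text.
initial_text = pytesseract.image_to_string(Image.open("report.jpg"), lang="chi_sim")
print(initial_text)
```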
Step 706: the initial text is converted into a sequence of sentences.
Step 708: Filter irrelevant information from the sentence sequence.
In this step, information that is irrelevant to the person the medical report belongs to can be filtered out (e.g., general descriptions of the report's grading system such as "BI-RADS category I means xxx, BI-RADS category II means xxx, ……").
Step 710: Perform named entity recognition and negation detection on the filtered text to obtain a plurality of mentions corresponding to any one or more of the entity types orientation, organ, tissue, index, and index value.
For example, in this step the corresponding mentions can be extracted: an orientation-class mention (e.g., "left side"), an organ-class mention (e.g., "breast"), a tissue-class mention (e.g., "nodule"), an index-class mention (e.g., "boundary"), and an index-value-class mention (e.g., "unclear"); it is also determined whether each mention has a negated meaning in its sentence (e.g., "no nodule seen").
Step 712: Link the plurality of mentions to the corresponding entities in the existing knowledge base to extract a plurality of corresponding entities.
Step 714: Perform relation extraction using the extracted entities and the text of the report to obtain association relationships between the entities.
Step 716: Construct a quintuple (five-tuple) forest based on the association relationships between the entities.
Step 718: Input the five-tuple forest into an interpretation rule engine to obtain an interpretation result.
For example, expert rules may be set based on experience of a medical expert to form an interpretation rule engine. Of course, the corresponding prompt may be added to the interpretation rule, and the interpretation rule engine may directly output the corresponding prompt.
Step 720: Provide the user with corresponding prompts, such as a medical explanation and a risk level, based on the interpretation result.
For example, information such as prompts can be output to the user side for presentation to the user.
Based on the processing procedure shown in fig. 7, the method for interpreting an article provided by the embodiments of this specification forms the overall call link shown in fig. 8. The "irrelevant information filtering module", "NER + negation detection module", "entity linking module", and "relation extraction module" shown in fig. 8 may all be implemented with natural language processing methods based on deep learning; their specific implementations are described in detail in the other embodiments and are not repeated here.
As can be seen from the above embodiments, after processing by the method provided by the embodiments of this specification, a medical report can be structurally described as a five-tuple forest. The quintuple consists of the five entity types orientation, organ, tissue, index, and index value, which have hierarchical (parent-child) membership relationships. With the five kinds of entities as nodes and the hierarchical membership relationships as edges, tree structures can be constructed, and a collection of such trees (a forest) can completely describe the content of a medical report. For example, after the report "One nodule on the top of the left breast, 3 x 3 mm in size, smooth border, no cyst seen. There is a low echo zone under the right breast, and the border is unclear." is processed by the method provided by the embodiments of this specification, the five-tuple forest shown in fig. 9 can be constructed. Assuming the interpretation rule engine includes the interpretation rule "right -> breast -> hypoechoic area -> boundary -> unclear = risk in breast underwriting", inputting the five-tuple forest of fig. 9 into the interpretation rule engine yields the interpretation result "risk in breast underwriting". It can be seen that for content with complex contextual logic, such as "a nodule is found on the top of the left breast …… with unclear boundaries", the method provided by the embodiments of this specification, after extracting the association relationships with deep-learning natural language processing, can accurately describe the article content with the five-tuple forest; and through this accurate description of the article content, the article can be interpreted accurately.
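To make the five-tuple forest structure concrete, here is a small sketch that builds a forest (as nested dictionaries) from quintuple records like those extracted from the example report and lists its root-to-leaf paths so they can be matched by a rule engine such as the path-matching sketch shown earlier. The record values are taken from the example report; the data layout itself is an assumption of this sketch.

```python
# Each record is one (orientation, organ, tissue, index, index value) quintuple,
# as extracted from the example report above. The negated "cyst" mention in the
# first sentence produces no quintuple.
quintuples = [
    ("left",  "breast", "nodule",          "size",     "3 x 3 mm"),
    ("left",  "breast", "nodule",          "boundary", "smooth"),
    ("right", "breast", "hypoechoic area", "boundary", "unclear"),
]

def build_forest(records):
    """Nest quintuples into trees: orientation -> organ -> tissue -> index -> value."""
    forest = {}
    for record in records:
        node = forest
        for name in record:
            node = node.setdefault(name, {})
    return forest

def paths(forest, prefix=()):
    """Enumerate every root-to-leaf path of the forest."""
    if not forest:
        return [prefix]
    out = []
    for name, child in forest.items():
        out.extend(paths(child, prefix + (name,)))
    return out

forest = build_forest(quintuples)
for p in paths(forest):
    print(" -> ".join(p))
# One printed path, "right -> breast -> hypoechoic area -> boundary -> unclear",
# matches the example rule and yields "risk in breast underwriting".
```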
Corresponding to the above method embodiments, this specification also provides an embodiment of an apparatus for reading an article, and fig. 10 shows a schematic structural diagram of an apparatus for reading an article according to an embodiment of this specification. As shown in fig. 10, the apparatus includes: an entity extraction module 1002, a relationship extraction module 1004, a forest construction module 1006, and an engine interpretation module 1008.
The entity extraction module 1002 may be configured to extract a plurality of entities from an article according to a plurality of entity types for describing the article.
The relationship extraction module 1004 may be configured to perform relationship extraction by using the extracted multiple entities and the text of the article to obtain an association relationship between the entities.
The forest construction module 1006 may be configured to construct a multi-tuple forest for describing the article based on the association relationships between the entities, the multi-tuple forest including one or more tree structures with the entities as nodes and the association relationships between the entities as edges.
The engine interpretation module 1008 may be configured to input the multi-tuple forest into an interpretation rule engine to obtain an interpretation result.
Because the apparatus extracts a plurality of corresponding entities from an article according to a plurality of entity types for describing the article, obtains the association relationships between the entities using the extracted entities and the text of the article, constructs a multi-tuple forest for describing the article based on those association relationships (the multi-tuple forest including one or more tree structures with the entities as nodes and the association relationships as edges), and inputs the multi-tuple forest into an interpretation rule engine to obtain an interpretation result, it provides a multi-tuple forest structure suited to the complexity of article text. A plain-text article can thus be structured into a multi-tuple forest that accurately describes the article content, and the multi-tuple forest can then be interpreted by an interpretation rule engine built around the multi-tuple forest structure, so that the interpretation result can be obtained intelligently, providing an accurate and efficient article interpretation result.
Fig. 11 is a schematic structural diagram of an apparatus for interpreting an article according to another embodiment of the present disclosure. As shown in fig. 11, in order to make the interpretation of the article more accurate and efficient, the apparatus may further include: the irrelevant filtering module 1010 may be configured to, before the entity extracting module 1002 extracts the entities corresponding to the multiple entity types, perform irrelevant information filtering on the initial text of the article to obtain a filtered text as the text of the article.
The specific processing means of the irrelevant information filtering may adopt any possible implementation manner. In one or more embodiments of the present disclosure, the irrelevant filter module 1010 may include: a sentence sequence conversion sub-module 1010a, a sentence level model processing sub-module 1010b, a chapter level model processing sub-module 1010c, an irrelevant classification output sub-module 1010d, and a filtering sub-module 1010e.
The sentence sequence conversion sub-module 1010a may be configured to convert the initial text of the article into a sentence sequence.
The sentence-level model processing submodule 1010b may be configured to encode each sentence by a sentence-level language model, resulting in an initial semantic vector for each sentence.
The chapter-level model processing sub-module 1010c may be configured to input the initial semantic vector of each sentence into the chapter-level language model to perform classification on related information and unrelated information, so as to obtain an updated semantic vector of each sentence and a sentence classification probability sequence, where the updated semantic vector fuses semantic information of the full text of the sentence sequence.
The irrelevant classification output sub-module 1010d may be configured to pass the sentence classification probability sequence through a fully connected network to obtain an optimized classification result.
The filtering sub-module 1010e may be configured to filter out irrelevant information according to the classification result, so as to obtain a filtered text.
In this embodiment, a more accurate filtering of irrelevant information is achieved based on a classification technique of a deeply learned language model.
The embodiments of this specification do not limit the specific implementation of entity extraction. For example, as shown in fig. 11, the entity extraction module 1002 may include: a character vector extraction sub-module 1002a, a label output sub-module 1002b, a mention extraction sub-module 1002c, and an entity linking sub-module 1002d.
The character vector extraction sub-module 1002a may be configured to input the text of the article into a language model for text semantic extraction, obtaining, for each input character, a vector fused with full-text semantic information.
The label output sub-module 1002b may be configured to obtain the label corresponding to each character by passing the vector of each character through a fully connected network.
The mention extraction sub-module 1002c may be configured to extract a plurality of mentions corresponding to the plurality of entity types according to the labels corresponding to the characters.
The entity linking sub-module 1002d may be configured to extract a plurality of corresponding entities by linking the plurality of mentions to the corresponding entities in the existing knowledge base.
In the embodiment, the NER technology based on deep learning is used, the required information is extracted through semantic features, and compared with the extraction technology depending on the word list, the generalization is greatly enhanced, and the method is not strictly limited by the word list or the labeled data formulated by experts.
To make the extracted mentions more accurate, in one or more embodiments of this specification, the entity extraction module 1002 may further include a negation detection sub-module 1002e, which may be configured to judge, before the entity linking sub-module 1002d links the plurality of mentions to the corresponding entities in the existing knowledge base, whether each extracted mention has a corresponding negative meaning description in the text of the article, and if so, to add the negative meaning description to the corresponding mention.
The specific implementation of negation detection is not limited; for example, it may be done by sensitive-word detection. As another example, to make negation detection more accurate, one or more embodiments of this specification perform negation detection with a deep-learning method, so that the extracted mentions are more accurate. For example, the negation detection sub-module 1002e may be configured to input the position of each mention in the text of the article, the mention text, and the text of the article into a language model for binary classification of whether the mention is negated, and to determine whether each mention has a corresponding negative meaning description based on the result of the binary classification.
The specific implementation of the relationship extraction in the embodiments of the present specification is not limited. For example, as shown in fig. 11, the relationship extraction module 1004 may include: a graph network construction sub-module 1004a, a graph convolution calculation sub-module 1004b, and an association determination sub-module 1004c.
The graph network constructing sub-module 1004a may be configured to construct a graph network based on texts and entities by using the extracted entities and texts of the articles.
The graph convolution calculating sub-module 1004b may be configured to perform feature extraction on the entity nodes in the graph network by using a graph convolution algorithm, so as to obtain vectors of the entity nodes.
The association relation determining sub-module 1004c may be configured to obtain an association relation between entities by calculating vector similarity between entity nodes in the graph network.
In this embodiment, features of the entity nodes in the graph network are extracted with a graph convolution algorithm, so that the features of the entity nodes are propagated and updated along the structure of the graph network, yielding more accurate node vectors, and the association relationships between entities are then determined accurately based on vector similarity.
The method provided by the embodiment of the present specification does not limit the specific implementation of constructing the graph network. For example, the graph network constructing sub-module 1004a may be configured to encode each sentence and each entity in the text of the article as independent nodes, so as to obtain sentence nodes and entity nodes of the graph network; the construction of the graph network is completed by establishing an edge between each pair of sentence nodes, an edge between each pair of entity nodes in the same sentence and an edge between each entity node and the sentence node to which the entity node belongs.
The embodiments of this specification do not limit how the vector similarity is used to determine the association relationships between entities. For example, the association determination sub-module 1004c may be configured to calculate the vector similarity between every two entity nodes and compare it with a preset similarity threshold to determine the entity nodes that have an association relationship.
The above is an illustrative scheme of an apparatus for interpreting an article according to the embodiment. It should be noted that the technical solution of the apparatus for interpreting the article is the same as the technical solution of the method for interpreting the article, and for details of the technical solution of the apparatus for interpreting the article, which is not described in detail, reference may be made to the description of the technical solution of the method for interpreting the article.
FIG. 12 illustrates a block diagram of a computing device 1200 provided according to one embodiment of the present description. The components of the computing device 1200 include, but are not limited to, memory 1210 and processor 1220. Processor 1220 is coupled to memory 1210 via bus 1230, and database 1250 is used to store data.
The computing device 1200 also includes an access device 1240, the access device 1240 enabling the computing device 1200 to communicate via one or more networks 1260. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. The access device 1240 may include one or more of any type of network interface, e.g., a Network Interface Card (NIC), wired or wireless, such as an IEEE802.11 Wireless Local Area Network (WLAN) wireless interface, a worldwide interoperability for microwave access (Wi-MAX) interface, an ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 1200 and other components not shown in FIG. 12 may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 12 is for purposes of example only and is not limiting as to the scope of the description. Those skilled in the art may add or replace other components as desired.
Computing device 1200 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 1200 may also be a mobile or stationary server.
Wherein processor 1220 is configured to execute the following computer-executable instructions:
extracting a plurality of corresponding entities from an article according to a plurality of entity types for describing the article;
extracting relationships by using the extracted entities and texts of the articles to obtain the association relationship among the entities;
constructing a multi-element forest for describing the article based on the incidence relation among the entities, wherein the multi-element forest comprises one or more tree structures which take the entities as nodes and the incidence relation among the entities as edges;
and inputting the multi-element forest into an interpretation rule engine to obtain an interpretation result.
The above is an illustrative scheme of the computing device of this embodiment. It should be noted that the technical solution of the computing device and the technical solution of the above method for interpreting an article belong to the same concept; for details of the technical solution of the computing device that are not described here, reference may be made to the description of the technical solution of the above method for interpreting an article. A non-limiting sketch of the forest construction and rule-engine interpretation steps listed above follows.
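Purely as an illustration, and not as a limitation of the embodiments, these two steps may be organized as in the following Python sketch. Entity extraction and relation extraction are represented only by their outputs (the entities list and the relations list), and the TreeNode layout, the ordering of each association pair as (parent entity, child entity), and the rule-callable interface of the interpretation rule engine are assumptions made for this sketch only.

    from dataclasses import dataclass, field
    from typing import Callable, List, Tuple

    @dataclass
    class TreeNode:
        entity: str
        children: List["TreeNode"] = field(default_factory=list)

    def build_forest(entities: List[str], relations: List[Tuple[str, str]]) -> List[TreeNode]:
        # relations: association pairs, assumed here to be ordered as (parent entity, child entity)
        nodes = {e: TreeNode(e) for e in entities}
        has_parent = set()
        for parent, child in relations:
            nodes[parent].children.append(nodes[child])
            has_parent.add(child)
        # every entity that never appears as a child becomes the root of one tree of the forest
        return [nodes[e] for e in entities if e not in has_parent]

    def interpret(forest: List[TreeNode], rules: List[Callable[[TreeNode], str]]) -> List[str]:
        # each rule inspects one tree and returns an interpretation string, or "" if it does not apply
        results = []
        for tree in forest:
            for rule in rules:
                conclusion = rule(tree)
                if conclusion:
                    results.append(conclusion)
        return results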
An embodiment of the present specification further provides a computer-readable storage medium storing computer instructions that, when executed by a processor, are operable to perform the following steps:
extracting a plurality of corresponding entities from an article according to a plurality of entity types for describing the article;
extracting relationships by using the extracted entities and texts of the articles to obtain the association relationship among the entities;
constructing a multi-element forest for describing the article based on the incidence relation among the entities, wherein the multi-element forest comprises one or more tree structures which take the entities as nodes and the incidence relation among the entities as edges;
and inputting the multi-element forest into an interpretation rule engine to obtain an interpretation result.
The above is an illustrative scheme of the computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the above method for interpreting an article belong to the same concept; for details of the technical solution of the storage medium that are not described here, reference may be made to the description of the technical solution of the method for interpreting an article.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts, but those skilled in the art should understand that the present embodiment is not limited by the described acts, because some steps may be performed in other sequences or simultaneously according to the present embodiment. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for an embodiment of the specification.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the embodiments. The specification is limited only by the claims and their full scope and equivalents.

Claims (13)

1. A method of interpreting an article, comprising:
extracting a plurality of corresponding entities from an article according to a plurality of entity types for describing the article;
extracting relationships by using the extracted entities and texts of the articles to obtain the association relationship among the entities;
constructing a multi-element forest for describing the article based on the incidence relation among the entities, wherein the multi-element forest comprises one or more tree structures which take the entities as nodes and the incidence relation among the entities as edges;
and inputting the multi-element forest into an interpretation rule engine to obtain an interpretation result.
2. The method of claim 1, wherein extracting relationships by using the extracted entities and texts of the articles to obtain the association relationships between the entities comprises:
constructing a graph network based on texts and entities by using the extracted entities and the texts of the articles;
extracting the characteristics of the entity nodes in the graph network by using a graph convolution algorithm to obtain vectors of the entity nodes;
and obtaining the incidence relation between the entities by calculating the vector similarity between the entity nodes in the graph network.
3. The method of claim 2, wherein constructing a graph network based on texts and entities by using the extracted entities and texts of the articles comprises:
coding each sentence and each entity in the text of the article as independent nodes to obtain sentence nodes and entity nodes of the graph network;
the construction of the graph network is completed by establishing an edge between each pair of sentence nodes, an edge between each pair of entity nodes in the same sentence and an edge between each entity node and the sentence node to which the entity node belongs.
4. The method of claim 2, wherein the obtaining the association relationship between the entities by calculating the vector similarity between the entity nodes in the graph network comprises:
calculating the vector similarity between every two entity nodes;
and comparing the vector similarity between every two entity nodes with a preset similarity threshold value to determine the entity nodes with the incidence relation.
5. The method of claim 1, prior to extracting the corresponding plurality of entities, further comprising:
and carrying out irrelevant information filtering on the initial text of the article to obtain a filtered text as the text of the article.
6. The method of claim 5, wherein performing irrelevant information filtering on the initial text of the article to obtain a filtered text as the text of the article comprises:
converting the initial text of the article into a sentence sequence;
coding each sentence through a sentence-level language model to obtain an initial semantic vector of each sentence;
inputting the initial semantic vector of each sentence into a chapter-level language model to classify related information and irrelevant information to obtain an updated semantic vector and a sentence classification probability sequence of each sentence, wherein the updated semantic vector fuses semantic information of a full text of the sentence sequence;
the sentence classification probability sequence is subjected to full link network to obtain an optimized classification result;
and filtering out irrelevant information according to the classification result to obtain a filtered text.
7. The method of claim 1, the article being a medical report, the plurality of entity types for describing the article comprising: orientation, organ, tissue, index value.
8. The method of claim 1, wherein extracting a plurality of entities from an article according to a plurality of entity types for describing the article comprises:
inputting the text of the article into a language model to perform text semantic extraction, obtaining vectors which respectively correspond to the input characters and are fused with full-text semantic information;
passing the vector of each character through a fully connected network to obtain a label corresponding to each character, wherein the labels are set according to the plurality of entity types for describing the article;
extracting a plurality of designations corresponding to the plurality of entity types according to the labels respectively corresponding to the characters;
and respectively linking the plurality of designations to corresponding entities in an existing knowledge base to extract the corresponding plurality of entities.
9. The method of claim 8, further comprising, prior to linking the plurality of designations to corresponding entities in an existing knowledge base:
judging whether each extracted designation has a corresponding negative meaning description in the text of the article;
and if so, adding the negative meaning description to the corresponding designation.
10. The method of claim 9, wherein the judging whether each extracted designation has a corresponding negative meaning description in the text of the article comprises:
inputting the position of each designation in the text of the article, the designation text, and the text of the article into a language model to perform a binary classification of whether the designation text is negated;
and determining, based on the result of the binary classification, whether each of the designations has a corresponding negative meaning description.
11. An apparatus to interpret an article, comprising:
the entity extraction module is configured to extract a plurality of corresponding entities from the article according to a plurality of entity types for describing the article;
the relation extraction module is configured to extract relations by using the extracted entities and the texts of the articles to obtain the association relations among the entities;
a forest construction module configured to construct a multi-element forest for describing the article based on the incidence relation between the entities, wherein the multi-element forest comprises one or more tree structures with the entities as nodes and the incidence relation between the entities as edges;
and the engine interpretation module is configured to input the multi-element forest into an interpretation rule engine to obtain an interpretation result.
12. A computing device, comprising:
a memory and a processor;
the memory is to store computer-executable instructions, and the processor is to execute the computer-executable instructions to:
extracting a plurality of corresponding entities from an article according to a plurality of entity types for describing the article;
extracting relationships by using the extracted entities and texts of the articles to obtain the association relationship among the entities;
constructing a multi-element forest for describing the article based on the incidence relation among the entities, wherein the multi-element forest comprises one or more tree structures which take the entities as nodes and the incidence relation among the entities as edges;
and inputting the multi-element forest into an interpretation rule engine to obtain an interpretation result.
13. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the method of interpreting an article of any one of claims 1 to 10.
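The irrelevant-information filtering recited in claims 5 and 6 may be sketched, purely as a non-limiting illustration, as follows. The sentence_encoder and document_classifier arguments stand in for the sentence-level and chapter-level language models (with the fully connected layer folded into the classifier), and the sentence-splitting rule and the 0.5 keep threshold are assumptions of this sketch.

    import re
    from typing import Callable, List, Sequence

    def filter_irrelevant(initial_text: str,
                          sentence_encoder: Callable[[str], Sequence[float]],
                          document_classifier: Callable[[List[Sequence[float]]], List[float]],
                          keep_threshold: float = 0.5) -> str:
        # convert the initial text of the article into a sentence sequence
        sentences = [s.strip() for s in re.split(r"[。！？.!?]+", initial_text) if s.strip()]
        # encode each sentence with the sentence-level language model
        sentence_vectors = [sentence_encoder(s) for s in sentences]
        # classify each sentence as relevant or irrelevant with the chapter-level model;
        # its output is assumed to yield one "relevant" probability per sentence
        relevance = document_classifier(sentence_vectors)
        # filter out the irrelevant sentences and rejoin the rest as the filtered text
        kept = [s for s, p in zip(sentences, relevance) if p >= keep_threshold]
        return " ".join(kept)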
CN202110949956.6A 2021-08-18 2021-08-18 Method and device for reading article Pending CN113657112A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110949956.6A CN113657112A (en) 2021-08-18 2021-08-18 Method and device for reading article

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110949956.6A CN113657112A (en) 2021-08-18 2021-08-18 Method and device for reading article

Publications (1)

Publication Number Publication Date
CN113657112A true CN113657112A (en) 2021-11-16

Family

ID=78492263

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110949956.6A Pending CN113657112A (en) 2021-08-18 2021-08-18 Method and device for reading article

Country Status (1)

Country Link
CN (1) CN113657112A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI815411B (en) * 2022-04-22 2023-09-11 臺北醫學大學 Methods and non-transitory computer storage media of extracting linguistic patterns and summarizing pathology report


Similar Documents

Publication Publication Date Title
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
Lin et al. A BERT-based universal model for both within-and cross-sentence clinical temporal relation extraction
CN106886580B (en) Image emotion polarity analysis method based on deep learning
CN112269868B (en) Use method of machine reading understanding model based on multi-task joint training
CN113961685A (en) Information extraction method and device
CN113051380B (en) Information generation method, device, electronic equipment and storage medium
Tammina et al. Sentiment analysis on customer reviews using convolutional neural network
CN112860896A (en) Corpus generalization method and man-machine conversation emotion analysis method for industrial field
CN114691864A (en) Text classification model training method and device and text classification method and device
CN114564563A (en) End-to-end entity relationship joint extraction method and system based on relationship decomposition
Af'idah et al. Long short term memory convolutional neural network for Indonesian sentiment analysis towards touristic destination reviews
CN113657112A (en) Method and device for reading article
CN110969005A (en) Method and device for determining similarity between entity corpora
US20240037335A1 (en) Methods, systems, and media for bi-modal generation of natural languages and neural architectures
CN115687917A (en) Sample processing method and device, and recognition model training method and device
CN114818718A (en) Contract text recognition method and device
CN115292492A (en) Method, device and equipment for training intention classification model and storage medium
Sharma et al. Categorizing Disaster Tweets into Actionable Classes for Disaster Managers: An Empirical Analysis on Cyclone Data
CN114443846A (en) Classification method and device based on multi-level text abnormal composition and electronic equipment
CN114138947A (en) Text processing method and device
CN113792550A (en) Method and device for determining predicted answer and method and device for reading and understanding
CN116089614B (en) Text marking method and device
CN111079013A (en) Information recommendation method and device based on recommendation model
CN110232328A (en) A kind of reference report analytic method, device and computer readable storage medium
CN115618968B (en) New idea discovery method and device, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211217

Address after: Room 610, floor 6, No. 618, Wai Road, Huangpu District, Shanghai 200010

Applicant after: Ant Shengxin (Shanghai) Information Technology Co.,Ltd.

Address before: 801-11, Section B, 8th floor, 556 Xixi Road, Xihu District, Hangzhou City, Zhejiang Province, 310013

Applicant before: Alipay (Hangzhou) Information Technology Co.,Ltd.