WO2023051142A1

WO2023051142A1 - Method and apparatus for relationship extraction, device and medium

Info

Publication number: WO2023051142A1
Application number: PCT/CN2022/116286
Authority: WO
Inventors: 孙长志; 茹栋宇
Original assignee: 北京有竹居网络技术有限公司
Priority date: 2021-09-30
Filing date: 2022-08-31
Publication date: 2023-04-06
Also published as: CN113901151B; CN113901151A

Abstract

Provided are a method and apparatus for training a relation extraction model, and a device and a storage medium. The method comprises: on the basis of a given triple consisting of a target entity pair in a document and a target relationship associated with the target entity pair, determining a probability distribution of a set of rules conditional on the given triple, the set of rules being used to describe the logic of the association between the target entity pair; on the basis of the probability distribution of the set of rules conditional on the given triple, determining a probability distribution of a score conditional on the given triple, the score indicating whether the target relationship is valid for the target entity pair in the document; and, on the basis of a label value corresponding to the score, maximizing a likelihood function of the parameters of the probability distribution of the score conditional on the given triple, thereby obtaining a trained relation extraction model. According to embodiments of the present disclosure, the use of rules allows long-range dependencies of relationships to be captured easily and provides better interpretability.

Description

Method, apparatus, device and medium for relation extraction

Cross References to Related Applications

This application claims priority to Chinese Invention Patent Application No. 202111161205.4, entitled "Method, Apparatus, Device and Medium for Relation Extraction" and filed on September 30, 2021, which is incorporated herein by reference in its entirety.

Technical Field

Various implementations of the present disclosure relate to the computer field, and more specifically, to a method, apparatus, device, and computer storage medium for relation extraction.

Background

Currently, document-level relation extraction methods have attracted much attention. Document-level relational extraction can be applied to fields such as question answering and search. Typically, sequence-based models or graph-based models can be leveraged to account for longer contexts and long-range dependencies of relationships in documents. For example, the representation of long-range relationships can be computed through pooling operations, or the nodes in the graph can be used to represent distant entities in documents, so as to better characterize the long-range relationships between entities.

However, the interpretability of the long-range relationships extracted by the above methods is poor. Therefore, there is a need for document-level relation extraction methods that can provide better interpretability.

Summary of the Invention

In a first aspect of the present disclosure, a method for training a relation extraction model is provided. The method comprises: determining, based on a given triple consisting of a target entity pair in a document and a target relationship associated with the target entity pair, a probability distribution of a set of rules conditional on the given triple, the target relationship being selected from a set of relationships used to describe the relationships between entity pairs in the document, and the set of rules being used to describe the logic of the relationship between the target entity pair; determining, based on the probability distribution of the set of rules conditional on the given triple, a probability distribution of a score conditional on the given triple, the score indicating whether the target relationship is valid for the target entity pair in the document; and obtaining a trained relation extraction model by maximizing, based on a label value corresponding to the score, the likelihood function of the parameters of the probability distribution of the score conditional on the given triple.

In a second aspect of the present disclosure, an apparatus for training a relation extraction model is provided. The apparatus includes: a rule probability determination module configured to determine, based on a given triple consisting of a target entity pair in a document and a target relationship associated with the target entity pair, a probability distribution of a set of rules conditional on the given triple, the target relationship being selected from a set of relationships used to describe the relationships between entity pairs in the document, and the set of rules being used to describe the logic of the relationship between the target entity pair; a score probability determination module configured to determine, based on the probability distribution of the set of rules conditional on the given triple, a probability distribution of a score conditional on the given triple, the score indicating whether the target relationship is valid for the target entity pair in the document; and an optimization module configured to obtain a trained relation extraction model by maximizing, based on a label value corresponding to the score, the likelihood function of the parameters of the probability distribution of the score conditional on the given triple.

In a third aspect of the present disclosure, an electronic device is provided, including: a memory and a processor; wherein the memory is used to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method of the first aspect.

In a fourth aspect of the present disclosure, a computer-readable storage medium is provided, on which one or more computer instructions are stored, wherein the one or more computer instructions are executed by a processor to implement the method according to the first aspect of the present disclosure.

In a fifth aspect of the present disclosure, a method for relation extraction is provided. The method comprises: generating, based on a given triple consisting of a target entity pair in a document and a target relationship associated with the target entity pair, a set of rules for describing the logic of the relationship between the target entity pair, the target relationship being selected from a set of relationships used to describe the relationships between entity pairs in the document; determining, based on the set of rules, at least one path between the target entity pair; and determining, based on the entity pairs traversed by the at least one path and the associated relationships, a score indicating whether the target relationship is valid for the target entity pair in the document.

In a sixth aspect of the present disclosure, an apparatus for relation extraction is provided. The apparatus includes: a rule generation module configured to generate, based on a given triple consisting of a target entity pair in a document and a target relationship associated with the target entity pair, a set of rules for describing the logic of the relationship between the target entity pair, the target relationship being selected from a set of relationships used to describe the relationships between entity pairs in the document; a path determination module configured to determine, based on the set of rules, at least one path between the target entity pair; and a score determination module configured to determine, based on the entity pairs traversed by the at least one path and the associated relationships, a score indicating whether the target relationship is valid for the target entity pair in the document.

In a seventh aspect of the present disclosure, an electronic device is provided, including: a memory and a processor; wherein the memory is used to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method of the fifth aspect.

In an eighth aspect of the present disclosure, there is provided a computer-readable storage medium, on which one or more computer instructions are stored, wherein the one or more computer instructions are executed by a processor to implement the method according to the fifth aspect of the present disclosure.

According to various embodiments of the present disclosure, by utilizing rules for logical reasoning, long-range dependencies of relationships can be easily captured and better interpretability provided. In addition, by iteratively optimizing the parameters and hidden variables of the probability model, the rules as hidden variables can be automatically learned while optimizing the model parameters, so that the relationship in the document can be extracted based on the rules generated for the document to obtain better relation extraction performance. Furthermore, the conventional relation extraction model can be easily modified to implement some functions according to the embodiments of the present disclosure, so this solution has high portability.

Brief Description of the Drawings

The above and other features, advantages and aspects of the various embodiments of the present disclosure will become more apparent with reference to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals indicate the same or similar elements, wherein:

FIG. 1 shows a schematic diagram of an example environment in which various embodiments of the present disclosure can be implemented;

FIG. 2 shows a schematic diagram of an example process of relation extraction according to some embodiments of the present disclosure;

FIG. 3 shows a flowchart of an example method of training a relation extraction model according to some embodiments of the present disclosure;

FIG. 4 shows a flowchart of an example method of an optimization process according to some embodiments of the present disclosure;

FIG. 5 shows a schematic structural block diagram of an apparatus for relation extraction according to some embodiments of the present disclosure; and

FIG. 6 shows a block diagram of a computing device capable of implementing various embodiments of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only, and are not intended to limit the protection scope of the present disclosure.

In the description of the embodiments of the present disclosure, the term "comprising" and its similar expressions should be interpreted as an open inclusion, that is, "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be read as "at least one embodiment". The terms "first", "second", etc. may refer to different or the same object. Other definitions, both express and implied, may also be included below.

As mentioned above, the interpretability of long-range relations obtained with current relation extraction methods is usually poor.

In order to at least partially solve one or more of the above-mentioned problems and other potential problems, example embodiments of the present disclosure propose a method for relation extraction. The method includes: generating, based on a target entity pair in a document and a target relationship associated with the target entity pair, a set of rules for describing the logic of the relationship between the target entity pair, the target relationship being selected from a set of relationships describing the relationships between entity pairs in the document, and each rule being represented by a sequence of multiple relationships in the set of relationships; determining, based on the set of rules, at least one path between the target entity pair; and determining, based on the entity pairs traversed by the at least one path and the associated relationships, a score indicating whether the target relationship is valid for the target entity pair.

In this way, by using rules for logical reasoning, this solution can easily capture the long-range dependencies of relationships and provide better interpretability. In addition, this solution can automatically learn the rules suitable for the document and extract the relationships in the document based on the generated rules, so as to obtain better relation extraction performance.

Embodiments of the present disclosure will be specifically described below with reference to the accompanying drawings.

FIG. 1 shows a schematic diagram of an example environment 100 in which various embodiments of the present disclosure can be implemented. In this example environment 100, computing device 110 may receive document 120. Computing device 110 may be any suitable device with computing capabilities. Document 120 may include multiple sentences. It should be understood that the text of document 120 may be longer than the text handled in sentence-level or sentence-sequence-level relation extraction. However, the scope of the present disclosure is not limited by the text length of the document. For example, as shown in FIG. 1, document 120 may include only three sentences.

Computing device 110 may utilize relation extraction model 130 to extract relationships 140 from document 120. Document 120 may include multiple entities (the set of entities may be denoted ε). For example, in the example of document 120 shown in FIG. 1, document 120 may include multiple entities such as "UK", "Harry", "William", "Kate". These entities can be paired into entity pairs.

Relationships 140 may be a set of relationships describing the connections between pairs of entities in document 120. A relationship 140 may be bidirectional or unidirectional. For example, the relationship "is a friend" is two-way and the relationship "is a wife" is one-way. Relationships 140 may include relationships that describe associations between different pairs of entities. For example, relationships 140 may include the relationship "is a prince" or "is a member of the royal family" describing the connection between "Harry" and "UK". As another example, relationships 140 may include the relationship "is spouse" or "is husband" describing the connection between "Harry" and "Meghan".

Various examples of documents 120 and relationships 140 are described above with reference to FIG. 1 , and it should be understood that documents 120 and relationships 140 shown in FIG. 1 are merely illustrative and not intended to limit the present disclosure.

The process of extracting the relationship 140 from the document 120 by the computing device 110 using the relationship extraction model 130 will be described in detail below with reference to FIG. 2 . FIG. 2 shows a schematic diagram of an example process 200 of relation extraction according to some embodiments of the present disclosure.

As shown in FIG. 2, the relation extraction model 130 can receive a document 120 and a given triple 210 (denoted as q = (h, r, t)) consisting of a target entity pair (denoted as (h, t), where h, t ∈ ε) and an associated target relation (denoted as r(h, t)). The target entity pair may be an entity pair in the document 120 that the user is interested in. The target entity pair may also be any entity pair in the document 120.

The target relation r associated with the target entity pair may be any relation in any suitable set of relations (denoted as $\mathcal{R}$). The target relation can also be the relation in the set of relations that best matches the target entity pair. The target relation may also be a relation selected by the user from the set of relations. The set of relations $\mathcal{R}$ may be a set of relations determined empirically by the user.

Relation extraction model 130 may utilize rule generator 220 and relation extractor 230 to determine whether a target relation is valid for a target entity pair in document 120 . The relation extraction model 130 may output a score for a given triple 210 to indicate whether the target relation is valid for the target entity pair in the document 120 .

For example, the relation extraction model 130 may determine that the target relation "is a member of the royal family" is valid for the target entity pair ("Kate", "UK") in the document 120, and output a confirmed triple 240. In another example (not shown), the relation extraction model 130 may determine that the target relation "is wife" is not valid for the target entity pair ("Kate", "Harry") in the document 120.

Specifically, the rule generator 220 may determine a set of rules describing the logic of the relationship between the target entity pair based on the target entity pair and the target relationship. The rule generator 220 may be any suitable model that determines a set of rules that describe the logic of relationships between target entity pairs based on target entity pairs and target relationships. Rule generator 220 may be any suitable sequence generation model. In some implementations, rule generator 220 may be an autoregressive model, such as an autoregressive model based on a Transformer model. In one example, the rule generator 220 may be a Transformer model with a 2-layer encoder and a 2-layer decoder.

In some implementations, the rule generator 220 can generate a sequence of relations based on the target entity pair and the target relation $r$ (which can be denoted as $[r_1, \ldots, r_l]$, where each $r_i$ belongs to the set of relations $\mathcal{R}$). Based on the generated relation sequence, the rule generator 220 can determine a corresponding rule (denoted as rule). A rule may take the form $r \leftarrow r_1 \wedge \ldots \wedge r_l$. For example, referring to the example of FIG. 2, an example of a rule may be: relation "is royal" ← relation "is spouse" ∧ relation "is sibling" ∧ relation "is royal".

A rule can be represented by a sequence of multiple relations. For example, a rule can be expressed as $[r, r_1, \ldots, r_l]$. Alternatively, a rule can be represented as $[r_1, \ldots, r_l, r]$. The scope of the present disclosure is not limited to the specific representation of the rules.

The rule generator 220 can be sampled multiple times to determine a set of rules (denoted as $z$) describing the logic of the relationship between the target entity pair. When generating each relation in the relation sequence $[r_1, \ldots, r_l]$, the rule generator 220 can determine a probability distribution over candidate relations. Thus, a set of rules $z$ can be generated using the rule generator 220 by sampling from a plurality of candidates for each relation.

Based on the generated set of rules, relationship extractor 230 may determine at least one path between the target entity pair that satisfies the rule, and based on the determined path, determine a score indicating whether the target relationship is valid for the target entity pair. The relation extractor 230 may be any suitable model that realizes the functions described above. In some implementations, relation extractor 230 may be a modified version of a conventional relation extraction model. For example, the relation extractor 230 may use a sequence-based model or a graph-based model for relation extraction as a backbone model, and add additional units to implement functions according to some embodiments of the present disclosure.

In some implementations, relationship extractor 230 may be used to determine at least one path between the target entity pair that satisfies a rule. In other words, for each rule in the generated set of rules, the additional unit may determine one or more corresponding paths between the target entity pair that satisfy the rule. The corresponding path starts from the start entity (e.g., h) of the target entity pair and ends at the end entity (e.g., t) of the target entity pair, and the logic of the relationships between the entity pairs traversed by the path satisfies the rule.

For example, with reference to the example of FIG. 2, for the rule "is a member of the royal family" ← "is a spouse" ∧ "is a sibling" ∧ "is a member of the royal family", one path between the target entity pair ("Kate", "UK") can be determined: "Kate" "is a spouse" "William" "is a sibling" "Harry" "is a member of the royal family" "UK". This path supports "Kate" "is a member of the royal family" "UK".
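The path determination described above can be sketched as a simple hop-by-hop search. This is a minimal illustration only: the function name `find_paths`, the toy edge list, and the relation labels such as `spouse_of` are hypothetical, and the disclosure does not prescribe any particular search method.

```python
from collections import defaultdict


def find_paths(edges, rule_body, head, tail):
    """Return every entity path from head to tail whose hops follow rule_body in order.

    edges: iterable of (subject, relation, object) triples (hypothetical toy graph).
    rule_body: list of relation names [r_1, ..., r_l] (the right-hand side of a rule).
    """
    # Index outgoing edges by (subject, relation) for quick lookup.
    outgoing = defaultdict(list)
    for subj, rel, obj in edges:
        outgoing[(subj, rel)].append(obj)

    paths = [[head]]
    for relation in rule_body:
        # Extend every partial path by one hop labelled with the current relation.
        paths = [p + [nxt] for p in paths for nxt in outgoing[(p[-1], relation)]]

    # Keep only the paths that end at the tail entity.
    return [p for p in paths if p[-1] == tail]


# Toy relations mirroring the example of FIG. 2 (labels are assumptions).
edges = [
    ("Kate", "spouse_of", "William"),
    ("William", "sibling_of", "Harry"),
    ("Harry", "royalty_of", "UK"),
]
rule_body = ["spouse_of", "sibling_of", "royalty_of"]
print(find_paths(edges, rule_body, "Kate", "UK"))
```

In a real system the edges would not be given as hard facts; each hop would carry a confidence produced by the backbone model, which is why the score is computed from the traversed entity pairs and relations rather than from path existence alone.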

In some implementations, relationship extractor 230 may determine, based on the paths determined for each rule, a score that indicates whether the target relationship is valid for the target entity pair. The score may be determined based on the entity pairs traversed by the path and the associated relationships. Details of determining the score will be described in detail with reference to FIG. 3 .

Using rule generator 220 and relation extractor 230, a set of rules may be generated for a document 120 and a given triple 210, and based on the rules determine a score indicating whether a target relationship is valid for a target entity pair. In this way, the interaction between entities and relations can be exploited to explicitly describe the long-range dependencies of relations, thus improving the accuracy and interpretability of relation extraction.

An example process 200 of relation extraction is described above with reference to FIG. 2 . It should be understood that the process shown in FIG. 2 is only illustrative and not intended to limit the scope of the present disclosure. For example, the relationship extraction model 130 may also include other units such as pre-processing and post-processing units to implement some embodiments of the present disclosure. For another example, the relation extraction model 130 may receive multiple given triples 210 and separately determine whether the target relation in each given triple 210 is valid for the target entity pair.

The parameterization and training process of the relation extraction model 130 will be described in detail below with reference to FIGS. 3 to 4 . According to the solution of the present disclosure, the relationship extraction model 130 can be represented by a probability model, and a set of rules z can be used as hidden variables in the probability model.

The task of relation extraction can be defined as follows: given a document $D$ and a given triple $q$, determine the probability distribution of the score $y$, where $y$ can be a binary random variable whose value indicates whether the given triple holds. For example, $y \in \{-1, 1\}$ can be set, with $y = 1$ indicating that the target relation $r$ in the given triple $q$ is valid for the target entity pair $(h, t)$, and $y = -1$ indicating that the target relation $r$ in the given triple $q$ is invalid for the target entity pair $(h, t)$. It should be understood that the task of relation extraction can also be defined in other forms. For example, $y$ can be a ternary random variable whose value indicates that the target relation is positively correlated with the target entity pair, that the reverse relation of the target relation is positively correlated with the target entity pair, that the target relation is not related to the target entity pair, and so on.

The probability distribution $p_{w,\theta}(y \mid q)$ of the score $y$ is defined as:

$$p_{w,\theta}(y \mid q) = \sum_{z} p_w(y \mid q, z)\, p_\theta(z \mid q) \qquad (1)$$

where $p_\theta(z \mid q)$ represents the probability distribution of a set of rules determined by the rule generator 220 conditional on the given triple (and the document), $\theta$ represents the learnable parameters of the rule generator 220, $p_w(y \mid q, z)$ denotes the probability distribution of the score determined by the relation extractor 230 conditional on the given triple and the set of rules (and the document), and $w$ denotes the learnable parameters of the relation extractor 230. For simplicity, it is assumed that the distribution of documents is independent of the distribution of a set of rules, and the expression "conditional on the document" is omitted below.

FIG. 3 shows a flowchart of an example method 300 of training the relation extraction model 130 according to some embodiments of the present disclosure. The method 300 may be implemented, for example, at the computing device 110 of FIG. 1 .

At block 302, the computing device 110 determines, based on the target relationship and the target entity pair, the probability distribution $p_\theta(z \mid q)$ of a set of rules conditional on the given triple. In some implementations, it may be assumed that $p_\theta(z \mid q) \sim \mathrm{Multi}(z \mid N, \mathrm{AutoReg}_\theta(\mathrm{rule} \mid q))$. In other words, the rule generator 220 can generate a set of rules $z$ (including $N$ rules) that follows a multinomial distribution, with each of the $N$ rules following the probability distribution $\mathrm{AutoReg}_\theta(\mathrm{rule} \mid q)$. $\mathrm{AutoReg}_\theta$ defines the probability distribution of a rule conditional on the given triple $q$. Alternatively, $p_\theta(z \mid q)$ may be determined using other suitable methods. For example, the rule generator 220 may generate a set of rules $z$ whose rules follow another type of independent and identical distribution.
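The notation Multi(z | N, ·) can be read as drawing N rules independently from one per-rule distribution and recording how often each rule occurs. The sketch below illustrates that reading with a fixed toy table of rule probabilities standing in for the autoregressive generator; the function name `sample_rule_set`, the table `rule_probs`, and the relation labels are all hypothetical.

```python
import random
from collections import Counter


def sample_rule_set(rule_probs, n, seed=0):
    """Draw n rules i.i.d. from a categorical distribution and return their counts.

    Counting n i.i.d. categorical draws yields one multinomial sample,
    i.e. z ~ Multi(z | n, p) with p given by rule_probs.
    rule_probs stands in for AutoReg_theta(rule | q) and is an assumption.
    """
    rng = random.Random(seed)
    rules = list(rule_probs)
    weights = [rule_probs[rule] for rule in rules]
    draws = rng.choices(rules, weights=weights, k=n)
    return Counter(draws)


# Each rule body is a tuple of relation names (toy values, not from the source).
rule_probs = {
    ("spouse_of", "sibling_of", "royalty_of"): 0.6,
    ("spouse_of", "royalty_of"): 0.3,
    ("sibling_of", "royalty_of"): 0.1,
}
z = sample_rule_set(rule_probs, n=5)
print(z)
```

An actual rule generator would produce each rule token by token from a sequence model conditioned on the given triple, rather than looking it up in a table.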

At block 304, the computing device 110 determines a probability distribution $p_{w,\theta}(y \mid q)$ of the score conditional on the given triple, based at least on the probability distribution $p_\theta(z \mid q)$ of the set of rules conditional on the given triple.

In some implementations, the relation extractor 230 can determine at least one path between the target entity pair based on the determined set of rules. Based on the entity pairs traversed by the at least one path and the associated relations, the relation extractor 230 may determine a probability distribution $p_w(y \mid q, z)$ of the score conditional on the given triple and the set of rules. Based on $p_\theta(z \mid q)$ and $p_w(y \mid q, z)$, the relation extractor 230 may determine the probability distribution $p_{w,\theta}(y \mid q)$ of the score conditional on the given triple.

In some implementations, for each rule in the determined set of rules z, a corresponding path may be determined. The corresponding path is defined as starting from the start entity h in the target entity pair and ending with the end entity t in the target entity pair, and the logic of the connection between the entity pairs passed by the path satisfies the rule. It should be understood that a variety of methods can be used to determine the path satisfying the above definition, and the scope of the present disclosure is not limited in this respect.

In some implementations, $p_w(y \mid q, z)$ can be defined according to the following equation:

$$p_w(y \mid q, z) = \mathrm{Sigmoid}(y \cdot \mathrm{score}_w(q, z)) \qquad (2)$$

Here, $\phi_w(q)$ and $\phi_w(q, \mathrm{rule})$ are learnable scalar parameters, and $\phi_w(\mathrm{rule})$ represents the reachability of the path from the start entity to the end entity in the target entity pair following the rule.

The path set for a rule is the collection of at least one path between the target entity pair determined based on that rule. $\phi_w(e_{i-1}, r_i, e_i)$ represents the confidence that the relation $r_i$ is valid for the entity pair $(e_{i-1}, e_i)$. $\phi_w(e_{i-1}, r_i, e_i)$ can be obtained using any suitable relation extraction method. For example, $\phi_w(e_{i-1}, r_i, e_i)$ can be obtained by using the backbone model of the relation extractor 230.

It should be understood that the above formulas (2)-(6) are only exemplary, and other suitable methods can be used to define $p_w(y \mid q, z)$. For example, other fuzzy logic functions can be used to transform $\mathrm{score}_w(q, z)$ into $p_w(y \mid q, z)$.

Additionally, it should be noted that the prediction score for a given triple 210 can be calculated using equation (3) during the inference phase. In some implementations, the prediction score $\mathrm{score}_w(q, z)$ is a continuous value around 0, and the larger the value, the more likely the given triple holds, that is, the more likely the target relationship is valid for the target entity pair.
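One way the quantities named above could be combined is sketched here. Only the Sigmoid link of formula (2) is taken directly from the text; the aggregation choices (a path's confidence as the product of its hop confidences, a rule's reachability as the maximum over its paths, and the score as a bias plus a weighted sum of reachabilities) are assumptions made for illustration, as are all function and variable names.

```python
import math


def path_confidence(hop_confidences):
    """Assumed aggregation: multiply the per-hop confidences phi_w(e_{i-1}, r_i, e_i)."""
    product = 1.0
    for confidence in hop_confidences:
        product *= confidence
    return product


def rule_reachability(paths):
    """Assumed aggregation: best path confidence, or 0.0 when no path follows the rule."""
    return max((path_confidence(p) for p in paths), default=0.0)


def score(bias, rule_weights, rule_paths):
    """Assumed form: bias phi_w(q) plus sum of phi_w(q, rule) * phi_w(rule)."""
    return bias + sum(
        rule_weights[rule] * rule_reachability(rule_paths[rule]) for rule in rule_weights
    )


def prob_of_label(y, s):
    """Formula (2): p_w(y | q, z) = Sigmoid(y * score), with y in {-1, 1}."""
    return 1.0 / (1.0 + math.exp(-y * s))


# Toy numbers: one rule with two candidate paths of three hops each.
rule_weights = {"rule_a": 2.0}
rule_paths = {"rule_a": [[0.9, 0.8, 0.5], [0.5, 0.5, 0.5]]}
s = score(0.0, rule_weights, rule_paths)
print(s, prob_of_label(1, s), prob_of_label(-1, s))
```

Because Sigmoid(s) + Sigmoid(-s) = 1, the two label probabilities always sum to one, which is what makes the {-1, 1} encoding of the score convenient.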

At block 306, the computing device 110, based on the label value $y^*$ corresponding to the score $y$, maximizes the likelihood function of the parameters of the probability distribution $p_{w,\theta}(y \mid q)$ of the score conditional on the given triple, to obtain the trained relation extraction model 130. The label value $y^*$ refers to the human-annotated ground-truth value indicating whether the given triple holds, that is, whether the target relation is valid for the target entity pair. In some implementations, by maximizing the likelihood function of the parameters of the probability distribution $p_{w,\theta}(y \mid q)$, the parameters $w$ and $\theta$ can be estimated to obtain the trained relation extraction model 130.

In some implementations, the likelihood function can be maximized by iteratively updating the parameters $w$ and $\theta$ and the latent variable $z$. Based on the current values of the parameters $w$ and $\theta$, the posterior probability distribution of the latent variable $z$ can be determined. Then, based on the posterior probability distribution of the latent variable $z$, the updated values of the parameters $w$ and $\theta$ can be determined by maximizing the likelihood function. By iterating in this way until convergence, the parameters $w$ and $\theta$ and the latent variable $z$ can be estimated.

For example, the parameters w and θ and the latent variable z can be iteratively updated using an Expectation-Maximization (EM) algorithm. In the expectation (E) step, the expectation of the hidden variable z can be determined based on the current values of the parameters w and θ, that is, the posterior probability distribution of the hidden variable z. In the maximization (M) step, updated values of the parameters w and θ may be determined by maximizing the likelihood function. Alternatively or additionally, an approximate posterior method can be used to determine the parameters w and θ and the latent variable z.

In some implementations, an approximate posterior probability distribution of the hidden variable z may be determined instead of an exact posterior probability distribution of the hidden variable z, thereby simplifying the optimization process of the parameters w and θ and the hidden variable z. In some implementations, the parameters w and θ can be determined by maximizing the lower bound of the likelihood function, thereby further simplifying the optimization process of the parameters w and θ and the latent variable z.

In some examples, as shown in formula (7), the approximate posterior probability distribution $q(z)$ of the hidden variable $z$ can be used in place of the exact posterior probability distribution $p(z \mid y, q)$ of the hidden variable $z$, and the likelihood function can be maximized by maximizing its lower bound.

In some implementations, a suitable approximate posterior probability distribution $q(z)$ for the latent variable $z$ may be determined such that $\mathrm{KL}(q(z) \,\|\, p_{w,\theta}(z \mid q, y)) \ge 0$ is satisfied. The approximate posterior probability distribution can be determined by performing a Taylor expansion or a variational approximation on the posterior probability distribution.
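The relationship between the likelihood, the lower bound, and the KL term can be written out with the standard variational decomposition. The identity below is the textbook form, sketched under the assumption that the joint distribution factorizes as $p_w(y \mid q, z)\, p_\theta(z \mid q)$; it is not claimed to be the exact expression used in formula (7).

```latex
\begin{aligned}
\log p_{w,\theta}(y \mid q)
 &= \mathbb{E}_{q(z)}\!\left[\log \frac{p_w(y \mid q, z)\, p_\theta(z \mid q)}{q(z)}\right]
  + \mathrm{KL}\!\left(q(z) \,\|\, p_{w,\theta}(z \mid q, y)\right) \\
 &\ge \mathbb{E}_{q(z)}\!\left[\log \frac{p_w(y \mid q, z)\, p_\theta(z \mid q)}{q(z)}\right]
\end{aligned}
```

Because the KL term is non-negative, the expectation on the right is a lower bound on the log-likelihood, and the bound becomes tight when $q(z)$ equals the exact posterior.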

In some implementations, the scoring function for each rule in the set of rules can be determined based on the probability distribution of the set of rules conditional on the given triple (i.e., the prior probability distribution of the rules), the entity pairs and associated relations traversed by the determined at least one path, and the label value. The scoring function estimates the quality of each rule. For example, the scoring function $H(\mathrm{rule})$ of each rule can be determined with reference to formula (8).

Based on the scoring function for each rule, a posterior probability distribution for the corresponding rule can be determined. For example, the posterior probability distribution of the corresponding rule can be determined with reference to formula (9).

Based on the posterior probability distribution for each rule and the number of rules in the set, an approximate posterior probability distribution $q(z)$ for the set of rules can be determined. For example, $q(z)$ can follow a multinomial distribution parameterized by the number $N$ of rules and the posterior probability distribution of each rule.

It should be understood that the above formulas (8)-(9) are only exemplary, and other suitable methods can be used to determine the approximate posterior probability distribution q(z) of the hidden variable z.
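The step from per-rule scores to an approximate posterior over a set of rules can be illustrated with a small sketch. It assumes that the per-rule posterior is obtained by normalizing exp(H(rule)) (a softmax) and that q(z) is formed by drawing N rules from that per-rule posterior; both are assumptions made for illustration and are not necessarily the form of formulas (8) and (9). The rule names and score values are hypothetical.

```python
import math
import random
from collections import Counter

def rule_posterior(scores):
    """Normalize per-rule scores H(rule) into a distribution (softmax; an assumption)."""
    m = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {rule: math.exp(h - m) for rule, h in scores.items()}
    total = sum(exp_scores.values())
    return {rule: e / total for rule, e in exp_scores.items()}

def sample_rule_set(posterior, n_rules, rng):
    """Draw N rules independently from the per-rule posterior (one multinomial draw)."""
    rules = list(posterior)
    weights = [posterior[r] for r in rules]
    return Counter(rng.choices(rules, weights=weights, k=n_rules))

# Hypothetical scores for three candidate rules, each a sequence of relations.
scores = {
    ("born_in", "city_of"): 2.0,
    ("works_for", "based_in"): 1.0,
    ("sibling_of", "born_in"): 0.0,
}
posterior = rule_posterior(scores)
z = sample_rule_set(posterior, n_rules=5, rng=random.Random(0))
```

The counter `z` records how many times each rule was drawn, which is one way to represent a set of N rules as a single hidden variable.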

In some implementations, the lower bound can be maximized by maximizing a corresponding objective, where the two component terms of that objective are those for the rule generator 220 and the relation extractor 230, respectively. In some implementations, the objective can also be converted into an equivalent form.

In the case that q(z) has been determined, conventional parameter estimation methods can be used to determine updated values of parameters w and θ. For example, the method of gradient descent can be used to determine the updated values of parameters w and θ.

The process of iteratively updating the parameters w and θ and the latent variable z will be described in detail below with reference to FIG. 4. FIG. 4 shows a flowchart of an example method 400 of an optimization process according to some embodiments of the present disclosure. Method 400 may be implemented at computing device 110 shown in FIG. 1. It should be understood that the optimization process shown in FIG. 4 is exemplary only, and the scope of the present disclosure is not limited in this respect.

As shown in FIG. 4, at block 402, computing device 110 may utilize rule generator 220 to generate a set of rules. Based on the initial parameter θ or the updated current parameter θ, the rule generator 220 can generate a set of rules satisfying p_θ(z|q) ~ Multi(z | N, AutoReg_θ(rule|q)).
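Rule generation at block 402 can be sketched as follows. This is a minimal illustration rather than the rule generator 220 itself: the autoregressive model is simplified to a fixed table that conditions only on the previous relation, whereas a learned generator would also condition on the given triplet q and on the parameter θ. The table `NEXT_RELATION`, the `END` marker, and the relation names are all hypothetical.

```python
import random
from collections import Counter

END = "<end>"  # hypothetical end-of-rule marker

# Hypothetical next-relation probabilities, conditioned on the previous relation only.
NEXT_RELATION = {
    None: {"born_in": 0.6, "works_for": 0.4},
    "born_in": {"city_of": 0.7, END: 0.3},
    "works_for": {"based_in": 0.8, END: 0.2},
    "city_of": {END: 1.0},
    "based_in": {END: 1.0},
}

def sample_rule(rng, max_len=3):
    """Sample one rule relation by relation until END is drawn or max_len is reached."""
    rule, prev = [], None
    while len(rule) < max_len:
        options = NEXT_RELATION[prev]
        nxt = rng.choices(list(options), weights=list(options.values()), k=1)[0]
        if nxt == END:
            break
        rule.append(nxt)
        prev = nxt
    return tuple(rule)

def generate_rule_set(n_rules, rng):
    """Draw N rules independently; the resulting counts form one multinomial sample z."""
    return Counter(sample_rule(rng) for _ in range(n_rules))

z = generate_rule_set(n_rules=10, rng=random.Random(0))
```

Drawing N rules independently from the same per-rule distribution and keeping only the counts is what makes the set of rules a multinomial sample.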

At block 404, computing device 110 may compute a score function for each rule in the set of rules, thereby determining a posterior probability distribution for each rule.

The relation extractor 230 may determine the scoring function H(rule) for each rule in the set of rules based on the probability distribution of the set of rules conditioned on the given triplet (i.e., the prior probability distribution of the rules), the entity pairs traversed by the at least one path and the associated relationships, and the label value. Based on the score function H(rule) of each rule, the relation extractor 230 can determine the posterior probability distribution of the corresponding rule.

At block 406, computing device 110 may update the corresponding AutoReg_θ(rule|q) based on a first set of update rules sampled from the posterior probability distribution of each rule. In some implementations, computing device 110 may determine the updated value of the parameter θ by maximizing the corresponding objective, thereby updating AutoReg_θ(rule|q), that is, updating the probability distribution p_θ(z|q) of the set of rules conditioned on the given triplet.

At block 408, computing device 110 may update the probability distribution p_w(y|q, z) of scores conditioned on the given triplet and the set of rules, based on a second set of update rules sampled from the updated probability distribution p_θ(z|q) of the set of rules conditioned on the given triplet. In some implementations, the rule generator 220 may generate a second set of update rules satisfying p_θ(z|q) ~ Multi(z | N, AutoReg_θ(rule|q)) based on the updated current parameter θ. Based on the second set of update rules, computing device 110 may determine the updated value of the parameter w by maximizing the corresponding objective, thereby updating the probability distribution p_w(y|q, z) of scores conditioned on the given triplet and the set of rules.
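The alternation of blocks 404 to 408 can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: a categorical distribution over three fixed candidate rules stands in for AutoReg_θ, a one-weight-per-rule logistic scorer stands in for the relation extractor 230, the per-rule posterior is taken as proportional to prior times likelihood, and the names `RULES`, `DATA`, and `em_step` are hypothetical.

```python
import math
import random
from collections import Counter

RULES = ["r_a", "r_b", "r_c"]  # hypothetical candidate rules

# Hypothetical training data: (label y, {rule: 1 if a path satisfying the rule exists}).
DATA = [
    (1, {"r_a": 1, "r_b": 0, "r_c": 1}),
    (1, {"r_a": 1, "r_b": 1, "r_c": 0}),
    (0, {"r_a": 0, "r_b": 1, "r_c": 0}),
    (0, {"r_a": 0, "r_b": 0, "r_c": 1}),
]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_lik(w, rule):
    """Sum over examples of log p_w(y | q, rule) for the toy logistic scorer."""
    total = 0.0
    for y, fires in DATA:
        p = sigmoid(w[rule] * (2 * fires[rule] - 1))
        total += math.log(p if y == 1 else 1.0 - p)
    return total

def sample(dist, n, rng):
    return Counter(rng.choices(RULES, weights=[dist[r] for r in RULES], k=n))

def em_step(prior, w, n, rng, lr=0.5, alpha=1.0):
    # Block 404: score each rule and normalize into a per-rule posterior.
    h = {r: math.log(prior[r]) + log_lik(w, r) for r in RULES}
    m = max(h.values())
    norm = sum(math.exp(v - m) for v in h.values())
    post = {r: math.exp(h[r] - m) / norm for r in RULES}
    # Block 406: refit the rule prior on rules sampled from the posterior
    # (smoothed frequencies, the closed-form fit for a categorical distribution).
    first = sample(post, n, rng)
    prior = {r: (first[r] + alpha) / (n + alpha * len(RULES)) for r in RULES}
    # Block 408: one gradient-ascent step on w using rules sampled from the updated prior.
    second = sample(prior, n, rng)
    for r, count in second.items():
        grad = sum((y - sigmoid(w[r] * (2 * f[r] - 1))) * (2 * f[r] - 1) for y, f in DATA)
        w[r] += lr * count / n * grad
    return prior, w, post

rng = random.Random(0)
prior = {r: 1.0 / len(RULES) for r in RULES}
w = {r: 0.0 for r in RULES}
for _ in range(30):
    prior, w, post = em_step(prior, w, n=20, rng=rng)
```

In this toy data only `r_a` agrees with every label, so its weight grows and the rule prior gradually concentrates on it, which mirrors how rules are learned as hidden variables while the model parameters are optimized.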

The above describes the relationship extraction method and the construction and training process of the relationship extraction model 130 according to some embodiments of the present disclosure with reference to FIGS. 1 to 4 .

In this way, long-range dependencies of relations can be easily captured and better interpretability provided by utilizing rules for logical reasoning. In addition, by iteratively optimizing the parameters and hidden variables of the probability model, the rules as hidden variables can be automatically learned while optimizing the model parameters, so that the relationship in the document can be extracted based on the rules generated for the document to obtain better relation extraction performance. Furthermore, the conventional relation extraction model can be easily modified to implement some functions according to the embodiments of the present disclosure, so this solution has high portability.

It should be understood that the relation extraction model 130 according to some embodiments of the present disclosure may also be trained in other suitable ways.

Embodiments of the present disclosure also provide corresponding devices for implementing the above method or process. Fig. 5 shows a schematic structural block diagram of an apparatus 500 for relation extraction according to some embodiments of the present disclosure.

As shown in FIG. 5, the apparatus 500 may include a rule generation module 510 configured to generate, based on a given triplet consisting of a target entity pair in a document and a target relationship associated with the target entity pair, a set of rules describing the logic of the relationship between the target entity pair, the target relationship being selected from a set of relationships used to describe the relationships between entity pairs in the document. In addition, the apparatus 500 further includes a path determination module 520 configured to determine at least one path between the target entity pair based on the set of rules. The apparatus 500 further includes a score determination module 530 configured to determine, based on the entity pairs traversed by the at least one path and the associated relationships, a score indicating whether the target relationship is valid for the target entity pair in the document.

In some embodiments, the path determination module 520 further includes a path exploration module configured to determine, for each rule in the set of rules, a corresponding path, the path starting from the start entity in the target entity pair and ending at the end entity in the target entity pair, where the logic of the relationships between the entity pairs traversed by the path satisfies the rule.
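Path exploration for a single rule can be sketched as a depth-first walk over the relations already known in the document. The sketch assumes that the document's entity pairs and relationships are available as (head, relation, tail) triples and that a rule is a sequence of relations; the function names and the example facts are hypothetical.

```python
from collections import defaultdict

def build_graph(triples):
    """Index document-level facts as graph[head][relation] -> set of tails."""
    graph = defaultdict(lambda: defaultdict(set))
    for head, relation, tail in triples:
        graph[head][relation].add(tail)
    return graph

def find_path(graph, start, end, rule):
    """Return one entity path from start to end whose relations follow `rule`, or None."""
    def walk(entity, step, path):
        if step == len(rule):
            return path if entity == end else None
        # Follow only edges labeled with the relation the rule requires at this step.
        for nxt in sorted(graph.get(entity, {}).get(rule[step], ())):
            found = walk(nxt, step + 1, path + [nxt])
            if found is not None:
                return found
        return None
    return walk(start, 0, [start])

triples = [
    ("Alice", "born_in", "Paris"),
    ("Paris", "city_of", "France"),
    ("Alice", "works_for", "Acme"),
]
graph = build_graph(triples)
path = find_path(graph, "Alice", "France", ("born_in", "city_of"))
```

A rule for which no such path exists simply yields `None`, so it contributes no evidence for the target entity pair.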

Embodiments of the present disclosure also provide an apparatus for training a relation extraction model. The apparatus may include a rule probability determination module configured to determine, based on a given triplet consisting of a target entity pair in the document and a target relationship associated with the target entity pair, a probability distribution of a set of rules conditioned on the given triplet, the target relationship being selected from a set of relationships used to describe the relationships between entity pairs in the document, and the set of rules being used to describe the logic of the relationship between the target entity pair. The apparatus also includes a score probability determination module configured to determine, based on the probability distribution of the set of rules conditioned on the given triplet, a probability distribution of scores conditioned on the given triplet, a score indicating whether the target relationship is valid for the target entity pair in the document. The apparatus also includes an optimization module configured to obtain the trained relation extraction model by maximizing, based on label values corresponding to the scores, a likelihood function of a parameter of the probability distribution of scores conditioned on the given triplet.

In some embodiments, the score probability determination module includes a path finding module configured to determine at least one path between the target entity pair based on the set of rules. The score probability determination module further includes a first probability determination module configured to determine the probability distribution of scores conditioned on the given triplet and the set of rules based on the entity pairs traversed by the at least one path and the associated relationships. The score probability determination module also includes a second probability determination module configured to determine the probability distribution of scores conditioned on the given triplet based on the probability distribution of the set of rules conditioned on the given triplet and the probability distribution of scores conditioned on the given triplet and the set of rules.
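One natural way to combine the two distributions handled by the second probability determination module is to marginalize over the set of rules. The expression below is a sketch under that assumption, using the symbols p_θ(z|q) and p_w(y|q, z) from this description, with the sum taken over the possible sets of rules z.

```latex
p_{w,\theta}(y \mid q) \;=\; \sum_{z} p_w(y \mid q, z)\, p_\theta(z \mid q)
```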

In some embodiments, the optimization module includes a posterior probability determination module configured to determine a posterior probability distribution for the set of rules based on current values of the parameters. The optimization module also includes a likelihood function maximization module configured to determine an updated value for the parameter by maximizing the likelihood function based on the posterior probability distribution of the set of rules.

In some embodiments, the posterior probability determination module includes a score function determination module configured to determine a scoring function for each rule in the set of rules based on the probability distribution of the set of rules conditioned on the given triplet, the entity pairs traversed by the at least one path and the associated relationships, and the label value. The posterior probability determination module also includes a first posterior probability determination module configured to determine a posterior probability distribution for each rule based on the scoring function for each rule. The posterior probability determination module also includes a second posterior probability determination module configured to determine, based on the posterior probability distribution of each rule and the number of rules in the set of rules, an approximate posterior probability distribution of the set of rules as the posterior probability distribution of the set of rules.

In some embodiments, the likelihood function maximization module includes a lower bound maximization module configured to maximize the lower bound of the likelihood function, the lower bound of the likelihood function being associated with an approximate posterior probability distribution of the set of rules.

In some embodiments, the lower bound maximization module includes a first sampling module configured to sample a first set of update rules based on the approximate posterior probability distribution of the set of rules. The lower bound maximization module also includes a first update module configured to update the probability distribution of the set of rules conditioned on the given triplet based on the first set of update rules. The lower bound maximization module also includes a second sampling module configured to sample a second set of update rules based on the updated probability distribution of the set of rules conditioned on the given triplet. The lower bound maximization module also includes a second update module configured to update the probability distribution of scores conditioned on the given triplet and the set of rules based on the second set of update rules.

In some embodiments, each rule in the set of rules is represented by a sequence of a plurality of relationships in the set of relationships.

In some embodiments, the optimization module includes an expectation maximization module configured to perform maximum likelihood estimation of the parameters using an expectation maximization algorithm.

Units or modules included in the apparatus 500 for relation extraction and the apparatus for training a relation extraction model may be implemented in various ways, including software, hardware, firmware or any combination thereof. Taking apparatus 500 as an example, in some embodiments, one or more units may be implemented using software and/or firmware, such as machine-executable instructions stored on a storage medium. In addition to or instead of machine-executable instructions, some or all of the units in apparatus 500 may be at least partially implemented by one or more hardware logic components. Exemplary types of hardware logic components that may be used include, by way of example and not limitation, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SoCs), Complex Programmable Logic Devices (CPLDs), and so on.

FIG. 6 shows a block diagram of a computing device/server 600 in which one or more embodiments of the present disclosure may be implemented. It should be understood that the computing device/server 600 shown in FIG. 6 is exemplary only and should not constitute any limitation on the functionality and scope of the embodiments described herein.

As shown in FIG. 6, computing device/server 600 is in the form of a general purpose computing device. Components of computing device/server 600 may include, but are not limited to, one or more processors or processing units 610, memory 620, storage devices 630, one or more communication units 640, one or more input devices 650, and one or more output devices 660. The processing unit 610 may be an actual or virtual processor and is capable of performing various processes according to programs stored in the memory 620. In a multi-processor system, multiple processing units execute computer-executable instructions in parallel to increase the parallel processing capability of the computing device/server 600.

Computing device/server 600 typically includes multiple computer storage media. Such media can be any available media that is accessible to computing device/server 600, including but not limited to volatile and nonvolatile media, and removable and non-removable media. Memory 620 can be volatile memory (e.g., registers, cache, random access memory (RAM)), nonvolatile memory (e.g., read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory), or some combination thereof. Storage device 630 may be removable or non-removable media, and may include machine-readable media, such as flash drives, magnetic disks, or any other media that can be used to store information and/or data (e.g., training data for training) and that can be accessed within computing device/server 600.

Computing device/server 600 may further include additional removable/non-removable, volatile/nonvolatile storage media. Although not shown in FIG. 6, a magnetic disk drive for reading from or writing to a removable, nonvolatile magnetic disk (such as a "floppy disk") and an optical disc drive (such as a CD-ROM drive) for reading from or writing to a removable, nonvolatile optical disc may be provided. In these cases, each drive may be connected to the bus (not shown) by one or more data media interfaces. Memory 620 may include a computer program product 626 having one or more program modules configured to perform the various methods or actions of the various embodiments of the present disclosure.

The communication unit 640 enables communication with other computing devices through the communication medium. Additionally, the functionality of the components of computing device/server 600 may be implemented in a single computing cluster or as a plurality of computing machines capable of communicating via communication links. Accordingly, computing device/server 600 may operate in a networked environment using logical connections to one or more other servers, a network personal computer (PC), or another network node.

The input device 650 may be one or more input devices, such as a mouse, keyboard, trackball, and the like. Output device 660 may be one or more output devices, such as a display, speakers, printer, or the like. As needed, the computing device/server 600 can also communicate, through the communication unit 640, with one or more external devices (not shown) such as storage devices and display devices, with one or more devices that enable a user to interact with the computing device/server 600, or with any device (e.g., network card, modem, etc.) that enables computing device/server 600 to communicate with one or more other computing devices. Such communication may be performed via an input/output (I/O) interface (not shown).

According to an exemplary implementation of the present disclosure, there is provided a computer-readable storage medium on which one or more computer instructions are stored, wherein the one or more computer instructions are executed by a processor to implement the method described above.

Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products implemented according to the disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processing unit of the computer or other programmable data processing apparatus, produce an apparatus for implementing the functions/actions specified in one or more blocks in the flowchart and/or block diagram. These computer-readable program instructions can also be stored in a computer-readable storage medium, and these instructions cause computers, programmable data processing devices and/or other devices to work in a specific way, so that the computer-readable medium storing the instructions includes an article of manufacture comprising instructions for implementing various aspects of the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams.

It is also possible to load computer-readable program instructions into a computer, other programmable data processing device, or other equipment, so that a series of operation steps are performed on the computer, other programmable data processing device, or other equipment to produce a computer-implemented process, so that instructions executed on computers, other programmable data processing devices, or other devices implement the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various implementations of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions that contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified function or action, or may be implemented by a combination of dedicated hardware and computer instructions.

Various implementations of the present disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed implementations. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described implementations. The terminology used herein was chosen to best explain the principles of each implementation, the practical application, or the technical improvement over technologies in the market, or to enable others of ordinary skill in the art to understand the implementations disclosed herein.

Claims

A method of training a relation extraction model, comprising:

determining, based on a given triplet consisting of a target entity pair in a document and a target relationship associated with the target entity pair, a probability distribution of a set of rules conditioned on the given triplet, the target relationship being selected from a set of relationships for describing the relationships between entity pairs in the document, and the set of rules being used for describing the logic of the relationship between the target entity pair;

determining, based on the probability distribution of the set of rules conditioned on the given triplet, a probability distribution of scores conditioned on the given triplet, a score indicating whether the target relationship is valid for the target entity pair in the document; and

obtaining the trained relation extraction model by maximizing, based on label values corresponding to the scores, a likelihood function of a parameter of the probability distribution of scores conditioned on the given triplet.
The method of claim 1, wherein determining a probability distribution of scores conditioned on a given triple comprises:

determining at least one path between the pair of target entities based on the set of rules;

determining said probability distribution of scores given triples and a set of rules based on pairs of entities traversed by said at least one path and associated relationships; and

determining the probability distribution of scores conditioned on the given triplet based on the probability distribution of the set of rules conditioned on the given triplet and the probability distribution of scores conditioned on the given triplet and the set of rules.
The method of claim 2, wherein maximizing the likelihood function of a parameter of the probability distribution of scores conditioned on the given triplet comprises:

determining a posterior probability distribution for the set of rules based on current values of the parameter; and

determining an updated value for the parameter by maximizing the likelihood function based on the posterior probability distribution of the set of rules.
The method of claim 3, wherein determining the posterior probability distribution of the set of rules comprises:

determining a scoring function for each rule in the set of rules based on the probability distribution of the set of rules conditioned on the given triplet, the entity pairs traversed by the at least one path and the associated relationships, and the label value;

determining a posterior probability distribution for each rule based on the scoring function for each rule; and

determining, based on the posterior probability distribution for each rule and the number of rules in the set of rules, an approximate posterior probability distribution for the set of rules as the posterior probability distribution for the set of rules.
The method of claim 4, wherein maximizing the likelihood function comprises:

maximizing a lower bound of the likelihood function, the lower bound of the likelihood function being associated with an approximate posterior probability distribution for the set of rules.
The method of claim 5, wherein maximizing the lower bound of the likelihood function comprises:

sampling a first set of update rules based on the approximate posterior probability distribution of the set of rules;

updating the probability distribution of the set of rules conditioned on the given triplet based on the first set of update rules;

sampling a second set of update rules based on the updated probability distribution of the set of rules conditioned on the given triplet; and

updating the probability distribution of scores conditioned on the given triplet and the set of rules based on the second set of update rules.
The method of claim 1, wherein each rule in the set of rules is represented by a sequence of a plurality of relationships in the set of relationships.
The method of claim 1, wherein maximizing the likelihood function of a parameter of the probability distribution of scores conditioned on the given triplet comprises:

performing maximum likelihood estimation of the parameter using an expectation-maximization algorithm.
A method for relation extraction comprising:

generating, based on a given triplet consisting of a target entity pair in a document and a target relationship associated with the target entity pair, a set of rules for describing the logic of the relationship between the target entity pair, the target relationship being selected from a set of relationships describing the relationships between entity pairs in the document;

determining at least one path between the target entity pair based on the set of rules; and

determining, based on the entity pairs traversed by the at least one path and the associated relationships, a score indicating whether the target relationship is valid for the target entity pair in the document.
The method of claim 9, wherein determining at least one path between the pair of target entities comprises:

for each rule in the set of rules, determining a corresponding path that begins with the start entity of the target entity pair and ends with the end entity of the target entity pair, wherein the logic of the relationships between the entity pairs traversed by the path satisfies the rule.
The method of claim 9, wherein each rule in the set of rules is represented by a sequence of a plurality of relationships in the set of relationships.
An apparatus for training a relation extraction model, comprising:

a rule probability determination module configured to determine, based on a given triplet consisting of a target entity pair in a document and a target relationship associated with the target entity pair, a probability distribution of a set of rules conditioned on the given triplet, the target relationship being selected from a set of relationships used to describe the relationships between entity pairs in the document, and the set of rules being used to describe the logic of the relationship between the target entity pair;

a score probability determination module configured to determine, based on the probability distribution of the set of rules conditioned on the given triplet, a probability distribution of scores conditioned on the given triplet, a score indicating whether the target relationship is valid for the target entity pair in the document; and

an optimization module configured to obtain the trained relation extraction model by maximizing, based on label values corresponding to the scores, a likelihood function of a parameter of the probability distribution of scores conditioned on the given triplet.
The apparatus of claim 12, wherein the score probability determination module comprises:

a path finding module configured to determine at least one path between the target entity pair based on the set of rules;

a first probability determination module configured to determine the probability distribution of scores conditioned on the given triplet and the set of rules based on the entity pairs traversed by the at least one path and the associated relationships; and

a second probability determination module configured to determine the probability distribution of scores conditioned on the given triplet based on the probability distribution of the set of rules conditioned on the given triplet and the probability distribution of scores conditioned on the given triplet and the set of rules.
The apparatus of claim 13, wherein said optimization module comprises:

a posterior probability determination module configured to determine a posterior probability distribution for the set of rules based on current values of the parameters; and

a likelihood function maximization module configured to determine an updated value of the parameter by maximizing the likelihood function based on the posterior probability distribution of the set of rules.
The apparatus according to claim 14, wherein said posterior probability determination module comprises:

a scoring function determination module configured to determine a scoring function for each rule in the set of rules based on the probability distribution of the set of rules conditioned on the given triplet, the entity pairs traversed by the at least one path and the associated relationships, and the label value;

a first posterior probability determination module configured to determine a posterior probability distribution for each rule based on the scoring function for each rule; and

a second posterior probability determination module configured to determine, based on the posterior probability distribution of each rule and the number of rules in the set of rules, an approximate posterior probability distribution of the set of rules as the posterior probability distribution of the set of rules.
The apparatus according to claim 15, wherein said likelihood function maximization module comprises:

a lower bound maximization module configured to maximize a lower bound of the likelihood function, the lower bound being associated with an approximate posterior probability distribution of the set of rules.
An apparatus for relation extraction, comprising:

a rule generation module configured to generate, based on a given triplet consisting of a target entity pair in a document and a target relationship associated with the target entity pair, a set of rules for describing the logic of the relationship between the target entity pair, the target relationship being selected from a set of relationships describing the relationships between entity pairs in the document;

a path determination module configured to determine at least one path between the target entity pair based on the set of rules; and

a score determination module configured to determine, based on the entity pairs traversed by the at least one path and the associated relationships, a score indicating whether the target relationship is valid for the target entity pair in the document.
The apparatus according to claim 17, wherein said path determination module comprises:

a path exploration module configured to, for each rule in the set of rules, determine a corresponding path that begins at the start entity in the target entity pair and ends at the end entity in the target entity pair, wherein the logic of the relationships between the entity pairs traversed by the path satisfies the rule.
An electronic device comprising:

a memory and a processor;

wherein the memory is used to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the method according to any one of claims 1-11.
A computer-readable storage medium having one or more computer instructions stored thereon, wherein the one or more computer instructions are executed by a processor to implement the method according to any one of claims 1-11.