WO2023051142A1 - Method and apparatus for relationship extraction, device and medium - Google Patents

Method and apparatus for relationship extraction, device and medium

Info

Publication number
WO2023051142A1
WO2023051142A1 PCT/CN2022/116286 CN2022116286W WO2023051142A1 WO 2023051142 A1 WO2023051142 A1 WO 2023051142A1 CN 2022116286 W CN2022116286 W CN 2022116286W WO 2023051142 A1 WO2023051142 A1 WO 2023051142A1
Authority
WO
WIPO (PCT)
Prior art keywords
rules
probability distribution
target
relationship
rule
Prior art date
Application number
PCT/CN2022/116286
Other languages
French (fr)
Chinese (zh)
Inventor
孙长志
茹栋宇
Original Assignee
北京有竹居网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司 filed Critical 北京有竹居网络技术有限公司
Publication of WO2023051142A1 publication Critical patent/WO2023051142A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models

Definitions

  • Various implementations of the present disclosure relate to the computer field, and more specifically to a method, apparatus, device, and computer storage medium for relation extraction.
  • Document-level relation extraction can be applied to fields such as question answering and search.
  • sequence-based models or graph-based models can be leveraged to account for longer contexts and long-range dependencies of relationships in documents.
  • the representation of long-range relationships can be computed through pooling operations, or the nodes in the graph can be used to represent distant entities in documents, so as to better characterize the long-range relationships between entities.
  • a method for training a relation extraction model comprises: based on a given triplet consisting of a target entity pair in a document and a target relationship associated with the target entity pair, determining a probability distribution of a set of rules conditioned on the given triplet, the target relationship being selected from a set of relationships used to describe relationships between entity pairs in the document, and the set of rules being used to describe the logic of the relationship between the target entity pair; based on the probability distribution of the set of rules conditioned on the given triplet, determining a probability distribution, conditioned on the given triplet, of a score indicating whether the target relationship is valid for the target entity pair in the document; and based on a label value corresponding to the score, obtaining the trained relation extraction model by maximizing a likelihood function of the parameters of the probability distribution of the score conditioned on the given triplet.
  • an apparatus for training a relation extraction model includes: a rule probability determination module configured to determine, based on a given triplet consisting of a target entity pair in a document and a target relationship associated with the target entity pair, a probability distribution of a set of rules conditioned on the given triplet, the target relationship being selected from a set of relationships used to describe relationships between entity pairs in the document, and the set of rules being used to describe the logic of the relationship between the target entity pair; a score probability determination module configured to determine, based on the probability distribution of the set of rules conditioned on the given triplet, a probability distribution of a score conditioned on the given triplet, the score indicating whether the target relationship is valid for the target entity pair in the document; and an optimization module configured to obtain the trained relation extraction model by maximizing, based on a label value corresponding to the score, a likelihood function of the parameters of the probability distribution of the score conditioned on the given triplet.
  • an electronic device including: a memory and a processor; wherein the memory is used to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method of the first aspect.
  • a computer-readable storage medium on which one or more computer instructions are stored, wherein one or more computer instructions are executed by a processor to implement the method according to the first aspect of the present disclosure .
  • a method for relation extraction comprises: based on a given triplet consisting of a target entity pair in a document and a target relationship associated with the target entity pair, generating a set of rules for describing the logic of the relationship between the target entity pair, the target relationship being selected from a set of relationships used to describe relationships between entity pairs in the document; based on the set of rules, determining at least one path between the target entity pair; and based on the entity pairs traversed by the at least one path and the associated relationships, determining a score indicating whether the target relationship is valid for the target entity pair in the document.
  • an apparatus for relation extraction includes: a rule generation module configured to generate, based on a given triplet consisting of a target entity pair in a document and a target relationship associated with the target entity pair, a set of rules for describing the logic of the relationship between the target entity pair, the target relationship being selected from a set of relationships used to describe relationships between entity pairs in the document; a path determination module configured to determine at least one path between the target entity pair based on the set of rules; and a score determination module configured to determine, based on the entity pairs traversed by the at least one path and the associated relationships, a score indicating whether the target relationship is valid for the target entity pair in the document.
  • an electronic device including: a memory and a processor; wherein the memory is used to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method of the fifth aspect.
  • a computer-readable storage medium on which one or more computer instructions are stored, wherein one or more computer instructions are executed by a processor to implement the method according to the fifth aspect of the present disclosure.
  • the rules as hidden variables can be automatically learned while optimizing the model parameters, so that the relationship in the document can be extracted based on the rules generated for the document to obtain better relation extraction performance.
  • the conventional relation extraction model can be easily modified to implement some functions according to the embodiments of the present disclosure, so this solution has high portability.
  • Figure 1 shows a schematic diagram of an example environment in which various embodiments of the present disclosure can be implemented
  • Fig. 2 shows a schematic diagram of an example process of relation extraction according to some embodiments of the present disclosure
  • FIG. 3 shows a flowchart of an example method of training a relation extraction model according to some embodiments of the present disclosure
  • Figure 4 shows a flowchart of an example method of an optimization process according to some embodiments of the present disclosure.
  • Fig. 5 shows a schematic structural block diagram of an apparatus for relation extraction according to some embodiments of the present disclosure.
  • Figure 6 shows a block diagram of a computing device capable of implementing various embodiments of the present disclosure.
  • example embodiments of the present disclosure propose a method for relation extraction.
  • the method includes: based on a target entity pair in a document and a target relationship associated with the target entity pair, the target relationship being drawn from a set of relationships describing relationships between entity pairs in the document, generating a set of rules for describing the logic of the relationship between the target entity pair, each rule being represented by a sequence of a plurality of relationships in the set of relationships; based on the set of rules, determining at least one path between the target entity pair; and based on the entity pairs traversed by the at least one path and the associated relationships, determining a score indicating whether the target relationship is valid for the target entity pair.
  • our scheme can easily capture the long-range dependencies of relations and provide better interpretability.
  • this scheme can automatically learn the rules suitable for the document and extract the relationship in the document based on the generated rules, so as to obtain better relationship extraction performance.
  • FIG. 1 shows a schematic diagram of an example environment 100 in which various embodiments of the present disclosure can be implemented.
  • computing device 110 may receive document 120 .
  • Computing device 110 may be any suitable device with computing capabilities.
  • Document 120 may include multiple sentences. It should be understood that the text of the document 120 may be longer than the text handled in sentence-level or sentence-sequence-level relation extraction. However, the scope of the present disclosure is not limited by the text length of the document. For example, as shown in FIG. 1, document 120 may include only three sentences.
  • Document 120 may include multiple entities (a collection of entities may be denoted ⁇ ). For example, in the example of document 120 shown in FIG. 1 , document 120 may include multiple entities such as "England”, “Harry”, “William”, “Kate”. These entities can be paired into entity pairs.
  • Relationship 140 may be bidirectional or unidirectional. For example, the relationship "is a friend” is two-way and the relationship "is a wife" is one-way. Relationships 140 may include relationships that describe associations between different pairs of entities. For example, relationship 140 may include the relationship “is a prince” or “is a member of the royal family” describing the connection between "Harry” and “Britain”. As another example, relationship 140 may include the relationship "is spouse” or “is husband” describing the connection between "Harry” and “Meghan”.
  • documents 120 and relationships 140 are described above with reference to FIG. 1 , and it should be understood that documents 120 and relationships 140 shown in FIG. 1 are merely illustrative and not intended to limit the present disclosure.
  • FIG. 2 shows a schematic diagram of an example process 200 of relation extraction according to some embodiments of the present disclosure.
  • the target entity pair may be an entity pair in the document 120 that the user is interested in.
  • the target entity pair may also be any entity pair in the document 120 .
  • the target relation r associated with the target entity pair may be any relationship in a suitable set of relationships.
  • the target relationship can also be the relationship in the set of relationships that best matches the target entity pair.
  • the target relationship may also be a user-selected relationship from a set of relationships.
  • a set of relationships is a set of relationships determined empirically by the user.
  • Relation extraction model 130 may utilize rule generator 220 and relation extractor 230 to determine whether a target relation is valid for a target entity pair in document 120 .
  • the relation extraction model 130 may output a score for a given triple 210 to indicate whether the target relation is valid for the target entity pair in the document 120 .
  • the relation extraction model 130 may determine that the target relation "is a member of the royal family" is valid for the target entity pair ("Kate", "UK") in the document 120, and output a confirmed triple 240. In another example (not shown), the relation extraction model 130 may determine that the target relation "is wife" is not valid for the target entity pair ("Kate", "Harry") in the document 120.
  • the rule generator 220 may determine a set of rules describing the logic of the relationship between the target entity pair based on the target entity pair and the target relationship.
  • the rule generator 220 may be any suitable model that determines a set of rules that describe the logic of relationships between target entity pairs based on target entity pairs and target relationships.
  • Rule generator 220 may be any suitable sequence generation model.
  • rule generator 220 may be an autoregressive model, such as an autoregressive model based on a Transformer model.
  • the rule generator 220 may be a Transformer model with a 2-layer encoder and a 2-layer decoder.
  • the rule generator 220 can generate a sequence of relations based on the target entity pair and the target relation r (which can be denoted as [r_1, ..., r_l]). Based on the generated relationship sequence, the rule generator 220 can determine a corresponding rule (denoted as rule).
  • a rule may take the form r ← r_1 ∧ ... ∧ r_l.
  • an example of a rule may be the relation "is royal" ← relation "is spouse" ∧ relation "is sibling" ∧ relation "is royal".
  • a rule can be represented by a sequence of multiple relations.
  • a rule can be expressed as [r, r_1, ..., r_l].
  • the rules can also be represented as [r_1, ..., r_l, r].
  • the scope of the present disclosure is not limited to the specific expression method of the rules.
  • the rule generator 220 can be used to perform multiple samplings to determine a set of rules (denoted as z) describing the logic of the relationship between the target entity pair.
  • a set of rules z can be generated using rule generator 220 by sampling a plurality of candidates for a relation.
  • relationship extractor 230 may determine at least one path between the target entity pair that satisfies the rule, and based on the determined path, determine a score indicating whether the target relationship is valid for the target entity pair.
  • the relation extractor 230 may be any suitable model that realizes the functions described above.
  • relation extractor 230 may be a modified version of a conventional relation extraction model.
  • the relation extractor 230 may use a sequence-based model or a graph-based model for relation extraction as a backbone model, and add additional units to implement functions according to some embodiments of the present disclosure.
  • relationship extractor 230 may be used to determine at least one path between target entity pairs that satisfies a rule.
  • the additional unit may determine one or more corresponding paths between the target entity pair that satisfy the rule. Each corresponding path starts from the start entity (e.g., h) of the target entity pair and ends at the end entity (e.g., t) of the target entity pair, and the logic of the relationships between the entity pairs traversed by the path satisfies the rule.
  • relationship extractor 230 may determine, based on the paths determined for each rule, a score that indicates whether the target relationship is valid for the target entity pair. The score may be determined based on the entity pairs traversed by the path and the associated relationships. Details of determining the score will be described in detail with reference to FIG. 3 .
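  • The path condition described above (start at h, end at t, follow the relations of the rule in order) can be sketched as a small depth-first search. This is an illustrative sketch only: the function name `find_rule_paths`, the edge list, and the use of plain string relations are assumptions made for the example, not the disclosed implementation.

```python
from collections import defaultdict

def find_rule_paths(edges, head, tail, rule_body):
    """Return every entity path from head to tail whose relations match rule_body in order."""
    # Index candidate edges as (source entity, relation) -> destination entities.
    index = defaultdict(list)
    for src, rel, dst in edges:
        index[(src, rel)].append(dst)

    paths = []

    def walk(entity, step, visited):
        # All relations of the rule body consumed: keep the path only if it ends at tail.
        if step == len(rule_body):
            if entity == tail:
                paths.append(list(visited))
            return
        for nxt in index.get((entity, rule_body[step]), ()):
            visited.append(nxt)
            walk(nxt, step + 1, visited)
            visited.pop()

    walk(head, 0, [head])
    return paths

# Hypothetical candidate relations among the entities of an example document.
edges = [
    ("Kate", "is spouse", "William"),
    ("William", "is sibling", "Harry"),
    ("Harry", "is royal", "England"),
    ("William", "is royal", "England"),
]
body = ["is spouse", "is sibling", "is royal"]
paths = find_rule_paths(edges, "Kate", "England", body)
print(paths)
```

  • Here only one path follows the three relations of the rule body in order; a rule body that no chain of edges satisfies yields an empty list.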
  • a set of rules may be generated for a document 120 and a given triple 210, and based on the rules determine a score indicating whether a target relationship is valid for a target entity pair. In this way, the interaction between entities and relations can be exploited to explicitly describe the long-range dependencies of relations, thus improving the accuracy and interpretability of relation extraction.
  • relationship extraction model 130 may also include other units such as pre-processing and post-processing units to implement some embodiments of the present disclosure.
  • relation extraction model 130 may receive multiple given triples 210 and separately determine whether the target relation in each given triple 210 is valid for the target entity pair.
  • the relationship extraction model 130 can be represented by a probability model, and a set of rules z can be used as hidden variables in the probability model.
  • y can be a ternary random variable, and the value of y can indicate that the target relationship is positively correlated with the target entity pair, that the inverse of the target relationship is positively correlated with the target entity pair, or that the target relationship is not related to the target entity pair.
  • p_θ(z | q) represents the probability distribution of a set of rules determined by rule generator 220 conditioned on the given triple (and the document)
  • θ represents the learnable parameters of rule generator 220
  • p_w(y | q, z) denotes the probability distribution of the score determined by the relation extractor 230 conditioned on the given triple and the set of rules (and the document)
  • w denotes the learnable parameters of the relation extractor 230.
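  • Treating the set of rules z as a hidden variable suggests that the score distribution for a triple q is obtained by summing over rule sets. The equation itself is not reproduced in this text, so the following is a sketch that assumes the standard latent-variable form, with θ denoting the rule generator parameters and w the relation extractor parameters:

```latex
p_{w,\theta}(y \mid q) \;=\; \sum_{z} p_{\theta}(z \mid q)\, p_{w}(y \mid q, z)
```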
  • FIG. 3 shows a flowchart of an example method 300 of training the relation extraction model 130 according to some embodiments of the present disclosure.
  • the method 300 may be implemented, for example, at the computing device 110 of FIG. 1 .
  • the computing device 110 determines, based on the given triple q, the probability distribution p_θ(z | q) of a set of rules conditioned on the given triple.
  • for example, p_θ(z | q) may follow p_θ(z | q) ~ Multi(z | N, AutoReg_θ(rule | q)).
  • the rule generator 220 can generate a set of rules z (including N rules) that follows a multinomial distribution, where each of the N rules follows the corresponding probability distribution AutoReg_θ(rule | q).
  • AutoReg_θ defines the probability distribution of a rule conditioned on the given triple q.
  • p_θ(z | q) may also be determined using other suitable methods.
  • a set of rules z following other types of independent and identical distributions may be generated by the rule generator 220.
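  • Drawing N rules independently from one per-rule distribution can be sketched as follows. This is a toy illustration: the dictionary `rule_probs` is a hypothetical stand-in for the output of the autoregressive rule generator, and `sample_rule_set` is a name invented for the example.

```python
import random
from collections import Counter

def sample_rule_set(rule_probs, n_rules, seed=0):
    """Draw n_rules rules i.i.d. from rule_probs and return how often each was drawn."""
    rng = random.Random(seed)
    rules = list(rule_probs)
    weights = [rule_probs[r] for r in rules]
    draws = rng.choices(rules, weights=weights, k=n_rules)
    # The multiset of draws plays the role of the rule set z.
    return Counter(draws)

# Hypothetical per-rule probabilities; each rule body is a tuple of relations.
rule_probs = {
    ("is spouse", "is sibling", "is royal"): 0.6,
    ("is spouse", "is royal"): 0.3,
    ("is sibling", "is royal"): 0.1,
}
z = sample_rule_set(rule_probs, n_rules=5)
print(z)
```

  • The counts in `z` always sum to N, and a rule that is sampled several times appears once with a count greater than one.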
  • the computing device 110 determines the probability distribution p_{w,θ}(y | q) of the score conditioned on the given triple, based at least on the probability distribution p_θ(z | q) of the set of rules.
  • the relation extractor 230 can determine at least one path between the target entity pair based on the determined set of rules. Based on the entity pairs traversed by the at least one path and the associated relations, the relation extractor 230 may determine the probability distribution p_w(y | q, z) of the score conditioned on the given triple and the set of rules.
  • a corresponding path may be determined for each rule in the determined set of rules z.
  • the corresponding path is defined as starting from the start entity h in the target entity pair and ending with the end entity t in the target entity pair, and the logic of the connection between the entity pairs passed by the path satisfies the rule. It should be understood that a variety of methods can be used to determine the path satisfying the above definition, and the scope of the present disclosure is not limited in this respect.
  • p_w(y | q, z) can be defined according to the following equation:
  • φ_w(q) and φ_w(q, rule) are learnable scalar parameters
  • φ_w(rule) represents the reachability of the path from the start entity to the end entity in the target entity pair following the rule.
  • φ_w(e_{i-1}, r_i, e_i) represents the confidence that the relation r_i is valid for the entity pair (e_{i-1}, e_i).
  • φ_w(e_{i-1}, r_i, e_i) can be obtained using any suitable relation extraction method. For example, φ_w(e_{i-1}, r_i, e_i) can be obtained by using the backbone model of the relation extractor 230.
  • the prediction score for a given triplet 210 can be calculated using equation (3) during the inference phase.
  • the prediction score score_w(q, z) is a continuous value around 0; the larger the value, the more likely it is that the given triple holds, that is, that the target relationship is valid for the target entity pair.
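  • To make the roles of the quantities above concrete, the following sketch combines per-hop confidences into a path confidence, a rule reachability, and a triple score. The equations are not reproduced in this text, so the specific aggregation used here (product over hops, maximum over paths, weighted sum over rules, sigmoid on the score) is an assumption chosen for illustration, and all function names and numbers are hypothetical.

```python
import math

def path_confidence(step_confidences):
    """Product of the per-hop confidences along one path (assumed aggregation)."""
    return math.prod(step_confidences)

def rule_reachability(paths):
    """Best path confidence for a rule; 0.0 when no path satisfies the rule."""
    return max((path_confidence(p) for p in paths), default=0.0)

def triple_score(bias, rule_weights, rule_paths):
    """Bias term plus a weighted sum of rule reachabilities (assumed form)."""
    score = bias
    for rule, weight in rule_weights.items():
        score += weight * rule_reachability(rule_paths.get(rule, []))
    return score

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical values: two rules, the first with two candidate paths of three hops.
rule_weights = {"rule_1": 2.0, "rule_2": 1.0}
rule_paths = {"rule_1": [[0.9, 0.8, 0.5], [0.5, 0.5, 0.5]], "rule_2": []}
score = triple_score(-1.0, rule_weights, rule_paths)
probability = sigmoid(score)
print(score, probability)
```

  • Under these assumptions a score below 0 maps to a probability below 0.5, which matches the reading of the score as a continuous value around 0.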
  • the computing device 110 obtains the trained relation extraction model 130 by maximizing, based on the label value y* corresponding to the score y, the likelihood function of the parameters of the probability distribution p_{w,θ}(y | q) of the score conditioned on the given triple.
  • the label value y* refers to the human-annotated ground truth value indicating whether a given triple holds, that is, the ground truth value indicating whether the target relation is valid for the target entity pair.
  • by maximizing the likelihood function of the probability distribution p_{w,θ}(y | q), the parameters w and θ can be estimated to obtain a trained relation extraction model 130.
  • the likelihood function can be maximized by iteratively updating the parameters w and θ and the latent variable z. Based on the current values of the parameters w and θ, the posterior probability distribution of the latent variable z can be determined. Then, based on the posterior probability distribution of the latent variable z, the updated values of the parameters w and θ can be determined by maximizing the likelihood function. Iterating in this way until convergence, the parameters w and θ and the latent variable z can be estimated.
  • the parameters w and θ and the latent variable z can be iteratively updated using an Expectation-Maximization (EM) algorithm.
  • in the expectation (E) step, the expectation of the hidden variable z, that is, the posterior probability distribution of the hidden variable z, can be determined based on the current values of the parameters w and θ.
  • in the maximization (M) step, updated values of the parameters w and θ may be determined by maximizing the likelihood function.
  • an approximate posterior method can be used to determine the parameters w and θ and the latent variable z.
  • an approximate posterior probability distribution of the hidden variable z may be determined instead of an exact posterior probability distribution of the hidden variable z, thereby simplifying the optimization process of the parameters w and θ and the hidden variable z.
  • the parameters w and θ can be determined by maximizing the lower bound of the likelihood function, thereby further simplifying the optimization process of the parameters w and θ and the latent variable z.
  • the approximate posterior probability distribution q(z) of the hidden variable z can be used in place of the exact posterior probability distribution of z.
  • a suitable approximate posterior probability distribution q(z) for the latent variable z may be determined such that the KL divergence between q(z) and the exact posterior probability distribution is minimized.
  • the approximate posterior probability distribution can be determined by performing Taylor expansion or variational approximation on the posterior probability distribution.
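  • The link between the lower bound of the likelihood and the approximate posterior can be written out with the standard variational identity. This is the textbook decomposition rather than an equation taken from this text; note that q is overloaded, denoting the given triple when it appears as a condition and the approximate posterior when written q(z).

```latex
\log p_{w,\theta}(y \mid q)
  \;=\; \underbrace{\mathbb{E}_{q(z)}\!\left[\log \frac{p_{\theta}(z \mid q)\, p_{w}(y \mid q, z)}{q(z)}\right]}_{\text{lower bound}}
  \;+\; \mathrm{KL}\!\left(q(z) \,\middle\|\, p_{w,\theta}(z \mid y, q)\right)
```

  • Because the KL term is non-negative, the first term never exceeds the log-likelihood, and the bound is tight exactly when q(z) equals the exact posterior.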
  • a scoring function for each rule in the set of rules can be determined based on the probability distribution of the set of rules conditioned on the given triple (i.e., the prior probability distribution of the rules), the entity pairs traversed by the determined at least one path and the associated relationships, and the label value.
  • a scoring function estimates the quality of each rule.
  • the scoring function H(rule) of each rule can be determined with reference to the following formula (8).
  • a posterior probability distribution for the corresponding rule can be determined. For example, the following formula (9) can be referred to to determine the posterior probability distribution of the corresponding rule
  • an approximate posterior probability distribution q(z) for the set of rules can be determined. For example, q(z) can obey
  • FIG. 4 shows a flowchart of an example method 400 of an optimization process according to some embodiments of the present disclosure.
  • Method 400 may be implemented at computing device 110 shown in FIG. 1 . It should be understood that the optimization process shown in FIG. 4 is exemplary only, and the scope of the present disclosure is not limited in this respect.
  • computing device 110 may utilize rule generator 220 to generate a set of rules.
  • a set of rules (following p_θ(z | q)) can be generated by the rule generator 220 based on the initial parameter θ or the updated current parameter θ.
  • computing device 110 may compute a score function for each rule in the set of rules, thereby determining a posterior probability distribution for each rule.
  • the posterior probability distribution of the corresponding rule can be determined by the relation extractor 230 based on the probability distribution of the set of rules (i.e., the prior probability distribution of the rules).
  • computing device 110 may update the corresponding AutoReg_θ(rule | q) based on a first set of updated rules sampled from the posterior probability distribution of each rule.
  • computing device 110 may thereby update the probability distribution p_θ(z | q).
  • the rule generator 220 may generate a second set of updated rules following the updated p_θ(z | q).
  • computing device 110 may determine the updated value of the parameter w by maximization based on the second set of updated rules, thereby updating the probability distribution p_w(y | q, z).
  • relation extraction model 130 may also be trained in other suitable ways.
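  • The alternation between estimating a per-rule posterior and refitting the rule distribution can be illustrated on a toy problem with a fixed, finite set of candidate rules. This is a sketch under strong assumptions: the per-rule score (log prior plus a label-weighted quality term), the softmax posterior, and the count-based refit are stand-ins chosen for the example, `quality` is a hypothetical fixed signal from the relation extractor, and the update of the extractor parameters w is omitted.

```python
import math
import random
from collections import Counter

def posterior_from_scores(scores):
    """Softmax over per-rule scores, giving a normalized per-rule posterior."""
    top = max(scores.values())
    exp = {r: math.exp(s - top) for r, s in scores.items()}
    total = sum(exp.values())
    return {r: v / total for r, v in exp.items()}

def em_round(prior, quality, label, n_rules, rng):
    # E-step: score each rule with its log prior plus a label-weighted quality term.
    scores = {r: math.log(prior[r]) + label * quality[r] for r in prior}
    posterior = posterior_from_scores(scores)
    # M-step (generator only): sample rules from the posterior and refit the prior
    # to the sampled counts; add-one smoothing keeps every probability positive.
    rules = list(posterior)
    draws = rng.choices(rules, weights=[posterior[r] for r in rules], k=n_rules)
    counts = Counter(draws)
    total = n_rules + len(rules)
    return {r: (counts[r] + 1) / total for r in rules}

rng = random.Random(0)
prior = {"rule_a": 0.5, "rule_b": 0.5}
quality = {"rule_a": 2.0, "rule_b": -2.0}  # hypothetical extractor signal
for _ in range(5):
    prior = em_round(prior, quality, label=1, n_rules=200, rng=rng)
print(prior)
```

  • In this toy setting the rule whose quality agrees with the label accumulates probability mass over the rounds, which is the qualitative behavior the optimization process is meant to produce.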
  • FIG. 5 shows a schematic structural block diagram of an apparatus 500 for relation extraction according to some embodiments of the present disclosure.
  • the apparatus 500 may include a rule generation module 510 configured to generate, based on a given triplet consisting of a target entity pair in a document and a target relationship associated with the target entity pair, a set of rules describing the logic of the relationship between the target entity pair, the target relationship being selected from a set of relationships used to describe relationships between entity pairs in the document.
  • the apparatus 500 further includes a path determination module 520 configured to determine at least one path between the pair of target entities based on the set of rules.
  • the apparatus 500 further includes a score determination module 530 configured to determine a score indicating whether the target relationship is valid for the target entity pair in the document based on the entity pair and the associated relationship passed by the at least one path.
  • the path determination module 520 further includes a path exploration module configured to determine, for each rule in the set of rules, a corresponding path, the path starting from the start entity in the target entity pair and ending at the end entity in the target entity pair, and the logic of the relationships between the entity pairs traversed by the path satisfying the rule.
  • Embodiments of the present disclosure also provide an apparatus for training a relation extraction model.
  • the apparatus may include a rule probability determination module configured to determine, based on a given triplet consisting of a target entity pair in the document and a target relationship associated with the target entity pair, a probability distribution of a set of rules conditioned on the given triplet, the target relationship being selected from a set of relationships used to describe relationships between entity pairs in the document, and the set of rules being used to describe the logic of the relationship between the target entity pair.
  • the apparatus also includes a score probability determination module configured to determine a probability distribution of a score conditioned on the given triplet based on the probability distribution of the set of rules conditioned on the given triplet, the score indicating whether the target relationship is valid for the target entity pair in the document.
  • the apparatus also includes an optimization module configured to obtain a trained relation extraction model by maximizing, based on the label value corresponding to the score, the likelihood function of the parameters of the probability distribution of the score conditioned on the given triplet.
  • the score probability determination module includes a path finding module configured to determine at least one path between the pair of target entities based on the set of rules.
  • the score probability determination module further includes a first probability determination module configured to determine, based on the entity pairs traversed by the at least one path and the associated relationships, the probability distribution of the score conditioned on the given triplet and the set of rules.
  • the score probability determination module also includes a second probability determination module configured to determine the probability distribution of the score conditioned on the given triplet, based on the probability distribution of the set of rules conditioned on the given triplet and the probability distribution of the score conditioned on the given triplet and the set of rules.
  • the optimization module includes a posterior probability determination module configured to determine a posterior probability distribution for the set of rules based on current values of the parameters.
  • the optimization module also includes a likelihood function maximization module configured to determine an updated value for the parameter by maximizing the likelihood function based on the posterior probability distribution of the set of rules.
  • the posterior probability determination module includes a score function determination module configured to determine a scoring function for each rule in the set of rules, based on the probability distribution of the set of rules conditioned on the given triplet, the entity pairs traversed by the at least one path and the associated relationships, and the label value.
  • the posterior probability determination module also includes a first posterior probability determination module configured to determine a posterior probability distribution for each rule based on the scoring function for each rule.
  • the posterior probability determination module also includes a second posterior probability determination module configured to determine an approximate posterior probability distribution of the set of rules, based on the posterior probability distribution of each rule and the number of rules in the set of rules, as the posterior probability distribution of the set of rules.
  • the likelihood function maximization module includes a lower bound maximization module configured to maximize the lower bound of the likelihood function, the lower bound of the likelihood function being associated with the approximate posterior probability distribution of the set of rules.
  • the lower bound maximization module includes a first sampling module configured to sample a first set of updated rules based on the approximate posterior probability distribution of the set of rules.
  • the lower bound maximization module also includes a first update module configured to update the probability distribution of the set of rules conditioned on the given triplet based on the first set of updated rules.
  • the lower bound maximization module also includes a second sampling module configured to sample a second set of updated rules based on the updated probability distribution of the set of rules conditioned on the given triplet.
  • the lower bound maximization module also includes a second update module configured to update the probability distribution of the score conditioned on the given triplet and the set of rules based on the second set of updated rules.
  • each rule in the set of rules is represented by a sequence of a plurality of relationships in the set of relationships.
  • the optimization module includes an expectation maximization module configured to perform maximum likelihood estimation of the parameters using an expectation maximization algorithm.
  • Units or modules included in the apparatus 500 for relation extraction and the apparatus for training a relation model may be implemented in various ways, including software, hardware, firmware or any combination thereof.
  • one or more units may be implemented using software and/or firmware, such as machine-executable instructions stored on a storage medium.
  • some or all of the units in apparatus 500 may be at least partially implemented by one or more hardware logic components.
  • Exemplary types of hardware logic components include, by way of example and not limitation, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SoCs), Complex Programmable Logic Devices (CPLDs), and so on.
  • FIG. 6 shows a block diagram of a computing device/server 600 in which one or more embodiments of the present disclosure may be implemented. It should be understood that the computing device/server 600 shown in FIG. 6 is exemplary only and should not constitute any limitation on the functionality and scope of the embodiments described herein.
  • computing device/server 600 is in the form of a general purpose computing device.
  • Components of computing device/server 600 may include, but are not limited to, one or more processors or processing units 610, memory 620, storage devices 630, one or more communication units 640, one or more input devices 650, and one or more output devices 660.
  • the processing unit 610 may be an actual or virtual processor and is capable of performing various processes according to programs stored in the memory 620.
  • multiple processing units execute computer-executable instructions in parallel to increase the parallel processing capability of the computing device/server 600.
  • Computing device/server 600 typically includes multiple computer storage media. Such media can be any available media that are accessible to computing device/server 600, including but not limited to volatile and nonvolatile media, and removable and non-removable media.
  • Memory 620 can be volatile memory (e.g., registers, cache, random access memory (RAM)), nonvolatile memory (e.g., read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory), or some combination of them.
  • Storage device 630 may be removable or non-removable media, and may include machine-readable media, such as flash drives, magnetic disks, or any other media that can store information and/or data (e.g., training data for training) and can be accessed within computing device/server 600.
  • Computing device/server 600 may further include additional removable/non-removable, volatile/nonvolatile storage media.
  • a disk drive for reading from or writing to a removable, nonvolatile disk (such as a "floppy disk") and an optical disc drive for reading from or writing to a removable, nonvolatile optical disc (such as a CD-ROM) may be provided.
  • each drive may be connected to the bus (not shown) by one or more data media interfaces.
  • Memory 620 may include a computer program product 626 having one or more program modules configured to perform the various methods or actions of the various embodiments of the present disclosure.
  • the communication unit 640 enables communication with other computing devices through the communication medium. Additionally, the functionality of the components of computing device/server 600 may be implemented in a single computing cluster or as a plurality of computing machines capable of communicating via communication links. Accordingly, computing device/server 600 may operate in a networked environment using logical connections to one or more other servers, a network personal computer (PC), or another network node.
  • the input device 650 may be one or more input devices, such as a mouse, keyboard, trackball, and the like.
  • Output device 660 may be one or more output devices, such as a display, speakers, printer, or the like.
  • the computing device/server 600 can also communicate, as needed, through the communication unit 640 with one or more external devices (not shown) such as storage devices and display devices, with one or more devices that enable users to interact with the computing device/server 600, or with any device (e.g., network card, modem, etc.) that enables computing device/server 600 to communicate with one or more other computing devices. Such communication may be performed via an input/output (I/O) interface (not shown).
  • a computer-readable storage medium is provided, on which one or more computer instructions are stored, wherein the one or more computer instructions are executed by a processor to implement the method described above.
  • These computer-readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processing unit of the computer or other programmable data processing apparatus, produce an apparatus for implementing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • These computer-readable program instructions can also be stored in a computer-readable storage medium, and these instructions cause computers, programmable data processing devices and/or other devices to work in a specific way, so that the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions for implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions that contains one or more executable instructions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified function or action, or may be implemented by a combination of dedicated hardware and computer instructions.
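
For orientation, the lower bound that the lower bound maximization module listed earlier maximizes resembles the standard evidence lower bound from variational inference. The sketch below is written in assumed notation and is not quoted from the disclosure: $q$ is the given triple, $z$ the set of rules, $y$ the score (taking its label value), and $\pi(z)$ the approximate posterior probability distribution of the set of rules. The exact expression used by the disclosure is not reproduced in this section.

```latex
\log p(y \mid q)
  = \log \sum_{z} p(z \mid q)\, p(y \mid q, z)
  \;\ge\; \mathbb{E}_{\pi(z)}\!\left[ \log p(y \mid q, z) \right]
  - \mathrm{KL}\!\left( \pi(z) \,\|\, p(z \mid q) \right)
```

Under this reading, updating $p(z \mid q)$ with rules sampled from $\pi(z)$ shrinks the KL term, and updating $p(y \mid q, z)$ with rules sampled from the updated $p(z \mid q)$ raises the expected log-likelihood term, which matches the two sampling and two update modules described above.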

Abstract

Provided are a method and apparatus for training a relation extraction model, a device, and a storage medium. The method described herein comprises: determining, on the basis of a given triple consisting of a target entity pair in a document and a target relation associated with the target entity pair, a probability distribution of a set of rules conditioned on the given triple, the set of rules being used to describe the logic of the connection between the target entity pair; determining, on the basis of the probability distribution of the set of rules conditioned on the given triple, a probability distribution of a score conditioned on the given triple, the score indicating whether the target relation is valid for the target entity pair in the document; and maximizing, on the basis of a label value corresponding to the score, a likelihood function of the parameters of the probability distribution of the score conditioned on the given triple, thereby obtaining a trained relation extraction model. According to embodiments of the present disclosure, the use of rules makes it easy to capture long-range dependencies of relations and provides better interpretability.

Description

Method, apparatus, device and medium for relation extraction
Cross-Reference to Related Applications
This application claims priority to Chinese invention patent application No. 202111161205.4, entitled "Method, Apparatus, Device and Medium for Relation Extraction" and filed on September 30, 2021, which is incorporated herein by reference in its entirety.
Technical Field
Implementations of the present disclosure relate to the computer field, and more specifically, to a method, apparatus, device, and computer storage medium for relation extraction.
Background
At present, document-level relation extraction methods have attracted much attention. Document-level relation extraction can be applied to fields such as question answering and search. Typically, sequence-based models or graph-based models can be used to take into account the longer context in a document and the long-range dependencies of relations. For example, a representation of a long-range relation can be computed through a pooling operation, or nodes in a graph can be used to represent entities that are far apart in the document, so as to better characterize the long-range relations between entities.
However, the long-range relations extracted by the above methods have poor interpretability. Therefore, a document-level relation extraction method that provides better interpretability is needed.
Summary
In a first aspect of the present disclosure, a method for training a relation extraction model is provided. The method includes: determining, based on a given triple consisting of a target entity pair in a document and a target relation associated with the target entity pair, a probability distribution of a set of rules conditioned on the given triple, the target relation being selected from a set of relations used to describe connections between entity pairs in the document, and the set of rules being used to describe the logic of the connection between the target entity pair; determining, based on the probability distribution of the set of rules conditioned on the given triple, a probability distribution of a score conditioned on the given triple, the score indicating whether the target relation is valid for the target entity pair in the document; and obtaining the trained relation extraction model by maximizing, based on a label value corresponding to the score, a likelihood function of the parameters of the probability distribution of the score conditioned on the given triple.
In a second aspect of the present disclosure, an apparatus for training a relation extraction model is provided. The apparatus includes: a rule probability determination module configured to determine, based on a given triple consisting of a target entity pair in a document and a target relation associated with the target entity pair, a probability distribution of a set of rules conditioned on the given triple, the target relation being selected from a set of relations used to describe connections between entity pairs in the document, and the set of rules being used to describe the logic of the connection between the target entity pair; a score probability determination module configured to determine, based on the probability distribution of the set of rules conditioned on the given triple, a probability distribution of a score conditioned on the given triple, the score indicating whether the target relation is valid for the target entity pair in the document; and an optimization module configured to obtain the trained relation extraction model by maximizing, based on a label value corresponding to the score, a likelihood function of the parameters of the probability distribution of the score conditioned on the given triple.
In a third aspect of the present disclosure, an electronic device is provided, including a memory and a processor, wherein the memory is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the method according to the first aspect of the present disclosure.
In a fourth aspect of the present disclosure, a computer-readable storage medium is provided, having one or more computer instructions stored thereon, wherein the one or more computer instructions, when executed by a processor, implement the method according to the first aspect of the present disclosure.
In a fifth aspect of the present disclosure, a method for relation extraction is provided. The method includes: generating, based on a given triple consisting of a target entity pair in a document and a target relation associated with the target entity pair, a set of rules for describing the logic of the connection between the target entity pair, the target relation being selected from a set of relations used to describe connections between entity pairs in the document; determining, based on the set of rules, at least one path between the target entity pair; and determining, based on the entity pairs traversed by the at least one path and the associated relations, a score indicating whether the target relation is valid for the target entity pair in the document.
In a sixth aspect of the present disclosure, an apparatus for relation extraction is provided. The apparatus includes: a rule generation module configured to generate, based on a given triple consisting of a target entity pair in a document and a target relation associated with the target entity pair, a set of rules for describing the logic of the connection between the target entity pair, the target relation being selected from a set of relations used to describe connections between entity pairs in the document; a path determination module configured to determine, based on the set of rules, at least one path between the target entity pair; and a score determination module configured to determine, based on the entity pairs traversed by the at least one path and the associated relations, a score indicating whether the target relation is valid for the target entity pair in the document.
In a seventh aspect of the present disclosure, an electronic device is provided, including a memory and a processor, wherein the memory is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the method according to the fifth aspect of the present disclosure.
In an eighth aspect of the present disclosure, a computer-readable storage medium is provided, having one or more computer instructions stored thereon, wherein the one or more computer instructions, when executed by a processor, implement the method according to the fifth aspect of the present disclosure.
According to various embodiments of the present disclosure, performing logical reasoning with rules makes it easy to capture long-range dependencies of relations and provides better interpretability. In addition, by iteratively optimizing the parameters and latent variables of the probabilistic model, the rules, which serve as latent variables, can be learned automatically while the model parameters are optimized, so that relations in a document can be extracted based on rules generated for that document, yielding better relation extraction performance. Furthermore, a conventional relation extraction model can easily be modified to implement some functions according to embodiments of the present disclosure, so the solution is highly portable.
Brief Description of the Drawings
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements, wherein:
FIG. 1 shows a schematic diagram of an example environment in which multiple embodiments of the present disclosure can be implemented;
FIG. 2 shows a schematic diagram of an example process of relation extraction according to some embodiments of the present disclosure;
FIG. 3 shows a flowchart of an example method of training a relation extraction model according to some embodiments of the present disclosure;
FIG. 4 shows a flowchart of an example method of an optimization process according to some embodiments of the present disclosure;
FIG. 5 shows a schematic structural block diagram of an apparatus for relation extraction according to some embodiments of the present disclosure; and
FIG. 6 shows a block diagram of a computing device capable of implementing multiple embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the present disclosure.
In the description of the embodiments of the present disclosure, the term "including" and similar terms should be understood as open-ended inclusion, that is, "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first", "second", and the like may refer to different or the same objects. Other explicit and implicit definitions may also be included below.
As mentioned above, long-range relations obtained with current relation extraction methods usually have poor interpretability.
To at least partially address one or more of the above problems and other potential problems, example embodiments of the present disclosure propose a method for relation extraction. The method includes: generating, based on a target entity pair and a target relation that is associated with the target entity pair and belongs to a set of relations describing connections between entity pairs in a document, a set of rules for describing the logic of the connection between the target entity pair, each rule being represented by a sequence of multiple relations in the set of relations; determining, based on the set of rules, at least one path between the target entity pair; and determining, based at least on the entity pairs traversed by the at least one path and the associated relations, a score indicating whether the target relation is valid for the target entity pair in the document.
In this way, by performing logical reasoning with rules, the solution can easily capture long-range dependencies of relations and provide better interpretability. In addition, the solution can automatically learn rules suited to a document and extract the relations in that document based on the generated rules, thereby achieving better relation extraction performance.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
FIG. 1 shows a schematic diagram of an example environment 100 in which multiple embodiments of the present disclosure can be implemented. In the example environment 100, a computing device 110 may receive a document 120. The computing device 110 may be any suitable device with computing capability. The document 120 may include multiple sentences. It should be understood that, compared with relation extraction at the sentence level or sentence-sequence level, the text of the document 120 may be longer. However, the scope of the present disclosure places no limit on the text length of a document. For example, as shown in FIG. 1, the document 120 may include only three sentences.
The computing device 110 may use a relation extraction model 130 to extract relations 140 from the document 120. The document 120 may include multiple entities (the set of entities may be denoted as ε). For example, in the example of the document 120 shown in FIG. 1, the document 120 may include multiple entities such as "UK", "Harry", "William", and "Kate". These entities can be combined in pairs to form entity pairs.
The relations 140 may be a set of relations describing connections between multiple entity pairs in the document 120. A relation 140 may be bidirectional or unidirectional. For example, the relation "is a friend" is bidirectional, whereas the relation "is a wife" is unidirectional. The relations 140 may include relations describing connections between different entity pairs. For example, the relations 140 may include the relation "is a prince" or "is a member of the royal family" describing the connection between "Harry" and "UK". As another example, the relations 140 may include the relation "is a spouse" or "is a husband" describing the connection between "Harry" and "Meghan".
Various examples of the document 120 and the relations 140 have been described above with reference to FIG. 1. It should be understood that the document 120 and the relations 140 shown in FIG. 1 are merely illustrative and are not intended to limit the present disclosure.
The process by which the computing device 110 uses the relation extraction model 130 to extract the relations 140 from the document 120 will be described in detail below with reference to FIG. 2. FIG. 2 shows a schematic diagram of an example process 200 of relation extraction according to some embodiments of the present disclosure.
As shown in FIG. 2, the relation extraction model 130 may receive the document 120 and a given triple 210 (denoted as q = (h, r, t)) consisting of a target entity pair (denoted as (h, t), with h, t ∈ ε) and an associated target relation (denoted as r(h, t)). The target entity pair may be an entity pair in the document 120 that is of interest to the user. The target entity pair may also be any entity pair in the document 120.
The target relation r associated with the target entity pair may be any relation in any suitable set of relations (denoted as ℛ) used to describe connections between entity pairs in the document 120. The target relation may also be the relation in the set that best matches the target entity pair. The target relation may also be a relation selected by the user from the set. The set of relations ℛ is a set of relations determined by the user based on experience.
The relation extraction model 130 may use a rule generator 220 and a relation extractor 230 to determine whether the target relation is valid for the target entity pair in the document 120. The relation extraction model 130 may output a score for the given triple 210 to indicate whether the target relation is valid for the target entity pair in the document 120.
For example, the relation extraction model 130 may determine that, in the document 120, the target relation "is a member of the royal family" is valid for the target entity pair ("Kate", "UK"), and output a confirmed triple 240 indicating that the target relation is valid for the target entity pair. In another example (not shown), the relation extraction model 130 may determine that, in the document 120, the target relation "is a wife" is invalid for the target entity pair ("Kate", "Harry").
Specifically, the rule generator 220 may determine, based on the target entity pair and the target relation, a set of rules describing the logic of the connection between the target entity pair. The rule generator 220 may be any suitable model capable of doing so, for example any suitable sequence generation model. In some implementations, the rule generator 220 may be an autoregressive model, such as one based on the Transformer model. In one example, the rule generator 220 may be a Transformer model with a 2-layer encoder and a 2-layer decoder.
In some implementations, the rule generator 220 may generate a relation sequence (which may be denoted as [r_1, …, r_l], where each r_i belongs to the set of relations ℛ) based on the target entity pair and the target relation r. Based on the generated relation sequence, the rule generator 220 may determine a corresponding rule (denoted as rule). A rule may take the form r ← r_1 ∧ … ∧ r_l. For example, referring to the example of FIG. 2, one example of a rule is: "is a member of the royal family" ← "is a spouse" ∧ "is a sibling" ∧ "is a member of the royal family".
A rule can be represented by a sequence of multiple relations. For example, a rule may be represented as [r, r_1, …, r_l]. Alternatively, a rule may be represented as [r_1, …, r_l, r]. The scope of the present disclosure places no limit on the specific representation of a rule.
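
The two sequence representations can be illustrated with a small sketch. The `Rule` class and its method names below are hypothetical helpers introduced only for illustration; the disclosure does not prescribe any particular data structure.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical container for a rule of the form r <- r_1 AND ... AND r_l."""

    head: str               # the target relation r
    body: Tuple[str, ...]   # the relation sequence r_1, ..., r_l

    def as_sequence(self, head_first: bool = True) -> List[str]:
        """Flatten the rule into one relation sequence, in either order."""
        if head_first:
            return [self.head, *self.body]   # [r, r_1, ..., r_l]
        return [*self.body, self.head]       # [r_1, ..., r_l, r]

    def __str__(self) -> str:
        return f"{self.head} ← " + " ∧ ".join(self.body)


rule = Rule(
    head="is a member of the royal family",
    body=("is a spouse", "is a sibling", "is a member of the royal family"),
)
print(rule)
print(rule.as_sequence())
print(rule.as_sequence(head_first=False))
```

Either ordering carries the same information; which one is convenient depends on whether a sequence model should condition on the head relation before or after the body.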
The rule generator 220 can be sampled multiple times to determine a set of rules (denoted as z) describing the logic of the connection between the target entity pair. When generating each relation in the relation sequence [r_1, …, r_l], the rule generator 220 can give a probability distribution over multiple candidates for that relation (for example, the set of relations ℛ). Therefore, by sampling from the multiple candidates for each relation, the rule generator 220 can be used to generate the set of rules z.
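
The sampling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function `toy_dist` is a hand-written stand-in for the learned rule generator, the fixed `rule_length` replaces whatever stopping criterion the real generator uses, and all names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: maps (target relation, relations generated so far)
# to a probability distribution over candidate next relations.
NextRelationDist = Callable[[str, Sequence[str]], Dict[str, float]]


def sample_rules(
    target_relation: str,
    next_dist: NextRelationDist,
    rule_length: int,
    num_samples: int,
    seed: int = 0,
) -> List[Tuple[str, ...]]:
    """Sample rule bodies one relation at a time and return the distinct ones."""
    rng = random.Random(seed)
    rules = set()
    for _ in range(num_samples):
        body: List[str] = []
        for _ in range(rule_length):
            dist = next_dist(target_relation, body)
            candidates = list(dist)
            weights = [dist[c] for c in candidates]
            # Draw the next relation according to the per-step distribution.
            body.append(rng.choices(candidates, weights=weights, k=1)[0])
        rules.add(tuple(body))
    return sorted(rules)


RELATIONS = ["is a spouse", "is a sibling", "is a member of the royal family"]


def toy_dist(target_relation: str, prefix: Sequence[str]) -> Dict[str, float]:
    """Toy distribution (not learned): favors ending on the target relation."""
    weights = {r: 1.0 for r in RELATIONS}
    if len(prefix) == 2:
        weights[target_relation] = 3.0
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}


z = sample_rules(
    "is a member of the royal family", toy_dist, rule_length=3, num_samples=20
)
for body in z:
    print("is a member of the royal family ←", " ∧ ".join(body))
```

Collecting the samples in a set means that repeated draws of the same relation sequence yield a single rule, so the size of z can be smaller than the number of samples.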
Based on the generated set of rules, the relation extractor 230 may determine at least one path between the target entity pair that satisfies a rule and, based on the determined path, determine a score indicating whether the target relation is valid for the target entity pair. The relation extractor 230 may be any suitable model that implements these functions. In some implementations, the relation extractor 230 may be a modified version of a conventional relation extraction model. For example, the relation extractor 230 may use a sequence-based or graph-based relation extraction model as a backbone model and add additional units to implement functions according to some embodiments of the present disclosure.
In some implementations, the relation extractor 230 may be used to determine at least one path between the target entity pair that satisfies a rule. In other words, for each rule in the generated set of rules, the additional unit may determine one or more corresponding paths between the target entity pair that satisfy the rule. A corresponding path starts at the start entity of the target entity pair (e.g., h) and ends at the end entity of the target entity pair (e.g., t), and the logic of the connections between the entity pairs along the path satisfies the rule.
For example, referring to FIG. 2, for the rule "is a member of the royal family" ← "is a spouse" ∧ "is a sibling" ∧ "is a member of the royal family", a path between the target entity pair ("Kate", "UK") can be determined: "Kate" "is a member of the royal family" "UK" ← "Kate" "is a spouse" "William" "is a sibling" "Harry" "is a member of the royal family" "UK".
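The path search in this example can be sketched as follows. The edge dictionary, the relation identifiers and the function name `find_paths` are assumptions made for illustration; the disclosure leaves the path-finding method open.

```python
# Toy document graph: (head_entity, relation) -> list of tail entities.
# The entities mirror the FIG. 2 example; the data structure itself is assumed.
EDGES = {
    ("Kate", "spouse_of"): ["William"],
    ("William", "sibling_of"): ["Harry"],
    ("Harry", "royalty_of"): ["UK"],
}


def find_paths(edges, start, end, body):
    """Return every entity path from start to end whose hops follow body in order."""
    partial = [[start]]
    for relation in body:
        # Extend each partial path by one hop labeled with the current relation.
        partial = [
            path + [nxt]
            for path in partial
            for nxt in edges.get((path[-1], relation), [])
        ]
    # Keep only the paths that actually end at the target end entity.
    return [path for path in partial if path[-1] == end]


paths = find_paths(EDGES, "Kate", "UK", ["spouse_of", "sibling_of", "royalty_of"])
```

Here `paths` contains the single path Kate, William, Harry, UK. A rule body whose relations do not chain from the start entity yields an empty list.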
In some implementations, the relation extractor 230 may determine, based on the paths determined for each rule, a score indicating whether the target relation is valid for the target entity pair. The score may be determined based on the entity pairs along the path and the associated relations. Details of determining the score are described with reference to FIG. 3.
Using the rule generator 220 and the relation extractor 230, a set of rules can be generated for the document 120 and the given triple 210, and a score indicating whether the target relation is valid for the target entity pair can be determined based on the rules. In this way, the interaction between entities and relations can be used to describe the long-range dependencies of relations explicitly, thereby improving the accuracy and interpretability of relation extraction.
An example process 200 of relation extraction has been described above with reference to FIG. 2. It should be understood that the process shown in FIG. 2 is only illustrative and is not intended to limit the scope of the present disclosure. For example, the relation extraction model 130 may also include other units, such as pre-processing and post-processing units, to implement some embodiments of the present disclosure. As another example, the relation extraction model 130 may receive multiple given triples 210 and determine, for each given triple 210 separately, whether its target relation is valid for its target entity pair.
The parameterization and training of the relation extraction model 130 are described in detail below with reference to FIGS. 3 and 4. According to the solution of the present disclosure, the relation extraction model 130 can be represented by a probabilistic model, and the set of rules z can serve as a latent variable in the probabilistic model.
The task of relation extraction can be defined as follows: given a document D and a given triple q, determine the probability distribution of the score y
Figure PCTCN2022116286-appb-000005
where y can be a binary random variable whose value indicates whether the given triple holds. For example, with y ∈ {-1, 1}, y = 1 indicates that the target relation r in the given triple q is valid for the target entity pair (h, t), and y = -1 indicates that the target relation r in the given triple q is invalid for the target entity pair (h, t). It should be understood that the task of relation extraction can also be defined in other forms. For example, y can be a ternary random variable whose value indicates that the target relation is positively correlated with the target entity pair, that the inverse of the target relation is positively correlated with the target entity pair, or that the target relation is unrelated to the target entity pair.
The probability distribution of the score y
Figure PCTCN2022116286-appb-000006
is defined as:
Figure PCTCN2022116286-appb-000007
where p_θ(z|q) denotes the probability distribution of a set of rules given the triple (and the document), as determined by the rule generator 220; θ denotes the learnable parameters of the rule generator 220; p_w(y|q,z) denotes the probability distribution of the score given the triple and a set of rules (and the document), as determined by the relation extractor 230; and w denotes the learnable parameters of the relation extractor 230. For simplicity, the distribution of the document is assumed to be independent of the distribution of the set of rules, and the qualifier "conditioned on the document" is omitted below.
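Given these definitions, with the rule set z treated as a latent variable, the formula above is most naturally read as a marginalization over z. The formula itself is available only as a figure, so the following is an assumed reconstruction rather than its verbatim content:

```latex
p_{w,\theta}(y \mid q)
  = \sum_{z} p_{\theta}(z \mid q)\, p_{w}(y \mid q, z)
  = \mathbb{E}_{z \sim p_{\theta}(z \mid q)}\bigl[\, p_{w}(y \mid q, z) \,\bigr]
```

Under this reading, the rule generator supplies the prior over rule sets and the relation extractor supplies the likelihood of the score given a particular rule set.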
FIG. 3 shows a flowchart of an example method 300 of training the relation extraction model 130 according to some embodiments of the present disclosure. The method 300 may be implemented, for example, at the computing device 110 of FIG. 1.
At block 302, the computing device 110 determines, based on the target relation and the target entity pair, the probability distribution p_θ(z|q) of a set of rules given the triple. In some implementations, it may be assumed that p_θ(z|q) ~ Multi(z | N, AutoReg_θ(rule|q)). In other words, the rule generator 220 can generate a set of rules z (comprising N rules) that follows a multinomial distribution, where each of the N rules follows the distribution AutoReg_θ(rule|q). AutoReg_θ defines the probability distribution of a rule given the triple q. Alternatively, p_θ(z|q) may be determined by other suitable methods. For example, the rule generator 220 may generate a set of rules z that follows another type of independent and identical distribution.
At block 304, the computing device 110 determines the probability distribution p_{w,θ}(y|q) of the score given the triple, based at least on the probability distribution p_θ(z|q) of the set of rules given the triple.
In some implementations, the relation extractor 230 can determine at least one path between the target entity pair based on the determined set of rules. Based on the entity pairs along the at least one path and the associated relations, the relation extractor 230 can determine the probability distribution p_w(y|q,z) of the score given the triple and the set of rules. Based on p_θ(z|q) and p_w(y|q,z), the relation extractor 230 can determine the probability distribution p_{w,θ}(y|q) of the score given the triple.
In some implementations, a corresponding path may be determined for each rule in the determined set of rules z. The corresponding path is defined as starting at the start entity h of the target entity pair and ending at the end entity t of the target entity pair, with the logic of the connections between the entity pairs along the path satisfying the rule. It should be understood that a variety of methods can be used to determine a path satisfying this definition, and the scope of the present disclosure is not limited in this respect.
In some implementations, p_w(y|q,z) can be defined according to the following formulas:
p_w(y|q,z) = Sigmoid(y · score_w(q,z))   (2)
Figure PCTCN2022116286-appb-000008
Figure PCTCN2022116286-appb-000009
Figure PCTCN2022116286-appb-000010
Figure PCTCN2022116286-appb-000011
where φ_w(q) and φ_w(q,rule) are learnable scalar parameters, and φ_w(rule) represents the reachability of a path from the start entity to the end entity of the target entity pair following rule.
Figure PCTCN2022116286-appb-000012
denotes the set of at least one path between the target entity pair determined based on rule. φ_w(e_{i-1}, r_i, e_i) denotes the confidence that the relation r_i is valid for the entity pair (e_{i-1}, e_i). φ_w(e_{i-1}, r_i, e_i) can be obtained using any suitable relation extraction method, for example, using the backbone model of the relation extractor 230.
It should be understood that formulas (2)-(6) above are only exemplary, and other suitable methods can be used to define p_w(y|q,z). For example, other fuzzy logic functions can be used to convert score_w(q,z) into p_w(y|q,z).
In addition, it should be noted that formula (3) can be used in the inference stage to compute the prediction score for a given triple 210. In some implementations, the prediction score score_w(q,z) is a continuous value around 0. A larger value indicates a higher likelihood that the given triple holds, that is, that the target relation is valid for the target entity pair.
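The scoring described above can be sketched in Python. Only formula (2) appears in the text; formulas (3)-(6) are available only as figures. The sketch therefore assumes one plausible form consistent with the surrounding definitions: the score is a bias φ_w(q) plus a weighted sum of rule reachabilities, the reachability of a rule is the best path confidence, and a path's confidence is the product of its per-hop confidences. The function names, the zero reachability for a rule with no path, and all numeric values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def path_confidence(path_triples, triple_conf):
    """Assumed: product of per-hop confidences phi(e_{i-1}, r_i, e_i) along a path."""
    conf = 1.0
    for triple in path_triples:
        conf *= triple_conf[triple]
    return conf


def rule_reachability(paths, triple_conf):
    """Assumed: best path wins; 0.0 when no path satisfies the rule."""
    return max((path_confidence(p, triple_conf) for p in paths), default=0.0)


def score(bias, rule_weights, rule_paths, triple_conf):
    """Assumed: bias phi(q) plus sum over rules of phi(q, rule) * phi(rule)."""
    total = bias
    for rule, weight in rule_weights.items():
        total += weight * rule_reachability(rule_paths.get(rule, []), triple_conf)
    return total


def prob(y, s):
    """Formula (2): p(y | q, z) = Sigmoid(y * score), with y in {-1, +1}."""
    return sigmoid(y * s)


# Illustrative values only.
triple_conf = {
    ("Kate", "spouse_of", "William"): 0.9,
    ("William", "sibling_of", "Harry"): 0.8,
    ("Harry", "royalty_of", "UK"): 0.95,
}
rule = ("spouse_of", "sibling_of", "royalty_of")
rule_paths = {rule: [list(triple_conf)]}  # one path made of the three hops above
s = score(bias=-1.0, rule_weights={rule: 2.0}, rule_paths=rule_paths, triple_conf=triple_conf)
p_valid = prob(+1, s)
```

Because Sigmoid(s) + Sigmoid(-s) = 1, the two outcomes y = 1 and y = -1 form a proper distribution for any score, which is why a score above 0 corresponds to a probability above 0.5 that the triple holds.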
At block 306, the computing device 110 obtains the trained relation extraction model 130 by maximizing, based on the label value y* corresponding to the score y, the likelihood function of the parameters of the probability distribution p_{w,θ}(y|q) of the score given the triple. The label value y* is a manually annotated ground-truth value indicating whether the given triple holds, that is, whether the target relation is valid for the target entity pair. In some implementations, by maximizing the likelihood function of the parameters of the probability distribution p_{w,θ}(y|q)
Figure PCTCN2022116286-appb-000013
the parameters w and θ can be estimated, thereby obtaining the trained relation extraction model 130.
In some implementations, the likelihood function
Figure PCTCN2022116286-appb-000014
can be maximized by iteratively updating the parameters w and θ and the latent variable z. Based on the current values of the parameters w and θ, the posterior probability distribution of the latent variable z can be determined. Then, based on the posterior probability distribution of the latent variable z, updated values of the parameters w and θ can be determined by maximizing the likelihood function. By iterating in this way until convergence, the parameters w and θ and the latent variable z can be estimated.
For example, the expectation-maximization (EM) algorithm can be used to iteratively update the parameters w and θ and the latent variable z. In the expectation (E) step, the expectation of the latent variable z, that is, its posterior probability distribution, can be determined based on the current values of the parameters w and θ. In the maximization (M) step, updated values of the parameters w and θ can be determined by maximizing the likelihood function. Alternatively or additionally, an approximate posterior method can be used to determine the parameters w and θ and the latent variable z.
In some implementations, an approximate posterior probability distribution of the latent variable z may be determined in place of the exact posterior probability distribution, thereby simplifying the optimization of the parameters w and θ and the latent variable z. In some implementations, the parameters w and θ can be determined by maximizing a lower bound of the likelihood function, which further simplifies the optimization.
In some examples, as shown in formula (7) below, an approximate posterior probability distribution q(z) of the latent variable z can be used in place of the exact posterior probability distribution p(z|y,q) of the latent variable z, and the lower bound
Figure PCTCN2022116286-appb-000015
can be maximized in order to maximize the likelihood function
Figure PCTCN2022116286-appb-000016
Figure PCTCN2022116286-appb-000017
In some implementations, a suitable approximate posterior probability distribution q(z) of the latent variable z may be determined such that KL(q(z) || p_{w,θ}(z|q,y)) ≥ 0 is satisfied. The approximate posterior probability distribution can be determined by methods such as a Taylor expansion of, or a variational approximation to, the posterior probability distribution.
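Formula (7) is available only as a figure. For orientation, the standard variational decomposition of a log-likelihood with a latent variable, written in the notation of this section, is given below. It is offered as a reference identity that is consistent with the KL term mentioned above, not as the verbatim content of formula (7):

```latex
\log p_{w,\theta}(y^{*} \mid q)
  = \underbrace{\mathbb{E}_{q(z)}\!\left[
      \log \frac{p_{w}(y^{*} \mid q, z)\, p_{\theta}(z \mid q)}{q(z)}
    \right]}_{\text{lower bound}}
  + \mathrm{KL}\!\left( q(z) \,\middle\|\, p_{w,\theta}(z \mid q, y^{*}) \right)
```

Since the KL divergence is non-negative, the first term never exceeds the log-likelihood, and the bound is tight exactly when q(z) equals the true posterior.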
In some implementations, a score function for each rule in the set of rules can be determined based on the probability distribution of the set of rules given the triple (that is, the prior probability distribution of the rules), the entity pairs along the determined at least one path and the associated relations, and the label value. The score function can estimate the quality of each rule. For example, the score function H(rule) of each rule can be determined with reference to formula (8) below.
Figure PCTCN2022116286-appb-000018
Based on the score function for each rule, the posterior probability distribution of the corresponding rule can be determined. For example, the posterior probability distribution of the corresponding rule
Figure PCTCN2022116286-appb-000019
can be determined with reference to formula (9) below.
Figure PCTCN2022116286-appb-000020
Based on the posterior probability distribution of each rule and the number of rules in the set of rules, an approximate posterior probability distribution q(z) of the set of rules can be determined. For example, q(z) may follow
Figure PCTCN2022116286-appb-000021
It should be understood that formulas (8)-(9) above are only exemplary, and other suitable methods can be used to determine the approximate posterior probability distribution q(z) of the latent variable z.
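One way to turn per-rule scores into an approximate posterior can be sketched as follows. Formulas (8) and (9) are available only as figures, so the sketch assumes that the per-rule posterior is proportional to exp(H(rule)), which amounts to a softmax over the candidate rules, and that a rule set is drawn as N independent samples from it. The rule names and H values are hypothetical.

```python
import math
import random


def posterior_from_scores(h):
    """Assumed form: normalize exp(H(rule)) over the candidate rules (softmax)."""
    m = max(h.values())  # subtract the maximum for numerical stability
    exp_h = {rule: math.exp(v - m) for rule, v in h.items()}
    total = sum(exp_h.values())
    return {rule: v / total for rule, v in exp_h.items()}


def sample_rules(posterior, n, rng):
    """Draw n rules independently from the per-rule posterior."""
    rules = list(posterior)
    return rng.choices(rules, weights=[posterior[r] for r in rules], k=n)


# Illustrative scores: rule_b is three times as likely as rule_a after normalization.
h_scores = {"rule_a": 0.0, "rule_b": math.log(3.0)}
posterior = posterior_from_scores(h_scores)
sampled = sample_rules(posterior, 4, random.Random(0))
```

With these values the posterior assigns 0.25 to `rule_a` and 0.75 to `rule_b`, so higher-scoring rules are sampled more often when the rule generator is subsequently updated.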
In some implementations, the lower bound
Figure PCTCN2022116286-appb-000023
can be maximized by maximizing
Figure PCTCN2022116286-appb-000022
where
Figure PCTCN2022116286-appb-000024
correspond to the rule generator 220 and the relation extractor 230, respectively. In some implementations,
Figure PCTCN2022116286-appb-000025
can also be equivalently converted into
Figure PCTCN2022116286-appb-000026
Once q(z) has been determined, conventional parameter estimation methods can be used to determine the updated values of the parameters w and θ. For example, gradient descent can be used to determine the updated values of the parameters w and θ.
The process of iteratively updating the parameters w and θ and the latent variable z is described in detail below with reference to FIG. 4. FIG. 4 shows a flowchart of an example method 400 of an optimization process according to some embodiments of the present disclosure. The method 400 may be implemented at the computing device 110 shown in FIG. 1. It should be understood that the optimization process shown in FIG. 4 is only exemplary, and the scope of the present disclosure is not limited in this respect.
As shown in FIG. 4, at block 402, the computing device 110 may use the rule generator 220 to generate a set of rules. Based on the initial parameter θ or the updated current parameter θ, the rule generator 220 can generate a set of rules satisfying p_θ(z|q) ~ Multi(z | N, AutoReg_θ(rule|q)).
At block 404, the computing device 110 may compute a score function for each rule in the set of rules, thereby determining the posterior probability distribution of each rule
Figure PCTCN2022116286-appb-000027
The relation extractor 230 may determine the score function H(rule) for each rule in the set of rules based on the probability distribution of the set of rules given the triple (that is, the prior probability distribution of the rules), the entity pairs along the determined at least one path and the associated relations, and the label value. Based on the score function H(rule) of each rule, the relation extractor 230 can determine the posterior probability distribution of the corresponding rule
Figure PCTCN2022116286-appb-000028
At block 406, the computing device 110 may update the corresponding AutoReg_θ(rule|q) based on a first set of update rules sampled from the posterior probability distribution of each rule
Figure PCTCN2022116286-appb-000029
In some implementations, the computing device 110 may determine an updated value of the parameter θ by maximizing
Figure PCTCN2022116286-appb-000030
thereby updating AutoReg_θ(rule|q), that is, updating the probability distribution p_θ(z|q) of the set of rules given the triple.
At block 408, the computing device 110 may update the probability distribution p_w(y|q,z) of the score given the triple and a set of rules, based on a second set of update rules sampled from the updated probability distribution p_θ(z|q) of the set of rules given the triple. In some implementations, the rule generator 220 may generate, based on the updated current parameter θ, a second set of update rules satisfying p_θ(z|q) ~ Multi(z | N, AutoReg_θ(rule|q)). Based on the second set of update rules, the computing device 110 may determine an updated value of the parameter w by maximizing
Figure PCTCN2022116286-appb-000031
thereby updating the probability distribution p_w(y|q,z) of the score given the triple and a set of rules.
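The order of operations in blocks 402 to 408 can be summarized in a short Python skeleton. The method names (`sample`, `sample_from_posterior`, `update`) and the logging stub classes are hypothetical; they stand in for the rule generator 220 and the relation extractor 230 only to make the control flow concrete and do not perform any learning.

```python
def run_iteration(generator, extractor, q, y_star, n_rules):
    """One pass over blocks 402-408 for a single labeled triple (q, y_star)."""
    z = generator.sample(q, n_rules)                                   # block 402
    z_first = extractor.sample_from_posterior(q, z, y_star, n_rules)   # block 404
    generator.update(q, z_first)                                       # block 406
    z_second = generator.sample(q, n_rules)                            # block 408: resample
    extractor.update(q, z_second, y_star)                              # block 408: update w


class LoggingGenerator:
    """Stub rule generator that only records which step was called."""

    def __init__(self, log):
        self.log = log

    def sample(self, q, n_rules):
        self.log.append("sample")
        return ["rule"] * n_rules

    def update(self, q, rules):
        self.log.append("update_generator")


class LoggingExtractor:
    """Stub relation extractor that only records which step was called."""

    def __init__(self, log):
        self.log = log

    def sample_from_posterior(self, q, rules, y_star, n_rules):
        self.log.append("posterior_sample")
        return rules[:n_rules]

    def update(self, q, rules, y_star):
        self.log.append("update_extractor")


log = []
run_iteration(LoggingGenerator(log), LoggingExtractor(log), q=("h", "r", "t"), y_star=1, n_rules=3)
```

The generator is updated before the second sampling step, so the relation extractor is always trained on rules drawn from the most recent rule distribution.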
The relation extraction method and the construction and training of the relation extraction model 130 according to some embodiments of the present disclosure have been described above with reference to FIGS. 1 to 4.
In this way, by using rules for logical reasoning, the long-range dependencies of relations can be captured easily and better interpretability can be provided. In addition, by iteratively optimizing the parameters and the latent variable of the probabilistic model, the rules serving as the latent variable can be learned automatically while the model parameters are optimized, so that the relations in a document can be extracted based on the rules generated for that document, yielding better relation extraction performance. Furthermore, a conventional relation extraction model can be modified easily to implement some functions according to embodiments of the present disclosure, so the solution is highly portable.
It should be understood that the relation extraction model 130 according to some embodiments of the present disclosure may also be trained in other suitable ways.
Embodiments of the present disclosure also provide corresponding apparatuses for implementing the above methods or processes. FIG. 5 shows a schematic structural block diagram of an apparatus 500 for relation extraction according to some embodiments of the present disclosure.
As shown in FIG. 5, the apparatus 500 may include a rule generation module 510 configured to generate, based on a given triple consisting of a target entity pair in a document and a target relation associated with the target entity pair, a set of rules describing the logic of the connection between the target entity pair, the target relation being selected from a set of relations used to describe connections between entity pairs in the document. The apparatus 500 further includes a path determination module 520 configured to determine at least one path between the target entity pair based on the set of rules. The apparatus 500 further includes a score determination module 530 configured to determine, based on the entity pairs along the at least one path and the associated relations, a score indicating whether the target relation is valid for the target entity pair in the document.
In some embodiments, the path determination module 520 further includes a path exploration module configured to determine, for each rule in the set of rules, a corresponding path, where the path starts at the start entity of the target entity pair and ends at the end entity of the target entity pair, and the logic of the connections between the entity pairs along the path satisfies the rule.
Embodiments of the present disclosure also provide an apparatus for training a relation extraction model. The apparatus may include a rule probability determination module configured to determine, based on a given triple consisting of a target entity pair in a document and a target relation associated with the target entity pair, a probability distribution of a set of rules given the triple, the target relation being selected from a set of relations used to describe connections between entity pairs in the document, and the set of rules being used to describe the logic of the connection between the target entity pair. The apparatus further includes a score probability determination module configured to determine, based on the probability distribution of the set of rules given the triple, a probability distribution of a score given the triple, the score indicating whether the target relation is valid for the target entity pair in the document. The apparatus further includes an optimization module configured to obtain the trained relation extraction model by maximizing, based on a label value corresponding to the score, a likelihood function of the parameters of the probability distribution of the score given the triple.
In some embodiments, the score probability determination module includes a path finding module configured to determine at least one path between the target entity pair based on the set of rules. The score probability determination module further includes a first probability determination module configured to determine, based on the entity pairs along the at least one path and the associated relations, the probability distribution of the score given the triple and the set of rules. The score probability determination module further includes a second probability determination module configured to determine the probability distribution of the score given the triple, based on the probability distribution of the set of rules given the triple and the probability distribution of the score given the triple and the set of rules.
In some embodiments, the optimization module includes a posterior probability determination module configured to determine a posterior probability distribution of the set of rules based on current values of the parameters. The optimization module further includes a likelihood function maximization module configured to determine updated values of the parameters by maximizing the likelihood function based on the posterior probability distribution of the set of rules.
In some embodiments, the posterior probability determination module includes a score function determination module configured to determine a score function for each rule in the set of rules, based on the probability distribution of the set of rules given the triple, the entity pairs along the at least one path and the associated relations, and the label value. The posterior probability determination module further includes a first posterior probability determination module configured to determine a posterior probability distribution of each rule based on the score function for that rule. The posterior probability determination module further includes a second posterior probability determination module configured to determine, based on the posterior probability distribution of each rule and the number of rules in the set of rules, an approximate posterior probability distribution of the set of rules to serve as the posterior probability distribution of the set of rules.
In some embodiments, the likelihood function maximization module includes a lower bound maximization module configured to maximize a lower bound of the likelihood function, the lower bound of the likelihood function being associated with the approximate posterior probability distribution of the set of rules.
In some embodiments, the lower bound maximization module includes a first sampling module configured to sample a first set of update rules based on the approximate posterior probability distribution of the set of rules. The lower bound maximization module further includes a first update module configured to update, based on the first set of update rules, the probability distribution of the set of rules given the triple. The lower bound maximization module further includes a second sampling module configured to sample a second set of update rules based on the updated probability distribution of the set of rules given the triple. The lower bound maximization module further includes a second update module configured to update, based on the second set of update rules, the probability distribution of the score given the triple and the set of rules.
In some embodiments, each rule in the set of rules is represented by a sequence of multiple relations from the set of relations.
In some embodiments, the optimization module includes an expectation maximization module configured to perform maximum likelihood estimation of the parameters using the expectation-maximization algorithm.
用于关系抽取的装置500以及用于训练关系模型的装置中所包括的单元或模块可以利用各种方式来实现,包括软件、硬件、固件或其任意组合。以装置500为例,在一些实施例中,一个或多个单元可以使用软件和/或固件来实现,例如存储在存储介质上的机器可执行指令。除了机器可执行指令之外或者作为替代,装置500中的部分或者全部单元可以至少部分地由一个或多个硬件逻辑组件来实现。作为示例而非限制,可以使用的示范类型的硬件逻辑组件包括现场可编程门阵列(FPGA)、专用集成电路(ASIC)、专用标准品(ASSP)、片上系统(SOC)、复杂可编程逻辑器件(CPLD),等等。Units or modules included in the apparatus 500 for relation extraction and the apparatus for training a relation model may be implemented in various ways, including software, hardware, firmware or any combination thereof. Taking apparatus 500 as an example, in some embodiments, one or more units may be implemented using software and/or firmware, such as machine-executable instructions stored on a storage medium. In addition to or instead of machine-executable instructions, some or all of the units in apparatus 500 may be at least partially implemented by one or more hardware logic components. Exemplary types of hardware logic components that may be used include, by way of example and not limitation, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), System on Chips (SOCs), Complex Programmable Logic Devices (CPLD), and so on.
FIG. 6 shows a block diagram of a computing device/server 600 in which one or more embodiments of the present disclosure may be implemented. It should be understood that the computing device/server 600 shown in FIG. 6 is merely exemplary and should not constitute any limitation on the functionality or scope of the embodiments described herein.
As shown in FIG. 6, the computing device/server 600 takes the form of a general-purpose computing device. The components of the computing device/server 600 may include, but are not limited to, one or more processors or processing units 610, a memory 620, a storage device 630, one or more communication units 640, one or more input devices 650, and one or more output devices 660. The processing unit 610 may be a physical or virtual processor and can perform various processes according to programs stored in the memory 620. In a multi-processor system, multiple processing units execute computer-executable instructions in parallel to increase the parallel processing capability of the computing device/server 600.
The computing device/server 600 typically includes a plurality of computer storage media. Such media may be any available media accessible to the computing device/server 600, including but not limited to volatile and non-volatile media and removable and non-removable media. The memory 620 may be volatile memory (e.g., registers, cache, random access memory (RAM)), non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof. The storage device 630 may be a removable or non-removable medium and may include a machine-readable medium, such as a flash drive, a magnetic disk, or any other medium, that can be used to store information and/or data (e.g., training data for training) and that can be accessed within the computing device/server 600.
The computing device/server 600 may further include additional removable/non-removable, volatile/non-volatile storage media. Although not shown in FIG. 6, a magnetic disk drive for reading from or writing to a removable, non-volatile magnetic disk (e.g., a "floppy disk") and an optical disc drive for reading from or writing to a removable, non-volatile optical disc may be provided. In these cases, each drive may be connected to a bus (not shown) by one or more data media interfaces. The memory 620 may include a computer program product 626 having one or more program modules configured to perform the various methods or actions of the various embodiments of the present disclosure.
The communication unit 640 enables communication with other computing devices via a communication medium. Additionally, the functions of the components of the computing device/server 600 may be implemented by a single computing cluster or by multiple computing machines that are able to communicate over communication connections. Accordingly, the computing device/server 600 may operate in a networked environment using logical connections to one or more other servers, a network personal computer (PC), or another network node.
The input device 650 may be one or more input devices, such as a mouse, a keyboard, or a trackball. The output device 660 may be one or more output devices, such as a display, a speaker, or a printer. As needed, the computing device/server 600 may also communicate, via the communication unit 640, with one or more external devices (not shown) such as storage devices and display devices, with one or more devices that enable a user to interact with the computing device/server 600, or with any device (e.g., a network card or a modem) that enables the computing device/server 600 to communicate with one or more other computing devices. Such communication may be performed via an input/output (I/O) interface (not shown).
According to an exemplary implementation of the present disclosure, a computer-readable storage medium is provided on which one or more computer instructions are stored, wherein the one or more computer instructions are executed by a processor to implement the method described above.
Aspects of the present disclosure are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products implemented according to the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processing unit of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium and cause a computer, a programmable data processing apparatus, and/or other devices to operate in a particular manner, so that the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions implementing aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or another device, causing a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, so that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the figures show the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various implementations of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions that contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
Various implementations of the present disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed implementations. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described implementations. The terminology used herein was chosen to best explain the principles of the implementations, their practical applications, or their improvements over technologies available in the market, or to enable others of ordinary skill in the art to understand the implementations disclosed herein.

Claims (20)

  1. A method of training a relation extraction model, comprising:
    determining, based on a given triple consisting of a target entity pair in a document and a target relation associated with the target entity pair, a probability distribution of a set of rules conditioned on the given triple, wherein the target relation is selected from a set of relations for describing connections between entity pairs in the document, and the set of rules describes the logic of the connection between the entities of the target entity pair;
    determining, based on the probability distribution of the set of rules conditioned on the given triple, a probability distribution of a score conditioned on the given triple, the score indicating whether the target relation is valid for the target entity pair in the document; and
    obtaining the trained relation extraction model by maximizing, based on a label value corresponding to the score, a likelihood function of parameters of the probability distribution of the score conditioned on the given triple.
  2. The method of claim 1, wherein determining the probability distribution of the score conditioned on the given triple comprises:
    determining, based on the set of rules, at least one path between the entities of the target entity pair;
    determining, based on the entity pairs traversed by the at least one path and the associated relations, a probability distribution of the score conditioned on the given triple and the set of rules; and
    determining the probability distribution of the score conditioned on the given triple based on the probability distribution of the set of rules conditioned on the given triple and the probability distribution of the score conditioned on the given triple and the set of rules.
  3. The method of claim 2, wherein maximizing the likelihood function of the parameters of the probability distribution of the score conditioned on the given triple comprises:
    determining a posterior probability distribution of the set of rules based on current values of the parameters; and
    determining updated values of the parameters by maximizing the likelihood function based on the posterior probability distribution of the set of rules.
  4. The method of claim 3, wherein determining the posterior probability distribution of the set of rules comprises:
    determining a score function for each rule in the set of rules based on the probability distribution of the set of rules conditioned on the given triple, the entity pairs traversed by the at least one path and the associated relations, and the label value;
    determining a posterior probability distribution of each rule based on the score function for that rule; and
    determining, based on the posterior probability distribution of each rule and the number of rules in the set of rules, an approximate posterior probability distribution of the set of rules to serve as the posterior probability distribution of the set of rules.
  5. The method of claim 4, wherein maximizing the likelihood function comprises:
    maximizing a lower bound of the likelihood function, the lower bound of the likelihood function being associated with the approximate posterior probability distribution of the set of rules.
  6. The method of claim 5, wherein maximizing the lower bound of the likelihood function comprises:
    sampling a first set of updated rules based on the approximate posterior probability distribution of the set of rules;
    updating, based on the first set of updated rules, the probability distribution of the set of rules conditioned on the given triple;
    sampling a second set of updated rules based on the updated probability distribution of the set of rules conditioned on the given triple; and
    updating, based on the second set of updated rules, the probability distribution of the score conditioned on the given triple and the set of rules.
  7. The method of claim 1, wherein each rule in the set of rules is represented by a sequence of multiple relations from the set of relations.
  8. The method of claim 1, wherein maximizing the likelihood function of the parameters of the probability distribution of the score conditioned on the given triple comprises:
    performing maximum likelihood estimation of the parameters using an expectation-maximization algorithm.
  9. A method for relation extraction, comprising:
    generating, based on a given triple consisting of a target entity pair in a document and a target relation associated with the target entity pair, a set of rules describing the logic of the connection between the entities of the target entity pair, wherein the target relation is selected from a set of relations for describing connections between entity pairs in the document;
    determining, based on the set of rules, at least one path between the entities of the target entity pair; and
    determining, based on the entity pairs traversed by the at least one path and the associated relations, a score indicating whether the target relation is valid for the target entity pair in the document.
  10. The method of claim 9, wherein determining the at least one path between the entities of the target entity pair comprises:
    determining, for each rule in the set of rules, a corresponding path, wherein the path starts at the first entity of the target entity pair and ends at the last entity of the target entity pair; and
    the logic of the connections between the entity pairs traversed by the path satisfies the rule.
  11. The method of claim 9, wherein each rule in the set of rules is represented by a sequence of multiple relations from the set of relations.
  12. An apparatus for training a relation extraction model, comprising:
    a rule probability determination module configured to determine, based on a given triple consisting of a target entity pair in a document and a target relation associated with the target entity pair, a probability distribution of a set of rules conditioned on the given triple, wherein the target relation is selected from a set of relations for describing connections between entity pairs in the document, and the set of rules describes the logic of the connection between the entities of the target entity pair;
    a score probability determination module configured to determine, based on the probability distribution of the set of rules conditioned on the given triple, a probability distribution of a score conditioned on the given triple, the score indicating whether the target relation is valid for the target entity pair in the document; and
    an optimization module configured to obtain the trained relation extraction model by maximizing, based on a label value corresponding to the score, a likelihood function of parameters of the probability distribution of the score conditioned on the given triple.
  13. The apparatus of claim 12, wherein the score probability determination module comprises:
    a path finding module configured to determine, based on the set of rules, at least one path between the entities of the target entity pair;
    a first probability determination module configured to determine, based on the entity pairs traversed by the at least one path and the associated relations, a probability distribution of the score conditioned on the given triple and the set of rules; and
    a second probability determination module configured to determine the probability distribution of the score conditioned on the given triple based on the probability distribution of the set of rules conditioned on the given triple and the probability distribution of the score conditioned on the given triple and the set of rules.
  14. The apparatus of claim 13, wherein the optimization module comprises:
    a posterior probability determination module configured to determine a posterior probability distribution of the set of rules based on current values of the parameters; and
    a likelihood function maximization module configured to determine updated values of the parameters by maximizing the likelihood function based on the posterior probability distribution of the set of rules.
  15. The apparatus of claim 14, wherein the posterior probability determination module comprises:
    a score function determination module configured to determine a score function for each rule in the set of rules based on the probability distribution of the set of rules conditioned on the given triple, the entity pairs traversed by the at least one path and the associated relations, and the label value;
    a first posterior probability determination module configured to determine a posterior probability distribution of each rule based on the score function for that rule; and
    a second posterior probability determination module configured to determine, based on the posterior probability distribution of each rule and the number of rules in the set of rules, an approximate posterior probability distribution of the set of rules to serve as the posterior probability distribution of the set of rules.
  16. The apparatus of claim 15, wherein the likelihood function maximization module comprises:
    a lower-bound maximization module configured to maximize a lower bound of the likelihood function, the lower bound of the likelihood function being associated with the approximate posterior probability distribution of the set of rules.
  17. An apparatus for relation extraction, comprising:
    a rule generation module configured to generate, based on a given triple consisting of a target entity pair in a document and a target relation associated with the target entity pair, a set of rules describing the logic of the connection between the entities of the target entity pair, wherein the target relation is selected from a set of relations for describing connections between entity pairs in the document;
    a path determination module configured to determine, based on the set of rules, at least one path between the entities of the target entity pair; and
    a score determination module configured to determine, based on the entity pairs traversed by the at least one path and the associated relations, a score indicating whether the target relation is valid for the target entity pair in the document.
  18. The apparatus of claim 17, wherein the path determination module comprises:
    a path exploration module configured to determine, for each rule in the set of rules, a corresponding path, wherein the path starts at the first entity of the target entity pair and ends at the last entity of the target entity pair; and
    the logic of the connections between the entity pairs traversed by the path satisfies the rule.
  19. An electronic device, comprising:
    a memory and a processor;
    wherein the memory is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the method of any one of claims 1 to 11.
  20. A computer-readable storage medium having one or more computer instructions stored thereon, wherein the one or more computer instructions are executed by a processor to implement the method of any one of claims 1 to 11.
PCT/CN2022/116286 2021-09-30 2022-08-31 Method and apparatus for relationship extraction, device and medium WO2023051142A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111161205.4A CN113901151B (en) 2021-09-30 2021-09-30 Method, apparatus, device and medium for relation extraction
CN202111161205.4 2021-09-30

Publications (1)

Publication Number Publication Date
WO2023051142A1 true WO2023051142A1 (en) 2023-04-06

Family

ID=79189839

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/116286 WO2023051142A1 (en) 2021-09-30 2022-08-31 Method and apparatus for relationship extraction, device and medium

Country Status (2)

Country Link
CN (1) CN113901151B (en)
WO (1) WO2023051142A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113901151B (en) * 2021-09-30 2023-07-04 北京有竹居网络技术有限公司 Method, apparatus, device and medium for relation extraction

Also Published As

Publication number Publication date
CN113901151B (en) 2023-07-04
CN113901151A (en) 2022-01-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22874533

Country of ref document: EP

Kind code of ref document: A1