CN114385812A - Relation extraction method and system for text - Google Patents

Info

Publication number
CN114385812A
Authority
CN
China
Prior art keywords
relationship
entity
label
labels
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111598713.9A
Other languages
Chinese (zh)
Inventor
杨一帆
李茂龙
施淼元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sipic Technology Co Ltd
Original Assignee
Sipic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sipic Technology Co Ltd filed Critical Sipic Technology Co Ltd
Priority to CN202111598713.9A priority Critical patent/CN114385812A/en
Publication of CN114385812A publication Critical patent/CN114385812A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295: Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides a relation extraction method for text. The method comprises: encoding the text with BERT, then copying the encoding result along the sequence dimension and raising its dimension to obtain a matrix based on segment arrangement; performing multi-label classification on the character string corresponding to each coordinate in the matrix to obtain a label set for each coordinate; and traversing all head labels, tail labels and entity labels in the label sets of the coordinates, pairing them by handshake-style labeling to extract relationships, and determining at least one triple of a defined relationship or an open relationship that represents the relationship among the entities in the text. An embodiment of the invention also provides a relation extraction system for text. By using the configured label set and pairing-based relation extraction, the embodiments can perform defined relationship extraction and open relationship extraction at the same time, reduce the loss of encoding information to a large extent, and accurately represent conflict-free multi-dimensional relation triples.

Description

Relation extraction method and system for text
Technical Field
The invention relates to the field of natural language processing, and in particular to a relation extraction method and system for text.
Background
Relation extraction is a basic task in natural language processing. It is widely used in text mining, information retrieval, intelligent question answering and similar fields, where it plays a very important role.
For relation extraction, the prior art uses:
1. A Chinese unsupervised open entity relation extraction method based on dependency semantics. Chinese word segmentation, named entity recognition, part-of-speech tagging and dependency parsing are performed on the text; two entities are selected at random from the recognized named entities and the dependency path between them is found; if the dependency path matches a predefined pattern, a relation triple is obtained by further analysis.
2. A neural network relation extraction method. Each sentence and its related entities are convolved to obtain a sentence vector representation for the entities, a sentence-level attention mechanism is applied to obtain a combined sentence vector representation, and this representation is used as the feature for predicting the relation.
In the process of implementing the invention, the inventors found that at least the following problems exist in the related art:
Most of the frameworks adopted by the above methods are pipeline models. As a result, defined relationship extraction and open relationship extraction cannot be performed at the same time: extraction has to proceed step by step, end-to-end extraction is not possible, the two kinds of extraction cannot be integrated, and the extracted relationships may be incomplete or may conflict.
Disclosure of Invention
To at least solve the problem in the prior art that extracted relationships may be incomplete or conflicting, in a first aspect, an embodiment of the present invention provides a relation extraction method for text, including:
coding the text by using BERT, copying and raising the dimension of the sequence of the coding result to obtain a matrix based on segment arrangement;
performing multi-label classification on the character strings corresponding to each coordinate in the matrix to obtain a label set of each coordinate;
traversing all head labels, tail labels and entity labels in the label set of each coordinate, performing handshake type labeling pairing to perform relationship extraction, and determining at least one triple of a defined relationship or an open relationship for representing the relationship among the entities in the text.
In a second aspect, an embodiment of the present invention provides a relationship extraction system for text, including:
a matrix determination program module, configured to encode the text with BERT, and to copy the encoding result along the sequence dimension and raise its dimension to obtain a matrix based on segment arrangement;
a label classification program module, configured to perform multi-label classification on the character string corresponding to each coordinate in the matrix to obtain a label set for each coordinate;
and an annotation pairing program module, configured to traverse all head labels, tail labels and entity labels in the label set of each coordinate, perform handshake-style labeling pairing to extract relationships, and determine at least one triple of a defined relationship or an open relationship that represents the relationship among the entities in the text.
In a third aspect, an electronic device is provided, comprising: at least one processor and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the steps of the relation extraction method for text according to any embodiment of the invention.
In a fourth aspect, an embodiment of the present invention provides a storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the relation extraction method for text according to any embodiment of the present invention.
Beneficial effects of the embodiments of the invention: by using the configured label set and pairing-based relation extraction, defined relationship extraction and open relationship extraction can be performed at the same time. The loss of encoding information is reduced to a large extent, and multi-dimensional relation triples can be represented accurately. After the multi-dimensional relation triples are determined, they are further checked for conflicts, which further ensures the accuracy of the extracted relationships.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a flowchart of a relationship extraction method for text according to an embodiment of the present invention;
Fig. 2 is a matrix diagram illustrating a relationship extraction method for texts according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a matrix labeling pairing for a method for extracting a relationship of a text according to an embodiment of the present invention;
Fig. 4 is a flowchart of a relationship extraction method for a text according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a relationship extraction system for texts according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an embodiment of an electronic device for relation extraction of texts according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a method for extracting a relationship of a text according to an embodiment of the present invention, which includes the following steps:
S11: encoding the text with BERT, then copying the encoding result along the sequence dimension and raising its dimension to obtain a matrix based on segment arrangement;
S12: performing multi-label classification on the character string corresponding to each coordinate in the matrix to obtain a label set for each coordinate;
S13: traversing all head labels, tail labels and entity labels in the label set of each coordinate, performing handshake-style labeling pairing to extract relationships, and determining at least one triple of a defined relationship or an open relationship that represents the relationship among the entities in the text.
In the embodiment, the method can be applied to the fields of text mining, information retrieval, intelligent question answering and the like, and the relation triples in the text are extracted to represent the relation of the text. The text may be a conversation between users, a question input by a user, or an article.
For step S11, the text $X = [x_1, x_2, \dots, x_n]$ is encoded by BERT to give the output $H = [h_1, h_2, \dots, h_n]$. BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model built on the Transformer encoder.
The encoded output is copied along the sequence-length dimension and raised in dimension to obtain a matrix $A$, its transpose $A^T$, and a logits output matrix $S$ with $n \times n$ positions, as shown in Fig. 2. Here the logits are the outputs of the final fully connected layer.
Each element $s_{ij}$ of the output matrix is given by the following formula:
[Formula image BDA0003432426260000041; not reproduced in the text of this publication]
where the operator shown in image BDA0003432426260000042 is vector concatenation, the second operator in the formula is vector dot multiplication, $W_1$ and $W_2$ are weight matrices, and $b_1$ and $b_2$ are bias matrices. The final matrix $S$ has dimension $n \times n \times T$, where $T$ is the number of labels in the label set $L_{all}$.
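The shape bookkeeping of step S11 can be sketched in plain Python. This is only an illustration under stated assumptions: the scoring formula itself appears only as an image in this publication, so the sketch substitutes a single linear layer over the concatenation of $h_i$ and $h_j$, and the names `pairwise_logits`, `H`, `W` and `b` are hypothetical.

```python
import random

def pairwise_logits(H, W, b):
    """Return S with shape n x n x T, where S[i][j] scores the pair (h_i, h_j).

    Assumption: one linear layer over the concatenated pair. The formula in
    the publication (two weight matrices, dot multiplication) is not shown here.
    """
    n, T = len(H), len(W)
    S = []
    for i in range(n):
        row = []
        for j in range(n):
            pair = H[i] + H[j]  # list concatenation stands in for vector concatenation
            row.append([sum(w * x for w, x in zip(W[t], pair)) + b[t] for t in range(T)])
        S.append(row)
    return S

random.seed(0)
n, d, T = 4, 3, 5  # sentence length, hidden size, number of labels (toy values)
H = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]      # stand-in for BERT output
W = [[random.uniform(-1, 1) for _ in range(2 * d)] for _ in range(T)]  # hypothetical weights
b = [0.0] * T
S = pairwise_logits(H, W, b)
print(len(S), len(S[0]), len(S[0][0]))  # 4 4 5
```

Every ordered pair of positions gets its own vector of $T$ logits, which is what allows a single coordinate to carry several labels at once.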
For step S12, the label at each position of the matrix $S$ determined in step S11 is not unique: an element $s_{ij}$ of $S$ may carry several labels, so multi-label classification is performed on $s_{ij}$. Each $s_{ij}$ has dimension $1 \times T$, that is, a 0/1 binary classification is made for every label. A value of 0 for the $t$-th label of $s_{ij}$ means that $s_{ij}$ does not have label $t$, and a value of 1 means that it does. The objective function is therefore:
[Formula images BDA0003432426260000043 and BDA0003432426260000044; not reproduced in the text of this publication]
where $|D|$ is the number of training samples, $n$ is the sentence length, $T$ is the total number of labels, $P_{ijt}$ is the probability that the $t$-th label of $s_{ij}$ takes the value 1, and $y_{ijt}$ is the true value (0 or 1) of the $t$-th label of $s_{ij}$. In the prediction stage a threshold is set, for example 0.5 (it can be adjusted to the actual situation and is not limited here); when $P_{ijt}$ is greater than the threshold, $s_{ij}$ is considered to have label $L_t$.
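As a rough illustration of step S12, the sketch below computes a binary cross-entropy over every $(i, j, t)$ entry and then applies the 0.5 threshold at prediction time. It assumes that $P_{ijt}$ is obtained with a sigmoid and that the loss is a plain mean; the objective in the publication is only available as an image, so its exact normalization may differ. The function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce_loss(logits, targets):
    """Mean binary cross-entropy over all (i, j, t) entries (assumed normalization)."""
    total, count = 0.0, 0
    for row_s, row_y in zip(logits, targets):
        for cell_s, cell_y in zip(row_s, row_y):
            for s, y in zip(cell_s, cell_y):
                p = sigmoid(s)
                total -= y * math.log(p) + (1 - y) * math.log(1 - p)
                count += 1
    return total / count

def decode(logits, labels, threshold=0.5):
    """Return every (i, j, label) element whose probability exceeds the threshold."""
    return {(i, j, labels[t])
            for i, row in enumerate(logits)
            for j, cell in enumerate(row)
            for t, s in enumerate(cell)
            if sigmoid(s) > threshold}

labels = ["person", "time"]
# Toy 2 x 2 x 2 logits: only coordinate (0, 1) scores high for "person".
logits = [[[-5.0, -5.0], [2.2, -6.0]],
          [[-5.0, -5.0], [-5.0, -5.0]]]
targets = [[[0, 0], [1, 0]],
           [[0, 0], [0, 0]]]
print(decode(logits, labels))  # {(0, 1, 'person')}
```

Because each label is decided independently, raising or lowering the threshold changes how many labels a coordinate receives without affecting the other coordinates.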
The label set includes: a subject entity label, an object entity label, a predicate entity label, a head-character label for two entities, a tail-character label for two entities, a head-character label for a subject-predicate entity pair, a tail-character label for a subject-predicate entity pair, a head-character label for an object-predicate entity pair, and a tail-character label for an object-predicate entity pair.
Specifically, the label set $L_{all}$ comprises:
Label: time; meaning: a time entity; task: named entity recognition;
Label: person; meaning: a person entity; task: named entity recognition;
Label: subject; meaning: a subject entity; task: open relationship and defined relationship extraction;
Label: object; meaning: an object entity; task: open relationship and defined relationship extraction;
Label: nation_head2head; meaning: the head characters of two entities that have an ethnicity (nation) relationship; task: defined relationship extraction;
Label: nation_tail2tail; meaning: the tail characters of two entities that have an ethnicity (nation) relationship; task: defined relationship extraction;
Label: predicate; meaning: a predicate entity; task: open relationship extraction;
Label: subject_head2predicate_head; meaning: the head characters of a subject entity and a predicate entity that have an open relationship; task: open relationship extraction;
Label: subject_tail2predicate_tail; meaning: the tail characters of a subject entity and a predicate entity that have an open relationship; task: open relationship extraction;
Label: object_head2predicate_head; meaning: the head characters of an object entity and a predicate entity that have an open relationship; task: open relationship extraction;
Label: object_tail2predicate_tail; meaning: the tail characters of an object entity and a predicate entity that have an open relationship; task: open relationship extraction.
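The label set listed above can be written down as a small data structure. The sketch below is illustrative only: the helper `build_label_set` and the constant names are hypothetical, and the only thing taken from the text is the label names themselves.

```python
# Entity labels and the four open-relationship pairing labels listed above.
ENTITY_LABELS = ["time", "person", "subject", "object", "predicate"]
OPEN_PAIR_LABELS = [
    "subject_head2predicate_head",
    "subject_tail2predicate_tail",
    "object_head2predicate_head",
    "object_tail2predicate_tail",
]

def build_label_set(relations):
    """Build L_all: entity labels, open pairing labels, and two labels per defined relation."""
    labels = list(ENTITY_LABELS) + list(OPEN_PAIR_LABELS)
    for rel in relations:
        labels += [f"{rel}_head2head", f"{rel}_tail2tail"]
    return labels

L_all = build_label_set(["nation"])
print(len(L_all))  # 11, the value of T for this label set
```

Each additional defined relationship type contributes two labels, so $T$ grows linearly with the number of relationship types.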
In practice the labels are not limited to those above and can be added or adjusted as needed. For example, for sentences about family structure there may be the labels: Label: parent_head2head; meaning: the head characters of two entities that have a parent-child relationship; task: defined relationship extraction; Label: parent_tail2tail; meaning: the tail characters of two entities that have a parent-child relationship; task: defined relationship extraction. Further labels can be added for other requirements and are not described here.
Taking the text "Yao Ming loves playing basketball" as an example, the encoding result is copied along the sequence dimension and raised in dimension to obtain a matrix based on segment arrangement. Viewed intuitively, the abscissa of the matrix runs from left to right over the characters "Yao", "Ming", "Ai", "Kai", "basket", "ball", and the ordinate runs from top to bottom over the same characters.
Multi-label classification is then performed on the character string corresponding to each coordinate of the matrix. Take the coordinate (0, 1) of the matrix expanded by segment arrangement: the character string with start position 0 and end position 1 (abscissa "Yao", ordinate "Ming") is "Yao Ming". The probability $P_{ijt}$ for (Yao, Ming, person) is 0.9, which is above the preset 0.5, so "person" is one of its labels, i.e. "Yao Ming" is a person entity. The probability $P_{ijt}$ for (Yao, Ming, time) is 0, which is below the preset 0.5, so "time" is not one of its labels. Every label of the label set is judged in the same way. If several labels match, they together form the label set of coordinate (0, 1). The other coordinates are judged in the same way and are not described again here.
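The coordinate-to-string mapping in this example can be sketched as follows. The token list uses the English glosses given in the text, joined with spaces for readability (in the original Chinese each position is a single character); the helper names are hypothetical, and the two probabilities are the ones quoted above.

```python
def span_text(tokens, i, j):
    """Text of the segment that starts at position i and ends at position j (inclusive)."""
    return " ".join(tokens[i:j + 1])

def label_set(probabilities, threshold=0.5):
    """Labels whose probability exceeds the threshold."""
    return {label for label, p in probabilities.items() if p > threshold}

tokens = ["Yao", "Ming", "Ai", "Kai", "basket", "ball"]
probs_01 = {"person": 0.9, "time": 0.0}  # probabilities for coordinate (0, 1)

print(span_text(tokens, 0, 1), label_set(probs_01))  # Yao Ming {'person'}
```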
For step S13, take a more complicated text as an example, "Yao Ming's current wife is Ye Li". Processing it through steps S11 and S12 gives a label set for each coordinate, as shown in Fig. 3, where S: subject, a subject entity; P: predicate, a predicate entity; O: object, an object entity; SH: subject_head2predicate_head, the head characters of a subject entity and a predicate entity that have an open relationship; ST: subject_tail2predicate_tail, the tail characters of a subject entity and a predicate entity that have an open relationship; OH: object_head2predicate_head, the head characters of an object entity and a predicate entity that have an open relationship; OT: object_tail2predicate_tail, the tail characters of an object entity and a predicate entity that have an open relationship.
Each element has the form $(i, j, l)$, where $i, j$ is a position in the matrix and $l$ ($l \in L_{all}$) indicates that the position has label $l$. All head and tail labels (the head-to-head and tail-to-tail pairing labels) and all entity labels are then traversed to obtain unambiguous triples.
Taking the more complex open relationship extraction as an example, suppose three open relationship extraction elements exist for the input text $X$: $(i_s, j_s, \text{subject})$, $(i_p, j_p, \text{predicate})$ and $(i_o, j_o, \text{object})$ (put simply, the sentence has a subject, a predicate and an object), and suppose that matrix position $(i_s, i_p)$ has the label "subject_head2predicate_head", matrix position $(j_s, j_p)$ has the label "subject_tail2predicate_tail", matrix position $(i_o, i_p)$ has the label "object_head2predicate_head", and matrix position $(j_o, j_p)$ has the label "object_tail2predicate_tail". Then the triple $\langle X[i_s:j_s], X[i_p:j_p], X[i_o:j_o] \rangle$ is obtained. This labeling scheme is referred to as handshake-style labeling. For the example above it yields <Yao Ming, wife, Ye Li>.
In one embodiment, defined-relationship triples are determined from triples of preset relationship types, and open-relationship triples are determined from the non-defined-relationship triples in the text.
In this embodiment, defined relationship extraction extracts relation triples of preset relationship types. Text of this type is relatively simple in structure, and the label set is dominated by "{REL}_head2head" and "{REL}_tail2tail", where REL is the value of a predefined relationship type. When position (i, j) of the matrix expanded by segment arrangement has "{REL}_head2head", an entity (a named entity or a subject entity) starting at i has the REL relationship with another entity (a named entity or an object entity) starting at j; "{REL}_tail2tail" marks the end positions in the same way. Because the named entity recognition task cannot cover every entity type, the subjects and objects of some relation triples have no strong named-entity label, which is why the "subject" and "object" labels are added. Taking the matrix of the text "Yao Ming, Han ethnicity" as an example, position (0, 1) (abscissa "Yao", ordinate "Ming") has the label "person", position (3, 4) (abscissa "Han", ordinate "zu", the two characters of "Han ethnicity") has the label "object", position (0, 3) (abscissa "Yao", ordinate "Han") has the label "nation_head2head", and position (1, 4) (abscissa "Ming", ordinate "zu") has the label "nation_tail2tail". The relationship between the person entity "Yao Ming" and the object entity "Han ethnicity" is then uniquely determined, giving the triple <Yao Ming, ethnicity, Han ethnicity>.
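The pairing rule for defined relationships can be sketched as a small decoder over the predicted $(i, j, \text{label})$ elements. This is a minimal illustration, not the implementation of the embodiment: `decode_defined` and `NAMED_ENTITY_LABELS` are hypothetical names, and the input elements are the four positions of the "Yao Ming, Han ethnicity" example.

```python
NAMED_ENTITY_LABELS = {"person", "time"}

def decode_defined(elements, relation):
    """Pair entity spans whose start positions share {REL}_head2head and
    whose end positions share {REL}_tail2tail."""
    subjects = sorted({(i, j) for i, j, l in elements if l in NAMED_ENTITY_LABELS | {"subject"}})
    objects = sorted({(i, j) for i, j, l in elements if l in NAMED_ENTITY_LABELS | {"object"}})
    heads = {(i, j) for i, j, l in elements if l == f"{relation}_head2head"}
    tails = {(i, j) for i, j, l in elements if l == f"{relation}_tail2tail"}
    triples = []
    for s_start, s_end in subjects:
        for o_start, o_end in objects:
            if (s_start, o_start) in heads and (s_end, o_end) in tails:
                triples.append(((s_start, s_end), relation, (o_start, o_end)))
    return triples

elements = {
    (0, 1, "person"),
    (3, 4, "object"),
    (0, 3, "nation_head2head"),
    (1, 4, "nation_tail2tail"),
}
print(decode_defined(elements, "nation"))  # [((0, 1), 'nation', (3, 4))]
```

The spans (0, 1) and (3, 4) correspond to "Yao Ming" and "Han ethnicity", so the decoded triple matches the one given in the text. Four elements take part in the match, which is consistent with the element count of 4 quoted for defined relationship extraction.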
Open relationship extraction extracts the non-defined relation triples that appear in the text. Text of this type is complex in structure. The label set is dominated by "subject_head2predicate_head", "subject_tail2predicate_tail", "object_head2predicate_head" and "object_tail2predicate_tail", and the task additionally identifies the predicate entity "predicate" (put simply, a sentence may have a subject-predicate-object structure, which is closer to how users speak in everyday conversation). When position (i, j) of the matrix expanded by segment arrangement has "subject_head2predicate_head", a subject entity starting at i and a predicate entity starting at j belong to the same open relationship; the other labels are read in the same way. Taking the text "《Double-Person Formation》 is a game issued by EA" as an example (positions index the characters of the original Chinese sentence), position (1, 4) (abscissa "double", ordinate "row") has the label "subject", position (10, 11) (abscissa "E", ordinate "A") has the label "object", position (12, 13) (abscissa "send", ordinate "row") has the label "predicate", position (1, 12) (abscissa "double", ordinate "send") has the label "subject_head2predicate_head", position (4, 13) (abscissa "row", ordinate "row") has the label "subject_tail2predicate_tail", position (10, 12) (abscissa "E", ordinate "send") has the label "object_head2predicate_head", and position (11, 13) (abscissa "A", ordinate "row") has the label "object_tail2predicate_tail".
It can then be uniquely determined that the subject entity "Double-Person Formation" and the object entity "EA" have the open relationship "issue", giving the triple <Double-Person Formation, issue, EA>.
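The handshake-style pairing for open relationships can be sketched in the same spirit. The decoder below checks the four pairing labels between a subject span, a predicate span and an object span; `decode_open` is a hypothetical name, and the input elements are the seven positions quoted in the example above.

```python
def decode_open(elements):
    """Return (subject_span, predicate_span, object_span) for every handshake match."""
    def spans(label):
        return sorted((i, j) for i, j, l in elements if l == label)

    def has(i, j, label):
        return (i, j, label) in elements

    triples = []
    for s_start, s_end in spans("subject"):
        for p_start, p_end in spans("predicate"):
            # Subject and predicate must shake hands at both heads and tails.
            if not (has(s_start, p_start, "subject_head2predicate_head")
                    and has(s_end, p_end, "subject_tail2predicate_tail")):
                continue
            for o_start, o_end in spans("object"):
                if (has(o_start, p_start, "object_head2predicate_head")
                        and has(o_end, p_end, "object_tail2predicate_tail")):
                    triples.append(((s_start, s_end), (p_start, p_end), (o_start, o_end)))
    return triples

elements = {
    (1, 4, "subject"), (10, 11, "object"), (12, 13, "predicate"),
    (1, 12, "subject_head2predicate_head"), (4, 13, "subject_tail2predicate_tail"),
    (10, 12, "object_head2predicate_head"), (11, 13, "object_tail2predicate_tail"),
}
print(decode_open(elements))  # [((1, 4), (12, 13), (10, 11))]
```

The three spans are the title, the predicate "issue" and "EA", in that order. Seven elements take part in the match, which is consistent with the element count of 7 quoted for open relationship extraction.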
In this embodiment, by using the configured label set and pairing-based relation extraction, defined relationship extraction and open relationship extraction can both be performed. The loss of encoding information is reduced to a large extent, and multi-dimensional relation triples can be represented accurately.
In one embodiment, after determining the at least one triple of a defined relationship or an open relationship, the method further comprises:
when an entity relationship conflict exists among several triples, determining for each of those triples a score equal to the average of the probabilities of all its labels, and deleting the lowest-scoring triple to resolve the entity relationship conflict.
In this embodiment, determining a triple in the steps above requires identifying not only the subject, predicate and object (subject and object only for defined relationship extraction) but also the labels that mark the boundaries of the entities. The final score of a triple is therefore defined as the sum of the probabilities of all its label elements divided by their number:
$$\mathrm{score} = \frac{1}{N} \sum_{(i, j, l)} p(i, j, l)$$
where $N$ is the number of elements that define the triple (typically 4 for defined relationship extraction and 7 for open relationship extraction, with which good results can be obtained) and $p(i, j, l)$ is the probability score of one element. The score can be used to balance precision and recall when producing triples, and to resolve conflicts between triples. Take the text "President a of country A meets President b of country B" as an example. Suppose the raw result contains the triples <country A, president, a>, <country B, president, b> and <country A, president, b>. These triples clearly conflict. Their scores are computed with the formula above and sorted, and the conflicting triple with the lower score, <country A, president, b>, is removed, which resolves the entity relationship conflict. The overall framework of the method is shown in Fig. 4.
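The scoring and conflict-resolution step can be sketched as follows. Two things here are assumptions rather than statements of the embodiment: the text does not define what counts as a conflict, so the sketch treats two triples as conflicting when they share the relation and exactly one of subject and object; and the probability values are invented for illustration. All function names are hypothetical.

```python
def triple_score(probabilities):
    """Average of the probabilities of all label elements of a triple."""
    return sum(probabilities) / len(probabilities)

def conflicts(a, b):
    """Assumed rule: same relation, and exactly one of subject/object is shared."""
    (s1, r1, o1), (s2, r2, o2) = a, b
    return r1 == r2 and ((s1 == s2) != (o1 == o2))

def resolve(scored):
    """Repeatedly delete the lowest-scoring triple that is involved in a conflict."""
    kept = dict(scored)
    while True:
        involved = {t for t in kept for u in kept if t != u and conflicts(t, u)}
        if not involved:
            return kept
        del kept[min(involved, key=kept.get)]

# Hypothetical element probabilities (N = 4 elements per defined-relationship triple).
scored = {
    ("country A", "president", "a"): triple_score([0.9, 0.95, 0.9, 0.85]),
    ("country B", "president", "b"): triple_score([0.9, 0.9, 0.8, 0.8]),
    ("country A", "president", "b"): triple_score([0.9, 0.9, 0.3, 0.3]),
}
for triple in sorted(resolve(scored)):
    print(triple)
```

With these made-up numbers, the triple <country A, president, b> has the lowest score among the conflicting triples and is the one removed, as in the example in the text.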
In this embodiment, after the multi-dimensional relation triples are determined, they are further checked for conflicts, which further ensures the accuracy of the extracted relationships.
Fig. 5 is a schematic structural diagram of a relationship extraction system for texts according to an embodiment of the present invention, which can execute the relationship extraction method for texts according to any of the above embodiments and is configured in a terminal.
This embodiment provides a relation extraction system 10 for text, which includes: a matrix determination program module 11, a label classification program module 12 and an annotation pairing program module 13.
The matrix determination program module 11 is configured to encode the text with BERT, and to copy the encoding result along the sequence dimension and raise its dimension to obtain a matrix based on segment arrangement; the label classification program module 12 is configured to perform multi-label classification on the character string corresponding to each coordinate in the matrix to obtain a label set for each coordinate; the annotation pairing program module 13 is configured to traverse all head labels, tail labels and entity labels in the label set of each coordinate, perform handshake-style labeling pairing to extract relationships, and determine at least one triple of a defined relationship or an open relationship that represents the relationship among the entities in the text.
Further, the system also comprises a conflict resolution program module, configured to:
when an entity relationship conflict exists among several triples, determine for each of those triples a score equal to the average of the probabilities of all its labels, and delete the lowest-scoring triple to resolve the entity relationship conflict.
An embodiment of the invention also provides a non-volatile computer storage medium storing computer-executable instructions that can perform the relation extraction method for text of any of the method embodiments above.
In one embodiment, the non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
coding the text by using BERT, copying and raising the dimension of the sequence of the coding result to obtain a matrix based on segment arrangement;
performing multi-label classification on the character strings corresponding to each coordinate in the matrix to obtain a label set of each coordinate;
traversing all head labels, tail labels and entity labels in the label set of each coordinate, performing handshake type labeling pairing to perform relationship extraction, and determining at least one triple of a defined relationship or an open relationship for representing the relationship among the entities in the text.
The non-volatile computer-readable storage medium may be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present invention. One or more program instructions are stored in the non-volatile computer-readable storage medium and, when executed by a processor, perform the relation extraction method for text of any of the method embodiments above.
Fig. 6 is a schematic hardware structure diagram of an electronic device for a relationship extraction method of a text according to another embodiment of the present application, and as shown in fig. 6, the device includes:
one or more processors 610 and a memory 620, with one processor 610 being an example in fig. 6. The apparatus for the relationship extraction method of text may further include: an input device 630 and an output device 640.
The processor 610, the memory 620, the input device 630, and the output device 640 may be connected by a bus or other means, such as the bus connection in fig. 6.
The memory 620, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the relationship extraction method for text in the embodiments of the present application. The processor 610 executes various functional applications of the server and data processing by running nonvolatile software programs, instructions and modules stored in the memory 620, that is, implements the method for extracting a relationship of text according to the above method embodiment.
The memory 620 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data and the like. Further, the memory 620 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 620 optionally includes memory located remotely from processor 610, which may be connected to a mobile device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 630 may receive input numeric or character information. The output device 640 may include a display device such as a display screen.
The one or more modules are stored in the memory 620 and, when executed by the one or more processors 610, perform the relationship extraction method for text of any of the method embodiments described above.
The above product can execute the method provided by the embodiments of the present application, and has the functional modules and beneficial effects corresponding to executing the method. For technical details not described in detail in this embodiment, reference may be made to the methods provided in the embodiments of the present application.
The non-volatile computer-readable storage medium may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created according to the use of the device, and the like. Further, the non-volatile computer-readable storage medium may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the non-volatile computer-readable storage medium optionally includes memory located remotely from the processor, and such remote memory may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present invention further provides an electronic device, which includes: at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the steps of the relationship extraction method for text according to any embodiment of the present invention.
The electronic device of the embodiments of the present application may take various forms, including but not limited to:
(1) Mobile communication devices, which are characterized by mobile communication capabilities and whose primary purpose is to provide voice and data communication. Such terminals include smartphones, multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices, which belong to the category of personal computers, have computing and processing functions, and generally also support mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as tablet computers.
(3) Portable entertainment devices, which can display and play multimedia content. Such devices include audio and video players, handheld game consoles, e-book readers, smart toys, and portable in-vehicle navigation devices.
(4) Other electronic devices with data processing capabilities.
In this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The apparatus embodiments described above are merely illustrative. The units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement this without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A relationship extraction method for text, comprising:
encoding the text using BERT, and copying the sequence of the encoding result to raise its dimension, to obtain a matrix based on segment arrangement;
performing multi-label classification on the character string corresponding to each coordinate in the matrix to obtain a label set for each coordinate;
traversing all head labels, tail labels, and entity labels in the label set of each coordinate and pairing them by handshaking-style tagging to perform relationship extraction, and determining at least one triple of a defined relationship or an open relationship, the triple representing a relationship between entities in the text.
2. The method of claim 1, wherein after the determining of at least one triple of a defined relationship or an open relationship, the method further comprises:
when an entity relationship conflict exists among a plurality of triples, determining, for each of the triples, an average score from the sum of the probabilities of all labels in the triple, and deleting the triple with the lowest score among the plurality of triples to resolve the entity relationship conflict.
3. The method of claim 1, wherein the triple of the defined relationship is determined from relationship triples of a preset relationship type, and the triple of the open relationship is determined from relationship triples of a relationship in the text that is not predefined.
4. The method of claim 1, wherein the label set comprises: a subject entity label, an object entity label, a predicate entity label, head character labels of two entities, tail character labels of the two entities, head character labels of a predicate entity, tail character labels of a predicate entity, head character labels of a predicate entity, tail character labels of two entities, head character labels of a predicate entity, and tail character labels of a predicate entity.
5. A relationship extraction system for text, comprising:
a matrix determining program module, configured to encode the text using BERT, and to copy the sequence of the encoding result to raise its dimension, to obtain a matrix based on segment arrangement;
a label classification program module, configured to perform multi-label classification on the character string corresponding to each coordinate in the matrix to obtain a label set for each coordinate;
and a label pairing program module, configured to traverse all head labels, tail labels, and entity labels in the label set of each coordinate and pair them by handshaking-style tagging to perform relationship extraction, and to determine at least one triple of a defined relationship or an open relationship, the triple representing a relationship between entities in the text.
6. The system of claim 5, wherein the system further comprises a conflict resolution program module configured to:
when an entity relationship conflict exists among a plurality of triples, determine, for each of the triples, an average score from the sum of the probabilities of all labels in the triple, and delete the triple with the lowest score among the plurality of triples to resolve the entity relationship conflict.
7. The system of claim 5, wherein the triple of the defined relationship is determined from relationship triples of a preset relationship type, and the triple of the open relationship is determined from relationship triples of a relationship in the text that is not predefined.
8. The system of claim 5, wherein the label set comprises: a subject entity label, an object entity label, a predicate entity label, head character labels of two entities, tail character labels of the two entities, head character labels of a predicate entity, tail character labels of a predicate entity, head character labels of a predicate entity, tail character labels of two entities, head character labels of a predicate entity, and tail character labels of a predicate entity.
9. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any of claims 1-4.
10. A storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 4.
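The steps recited in claims 1 and 2 can be illustrated with a small sketch. The Python below is only an illustration under stated assumptions, not the claimed implementation. It assumes that "copying and raising the dimension" means pairing every encoded position h_i with every position h_j so that coordinate (i, j) stands for the character string from position i to position j; that multi-label classification is a per-label probability threshold; and that two triples conflict when they link the same subject and object with different predicates. The function names, the label names, the threshold of 0.5, the conflict rule, and all sample values are hypothetical, and the BERT encoder is replaced by hand-written vectors.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set, Tuple

Vector = Tuple[float, ...]


def build_segment_matrix(encoded: Sequence[Sequence[float]]) -> List[List[Vector]]:
    """Copy the encoded sequence along a second axis: cell (i, j) pairs h_i with h_j."""
    return [[tuple(h_i) + tuple(h_j) for h_j in encoded] for h_i in encoded]


def segment_at(text: str, i: int, j: int) -> str:
    """Character string assumed to correspond to coordinate (i, j), with i <= j."""
    return text[i : j + 1]


def label_set(probabilities: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Multi-label decision: keep every label whose probability reaches the threshold."""
    return {label for label, p in probabilities.items() if p >= threshold}


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    label_probs: Tuple[float, ...]  # probabilities of the labels that formed the triple

    @property
    def score(self) -> float:
        # Sum of the label probabilities divided by the number of labels.
        return sum(self.label_probs) / len(self.label_probs)


def conflicts(a: Triple, b: Triple) -> bool:
    # Hypothetical conflict rule: same entity pair, different predicate.
    return (a.subject, a.obj) == (b.subject, b.obj) and a.predicate != b.predicate


def resolve_conflicts(triples: Sequence[Triple]) -> List[Triple]:
    """Repeatedly delete the lowest-scoring conflicting triple until none conflict."""
    kept = list(triples)
    while True:
        clashing = [t for t in kept if any(conflicts(t, u) for u in kept if u is not t)]
        if not clashing:
            return kept
        kept.remove(min(clashing, key=lambda t: t.score))


# Made-up stand-in for a BERT encoding of the three-character text "ABC".
encoded = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
matrix = build_segment_matrix(encoded)
print(len(matrix), len(matrix[0]), matrix[0][2])  # 3 3 (0.1, 0.2, 0.5, 0.6)
print(segment_at("ABC", 0, 1))  # AB
print(label_set({"subject_entity": 0.91, "object_entity": 0.12}))  # {'subject_entity'}

candidates = [
    Triple("Alice", "works_for", "Acme", (0.9, 0.8, 0.7)),
    Triple("Alice", "founded", "Acme", (0.6, 0.5, 0.4)),
    Triple("Bob", "lives_in", "Paris", (0.3, 0.2)),
]
print([t.predicate for t in resolve_conflicts(candidates)])  # ['works_for', 'lives_in']
```

In this sketch the low-scoring "lives_in" triple survives because it conflicts with nothing; only triples involved in a conflict are candidates for deletion. A different conflict rule can be substituted by replacing the `conflicts` function.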
CN202111598713.9A 2021-12-24 2021-12-24 Relation extraction method and system for text Pending CN114385812A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111598713.9A CN114385812A (en) 2021-12-24 2021-12-24 Relation extraction method and system for text

Publications (1)

Publication Number Publication Date
CN114385812A true CN114385812A (en) 2022-04-22

Family

ID=81197188

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111598713.9A Pending CN114385812A (en) 2021-12-24 2021-12-24 Relation extraction method and system for text

Country Status (1)

Country Link
CN (1) CN114385812A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106484675A (en) * 2016-09-29 2017-03-08 北京理工大学 Fusion distributed semantic and the character relation abstracting method of sentence justice feature
CN110781683A (en) * 2019-11-04 2020-02-11 河海大学 Entity relation joint extraction method
CN113468344A (en) * 2021-09-01 2021-10-01 北京德风新征程科技有限公司 Entity relationship extraction method and device, electronic equipment and computer readable medium
CN113568969A (en) * 2021-07-30 2021-10-29 咪咕文化科技有限公司 Information extraction method, device, equipment and computer readable storage medium
CN113822026A (en) * 2021-09-10 2021-12-21 神思电子技术股份有限公司 Multi-label entity labeling method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WANG Y 等: "TPLinker:Singlestage joint extraction of entities and relations through token pair linking", 《PROCEEDINGS OF THE 28TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL LINGUISTICS》, 31 December 2020 (2020-12-31), pages 1572 - 1582 *
冯钧 等: "重叠实体关系抽取综述", 《计算机工程与应用》, vol. 58, no. 01, 15 November 2021 (2021-11-15), pages 1 - 11 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486189A (en) * 2021-06-08 2021-10-08 广州数说故事信息科技有限公司 Open knowledge graph mining method and system
CN113486189B (en) * 2021-06-08 2024-10-18 广州数说故事信息科技有限公司 Open knowledge graph mining method and system
CN114528418A (en) * 2022-04-24 2022-05-24 杭州同花顺数据开发有限公司 Text processing method, system and storage medium
CN114528418B (en) * 2022-04-24 2022-10-14 杭州同花顺数据开发有限公司 Text processing method, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination