CN115309910A - Language piece element and element relation combined extraction method and knowledge graph construction method - Google Patents


Info

Publication number: CN115309910A
Application number: CN202210859304.8A
Authority: CN (China)
Prior art keywords: question, elements, relation, discussion, head
Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN115309910B (en)
Inventors: 刘杰, 许妍欣
Current assignee: North China University of Technology; Capital Normal University (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original assignee: North China University of Technology; Capital Normal University
Application filed by North China University of Technology and Capital Normal University
Priority to CN202210859304.8A
Publication of CN115309910A; application granted; publication of CN115309910B
Current legal status: Active

Classifications

    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06F: Electric Digital Data Processing
    • G06F 16/00: Information retrieval; database structures therefor; file system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367: Ontology
    • G06F 16/33: Querying
    • G06F 16/332: Query formulation
    • G06F 16/3329: Natural language query formulation or dialogue systems
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis

Abstract

The application belongs to the technical field of artificial intelligence, and in particular relates to a method for jointly extracting discourse elements and element relations, and a knowledge graph construction method. The joint extraction method comprises the following steps: S10, for a target argumentative essay text, acquiring preset element-type question templates and generating element questions from them; S20, inputting the element questions into a pre-established question-answering framework to obtain head elements, the question-answering framework being built on machine reading comprehension; S30, inputting the head elements and the target essay text into a pre-established element relation prediction model to obtain element relations; S40, generating relation questions based on the element relations and head elements; and S50, inputting the relation questions into the pre-established question-answering framework to obtain the corresponding tail elements. The method extracts discourse elements and element relations jointly, which alleviates error propagation, and the multi-round question answering removes the limitation on the size of the extraction region.

Description

Language piece element and element relation combined extraction method and knowledge graph construction method
Technical Field
The application belongs to the technical field of artificial intelligence, and in particular relates to a method for jointly extracting discourse elements and element relations of argumentative essays based on multi-round question answering.
Background
A knowledge graph structurally represents the concepts, entities and relations of the objective world and supplies rich common-sense knowledge to intelligent applications such as recommendation and question-answering systems. In automated essay scoring, however, the currently mainstream methods evaluate an essay only from its own textual information and do not consider knowledge-level information. Constructing an essay knowledge graph, so that an essay evaluation system can analyze essays at the knowledge level, is therefore a task of real research significance.
In knowledge graph construction, knowledge extraction is a crucial step. The discourse element extraction task and the element relation extraction task aim to identify and extract the discourse element units in an argumentative essay and to determine the semantic connections between those units, such as the supporting relation between a claim and its evidence. A knowledge graph of argumentative essays built from these two tasks can provide knowledge-level information for automated essay evaluation. However, existing construction methods for such knowledge graphs face two problems:
First, discourse element extraction and element relation extraction are performed separately with deep learning methods, ignoring the semantic interaction between the two tasks; once an error occurs during element extraction, relations are classified and established over the wrong elements, producing wrong element relations, i.e., error propagation.
Second, when key information is recognized with entity recognition methods, a mismatch in granularity arises. Entity recognition targets entities with specific meanings, mainly person names, place names, organization names and proper nouns, whereas discourse elements are the argued sentences of an essay, measured in words and sentences. The extraction region of a discourse element is much larger than that of an entity and may even be a paragraph of several sentences, such as the evidence of an argument; entity recognition methods therefore cannot accurately recognize discourse elements.
These problems urgently need to be solved in the task of extracting discourse elements and element relations.
Disclosure of Invention
Technical problem to be solved
In view of the above disadvantages and shortcomings of the prior art, the present application provides a multi-round question-answering-based method for jointly extracting discourse elements and element relations of argumentative essays, a knowledge graph construction method, a device and a medium.
(II) technical scheme
To achieve this purpose, the application adopts the following technical scheme:
In a first aspect, an embodiment of the present application provides a multi-round question-answering-based method for jointly extracting discourse elements and element relations of argumentative essays, the method comprising:
S10, for a target argumentative essay text, acquiring preset element-type question templates and generating element questions from them;
S20, inputting the element questions into a pre-established question-answering framework to obtain head elements, the question-answering framework being built on machine reading comprehension;
S30, inputting the head elements and the target essay text into a pre-established element relation prediction model, built on a multi-class classifier, to obtain element relations;
S40, generating relation questions based on the element relations and head elements;
and S50, inputting the relation questions into the pre-established question-answering framework to obtain the corresponding tail elements.
Optionally, the method further comprises establishing the element-type question templates before S10.
Optionally, S20 comprises:
inputting the element question and the target essay text into BERT to obtain a BERT-based semantic representation;
inputting the semantic representation into the pre-established question-answering framework to obtain multiple question answers;
and determining the head element based on preset question weights.
Optionally, inputting the semantic representation into the pre-established question-answering framework to obtain multiple question answers comprises:
performing BIOE label classification on the hidden-layer output h_t with a softmax classification layer;
and identifying, in the labeled hidden-layer sequence, the segment from the position tagged B to the position tagged E as a question answer.
Optionally, S30 comprises:
concatenating the context representation h_{q_t} output by BERT with the head-element representation E(e_i) as the input for relation prediction:
l_i = [h_{q_t}; E(e_i)]
where E(e_i) is the representation of the head element and h_{q_t} is the context representation.
The input is passed through a softmax classifier to compute the probability of each relation type r_k for element e_i:
P_r(label = r_k | e_i) = σ(W_r · l_i + b_r)
where σ(·) is the sigmoid function, W_r ∈ R^{|R|×(d_h+d_l)}, b_r ∈ R^{|R|}, d_l is the dimension of the element label embedding, and |R| is the size of the element relation set;
the relation type with the highest score in the classifier is taken as the element relation for e_i.
Optionally, the relation question contains the head element, the tail element type and the element relation.
Optionally, during model building, the loss functions of the head element, the tail element and the element relation are jointly optimized and share the training parameters of BERT; the average loss L over each batch of N samples is computed as:
L = (1/N) Σ_{i=1}^{N} (L_head + L_tail + L_rel)
where L_head is the loss function of the head element, L_tail is the loss function of the tail element, and L_rel is the loss function of the element relation.
In a second aspect, an embodiment of the present application provides a multi-round question-answering-based method for constructing a knowledge graph of argumentative essays, the method comprising:
extracting head elements, tail elements and element relations with the multi-round question-answering-based joint extraction method of any one of the first aspect;
and establishing the essay knowledge graph with each (head element, element relation, tail element) as a triple.
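The triple-assembly step above can be sketched in a few lines; the adjacency-map representation is an assumption of this sketch, since the patent does not fix a storage format for the graph:

```python
from collections import defaultdict

def build_graph(triples):
    """Assemble (head element, relation, tail element) triples into a
    simple adjacency-map knowledge graph: head -> [(relation, tail), ...]."""
    graph = defaultdict(list)
    for head, rel, tail in triples:
        graph[head].append((rel, tail))
    return dict(graph)
```

In practice the triples produced by steps S10 through S50 would be fed here; any real deployment would likely use a graph database instead of an in-memory map.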
In a third aspect, an embodiment of the present application provides an electronic device comprising a memory, a processor and a computer program stored on the memory and operable on the processor, wherein the computer program, when executed by the processor, implements the steps of the multi-round question-answering-based joint extraction method according to any one of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the multi-round question-answering-based joint extraction method according to any one of the first aspect.
(III) Advantageous effects
The beneficial effects of this application are as follows. The application provides a multi-round question-answering-based method for jointly extracting discourse elements and element relations of argumentative essays, a knowledge graph construction method, a device and a medium; the joint extraction method comprises steps S10 through S50 as set out above. By performing multi-round question-answering-based joint extraction of discourse elements and element relations, the method alleviates error propagation, and the multi-round question answering also removes the limitation on the size of the extraction region.
Drawings
The application is described with the aid of the following figures:
FIG. 1 is a schematic flow chart of the multi-round question-answering-based method for jointly extracting discourse elements and element relations of argumentative essays in an embodiment of the present application;
FIG. 2 is a diagram of the multi-round question-answering framework for joint extraction of discourse elements and element relations in another embodiment of the present application;
FIG. 3 is a schematic flow chart of the multi-round question-answering-based method for constructing a knowledge graph of argumentative essays in yet another embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to still another embodiment of the present application.
Detailed Description
To better explain the present invention and facilitate understanding, the invention is described in detail below through specific embodiments with reference to the accompanying drawings. It should be understood that the following specific examples only illustrate the invention and are not to be construed as limiting it. In addition, where no conflict arises, the embodiments of the present application and the features within them may be combined with each other; for convenience of description, only the portions related to the invention are shown in the drawings.
Example one
Fig. 1 is a schematic flow chart of the multi-round question-answering-based method for jointly extracting discourse elements and element relations in an embodiment of the present application. As shown in Fig. 1, the method of this embodiment comprises:
S10, for a target argumentative essay text, acquiring preset element-type question templates and generating element questions from them;
S20, inputting the element questions into a pre-established question-answering framework to obtain head elements, the question-answering framework being built on machine reading comprehension;
S30, inputting the head elements and the target essay text into a pre-established element relation prediction model, built on a multi-class classifier, to obtain element relations;
S40, generating relation questions based on the element relations and head elements;
and S50, inputting the relation questions into the pre-established question-answering framework to obtain the corresponding tail elements.
This multi-round question-answering-based joint extraction method improves extraction performance for discourse elements and element relations. Combining the two tasks captures the semantic interaction between them, which benefits both element and relation extraction and alleviates error propagation, while the question-answering formulation enlarges the extraction region and enables extraction at sentence and even paragraph level.
In order to better understand the present invention, the steps in the present embodiment are explained below.
A discourse element is an element of argumentation in an argumentative essay, such as a claim or a piece of evidence; it may be a short text, such as a sentence or clause, or a long text, such as a paragraph. An element relation is the semantic connection between two adjacent text segments, or between text segments within a certain span of the same chapter, such as an expansion relation or a demonstration relation.
In this embodiment, the question templates may be:
Question 1: Find all of the {element type} elements mentioned in the article.
Question 2: Which {element type} is mentioned in the article?
Question 3: Which sentence is an {element type} element?
Filling each template with an extracted element type generates three questions with the same semantics but different expressions.
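The template-filling step can be sketched with plain Python format strings; the template wording below mirrors the three examples above and is illustrative, not the patent's exact strings:

```python
# Hypothetical element-question templates; {etype} is replaced by one
# predefined discourse element type to yield three paraphrased questions.
ELEMENT_QUESTION_TEMPLATES = [
    "Find all of the {etype} elements mentioned in the article.",
    "Which {etype} is mentioned in the article?",
    "Which sentence is a {etype} element?",
]

def generate_element_questions(element_type):
    """Fill each template with one element type, yielding three
    semantically equivalent but differently phrased questions."""
    return [t.format(etype=element_type) for t in ELEMENT_QUESTION_TEMPLATES]
```

Each of the three questions is later answered independently, and the answers are combined by the learned question weights described in S3.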
In this embodiment, the question-answering framework is built on machine reading comprehension and is divided into an encoding layer, an interaction layer and an output layer. The encoding layer semantically encodes the article and the questions input to the model. The interaction layer establishes the semantic connection between article and question: when the model answers a question, it combines the semantics of both, and the words and sentences in the article whose encodings are close to those of the question's keywords become the model's main objects of attention, from which the answer is predicted. The output layer generates answers in the form the task requires and constructs a reasonable loss function to facilitate optimizing the model on the training set.
Example two
The execution subject of this embodiment may be a device for joint extraction of discourse elements and element relations, composed of hardware and/or software and generally integrated into equipment with this function; the device may include a memory and a processor and may, for example, be a server. In other embodiments, the execution subject may be any other electronic device that implements the same or similar functions; this embodiment is not limited in this respect.
In this embodiment, the discourse element types are central argument, sub-argument, factual argument, theoretical argument and conclusion, and the relation types between elements are supplement, support and inference.
In this embodiment, a context sequence of length n is written c = {c_1, c_2, ..., c_n}; E denotes the predefined set of element types and R the predefined set of element relation types. Element and element relation extraction aims to extract a set of elements e = {e_1, e_2, ..., e_m} with corresponding element types y = {y_1, y_2, ..., y_m}, and to predict for each element pair (e_i, e_j) the relation r_ij, where y_i ∈ E and r_ij ∈ R.
Fig. 2 is a diagram of the multi-round question-answering framework for joint extraction of discourse elements and element relations in another embodiment of the present application. As shown in Fig. 2, the framework comprises four parts: BERT-based semantic representation, discourse element extraction, element relation prediction and relation question generation. The dashed lines represent relation question generation and tail element extraction. The specific implementation of this embodiment is described in detail below.
S1. Generate three questions for each element type with the question templates.
The question templates generate three questions for each element type with the same meaning but different expressions; explaining the same question from different angles makes it clearer. For example, to identify a sub-argument element in the context, the templates generate three semantically identical but differently expressed questions:
Question 1: Find all the sub-arguments mentioned in the article.
Question 2: Which sub-argument is mentioned in the article?
Question 3: Which sentence is a sub-argument element?
The questions in this embodiment provide external prior evidence, namely the discourse element type and the element relation type. With the element type, the relation type and the context all present in the question, the model obtains more comprehensive and accurate semantic information, which is captured better through the interaction between the question and the context.
S2. Concatenate the question and the context and feed them through BERT to obtain BERT-based semantic representations.
The pre-trained BERT semantically encodes the context sequence c = {c_1, c_2, ..., c_n} and each question sequence q_t = {q_t1, q_t2, ..., q_tm}, t ∈ {1, 2, 3}. The model input is the concatenation of the word embeddings of context c and question q_t:
h_t = BERT([CLS], q_t, [SEP], c)    (1)
where [CLS] denotes a special classification token and [SEP] a separator.
Encoded through the multi-layer self-attention structure, BERT outputs for each question-context pair h_t = {h_t1, h_t2, ..., h_tn} with h_ti ∈ R^{d_h}, t ∈ {1, 2, 3}, where d_h is the dimension of BERT's last hidden layer.
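The input construction of Equation (1) can be sketched at the token level; the special-token strings below are the standard BERT conventions, not values stated in the patent:

```python
def build_bert_input(question_tokens, context_tokens):
    """Concatenate question and context for the BERT encoder,
    mirroring h_t = BERT([CLS], q_t, [SEP], c) in Equation (1)."""
    return ["[CLS]"] + list(question_tokens) + ["[SEP]"] + list(context_tokens)
```

A real implementation would pass this sequence through a tokenizer and the BERT model to obtain the hidden states h_t; only the concatenation order is shown here.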
S3. Extract the head element by answering multiple questions and selecting the best answer.
Discourse element extraction is performed by answering each specific question and weighting the answers; the final answer is a representation of the discourse element type.
For the multiple answers obtained from the questions, the hidden-layer output h_t is classified into BIOE labels with a softmax classification layer: B (Begin) marks the first word of an element, I (Inside) a middle word, E (End) the last word, and O (Other) a word belonging to no element type. In classification, every position of the hidden-layer sequence is labeled with the most reasonable of "B, I, O, E". From the labeled sequence, the segment from a position tagged B to the position tagged E is identified as the answer to the question, i.e., a discourse element.
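The B-to-E span rule above decodes in a single pass over the tag sequence; this is a plain reading of the rule, not the patent's own code:

```python
def decode_bioe(labels, tokens):
    """Recover answer spans from a BIOE tag sequence: a span runs from a
    'B' tag, through any 'I' tags, to the matching 'E' tag."""
    spans, start = [], None
    for i, tag in enumerate(labels):
        if tag == "B":
            start = i                       # open a new span
        elif tag == "E" and start is not None:
            spans.append(" ".join(tokens[start:i + 1]))  # close the span
            start = None
        elif tag == "O":
            start = None                    # an O tag invalidates an open span
    return spans
```

Because spans may be whole sentences or paragraphs, this decoding is what lets the method extract regions far larger than typical named entities.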
For each input token x_i, the likelihood of each boundary label is computed as:
P(ŷ_i | x_i) = softmax(W_b · h_i + b_b)    (2)
where W_b ∈ R^{d_b×d_h} and b_b ∈ R^{d_b} are learned parameters, d_b is the size of the boundary label set B, and ŷ_i denotes the predicted boundary label.
All elements e = {e_1, e_2, ..., e_m} can thus be extracted from the sequence by identifying the boundary labels.
The answers corresponding to the questions are then a = {a_1, a_2, a_3}, where a_t = {a_t1, a_t2, ..., a_tn} is the boundary-label sequence produced by the model (Equation 2). To obtain the correct answer from the multiple candidates, a weight w_t is set for each question q_t and updated with an activation function. At the end of each training phase, the F1 score of the final triples obtained from each question's answers is calculated and the weight updated as:
w_t = σ(f_t) · T    (3)
where σ(·) is the sigmoid function, f_t is the F1 score of the t-th question and T is the total number of generated questions.
The higher the F1 score, the higher the weight, so w_t reflects the quality of question q_t. Based on the learned weights, a weighted selection over the questions' answer boundary sequences yields the final answer set a* (Equation 4): the boundary label a*_i of the i-th input is selected as the label with the greatest total weight across the questions' predictions.
From the answer set a*, the segment from the position tagged B to the position tagged E is identified, and the extracted discourse element is inferred.
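The weighting and selection steps can be sketched as follows, assuming equal-length boundary sequences per question; the weighted-vote form of the final selection is an assumption of this sketch, as the original formula image is lost:

```python
import math

def question_weights(f1_scores):
    """w_t = sigmoid(f_t) * T (Equation 3), with T the number of questions."""
    T = len(f1_scores)
    return [T / (1.0 + math.exp(-f)) for f in f1_scores]

def select_boundary_labels(answers, weights):
    """At each position, keep the boundary label whose supporting
    questions carry the greatest total weight (an assumed reading of
    the weighted selection producing the final answer set a*)."""
    final = []
    for labels_at_i in zip(*answers):
        score = {}
        for label, w in zip(labels_at_i, weights):
            score[label] = score.get(label, 0.0) + w
        final.append(max(score, key=score.get))
    return final
```

With this design, a single poorly phrased question cannot override two well-performing paraphrases, since its sigmoid-scaled F1 weight is smaller.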
S4. Predict the element relation.
Element relation prediction aims to identify the set of most likely relation types for the head element e_i. Specifically, the context representation h_{q_t} output by BERT (q_t denoting the t-th question) is concatenated with the head-element label representation E(e_i) as the input to the element relation prediction model:
l_i = [h_{q_t}; E(e_i)]    (5)
where E(e_i) is initialized by random sampling and fine-tuned during training. The input is then passed through a softmax classifier to compute the probability of each relation type r_k ∈ R for element e_i:
P_r(label = r_k | e_i) = σ(W_r · l_i + b_r)    (6)
where P_r is the probability that the element corresponds to each relation, σ(·) is the sigmoid function, W_r ∈ R^{|R|×(d_h+d_l)}, b_r ∈ R^{|R|}, d_l is the dimension of the element label embedding, and |R| is the size of the element relation set. The relation type with the highest score in the classifier is taken as the relation of element e_i.
Element relation prediction thus yields the candidate set of possible relation types for the head element.
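The concatenation of Equation (5) and the per-relation scoring of Equation (6) can be sketched without a deep-learning framework; the toy weight matrix below is illustrative, and σ is the sigmoid as in the text:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def concat_input(h_qt, e_embed):
    """l_i = [h_qt ; E(e_i)] (Equation 5): context representation
    concatenated with the head-element label embedding."""
    return list(h_qt) + list(e_embed)

def relation_probs(l_i, W_r, b_r):
    """P_r(label = r_k | e_i) = sigma(W_r . l_i + b_r) (Equation 6):
    one score per relation type r_k, the highest of which is kept."""
    return [sigmoid(sum(w * x for w, x in zip(row, l_i)) + b)
            for row, b in zip(W_r, b_r)]
```

In the model itself, W_r and b_r are learned jointly with BERT; here they are plain lists so the arithmetic of Equation (6) is visible.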
S5. Generate the relation question.
The relation question is generated from the head element and the predicted element relation type; it contains the head element, the tail element type and the element relation.
Unlike the head-element questions, a relation question contains a specific element sequence: it is formalized as a declarative statement of that element sequence followed by an interrogative containing the tail element type and the element relation, which keeps the question well-formed. For example, to identify the conclusion (tail element) connected by an inference relation to a sub-argument (head element), three questions with the same semantics but different expressions are generated:
Question 1: {head element} is a sub-argument; find the conclusion that this sub-argument proposes.
Question 2: {head element} is a sub-argument; which conclusion is raised by this sub-argument?
Question 3: {head element} is a sub-argument; which conclusion does this sub-argument draw?
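The relation-question construction can be sketched like the element templates of S1; the template wording below mirrors the three examples above and is illustrative, since the patent's exact template strings are not recoverable from the text:

```python
# Hypothetical relation-question templates: a declarative statement of the
# head element followed by an interrogative naming the tail element type.
RELATION_QUESTION_TEMPLATES = [
    "{head} is a {head_type}; find the {tail_type} that this {head_type} proposes.",
    "{head} is a {head_type}; which {tail_type} is raised by this {head_type}?",
    "{head} is a {head_type}; which {tail_type} does this {head_type} draw?",
]

def generate_relation_questions(head, head_type, tail_type):
    """A relation question embeds the extracted head element text plus
    the tail element type implied by the predicted relation."""
    return [t.format(head=head, head_type=head_type, tail_type=tail_type)
            for t in RELATION_QUESTION_TEMPLATES]
```

These questions are then answered by the same question-answering framework (steps S2 to S3) to extract the tail element.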
S6. Extract the tail element.
Steps S2 and S3 are repeated: the relation question and the context are concatenated and fed through BERT to obtain the BERT-based semantic representation, and the tail element is extracted by answering the multiple questions and selecting the best answer, finally yielding the (element, element relation, element) argument structure.
Preferably, during model building, the loss functions of the head element (L_head), the tail element (L_tail) and the element relation (L_rel) are jointly optimized and share the training parameters of BERT; the average loss L over each batch of N samples is computed as:
L = (1/N) Σ_{i=1}^{N} (L_head + L_tail + L_rel)    (7)
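Assuming equal weighting of the three losses (the image form of Equation 7 does not survive, so the weights are an assumption), the batch average can be sketched as:

```python
def batch_loss(head_losses, tail_losses, rel_losses):
    """Average, over the batch, of the summed head-element, tail-element
    and element-relation losses per sample; equal weights are assumed."""
    n = len(head_losses)
    return sum(h + t + r
               for h, t, r in zip(head_losses, tail_losses, rel_losses)) / n
```

Summing the three task losses before back-propagation is what forces the shared BERT parameters to serve both element extraction and relation prediction.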
the method for extracting the association of the elements and the element relations of the discussion language based on multiple rounds of question answering extracts two independent tasks for the association extraction of the elements and the element relations, obtains semantic interaction information between the two tasks, and relieves error propagation; and the method adopts a multi-turn question-answer-based joint method to extract the language elements and element relations, captures more comprehensive semantic information through the interaction of the question and the context, and avoids the limitation of extraction areas. In addition, the method has good expansibility, answers of the questions can be in a word level or a sentence level, and the method can be suitable for the task of extracting the language part elements.
The method of this embodiment was tested on a data set and compared with other extraction methods to further illustrate its technical effects.
(1) Data set
The corpus was crawled from ASAP, an authoritative public data set for the automated essay scoring task, collecting middle-school argumentative essays on the subjects of patience and of computers. For each essay, the discourse elements and element relations were annotated according to a labeling specification for Chinese-English argumentative essays; the relations between the discourse elements are shown in Table 1, the table of relations between discourse elements.
TABLE 1
Head element          | Tail element          | Element relation
Central argument      | Sub-argument          | Supplement
Sub-argument          | Factual argument      | Fact support
Sub-argument          | Theoretical argument  | Reason support
Factual argument      | Conclusion            | Inference
Theoretical argument  | Conclusion            | Inference
Central argument      | Conclusion            | Inference
The central argument (Major) sentence states what is to be proved, i.e., the author's central assertion about the article's subject, which leads the whole text. There is at most one central argument.
The sub-argument (Thesis) supplements and illustrates the central argument. There may be zero or more sub-arguments.
The factual argument (Fact argument) sentence supports a sub-argument by example: the factual description and generalization of objective things, including specific examples, generalized facts, statistics, personal experiences, and the like.
The theoretical argument (Reason argument) sentence supports a sub-argument by citation and theory, demonstrating that a question or opinion is correct or incorrect, including classical writings and authoritative statements (such as those of celebrities), principles of natural science, laws, formulas, and the like.
The conclusion (Result) sentence extends the central argument, summarizes the whole text and echoes the argument of the essay.
Statistics over the annotated essays show that the data set contains 3042 sentences bearing language element types. The data set is divided into a training set and a test set at a ratio of 4:1, and 20% of the training set is held out as a validation set.
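As a minimal illustrative sketch (not part of the patent; the function name and fixed seed are assumptions), the 4:1 train/test split with a 20% validation hold-out described above can be reproduced as follows:

```python
import random

def split_dataset(samples, seed=42):
    """Shuffle and split samples 4:1 into train/test, then hold out 20% of train as validation."""
    rng = random.Random(seed)
    samples = list(samples)
    rng.shuffle(samples)
    n_test = len(samples) // 5           # 4:1 train/test ratio
    test, train = samples[:n_test], samples[n_test:]
    n_val = len(train) // 5              # 20% of the training set as validation
    val, train = train[:n_val], train[n_val:]
    return train, val, test

# 3042 labeled sentences, as reported for the data set above
train, val, test = split_dataset(range(3042))
```

With 3042 sentences this yields 608 test, 486 validation, and 1948 training samples.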
(2) Evaluation index
Evaluation uses Precision (P), Recall (R), F1-score (F1), and Accuracy. Precision is the percentage of the language element BIOE labels and element relation types predicted by the method that are correct; Recall is the percentage of the element labels and element relation types in the data set that the method recovers. The F1 score is the harmonic mean of P and R.
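The metrics above can be sketched as micro-averaged precision, recall, and F1 over predicted versus gold items (illustrative only; the item representation as (sentence id, label) pairs is an assumption):

```python
def prf1(predicted, gold):
    """Micro precision, recall, and F1 between predicted and gold item sets."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)                     # correctly predicted items
    p = tp / len(predicted) if predicted else 0.0  # fraction of predictions that are correct
    r = tp / len(gold) if gold else 0.0            # fraction of gold items recovered
    f1 = 2 * p * r / (p + r) if p + r else 0.0     # harmonic mean of P and R
    return p, r, f1

# toy example: sentence-level element type predictions (hypothetical ids/labels)
pred = [("s1", "Thesis"), ("s2", "Fact"), ("s3", "Reason")]
gold = [("s1", "Thesis"), ("s2", "Fact"), ("s3", "Conclusion"), ("s4", "Fact")]
p, r, f1 = prf1(pred, gold)  # p = 2/3, r = 1/2
```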
(3) Parameter setting
The parameter settings are as follows: the multi-round question-answering framework is implemented in PyTorch; each essay is embedded with BERT-Base (cased); the maximum sequence length is set to 350 words, the batch size to 4, and the initial learning rate to 5e-5. The embedding model is trained on the data, and the hyperparameters are tuned: the dropout value, the best epoch, and the learning rate (1e-3, 1e-5, 3e-5, 5e-5).
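Purely as an organizational sketch (the patent does not publish its training code; all names are hypothetical, and the dropout candidates are assumed since the patent does not list them), the settings and hyperparameter search above could be captured as:

```python
from itertools import product

config = {
    "pretrained_model": "bert-base-cased",  # BERT-Base (cased) used to embed each essay
    "max_seq_length": 350,                  # maximum sequence length in words
    "batch_size": 4,
    "initial_lr": 5e-5,                     # initial learning rate
}

# grids tuned on the validation set; learning rates are from the text,
# dropout values are assumed candidates
lr_grid = [1e-3, 1e-5, 3e-5, 5e-5]
dropout_grid = [0.1, 0.2, 0.3]

def search_space():
    """Enumerate (learning_rate, dropout) candidates for validation runs."""
    return list(product(lr_grid, dropout_grid))
```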
(4) Test results
Table 2 evaluates the joint extraction of language elements and element relations. As shown in Table 2, compared with the representative methods ECAT, Spike, and PFN, the multi-round question-answering-based method improves the F1 values on the two tasks of language element extraction and element relation extraction by 0.02 and 0.03, respectively, over PFN.
Table 3 evaluates the multi-round question-answering-based joint method on different topics; experiments were carried out on the two topics in the data set, with results shown in Table 3. Since English discussion essays are usually written around a given topic, the number and content of essays on different topics in the data set directly affect the experimental results. The difference in the F1 value of the joint method between the two topic data sets is only 0.01 to 0.02, indicating that the method generalizes across multi-topic corpora.
TABLE 2
(Table 2 is provided as an image in the original publication and is not reproduced here.)
TABLE 3
(Table 3 is provided as an image in the original publication and is not reproduced here.)
EXAMPLE III
The second aspect of the present application provides a method for constructing a discussion-essay knowledge graph based on multiple rounds of question answering. Fig. 3 is a schematic flow chart of the method for constructing a discussion-essay knowledge graph based on multiple rounds of question answering in another embodiment of the present application. As shown in Fig. 3, the method includes:
S100, extracting head elements, tail elements, and element relations by using the multi-round question-answering-based joint extraction method for discussion language elements and element relations of any one of the above embodiments;
S200, establishing a discussion-essay knowledge graph with the head elements, element relations, and tail elements as triples.
For example, each element pair (e_i, e_j) and the relation r_ij between them form a triplet (e_i, r_ij, e_j), and the output can be formatted as (head element, element relation, tail element), e.g., (argument, supports, argument).
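The triplet assembly in S200 can be sketched as follows (illustrative only; the adjacency-map representation and the example strings are assumptions, not the patent's data structures):

```python
def build_knowledge_graph(triples):
    """Index (head, relation, tail) triples as an adjacency map: head -> [(relation, tail)]."""
    graph = {}
    for head, relation, tail in triples:
        graph.setdefault(head, []).append((relation, tail))
    return graph

# toy triples following the relation inventory of Table 1
triples = [
    ("central argument", "supplement", "sub-argument"),
    ("sub-argument", "fact support", "fact argument"),
    ("fact argument", "inference", "conclusion"),
]
kg = build_knowledge_graph(triples)
```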
The method for constructing a discussion-essay knowledge graph based on multiple rounds of question answering provided by this embodiment builds the knowledge graph on the steps of the joint extraction method of the above method embodiment; its implementation principle and technical effects are similar and are not repeated here.
Example four
A third aspect of the present application provides, as Example four, an electronic device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the method for jointly extracting discussion language elements and element relations based on multiple rounds of question answering of any one of the above embodiments.
Fig. 4 is a schematic structural diagram of an electronic device according to still another embodiment of the present application.
The electronic device shown in fig. 4 may include: at least one processor 101, at least one memory 102, at least one network interface 104, and other user interfaces 103. The various components in the electronic device are coupled together by a bus system 105. It is understood that the bus system 105 is used to enable communications among the components. The bus system 105 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 105 in fig. 4.
The user interface 103 may include, among other things, a display, a keyboard, or a pointing device (e.g., a mouse, trackball, or touch pad).
It will be appreciated that the memory 102 in this embodiment may be volatile memory or non-volatile memory, or may include both. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 102 described herein is intended to comprise, without limitation, these and any other suitable types of memory.
In some embodiments, the memory 102 stores the following elements, executable units, or data structures, or a subset or an expanded set thereof: an operating system 1021 and application programs 1022.
The operating system 1021 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application 1022 includes various applications for implementing various application services. Programs that implement methods in accordance with embodiments of the invention can be included in application 1022.
In the embodiment of the present invention, the processor 101 is configured to execute the method steps provided in the first aspect by calling a program or an instruction stored in the memory 102, which may be specifically a program or an instruction stored in the application 1022.
The method disclosed in the above embodiments of the present invention can be applied to, or implemented by, the processor 101. The processor 101 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 101. The processor 101 may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly executed by a hardware decoding processor, or executed by a combination of hardware and software units in the decoding processor. The software units may be located in RAM, flash memory, ROM, PROM, EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory 102, and the processor 101 reads the information in the memory 102 and completes the steps of the method in combination with its hardware.
In addition, in combination with the method for jointly extracting discussion language elements and element relations based on multiple rounds of question answering, the embodiments of the present invention may provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements the method for jointly extracting discussion language elements and element relations based on multiple rounds of question answering of any one of the above method embodiments.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. The use of the terms first, second, third, and the like is for convenience only and does not denote any order; these words are to be understood as part of the name of the component.
Furthermore, it should be noted that in the description of the present specification, the description of the term "one embodiment", "some embodiments", "examples", "specific examples" or "some examples", etc., means that a specific feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, the claims should be construed to include preferred embodiments and all changes and modifications that fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention should also include such modifications and variations.

Claims (10)

1. A method for jointly extracting discussion language elements and element relations based on multiple rounds of question answering, characterized by comprising the following steps:
s10, aiming at a target discussion paper text, acquiring a preset element type problem template, and generating an element problem through the element type problem template;
s20, inputting the element questions into a question and answer framework established in advance to obtain head elements; the question-answer framework is established based on machine reading understanding;
s30, inputting the head element and the target discussion paper text into a pre-established element relation prediction model to obtain an element relation; wherein the element relation prediction model is established based on multiple classifiers;
s40, generating a relation problem based on the element relation and the head element;
and S50, inputting the relational question into a question-answer frame established in advance to obtain a corresponding tail element.
2. The method for jointly extracting discussion language elements and element relations based on multiple rounds of question answering according to claim 1, further comprising, before S10, establishing the element type question template.
3. The method for jointly extracting discussion language elements and element relations based on multiple rounds of question answering according to claim 1, wherein S20 comprises:
inputting the element question and the target discussion-essay text into BERT to obtain a BERT-based semantic representation;
inputting the semantic representation into the pre-established question-answering framework to obtain a plurality of question answers; and
determining the head element based on a preset question weight.
4. The method for jointly extracting discussion language elements and element relations based on multiple rounds of question answering according to claim 3, wherein inputting the semantic representation into the pre-established question-answering framework to obtain a plurality of question answers comprises:
performing BIOE label classification on the hidden-layer output h_t using a softmax classification layer; and
identifying, from the labeled hidden-layer sequence, the segment from the position labeled B to the position labeled E as a question answer.
5. The method for jointly extracting discussion language elements and element relations based on multiple rounds of question answering according to claim 1, wherein S30 comprises:
concatenating the context representation h_c^i output by BERT with the representation h_e^i of the head element as the input for element relation prediction:
l_i = [h_c^i ; h_e^i]
wherein h_e^i is the representation of the head element and h_c^i is the representation of the context;
passing the input through a softmax classifier to obtain the probability that element e_i has each element relation type r_k:
P_r(label = r_k | e_i) = σ(W_r · l_i + b_r)
where σ(·) is a sigmoid function, W_r ∈ R^(d_l × |R|), b_r ∈ R^(|R|), d_l is the dimension of the element label embedding, and |R| is the size of the element relation set; and
taking the element relation type with the highest score in the classifier as the element relation corresponding to element e_i.
6. The method for jointly extracting discussion language elements and element relations based on multiple rounds of question answering according to claim 1, wherein the relation question comprises the head element, the tail element type, and the element relation.
7. The method for jointly extracting discussion language elements and element relations based on multiple rounds of question answering according to claim 1, wherein, during model building, the loss functions of the head language elements, the tail language elements, and the relations between elements are jointly optimized and share the training parameters on BERT, and the average loss L of each batch of samples is calculated as:
L = (1/N) Σ_{n=1}^{N} (L_head + L_tail + L_rel)
wherein L_head is the loss function of the head language elements, L_tail is the loss function of the tail language elements, L_rel is the loss function of the relations between elements, and N is the number of samples in the batch.
8. A method for constructing a discussion-essay knowledge graph based on multiple rounds of question answering, characterized by comprising:
extracting head elements, tail elements, and element relations by the multi-round question-answering-based joint extraction method for discussion language elements and element relations of any one of claims 1 to 7; and
establishing the discussion-essay knowledge graph with the head elements, element relations, and tail elements as triples.
9. An electronic device, characterized by comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the method for jointly extracting discussion language elements and element relations based on multiple rounds of question answering according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method for jointly extracting discussion language elements and element relations based on multiple rounds of question answering according to any one of claims 1 to 7.
CN202210859304.8A 2022-07-20 2022-07-20 Language-text element and element relation joint extraction method and knowledge graph construction method Active CN115309910B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210859304.8A CN115309910B (en) 2022-07-20 2022-07-20 Language-text element and element relation joint extraction method and knowledge graph construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210859304.8A CN115309910B (en) 2022-07-20 2022-07-20 Language-text element and element relation joint extraction method and knowledge graph construction method

Publications (2)

Publication Number Publication Date
CN115309910A true CN115309910A (en) 2022-11-08
CN115309910B CN115309910B (en) 2023-05-16

Family

ID=83857121

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210859304.8A Active CN115309910B (en) 2022-07-20 2022-07-20 Language-text element and element relation joint extraction method and knowledge graph construction method

Country Status (1)

Country Link
CN (1) CN115309910B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116384381A (en) * 2023-01-04 2023-07-04 深圳擎盾信息科技有限公司 Automatic contract element identification method and device based on knowledge graph
CN116384382A (en) * 2023-01-04 2023-07-04 深圳擎盾信息科技有限公司 Automatic long contract element identification method and device based on multi-round interaction

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001184351A (en) * 1999-12-27 2001-07-06 Toshiba Corp Document information extracting device and document sorting device
CN110210019A (en) * 2019-05-21 2019-09-06 四川大学 A kind of event argument abstracting method based on recurrent neural network
CN112464641A (en) * 2020-10-29 2021-03-09 平安科技(深圳)有限公司 BERT-based machine reading understanding method, device, equipment and storage medium
CN113590776A (en) * 2021-06-23 2021-11-02 北京百度网讯科技有限公司 Text processing method and device based on knowledge graph, electronic equipment and medium
CN114360677A (en) * 2021-12-16 2022-04-15 浙江大学 CT image report information extraction method and device based on multiple rounds of questions and answers, computer equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001184351A (en) * 1999-12-27 2001-07-06 Toshiba Corp Document information extracting device and document sorting device
CN110210019A (en) * 2019-05-21 2019-09-06 四川大学 A kind of event argument abstracting method based on recurrent neural network
CN112464641A (en) * 2020-10-29 2021-03-09 平安科技(深圳)有限公司 BERT-based machine reading understanding method, device, equipment and storage medium
CN113590776A (en) * 2021-06-23 2021-11-02 北京百度网讯科技有限公司 Text processing method and device based on knowledge graph, electronic equipment and medium
CN114360677A (en) * 2021-12-16 2022-04-15 浙江大学 CT image report information extraction method and device based on multiple rounds of questions and answers, computer equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JIA FU ET AL.: "Exploiting Named Entity Recognition via Pre-trained Language Model and Adversarial Training", 《IEEE XPLORE》 *
JIE LIU ET AL.: "XLNet For Knowledge Graph Completion", 《IEEE XPLORE》 *
SONG DONGHUAN ET AL.: "Construction of a Semantic Feature Dictionary for English Scientific Paper Abstracts", 《LIBRARY AND INFORMATION SERVICE》 *
CHEN JINJU ET AL.: "Research on Multi-round Automatic Question Answering Based on a Road Regulations Knowledge Graph", 《JOURNAL OF MODERN INFORMATION》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116384381A (en) * 2023-01-04 2023-07-04 深圳擎盾信息科技有限公司 Automatic contract element identification method and device based on knowledge graph
CN116384382A (en) * 2023-01-04 2023-07-04 深圳擎盾信息科技有限公司 Automatic long contract element identification method and device based on multi-round interaction
CN116384382B (en) * 2023-01-04 2024-03-22 深圳擎盾信息科技有限公司 Automatic long contract element identification method and device based on multi-round interaction

Also Published As

Publication number Publication date
CN115309910B (en) 2023-05-16

Similar Documents

Publication Publication Date Title
CN109344236B (en) Problem similarity calculation method based on multiple characteristics
CN111475623B (en) Case Information Semantic Retrieval Method and Device Based on Knowledge Graph
CN111639171A (en) Knowledge graph question-answering method and device
CN115309910A (en) Language piece element and element relation combined extraction method and knowledge graph construction method
AU2019253908B2 (en) Expert report editor
CN114218379B (en) Attribution method for question answering incapacity of intelligent question answering system
CN110968708A (en) Method and system for labeling education information resource attributes
CN112069815A (en) Answer selection method and device for idiom filling-in-blank question and computer equipment
CN111858896A (en) Knowledge base question-answering method based on deep learning
CN112966117A (en) Entity linking method
Celikyilmaz et al. A graph-based semi-supervised learning for question-answering
CN115859164A (en) Method and system for identifying and classifying building entities based on prompt
Atapattu et al. Automated extraction of semantic concepts from semi-structured data: Supporting computer-based education through the analysis of lecture notes
CN114238653A (en) Method for establishing, complementing and intelligently asking and answering knowledge graph of programming education
CN117540063A (en) Education field knowledge base searching optimization method and device based on problem generation
JP6942759B2 (en) Information processing equipment, programs and information processing methods
CN116910185A (en) Model training method, device, electronic equipment and readable storage medium
CN115878794A (en) Text classification-based candidate paragraph generation and multi-hop question answering method
CN113157932B (en) Metaphor calculation and device based on knowledge graph representation learning
CN114626463A (en) Language model training method, text matching method and related device
CN110633363B (en) Text entity recommendation method based on NLP and fuzzy multi-criterion decision
Singh et al. Computer Application for Assessing Subjective Answers using AI
Song et al. A hybrid model for community-oriented lexical simplification
CN116414965B (en) Initial dialogue content generation method, device, medium and computing equipment
CN113836306B (en) Composition automatic evaluation method, device and storage medium based on chapter component identification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant