CN115309910B - Language-text element and element relation joint extraction method and knowledge graph construction method - Google Patents

Language-text element and element relation joint extraction method and knowledge graph construction method

Info

Publication number
CN115309910B
CN115309910B
Authority
CN
China
Prior art keywords
relation
question
elements
questions
head
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210859304.8A
Other languages
Chinese (zh)
Other versions
CN115309910A (en)
Inventor
刘杰
许妍欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China University of Technology
Capital Normal University
Original Assignee
North China University of Technology
Capital Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China University of Technology, Capital Normal University filed Critical North China University of Technology
Priority to CN202210859304.8A priority Critical patent/CN115309910B/en
Publication of CN115309910A publication Critical patent/CN115309910A/en
Application granted granted Critical
Publication of CN115309910B publication Critical patent/CN115309910B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application belongs to the technical field of artificial intelligence, and in particular relates to a method for the joint extraction of discourse elements and element relations in argumentative essays and a knowledge graph construction method. The joint extraction method comprises the following steps: S10, for a target argumentative essay text, acquiring a preset element-type question template and generating element questions from the template; S20, inputting the element questions into a pre-established question-answering framework to obtain head elements, the question-answering framework being built on machine reading comprehension; S30, inputting the head elements and the target essay text into a pre-established element relation prediction model to obtain element relations; S40, generating relation questions based on the element relations and the head elements; S50, inputting the relation questions into the pre-established question-answering framework to obtain the corresponding tail elements. By jointly extracting the discourse elements and the element relations, the method alleviates error propagation, and the multi-round question answering removes the limitation on the extraction span.

Description

Language-text element and element relation joint extraction method and knowledge graph construction method
Technical Field
The application belongs to the technical field of artificial intelligence, and in particular relates to a multi-round question-answering based method for the joint extraction of discourse elements and element relations in argumentative essays.
Background
A knowledge graph can structurally represent the concepts, entities and relations of the objective world, and provides rich common-sense knowledge for intelligent applications such as recommendation systems and question-answering systems. However, mainstream automatic essay scoring methods evaluate an essay only on its surface text and do not take knowledge-level information into account. Constructing an essay knowledge graph therefore allows an essay evaluation system to analyse an essay at the knowledge level, which makes it a task of real research significance.
Knowledge extraction is a crucial step in knowledge graph construction. The discourse element extraction task and the element relation extraction task aim to identify and extract the discourse element units in an argumentative essay and to determine the semantic relations between those units, for example the supporting relation between a claim and its evidence. Building an argumentative-essay knowledge graph from these two tasks can provide knowledge-level information for automatic essay evaluation. However, existing methods for constructing argumentative-essay knowledge graphs face two problems:
first, discourse element extraction and element relation extraction are performed separately with deep learning models, which ignores the semantic interaction between the two tasks; once an error occurs during element extraction, relation classification is carried out on the erroneous element and produces an erroneous element relation, that is, error propagation;
second, when entity recognition methods are used to identify key information in text, the entities they target are mainly person names, place names, organisation names and other proper nouns, whereas discourse element extraction identifies argumentative sentences; the extraction span of a discourse element, measured in sentences, is therefore much larger than the word-level span of an entity, and an element such as a piece of evidence may even be a paragraph consisting of several sentences. Methods designed for the entity recognition task therefore cannot accurately identify discourse elements.
The above problems are the ones that need to be solved in the discourse element and element relation extraction task.
Disclosure of Invention
First, the technical problem to be solved
In view of the above drawbacks and shortcomings of the prior art, the present application provides a multi-round question-answering based method for the joint extraction of discourse elements and element relations in argumentative essays, a knowledge graph construction method, a device and a medium.
(II) technical scheme
In order to achieve the above purpose, the present application adopts the following technical scheme:
In a first aspect, an embodiment of the present application provides a multi-round question-answering based method for the joint extraction of discourse elements and element relations in argumentative essays, where the method includes:
S10, for a target argumentative essay text, acquiring a preset element-type question template and generating element questions from the template;
S20, inputting the element questions into a pre-established question-answering framework to obtain head elements; the question-answering framework is built on machine reading comprehension;
S30, inputting the head elements and the target essay text into a pre-established element relation prediction model to obtain element relations; the element relation prediction model is built on a multi-class classifier;
S40, generating relation questions based on the element relations and the head elements;
S50, inputting the relation questions into the pre-established question-answering framework to obtain the corresponding tail elements.
Optionally, the method further comprises establishing the element-type question template before S10.
Optionally, S20 includes:
inputting the element questions and the target essay text into BERT to obtain a BERT-based semantic representation;
inputting the semantic representation into the pre-established question-answering framework to obtain a plurality of question answers;
determining the head element based on preset question weights.
Optionally, inputting the semantic representation into the pre-established question-answering framework to obtain a plurality of question answers includes:
performing BIOE tag classification on the hidden-layer output h_t with a softmax classification layer;
identifying, in the tagged hidden-layer sequence, the fragments that start at a position labelled B and end at a position labelled E, and taking these fragments as the answers to the questions.
Optionally, S30 includes:
concatenating the context representation h_CLS output by BERT with the representation E_i of the head element as the input of the element relation prediction:
l_i = [h_CLS ; E_i]
where E_i is the representation of the head element and h_CLS is the context representation;
passing the input through a softmax classifier to obtain, for element e_i, the probability of each element relation type r_k:
P_r(label = r_k | e_i) = σ(W_r · l_i + b_r)
where σ(·) is the sigmoid function, W_r and b_r ∈ R^|R| are learnable parameters, d_l is the dimension of the element label embedding and |R| is the size of the element relation set;
taking the element relation type with the highest score in the classifier as the element relation corresponding to element e_i.
Optionally, the relation question includes the head element, the tail element type and the element relation.
Optionally, in the model building process, the loss functions of the head element, the tail element and the element relation are jointly optimised and the training parameters of BERT are shared; the average loss L of each batch of samples is calculated as:
L = (1/|B|) Σ_{x∈B} ( L_head(x) + L_tail(x) + L_rel(x) )
where L_head is the loss function of the head element, L_tail is the loss function of the tail element, L_rel is the loss function of the element relation, and |B| is the batch size.
In a second aspect, an embodiment of the present application provides a multi-round question-answering based method for constructing an argumentative-essay knowledge graph, where the method includes:
extracting head elements, tail elements and element relations with the multi-round question-answering based joint extraction method according to any one of the first aspect;
taking the head element, the element relation and the tail element as a triple, and establishing the argumentative-essay knowledge graph.
In a third aspect, an embodiment of the present application provides an electronic device, including: a memory, a processor and a computer program stored in the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the multi-round question-answering based method for the joint extraction of discourse elements and element relations according to any one of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the multi-round question-answering based method for the joint extraction of discourse elements and element relations according to any one of the first aspect.
(III) beneficial effects
The beneficial effects of the present application are as follows. The application provides a multi-round question-answering based method for the joint extraction of discourse elements and element relations in argumentative essays, a knowledge graph construction method, a device and a medium, where the joint extraction method comprises the following steps: S10, for a target argumentative essay text, acquiring a preset element-type question template and generating element questions from the template; S20, inputting the element questions into a pre-established question-answering framework to obtain head elements, the question-answering framework being built on machine reading comprehension; S30, inputting the head elements and the target essay text into a pre-established element relation prediction model to obtain element relations; S40, generating relation questions based on the element relations and the head elements; S50, inputting the relation questions into the pre-established question-answering framework to obtain the corresponding tail elements. By extracting the discourse elements and element relations jointly through multiple rounds of question answering, the method alleviates error propagation and removes the limitation on the extraction span.
Drawings
The application is described with the aid of the following figures:
FIG. 1 is a flow chart of a multi-round question-answering based method for the joint extraction of discourse elements and element relations in an embodiment of the present application;
FIG. 2 is a diagram of a multi-round question-answering framework for the joint extraction of discourse elements and element relations in another embodiment of the present application;
FIG. 3 is a flow chart of a multi-round question-answering based method for constructing an argumentative-essay knowledge graph in another embodiment of the present application;
FIG. 4 is a schematic architecture diagram of an electronic device according to another embodiment of the present application.
Detailed Description
The invention will be better explained by the following detailed description of the embodiments with reference to the drawings. It is to be understood that the specific embodiments described below are merely illustrative of the related invention, and not restrictive of the invention. In addition, it should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other; for convenience of description, only parts related to the invention are shown in the drawings.
Example 1
Fig. 1 is a flow chart of a multi-round question-answering based method for the joint extraction of discourse elements and element relations in an embodiment of the present application. As shown in FIG. 1, the joint extraction method of this embodiment includes:
S10, for a target argumentative essay text, acquiring a preset element-type question template and generating element questions from the template;
S20, inputting the element questions into a pre-established question-answering framework to obtain head elements; the question-answering framework is built on machine reading comprehension;
S30, inputting the head elements and the target essay text into a pre-established element relation prediction model to obtain element relations; the element relation prediction model is built on a multi-class classifier;
S40, generating relation questions based on the element relations and the head elements;
S50, inputting the relation questions into the pre-established question-answering framework to obtain the corresponding tail elements.
According to the multi-round question-answering based method for the joint extraction of discourse elements and element relations provided in this embodiment, the performance of discourse element and element relation extraction is improved: combining the two tasks captures the semantic interaction between them, which benefits both extractions and alleviates error propagation; and the question-answering based formulation enlarges the extraction span, so that sentences and even paragraphs can be extracted.
In order to better understand the present invention, each step in this embodiment is explained below.
A discourse element refers to an argumentative element of an essay, such as a claim or a piece of evidence; it may be short text such as a sentence or a clause, or long text such as a paragraph. An element relation refers to the semantic relation, such as elaboration or proof, between two adjacent text fragments or between fragments that span a certain range within the same chapter.
In this embodiment, the question templates may be:
Question 1: find all the { element type } elements.
Question 2: which ones are { element type } elements?
Question 3: which sentence is a { element type } element?
Filling the templates with the element type to be extracted generates three questions with the same semantics but different expressions.
In this embodiment, the question-answering framework is built on machine reading comprehension and is divided into an encoding layer, an interaction layer and an output layer. The encoding layer semantically encodes the article and the question input to the model. The interaction layer establishes the semantic connection between the article and the question, so that the model combines the semantics of both when answering; words and sentences in the article whose encodings are close to the keywords of the question become the focus of attention, from which the answer to the question is predicted. The output layer generates the answer according to the task requirements and constructs a reasonable loss function so that the model can be optimised on the training set.
Example two
The execution subject of this embodiment may be a device for the joint extraction of discourse elements and element relations, which may be composed of hardware and/or software and may generally be integrated into equipment that has the joint extraction function and includes a memory and a processor, for example a server. In other embodiments, the execution subject may be other electronic devices capable of performing the same or similar functions, which this embodiment does not limit.
In this embodiment, the discourse element types are divided into central claim, sub-claim, factual evidence, reasoned evidence and conclusion, and the element relation types are: supplement, support and inference.
In this embodiment, a context sequence of length n is written as c = {c_1, c_2, ..., c_n}; E denotes the predefined set of element types and R the predefined set of element relation types. Element and element relation extraction aims to extract the set of elements e = {e_1, e_2, ..., e_m} together with the corresponding set of element types y = {y_1, y_2, ..., y_m}, and to predict the relation r_ij of each element pair (e_i, e_j), where y_i ∈ E and r_ij ∈ R.
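As an illustration only (the class names and the English labels of the type sets are assumptions introduced here; the embodiment defines the sets abstractly as E and R), the extraction targets described above could be represented in Python as follows:

from dataclasses import dataclass

# Predefined element type set E and element relation type set R of this embodiment
ELEMENT_TYPES = ["central claim", "sub-claim", "factual evidence", "reasoned evidence", "conclusion"]
RELATION_TYPES = ["supplement", "fact support", "reason support", "inference"]

@dataclass
class Element:
    text: str           # the extracted span (a clause, a sentence or a paragraph)
    element_type: str   # y_i, a member of E

@dataclass
class ElementRelation:
    head: Element       # e_i
    tail: Element       # e_j
    relation_type: str  # r_ij, a member of R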
FIG. 2 is a diagram of the multi-round question-answering framework for the joint extraction of discourse elements and element relations in another embodiment of the present application. As shown in FIG. 2, the framework comprises four parts: BERT-based semantic representation, discourse element extraction, element relation prediction and relation question generation. The dashed lines represent relation question generation and tail element extraction. The specific implementation of this embodiment is described in detail below.
S1, generating three questions for each element type by using the question templates.
Three questions are generated for each element type from the question templates; the three questions have the same meaning but different expressions, and explaining the same question from different angles makes it clearer. For example, to identify the sub-claim elements in a context, three questions with the same semantics but different expressions can be generated from the templates, as follows:
Question 1: find all the sub-claims mentioned in the article.
Question 2: which sub-claims are mentioned in the article?
Question 3: which sentence is a sub-claim element?
In this embodiment, the questions provide external prior evidence, namely the discourse element types and the element relation types; with the element type, the element relation type and the context present in the question, the model obtains more comprehensive and more accurate semantic information, which is better captured through the interaction between the question and the context.
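A minimal sketch of the template filling described above (the English template wording is a paraphrase of the examples in this embodiment, not the exact templates):

QUESTION_TEMPLATES = [
    "Find all the {element_type} elements mentioned in the article.",
    "Which {element_type} elements are mentioned in the article?",
    "Which sentence is a {element_type} element?",
]

def generate_element_questions(element_type: str) -> list:
    # Three questions with the same semantics but different expressions
    return [template.format(element_type=element_type) for template in QUESTION_TEMPLATES]

# Example: generate_element_questions("sub-claim")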
S2, concatenating the question and the context, inputting them into BERT, and obtaining the BERT-based semantic representation.
The pre-trained BERT semantically encodes the context sequence c = {c_1, c_2, ..., c_n} and the question sequence q_t = {q_t1, q_t2, ..., q_tm}, t ∈ {1, 2, 3}. The input of the model is the concatenation of the word embeddings of the context c and the question q_t:
h_t = BERT(CLS, q_t, SEP, c)   (1)
where CLS denotes the special classification token and SEP denotes the separator token.
Encoded by the multi-layer self-attention structure, BERT outputs each question-context pair as h_t = {h_t1, h_t2, ..., h_tn}, h_ti ∈ R^{d_h}, t ∈ {1, 2, 3}, where d_h is the dimension of the last hidden layer of BERT.
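A minimal encoding sketch using the Hugging Face transformers library (the library and the "bert-base-cased" checkpoint are assumptions; the embodiment only specifies BERT and equation (1)):

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
encoder = BertModel.from_pretrained("bert-base-cased")

def encode(question: str, context: str) -> torch.Tensor:
    # Builds the input [CLS] question [SEP] context [SEP] as in equation (1)
    inputs = tokenizer(question, context, return_tensors="pt", truncation=True, max_length=350)
    with torch.no_grad():
        outputs = encoder(**inputs)
    return outputs.last_hidden_state  # shape (1, sequence length, d_h)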
S3, extracting the head elements by answering several questions and selecting the best answer.
Discourse element extraction is performed by answering a specific question and weighting the answers to the questions; the final answer is the span of the queried element type.
For the answers obtained for the several questions, a softmax classification layer performs BIOE tag classification on the hidden-layer output h_t. B (Begin) marks the first word of an element; I (Intermediate) marks a word inside an element; E (End) marks the last word of an element; and O (Other) marks a word that does not belong to any element type. During classification, each position of the hidden-layer sequence is labelled with the most reasonable of the tags B, I, O and E. The fragments that start at a position labelled B and end at a position labelled E are then identified in the tagged sequence and taken as the answers to the questions, i.e. the discourse elements.
For each input x_i, the probability of each boundary tag can be calculated as follows:
P_b(label = b | x_i) = softmax(W_b · h_ti + b_b)   (2)
where W_b ∈ R^{d_h × d_b} and b_b ∈ R^{d_b} are learnable parameters, d_b is the size of the boundary tag set B, and the predicted boundary tag is the tag with the highest probability.
All elements e = {e_1, e_2, ..., e_m} can thus be extracted from the sequence by identifying the boundary tags.
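A minimal sketch of the span decoding described above (identifying fragments that start at a B tag and end at an E tag in a BIOE tag sequence):

from typing import List, Tuple

def decode_bioe(tags: List[str]) -> List[Tuple[int, int]]:
    # Returns (start, end) index pairs of fragments that begin with B and end with E
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":
            start = i
        elif tag == "E" and start is not None:
            spans.append((start, i))
            start = None
        elif tag == "O":
            start = None
    return spans

# Example: decode_bioe(["O", "B", "I", "E", "O"]) returns [(1, 3)]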
The answer corresponding to each question is then obtained as a = {a_1, a_2, a_3}, where a_t = {a_t1, a_t2, ..., a_tn} is the boundary sequence obtained by the model (equation 2). To obtain the correct answer from the several answers, a weight w_t is set for each question; the weight w_t indicates the quality of question q_t and is updated with an activation function. At the end of each training phase, the F1 score of the final triple is calculated with the answer to each question, and the weight is updated as:
w_t = σ(f_t) * T   (3)
where σ(·) is the sigmoid function, f_t is the F1 score of the t-th question, and T is the total number of generated questions.
The higher the F1 score, the higher the weight, so the weight w_t reflects the quality of question q_t. The answer boundary sequences of the several questions are weighted according to the learned weights and combined to obtain the final answer set a*. Fragments beginning with B and ending with E are identified from a*, from which the extracted discourse elements are derived. Specifically, the boundary tag of the i-th input is selected as the tag with the highest weighted score across the questions.
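A minimal sketch of the weighted answer selection; the way the per-question spans are combined (a weighted vote) is an assumption here, since the exact selection formula is not reproduced above:

import math

def question_weight(f1_score: float, total_questions: int) -> float:
    # Equation (3): w_t = sigmoid(f_t) * T
    return 1.0 / (1.0 + math.exp(-f1_score)) * total_questions

def select_span(spans_per_question, weights):
    # Score each candidate span by the summed weight of the questions that produced it,
    # then keep the highest-scoring span
    scores = {}
    for spans, weight in zip(spans_per_question, weights):
        for span in spans:
            scores[span] = scores.get(span, 0.0) + weight
    return max(scores, key=scores.get) if scores else None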
S4, predicting the element relations.
Element relation prediction aims to identify the most likely set of relation types for the extracted head element e_i. Specifically, the context representation h_CLS^{q_t} output by BERT (q_t denoting the t-th question) and the representation E_{e_i} of the head element label are concatenated as the input of the element relation prediction model:
l_i = [h_CLS^{q_t} ; E_{e_i}]
where E_{e_i} is initialised by random sampling and fine-tuned during training. The input is then passed through a softmax classifier to obtain, for element e_i, the probability of each element relation type r_k ∈ R:
P_r(label = r_k | e_i) = σ(W_r · l_i + b_r)   (6)
where P_r is the probability that the element corresponds to each element relation, σ(·) is the sigmoid function, W_r and b_r ∈ R^|R| are learnable parameters, d_l is the dimension of the element label embedding and |R| is the size of the element relation set. The relation type with the highest score in the classifier is taken as the relation corresponding to element e_i.
Element relation prediction thus yields the set of possible relation types of the head element, i.e. the candidate relation set.
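A minimal PyTorch sketch of the relation prediction described above (the layer sizes are assumptions, and softmax over all relation types is used here, whereas equation (6) writes the score with σ(·)):

import torch
import torch.nn as nn

class RelationPredictor(nn.Module):
    # Scores each element relation type from [context representation ; head element label embedding]
    def __init__(self, hidden_dim: int, label_dim: int, num_element_types: int, num_relations: int):
        super().__init__()
        # E_{e_i}: randomly initialised label embedding, fine-tuned during training
        self.label_embedding = nn.Embedding(num_element_types, label_dim)
        self.classifier = nn.Linear(hidden_dim + label_dim, num_relations)

    def forward(self, h_cls: torch.Tensor, head_type_id: torch.Tensor) -> torch.Tensor:
        l_i = torch.cat([h_cls, self.label_embedding(head_type_id)], dim=-1)  # l_i = [h_CLS ; E_{e_i}]
        return torch.softmax(self.classifier(l_i), dim=-1)  # probability of each relation type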
S5, generating the relation questions.
A relation question is generated based on the head element and the predicted element relation type; the relation question includes the head element, the tail element type and the element relation.
Unlike the question used when extracting the head element, this question contains a specific element sequence. It is therefore formed as a declarative sentence stating the element sequence, followed by an interrogative sentence containing the tail element type and the element relation, which keeps the question well formed. For example, to identify the conclusion (tail element) inferred from a claim (head element), three questions with the same semantics but different expressions are generated as follows:
Question 1: { head element } is a claim; find the conclusion drawn from this claim.
Question 2: { head element } is a claim; which is the conclusion drawn from this claim?
Question 3: { head element } is a claim; which conclusion does this claim lead to?
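A minimal sketch of this relation question generation (the English template wording is a paraphrase introduced here for illustration, not the exact templates of this embodiment):

def generate_relation_questions(head_text: str, head_type: str, relation: str, tail_type: str) -> list:
    # Declarative sentence stating the head element, followed by a question
    # containing the tail element type and the element relation
    statement = f"{head_text} is a {head_type}."
    return [
        f"{statement} Find the {tail_type} connected to it by the relation '{relation}'.",
        f"{statement} Which {tail_type} is connected to it by the relation '{relation}'?",
        f"{statement} Which sentence is the {tail_type} related to it through '{relation}'?",
    ]

# Example: generate_relation_questions("Patience pays off in the end", "claim", "inference", "conclusion")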
S6, extracting the tail elements.
Steps S2-S3 are repeated: the relation question and the context are concatenated and input into BERT to obtain the BERT-based semantic representation, and the tail element is extracted by answering the several questions and selecting the best answer. The element - element relation - element argumentative structure is finally obtained.
Preferably, during model building, the loss functions of the head element, the tail element and the element relation are jointly optimised and the training parameters of BERT are shared. The average loss L of each batch of samples is calculated as:
L = (1/|B|) Σ_{x∈B} ( L_head(x) + L_tail(x) + L_rel(x) )
where L_head, L_tail and L_rel are the loss functions of the head element, the tail element and the element relation respectively, and |B| is the batch size.
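A minimal sketch of the joint loss; the unweighted sum of the three task losses and the use of cross-entropy are assumptions, since the exact combination is not reproduced above:

import torch.nn.functional as F

def joint_loss(head_logits, head_labels, tail_logits, tail_labels, rel_logits, rel_labels):
    # The three losses back-propagate into the shared BERT parameters
    loss_head = F.cross_entropy(head_logits, head_labels)
    loss_tail = F.cross_entropy(tail_logits, tail_labels)
    loss_rel = F.cross_entropy(rel_logits, rel_labels)
    return loss_head + loss_tail + loss_rel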
according to the method for extracting the combination of the sentence elements and the element relations based on the multiple questions and answers, the sentence elements and the element relations are extracted to obtain two independent tasks for combined extraction, semantic interaction information between the two tasks is obtained, and error propagation is relieved; and extracting the language-text elements and element relations by using a multi-round question-answer based combined method, capturing more comprehensive semantic information through the interaction of the questions and the contexts, and avoiding the limitation of an extraction area. In addition, the method has good expansibility, answers to questions can be word level or sentence level, and the method can be suitable for a speech element extraction task.
The method of this embodiment is evaluated on a data set below and compared with other extraction methods to further illustrate its technical effect.
(1) Data set
The corpus is crawled with reference to ASAP, an authoritative public data set for the automatic essay scoring task, and middle-school argumentative essays on the topics of patience and the advantages and disadvantages of computers are collected. Each essay is annotated following the discourse element and element relation annotation specifications for Chinese and English argumentative essays; the relation between each pair of discourse elements is shown in Table 1.
TABLE 1
Head element        Tail element        Element relation
Central claim       Sub-claim           Supplement
Sub-claim           Factual evidence    Fact support
Sub-claim           Reasoned evidence   Reason support
Factual evidence    Conclusion          Inference
Reasoned evidence   Conclusion          Inference
Central claim       Conclusion          Inference
The central claim (Major) sentence states what is to be proved, i.e. the author's central stance on the topic of the essay, and it governs the whole text. There is at most one central claim.
A sub-claim (Thesis) supplements and elaborates the central claim. The number of sub-claims ranges from zero to several.
A factual evidence (Fact) sentence is an example that supports a claim, such as a true description or generalisation of an objective matter, including specific instances, generalised facts, statistics and first-hand experience.
A reasoned evidence (Reasoning) sentence is a citation or principle that supports a claim and demonstrates that a view is correct or incorrect, including classical works and authoritative statements (e.g. famous quotations), principles of natural science, laws, formulas and the like.
The conclusion (Result) sentence extends the central claim and summarises the whole text, echoing the claims made in the essay.
Counting the annotated essays, the data set contains 3042 sentences labelled with discourse element types. The data set is divided into a training set and a test set at a ratio of 4:1, and 20% of the training set is selected as the validation set.
(2) Evaluation index
Evaluation is performed with Precision (P), Recall (R), F1-score (F1) and Accuracy. Precision is the percentage of predicted discourse element BIOE tags and element relation types that are correct; Recall is the percentage of the element tags and element relation types in the data set that the method recovers. The F1 score is the harmonic mean of P and R.
(3) Parameter setting
The parameter settings are as follows: the multi-round question-answering framework is implemented with PyTorch; each essay is embedded with BERT-base (cased); the maximum sequence length is set to 350 words, the batch size to 4 and the initial learning rate to 5e-5; the model is trained on the embedded training data, and the hyper-parameters dropout, best epoch and learning rate (1e-3, 1e-5, 3e-5, 5e-5) are tuned.
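For reference, the hyper-parameters reported above could be collected in a configuration such as the following (the dictionary itself and the dropout value are assumptions; the tuned dropout value is not given in the text):

config = {
    "pretrained_model": "bert-base-cased",
    "max_sequence_length": 350,
    "batch_size": 4,
    "initial_learning_rate": 5e-5,
    "learning_rate_grid": [1e-3, 1e-5, 3e-5, 5e-5],  # values searched during tuning
    "dropout": 0.1,  # assumed default, tuned in the experiments
}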
(4) Test results
Table 2 is the evaluation of the joint extraction of discourse elements and element relations. As shown in Table 2, compared with the typical methods ECAT, SPert and PFN, the F1 value of the multi-round question-answering based method exceeds that of PFN by 0.02 and 0.03 on the discourse element extraction task and the element relation extraction task respectively.
Table 3 is the evaluation of the multi-round question-answering based joint method on different topics; experiments are carried out on the two topics of the data set, and the results are shown in Table 3. Since English argumentative essays are usually written around a specific topic, the number of topics in the data set and the content discussed directly affect the experimental results. The difference in F1 value of the multi-round question-answering based joint method between the two topic data sets is between 0.01 and 0.02, which shows that the method generalises across multi-topic corpora.
TABLE 2 (the detailed results are rendered as an image in the original publication)
TABLE 3 (the detailed results are rendered as an image in the original publication)
Example III
A second aspect of the present application provides a multi-round question-answering based method for constructing an argumentative-essay knowledge graph. FIG. 3 is a flow chart of the method in another embodiment of the present application; as shown in FIG. 3, the method includes:
S100, extracting the head elements, the tail elements and the element relations with the multi-round question-answering based joint extraction method according to any one of the above embodiments;
S200, taking each head element, element relation and tail element as a triple, and establishing the argumentative-essay knowledge graph.
For example, each element pair (e_i, e_j) and the relation r_ij between them form the triple (e_i, r_ij, e_j), whose components are output as head element, element relation and tail element respectively, for example (claim, support, evidence).
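A minimal sketch of assembling such triples into a graph (the use of the networkx library is an assumption introduced here for illustration):

import networkx as nx

def build_knowledge_graph(triples):
    # triples: iterable of (head element, element relation, tail element)
    graph = nx.MultiDiGraph()
    for head, relation, tail in triples:
        graph.add_edge(head, tail, relation=relation)
    return graph

# Example: build_knowledge_graph([("claim sentence", "support", "evidence sentence")])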
The multi-round question-answering based method for constructing an argumentative-essay knowledge graph provided in this embodiment builds the knowledge graph on the basis of the steps of the joint extraction method described above; its implementation principle and technical effect are similar and are not repeated here.
Example IV
A third aspect of the present application provides, as a fourth embodiment, an electronic device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the multi-round question-answering based method for the joint extraction of discourse elements and element relations described in any of the above embodiments.
Fig. 4 is a schematic architecture diagram of an electronic device according to another embodiment of the present application.
The electronic device shown in fig. 4 may include: at least one processor 101, at least one memory 102, at least one network interface 104, and other user interfaces 103. The various components in the electronic device are coupled together by a bus system 105. It is understood that the bus system 105 is used to enable connected communications between these components. The bus system 105 includes a power bus, a control bus, and a status signal bus in addition to a data bus. But for clarity of illustration the various buses are labeled as bus system 105 in fig. 4.
The user interface 103 may include, among other things, a display, a keyboard, or a pointing device (e.g., a mouse, a trackball (trackball), or a touch pad, etc.).
It will be appreciated that the memory 102 in this embodiment may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be Random Access Memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 102 described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
In some implementations, the memory 102 stores the following elements, executable units or data structures, or a subset thereof, or an extended set thereof: an operating system 1021, and application programs 1022.
The operating system 1021 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, for implementing various basic services and processing hardware-based tasks. Applications 1022 include various applications for implementing various application services. A program for implementing the method of the embodiment of the present invention may be included in the application program 1022.
In an embodiment of the present invention, the processor 101 is configured to execute the method steps provided in the first aspect by calling a program or an instruction stored in the memory 102, specifically, a program or an instruction stored in the application 1022.
The method disclosed in the above embodiment of the present invention may be applied to the processor 101 or implemented by the processor 101. The processor 101 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 101 or instructions in the form of software. The processor 101 described above may be a general purpose processor, a digital signal processor, an application specific integrated circuit, an off-the-shelf programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component. The disclosed methods, steps, and logic blocks in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software elements in a decoding processor. The software elements may be located in a random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in the memory 102, and the processor 101 reads information in the memory 102, and in combination with its hardware, performs the steps of the method described above.
In addition, in combination with the multi-round question-answering based method for the joint extraction of discourse elements and element relations in the above embodiments, an embodiment of the present invention can provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the multi-round question-answering based joint extraction method in any one of the above method embodiments.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. The use of the terms first, second, third, etc. are for convenience of description only and do not denote any order. These terms may be understood as part of the component name.
Furthermore, it should be noted that in the description of the present specification, the terms "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., refer to a specific feature, structure, material, or characteristic described in connection with the embodiment or example being included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art upon learning the basic inventive concepts. Therefore, the appended claims should be construed to include preferred embodiments and all such variations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, the present invention should also include such modifications and variations provided that they come within the scope of the following claims and their equivalents.

Claims (9)

1. A multi-round question-answering based method for the joint extraction of discourse elements and element relations in argumentative essays, characterised by comprising:
S10, for a target argumentative essay text, acquiring a preset element-type question template and generating element questions from the template;
S20, inputting the element questions into a pre-established question-answering framework to obtain head elements; the question-answering framework is built on machine reading comprehension;
S30, inputting the head elements and the target essay text into a pre-established element relation prediction model to obtain element relations; wherein the element relation prediction model is built on a multi-class classifier;
S30 includes:
concatenating the context representation h_CLS output by BERT with the representation E_i of the head element as the input of the element relation prediction:
l_i = [h_CLS ; E_i]
where E_i is the representation of the head element and h_CLS is the context representation;
passing the input through a softmax classifier to obtain, for element e_i, the probability of each element relation type r_k:
P_r(label = r_k | e_i) = σ(W_r · l_i + b_r)
where σ(·) is the sigmoid function, W_r and b_r ∈ R^|R| are learnable parameters, d_l is the dimension of the element label embedding and |R| is the size of the element relation set;
taking the element relation type with the highest score in the classifier as the element relation corresponding to element e_i;
S40, generating relation questions based on the element relations and the head elements;
S50, inputting the relation questions into the pre-established question-answering framework to obtain the corresponding tail elements.
2. The multi-round question-answering based method for the joint extraction of discourse elements and element relations according to claim 1, characterised by further comprising establishing the element-type question template before S10.
3. The multi-round question-answering based method for the joint extraction of discourse elements and element relations according to claim 1, characterised in that S20 includes:
inputting the element questions and the target essay text into BERT to obtain a BERT-based semantic representation;
inputting the semantic representation into the pre-established question-answering framework to obtain a plurality of question answers;
determining the head element based on preset question weights.
4. The multi-round question-answering based method for the joint extraction of discourse elements and element relations according to claim 3, characterised in that inputting the semantic representation into the pre-established question-answering framework to obtain a plurality of question answers includes:
classifying the hidden-layer output h_t with a softmax classification layer, where the tags mark the beginning word of an element, an intermediate word of an element, the ending word of an element, or no element type;
identifying, in the tagged hidden-layer sequence, the fragments that start at a position labelled B and end at a position labelled E, and taking these fragments as the answers to the questions.
5. The multi-round question-answering based method for the joint extraction of discourse elements and element relations according to claim 1, characterised in that the relation question includes the head element, the tail element type and the element relation.
6. The multi-round question-answering based method for the joint extraction of discourse elements and element relations according to claim 1, characterised in that, in the model building process, the loss functions of the head element, the tail element and the element relation are jointly optimised and the training parameters of BERT are shared; the average loss L of each batch of samples is calculated as:
L = (1/|B|) Σ_{x∈B} ( L_head(x) + L_tail(x) + L_rel(x) )
where L_head is the loss function of the head element, L_tail is the loss function of the tail element, L_rel is the loss function of the element relation, and |B| is the batch size.
7. A multi-round question-answering based method for constructing an argumentative-essay knowledge graph, characterised by comprising:
extracting head elements, tail elements and element relations with the multi-round question-answering based method for the joint extraction of discourse elements and element relations according to any one of claims 1 to 6;
taking the head element, the element relation and the tail element as a triple, and establishing the argumentative-essay knowledge graph.
8. An electronic device, characterised by comprising: a memory, a processor and a computer program stored in the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the multi-round question-answering based method for the joint extraction of discourse elements and element relations according to any one of claims 1 to 6.
9. A computer-readable storage medium, characterised in that it stores a computer program which, when executed by a processor, implements the steps of the multi-round question-answering based method for the joint extraction of discourse elements and element relations according to any one of claims 1 to 6.
CN202210859304.8A 2022-07-20 2022-07-20 Language-text element and element relation joint extraction method and knowledge graph construction method Active CN115309910B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210859304.8A CN115309910B (en) 2022-07-20 2022-07-20 Language-text element and element relation joint extraction method and knowledge graph construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210859304.8A CN115309910B (en) 2022-07-20 2022-07-20 Language-text element and element relation joint extraction method and knowledge graph construction method

Publications (2)

Publication Number Publication Date
CN115309910A CN115309910A (en) 2022-11-08
CN115309910B true CN115309910B (en) 2023-05-16

Family

ID=83857121

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210859304.8A Active CN115309910B (en) 2022-07-20 2022-07-20 Language-text element and element relation joint extraction method and knowledge graph construction method

Country Status (1)

Country Link
CN (1) CN115309910B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116384382B (en) * 2023-01-04 2024-03-22 深圳擎盾信息科技有限公司 Automatic long contract element identification method and device based on multi-round interaction
CN116384381A (en) * 2023-01-04 2023-07-04 深圳擎盾信息科技有限公司 Automatic contract element identification method and device based on knowledge graph

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3764618B2 (en) * 1999-12-27 2006-04-12 株式会社東芝 Document information extraction device and document classification device
CN110210019A (en) * 2019-05-21 2019-09-06 四川大学 A kind of event argument abstracting method based on recurrent neural network
CN112464641B (en) * 2020-10-29 2023-01-03 平安科技(深圳)有限公司 BERT-based machine reading understanding method, device, equipment and storage medium
CN113590776B (en) * 2021-06-23 2023-12-12 北京百度网讯科技有限公司 Knowledge graph-based text processing method and device, electronic equipment and medium
CN114360677A (en) * 2021-12-16 2022-04-15 浙江大学 CT image report information extraction method and device based on multiple rounds of questions and answers, computer equipment and storage medium

Also Published As

Publication number Publication date
CN115309910A (en) 2022-11-08

Similar Documents

Publication Publication Date Title
CN109344236B (en) Problem similarity calculation method based on multiple characteristics
Bao et al. Table-to-text: Describing table region with natural language
CN111078836B (en) Machine reading understanding method, system and device based on external knowledge enhancement
CN115309910B (en) Language-text element and element relation joint extraction method and knowledge graph construction method
Fei et al. Hierarchical multi-task word embedding learning for synonym prediction
CN111274829A (en) Sequence labeling method using cross-language information
CN113806493A (en) Entity relationship joint extraction method and device for Internet text data
Celikyilmaz et al. A graph-based semi-supervised learning for question-answering
CN110991193A (en) Translation matrix model selection system based on OpenKiwi
Krithika et al. Learning to grade short answers using machine learning techniques
CN116804998A (en) Medical term retrieval method and system based on medical semantic understanding
CN116720498A (en) Training method and device for text similarity detection model and related medium thereof
Bai et al. Gated character-aware convolutional neural network for effective automated essay scoring
CN112529743B (en) Contract element extraction method, device, electronic equipment and medium
Lucassen Discovering phonemic base forms automatically: an information theoretic approach
CN114626463A (en) Language model training method, text matching method and related device
Wang et al. Knowledge points extraction of junior high school english exercises based on SVM method
Thielmann et al. Human in the loop: How to effectively create coherent topics by manually labeling only a few documents per class
CN117291192B (en) Government affair text semantic understanding analysis method and system
Sindhu et al. Aspect based opinion mining leveraging weighted bigru and CNN module in parallel
Song et al. A hybrid model for community-oriented lexical simplification
CN116910185B (en) Model training method, device, electronic equipment and readable storage medium
Blooma et al. Applying question classification to Yahoo! Answers
Markus et al. Leveraging Researcher Domain Expertise to Annotate Concepts Within Imbalanced Data
US20240062006A1 (en) Open domain dialog reply method and system based on thematic enhancement

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant