CN117077656A - Argumentation relation mining method and device, medium and electronic equipment - Google Patents

Argumentation relation mining method and device, medium and electronic equipment

Info

Publication number
CN117077656A
CN117077656A
Authority
CN
China
Prior art keywords
relation
propositions
sample
proposition
demonstration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311233294.8A
Other languages
Chinese (zh)
Other versions
CN117077656B (en)
Inventor
Luo Yun (罗云)
Yang Zhen (杨振)
Meng Fandong (孟凡东)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202311233294.8A
Publication of CN117077656A
Application granted
Publication of CN117077656B
Legal status: Active (granted)

Classifications

    • G: Physics
    • G06: Computing; calculating or counting
    • G06F: Electric digital data processing
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/205: Parsing
    • G06F 16/00: Information retrieval; database structures therefor; file system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/35: Clustering; classification
    • G06F 40/10: Text processing
    • G06F 40/103: Formatting, i.e. changing of presentation of documents
    • G06F 40/117: Tagging; marking up; designating a block; setting of attributes
    • G06F 40/30: Semantic analysis
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02D: Climate change mitigation technologies in information and communication technologies (ICT)
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The application belongs to the technical field of artificial intelligence, and in particular relates to an argumentation relation mining method, an argumentation relation mining device, a computer-readable medium, an electronic device, and a computer program product. The argumentation relation mining method comprises the following steps: acquiring text data from which argumentation relations are to be mined, the text data comprising a plurality of consecutively distributed proposition sentences, an argumentation relation being an attack relation, a support relation, or no relation existing between two proposition sentences; performing feature extraction on a proposition sentence to obtain a first semantic feature of the proposition sentence; fusing the first semantic features of a plurality of consecutively distributed proposition sentences to obtain a second semantic feature of the proposition sentence; and classifying two proposition sentences according to the first semantic features and the second semantic features to obtain the argumentation relation between the two proposition sentences. The application can improve the accuracy of argumentation relation mining.

Description

Argumentation relation mining method and device, medium and electronic equipment
Technical Field
The application belongs to the technical field of artificial intelligence, and in particular relates to an argumentation relation mining method, an argumentation relation mining device, a computer-readable medium, an electronic device, and a computer program product.
Background
An argumentation relation refers to the logical argumentation structure that exists between different text sentences; for example, the viewpoint expressed by one text sentence may support or oppose the viewpoint expressed by another. By mining the argumentation relations present in a text, the meaning of the text content can be understood and the ability of a computer to understand human language improved. However, because text contains a large amount of interference information that is irrelevant to the logical argumentation structure, existing argumentation relation mining schemes generally suffer from poor accuracy.
Disclosure of Invention
The application provides an argumentation relation mining method, an argumentation relation mining device, a computer-readable medium, an electronic device, and a computer program product, and aims to improve the accuracy of argumentation relation mining.
Other features and advantages of the application will be apparent from the following detailed description, or may be learned by the practice of the application.
According to an aspect of an embodiment of the present application, there is provided an argumentation relation mining method, including: acquiring text data from which argumentation relations are to be mined, the text data including a plurality of consecutively distributed proposition sentences, an argumentation relation being an attack relation, a support relation, or no relation existing between two proposition sentences; performing feature extraction on a proposition sentence to obtain a first semantic feature of the proposition sentence; fusing the first semantic features of a plurality of consecutively distributed proposition sentences to obtain a second semantic feature of the proposition sentence; and classifying two proposition sentences according to the first semantic features and the second semantic features to obtain the argumentation relation between the two proposition sentences.
According to an aspect of an embodiment of the present application, there is provided an argumentation relation mining apparatus including:
an acquisition module configured to acquire text data from which argumentation relations are to be mined, the text data including a plurality of consecutively distributed proposition sentences, an argumentation relation being an attack relation, a support relation, or no relation existing between two proposition sentences;
an extraction module configured to perform feature extraction on a proposition sentence to obtain a first semantic feature of the proposition sentence;
a fusion module configured to fuse the first semantic features of a plurality of consecutively distributed proposition sentences to obtain a second semantic feature of the proposition sentence;
a classification module configured to classify two proposition sentences according to the first semantic features and the second semantic features to obtain the argumentation relation between the two proposition sentences.
In some embodiments of the present application, based on the above technical solution, the classification module further includes:
a feature fusion module configured to fuse the first semantic feature and the second semantic feature of a proposition sentence to obtain a third semantic feature of the proposition sentence;
a feature mapping module configured to map the third semantic features of two proposition sentences according to a preset activation function to obtain the classification probability of the two proposition sentences, the classification probability including the probabilities of classifying the argumentation relation of the two proposition sentences as an attack relation, a support relation, or no relation;
a relation determining module configured to determine the argumentation relation of the two proposition sentences according to the maximum value of the classification probability.
In some embodiments of the present application, based on the above technical solution, when the argumentation relation of the two proposition sentences is no relation, the maximum value of the classification probability is positively correlated with the feature similarity of the two proposition sentences; when the argumentation relation of the two proposition sentences is an attack relation or a support relation, the maximum value of the classification probability is negatively correlated with the feature similarity of the two proposition sentences.
In some embodiments of the present application, based on the above technical solution, when the argumentation relation of the two proposition sentences is no relation, the degree of influence of the feature similarity of the two proposition sentences on the classification probability is negatively correlated with the position distance of the two proposition sentences; when the argumentation relation of the two proposition sentences is an attack relation or a support relation, the degree of influence of the feature similarity of the two proposition sentences on the classification probability is positively correlated with the position distance of the two proposition sentences.
In some embodiments of the present application, based on the above technical solutions, the apparatus further includes:
a model acquisition module configured to acquire an argumentation relation mining model for mining argumentation relations from text data, the argumentation relation mining model including an encoder for feature extraction of proposition sentences and a classifier for classification processing of two proposition sentences;
a sample acquisition module configured to acquire a text data sample for training the argumentation relation mining model, the text data sample including a plurality of consecutively distributed proposition sentence samples and a sample label representing the argumentation relation between two proposition sentence samples;
a first error determining module configured to obtain sample features obtained by the encoder performing feature extraction on the proposition sentence samples, and to determine a first loss error of two proposition sentence samples according to the sample features and the sample label;
a second error determining module configured to obtain a sample distribution probability obtained by the classifier classifying the two proposition sentence samples according to the sample features, and to determine a second loss error of the two proposition sentence samples according to the sample distribution probability and the sample label;
a parameter updating module configured to update model parameters of the argumentation relation mining model according to the first loss error and the second loss error.
In some embodiments of the present application, based on the above technical solutions, the apparatus further includes:
a masking module configured to randomly mask words or sentences in the text data samples.
In some embodiments of the present application, based on the above technical solution, the masking module is further configured to: identify, from the text data sample, marker words associated with the argumentation relation; randomly mask the marker words in the text data sample according to a first preset probability; and randomly mask the proposition sentence samples in the text data sample according to a second preset probability.
In some embodiments of the present application, based on the above technical solution, the first error determining module is further configured to: determine the feature similarity of two proposition sentence samples according to the sample features; and determine the first loss error of the two proposition sentence samples according to the feature similarity and the sample label.
In some embodiments of the present application, based on the above technical solution, when the sample label is no relation, the first loss error is positively correlated with the feature similarity of the two proposition sentence samples; when the sample label is an attack relation or a support relation, the first loss error is negatively correlated with the feature similarity of the two proposition sentence samples.
In some embodiments of the present application, based on the above technical solution, when the sample label is no relation, the degree of influence of the feature similarity of the two proposition sentence samples on the first loss error is negatively correlated with the position distance of the two proposition sentence samples; when the sample label is an attack relation or a support relation, the degree of influence of the feature similarity of the two proposition sentence samples on the first loss error is positively correlated with the position distance of the two proposition sentence samples.
In some embodiments of the present application, based on the above technical solution, the classifier includes an input layer, a hidden layer, and an output layer connected in sequence; the second error determining module is further configured to: obtain an input vector produced by the input layer embedding and encoding the sample features of the two proposition sentence samples; obtain an intermediate vector produced by the hidden layer normalizing the input vector; and obtain the sample distribution probability produced by the output layer performing classification mapping on the intermediate vector.
In some embodiments of the present application, based on the above technical solution, the parameter updating module is further configured to: perform a weighted summation of the first loss error and the second loss error according to preset weights to obtain an error value of the argumentation relation mining model; back-propagate the error value through the argumentation relation mining model to obtain the error gradient of each model parameter; and update the model parameters according to the error gradients.
According to an aspect of the embodiments of the present application, there is provided a computer-readable medium having stored thereon a computer program which, when executed by a processor, implements the argumentation relation mining method as in the above technical solutions.
According to an aspect of an embodiment of the present application, there is provided an electronic apparatus including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute the executable instructions to implement the argumentation relation mining method as in the above technical solutions.
According to an aspect of the embodiments of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the argumentation relation mining method as in the above technical solution.
In the technical solution provided by the embodiments of the present application, first semantic features are obtained by performing feature extraction on the proposition sentences in the text data, and second semantic features are obtained by fusing the first semantic features of a plurality of consecutively distributed proposition sentences, so that two proposition sentences can be classified according to the first and second semantic features to obtain their argumentation relation. The embodiments of the application thus incorporate the contextual relations of proposition sentences into the argumentation relation mining process, improving the accuracy of argumentation relation mining.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. It is evident that the drawings in the following description are only some embodiments of the present application and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
Fig. 1 schematically shows a block diagram of an exemplary system architecture to which the technical solution of the present application is applied.
FIG. 2 shows a flow chart of an argumentation relation mining method in one embodiment of the present application.
FIG. 3 shows a schematic diagram of the relationship between context sentence sequence length and model score.
FIG. 4 shows a schematic diagram of the relationship between the position distance of proposition sentences and model score.
FIG. 5 shows a flow chart of a training method for an argumentation relation mining model in one embodiment of the application.
FIG. 6 shows a schematic diagram of the relationship between the presence or absence of marker words in a proposition sentence and model score.
FIG. 7 is a schematic diagram of training an argumentation relation mining model in an application scenario according to an embodiment of the present application.
FIG. 8 shows a comparison of experimental results between the argumentation relation mining model of an embodiment of the present application and other models.
Fig. 9 schematically shows a block diagram of an argumentation relation mining apparatus provided by an embodiment of the present application.
Fig. 10 schematically shows a block diagram of a computer system suitable for implementing embodiments of the application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the application may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
In particular embodiments of the present application, where related data such as user information or social media information is involved, user permission or consent must be obtained when the embodiments are applied to particular products or technologies, and the collection, use, and processing of such data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
In the related art, the argumentation relation mining task is generally modeled on the basis of semantic parsing, using statistical and hand-crafted features to classify argumentation relations as support, attack, or no relation.
Fig. 1 schematically shows a block diagram of an exemplary system architecture to which the technical solution of the present application is applied.
As shown in fig. 1, a system architecture to which the technical solution of the present application is applied may include a terminal device 110 and a server 130. Terminal device 110 may include various electronic devices such as smart phones, tablet computers, notebook computers, desktop computers, smart speakers, smart wearable devices, smart vehicle devices, smart payment terminals, and the like. The server 130 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, big data, and artificial intelligence platforms. Communication media of various connection types, such as wired or wireless communication links, may be provided between terminal device 110 and server 130.
The argumentation relation mining model 120 is a model for mining argumentation relations from text data. For example, in a text composed of a plurality of proposition sentences, the model 120 can be used to judge whether an argumentation relation exists between any two proposition sentences and to recognize which argumentation relation it is.
In an application scenario of the embodiment of the present application, the argumentation relation mining model 120 may be deployed on the server 130 in advance, and the server 130 trains the model. During model training, a loss error may be determined according to the recognition results of the argumentation relation mining model 120 on the training samples, and the model parameters of the model are then iteratively updated according to the loss error. Through continuous training, the loss error of the model is gradually reduced and its recognition accuracy improves.
When the training of the argumentation relation mining model 120 is completed, an argumentation relation mining service may be provided to the terminal device 110. For example, the terminal device 110 may upload text data to the server 130; the argumentation relation mining model 120 deployed on the server 130 mines the text data and outputs the argumentation relations, and the server 130 returns the mining results to the terminal device 110.
In other application scenarios, the trained argumentation relation mining model 120 may be deployed directly on the terminal device 110, so that the terminal device 110 can run the model locally. When argumentation relation mining is required, the terminal device 110 may input the text data into the trained model 120, which mines the proposition sentences in the text data and outputs the mining results.
In one embodiment of the present application, the argumentation relation mining model 120 may include an encoder for extracting features of the proposition sentences in the text data and a classifier for classifying two proposition sentences. Through encoding followed by classification, the distribution probability of the two proposition sentences over the relation categories (support relation, attack relation, or no relation) is obtained, and the relation category with the highest distribution probability is taken as the argumentation relation between the two proposition sentences.
The technical solution for argumentation relation mining provided by the embodiments of the present application can realize various application functions, exemplified below, in a software product (such as an application program) or a hardware product (such as a computer device on which the application program is installed).
(1) Argumentation analysis: by mining the argumentation relations in a text, the product can analyze the logical structure and argumentative reasoning of the text, thereby better understanding the meaning of the text and the intent of its author.
(2) Emotion analysis: argumentation relation mining can help a product analyze the emotional tendencies and intensities in a text, so as to better understand the emotional needs and feedback of users.
(3) Information retrieval: by mining the argumentation relations in a text, the product can more accurately match the information needs of users, improving retrieval efficiency and accuracy.
(4) Public opinion analysis: by mining the argumentation relations in texts such as social media and news, the product can better understand the public's attitudes and opinions toward an event or topic, enabling better public opinion analysis and management.
(5) Natural language generation: by mining the argumentation relations in a text, the product can generate natural language text that is more logical and consistent, improving the quality and efficiency of text generation.
The argumentation relation mining method, argumentation relation mining device, computer-readable medium, electronic device, computer program product, and other technical solutions provided by the application are described in detail below with reference to specific embodiments.
Fig. 2 shows a flowchart of an argumentation relation mining method in one embodiment of the present application; the method may be performed by the terminal device or the server shown in fig. 1 alone, or by the terminal device and the server together. The embodiment of the application is illustrated taking execution of the method by the terminal device as an example. As shown in fig. 2, the argumentation relation mining method may include the following steps S210 to S240.
S210: acquire text data from which argumentation relations are to be mined, the text data including a plurality of consecutively distributed proposition sentences, an argumentation relation being an attack relation, a support relation, or no relation existing between two proposition sentences.
The text data to be mined may be a document composed of a plurality of consecutively distributed proposition sentences, where each proposition sentence may be a complete sentence or a fragment formed by splitting a complete sentence.
For example, one piece of text data includes N (at least two) proposition sentences. If the viewpoint of one proposition sentence agrees with the viewpoint of another, the two proposition sentences can be considered to have a support relation; if the viewpoint of one proposition sentence opposes the viewpoint of another, the two can be considered to have an attack relation; and if the viewpoints of the two proposition sentences form neither a direct support relation nor an attack relation, the two can be considered to have no relation, as sketched below.
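This three-way label space can be captured in a small data structure. The following is a minimal Python sketch; the class and member names are illustrative assumptions, not identifiers from the application.

```python
from enum import Enum

class ArgRelation(Enum):
    """Three-way argumentation relation label space (names are illustrative)."""
    NO_RELATION = 0  # viewpoints form neither a direct support nor an attack relation
    SUPPORT = 1      # the viewpoint of one proposition sentence agrees with the other's
    ATTACK = 2       # the viewpoint of one proposition sentence opposes the other's

# e.g. the relation mined for one ordered pair of proposition sentences
pair_label = ArgRelation.SUPPORT
```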
S220: perform feature extraction on a proposition sentence to obtain a first semantic feature of the proposition sentence.
The first semantic feature represents the sentence meaning of one proposition sentence itself; it is an abstract representation of the viewpoint of that proposition sentence.
In one embodiment of the application, an argumentation relation mining model can be trained in advance, and the encoder in the model is used to encode the proposition sentence; the resulting encoding vector serves as the first semantic feature of the proposition sentence.
For example, one piece of text data is a sentence sequence composed of N consecutively distributed proposition sentences, and N first semantic features are obtained after feature extraction is performed on each proposition sentence. The first semantic feature of the i-th proposition sentence A(i) may be denoted H1(i).
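As an illustration of this step, the following Python sketch extracts one first semantic feature H1(i) per proposition sentence. It assumes a BERT-style pretrained encoder from the HuggingFace transformers library and uses the [CLS] vector as the sentence representation; the model choice and pooling are assumptions, since the application only requires an encoder inside the mining model.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed encoder; the application only specifies "an encoder" for feature extraction.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def first_semantic_features(propositions: list[str]) -> torch.Tensor:
    """Encode each proposition sentence A(i) into its first semantic feature H1(i)."""
    batch = tokenizer(propositions, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**batch)
    # Use the [CLS] vector of each sentence as its representation: shape (N, hidden).
    return out.last_hidden_state[:, 0, :]

h1 = first_semantic_features(["Sentence one.", "Sentence two.", "Sentence three."])
```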
S230: fuse the first semantic features of a plurality of consecutively distributed proposition sentences to obtain a second semantic feature of the proposition sentence.
A piece of text data is a sentence sequence composed of a plurality of consecutively distributed proposition sentences. Several consecutively distributed proposition sentences form a context sentence sequence, and fusing the first semantic features of each proposition sentence in the context sentence sequence yields the second semantic feature of a proposition sentence.
The second semantic feature represents the sentence meaning of the context sentence sequence related to one proposition sentence; it is an abstract representation of the viewpoint of the entire context sentence sequence. The context sentence sequence related to one proposition sentence is the set consisting of that proposition sentence and several adjacent proposition sentences.
In one embodiment of the present application, a sliding window of preset length may be used to divide a corresponding context sentence sequence for each proposition sentence; after the first semantic features of each proposition sentence in the context sentence sequence are fused, the second semantic feature of the proposition sentence is obtained.
For example, a sliding window of length L may be used to divide the context sentence sequence for each proposition sentence. For the i-th proposition sentence A(i), the L consecutively distributed proposition sentences before it may be taken as the context sentence sequence of A(i), or the L proposition sentences after it, or the L proposition sentences both before and after it. The second semantic feature H2(i) of the proposition sentence A(i) is obtained by fusing the first semantic features of the proposition sentences in this context sentence sequence, as sketched below.
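A minimal sketch of the sliding-window division, here using the before-and-after variant; the function name is illustrative.

```python
def context_window(n_sentences: int, i: int, window_len: int) -> list[int]:
    """Indices of the context sentence sequence for proposition sentence i,
    taking up to L consecutively distributed sentences before and after it
    (one of the three window variants described above)."""
    lo = max(0, i - window_len)
    hi = min(n_sentences, i + window_len + 1)
    return list(range(lo, hi))

# With N = 20 proposition sentences and L = 5, the context of sentence 8 is indices 3..13.
print(context_window(20, 8, 5))
```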
To verify the influence of the sliding window length L (i.e., the length of the context sentence sequence) on the argumentation relation mining effect, the embodiment of the application performs performance verification of the argumentation relation mining model on three sample datasets, AbstRCT, ECHR, and AMPERE, and uses the Macro F1 score to evaluate the mining effect.
FIG. 3 shows a schematic diagram of the relationship between context sentence sequence length and model score. As shown in fig. 3, the performance of the model initially improves as the length of the context sentence sequence increases, but begins to decrease as the length continues to increase. It follows that context information can improve the performance of the model, but an overly long context may cause the model to focus too much on irrelevant sentence information, introducing too much noise and thus degrading performance.
In some optional implementations, the embodiment of the application sets the sliding window length L to a preset constant between 4 and 14, so that noise is reduced while context information is used to improve the argumentation relation mining effect.
In one embodiment of the application, an argumentation relation mining model can be pre-trained whose encoder contains an attention module for extracting context information for proposition sentences. Based on a self-attention mechanism, the attention module fuses the first semantic features of the proposition sentences in the context sentence sequence, obtaining second semantic features that aggregate the context information.
The attention module dynamically assigns different weights based on the relevance of each proposition sentence in the context sentence sequence, thereby capturing dependency and semantic information in the sequence.
For example, the attention module first maps the first semantic feature of each proposition sentence in the context sentence sequence into a query vector (query), a key vector (key), and a value vector (value) of the same dimension. The query vector represents the target to be attended to or retrieved, the key vector represents the source to be matched or compared against the query vector, and the value vector represents the information to be weighted and summed according to the degree of matching between the query and the key.
Then, for each position i, the dot product (or another similarity measure) of query vector i with every key vector j is computed, yielding an attention score s_ij that represents the attention of position i to position j. The attention scores s_ij are then normalized for each position i, for example by a softmax function, giving attention weights a_ij. Finally, for each position i, the value vectors j are weighted and summed with the attention weights a_ij to obtain an output vector o_i, which is the second semantic feature H2(i) of the proposition sentence A(i).
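The query/key/value procedure just described can be sketched in a few lines of PyTorch. This is a single-head simplification (the application scenario below mentions a multi-head module), and the dimensions are assumptions.

```python
import torch
import torch.nn as nn

class SentenceAttention(nn.Module):
    """Fuse first semantic features H1 of a context sentence sequence into
    second semantic features H2 via the q/k/v self-attention described above."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)  # maps H1(i) to query vectors
        self.k = nn.Linear(dim, dim)  # maps H1(i) to key vectors
        self.v = nn.Linear(dim, dim)  # maps H1(i) to value vectors

    def forward(self, h1: torch.Tensor) -> torch.Tensor:  # h1: (L, dim)
        q, k, v = self.q(h1), self.k(h1), self.v(h1)
        scores = q @ k.T / (h1.size(-1) ** 0.5)   # s_ij: attention of position i to j
        weights = torch.softmax(scores, dim=-1)   # a_ij: normalized attention weights
        return weights @ v                        # o_i, i.e. H2(i) for each sentence

h2 = SentenceAttention(768)(torch.randn(10, 768))  # 10 sentences in the context window
```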
S240: classify the two proposition sentences according to the first semantic features and the second semantic features to obtain the argumentation relation of the two proposition sentences.
In one embodiment of the present application, the method of classifying two proposition sentences may include: fusing the first semantic feature and the second semantic feature of a proposition sentence to obtain a third semantic feature of the proposition sentence; mapping the third semantic features of the two proposition sentences according to a preset activation function to obtain the classification probability of the two proposition sentences, the classification probability including the probabilities of classifying their argumentation relation as an attack relation, a support relation, or no relation; and determining the argumentation relation of the two proposition sentences according to the maximum value of the classification probability.
In some alternative embodiments, the method of fusing the first and second semantic features of a proposition sentence may be any fusion method such as concatenation, summation, or weighted summation. For example, the embodiment of the present application may directly add the first semantic feature H1(i) and the second semantic feature H2(i) to obtain the third semantic feature H3(i), i.e., H3(i) = H1(i) + H2(i).
In some alternative embodiments, the activation functions that map the third semantic features of the two proposition sentences may include a tanh function and a softmax function. For example, the embodiment of the application may map the classification probability P of the argumentation relation between the j-th and the k-th proposition sentences according to the following formula:

P = softmax(W2 · tanh(W1 · [Hj ; Hk]))

where Hj denotes the third semantic feature of the j-th proposition sentence s_j, Hk denotes the third semantic feature of the k-th proposition sentence s_k, and W1 and W2 are the parameters of the tanh activation function and the softmax activation function, respectively.
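A short sketch of this fusion-then-mapping step follows; the layer sizes and the class-index ordering are assumptions.

```python
import torch
import torch.nn as nn

class PairClassifier(nn.Module):
    """P = softmax(W2 · tanh(W1 · [Hj ; Hk])) over the three relation classes."""

    def __init__(self, dim: int, n_classes: int = 3):
        super().__init__()
        self.w1 = nn.Linear(2 * dim, dim)    # W1
        self.w2 = nn.Linear(dim, n_classes)  # W2

    def forward(self, h3_j: torch.Tensor, h3_k: torch.Tensor) -> torch.Tensor:
        pair = torch.cat([h3_j, h3_k], dim=-1)  # [Hj ; Hk]
        return torch.softmax(self.w2(torch.tanh(self.w1(pair))), dim=-1)

h1 = torch.randn(10, 768)       # first semantic features (stand-in values)
h2 = torch.randn(10, 768)       # second semantic features (stand-in values)
h3 = h1 + h2                    # H3(i) = H1(i) + H2(i), the summation fusion above
probs = PairClassifier(768)(h3[0], h3[3])
relation = int(probs.argmax())  # index of the maximum classification probability
```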
In one embodiment of the application, when the argumentation relation of the two proposition sentences is no relation, the maximum value of the classification probability is positively correlated with the feature similarity of the two proposition sentences; when the argumentation relation is an attack relation or a support relation, the maximum value of the classification probability is negatively correlated with the feature similarity.
By training the argumentation relation mining model, the embodiment of the application controls the classification probability and the feature similarity of two proposition sentences to exhibit these different correlations, so that the feature similarity is pulled closer when the two proposition sentences have a specified argumentation relation (an attack or support relation) and pushed apart when they do not (no relation). These differentiated correlations improve the feature extraction capability of the model's encoder for proposition sentences.
To verify the influence of the position distance between two proposition sentences on the argumentation relation mining effect, the embodiment of the application again performs performance verification of the argumentation relation mining model on the AbstRCT, ECHR, and AMPERE sample datasets, using the Macro F1 score for evaluation.
FIG. 4 shows a schematic diagram of the relationship between the position distance of proposition sentences and model score. As shown in fig. 4, the performance of the model decreases as the position distance between propositions increases, indicating that detecting argumentation relations over long distances remains challenging. It also shows that simply connecting the sentences in a context window still under-utilizes the context information, leaving room for improvement.
To address this problem, the application configures differentiated position distance correlations for the different types of argumentation relations; combining these position distance correlations with the feature similarity improves the mining effect over longer position distances.
In one embodiment of the application, when the argumentation relation of the two proposition sentences is no relation, the degree of influence of their feature similarity on the classification probability is negatively correlated with their position distance; when the argumentation relation is an attack or support relation, the degree of influence of their feature similarity on the classification probability is positively correlated with their position distance.
In some optional embodiments, the feature similarity of the two proposition sentences may be weighted using their position distance as a weight coefficient, so that the position distance controls the degree of influence of the feature similarity on the classification probability.
In one embodiment of the application, to enhance the mining effect of the argumentation relation mining model, it may be iteratively trained with sample data to continually update the model parameters.
Fig. 5 shows a flowchart of a training method for the argumentation relation mining model in one embodiment of the present application; the method may be performed by the terminal device or the server shown in fig. 1 alone, or by both together. The embodiment of the application is illustrated taking execution of the training method by the server as an example. As shown in fig. 5, the training method may include the following steps S510 to S550.
S510: acquire an argumentation relation mining model for mining argumentation relations from text data, the model including an encoder for feature extraction of proposition sentences and a classifier for classification of two proposition sentences.
S520: acquire a text data sample for training the argumentation relation mining model, the sample including a plurality of consecutively distributed proposition sentence samples and a sample label representing the argumentation relation between two proposition sentence samples.
In one embodiment of the application, words or sentences in the text data samples can be randomly masked to augment the training data and reduce the model's dependence on particular words or on sentences carrying little information.
In one embodiment of the present application, randomly masking words or sentences in the text data samples may further comprise: identifying marker words associated with the argumentation relation from the text data sample; randomly masking the marker words in the text data sample according to a first preset probability; and randomly masking the proposition sentence samples in the text data sample according to a second preset probability.
A marker word is a characteristic word strongly correlated with the argumentation relation of a proposition sentence, such as a sentence connective like "so", "thus", or "however".
To verify the influence of marker words on the argumentation relation mining effect, the embodiment of the application performs performance verification of the argumentation relation mining model on the AbstRCT, ECHR, and AMPERE sample datasets, using the Macro F1 score for evaluation.
FIG. 6 shows a schematic diagram of the relationship between the presence or absence of marker words in a proposition sentence and model score. As shown in fig. 6, the model performs better in most cases on proposition sentences containing marker words such as "so" and "plus". These marker words can be regarded as important signals for identifying argumentation relations, but their presence may prevent the model from fully exploiting the context information between sentences, weakening the model's feature extraction capability for context information.
To address this problem, the embodiment of the application randomly masks marker words and proposition sentences, reducing the model's dependence on marker words and its excessive attention to individual proposition sentences, thereby enhancing the model's understanding of context information.
For example, first, in word-level data augmentation, the embodiment of the present application randomly masks the marker words (with probability p_w) to encourage the model to learn more context information and reduce its reliance on marker words. The embodiment may select 18 marker words from the PDTB manual.
Second, because long contexts may introduce noise, excessive attention to less informative sentences may negatively impact the recognition of a particular head-tail pair. The embodiment therefore generates samples from the raw data by randomly masking some proposition sentences (with probability p_s), enhancing the understanding of context information and mitigating excessive attention to particular sentences, as sketched below.
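A minimal sketch of the two-level masking augmentation. The listed marker words are only an illustrative subset of the 18 PDTB markers, and the probability values p_w and p_s are assumptions.

```python
import random

# Illustrative subset; the embodiment selects 18 marker words from the PDTB manual.
MARKER_WORDS = {"so", "thus", "however", "because", "therefore"}

def mask_sample(sentences: list[str], p_w: float = 0.3, p_s: float = 0.1,
                rng: random.Random = random.Random(0)) -> list[str]:
    """Word-level masking of marker words (p_w) and sentence-level masking (p_s)."""
    masked = []
    for sent in sentences:
        if rng.random() < p_s:            # sentence-level masking
            masked.append("<Mask>")
            continue
        words = ["<Mask>" if w.lower().strip(".,") in MARKER_WORDS and rng.random() < p_w
                 else w for w in sent.split()]
        masked.append(" ".join(words))
    return masked

print(mask_sample(["However, the rule is a positive step.", "So it should be adopted."]))
```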
S530: obtain the sample features produced by the encoder performing feature extraction on the proposition sentence samples, and determine a first loss error for two proposition sentence samples according to the sample features and the sample label.
In one embodiment of the present application, determining the first loss error of two proposition sentence samples from the sample features and the sample label may include: determining the feature similarity of the two proposition sentence samples according to the sample features; and determining the first loss error of the two samples according to the feature similarity and the sample label.
In one embodiment of the application, when the sample label is no relation, the first loss error is positively correlated with the feature similarity of the two proposition sentence samples; when the sample label is an attack relation or a support relation, the first loss error is negatively correlated with the feature similarity.
In one embodiment of the application, when the sample label is no relation, the degree of influence of the feature similarity of the two proposition sentence samples on the first loss error is negatively correlated with their position distance; when the sample label is an attack or support relation, the degree of influence of the feature similarity on the first loss error is positively correlated with their position distance.
For example, the first loss error L_con in the embodiment of the present application can be expressed, consistent with the correlations above, in the following form:

L_con = Σ_(i,j) [ 1(y ≠ 0) · w_ij · (1 − sim(H_i, H_j)) + 1(y = 0) · (1 / w_ij) · sim(H_i, H_j) ]

where y indicates whether a specified argumentation relation exists between the i-th and the j-th proposition sentence samples: y = 0 when the two proposition sentence samples are unrelated, and y ≠ 0 when a support relation or an attack relation exists between them. H_i denotes the sample feature of the i-th proposition sentence sample, and H_j the sample feature of the j-th. Sample pairs that have an argumentation relation (y ≠ 0) are pulled together by optimizing the feature similarity sim(·), and unrelated pairs (y = 0) are pushed apart.

Considering that sample features tend to have relatively weak similarity over long distances and strong similarity over short distances, the embodiment of the present application uses distance weights to enhance the similarity differences. The weight w_ij = exp(d_ij / L_max) is calculated by normalizing the exponent of the distance, where d_ij is the position distance between sample s_i and sample s_j, and L_max is the longest input text length.
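The following Python sketch implements this first loss for a single sample pair. The exponential weighting form exp(d_ij / L_max) is inferred from the stated correlations rather than quoted verbatim from the application, and cosine similarity is an assumed choice for sim(·).

```python
import torch
import torch.nn.functional as F

def first_loss(h_i: torch.Tensor, h_j: torch.Tensor,
               y: int, d_ij: int, l_max: int) -> torch.Tensor:
    """Distance-weighted contrastive (first) loss for one proposition sentence pair.

    y != 0 (support/attack): pull features together, more strongly at long distance.
    y == 0 (no relation):    push features apart, more strongly at short distance.
    """
    sim = F.cosine_similarity(h_i, h_j, dim=-1)  # assumed form of sim(·)
    w = torch.exp(torch.tensor(d_ij / l_max))    # distance weight
    return w * (1.0 - sim) if y != 0 else (1.0 / w) * sim

loss = first_loss(torch.randn(768), torch.randn(768), y=1, d_ij=4, l_max=512)
```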
S540: obtain the sample distribution probability produced by the classifier classifying the two proposition sentence samples according to the sample features, and determine a second loss error for the two proposition sentence samples according to the sample distribution probability and the sample label.
In one embodiment of the application, the classifier includes an input layer, a hidden layer, and an output layer connected in sequence. Obtaining the sample distribution probability produced by the classifier may then include: obtaining the input vector produced by the input layer embedding and encoding the sample features of the two proposition sentence samples; obtaining the intermediate vector produced by the hidden layer normalizing the input vector; and obtaining the sample distribution probability produced by the output layer performing classification mapping on the intermediate vector.
For example, the sample distribution probability P in the embodiment of the present application can be expressed as:

P = softmax(W2 · tanh(W1 · [Hj ; Hk]))

where Hj is the input vector of the j-th proposition sentence sample s_j, Hk is the input vector of the k-th proposition sentence sample s_k, and W1 and W2 are trainable parameters. The hidden layer normalizes the input vector with the tanh function to obtain the intermediate vector, and the output layer classifies and maps the intermediate vector with the softmax function to obtain the sample distribution probability P.
In one embodiment of the application, the second loss error may be calculated using a cross-entropy loss function L_cls.
S550: update the model parameters of the argumentation relation mining model according to the first loss error and the second loss error.
In one embodiment of the present application, updating the model parameters according to the first and second loss errors may include: performing a weighted summation of the first loss error and the second loss error according to preset weights to obtain the error value of the argumentation relation mining model; back-propagating the error value through the argumentation relation mining model to obtain the error gradient of each model parameter; and updating the model parameters according to the error gradients.
For example, the error value L of the argumentation relation mining model can be expressed as L = L_cls + λ · L_con, where λ is a hyperparameter, i.e. a preset weight controlling the proportion of the two loss errors.
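A sketch of step S550 as a single training step; the optimizer, learning rate, and λ value are assumptions, and the encoder and classifier are stand-ins for the modules sketched earlier.

```python
import torch
import torch.nn as nn

encoder = nn.Linear(768, 768)       # stand-in for the real encoder
classifier = nn.Linear(2 * 768, 3)  # stand-in for the real classifier
optimizer = torch.optim.AdamW(
    list(encoder.parameters()) + list(classifier.parameters()), lr=2e-5)
lam = 0.5                           # preset weight lambda (value assumed)

def train_step(l_cls: torch.Tensor, l_con: torch.Tensor) -> float:
    loss = l_cls + lam * l_con      # weighted summation -> error value L
    optimizer.zero_grad()
    loss.backward()                 # back-propagation yields each parameter's error gradient
    optimizer.step()                # update the model parameters along the gradients
    return float(loss)

# Toy demonstration with differentiable dummy loss terms.
feats = encoder(torch.randn(4, 768))
logits = classifier(torch.cat([feats[:2], feats[2:]], dim=-1))
l_cls = nn.functional.cross_entropy(logits, torch.tensor([0, 1]))
l_con = feats.pow(2).mean()         # placeholder for the contrastive term
print(train_step(l_cls, l_con))
```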
FIG. 7 is a schematic diagram of training the argumentation relation mining model in an application scenario according to an embodiment of the present application. As shown in fig. 7, the argumentation relation mining model includes an encoder for feature extraction of proposition sentences and a classifier for classification of two proposition sentences; the aim of training is to optimize the model parameters of the encoder and the classifier.
Text data samples for training the argumentation relation mining model are as follows.
(1)I think the Massachusetts rule is a positive step in the right direction.
(2) However, for those debtors who truly cannot pay at all...
(3) For example, most disability insurance companies ...
...
To enhance the model training effect, the embodiment of the application obtains the following masked sample by randomly masking words and proposition sentences.
(1)I think the Massachusetts rule is a positive step in the right direction.
(2)<Mask>, for those debtors who truly cannot pay at all...
(3)<Mask>...<Mask>
...
The embodiment of the application converts the argumentation relation mining task into a classification task. Formally, a sample dataset containing a plurality of document samples is provided, each document sample consisting of a number of proposition sentence samples. The objective of the classification task is to identify whether an attack relation (attack), a support relation (support), or no relation (no-rel) exists between two proposition sentence samples. The classification task can be executed in an end-to-end setting (considering the sample pairs formed by all proposition sentence samples) or in a given-head-proposition setting (considering the sample pairs formed by one specified proposition sentence sample and the other samples), as sketched below.
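The two settings differ only in how sample pairs are enumerated, as the following sketch illustrates (function names are illustrative).

```python
from itertools import combinations

def end_to_end_pairs(n: int) -> list[tuple[int, int]]:
    """End-to-end setting: consider the pairs formed by all proposition sentence samples."""
    return list(combinations(range(n), 2))

def given_head_pairs(n: int, head: int) -> list[tuple[int, int]]:
    """Given-head setting: pairs formed by one specified sample and every other sample."""
    return [(head, j) for j in range(n) if j != head]

print(end_to_end_pairs(4))     # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(given_head_pairs(4, 1))  # [(1, 0), (1, 2), (1, 3)]
```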
As shown in FIG. 7, given a head proposition s_i, it is connected with surrounding sentences to provide context information, including the L propositions before and the L propositions after it. The other propositions in the context are regarded as tail propositions. A tag [CLS] is added to separate the propositions and serve as their representations. An encoding module is used to encode s_i and the tail propositions; the resulting feature vector can be expressed as the first semantic feature H1.
To further extract the relationships between sentences, the embodiment of the application uses a sentence-level attention module to aggregate context information. Specifically, the sentence token vectors in the context window are concatenated and fed into a multi-headed self-attention module to aggregate the context information, which can be expressed as the second semantic feature H2. In the self-attention module, each row of the query vector (query), key vector (key), and value vector (value) corresponds to one sentence representation.
By simply summing the token vectors from the tag level and the sentence level, the third semantic feature H3 is obtained.
Subsequently, the first loss error L_con is calculated using the distance-weighted formula given above. The specific definition of each parameter in the formula may refer to the description in the above embodiment and is not repeated here.
In the classifier, the embodiment of the application can calculate the label distribution probability of the two samples using a multi-layer perceptron (MLP), and then calculate the corresponding second loss error L_cls.
The specific calculation of the label distribution probability and the second loss error may refer to the description in the above embodiment and is not repeated here.
The overall training objective function is calculated as L = L_cls + λ · L_con, where L_cls is the cross-entropy loss of the classification and λ is the preset weight.
To verify the training effect of the model, the embodiment of the application performs experiments on five datasets in different fields: AMPERE, Essays, AbstRCT, ECHR, and CDCP.
FIG. 8 shows a comparison of experimental results between the argumentation relation mining model of the embodiment of the present application and other models. The argumentation relation mining model in the embodiment includes a model ECASE-10 with context length 10 and a model ECASE-20 with context length 20.
In the given-head setting (Head Given), ECASE-10 achieves state-of-the-art performance relative to the other models (macro-F1 scores of 79.01%, 77.38%, 72.17%, 73.15%, and 72.16%).
When the context length of ECASE is increased to 20, the performance of the model also improves compared to model CASE-20. While the shorter-context ECASE-10 still performs better than ECASE-20 on most datasets, the gap between them is relatively small compared to CASE.
Notably, the performance of ECASE-20 on the CDCP dataset (72.36%) is even higher than that of ECASE-10 (72.18%), which indicates that the model provided by the embodiment of the application can effectively extract context information while, to some extent, alleviating excessive attention to less informative sentences. In addition, models with backward and forward context inputs, such as CASE and ECASE, perform better than SEQPAIR and SEQCON-10, demonstrating the validity of the input form and of the context information.
In the end-to-end setting (End-to-End), ECASE-20 performs better than CASE-20 on most datasets. The results also show that efficient use of context information can improve recognition of the argumentation structure in the original discourse. However, in the end-to-end setting the performance of ECASE-20 is typically lower than in the given-head setting, showing that detecting argumentation relations becomes more difficult when the proportion of support and attack labels is smaller. Nevertheless, under this setting the performance of ECASE-20 is higher on the AbstRCT dataset, which suggests that in this dataset the no-rel samples are very important for the model's discrimination of argumentation relations, and that pairs with an argumentation relation express it more clearly than no-rel sentence pairs.
Based on the description of the above embodiments and application scenarios, the embodiment of the present application can enhance the utilization of context information by the demonstration relation mining model from two aspects of modeling capability and data enhancement.
In terms of modeling capability, the embodiment of the application adopts a self-attention module on the basis of the encoding module to aggregate context information and optimize the attention distribution among sentences. The embodiment of the application also uses a distance-weighted similarity loss so that the representation similarity takes the demonstration relation into account.
In terms of data enhancement, the embodiment of the application randomly masks the discourse markers and sentences in the training set to mitigate the model's dependence on specific words and less informative sentences, thereby encouraging the model to fully understand the context information.
The embodiment of the application can improve the judgment accuracy of the demonstration relation, understand the meaning of a text more accurately, and find related information and argumentative structure in a language text more accurately, thereby improving retrieval efficiency. In an automatic text generation task, an accurate understanding of the demonstration relation can produce more convincing text, and can also be used to detect logic problems in automatically generated text so as to correct them.
It should be noted that although the steps of the methods of the present application are depicted in the accompanying drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all illustrated steps be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.
The following describes embodiments of the apparatus of the present application, which may be used to perform the demonstration relation mining method of the above-described embodiments. Fig. 9 schematically shows a block diagram of a demonstration relation mining apparatus provided by an embodiment of the present application. As shown in fig. 9, the demonstration relation mining apparatus 900 includes:
an obtaining module 910 configured to obtain text data requiring mining of a demonstration relation, the text data including a plurality of proposition sentences distributed continuously, the demonstration relation including an objection relation, a support relation, or an irrelevant relation existing between two proposition sentences;
an extracting module 920 configured to perform feature extraction on the proposition sentences to obtain first semantic features of the proposition sentences;
a fusion module 930 configured to perform fusion processing on the first semantic features of the plurality of continuously distributed proposition sentences to obtain second semantic features of the proposition sentences;
a classification module 940 configured to classify two proposition sentences according to the first semantic features and the second semantic features to obtain the demonstration relation of the two proposition sentences.
In some embodiments of the present application, based on the above technical solutions, the classification module 940 further includes:
a feature fusion module configured to fuse the first semantic feature and the second semantic feature of the proposition sentence to obtain a third semantic feature of the proposition sentence;
the feature mapping module is configured to map the third semantic features of the two proposition sentences according to a preset activation function to obtain the classification probability of the two proposition sentences, wherein the classification probability comprises the probability of classifying the demonstration relation of the two proposition sentences into an objection relation, a support relation, or an irrelevant relation;
and the relation determining module is configured to determine the demonstration relation of the two propositional sentences according to the maximum value of the classification probability.
In some embodiments of the present application, based on the above technical solution, when the demonstration relation of the two proposition sentences is the irrelevant relation, the maximum value of the classification probability and the feature similarity of the two proposition sentences are in positive correlation; when the demonstration relation of the two proposition sentences is an objection relation or a support relation, the maximum value of the classification probability and the feature similarity of the two proposition sentences are in negative correlation.
In some embodiments of the present application, based on the above technical solution, when the demonstration relation of the two proposition sentences is the irrelevant relation, the degree of influence of the feature similarity of the two proposition sentences on the classification probability is in negative correlation with the position distance of the two proposition sentences; when the demonstration relation of the two proposition sentences is an objection relation or a support relation, the degree of influence of the feature similarity of the two proposition sentences on the classification probability is in positive correlation with the position distance of the two proposition sentences.
In some embodiments of the present application, based on the above technical solution, the demonstration relation mining apparatus 900 further includes:
a model acquisition module configured to acquire a demonstration relation mining model for mining demonstration relations from text data, the demonstration relation mining model including an encoder for feature extraction of proposition sentences and a classifier for classification processing of two proposition sentences;
a sample acquisition module configured to acquire a text data sample for training the demonstration relation mining model, the text data sample including a plurality of proposition sentence samples distributed continuously, and a sample label for representing a demonstration relation between two proposition sentence samples;
A first error determination module configured to obtain sample features obtained by feature extraction of the proposition sentence samples by the encoder, and determine a first loss error of two proposition sentence samples according to the sample features and the sample labels;
a second error determining module configured to obtain a sample distribution probability obtained by classifying the two proposition sentence samples according to the sample characteristics by the classifier, and determine a second loss error of the two proposition sentence samples according to the sample distribution probability and the sample label;
a parameter updating module configured to update model parameters of the demonstration relation mining model according to the first loss error and the second loss error.
In some embodiments of the present application, based on the above technical solution, the demonstration relation mining apparatus 900 further includes:
a masking module configured to randomly mask words or sentences in the text data samples.
In some embodiments of the present application, based on the above technical solution, the masking module is further configured to: identify, from the text data sample, marking words associated with the demonstration relation; randomly mask the marking words in the text data sample according to a first preset probability; and randomly mask the proposition sentence samples in the text data sample according to a second preset probability.
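A sketch of this two-level masking is given below; the marker-word list, the mask token and the two preset probabilities are illustrative assumptions.

import random

# Illustrative discourse-marker list and mask token; the embodiment does not
# fix these specific values.
MARKING_WORDS = {"because", "therefore", "however", "thus", "since"}
MASK = "[MASK]"

def mask_sample(sentences, p_word=0.15, p_sentence=0.10):
    """Randomly mask marking words with the first preset probability and
    whole proposition sentence samples with the second preset probability."""
    masked = []
    for sent in sentences:
        if random.random() < p_sentence:
            masked.append(MASK)            # mask the whole sentence
            continue
        tokens = [MASK if w.lower() in MARKING_WORDS and random.random() < p_word
                  else w for w in sent.split()]
        masked.append(" ".join(tokens))
    return masked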
In some embodiments of the present application, based on the above technical solution, the first error determining module is further configured to: determining the feature similarity of two proposition sentence samples according to the sample features; and determining a first loss error of the two propositional sentence samples according to the feature similarity and the sample label.
In some embodiments of the present application, based on the above technical solution, when the sample label is the irrelevant relation, the first loss error and the feature similarity of the two proposition sentence samples are in positive correlation; when the sample label is the objection relation or the support relation, the first loss error and the feature similarity of the two proposition sentence samples are in negative correlation.
In some embodiments of the present application, based on the above technical solution, when the sample label is the irrelevant relation, the degree of influence of the feature similarity of the two proposition sentence samples on the first loss error is in negative correlation with the position distance of the two proposition sentence samples; when the sample label is the objection relation or the support relation, the degree of influence of the feature similarity of the two proposition sentence samples on the first loss error is in positive correlation with the position distance of the two proposition sentence samples.
In some embodiments of the present application, based on the above technical solution, the classifier includes an input layer, a hidden layer, and an output layer connected in sequence; the second error determination module is further configured to: obtaining an input vector obtained by embedding and encoding sample characteristics of the two proposition sentence samples by the input layer; obtaining an intermediate vector obtained by normalizing the input vector by the hidden layer; and obtaining sample distribution probability obtained by the output layer through classification mapping processing of the intermediate vector.
In some embodiments of the present application, based on the above technical solution, the parameter updating module is further configured to: carry out weighted summation on the first loss error and the second loss error according to preset weights to obtain an error value of the demonstration relation mining model; back-propagate the error value in the demonstration relation mining model to obtain error gradients of the respective model parameters; and update the model parameters according to the error gradients.
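Assuming the loss terms and preset weight sketched above, a single parameter update could look like the following; the optimizer choice, the learning rate, and the reuse of the encoder and classifier names from the earlier sketches are assumptions.

import torch

optimizer = torch.optim.AdamW(
    list(encoder.parameters()) + list(classifier.parameters()), lr=2e-5)
lambda_ = 0.5  # illustrative preset weight

def train_step(l_sim, l_cls):
    loss = l_cls + lambda_ * l_sim   # weighted summation -> model error value
    optimizer.zero_grad()
    loss.backward()                  # back-propagation yields error gradients
    optimizer.step()                 # gradients update the model parameters
    return loss.item()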
Specific details of the demonstration relation mining apparatus provided in each embodiment of the present application have been described in detail in the corresponding method embodiments, and are not described herein.
Fig. 10 schematically shows a block diagram of a computer system of an electronic device for implementing an embodiment of the application.
It should be noted that, the computer system 1000 of the electronic device shown in fig. 10 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.
As shown in fig. 10, the computer system 1000 includes a central processing unit 1001 (Central Processing Unit, CPU), which can execute various appropriate actions and processes according to a program stored in a read-only memory 1002 (Read-Only Memory, ROM) or a program loaded from a storage section 1008 into a random access memory 1003 (Random Access Memory, RAM). In the random access memory 1003, various programs and data necessary for system operation are also stored. The central processing unit 1001, the read-only memory 1002, and the random access memory 1003 are connected to each other via a bus 1004. An input/output interface 1005 (i.e., an I/O interface) is also connected to the bus 1004.
The following components are connected to the input/output interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output portion 1007 including a Cathode Ray Tube (CRT), a liquid crystal display (Liquid Crystal Display, LCD), and a speaker; a storage portion 1008 including a hard disk or the like; and a communication section 1009 including a network interface card such as a local area network card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. The drive 1010 is also connected to the input/output interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is installed as needed in the drive 1010, so that a computer program read out therefrom is installed as needed in the storage section 1008.
In particular, the processes described in the various method flowcharts may be implemented as computer software programs according to embodiments of the application. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 1009, and/or installed from the removable medium 1011. The computer programs, when executed by the central processor 1001, perform the various functions defined in the system of the present application.
It should be noted that, the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-Only Memory (ROM), an erasable programmable read-Only Memory (Erasable Programmable Read Only Memory, EPROM), flash Memory, an optical fiber, a portable compact disc read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a touch terminal, or a network device, etc.) to perform the method according to the embodiments of the present application.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (15)

1. A demonstration relation mining method, comprising:
acquiring text data from which a demonstration relation needs to be mined, wherein the text data comprises a plurality of proposition sentences which are distributed continuously, and the demonstration relation comprises an objection relation, a support relation or an irrelevant relation existing between two proposition sentences;
extracting features of the proposition sentences to obtain first semantic features of the proposition sentences;
carrying out fusion processing on first semantic features of a plurality of propositions which are continuously distributed to obtain second semantic features of the propositions;
and classifying the two propositions according to the first semantic features and the second semantic features to obtain the demonstration relation of the two propositions.
2. The method of claim 1, wherein classifying the two propositions according to the first semantic feature and the second semantic feature to obtain the demonstration relation of the two propositions comprises:
fusing the first semantic features and the second semantic features of the proposition sentence to obtain a third semantic feature of the proposition sentence;
mapping the third semantic features of the two propositions according to a preset activation function to obtain the classification probability of the two propositions, wherein the classification probability comprises the probability of classifying the demonstration relation of the two propositions into an objection relation, a support relation or an irrelevant relation;
And determining the demonstration relation of the two propositional sentences according to the maximum value of the classification probability.
3. The demonstration relation mining method according to claim 2, wherein when the demonstration relation of the two propositions is the irrelevant relation, the maximum value of the classification probability and the feature similarity of the two propositions are in positive correlation; when the demonstration relation of the two propositions is an objection relation or a support relation, the maximum value of the classification probability and the feature similarity of the two propositions are in negative correlation.
4. The demonstration relation mining method according to claim 2, wherein when the demonstration relation of the two propositions is the irrelevant relation, the degree of influence of the feature similarity of the two propositions on the classification probability is in negative correlation with the position distance of the two propositions; when the demonstration relation of the two propositions is an objection relation or a support relation, the degree of influence of the feature similarity of the two propositions on the classification probability is in positive correlation with the position distance of the two propositions.
5. The demonstration relation mining method according to claim 1, wherein before acquiring the text data requiring mining of the demonstration relation, the method further comprises:
acquiring a demonstration relation mining model for mining a demonstration relation of text data, wherein the demonstration relation mining model comprises an encoder for extracting features of propositions and a classifier for classifying two propositions;
obtaining a text data sample for training the demonstration relation mining model, wherein the text data sample comprises a plurality of proposition sentence samples which are distributed continuously, and a sample label for representing the demonstration relation between the two proposition sentence samples;
acquiring sample features obtained by feature extraction of the proposition sentence samples by the encoder, and determining a first loss error of the two proposition sentence samples according to the sample features and the sample label;
acquiring sample distribution probability obtained by classifying the two propositional sentence samples according to the sample characteristics by the classifier, and determining a second loss error of the two propositional sentence samples according to the sample distribution probability and the sample label;
and updating model parameters of the demonstration relation mining model according to the first loss error and the second loss error.
6. The demonstration relation mining method according to claim 5, further comprising, prior to obtaining the sample features obtained by feature extraction of the proposition sentence samples by the encoder:
randomly masking words or sentences in the text data sample.
7. The demonstration relation mining method according to claim 6, wherein randomly masking words or sentences in the text data sample comprises:
identifying, from the text data sample, marking words associated with the demonstration relation;
randomly shielding the marking words in the text data sample according to a first preset probability;
randomly shielding the propositional sentence sample in the text data sample according to a second preset probability.
8. The demonstration relation mining method according to claim 5, wherein determining the first loss error of the two proposition sentence samples according to the sample features and the sample label comprises:
determining the feature similarity of two proposition sentence samples according to the sample features;
and determining a first loss error of the two propositional sentence samples according to the feature similarity and the sample label.
9. The demonstration relation mining method according to claim 8, wherein: when the sample label is the irrelevant relation, the first loss error and the feature similarity of the two proposition sentence samples are in positive correlation; when the sample label is the objection relation or the support relation, the first loss error and the feature similarity of the two proposition sentence samples are in negative correlation.
10. The demonstration relation mining method according to claim 8, wherein when the sample label is the irrelevant relation, the degree of influence of the feature similarity of the two proposition sentence samples on the first loss error is in negative correlation with the position distance of the two proposition sentence samples; when the sample label is the objection relation or the support relation, the degree of influence of the feature similarity of the two proposition sentence samples on the first loss error is in positive correlation with the position distance of the two proposition sentence samples.
11. The demonstration relation mining method according to claim 5, wherein the classifier comprises an input layer, a hidden layer and an output layer which are connected in sequence; and obtaining the sample distribution probability obtained by classifying the two proposition sentence samples by the classifier according to the sample features comprises:
obtaining an input vector obtained by embedding and encoding sample characteristics of the two proposition sentence samples by the input layer;
obtaining an intermediate vector obtained by normalizing the input vector by the hidden layer;
and obtaining sample distribution probability obtained by the output layer through classification mapping processing of the intermediate vector.
12. The demonstration relation mining method according to claim 5, wherein updating the model parameters of the demonstration relation mining model according to the first loss error and the second loss error comprises:
carrying out weighted summation on the first loss error and the second loss error according to preset weights to obtain an error value of the demonstration relation mining model;
back-propagating the error value in the demonstration relation mining model to obtain error gradients of the respective model parameters;
and updating the model parameters according to the error gradient.
13. A demonstration relation mining device, comprising:
an acquisition module configured to acquire text data from which a demonstration relation needs to be mined, wherein the text data comprises a plurality of proposition sentences which are distributed continuously, and the demonstration relation comprises an objection relation, a support relation or an irrelevant relation existing between two proposition sentences;
the extraction module is configured to perform feature extraction on the proposition sentence to obtain a first semantic feature of the proposition sentence;
the fusion module is configured to fuse the first semantic features of a plurality of propositions which are distributed continuously to obtain the second semantic features of the propositions;
the classification module is configured to classify the two propositions according to the first semantic features and the second semantic features to obtain the demonstration relation of the two propositions.
14. A computer readable medium, characterized in that the computer readable medium has stored thereon a computer program which, when executed by a processor, implements the demonstration relation mining method of any one of claims 1 to 12.
15. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to execute the executable instructions to implement the demonstration relation mining method of any one of claims 1 to 12.
CN202311233294.8A 2023-09-22 2023-09-22 Demonstration relation mining method and device, medium and electronic equipment Active CN117077656B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311233294.8A CN117077656B (en) 2023-09-22 2023-09-22 Demonstration relation mining method and device, medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN117077656A true CN117077656A (en) 2023-11-17
CN117077656B CN117077656B (en) 2024-01-02

Family

ID=88704294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311233294.8A Active CN117077656B (en) 2023-09-22 2023-09-22 Demonstration relation mining method and device, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN117077656B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108959551A (en) * 2018-06-29 2018-12-07 北京百度网讯科技有限公司 Method for digging, device, storage medium and the terminal device of neighbour's semanteme
CN112307212A (en) * 2020-11-11 2021-02-02 上海昌投网络科技有限公司 Public opinion delivery monitoring method for advertisement delivery
CN116227499A (en) * 2023-01-31 2023-06-06 哈尔滨工业大学(深圳) Text relationship recognition method, device and computer readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Khalid Al-Khatib et al.: "Argument Mining for Scholarly Document Processing: Taking Stock and Looking Ahead", Proceedings of the Second Workshop on Scholarly Document Processing, pages 56-65 *
Oana Cocarascu et al.: "Identifying attack and support argumentative relations using deep learning", Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1374-1379 *

Also Published As

Publication number Publication date
CN117077656B (en) 2024-01-02


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant