CN109710943B - Contradictory statement identification method and system and clause logic identification method and system - Google Patents

Contradictory statement identification method and system and clause logic identification method and system

Info

Publication number
CN109710943B
CN109710943B (application number CN201811635859.4A)
Authority
CN
China
Prior art keywords
text
model
vector sequence
output vector
premise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811635859.4A
Other languages
Chinese (zh)
Other versions
CN109710943A (en)
Inventor
鞠剑勋 (Ju Jianxun)
刘晔诚 (Liu Yecheng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ctrip Travel Information Technology Shanghai Co Ltd
Original Assignee
Ctrip Travel Information Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ctrip Travel Information Technology Shanghai Co Ltd
Priority to CN201811635859.4A
Publication of CN109710943A
Application granted
Publication of CN109710943B
Status: Active

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a contradictory sentence identification method and system and a clause logic identification method and system. The contradictory sentence identification method includes: constructing a contradictory sentence identification model based on a BiLSTM-Attention mechanism, wherein the model is used for predicting the probability that two texts contradict each other; respectively converting the two compared texts into matrices to be used as the input of the model; and obtaining, through the calculation of the model, the probability that the two compared texts contradict each other. The invention solves the natural language inference problem by using deep learning technology, can reduce the labor cost of feature extraction, and greatly improves the accuracy of inference.

Description

Contradictory statement identification method and system and clause logic identification method and system
Technical Field
The invention belongs to the field of artificial intelligence, and particularly relates to a method and a system for identifying contradictory sentences and a method and a system for logically identifying clauses.
Background
Inference has long been one of the core research topics in the field of artificial intelligence, and natural language inference is an important research branch in the natural language processing direction; it is the research basis of tasks such as question-answering systems, information retrieval and automatic summarization, and has wide application space in many business scenarios. One core problem in natural language inference is: given two texts, a premise and a hypothesis, determine whether there is a contradiction between the two at the semantic level.
Because of the diversity of language expression and the ambiguity of semantic understanding, and especially the existence of a large number of polysemous words and synonyms in Chinese texts, natural language inference is more difficult than expected, and there is still considerable room to improve inference accuracy.
In addition, the logical relationships among the relevant clauses of a product are usually checked manually; an intelligent means is lacking, considerable labor cost is consumed, and the identification accuracy also needs to be improved.
Disclosure of Invention
The invention aims to overcome the defects of the prior art, in which natural language inference is difficult and inference accuracy leaves considerable room for improvement, and provides a method and a system for identifying contradictory sentences and a method and a system for logically identifying clauses.
The invention solves the technical problems through the following technical scheme:
a contradiction sentence identification method based on deep learning comprises the following steps:
constructing a contradictory sentence identification model based on a BiLSTM (bidirectional long short-term memory network)-Attention mechanism, wherein the model is used for predicting the probability that two texts contradict each other;
respectively converting the two compared texts into matrices to be used as the input of the model;
and obtaining, through the calculation of the model, the probability that the two compared texts contradict each other.
Preferably, the step of constructing comprises:
acquiring original training text data, wherein the original training text data comprises a premise text and a hypothesis text;
converting the premise text and the hypothesis text into matrices according to the word vector of each word, and taking the matrices as the input of a BiLSTM unit to obtain the preliminary semantic representations corresponding to the premise text and the hypothesis text, which are respectively: the BiLSTM output vector sequence h^p = [h^p_1, h^p_2, …, h^p_m] corresponding to the premise text and the BiLSTM output vector sequence h^h = [h^h_1, h^h_2, …, h^h_m] corresponding to the hypothesis text;
performing maximum pooling on the preliminary semantic representation corresponding to the hypothesis text along the step dimension to extract the most important semantics as the final semantic representation v^h of the hypothesis text, where v^h = max_{t=1..m} h^h_t (element-wise maximum), and computing a Query vector q from v^h (the defining formula appears only as an image in the original document);
taking q together with h^p as the input of Attention to word-match the hypothesis text with the premise text: a context vector c = Σ_{t=1..m} α_t·h^p_t is computed as the attention-weighted sum of the premise vectors, where each weight α_t is a softmax-normalized matching score between q and h^p_t (the exact scoring formula appears only as an image in the original document), and an Attention vector a is obtained: a = tanh(W_att1·c + W_att2·q);
feeding the Attention vector into a two-class deep neural network model that infers the contradiction relation to obtain its predicted value ŷ, the predicted value ŷ representing the probability that the premise text and the hypothesis text contradict each other;
calculating the cross-entropy loss from the true value y of the relation between the premise text and the hypothesis text and the predicted value ŷ: L = −[y·log(ŷ) + (1−y)·log(1−ŷ)];
and minimizing the loss by using an optimization algorithm and performing iterative training to obtain the final model.
Preferably, the BiLSTM output vector sequence h^p corresponding to the premise text is calculated by the following steps:
calculating a forward output vector sequence and a reverse output vector sequence corresponding to the premise text;
combining the forward output vector sequence and the reverse output vector sequence corresponding to the premise text along the last dimension to obtain the BiLSTM output vector sequence h^p corresponding to the premise text.
The BiLSTM output vector sequence h^h corresponding to the hypothesis text is calculated by the following steps:
calculating a forward output vector sequence and a reverse output vector sequence corresponding to the hypothesis text;
combining the forward output vector sequence and the reverse output vector sequence corresponding to the hypothesis text along the last dimension to obtain the BiLSTM output vector sequence h^h corresponding to the hypothesis text.
Preferably, the step of constructing further comprises: preprocessing the original training text data;
the contradictory statement identification method further includes: preprocessing the two texts to be compared;
the preprocessing at least comprises denoising, word segmentation and dictionary coding.
A clause logic identification method based on deep learning, the clause logic identification method comprising:
respectively converting the two compared clause texts into matrices to be used as the input of a contradictory sentence identification model constructed by the contradictory sentence identification method described above;
and obtaining, through the calculation of the model, the probability that the two compared clause texts contradict each other.
A contradictory sentence identification system based on deep learning, the contradictory sentence identification system comprising:
the model construction module is used for constructing a contradictory sentence identification model based on a BiLSTM-Attention mechanism, and the model is used for predicting the probability that two texts contradict each other;
the model input module is used for respectively converting the two compared texts into matrices to be used as the input of the model;
and the model output module is used for obtaining, through the calculation of the model, the probability that the two compared texts contradict each other.
Preferably, the model building module is configured to:
acquiring original training text data, wherein the original training text data comprises a premise text and a hypothesis text;
converting the premise text and the hypothesis text into matrices according to the word vector of each word, and taking the matrices as the input of a BiLSTM unit to obtain the preliminary semantic representations corresponding to the premise text and the hypothesis text, which are respectively: the BiLSTM output vector sequence h^p = [h^p_1, h^p_2, …, h^p_m] corresponding to the premise text and the BiLSTM output vector sequence h^h = [h^h_1, h^h_2, …, h^h_m] corresponding to the hypothesis text;
performing maximum pooling on the preliminary semantic representation corresponding to the hypothesis text along the step dimension to extract the most important semantics as the final semantic representation v^h of the hypothesis text, where v^h = max_{t=1..m} h^h_t (element-wise maximum), and computing a Query vector q from v^h (the defining formula appears only as an image in the original document);
taking q together with h^p as the input of Attention to word-match the hypothesis text with the premise text: a context vector c = Σ_{t=1..m} α_t·h^p_t is computed as the attention-weighted sum of the premise vectors, where each weight α_t is a softmax-normalized matching score between q and h^p_t (the exact scoring formula appears only as an image in the original document), and an Attention vector a is obtained: a = tanh(W_att1·c + W_att2·q);
feeding the Attention vector into a two-class deep neural network model that infers the contradiction relation to obtain its predicted value ŷ, the predicted value ŷ representing the probability that the premise text and the hypothesis text contradict each other;
calculating the cross-entropy loss from the true value y of the relation between the premise text and the hypothesis text and the predicted value ŷ: L = −[y·log(ŷ) + (1−y)·log(1−ŷ)];
and minimizing the loss by using an optimization algorithm and performing iterative training to obtain the final model.
Preferably, the model building module is further configured to:
calculating a forward output vector sequence and a reverse output vector sequence corresponding to the premise text;
combining the forward output vector sequence and the reverse output vector sequence corresponding to the premise text along the last dimension to obtain the BiLSTM output vector sequence h^p corresponding to the premise text.
The model building module is further configured to:
calculating a forward output vector sequence and a reverse output vector sequence corresponding to the hypothesis text;
combining the forward output vector sequence and the reverse output vector sequence corresponding to the hypothesis text along the last dimension to obtain the BiLSTM output vector sequence h^h corresponding to the hypothesis text.
Preferably, the model building module is further configured to pre-process the original training text data;
the model input module is also used for preprocessing the two compared texts;
the preprocessing at least comprises denoising, word segmentation and dictionary coding.
A clause logic identification system based on deep learning, the clause logic identification system comprising:
the clause input module is used for respectively converting the two compared clause texts into matrices to be used as the input of the contradictory sentence identification model constructed by the contradictory sentence identification system;
and the probability output module is used for obtaining, through the calculation of the model, the probability that the two compared clause texts contradict each other.
On the basis of common knowledge in the field, the above preferred conditions can be combined arbitrarily to obtain the preferred embodiments of the invention.
The positive progress effects of the invention are as follows: the invention solves the natural language inference problem by using the deep learning technology, can reduce the labor cost of feature extraction, and greatly improves the accuracy of inference.
Drawings
Fig. 1 is a flowchart of the contradictory sentence identification method based on deep learning in embodiment 1 of the present invention.
Fig. 2 is a schematic block diagram of the contradictory sentence identification system based on deep learning according to embodiment 3 of the present invention.
Fig. 3 is a schematic block diagram of the clause logic identification system based on deep learning according to embodiment 4 of the present invention.
Detailed Description
The invention is further illustrated by the following examples, which are not intended to limit the scope of the invention.
Example 1
Fig. 1 shows the contradictory sentence identification method based on deep learning according to the present embodiment. The contradictory sentence identification method comprises the following steps:
step 11: based on a BilSTM-Attention mechanism, a contradiction sentence identification model is constructed, and the model is used for predicting the probability of contradiction between two texts.
Step 12: the two texts being compared are preprocessed.
Step 13: respectively converting the two compared texts into matrices to be used as the input of the model.
Step 14: obtaining, through the calculation of the model, the probability that the two compared texts contradict each other.
In this embodiment, step 11 may specifically include:
(1) Acquiring original training text data, and preprocessing the original training text data;
the original training text data comprises a premix text and a hypothesis text, and the preprocessing at least comprises denoising, word segmentation and dictionary coding.
(2) Converting the premise text and the hypothesis text into matrices according to the word vector of each word (a word in this embodiment can be understood as a character; the two terms have the same meaning here), and taking the matrices as the input of a BiLSTM unit to obtain the preliminary semantic representations corresponding to the premise text and the hypothesis text, which are respectively: the BiLSTM output vector sequence h^p = [h^p_1, h^p_2, …, h^p_m] corresponding to the premise text and the BiLSTM output vector sequence h^h = [h^h_1, h^h_2, …, h^h_m] corresponding to the hypothesis text.
The specific process of converting the premise text into a matrix according to the word vector of each word is as follows:
(2-1) converting each word in the premise text into a word vector with a fixed length of n (e.g., n = 100);
(2-2) setting the maximum number of words of a text participating in training to m (e.g., m = 100); if the actual number of words in the premise text is less than m, the insufficient part is padded with <PAD> characters, and if the actual number of words in the premise text exceeds m, the words beyond the m-th are truncated;
(2-3) converting the premise text into an m × n matrix [x_1, x_2, …, x_m] as input, according to the word vector of each word in the text.
With reference to the conversion processes (2-1)-(2-3) for the premise text, the hypothesis text can likewise be converted into an m × n matrix according to the word vector of each word; the detailed process is not repeated.
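A short sketch of steps (2-1)-(2-3) under the example values n = 100 and m = 100; the embedding table here is randomly initialized toy data, whereas in practice it would hold the trained word vectors.

```python
# Pad-or-truncate to m words, then look up an n-dimensional vector per word.
import numpy as np

n, m = 100, 100
rng = np.random.default_rng(0)
embedding = rng.normal(size=(5000, n)).astype("float32")  # vocab_size x n (toy values)
PAD_ID = 0

def text_to_matrix(ids: list) -> np.ndarray:
    # Pad with <PAD> ids up to m words, or truncate the words beyond the m-th.
    ids = (ids + [PAD_ID] * m)[:m]
    return embedding[ids]            # m x n matrix [x_1, ..., x_m]
```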
Taking the matrix converted from the premise text as the input of a BiLSTM unit yields the preliminary semantic representation corresponding to the premise text: the BiLSTM output vector sequence h^p corresponding to the premise text.
The specific process is as follows:
(2-4) setting the outputs of the forget gate unit and the input gate unit at the current moment to be f_t and i_t respectively, and the candidate update value of the cell state at the current moment to be C̃_t, satisfying:
f_t = σ(W_f[x_t, h_{t-1}] + b_f)
i_t = σ(W_i[x_t, h_{t-1}] + b_i)
C̃_t = tanh(W_c[x_t, h_{t-1}] + b_c)
wherein x_t is the input at the current moment, h_{t-1} is the hidden-layer state at the previous moment, W_f, W_i and W_c are the weight matrices of the forget gate unit, the input gate unit and the cell state update respectively, b_f, b_i and b_c are the bias vectors of the forget gate unit, the input gate unit and the cell state update respectively, σ is the sigmoid activation function, and tanh is the hyperbolic tangent function;
(2-5) updating the cell state C_t by the formula C_t = f_t * C_{t-1} + i_t * C̃_t;
(2-6) obtaining the output h_t of each hidden node according to the following formulas, and connecting the h_t in sequence to form the vector sequence [h_1, h_2, …, h_m] of m steps:
o_t = σ(W_o[x_t, h_{t-1}] + b_o)
h_t = o_t * tanh(C_t)
wherein W_o is the weight matrix of the output gate unit, b_o is the bias vector of the output gate unit, and o_t is the output of the output gate unit;
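The formulas of steps (2-4)-(2-6) transcribe directly into code. The following NumPy sketch runs one time step; the weight shapes assume input size n and hidden size d, and the dictionary-of-weights layout is an illustration choice.

```python
# One LSTM time step, mirroring formulas (2-4)-(2-6) above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    # W holds W_f, W_i, W_c, W_o (each d x (n+d)); b holds b_f, b_i, b_c, b_o.
    z = np.concatenate([x_t, h_prev])           # [x_t, h_{t-1}]
    f_t = sigmoid(W["f"] @ z + b["f"])          # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])          # input gate
    C_tilde = np.tanh(W["c"] @ z + b["c"])      # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde          # cell state update, step (2-5)
    o_t = sigmoid(W["o"] @ z + b["o"])          # output gate
    h_t = o_t * np.tanh(C_t)                    # hidden output, step (2-6)
    return h_t, C_t
```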
(2-7) processing the premise text in the forward direction through steps (2-4)-(2-6) to obtain the forward output vector sequence corresponding to the premise text, and processing the premise text in the reverse direction through steps (2-4)-(2-6) to obtain the reverse output vector sequence corresponding to the premise text;
(2-8) combining the forward output vector sequence and the reverse output vector sequence corresponding to the premise text along the last dimension to obtain the BiLSTM output vector sequence corresponding to the premise text, denoted h^p = [h^p_1, h^p_2, …, h^p_m].
With reference to steps (2-4)-(2-8), the hypothesis text is processed in the forward direction through steps (2-4)-(2-6) to obtain the forward output vector sequence corresponding to the hypothesis text, and in the reverse direction through steps (2-4)-(2-6) to obtain the reverse output vector sequence corresponding to the hypothesis text; the two sequences are combined along the last dimension to obtain the BiLSTM output vector sequence corresponding to the hypothesis text, denoted h^h = [h^h_1, h^h_2, …, h^h_m]. The detailed process is not repeated.
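In a modern framework the forward pass, reverse pass and last-dimension combination of steps (2-7)-(2-8) come built in. A sketch using PyTorch (an assumption; the patent names no framework), with an assumed hidden size d:

```python
# nn.LSTM with bidirectional=True runs the forward and reverse passes and
# concatenates their outputs along the last dimension, yielding h^p and h^h.
import torch
import torch.nn as nn

n, d = 100, 128                         # word-vector size; hidden size d is assumed
bilstm = nn.LSTM(input_size=n, hidden_size=d, batch_first=True, bidirectional=True)

premise = torch.randn(1, 100, n)        # 1 x m x n matrix from the conversion step
h_p, _ = bilstm(premise)                # 1 x m x 2d: [h^p_1, ..., h^p_m]
```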
(3) Performing maximum pooling on the preliminary semantic representation h^h corresponding to the hypothesis text along the step dimension to extract the most important semantics as the final semantic representation v^h of the hypothesis text, where v^h = max_{t=1..m} h^h_t (element-wise maximum), and further computing a Query vector q from v^h (the defining formula appears only as an image in the original document).
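Continuing the PyTorch sketch, step (3) reduces to an element-wise max over the time dimension; since the mapping from the pooled vector to q is only an image in the original, a plain linear projection is assumed here.

```python
# Max pooling along the step dimension, then an assumed projection to q.
h_h, _ = bilstm(torch.randn(1, 100, n))     # 1 x m x 2d hypothesis representation
v_h = h_h.max(dim=1).values                 # 1 x 2d: most salient feature per dim
W_q = nn.Linear(2 * d, 2 * d)
q = W_q(v_h)                                # Query vector q (assumed form)
```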
(4) Taking q together with h^p as the input of Attention to word-match the hypothesis text with the premise text: a context vector c = Σ_{t=1..m} α_t·h^p_t is computed as the attention-weighted sum of the premise vectors, where each weight α_t is a softmax-normalized matching score between q and h^p_t (the exact scoring formula appears only as an image in the original document); the Attention vector a is then obtained: a = tanh(W_att1·c + W_att2·q).
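A sketch of step (4), continuing from the previous blocks. The per-word scoring formula is only an image in the original, so a bilinear score between q and each h^p_t is assumed; the final combination follows the stated formula a = tanh(W_att1·c + W_att2·q).

```python
# Word-by-word attention over the premise sequence h_p, queried by q.
W_s = nn.Linear(2 * d, 2 * d, bias=False)       # assumed bilinear scoring matrix
W_att1 = nn.Linear(2 * d, 2 * d, bias=False)
W_att2 = nn.Linear(2 * d, 2 * d, bias=False)

scores = torch.einsum("bd,bmd->bm", W_s(q), h_p)   # match q against each h^p_t
alpha = torch.softmax(scores, dim=1)               # attention weights alpha_t
c = torch.einsum("bm,bmd->bd", alpha, h_p)         # context vector c
a = torch.tanh(W_att1(c) + W_att2(q))              # Attention vector a
```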
(5) Feeding the Attention vector a into a two-class deep neural network model that infers the contradiction relation to obtain its predicted value ŷ; the predicted value ŷ represents the probability that the premise text and the hypothesis text contradict each other.
(6) Calculating the cross-entropy loss from the true value y of the relation between the premise text and the hypothesis text and the predicted value ŷ: L = −[y·log(ŷ) + (1−y)·log(1−ŷ)].
(7) Minimizing the loss by using the Adam (Adaptive Moment Estimation) optimization algorithm and performing iterative training to obtain the final model.
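Steps (5)-(7) can be sketched as a small sigmoid-output head trained with the binary cross-entropy L = −[y·log(ŷ) + (1−y)·log(1−ŷ)] and Adam; the depth and width of the head are assumptions, as the patent does not specify them.

```python
# Two-class head on the Attention vector, plus one Adam training iteration.
classifier = nn.Sequential(
    nn.Linear(2 * d, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),     # y_hat: probability of contradiction
)
params = list(bilstm.parameters()) + list(classifier.parameters())
optimizer = torch.optim.Adam(params)
loss_fn = nn.BCELoss()                  # the cross-entropy loss of step (6)

y_hat = classifier(a)                   # predicted value for this text pair
y = torch.ones(1, 1)                    # true label (toy value: contradictory)
loss = loss_fn(y_hat, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()                        # one iteration; repeat over the dataset
```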
Through the above steps, when the semantics of the hypothesis text are extracted, they are integrated into a single vector by the maximum pooling operation and then matched against each word vector in the premise text, which greatly reduces the training cost of the model while preserving inference accuracy.
In step 12, the preprocessing of the two texts to be compared is the same as the preprocessing in step (1). In step 13, with reference to the conversion processes (2-1)-(2-3) for the premise text, the two texts to be compared can each be converted into an m × n matrix according to the word vector of each word; the detailed process is not repeated. In step 14, whether the two texts contradict each other can further be decided from the probability: for example, if the probability exceeds 50%, the two texts can generally be considered contradictory; if not, they can generally be considered non-contradictory.
Example 2
The embodiment provides a clause logic identification method based on deep learning. The clause logic identification method comprises the following steps:
converting the two compared clause texts respectively into matrices, to be used as the input of a contradictory sentence identification model constructed by the contradictory sentence identification method of embodiment 1;
and obtaining, through the calculation of the model, the probability that the two compared clause texts contradict each other.
Of course, the clause logic identification method of the present embodiment may pre-process the two compared clause texts with reference to step 12 of embodiment 1 before converting the two compared clause texts into matrices, respectively.
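Stitching the earlier sketches together gives a hedged end-to-end picture of this embodiment: two clause texts are preprocessed, converted to matrices, and scored by the trained model. All helper and layer names reuse the assumed sketches from embodiment 1 and are illustrative, not the patent's own identifiers.

```python
# End-to-end scoring of a clause pair with the previously sketched components.
def contradiction_probability(clause_a: str, clause_b: str, vocab: dict) -> float:
    xa = torch.from_numpy(text_to_matrix(encode(clause_a, vocab))).unsqueeze(0)
    xb = torch.from_numpy(text_to_matrix(encode(clause_b, vocab))).unsqueeze(0)
    h_p, _ = bilstm(xa)                      # premise-side representation
    h_h, _ = bilstm(xb)                      # hypothesis-side representation
    q = W_q(h_h.max(dim=1).values)           # pooled hypothesis -> Query vector
    scores = torch.einsum("bd,bmd->bm", W_s(q), h_p)
    c = torch.einsum("bm,bmd->bd", torch.softmax(scores, dim=1), h_p)
    a = torch.tanh(W_att1(c) + W_att2(q))
    return classifier(a).item()              # e.g. > 0.5 -> treated as contradictory
```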
The clause logic identification method of this embodiment can be applied to the intelligent identification of logical relationships among the relevant clauses of travel products, such as the clauses and contents contained in the reservation restriction, reservation description and product description sections, so as to ensure the reasonableness and logical consistency of a company's travel product clauses, thereby fully protecting the rights and interests of consumers and providing customers with satisfactory service. The clause logic identification method is of course not limited to this embodiment; it can also be applied to other service products or physical products, and even to scenarios such as regulations and rule clauses, so as to reduce the labor cost of manual checking and improve identification accuracy, and contradictory clauses can then be revised and improved in a targeted manner.
Example 3
Fig. 2 shows the contradictory sentence identification system based on deep learning of the present embodiment. The contradictory sentence identification system includes:
and the model construction module 21 is used for constructing a contradiction statement identification model based on a BilSTM-Attention mechanism, and the model is used for predicting the probability of contradiction between two texts.
The model input module 22 is configured to preprocess the two compared texts and convert them respectively into matrices as the input of the model.
The model output module 23 is used for obtaining, through the calculation of the model, the probability that the two compared texts contradict each other.
In this embodiment, the model construction module 21 may be specifically configured to:
(1) Acquiring original training text data, and preprocessing the original training text data;
the original training text data comprises a premix text and a hypothesis text, and the preprocessing at least comprises denoising, word segmentation and dictionary encoding.
(2) Converting the premise text and the hypothesis text into matrices according to the word vector of each word (a word in this embodiment can be understood as a character; the two terms have the same meaning here), and taking the matrices as the input of a BiLSTM unit to obtain the preliminary semantic representations corresponding to the premise text and the hypothesis text, which are respectively: the BiLSTM output vector sequence h^p = [h^p_1, h^p_2, …, h^p_m] corresponding to the premise text and the BiLSTM output vector sequence h^h = [h^h_1, h^h_2, …, h^h_m] corresponding to the hypothesis text.
The specific process of converting the premise text into a matrix according to the word vector of each word is as follows:
(2-1) converting each word in the premise text into a word vector with a fixed length of n (e.g., n = 100);
(2-2) setting the maximum number of words of a text participating in training to m (e.g., m = 100); if the actual number of words in the premise text is less than m, the insufficient part is padded with <PAD> characters, and if the actual number of words in the premise text exceeds m, the words beyond the m-th are truncated;
(2-3) converting the premise text into an m × n matrix [x_1, x_2, …, x_m] as input, according to the word vector of each word in the text.
With reference to the conversion processes (2-1)-(2-3) for the premise text, the hypothesis text can likewise be converted into an m × n matrix according to the word vector of each word; the detailed process is not repeated.
Taking the matrix converted from the premise text as the input of a BiLSTM unit yields the preliminary semantic representation corresponding to the premise text: the BiLSTM output vector sequence h^p corresponding to the premise text.
The specific process is as follows:
(2-4) setting the outputs of the forget gate unit and the input gate unit at the current moment to be f_t and i_t respectively, and the candidate update value of the cell state at the current moment to be C̃_t, satisfying:
f_t = σ(W_f[x_t, h_{t-1}] + b_f)
i_t = σ(W_i[x_t, h_{t-1}] + b_i)
C̃_t = tanh(W_c[x_t, h_{t-1}] + b_c)
wherein x_t is the input at the current moment, h_{t-1} is the hidden-layer state at the previous moment, W_f, W_i and W_c are the weight matrices of the forget gate unit, the input gate unit and the cell state update respectively, b_f, b_i and b_c are the bias vectors of the forget gate unit, the input gate unit and the cell state update respectively, σ is the sigmoid activation function, and tanh is the hyperbolic tangent function;
(2-5) updating the cell state C_t by the formula C_t = f_t * C_{t-1} + i_t * C̃_t;
(2-6) obtaining the output h_t of each hidden node according to the following formulas, and connecting the h_t in sequence to form the vector sequence [h_1, h_2, …, h_m] of m steps:
o_t = σ(W_o[x_t, h_{t-1}] + b_o)
h_t = o_t * tanh(C_t)
wherein W_o is the weight matrix of the output gate unit, b_o is the bias vector of the output gate unit, and o_t is the output of the output gate unit;
(2-7) processing the premise text in the forward direction through steps (2-4)-(2-6) to obtain the forward output vector sequence corresponding to the premise text, and processing the premise text in the reverse direction through steps (2-4)-(2-6) to obtain the reverse output vector sequence corresponding to the premise text;
(2-8) combining the forward output vector sequence and the reverse output vector sequence corresponding to the premise text along the last dimension to obtain the BiLSTM output vector sequence corresponding to the premise text, denoted h^p = [h^p_1, h^p_2, …, h^p_m].
With reference to steps (2-4)-(2-8), the hypothesis text is processed in the forward direction through steps (2-4)-(2-6) to obtain the forward output vector sequence corresponding to the hypothesis text, and in the reverse direction through steps (2-4)-(2-6) to obtain the reverse output vector sequence corresponding to the hypothesis text; the two sequences are combined along the last dimension to obtain the BiLSTM output vector sequence corresponding to the hypothesis text, denoted h^h = [h^h_1, h^h_2, …, h^h_m]. The detailed process is not repeated.
(3) Performing maximum pooling on the preliminary semantic representation h^h corresponding to the hypothesis text along the step dimension to extract the most important semantics as the final semantic representation v^h of the hypothesis text, where v^h = max_{t=1..m} h^h_t (element-wise maximum), and further computing a Query vector q from v^h (the defining formula appears only as an image in the original document).
(4) Taking q together with h^p as the input of Attention to word-match the hypothesis text with the premise text: a context vector c = Σ_{t=1..m} α_t·h^p_t is computed as the attention-weighted sum of the premise vectors, where each weight α_t is a softmax-normalized matching score between q and h^p_t (the exact scoring formula appears only as an image in the original document); the Attention vector a is then obtained: a = tanh(W_att1·c + W_att2·q).
(5) Feeding the Attention vector a into a two-class deep neural network model that infers the contradiction relation to obtain its predicted value ŷ; the predicted value ŷ represents the probability that the premise text and the hypothesis text contradict each other.
(6) Calculating the cross-entropy loss from the true value y of the relation between the premise text and the hypothesis text and the predicted value ŷ: L = −[y·log(ŷ) + (1−y)·log(1−ŷ)].
(7) Minimizing the loss by using the Adam (Adaptive Moment Estimation) optimization algorithm and performing iterative training to obtain the final model.
In this embodiment, when extracting the semantics of the hypothesis text, the model construction module 21 integrates them into a single vector by the maximum pooling operation and then matches that vector against each word vector in the premise text, which greatly reduces the training cost of the model while ensuring inference accuracy.
The preprocessing performed by the model input module 22 on the two compared texts is the same as the preprocessing in the model construction module 21; with reference to the conversion processes (2-1)-(2-3) for the premise text, the two compared texts can each be converted into an m × n matrix according to the word vector of each word, and the detailed process is not repeated. The model output module 23 may further determine from the probability whether the two compared texts contradict each other: for example, if the probability exceeds 50%, the two texts can generally be considered contradictory; if not, they can generally be considered non-contradictory.
Example 4
Fig. 3 shows the deep learning-based clause logic identification system of this embodiment. The clause logic identification system comprises:
a clause input module 31, configured to convert the two compared clause texts into matrices, respectively, and input the matrix as a contradiction sentence identification model constructed by using the contradiction sentence identification system in embodiment 3;
and a probability output module 32, used for obtaining, through the calculation of the model, the probability that the two compared clause texts contradict each other.
Of course, the clause input module may pre-process the two compared clause texts with reference to the model input module 22 of embodiment 3 before converting the two compared clause texts into matrices, respectively.
The clause logic identification system of this embodiment can be applied to the intelligent identification of logical relationships among the relevant clauses of travel products, such as the clauses and contents contained in the reservation restriction, reservation description and product description sections, so as to ensure the reasonableness and logical consistency of a company's travel product clauses, thereby fully protecting the rights and interests of consumers and providing customers with satisfactory service. The clause logic identification system is of course not limited to this embodiment; it can also be applied to other service products or physical products, and even to scenarios such as regulations and rule clauses, so as to reduce the labor cost of manual checking and improve identification accuracy, and contradictory clauses can then be revised and improved in a targeted manner.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that these are by way of example only, and that the scope of the invention is defined by the appended claims. Various changes or modifications to these embodiments may be made by those skilled in the art without departing from the principle and spirit of this invention, and these changes and modifications are within the scope of this invention.

Claims (8)

1. A contradictory sentence identification method based on deep learning, characterized in that the contradictory sentence identification method comprises the following steps:
constructing a contradictory sentence identification model based on a BiLSTM-Attention mechanism, wherein the model is used for predicting the probability that two texts contradict each other;
respectively converting the two compared texts into matrices to be used as the input of the model;
obtaining, through the calculation of the model, the probability that the two compared texts contradict each other;
the steps of constructing include:
acquiring original training text data, wherein the original training text data comprises a premise text and a hypothesis text;
converting the premise text and the hypothesis text into matrices according to the word vector of each word, and taking the matrices as the input of a BiLSTM unit to obtain the preliminary semantic representations corresponding to the premise text and the hypothesis text, which are respectively: the BiLSTM output vector sequence h^p = [h^p_1, h^p_2, …, h^p_m] corresponding to the premise text and the BiLSTM output vector sequence h^h = [h^h_1, h^h_2, …, h^h_m] corresponding to the hypothesis text;
performing maximum pooling on the preliminary semantic representation corresponding to the hypothesis text along the step dimension to extract the most important semantics as the final semantic representation v^h of the hypothesis text, where v^h = max_{t=1..m} h^h_t (element-wise maximum), and computing a Query vector q from v^h (the defining formula appears only as an image in the original document);
taking q together with h^p as the input of Attention to word-match the hypothesis text with the premise text: a context vector c = Σ_{t=1..m} α_t·h^p_t is computed as the attention-weighted sum of the premise vectors, where each weight α_t is a softmax-normalized matching score between q and h^p_t (the exact scoring formula appears only as an image in the original document), and an Attention vector a is obtained: a = tanh(W_att1·c + W_att2·q);
feeding the Attention vector into a two-class deep neural network model that infers the contradiction relation to obtain its predicted value ŷ, the predicted value ŷ representing the probability that the premise text and the hypothesis text contradict each other;
calculating the cross-entropy loss from the true value y of the relation between the premise text and the hypothesis text and the predicted value ŷ: L = −[y·log(ŷ) + (1−y)·log(1−ŷ)];
and minimizing the loss by using an optimization algorithm and performing iterative training to obtain the final model.
2. The contradictory sentence identification method of claim 1, wherein the BiLSTM output vector sequence h^p corresponding to the premise text is calculated by the following steps:
calculating a forward output vector sequence and a reverse output vector sequence corresponding to the premise text;
combining the forward output vector sequence and the reverse output vector sequence corresponding to the premise text along the last dimension to obtain the BiLSTM output vector sequence h^p corresponding to the premise text;
and the BiLSTM output vector sequence h^h corresponding to the hypothesis text is calculated by the following steps:
calculating a forward output vector sequence and a reverse output vector sequence corresponding to the hypothesis text;
combining the forward output vector sequence and the reverse output vector sequence corresponding to the hypothesis text along the last dimension to obtain the BiLSTM output vector sequence h^h corresponding to the hypothesis text.
3. The contradictory sentence identification method according to claim 1, wherein the step of constructing further comprises: preprocessing the original training text data;
the contradictory sentence identification method further comprises: preprocessing the two compared texts;
and the preprocessing at least comprises denoising, word segmentation and dictionary encoding.
4. A clause logic identification method based on deep learning, characterized in that the clause logic identification method comprises the following steps:
converting the two compared clause texts respectively into matrices to be used as the input of a contradictory sentence identification model constructed by the contradictory sentence identification method of any one of claims 1 to 3;
and obtaining, through the calculation of the model, the probability that the two compared clause texts contradict each other.
5. A contradictory sentence recognition system based on deep learning, the contradictory sentence recognition system comprising:
the model construction module is used for constructing a contradictory sentence identification model based on a BiLSTM-Attention mechanism, and the model is used for predicting the probability that two texts contradict each other;
the model input module is used for respectively converting the two compared texts into matrices to be used as the input of the model;
the model output module is used for obtaining, through the calculation of the model, the probability that the two compared texts contradict each other;
the model building module is configured to:
acquiring original training text data, wherein the original training text data comprises a premise text and a hypothesis text;
converting the premise text and the hypothesis text into matrices according to the word vector of each word, and taking the matrices as the input of a BiLSTM unit to obtain the preliminary semantic representations corresponding to the premise text and the hypothesis text, which are respectively: the BiLSTM output vector sequence h^p = [h^p_1, h^p_2, …, h^p_m] corresponding to the premise text and the BiLSTM output vector sequence h^h = [h^h_1, h^h_2, …, h^h_m] corresponding to the hypothesis text;
performing maximum pooling on the preliminary semantic representation corresponding to the hypothesis text along the step dimension to extract the most important semantics as the final semantic representation v^h of the hypothesis text, where v^h = max_{t=1..m} h^h_t (element-wise maximum), and computing a Query vector q from v^h (the defining formula appears only as an image in the original document);
taking q together with h^p as the input of Attention to word-match the hypothesis text with the premise text: a context vector c = Σ_{t=1..m} α_t·h^p_t is computed as the attention-weighted sum of the premise vectors, where each weight α_t is a softmax-normalized matching score between q and h^p_t (the exact scoring formula appears only as an image in the original document), and an Attention vector a is obtained: a = tanh(W_att1·c + W_att2·q);
feeding the Attention vector into a two-class deep neural network model that infers the contradiction relation to obtain its predicted value ŷ, the predicted value ŷ representing the probability that the premise text and the hypothesis text contradict each other;
calculating the cross-entropy loss from the true value y of the relation between the premise text and the hypothesis text and the predicted value ŷ: L = −[y·log(ŷ) + (1−y)·log(1−ŷ)];
and minimizing the loss by using an optimization algorithm and performing iterative training to obtain the final model.
6. The contradictory sentence identification system of claim 5, wherein the model building module is further configured to:
calculate a forward output vector sequence and a reverse output vector sequence corresponding to the premise text;
combine the forward output vector sequence and the reverse output vector sequence corresponding to the premise text along the last dimension to obtain the BiLSTM output vector sequence h^p corresponding to the premise text;
calculate a forward output vector sequence and a reverse output vector sequence corresponding to the hypothesis text;
and combine the forward output vector sequence and the reverse output vector sequence corresponding to the hypothesis text along the last dimension to obtain the BiLSTM output vector sequence h^h corresponding to the hypothesis text.
7. The contradictory sentence identification system of claim 5, wherein the model construction module is further configured to preprocess the original training text data;
the model input module is also used for preprocessing the two compared texts;
the preprocessing at least comprises denoising, word segmentation and dictionary encoding.
8. A clause logic identification system based on deep learning, the clause logic identification system comprising:
a clause input module, configured to convert the two compared clause texts respectively into matrices as the input of a contradictory sentence identification model constructed using the contradictory sentence identification system according to any one of claims 5 to 7;
and a probability output module, used for obtaining, through the calculation of the model, the probability that the two compared clause texts contradict each other.
CN201811635859.4A 2018-12-29 2018-12-29 Contradictory statement identification method and system and clause logic identification method and system Active CN109710943B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811635859.4A CN109710943B (en) 2018-12-29 2018-12-29 Contradictory statement identification method and system and clause logic identification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811635859.4A CN109710943B (en) 2018-12-29 2018-12-29 Contradictory statement identification method and system and clause logic identification method and system

Publications (2)

Publication Number Publication Date
CN109710943A CN109710943A (en) 2019-05-03
CN109710943B true CN109710943B (en) 2022-12-20

Family

ID=66259511

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811635859.4A Active CN109710943B (en) 2018-12-29 2018-12-29 Contradictory statement identification method and system and clause logic identification method and system

Country Status (1)

Country Link
CN (1) CN109710943B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110618980A (en) * 2019-09-09 2019-12-27 上海交通大学 System and method based on legal text accurate matching and contradiction detection

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014132456A1 (en) * 2013-02-28 2014-09-04 Nec Corporation Method and system for determining non-entailment and contradiction of text pairs
WO2015053236A1 (en) * 2013-10-08 2015-04-16 独立行政法人情報通信研究機構 Device for collecting contradictory expression and computer program for same
CN108647207A (en) * 2018-05-08 2018-10-12 上海携程国际旅行社有限公司 Natural language modification method, system, equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7313515B2 (en) * 2006-05-01 2007-12-25 Palo Alto Research Center Incorporated Systems and methods for detecting entailment and contradiction


Also Published As

Publication number Publication date
CN109710943A (en) 2019-05-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant