CN110019685B - Deep text matching method and device based on ranking learning - Google Patents

Deep text matching method and device based on ranking learning Download PDF

Info

Publication number
CN110019685B
Authority
CN
China
Prior art keywords
sentence
statement
inference
sentences
positive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910285853.7A
Other languages
Chinese (zh)
Other versions
CN110019685A (en
Inventor
李健铨
刘小康
刘子博
晋耀红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Science and Technology (Beijing) Co., Ltd.
Original Assignee
Dingfu Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dingfu Intelligent Technology Co Ltd filed Critical Dingfu Intelligent Technology Co Ltd
Priority to CN201910285853.7A priority Critical patent/CN110019685B/en
Publication of CN110019685A publication Critical patent/CN110019685A/en
Application granted granted Critical
Publication of CN110019685B publication Critical patent/CN110019685B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Abstract

The application provides a deep text matching method and device based on ranking learning. The method first obtains a sentence pair consisting of a hypothesis sentence and inference sentences, where the inference sentences comprise one positive inference sentence and several negative inference sentences, and the hypothesis sentence is semantically related to the positive inference sentence and semantically unrelated to the negative inference sentences. The sentences in the sentence pair are then processed into sentence vectors, the loss value of a preset loss function is calculated from the matching degree values among the sentence vectors, and the parameters of the deep matching model are adjusted according to the loss value. Finally, the deep matching model obtained through parameter adjustment is used to perform text matching on input sentences. Because the model input is expanded from a two-sentence pair into a sentence sequence containing both positive-example and negative-example data, the number and types of model inputs increase, which speeds up model fitting and improves the matching accuracy of the model.

Description

Deep text matching method and device based on ranking learning
Technical Field
The application relates to the technical field of natural language processing, and in particular to a deep text matching method and device based on ranking learning.
Background
Text matching is an important fundamental problem in natural language processing, and many natural language processing tasks can be abstracted as text matching tasks. For example, web page search can be abstracted as a relevance-matching problem between web pages and the user's search query, automatic question answering as a satisfaction-matching problem between candidate answers and questions, and text deduplication as a similarity-matching problem between texts.
Traditional text matching technology (such as the vector space model used in information retrieval) mainly addresses matching at the vocabulary level. In fact, matching algorithms based on lexical overlap have serious limitations and cannot handle many problems, such as polysemy and synonymy in language, differences in compositional structure (for example, "high-speed railway from Beijing to Shanghai" versus "high-speed railway from Shanghai to Beijing"), and the asymmetry of matching (for example, in the web page search task the expressions on the query side and the web page side often differ greatly).
With the development of deep learning technology, text matching based on Word Embedding (word embedding vectors) trained by neural networks has attracted wide interest. Word Embedding is simpler to train, and the resulting word vectors are more amenable to semantic computation. However, Word Embedding trained only on unlabeled data differs little from topic-model techniques in the practical effect of matching-degree calculation; both are essentially trained on co-occurrence information. In addition, Word Embedding solves neither the semantic representation of phrases and sentences nor the asymmetry problem of matching.
To address these problems, supervised neural-network deep matching models have been proposed to improve semantic matching, such as DSSM (Deep Structured Semantic Model), CDSSM (Convolutional Latent Semantic Model), and ESIM (Enhanced Sequential Inference Model). Existing deep matching models are mostly trained on sentence pairs. In the sentence-pair mode, however, given several sentences similar to the training sentence, the model cannot judge which one is more similar, which degrades its final matching performance.
Disclosure of Invention
To address the defects of the existing sentence-pair training mode, this application provides a deep text matching method and device based on ranking learning.
According to a first aspect of the embodiments of the present application, there is provided a deep text matching method based on rank learning, applied to a deep matching model, the method including:
obtaining a sentence pair consisting of a hypothesis sentence and an inference sentence, wherein the inference sentence comprises a positive inference sentence and a plurality of negative inference sentences, and the hypothesis sentence is related to the semantics of the positive inference sentence and is not related to the semantics of the negative inference sentence;
respectively representing the sentences in the sentence pairs by word vectors to obtain a word vector matrix of each sentence in the sentence pairs;
obtaining a similarity matrix of the sentence pairs by using the word vector matrixes, and generating sentence vectors with weighted similarity by combining sentences in the sentence pairs;
respectively calculating matching degree values between the sentence vectors corresponding to the hypothesis sentence and the positive inference sentence and between those corresponding to the hypothesis sentence and each negative inference sentence;
calculating a loss value between each statement vector matching degree value and a standard value by using a combined loss function consisting of a Pointwise loss function and a Listwise loss function;
adjusting parameters of the depth matching model according to the loss value;
and performing text matching on the input sentence by using the depth matching model obtained by parameter adjustment.
Optionally, the joint loss function loss is calculated as: loss = L_p + L_l + L2Regularization, wherein:
L_p is the Pointwise loss function, L_p = max(0, m - s(r_h; r_p+) + s(r_h; r_p-)); L_l is the Listwise loss function (its formula is given as an image in the original publication and is not reproduced here);
L2Regularization is the L2 regularization function; r_h is the sentence vector representation of the hypothesis sentence, r_p+ and r_p- are the sentence vector representations of the positive and negative inference sentences respectively, s(r_h; r_p+) is the cosine similarity between the sentence vectors corresponding to the hypothesis sentence and the positive inference sentence, s(r_h; r_p-) is the cosine similarity between the sentence vectors corresponding to the hypothesis sentence and a negative inference sentence, m is a preset threshold for distinguishing positive from negative inference sentences, and n is the number of samples consisting of the positive inference sentence and the negative inference sentences.
Optionally, obtaining a sentence pair composed of a hypothesis sentence and an inference sentence includes:
selecting two semantically related positive example sentences, which serve as the hypothesis sentence and the positive inference sentence;
selecting a plurality of negative example sentences which are used as negative reasoning sentences and are irrelevant to the semantics of the positive example sentences;
and forming a sentence pair by the two positive example sentences and the negative example sentences.
Optionally, the representing the sentences in the sentence pair by word vectors respectively to obtain a word vector matrix of each sentence in the sentence pair, including:
respectively segmenting words of sentences in the sentence pairs and expressing the words by word vectors to obtain an initial word vector matrix;
and adding the part of speech, the co-occurrence information and the position coding vector to the initial word vector matrix to obtain a word vector matrix of each sentence in the sentence pair.
Optionally, after generating a sentence vector in which the sentences in the sentence pair are weighted in similarity to each other, the method further includes:
and normalizing the sentence vectors obtained by similarity-weighting the hypothesis sentence with the positive inference sentence and with each negative inference sentence respectively.
Optionally, generating a sentence vector after weighting similarity of sentences in the sentence pair by using the similarity matrix corresponding to each word vector matrix, including:
generating initial sentence vectors after similarity weighting of sentences in the sentence pairs by using the similarity matrixes corresponding to the word vector matrixes;
and re-encoding the initial statement vectors according to the contexts of the statements corresponding to the initial statement vectors to obtain the statement vectors of the statements in the sentence pair.
Optionally, adjusting parameters of the depth matching model according to the loss value includes:
and adjusting parameters of the depth matching model with the aim of minimizing the loss value.
According to a second aspect of the embodiments of the present application, there is provided a deep text matching apparatus based on rank learning, applied to a deep matching model, the apparatus including:
sentence pair obtaining module: the method comprises the steps of obtaining a sentence pair consisting of a hypothesis sentence and an inference sentence, wherein the inference sentence comprises a positive inference sentence and a plurality of negative inference sentences, and the hypothesis sentence is related to the semantics of the positive inference sentence and is not related to the semantics of the negative inference sentence;
a word vector representation module: the word vector matrix is used for representing the sentences in the sentence pairs by word vectors respectively to obtain word vector matrixes of the sentences in the sentence pairs;
a similarity weighting module: the similarity matrix of the sentence pairs is obtained by utilizing the word vector matrixes, and sentence vectors with weighted similarity are generated by combining sentences in the sentence pairs;
a loss value calculation module: used for respectively calculating matching degree values between the sentence vectors corresponding to the hypothesis sentence and the positive inference sentence and between those corresponding to the hypothesis sentence and each negative inference sentence, and for calculating the loss value between each sentence-vector matching degree value and a standard value by using a joint loss function consisting of a Pointwise loss function and a Listwise loss function;
a model parameter adjustment module: the parameter of the depth matching model is adjusted according to the loss value;
a text matching module: and the method is used for performing text matching on the input sentence by using the depth matching model obtained by parameter adjustment.
According to the above technical solution, when training the deep matching model, the deep text matching method and device based on ranking learning provided by this embodiment adjust the model so that the input sentence pair comprises not only the hypothesis sentence and the positive inference sentence but also several negative inference sentences that are semantically unrelated to them. In this way, the input is expanded from a two-sentence pair into a sentence sequence containing both positive-example and negative-example data; because the number and types of model inputs are expanded, the fitting speed of the model is increased and its generalization ability is enhanced. In addition, in this embodiment the parameters of the deep matching model are adjusted with the loss function so that the sentence pair output by the final model with the highest matching probability is the hypothesis sentence and the positive inference sentence; the idea of ranking is thereby incorporated, and the text matching accuracy of the model obtained after parameter adjustment is higher.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
Fig. 1 is a basic flowchart diagram of a deep text matching method based on rank learning according to an embodiment of the present application;
fig. 2 is a schematic diagram of a basic structure of a depth matching model according to an embodiment of the present disclosure;
FIG. 3a is a schematic diagram of a bitwise addition of an augmented information vector to a word vector according to an embodiment of the present application;
FIG. 3b is a diagram illustrating the connection of an added information vector to a word vector according to an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating differences between a shared weight and an unshared weight when a bidirectional LSTM is used for feature extraction according to an embodiment of the present application;
FIG. 5 is a schematic diagram of feature selection using a convolutional neural network according to an embodiment of the present disclosure;
fig. 6 is a schematic diagram illustrating different output modes for performing feature extraction on a bidirectional LSTM according to an embodiment of the present application;
fig. 7 is a schematic basic structure diagram of a deep text matching apparatus based on rank learning according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
To address the problem that existing deep matching models, which mostly adopt sentence-pair matching, yield inaccurate matching similarity, this application provides a deep text matching method based on ranking learning. The method is applied to a deep matching model and can be used with various deep matching models.
Fig. 1 is a basic flowchart diagram of a deep text matching method based on rank learning according to an embodiment of the present application.
As shown in fig. 1, the method specifically includes the following steps:
s110: obtaining a sentence pair consisting of a hypothesis sentence and an inference sentence, wherein the inference sentence comprises a positive inference sentence and a plurality of negative inference sentences, and the hypothesis sentence is related to the semantics of the positive inference sentence and is not related to the semantics of the negative inference sentence.
Fig. 2 is a schematic diagram of the basic structure of a depth matching model according to an embodiment of the present application. As shown in Fig. 2, the depth matching model mainly includes an input layer, a representation layer, an interaction layer, a feature selection layer, a coding layer, a matching layer, and an output layer. It should be noted that the method provided in this embodiment is not limited to a depth matching model with this structure; other structures may also be used.
During model training the input in the existing mode is usually a pair of two sentences, denoted sentence A and sentence B, and the accuracy of the matching results is low. Therefore, in addition to sentences A and B, this embodiment also inputs several sentences that are semantically unrelated to them: sentences A and B are regarded as positive examples, i.e., the hypothesis sentence and the positive inference sentence of this embodiment, while the semantically unrelated sentences are regarded as negative examples, i.e., negative inference sentences. The number of negative examples is not limited in this embodiment; they may be examples randomly drawn from other matching sentence pairs.
For example, the input sentence sample is as follows:
Hypothesis sentence: The sun is shining today;
Positive inference sentence: The weather is very good today;
Negative inference sentence 1: It is raining heavily today;
Negative inference sentence 2: ……
Further, since the depth matching model encodes each sentence separately, in order to increase the amount of input data, this embodiment inputs sentence A and sentence B twice with their roles exchanged, as follows:
First, two semantically related positive example sentences are selected as the hypothesis sentence and the positive inference sentence, e.g., sentence A and sentence B; then, several negative example sentences that are semantically unrelated to the positive example sentences are selected as negative inference sentences, e.g., sentences C, D, ……; finally, one of the two positive example sentences is taken as the hypothesis sentence, the other as the positive inference sentence, and together with each negative example sentence they form a sentence pair. Thus, the input sentence pairs include <sentence A, sentence B, sentence C, sentence D ……> and <sentence B, sentence A, sentence C, sentence D ……>. Each sentence in every sentence pair is then segmented into words.
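As a minimal illustration of this construction, the following Python sketch builds the two role-swapped samples from a hypothetical positive pair and randomly chosen negatives (the sentences, function name, and number of negatives are illustrative assumptions, not data from the embodiment):

    import random

    def build_samples(sentence_a, sentence_b, corpus, num_negatives=3):
        """sentence_a / sentence_b: a semantically related positive example pair.
        corpus: pool of unrelated sentences used as random negative examples."""
        negatives = random.sample(corpus, num_negatives)
        return [
            [sentence_a, sentence_b] + negatives,   # <A, B, C, D, ...>
            [sentence_b, sentence_a] + negatives,   # <B, A, C, D, ...>
        ]

    samples = build_samples("The sun is shining today", "The weather is very good today",
                            ["It is raining heavily today", "The stock market rose", "He plays football"])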
S120: and respectively representing the sentences in the sentence pairs by word vectors to obtain a word vector matrix of each sentence in the sentence pairs.
The input data is segmented into words, which are then represented using a trained Word Embedding model, such as Word2vec or GloVe.
In order to increase the amount of input information, the present embodiment adds some information vectors on the basis of the word vectors, wherein the information vectors include parts of speech, co-occurrence information, and position encoding vectors. Specifically, the expression method of each vector is as follows:
Part-of-speech vector: each part of speech is represented by a fixed-length random vector.
Co-occurrence information vector: co-occurrence information refers to words that occur in both the hypothesis sentence and an inference sentence, such as the word "today" in the hypothesis and positive inference sentences above. In this embodiment the co-occurrence information takes three values, 0, 1, and 2, where 0 marks a <PAD> position added by padding, i.e., the sentence has no word at this position and the depth matching model fills in a null value; 1 means the word co-occurs in the hypothesis and inference sentences; and 2 means the word does not co-occur in the hypothesis and inference sentences. This embodiment represents the co-occurrence information as a one-dimensional vector, one value per position (a code sketch of this tagging follows the description of Fig. 3a and Fig. 3b below).
Position-coding vector: the position code can either be calculated by a formula or represented by a learnable, randomly initialized vector.
Specifically, when the position-coding vector is calculated by formula, the following formulas may be used:
PE_(pos,2i) = sin( pos / C^(2i/d1) )    formula (1)
PE_(pos,2i+1) = cos( pos / C^(2i/d1) )    formula (2)
In formulas (1) and (2), pos denotes the position of the word in the input sentence, d1 denotes the dimension of the word vector, C is the period coefficient, PE_(pos,2i) is the 2i-th dimension of the position code of the word at position pos, and PE_(pos,2i+1) is the (2i+1)-th dimension of the position code of the word at position pos.
In addition, when the position-coding vector is expressed by using a learnable randomly initialized vector, a randomly initialized vector may be input to the model, and the model may learn to adjust the vector to a reasonable value by itself and use the adjusted vector as the position-coding vector.
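When the position code is computed by formula, a small sketch of formulas (1) and (2) could look as follows (the period coefficient C = 10000 is an assumed value; the description above only names it "C"):

    import numpy as np

    def position_encoding(seq_len, d1, C=10000.0):
        """Sinusoidal position code of formulas (1)-(2); pos is the word position, d1 the word-vector dimension."""
        pe = np.zeros((seq_len, d1))
        for pos in range(seq_len):
            for i in range(d1 // 2):
                angle = pos / (C ** (2 * i / d1))
                pe[pos, 2 * i] = np.sin(angle)        # PE_(pos,2i)
                pe[pos, 2 * i + 1] = np.cos(angle)    # PE_(pos,2i+1)
        return pe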
After the part-of-speech, co-occurrence, and position-coding vectors are obtained, they can be combined with the word vector, where the vector obtained from Word Embedding is called the initial word vector. Specifically, the added information vectors may be summed with the initial word vector bit by bit, as shown in the schematic diagram of Fig. 3a, or they may be concatenated with the initial word vector to form a longer vector, as shown in Fig. 3b.
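The assembly of the augmented word representation can be sketched as follows (the 0/1/2 co-occurrence tagging follows the description above; dimensions and the exact combination rule are illustrative assumptions):

    import numpy as np

    def cooccurrence_tags(tokens, other_tokens, pad_len):
        """0 = <PAD> position, 1 = word occurs in both sentences, 2 = word does not co-occur."""
        other = set(other_tokens)
        tags = [1 if t in other else 2 for t in tokens]
        return np.array(tags + [0] * (pad_len - len(tokens)), dtype=float)

    def augment(word_vecs, pos_vecs, pos_enc, co_tags, mode="concat"):
        if mode == "add":                                  # Fig. 3a: bitwise addition of
            return word_vecs + pos_vecs + pos_enc          # equal-dimension vectors (the 1-D
                                                           # co-occurrence tag handled separately)
        return np.concatenate(                             # Fig. 3b: concatenation into a longer vector
            [word_vecs, pos_vecs, pos_enc, co_tags[:, None]], axis=1)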
S130: and generating sentence vectors after the similarity of sentences in the sentence pairs is weighted by using the similarity matrix corresponding to each word vector matrix.
In the interaction layer of the model in Fig. 2, an attention mechanism is used to obtain the similarity matrix of each sentence pair; in this embodiment this matrix is obtained by matrix multiplication of the word-vector representation matrices of the two sentences. The representations of the hypothesis H and the inference P in the sentence pair are then regenerated according to the similarity matrix, which can also be understood as re-encoding the word-vector representations under the current context to obtain new sentence representations, as in the following formulas (3) and (4).
h~_i = Σ_{j=1..len(P)} [ exp(e_ij) / Σ_{k=1..len(P)} exp(e_ik) ] · p_j    formula (3)
p~_j = Σ_{i=1..len(H)} [ exp(e_ij) / Σ_{k=1..len(H)} exp(e_kj) ] · h_i    formula (4)
In formulas (3) and (4), len(H) and len(P) denote the lengths of the two sentences, h~_i and p~_j are the similarity-weighted sentence representations, h_i and p_j are the original sentence representations, and e is the weight, obtained from the corresponding value of the similarity matrix.
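The interaction step implied by the similarity matrix and formulas (3)-(4) can be sketched as follows (the softmax normalization over the opposite sentence is the standard soft-alignment form assumed from the symbols above):

    import numpy as np

    def soft_align(H, P):
        """H: [len(H), d] word vectors of the hypothesis; P: [len(P), d] word vectors of an inference sentence."""
        e = H @ P.T                                             # similarity matrix (matrix multiplication)
        a = np.exp(e) / np.exp(e).sum(axis=1, keepdims=True)    # weights over P for each word of H
        b = np.exp(e) / np.exp(e).sum(axis=0, keepdims=True)    # weights over H for each word of P
        H_weighted = a @ P                                      # formula (3): H re-expressed by P
        P_weighted = b.T @ H                                    # formula (4): P re-expressed by H
        return H_weighted, P_weighted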
It should be noted that various sentence interaction mechanisms can be used in this embodiment. This example uses a bidirectional LSTM (Long Short-Term Memory) structure, which can be expressed as follows:
y_t = g(V·A_t + V'·A'_t)    formula (5)
A_t = f(U·x_t + W·A_(t-1))    formula (6)
A'_t = f(U'·x_t + W'·A'_(t+1))    formula (7)
In formulas (5) to (7), V, V', U, U', W, W' are weight matrices, f and g are activation functions, x is the input, A and A' are the forward and backward hidden states, y is the output, and t is the time step.
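Formulas (5)-(7) describe the bidirectional recurrence in simplified form (without the LSTM gating actually used by the embodiment); a direct sketch of that simplified recurrence is:

    import numpy as np

    def bidirectional_rnn(X, U, W, Up, Wp, V, Vp, f=np.tanh, g=lambda z: z):
        """X: list of input vectors x_t; U/W forward weights, Up/Wp backward weights, V/Vp output weights."""
        hidden = W.shape[0]
        A_fwd, fwd = np.zeros(hidden), []
        A_bwd, bwd = np.zeros(hidden), [None] * len(X)
        for t in range(len(X)):                  # formula (6): forward hidden state
            A_fwd = f(U @ X[t] + W @ A_fwd)
            fwd.append(A_fwd)
        for t in reversed(range(len(X))):        # formula (7): backward hidden state
            A_bwd = f(Up @ X[t] + Wp @ A_bwd)
            bwd[t] = A_bwd
        return [g(V @ fwd[t] + Vp @ bwd[t]) for t in range(len(X))]   # formula (5): output y_t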
With the bidirectional LSTM structure, word alignment is first performed on the two sentences in each sentence pair to obtain their similarity matrix; local inference over the two sentences is then carried out, and the similarity-weighted sentence representations are generated by combining the two sentences using the obtained similarity matrix. If syntactic parsing of the sentences is available, a tree-LSTM may be used here instead of the bidirectional LSTM.
S140: and calculating the loss value of the preset loss function according to the matching degree value between the sentence vectors.
In the matching layer and output layer of the model in Fig. 2, the matching degree between the sentence vector of the hypothesis sentence H and the sentence vector of each inference sentence P obtained above is calculated, yielding N output values, such as Score1, Score2, ……, ScoreN in Fig. 2, where N is the number of all inference sentences, positive and negative examples included. A loss function can then be calculated from the ranked results of the N output values, the model parameters adjusted, and training continued; to reduce the amount of calculation, only whether the highest-scoring pair is the one formed by the hypothesis sentence and the positive inference sentence is checked.
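A small sketch of this matching step, scoring the hypothesis sentence vector against the N inference sentence vectors with cosine similarity:

    import numpy as np

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    def match_scores(r_h, inference_vectors):
        """r_h: hypothesis sentence vector; inference_vectors: [r_p+, r_p1-, r_p2-, ...] -> Score1..ScoreN."""
        return [cosine(r_h, r_p) for r_p in inference_vectors]

    # Training then only checks whether the highest score belongs to the positive inference sentence.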
In order to better evaluate the matching degree value, the embodiment fuses concepts of Pointwise and Listwise, and specifically, calculates a difference value between the matching degree value of each statement vector and a standard value by using a joint loss function composed of a Pointwise loss function and a Listwise loss function, and adjusts a parameter of a depth matching model according to the difference value. Wherein, the calculation formula of the Pointwise loss function is as follows:
L_p = max(0, m - s(r_h; r_p+) + s(r_h; r_p-))    formula (8)
In formula (8), s(r_h; r_p+) is the cosine similarity between the sentence vectors corresponding to the hypothesis sentence and the positive inference sentence, s(r_h; r_p-) is the cosine similarity between the sentence vectors corresponding to the hypothesis sentence and a negative inference sentence, and m is a preset threshold for distinguishing positive from negative inference sentences.
According to the formula, the Pointwise loss function has a large corresponding loss value when the matching degree of the hypothesis statement and the positive inference statement is low, and has a large corresponding loss value when the matching degree of the hypothesis statement and the negative inference statement is high. Therefore, the Pointwise loss function alone has a good ordering effect, but the similarity value is not accurate enough. For the above reasons, the embodiment further combines a Listwise loss function, and the calculation formula is as follows:
(Formula (9), the Listwise loss L_l, is given only as an image in the original publication and is not reproduced here; in it, n is the number of samples consisting of the positive inference sentence and the negative inference sentences.)
To prevent the model from overfitting, this embodiment adds an L2 regularization term (L2 Regularization) to the loss function, giving the joint loss function loss as follows:
loss = L_p + L_l + L2Regularization    formula (10)
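A PyTorch sketch of the joint loss of formula (10): the Pointwise term follows formula (8) (summed over the negatives as a simple extension), the Listwise term is an assumed softmax form since formula (9) appears only as an image in the original, and the L2 coefficient is an assumed value:

    import torch
    import torch.nn.functional as F

    def joint_loss(scores, model, m=0.3, lambda_l2=1e-4):
        """scores: tensor [n] of cosine similarities, index 0 being the positive inference sentence."""
        s_pos, s_negs = scores[0], scores[1:]
        l_pointwise = torch.clamp(m - s_pos + s_negs, min=0).sum()        # formula (8), over all negatives
        l_listwise = F.cross_entropy(scores.unsqueeze(0),                  # assumed form of formula (9)
                                     torch.zeros(1, dtype=torch.long))
        l2 = sum((p ** 2).sum() for p in model.parameters())               # L2 regularization term
        return l_pointwise + l_listwise + lambda_l2 * l2                   # formula (10)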
S150: and adjusting parameters of the depth matching model according to the loss value.
Specifically, in the training process, the model is continuously trained with the goal of minimizing the loss value, so as to obtain the final depth matching model.
S160: and performing text matching on the input sentence by using the finally obtained depth matching model through parameter adjustment.
For example, the sentences in a test set can be input into the deep matching model obtained through continuous parameter adjustment for text matching, and the matching accuracy can then be calculated.
In the deep text matching method based on ranking learning provided by this embodiment, the model is adjusted so that, when the deep matching model is trained, the input sentence pair comprises not only the hypothesis sentence and the positive inference sentence but also several negative inference sentences that are semantically unrelated to them. In this way, the input is expanded from a two-sentence pair into a sentence-pair sequence containing both positive-example and negative-example data; the expanded number and types of inputs speed up the fitting of the model and enhance its generalization ability. In addition, this embodiment integrates the idea of ranking into the model: when the loss function is used to adjust the parameters of the deep matching model, the goal is that the pair output by the model with the highest matching probability is the hypothesis sentence and the positive inference sentence, so the text matching accuracy of the model after parameter adjustment is higher. Finally, this embodiment also incorporates an attention mechanism to generate similarity-weighted sentence vectors for the two sentences in each sentence pair; because the words of the two sentences are thereby correlated with each other, the performance of the model is improved.
As shown in fig. 2, the depth matching model provided in this embodiment includes a representation layer, a feature selection layer, and a coding layer in addition to data processing of an input layer, an interaction layer, a matching layer, and an output layer, and correspondingly, the depth text matching method includes the following steps in addition to the above steps:
first, after the two sentences in each sentence pair are represented by word vectors in step S120, the method further includes a step of feature extraction, that is, in the representation layer, each word vector is encoded again according to the context in the sentence in which the word vector is located, so as to obtain a new word vector representation of the sentence in the sentence pair.
Specifically, this step may use a variety of feature-extraction structures, such as a convolutional neural network (CNN), a recurrent neural network (RNN), or an attention mechanism. This embodiment still adopts a bidirectional LSTM structure. Fig. 4 illustrates the difference between sharing and not sharing weights when extracting features with a bidirectional LSTM: as shown in Fig. 4, the hypothesis and inference sentences may or may not share weights during feature extraction, and in practice the choice can be made according to the training-speed requirement and the amount of training data.
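The weight-sharing choice of Fig. 4 can be sketched in PyTorch as follows (layer sizes are illustrative assumptions):

    import torch.nn as nn

    class RepresentationLayer(nn.Module):
        def __init__(self, dim=300, hidden=128, share_weights=True):
            super().__init__()
            self.enc_h = nn.LSTM(dim, hidden, bidirectional=True, batch_first=True)
            # shared weights: the same encoder object processes both sentences
            self.enc_p = self.enc_h if share_weights else nn.LSTM(
                dim, hidden, bidirectional=True, batch_first=True)

        def forward(self, hypothesis, inference):
            h_enc, _ = self.enc_h(hypothesis)   # [batch, len(H), 2*hidden]
            p_enc, _ = self.enc_p(inference)    # [batch, len(P), 2*hidden]
            return h_enc, p_enc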
Further, because the sentences in each sentence pair are represented by word vectors in step S120, the hypothesis sentence obtains N representations, one for each of the N inference sentences. To facilitate subsequent operations, in this embodiment the feature selection layer therefore also normalizes these N representations of the hypothesis sentence.
As shown in fig. 2, the model adopts the most basic averaging method:
r_h = (1/N) · Σ_{i=1..N} r_h^(i)    formula (11)
In formula (11), N is the number of representations obtained for the hypothesis sentence (one per inference sentence), r_h^(i) is the i-th word-vector representation of the hypothesis sentence, and r_h is the averaged hypothesis representation output by this layer.
Of course, in a specific implementation, besides the above manner, a weighted sum with learnable weights may be used, or feature extraction may be performed with a convolutional neural network, a recurrent neural network, and so on. Fig. 5 is a schematic diagram of feature selection using a convolutional neural network: as shown in Fig. 5, the multiple word-vector representations are spliced laterally, encoded by convolution with the convolutional neural network, and then output through pooling.
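The two feature-selection options can be sketched as follows: the averaging of formula (11) and a convolution-plus-pooling variant in the spirit of Fig. 5 (filter size and dimensions are assumptions):

    import torch
    import torch.nn as nn

    def average_hypothesis(h_stack):
        """h_stack: [N, len(H), d] - the N representations of the hypothesis, one per inference sentence."""
        return h_stack.mean(dim=0)                                           # formula (11)

    class ConvSelect(nn.Module):
        """Fig. 5 alternative: splice the N representations laterally, convolve, then max-pool."""
        def __init__(self, d=256, out_channels=256, kernel=3):
            super().__init__()
            self.conv = nn.Conv1d(d, out_channels, kernel, padding=1)        # d must match the input dimension

        def forward(self, h_stack):
            x = torch.cat(list(h_stack), dim=0).transpose(0, 1).unsqueeze(0)  # [1, d, N*len(H)]
            return torch.max(self.conv(x), dim=2).values                      # max pooling over positions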
Further, after generating sentence vectors with weighted similarity between sentences in the sentence pairs by using the similarity matrix corresponding to each word vector matrix, the following processing is also performed:
and recoding the word vectors according to the context of the sentence in which the word vectors are positioned to obtain a new word vector representation of the sentence in the sentence pair.
Specifically, this embodiment again uses a bidirectional LSTM structure for feature extraction and encoding. Fig. 6 shows the different output modes when extracting features with the bidirectional LSTM: as shown in Fig. 6, the hidden-state output of the LSTM structure may be used as the new word vector representation, or the outputs of the bidirectional LSTM at all time steps may be reduced by taking the bitwise maximum and the bitwise mean, which are then concatenated as the new word vector representation.
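The two output modes of Fig. 6 can be sketched as follows (sizes are illustrative assumptions):

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=256, hidden_size=128, bidirectional=True, batch_first=True)
    x = torch.randn(1, 20, 256)                              # one sentence, 20 time steps
    outputs, (h_n, _) = lstm(x)                              # outputs: [1, 20, 256]

    final_state = torch.cat([h_n[0], h_n[1]], dim=-1)        # option 1: the hidden-state output
    pooled = torch.cat([outputs.max(dim=1).values,           # option 2: bitwise max and mean over
                        outputs.mean(dim=1)], dim=-1)        # all time steps, concatenated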
The deep matching model trained with this method reaches 94% accuracy on an existing financial-corpus test set, whereas the traditional model reaches only 88% accuracy with the same training and test sets. The method provided by this embodiment thus makes a series of improvements to the model training process, and the experimental results show that the model trained with this method outperforms the one trained in the conventional way.
Based on the above method, this embodiment also provides a deep text matching apparatus based on ranking learning. Fig. 7 is a schematic diagram of the basic structure of such an apparatus according to an embodiment of the present application; as shown in Fig. 7, the apparatus includes:
sentence pair acquisition module 710: the method is used for acquiring sentence pairs consisting of hypothesis sentences and inference sentences, wherein the inference sentences comprise positive inference sentences and a plurality of negative inference sentences, and the hypothesis sentences are related to semantics of the positive inference sentences and unrelated to semantics of the negative inference sentences.
Word vector representation module 720: and the word vector matrix is used for representing the sentences in the sentence pairs by word vectors respectively to obtain the word vector matrix of the sentences in the sentence pairs.
The similarity weighting module 730: and the sentence vector is used for generating the sentence vectors after weighting the similarity of the sentences in the sentence pairs by utilizing the similarity matrix corresponding to each word vector matrix.
Loss value calculation module 740: and the loss value of the preset loss function is calculated according to the matching degree between the sentence vectors.
Model parameter adjustment module 750: and the parameter of the depth matching model is adjusted according to the loss value.
Text matching module 760: and the method is used for performing text matching on the input sentence by utilizing the finally obtained depth matching model through parameter adjustment.
Further, the loss value calculation module 740 further includes:
a similarity calculation unit: used for respectively calculating the sentence-vector matching degree values between the hypothesis sentence and the positive inference sentence and between the hypothesis sentence and each negative inference sentence;
a loss calculation unit: and the loss value between each statement vector matching degree value and a standard value is calculated by using a joint loss function consisting of a Pointwise loss function and a Listwise loss function.
The embodiments in this specification are described in a progressive manner; for identical or similar parts the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus and system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant parts, reference may be made to the description of the method embodiments. The apparatus and system embodiments described above are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units, i.e., they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without inventive effort.
The foregoing is merely a detailed description of the invention, and it should be noted that modifications and adaptations by those skilled in the art may be made without departing from the principles of the invention, and should be considered as within the scope of the invention.

Claims (7)

1. A deep text matching method based on ranking learning is applied to a deep matching model and is characterized by comprising the following steps:
obtaining a sentence pair consisting of a hypothesis sentence and an inference sentence, wherein the inference sentence comprises a positive inference sentence and a plurality of negative inference sentences, and the hypothesis sentence is related to the semantics of the positive inference sentence and is not related to the semantics of the negative inference sentence;
respectively representing the sentences in the sentence pairs by word vectors to obtain a word vector matrix of each sentence in the sentence pairs;
obtaining a similarity matrix of the sentence pairs by using the word vector matrixes, and generating sentence vectors with weighted similarity by combining sentences in the sentence pairs;
respectively calculating matching degree values between the sentence vectors corresponding to the hypothesis sentence and the positive inference sentence and between those corresponding to the hypothesis sentence and each negative inference sentence;
calculating a loss value between each sentence-vector matching degree value and a standard value by using a joint loss function consisting of a Pointwise loss function and a Listwise loss function; wherein the joint loss function loss is calculated as: loss = L_p + L_l + L2Regularization, where L_p is the Pointwise loss function, L_p = max(0, m - s(r_h; r_p+) + s(r_h; r_p-)); L_l is the Listwise loss function (its formula is given as an image in the original publication and is not reproduced here); L2Regularization is the L2 regularization function; r_h is the sentence vector representation of the hypothesis sentence, r_p+ and r_p- are the sentence vector representations of the positive and negative inference sentences respectively, s(r_h; r_p+) is the cosine similarity between the sentence vectors corresponding to the hypothesis sentence and the positive inference sentence, s(r_h; r_p-) is the cosine similarity between the sentence vectors corresponding to the hypothesis sentence and a negative inference sentence, m is a preset threshold for distinguishing positive from negative inference sentences, and n is the number of samples consisting of the positive inference sentence and the negative inference sentences;
adjusting parameters of the depth matching model according to the loss value;
and performing text matching on the input sentence by using the depth matching model obtained by adjusting the parameters.
2. The method of claim 1, wherein obtaining sentence pairs consisting of hypothesis sentences and inference sentences comprises:
selecting two semantically related positive example sentences, which serve as the hypothesis sentence and the positive inference sentence;
selecting a plurality of negative example sentences which are used as negative reasoning sentences and are irrelevant to the semantics of the positive example sentences;
and forming a sentence pair by the two positive example sentences and the negative example sentences.
3. The method of claim 1, wherein representing the sentences in the sentence pairs with word vectors respectively to obtain a word vector matrix for each sentence in the sentence pairs comprises:
respectively segmenting words of sentences in the sentence pairs and expressing the words by word vectors to obtain an initial word vector matrix;
and adding the part of speech, the co-occurrence information and the position coding vector to the initial word vector matrix to obtain a word vector matrix of each sentence in the sentence pair.
4. The method of claim 1, wherein after generating a mutually similarity weighted sentence vector in conjunction with the sentences in the sentence pair, the method further comprises:
and normalizing the statement vectors obtained by weighting the similarity of the assumed statement and each positive inference statement and each negative inference statement respectively.
5. The method of claim 1, wherein obtaining a similarity matrix of the sentence pairs by using each word vector matrix, and generating, in combination with sentences in the sentence pairs, sentence vectors with weighted similarity to each other, comprises:
generating initial sentence vectors after similarity weighting of sentences in the sentence pairs by using the similarity matrixes corresponding to the word vector matrixes;
and re-encoding the initial statement vectors according to the contexts of the statements corresponding to the initial statement vectors to obtain the statement vectors of the statements in the sentence pair.
6. The method of claim 1, wherein adjusting parameters of the depth matching model based on the penalty value comprises:
and adjusting parameters of the depth matching model with the aim of minimizing the loss value.
7. A deep text matching device based on rank learning is applied to a deep matching model, and is characterized in that the device comprises:
sentence pair obtaining module: the method comprises the steps of obtaining a sentence pair consisting of a hypothesis sentence and an inference sentence, wherein the inference sentence comprises a positive inference sentence and a plurality of negative inference sentences, and the hypothesis sentence is related to the semantics of the positive inference sentence and is not related to the semantics of the negative inference sentence;
a word vector representation module: the word vector matrix is used for representing the sentences in the sentence pairs by word vectors respectively to obtain word vector matrixes of the sentences in the sentence pairs;
a similarity weighting module: the similarity matrix of the sentence pairs is obtained by utilizing the word vector matrixes, and sentence vectors with weighted similarity are generated by combining sentences in the sentence pairs;
a loss value calculation module: used for respectively calculating matching degree values between the sentence vectors corresponding to the hypothesis sentence and the positive inference sentence and between those corresponding to the hypothesis sentence and each negative inference sentence, and for calculating the loss value between each sentence-vector matching degree value and a standard value by using a joint loss function consisting of a Pointwise loss function and a Listwise loss function; wherein the joint loss function loss is calculated as: loss = L_p + L_l + L2Regularization, where L_p is the Pointwise loss function, L_p = max(0, m - s(r_h; r_p+) + s(r_h; r_p-)); L_l is the Listwise loss function (its formula is given as an image in the original publication and is not reproduced here); L2Regularization is the L2 regularization function; r_h is the sentence vector representation of the hypothesis sentence, r_p+ and r_p- are the sentence vector representations of the positive and negative inference sentences respectively, s(r_h; r_p+) is the cosine similarity between the sentence vectors corresponding to the hypothesis sentence and the positive inference sentence, s(r_h; r_p-) is the cosine similarity between the sentence vectors corresponding to the hypothesis sentence and a negative inference sentence, m is a preset threshold for distinguishing positive from negative inference sentences, and n is the number of samples consisting of the positive inference sentence and the negative inference sentences;
a model parameter adjustment module: the parameter of the depth matching model is adjusted according to the loss value;
a text matching module: and the depth matching model is used for performing text matching on the input sentence by using the depth matching model obtained by adjusting the parameters.
CN201910285853.7A 2019-04-10 2019-04-10 Deep text matching method and device based on sequencing learning Active CN110019685B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910285853.7A CN110019685B (en) 2019-04-10 2019-04-10 Deep text matching method and device based on sequencing learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910285853.7A CN110019685B (en) 2019-04-10 2019-04-10 Deep text matching method and device based on sequencing learning

Publications (2)

Publication Number Publication Date
CN110019685A CN110019685A (en) 2019-07-16
CN110019685B true CN110019685B (en) 2021-08-20

Family

ID=67190939

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910285853.7A Active CN110019685B (en) 2019-04-10 2019-04-10 Deep text matching method and device based on sequencing learning

Country Status (1)

Country Link
CN (1) CN110019685B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110457444A (en) * 2019-08-14 2019-11-15 山东浪潮人工智能研究院有限公司 A kind of sentence of same meaning conversion method based on depth text matches
CN110705283A (en) * 2019-09-06 2020-01-17 上海交通大学 Deep learning method and system based on matching of text laws and regulations and judicial interpretations
CN110795934B (en) * 2019-10-31 2023-09-19 北京金山数字娱乐科技有限公司 Sentence analysis model training method and device and sentence analysis method and device
CN111027320A (en) * 2019-11-15 2020-04-17 北京三快在线科技有限公司 Text similarity calculation method and device, electronic equipment and readable storage medium
CN110969006B (en) * 2019-12-02 2023-03-21 支付宝(杭州)信息技术有限公司 Training method and system of text sequencing model
CN111368903B (en) * 2020-02-28 2021-08-27 深圳前海微众银行股份有限公司 Model performance optimization method, device, equipment and storage medium
CN111898362A (en) * 2020-05-15 2020-11-06 联想(北京)有限公司 Data processing method and device
CN112560427B (en) * 2020-12-16 2023-09-22 平安银行股份有限公司 Problem expansion method, device, electronic equipment and medium
CN113935329B (en) * 2021-10-13 2022-12-13 昆明理工大学 Asymmetric text matching method based on adaptive feature recognition and denoising
CN114065729A (en) * 2021-11-16 2022-02-18 神思电子技术股份有限公司 Text sorting method based on deep text matching model

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509463A (en) * 2017-02-28 2018-09-07 华为技术有限公司 A kind of answer method and device of problem
CN109086423A (en) * 2018-08-08 2018-12-25 北京神州泰岳软件股份有限公司 A kind of text matching technique and device
CN109101537A (en) * 2018-06-27 2018-12-28 北京慧闻科技发展有限公司 More wheel dialogue data classification methods, device and electronic equipment based on deep learning
CN109145292A (en) * 2018-07-26 2019-01-04 黑龙江工程学院 Paraphrasing text depth Matching Model construction method and paraphrasing text Matching Method of Depth
CN109344404A (en) * 2018-09-21 2019-02-15 中国科学技术大学 The dual attention natural language inference method of context aware
CN109471945A (en) * 2018-11-12 2019-03-15 中山大学 Medical file classification method, device and storage medium based on deep learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10268646B2 (en) * 2017-06-06 2019-04-23 Facebook, Inc. Tensor-based deep relevance model for search on online social networks

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509463A (en) * 2017-02-28 2018-09-07 华为技术有限公司 A kind of answer method and device of problem
CN109101537A (en) * 2018-06-27 2018-12-28 北京慧闻科技发展有限公司 More wheel dialogue data classification methods, device and electronic equipment based on deep learning
CN109145292A (en) * 2018-07-26 2019-01-04 黑龙江工程学院 Paraphrasing text depth Matching Model construction method and paraphrasing text Matching Method of Depth
CN109086423A (en) * 2018-08-08 2018-12-25 北京神州泰岳软件股份有限公司 A kind of text matching technique and device
CN109344404A (en) * 2018-09-21 2019-02-15 中国科学技术大学 The dual attention natural language inference method of context aware
CN109471945A (en) * 2018-11-12 2019-03-15 中山大学 Medical file classification method, device and storage medium based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"面向问答领域的语义相关性计算的研究";周伟杰;《中国优秀硕士学位论文全文数据库 信息科技辑》;20180715(第7期);第I138-1877页 *

Also Published As

Publication number Publication date
CN110019685A (en) 2019-07-16

Similar Documents

Publication Publication Date Title
CN109992648B (en) Deep text matching method and device based on word migration learning
CN110019685B (en) Deep text matching method and device based on sequencing learning
CN107798140B (en) Dialog system construction method, semantic controlled response method and device
CN111427995B (en) Semantic matching method, device and storage medium based on internal countermeasure mechanism
CN108549658B (en) Deep learning video question-answering method and system based on attention mechanism on syntax analysis tree
CN106569998A (en) Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN109992788B (en) Deep text matching method and device based on unregistered word processing
CN111737426B (en) Method for training question-answering model, computer equipment and readable storage medium
CN110232113B (en) Method and system for improving question and answer accuracy of knowledge base
CN113282711B (en) Internet of vehicles text matching method and device, electronic equipment and storage medium
CN111814453B (en) Fine granularity emotion analysis method based on BiLSTM-textCNN
CN113569001A (en) Text processing method and device, computer equipment and computer readable storage medium
CN114428850B (en) Text retrieval matching method and system
CN112232053A (en) Text similarity calculation system, method and storage medium based on multi-keyword pair matching
CN114896377A (en) Knowledge graph-based answer acquisition method
CN111552773A (en) Method and system for searching key sentence of question or not in reading and understanding task
CN110597968A (en) Reply selection method and device
CN113011172A (en) Text processing method and device, computer equipment and storage medium
CN114298055B (en) Retrieval method and device based on multilevel semantic matching, computer equipment and storage medium
CN115964459B (en) Multi-hop reasoning question-answering method and system based on food safety cognition spectrum
CN111581364A (en) Chinese intelligent question-answer short text similarity calculation method oriented to medical field
CN113326374B (en) Short text emotion classification method and system based on feature enhancement
CN113934835A (en) Retrieval type reply dialogue method and system combining keywords and semantic understanding representation
CN113343118A (en) Hot event discovery method under mixed new media
Ye et al. A sentiment based non-factoid question-answering framework

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20190904

Address after: Room 630, 6th floor, Block A, Wanliu Xingui Building, 28 Wanquanzhuang Road, Haidian District, Beijing

Applicant after: China Science and Technology (Beijing) Co., Ltd.

Address before: Room 601, Block A, Wanliu Xingui Building, 28 Wanquanzhuang Road, Haidian District, Beijing

Applicant before: Beijing Shenzhou Taiyue Software Co., Ltd.

TA01 Transfer of patent application right
CB02 Change of applicant information

Address after: 230000 zone B, 19th floor, building A1, 3333 Xiyou Road, hi tech Zone, Hefei City, Anhui Province

Applicant after: Dingfu Intelligent Technology Co., Ltd

Address before: Room 630, 6th floor, Block A, Wanliu Xingui Building, 28 Wanquanzhuang Road, Haidian District, Beijing

Applicant before: DINFO (BEIJING) SCIENCE DEVELOPMENT Co.,Ltd.

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant