CN110019685B - Deep text matching method and device based on ranking learning - Google Patents

Deep text matching method and device based on ranking learning Download PDF

Info

Publication number
CN110019685B
Authority
CN
China
Prior art keywords
sentence
statement
inference
sentences
positive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910285853.7A
Other languages
Chinese (zh)
Other versions
CN110019685A (en
Inventor
李健铨
刘小康
刘子博
晋耀红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Science and Technology (Beijing) Co., Ltd.
Original Assignee
Dingfu Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dingfu Intelligent Technology Co Ltd filed Critical Dingfu Intelligent Technology Co Ltd
Priority to CN201910285853.7A priority Critical patent/CN110019685B/en
Publication of CN110019685A publication Critical patent/CN110019685A/en
Application granted granted Critical
Publication of CN110019685B publication Critical patent/CN110019685B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Abstract

The application provides a deep text matching method and device based on ranking learning. The method first obtains a sentence pair consisting of a hypothesis sentence and inference sentences, where the inference sentences comprise one positive inference sentence and several negative inference sentences, and the hypothesis sentence is semantically related to the positive inference sentence and semantically unrelated to the negative inference sentences. The sentences in the sentence pair are then processed into sentence vectors, the loss value of a preset loss function is calculated from the matching degree values among the sentence vectors, and the parameters of the deep matching model are adjusted according to the loss value. Finally, the deep matching model obtained through parameter adjustment is used to perform text matching on input sentences. Because the model input is expanded from a two-sentence pair into a sentence sequence containing both positive-example and negative-example data, the number and types of model inputs increase, which speeds up model fitting and improves the matching accuracy of the model.

Description

Deep text matching method and device based on ranking learning
Technical Field
The application relates to the technical field of natural language processing, and in particular to a deep text matching method and device based on ranking learning.
Background
Text matching is an important fundamental problem in natural language processing, and many natural language processing tasks can be abstracted as text matching tasks. For example, web page search can be abstracted as a relevance-matching problem between web pages and the user's search query, automatic question answering as a satisfaction-matching problem between candidate answers and questions, and text deduplication as a similarity-matching problem between texts.
Traditional text matching technology (such as the vector space model used in information retrieval) mainly addresses matching at the vocabulary level. In fact, matching algorithms based on lexical overlap have serious limitations and cannot handle many problems, such as polysemy and synonymy in language, differences in compositional structure (for example, "high-speed railway from Beijing to Shanghai" versus "high-speed railway from Shanghai to Beijing"), and the asymmetry of matching (for example, in the web page search task the expressions on the query side and the web page side often differ greatly).
With the development of deep learning technology, text matching based on Word Embedding (word embedding vectors) trained by neural networks has attracted wide interest. Word Embedding is simpler to train, and the resulting word vectors are more amenable to semantic computation. However, Word Embedding trained only on unlabeled data differs little from topic-model techniques in the practical effect of matching-degree calculation; both are essentially trained on co-occurrence information. In addition, Word Embedding solves neither the semantic representation of phrases and sentences nor the asymmetry problem of matching.
To address these problems, supervised neural-network deep matching models have been proposed to improve semantic matching, such as DSSM (Deep Structured Semantic Model), CDSSM (Convolutional Latent Semantic Model), and ESIM (Enhanced Sequential Inference Model). Existing deep matching models are mostly trained on sentence pairs. In the sentence-pair mode, however, given several sentences similar to the training sentence, the model cannot judge which one is more similar, which degrades its final matching performance.
Disclosure of Invention
To address the defects of the existing sentence-pair training mode, this application provides a deep text matching method and device based on ranking learning.
According to a first aspect of the embodiments of the present application, there is provided a deep text matching method based on rank learning, applied to a deep matching model, the method including:
obtaining a sentence pair consisting of a hypothesis sentence and an inference sentence, wherein the inference sentence comprises a positive inference sentence and a plurality of negative inference sentences, and the hypothesis sentence is related to the semantics of the positive inference sentence and is not related to the semantics of the negative inference sentence;
respectively representing the sentences in the sentence pairs by word vectors to obtain a word vector matrix of each sentence in the sentence pairs;
obtaining a similarity matrix of the sentence pairs by using the word vector matrixes, and generating sentence vectors with weighted similarity by combining sentences in the sentence pairs;
respectively calculating matching degree values between the sentence vectors corresponding to the hypothesis sentence and the positive inference sentence and between those corresponding to the hypothesis sentence and each negative inference sentence;
calculating a loss value between each statement vector matching degree value and a standard value by using a combined loss function consisting of a Pointwise loss function and a Listwise loss function;
adjusting parameters of the depth matching model according to the loss value;
and performing text matching on the input sentence by using the depth matching model obtained by parameter adjustment.
Optionally, the joint loss function loss is calculated as: loss = L_p + L_l + L2Regularization, wherein:
L_p is the Pointwise loss function, L_p = max(0, m - s(r_h; r_p+) + s(r_h; r_p-)); L_l is the Listwise loss function (its formula is given as an image in the original publication and is not reproduced here);
L2Regularization is the L2 regularization function; r_h is the sentence vector representation of the hypothesis sentence, r_p+ and r_p- are the sentence vector representations of the positive and negative inference sentences respectively, s(r_h; r_p+) is the cosine similarity between the sentence vectors corresponding to the hypothesis sentence and the positive inference sentence, s(r_h; r_p-) is the cosine similarity between the sentence vectors corresponding to the hypothesis sentence and a negative inference sentence, m is a preset threshold for distinguishing positive from negative inference sentences, and n is the number of samples consisting of the positive inference sentence and the negative inference sentences.
Optionally, obtaining a sentence pair composed of a hypothesis sentence and an inference sentence includes:
selecting two semantically related positive example sentences, which serve as the hypothesis sentence and the positive inference sentence;
selecting a plurality of negative example sentences which are used as negative reasoning sentences and are irrelevant to the semantics of the positive example sentences;
and forming a sentence pair by the two positive example sentences and the negative example sentences.
Optionally, the representing the sentences in the sentence pair by word vectors respectively to obtain a word vector matrix of each sentence in the sentence pair, including:
respectively segmenting words of sentences in the sentence pairs and expressing the words by word vectors to obtain an initial word vector matrix;
and adding the part of speech, the co-occurrence information and the position coding vector to the initial word vector matrix to obtain a word vector matrix of each sentence in the sentence pair.
Optionally, after generating a sentence vector in which the sentences in the sentence pair are weighted in similarity to each other, the method further includes:
and normalizing the sentence vectors obtained by similarity-weighting the hypothesis sentence with the positive inference sentence and with each negative inference sentence respectively.
Optionally, generating a sentence vector after weighting similarity of sentences in the sentence pair by using the similarity matrix corresponding to each word vector matrix, including:
generating initial sentence vectors after similarity weighting of sentences in the sentence pairs by using the similarity matrixes corresponding to the word vector matrixes;
and re-encoding the initial statement vectors according to the contexts of the statements corresponding to the initial statement vectors to obtain the statement vectors of the statements in the sentence pair.
Optionally, adjusting parameters of the depth matching model according to the loss value includes:
and adjusting parameters of the depth matching model with the aim of minimizing the loss value.
According to a second aspect of the embodiments of the present application, there is provided a deep text matching apparatus based on rank learning, applied to a deep matching model, the apparatus including:
sentence pair obtaining module: the method comprises the steps of obtaining a sentence pair consisting of a hypothesis sentence and an inference sentence, wherein the inference sentence comprises a positive inference sentence and a plurality of negative inference sentences, and the hypothesis sentence is related to the semantics of the positive inference sentence and is not related to the semantics of the negative inference sentence;
a word vector representation module: the word vector matrix is used for representing the sentences in the sentence pairs by word vectors respectively to obtain word vector matrixes of the sentences in the sentence pairs;
a similarity weighting module: the similarity matrix of the sentence pairs is obtained by utilizing the word vector matrixes, and sentence vectors with weighted similarity are generated by combining sentences in the sentence pairs;
a loss value calculation module: used for respectively calculating matching degree values between the sentence vectors corresponding to the hypothesis sentence and the positive inference sentence and between those corresponding to the hypothesis sentence and each negative inference sentence, and for calculating the loss value between each sentence-vector matching degree value and a standard value by using a joint loss function consisting of a Pointwise loss function and a Listwise loss function;
a model parameter adjustment module: the parameter of the depth matching model is adjusted according to the loss value;
a text matching module: and the method is used for performing text matching on the input sentence by using the depth matching model obtained by parameter adjustment.
According to the above technical solution, when training the deep matching model, the deep text matching method and device based on ranking learning provided by this embodiment adjust the model so that the input sentence pair comprises not only the hypothesis sentence and the positive inference sentence but also several negative inference sentences that are semantically unrelated to them. In this way, the input is expanded from a two-sentence pair into a sentence sequence containing both positive-example and negative-example data; because the number and types of model inputs are expanded, the fitting speed of the model is increased and its generalization ability is enhanced. In addition, in this embodiment the parameters of the deep matching model are adjusted with the loss function so that the sentence pair output by the final model with the highest matching probability is the hypothesis sentence and the positive inference sentence; the idea of ranking is thereby incorporated, and the text matching accuracy of the model obtained after parameter adjustment is higher.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
Fig. 1 is a basic flowchart diagram of a deep text matching method based on rank learning according to an embodiment of the present application;
fig. 2 is a schematic diagram of a basic structure of a depth matching model according to an embodiment of the present disclosure;
FIG. 3a is a schematic diagram of a bitwise addition of an augmented information vector to a word vector according to an embodiment of the present application;
FIG. 3b is a diagram illustrating the connection of an added information vector to a word vector according to an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating differences between a shared weight and an unshared weight when a bidirectional LSTM is used for feature extraction according to an embodiment of the present application;
FIG. 5 is a schematic diagram of feature selection using a convolutional neural network according to an embodiment of the present disclosure;
fig. 6 is a schematic diagram illustrating different output modes for performing feature extraction on a bidirectional LSTM according to an embodiment of the present application;
fig. 7 is a schematic basic structure diagram of a deep text matching apparatus based on rank learning according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
To address the problem that existing deep matching models, which mostly adopt sentence-pair matching, yield inaccurate matching similarity, this application provides a deep text matching method based on ranking learning. The method is applied to a deep matching model and can be used with various deep matching models.
Fig. 1 is a basic flowchart diagram of a deep text matching method based on rank learning according to an embodiment of the present application.
As shown in fig. 1, the method specifically includes the following steps:
s110: obtaining a sentence pair consisting of a hypothesis sentence and an inference sentence, wherein the inference sentence comprises a positive inference sentence and a plurality of negative inference sentences, and the hypothesis sentence is related to the semantics of the positive inference sentence and is not related to the semantics of the negative inference sentence.
Fig. 2 is a schematic diagram of the basic structure of a depth matching model according to an embodiment of the present application. As shown in Fig. 2, the depth matching model mainly includes an input layer, a representation layer, an interaction layer, a feature selection layer, a coding layer, a matching layer, and an output layer. It should be noted that the method provided in this embodiment is not limited to a depth matching model with this structure; other structures may also be used.
During model training the input in the existing mode is usually a pair of two sentences, denoted sentence A and sentence B, and the accuracy of the matching results is low. Therefore, in addition to sentences A and B, this embodiment also inputs several sentences that are semantically unrelated to them: sentences A and B are regarded as positive examples, i.e., the hypothesis sentence and the positive inference sentence of this embodiment, while the semantically unrelated sentences are regarded as negative examples, i.e., negative inference sentences. The number of negative examples is not limited in this embodiment; they may be examples randomly drawn from other matching sentence pairs.
For example, the input sentence sample is as follows:
Hypothesis sentence: The sun is shining today;
Positive inference sentence: The weather is very good today;
Negative inference sentence 1: It is raining heavily today;
Negative inference sentence 2: ……
Further, since the depth matching model encodes each sentence separately, in order to increase the amount of input data, this embodiment inputs sentence A and sentence B twice with their roles exchanged, as follows:
First, two semantically related positive example sentences are selected as the hypothesis sentence and the positive inference sentence, e.g., sentence A and sentence B; then, several negative example sentences that are semantically unrelated to the positive example sentences are selected as negative inference sentences, e.g., sentences C, D, ……; finally, one of the two positive example sentences is taken as the hypothesis sentence, the other as the positive inference sentence, and together with each negative example sentence they form a sentence pair. Thus, the input sentence pairs include <sentence A, sentence B, sentence C, sentence D ……> and <sentence B, sentence A, sentence C, sentence D ……>. Each sentence in every sentence pair is then segmented into words.
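As a minimal illustration of this construction, the following Python sketch builds the two role-swapped samples from a hypothetical positive pair and randomly chosen negatives (the sentences, function name, and number of negatives are illustrative assumptions, not data from the embodiment):

    import random

    def build_samples(sentence_a, sentence_b, corpus, num_negatives=3):
        """sentence_a / sentence_b: a semantically related positive example pair.
        corpus: pool of unrelated sentences used as random negative examples."""
        negatives = random.sample(corpus, num_negatives)
        return [
            [sentence_a, sentence_b] + negatives,   # <A, B, C, D, ...>
            [sentence_b, sentence_a] + negatives,   # <B, A, C, D, ...>
        ]

    samples = build_samples("The sun is shining today", "The weather is very good today",
                            ["It is raining heavily today", "The stock market rose", "He plays football"])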
S120: and respectively representing the sentences in the sentence pairs by word vectors to obtain a word vector matrix of each sentence in the sentence pairs.
The input data is segmented into words, which are then represented using a trained Word Embedding model, such as Word2vec or GloVe.
In order to increase the amount of input information, the present embodiment adds some information vectors on the basis of the word vectors, wherein the information vectors include parts of speech, co-occurrence information, and position encoding vectors. Specifically, the expression method of each vector is as follows:
Part-of-speech vector: each part of speech is represented by a fixed-length random vector.
Co-occurrence information vector: co-occurrence information refers to words that occur in both the hypothesis sentence and an inference sentence, such as the word "today" in the hypothesis and positive inference sentences above. In this embodiment the co-occurrence information takes three values, 0, 1, and 2, where 0 marks a <PAD> position added by padding, i.e., the sentence has no word at this position and the depth matching model fills in a null value; 1 means the word co-occurs in the hypothesis and inference sentences; and 2 means the word does not co-occur in the hypothesis and inference sentences. This embodiment represents the co-occurrence information as a one-dimensional vector, one value per position (a code sketch of this tagging follows the description of Fig. 3a and Fig. 3b below).
Position-coding vector: the position code can either be calculated by a formula or represented by a learnable, randomly initialized vector.
Specifically, when the position-coding vector is calculated by formula, the following formulas may be used:
PE_(pos,2i) = sin( pos / C^(2i/d1) )    formula (1)
PE_(pos,2i+1) = cos( pos / C^(2i/d1) )    formula (2)
In formulas (1) and (2), pos denotes the position of the word in the input sentence, d1 denotes the dimension of the word vector, C is the period coefficient, PE_(pos,2i) is the 2i-th dimension of the position code of the word at position pos, and PE_(pos,2i+1) is the (2i+1)-th dimension of the position code of the word at position pos.
In addition, when the position-coding vector is expressed by using a learnable randomly initialized vector, a randomly initialized vector may be input to the model, and the model may learn to adjust the vector to a reasonable value by itself and use the adjusted vector as the position-coding vector.
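When the position code is computed by formula, a small sketch of formulas (1) and (2) could look as follows (the period coefficient C = 10000 is an assumed value; the description above only names it "C"):

    import numpy as np

    def position_encoding(seq_len, d1, C=10000.0):
        """Sinusoidal position code of formulas (1)-(2); pos is the word position, d1 the word-vector dimension."""
        pe = np.zeros((seq_len, d1))
        for pos in range(seq_len):
            for i in range(d1 // 2):
                angle = pos / (C ** (2 * i / d1))
                pe[pos, 2 * i] = np.sin(angle)        # PE_(pos,2i)
                pe[pos, 2 * i + 1] = np.cos(angle)    # PE_(pos,2i+1)
        return pe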
After the part-of-speech, co-occurrence, and position-coding vectors are obtained, they can be combined with the word vector, where the vector obtained from Word Embedding is called the initial word vector. Specifically, the added information vectors may be summed with the initial word vector bit by bit, as shown in the schematic diagram of Fig. 3a, or they may be concatenated with the initial word vector to form a longer vector, as shown in Fig. 3b.
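The assembly of the augmented word representation can be sketched as follows (the 0/1/2 co-occurrence tagging follows the description above; dimensions and the exact combination rule are illustrative assumptions):

    import numpy as np

    def cooccurrence_tags(tokens, other_tokens, pad_len):
        """0 = <PAD> position, 1 = word occurs in both sentences, 2 = word does not co-occur."""
        other = set(other_tokens)
        tags = [1 if t in other else 2 for t in tokens]
        return np.array(tags + [0] * (pad_len - len(tokens)), dtype=float)

    def augment(word_vecs, pos_vecs, pos_enc, co_tags, mode="concat"):
        if mode == "add":                                  # Fig. 3a: bitwise addition of
            return word_vecs + pos_vecs + pos_enc          # equal-dimension vectors (the 1-D
                                                           # co-occurrence tag handled separately)
        return np.concatenate(                             # Fig. 3b: concatenation into a longer vector
            [word_vecs, pos_vecs, pos_enc, co_tags[:, None]], axis=1)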
S130: and generating sentence vectors after the similarity of sentences in the sentence pairs is weighted by using the similarity matrix corresponding to each word vector matrix.
In the interaction layer of the model in Fig. 2, an attention mechanism is used to obtain the similarity matrix of each sentence pair; in this embodiment this matrix is obtained by matrix multiplication of the word-vector representation matrices of the two sentences. The representations of the hypothesis H and the inference P in the sentence pair are then regenerated according to the similarity matrix, which can also be understood as re-encoding the word-vector representations under the current context to obtain new sentence representations, as in the following formulas (3) and (4).
h~_i = Σ_{j=1..len(P)} [ exp(e_ij) / Σ_{k=1..len(P)} exp(e_ik) ] · p_j    formula (3)
p~_j = Σ_{i=1..len(H)} [ exp(e_ij) / Σ_{k=1..len(H)} exp(e_kj) ] · h_i    formula (4)
In formulas (3) and (4), len(H) and len(P) denote the lengths of the two sentences, h~_i and p~_j are the similarity-weighted sentence representations, h_i and p_j are the original sentence representations, and e is the weight, obtained from the corresponding value of the similarity matrix.
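The interaction step implied by the similarity matrix and formulas (3)-(4) can be sketched as follows (the softmax normalization over the opposite sentence is the standard soft-alignment form assumed from the symbols above):

    import numpy as np

    def soft_align(H, P):
        """H: [len(H), d] word vectors of the hypothesis; P: [len(P), d] word vectors of an inference sentence."""
        e = H @ P.T                                             # similarity matrix (matrix multiplication)
        a = np.exp(e) / np.exp(e).sum(axis=1, keepdims=True)    # weights over P for each word of H
        b = np.exp(e) / np.exp(e).sum(axis=0, keepdims=True)    # weights over H for each word of P
        H_weighted = a @ P                                      # formula (3): H re-expressed by P
        P_weighted = b.T @ H                                    # formula (4): P re-expressed by H
        return H_weighted, P_weighted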
It should be noted that various sentence interaction mechanisms can be used in this embodiment. This example uses a bidirectional LSTM (Long Short-Term Memory) structure, which can be expressed as follows:
y_t = g(V·A_t + V'·A'_t)    formula (5)
A_t = f(U·x_t + W·A_(t-1))    formula (6)
A'_t = f(U'·x_t + W'·A'_(t+1))    formula (7)
In formulas (5) to (7), V, V', U, U', W, W' are weight matrices, f and g are activation functions, x is the input, A and A' are the forward and backward hidden states, y is the output, and t is the time step.
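Formulas (5)-(7) describe the bidirectional recurrence in simplified form (without the LSTM gating actually used by the embodiment); a direct sketch of that simplified recurrence is:

    import numpy as np

    def bidirectional_rnn(X, U, W, Up, Wp, V, Vp, f=np.tanh, g=lambda z: z):
        """X: list of input vectors x_t; U/W forward weights, Up/Wp backward weights, V/Vp output weights."""
        hidden = W.shape[0]
        A_fwd, fwd = np.zeros(hidden), []
        A_bwd, bwd = np.zeros(hidden), [None] * len(X)
        for t in range(len(X)):                  # formula (6): forward hidden state
            A_fwd = f(U @ X[t] + W @ A_fwd)
            fwd.append(A_fwd)
        for t in reversed(range(len(X))):        # formula (7): backward hidden state
            A_bwd = f(Up @ X[t] + Wp @ A_bwd)
            bwd[t] = A_bwd
        return [g(V @ fwd[t] + Vp @ bwd[t]) for t in range(len(X))]   # formula (5): output y_t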
With the bidirectional LSTM structure, word alignment is first performed on the two sentences in each sentence pair to obtain their similarity matrix; local inference over the two sentences is then carried out, and the similarity-weighted sentence representations are generated by combining the two sentences using the obtained similarity matrix. If syntactic parsing of the sentences is available, a tree-LSTM may be used here instead of the bidirectional LSTM.
S140: and calculating the loss value of the preset loss function according to the matching degree value between the sentence vectors.
In the matching layer and output layer of the model in Fig. 2, the matching degree between the sentence vector of the hypothesis sentence H and the sentence vector of each inference sentence P obtained above is calculated, yielding N output values, such as Score1, Score2, ……, ScoreN in Fig. 2, where N is the number of all inference sentences, positive and negative examples included. A loss function can then be calculated from the ranked results of the N output values, the model parameters adjusted, and training continued; to reduce the amount of calculation, only whether the highest-scoring pair is the one formed by the hypothesis sentence and the positive inference sentence is checked.
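A small sketch of this matching step, scoring the hypothesis sentence vector against the N inference sentence vectors with cosine similarity:

    import numpy as np

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    def match_scores(r_h, inference_vectors):
        """r_h: hypothesis sentence vector; inference_vectors: [r_p+, r_p1-, r_p2-, ...] -> Score1..ScoreN."""
        return [cosine(r_h, r_p) for r_p in inference_vectors]

    # Training then only checks whether the highest score belongs to the positive inference sentence.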
In order to better evaluate the matching degree value, the embodiment fuses concepts of Pointwise and Listwise, and specifically, calculates a difference value between the matching degree value of each statement vector and a standard value by using a joint loss function composed of a Pointwise loss function and a Listwise loss function, and adjusts a parameter of a depth matching model according to the difference value. Wherein, the calculation formula of the Pointwise loss function is as follows:
L_p = max(0, m - s(r_h; r_p+) + s(r_h; r_p-))    formula (8)
In formula (8), s(r_h; r_p+) is the cosine similarity between the sentence vectors corresponding to the hypothesis sentence and the positive inference sentence, s(r_h; r_p-) is the cosine similarity between the sentence vectors corresponding to the hypothesis sentence and a negative inference sentence, and m is a preset threshold for distinguishing positive from negative inference sentences.
According to the formula, the Pointwise loss function has a large corresponding loss value when the matching degree of the hypothesis statement and the positive inference statement is low, and has a large corresponding loss value when the matching degree of the hypothesis statement and the negative inference statement is high. Therefore, the Pointwise loss function alone has a good ordering effect, but the similarity value is not accurate enough. For the above reasons, the embodiment further combines a Listwise loss function, and the calculation formula is as follows:
(Formula (9), the Listwise loss L_l, is given only as an image in the original publication and is not reproduced here; in it, n is the number of samples consisting of the positive inference sentence and the negative inference sentences.)
To prevent the model from overfitting, this embodiment adds an L2 regularization term (L2 Regularization) to the loss function, giving the joint loss function loss as follows:
loss = L_p + L_l + L2Regularization    formula (10)
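A PyTorch sketch of the joint loss of formula (10): the Pointwise term follows formula (8) (summed over the negatives as a simple extension), the Listwise term is an assumed softmax form since formula (9) appears only as an image in the original, and the L2 coefficient is an assumed value:

    import torch
    import torch.nn.functional as F

    def joint_loss(scores, model, m=0.3, lambda_l2=1e-4):
        """scores: tensor [n] of cosine similarities, index 0 being the positive inference sentence."""
        s_pos, s_negs = scores[0], scores[1:]
        l_pointwise = torch.clamp(m - s_pos + s_negs, min=0).sum()        # formula (8), over all negatives
        l_listwise = F.cross_entropy(scores.unsqueeze(0),                  # assumed form of formula (9)
                                     torch.zeros(1, dtype=torch.long))
        l2 = sum((p ** 2).sum() for p in model.parameters())               # L2 regularization term
        return l_pointwise + l_listwise + lambda_l2 * l2                   # formula (10)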
S150: and adjusting parameters of the depth matching model according to the loss value.
Specifically, in the training process, the model is continuously trained with the goal of minimizing the loss value, so as to obtain the final depth matching model.
S160: and performing text matching on the input sentence by using the finally obtained depth matching model through parameter adjustment.
For example, the sentences in a test set can be input into the deep matching model obtained through continuous parameter adjustment for text matching, and the matching accuracy can then be calculated.
In the deep text matching method based on ranking learning provided by this embodiment, the model is adjusted so that, when the deep matching model is trained, the input sentence pair comprises not only the hypothesis sentence and the positive inference sentence but also several negative inference sentences that are semantically unrelated to them. In this way, the input is expanded from a two-sentence pair into a sentence-pair sequence containing both positive-example and negative-example data; the expanded number and types of inputs speed up the fitting of the model and enhance its generalization ability. In addition, this embodiment integrates the idea of ranking into the model: when the loss function is used to adjust the parameters of the deep matching model, the goal is that the pair output by the model with the highest matching probability is the hypothesis sentence and the positive inference sentence, so the text matching accuracy of the model after parameter adjustment is higher. Finally, this embodiment also incorporates an attention mechanism to generate similarity-weighted sentence vectors for the two sentences in each sentence pair; because the words of the two sentences are thereby correlated with each other, the performance of the model is improved.
As shown in fig. 2, the depth matching model provided in this embodiment includes a representation layer, a feature selection layer, and a coding layer in addition to data processing of an input layer, an interaction layer, a matching layer, and an output layer, and correspondingly, the depth text matching method includes the following steps in addition to the above steps:
first, after the two sentences in each sentence pair are represented by word vectors in step S120, the method further includes a step of feature extraction, that is, in the representation layer, each word vector is encoded again according to the context in the sentence in which the word vector is located, so as to obtain a new word vector representation of the sentence in the sentence pair.
Specifically, this step may use a variety of feature-extraction structures, such as a convolutional neural network (CNN), a recurrent neural network (RNN), or an attention mechanism. This embodiment still adopts a bidirectional LSTM structure. Fig. 4 illustrates the difference between sharing and not sharing weights when extracting features with a bidirectional LSTM: as shown in Fig. 4, the hypothesis and inference sentences may or may not share weights during feature extraction, and in practice the choice can be made according to the training-speed requirement and the amount of training data.
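The weight-sharing choice of Fig. 4 can be sketched in PyTorch as follows (layer sizes are illustrative assumptions):

    import torch.nn as nn

    class RepresentationLayer(nn.Module):
        def __init__(self, dim=300, hidden=128, share_weights=True):
            super().__init__()
            self.enc_h = nn.LSTM(dim, hidden, bidirectional=True, batch_first=True)
            # shared weights: the same encoder object processes both sentences
            self.enc_p = self.enc_h if share_weights else nn.LSTM(
                dim, hidden, bidirectional=True, batch_first=True)

        def forward(self, hypothesis, inference):
            h_enc, _ = self.enc_h(hypothesis)   # [batch, len(H), 2*hidden]
            p_enc, _ = self.enc_p(inference)    # [batch, len(P), 2*hidden]
            return h_enc, p_enc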
Further, because the sentences in each sentence pair are represented by word vectors in step S120, the hypothesis sentence obtains N representations, one for each of the N inference sentences. To facilitate subsequent operations, in this embodiment the feature selection layer therefore also normalizes these N representations of the hypothesis sentence.
As shown in fig. 2, the model adopts the most basic averaging method:
r_h = (1/N) · Σ_{i=1..N} r_h^(i)    formula (11)
In formula (11), N is the number of representations obtained for the hypothesis sentence (one per inference sentence), r_h^(i) is the i-th word-vector representation of the hypothesis sentence, and r_h is the averaged hypothesis representation output by this layer.
Of course, in a specific implementation, besides the above manner, a weighted sum with learnable weights may be used, or feature extraction may be performed with a convolutional neural network, a recurrent neural network, and so on. Fig. 5 is a schematic diagram of feature selection using a convolutional neural network: as shown in Fig. 5, the multiple word-vector representations are spliced laterally, encoded by convolution with the convolutional neural network, and then output through pooling.
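The two feature-selection options can be sketched as follows: the averaging of formula (11) and a convolution-plus-pooling variant in the spirit of Fig. 5 (filter size and dimensions are assumptions):

    import torch
    import torch.nn as nn

    def average_hypothesis(h_stack):
        """h_stack: [N, len(H), d] - the N representations of the hypothesis, one per inference sentence."""
        return h_stack.mean(dim=0)                                           # formula (11)

    class ConvSelect(nn.Module):
        """Fig. 5 alternative: splice the N representations laterally, convolve, then max-pool."""
        def __init__(self, d=256, out_channels=256, kernel=3):
            super().__init__()
            self.conv = nn.Conv1d(d, out_channels, kernel, padding=1)        # d must match the input dimension

        def forward(self, h_stack):
            x = torch.cat(list(h_stack), dim=0).transpose(0, 1).unsqueeze(0)  # [1, d, N*len(H)]
            return torch.max(self.conv(x), dim=2).values                      # max pooling over positions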
Further, after generating sentence vectors with weighted similarity between sentences in the sentence pairs by using the similarity matrix corresponding to each word vector matrix, the following processing is also performed:
and recoding the word vectors according to the context of the sentence in which the word vectors are positioned to obtain a new word vector representation of the sentence in the sentence pair.
Specifically, this embodiment again uses a bidirectional LSTM structure for feature extraction and encoding. Fig. 6 shows the different output modes when extracting features with the bidirectional LSTM: as shown in Fig. 6, the hidden-state output of the LSTM structure may be used as the new word vector representation, or the outputs of the bidirectional LSTM at all time steps may be reduced by taking the bitwise maximum and the bitwise mean, which are then concatenated as the new word vector representation.
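The two output modes of Fig. 6 can be sketched as follows (sizes are illustrative assumptions):

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=256, hidden_size=128, bidirectional=True, batch_first=True)
    x = torch.randn(1, 20, 256)                              # one sentence, 20 time steps
    outputs, (h_n, _) = lstm(x)                              # outputs: [1, 20, 256]

    final_state = torch.cat([h_n[0], h_n[1]], dim=-1)        # option 1: the hidden-state output
    pooled = torch.cat([outputs.max(dim=1).values,           # option 2: bitwise max and mean over
                        outputs.mean(dim=1)], dim=-1)        # all time steps, concatenated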
The deep matching model trained with this method reaches 94% accuracy on an existing financial-corpus test set, whereas the traditional model reaches only 88% accuracy with the same training and test sets. The method provided by this embodiment thus makes a series of improvements to the model training process, and the experimental results show that the model trained with this method outperforms the one trained in the conventional way.
Based on the above method, this embodiment also provides a deep text matching apparatus based on ranking learning. Fig. 7 is a schematic diagram of the basic structure of such an apparatus according to an embodiment of the present application; as shown in Fig. 7, the apparatus includes:
sentence pair acquisition module 710: the method is used for acquiring sentence pairs consisting of hypothesis sentences and inference sentences, wherein the inference sentences comprise positive inference sentences and a plurality of negative inference sentences, and the hypothesis sentences are related to semantics of the positive inference sentences and unrelated to semantics of the negative inference sentences.
Word vector representation module 720: and the word vector matrix is used for representing the sentences in the sentence pairs by word vectors respectively to obtain the word vector matrix of the sentences in the sentence pairs.
The similarity weighting module 730: and the sentence vector is used for generating the sentence vectors after weighting the similarity of the sentences in the sentence pairs by utilizing the similarity matrix corresponding to each word vector matrix.
Loss value calculation module 740: and the loss value of the preset loss function is calculated according to the matching degree between the sentence vectors.
Model parameter adjustment module 750: and the parameter of the depth matching model is adjusted according to the loss value.
Text matching module 760: and the method is used for performing text matching on the input sentence by utilizing the finally obtained depth matching model through parameter adjustment.
Further, the loss value calculation module 740 further includes:
a similarity calculation unit: used for respectively calculating the sentence-vector matching degree values between the hypothesis sentence and the positive inference sentence and between the hypothesis sentence and each negative inference sentence;
a loss calculation unit: and the loss value between each statement vector matching degree value and a standard value is calculated by using a joint loss function consisting of a Pointwise loss function and a Listwise loss function.
The embodiments in this specification are described in a progressive manner; for identical or similar parts the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus and system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant parts, reference may be made to the description of the method embodiments. The apparatus and system embodiments described above are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units, i.e., they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without inventive effort.
The foregoing is merely a detailed description of the invention, and it should be noted that modifications and adaptations by those skilled in the art may be made without departing from the principles of the invention, and should be considered as within the scope of the invention.

Claims (7)

1. A deep text matching method based on ranking learning is applied to a deep matching model and is characterized by comprising the following steps:
obtaining a sentence pair consisting of a hypothesis sentence and an inference sentence, wherein the inference sentence comprises a positive inference sentence and a plurality of negative inference sentences, and the hypothesis sentence is related to the semantics of the positive inference sentence and is not related to the semantics of the negative inference sentence;
respectively representing the sentences in the sentence pairs by word vectors to obtain a word vector matrix of each sentence in the sentence pairs;
obtaining a similarity matrix of the sentence pairs by using the word vector matrixes, and generating sentence vectors with weighted similarity by combining sentences in the sentence pairs;
respectively calculating matching degree values between the sentence vectors corresponding to the hypothesis sentence and the positive inference sentence and between those corresponding to the hypothesis sentence and each negative inference sentence;
calculating a loss value between each sentence-vector matching degree value and a standard value by using a joint loss function consisting of a Pointwise loss function and a Listwise loss function; wherein the joint loss function loss is calculated as: loss = L_p + L_l + L2Regularization, where L_p is the Pointwise loss function, L_p = max(0, m - s(r_h; r_p+) + s(r_h; r_p-)); L_l is the Listwise loss function (its formula is given as an image in the original publication and is not reproduced here); L2Regularization is the L2 regularization function; r_h is the sentence vector representation of the hypothesis sentence, r_p+ and r_p- are the sentence vector representations of the positive and negative inference sentences respectively, s(r_h; r_p+) is the cosine similarity between the sentence vectors corresponding to the hypothesis sentence and the positive inference sentence, s(r_h; r_p-) is the cosine similarity between the sentence vectors corresponding to the hypothesis sentence and a negative inference sentence, m is a preset threshold for distinguishing positive from negative inference sentences, and n is the number of samples consisting of the positive inference sentence and the negative inference sentences;
adjusting parameters of the depth matching model according to the loss value;
and performing text matching on the input sentence by using the depth matching model obtained by adjusting the parameters.
2. The method of claim 1, wherein obtaining sentence pairs consisting of hypothesis sentences and inference sentences comprises:
selecting two semantically related positive example sentences, which serve as the hypothesis sentence and the positive inference sentence;
selecting a plurality of negative example sentences which are used as negative reasoning sentences and are irrelevant to the semantics of the positive example sentences;
and forming a sentence pair by the two positive example sentences and the negative example sentences.
3. The method of claim 1, wherein representing the sentences in the sentence pairs with word vectors respectively to obtain a word vector matrix for each sentence in the sentence pairs comprises:
respectively segmenting words of sentences in the sentence pairs and expressing the words by word vectors to obtain an initial word vector matrix;
and adding the part of speech, the co-occurrence information and the position coding vector to the initial word vector matrix to obtain a word vector matrix of each sentence in the sentence pair.
4. The method of claim 1, wherein after generating a mutually similarity weighted sentence vector in conjunction with the sentences in the sentence pair, the method further comprises:
and normalizing the statement vectors obtained by weighting the similarity of the assumed statement and each positive inference statement and each negative inference statement respectively.
5. The method of claim 1, wherein obtaining a similarity matrix of the sentence pairs by using each word vector matrix, and generating, in combination with sentences in the sentence pairs, sentence vectors with weighted similarity to each other, comprises:
generating initial sentence vectors after similarity weighting of sentences in the sentence pairs by using the similarity matrixes corresponding to the word vector matrixes;
and re-encoding the initial statement vectors according to the contexts of the statements corresponding to the initial statement vectors to obtain the statement vectors of the statements in the sentence pair.
6. The method of claim 1, wherein adjusting parameters of the depth matching model based on the penalty value comprises:
and adjusting parameters of the depth matching model with the aim of minimizing the loss value.
7. A deep text matching device based on rank learning is applied to a deep matching model, and is characterized in that the device comprises:
sentence pair obtaining module: the method comprises the steps of obtaining a sentence pair consisting of a hypothesis sentence and an inference sentence, wherein the inference sentence comprises a positive inference sentence and a plurality of negative inference sentences, and the hypothesis sentence is related to the semantics of the positive inference sentence and is not related to the semantics of the negative inference sentence;
a word vector representation module: the word vector matrix is used for representing the sentences in the sentence pairs by word vectors respectively to obtain word vector matrixes of the sentences in the sentence pairs;
a similarity weighting module: the similarity matrix of the sentence pairs is obtained by utilizing the word vector matrixes, and sentence vectors with weighted similarity are generated by combining sentences in the sentence pairs;
a loss value calculation module: used for respectively calculating matching degree values between the sentence vectors corresponding to the hypothesis sentence and the positive inference sentence and between those corresponding to the hypothesis sentence and each negative inference sentence, and for calculating the loss value between each sentence-vector matching degree value and a standard value by using a joint loss function consisting of a Pointwise loss function and a Listwise loss function; wherein the joint loss function loss is calculated as: loss = L_p + L_l + L2Regularization, where L_p is the Pointwise loss function, L_p = max(0, m - s(r_h; r_p+) + s(r_h; r_p-)); L_l is the Listwise loss function (its formula is given as an image in the original publication and is not reproduced here); L2Regularization is the L2 regularization function; r_h is the sentence vector representation of the hypothesis sentence, r_p+ and r_p- are the sentence vector representations of the positive and negative inference sentences respectively, s(r_h; r_p+) is the cosine similarity between the sentence vectors corresponding to the hypothesis sentence and the positive inference sentence, s(r_h; r_p-) is the cosine similarity between the sentence vectors corresponding to the hypothesis sentence and a negative inference sentence, m is a preset threshold for distinguishing positive from negative inference sentences, and n is the number of samples consisting of the positive inference sentence and the negative inference sentences;
a model parameter adjustment module: the parameter of the depth matching model is adjusted according to the loss value;
a text matching module: and the depth matching model is used for performing text matching on the input sentence by using the depth matching model obtained by adjusting the parameters.
CN201910285853.7A 2019-04-10 2019-04-10 Deep text matching method and device based on sequencing learning Active CN110019685B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910285853.7A CN110019685B (en) 2019-04-10 2019-04-10 Deep text matching method and device based on sequencing learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910285853.7A CN110019685B (en) 2019-04-10 2019-04-10 Deep text matching method and device based on sequencing learning

Publications (2)

Publication Number Publication Date
CN110019685A CN110019685A (en) 2019-07-16
CN110019685B true CN110019685B (en) 2021-08-20

Family

ID=67190939

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910285853.7A Active CN110019685B (en) 2019-04-10 2019-04-10 Deep text matching method and device based on sequencing learning

Country Status (1)

Country Link
CN (1) CN110019685B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110457444A (en) * 2019-08-14 2019-11-15 山东浪潮人工智能研究院有限公司 A kind of sentence of same meaning conversion method based on depth text matches
CN110705283A (en) * 2019-09-06 2020-01-17 上海交通大学 Deep learning method and system based on matching of text laws and regulations and judicial interpretations
CN110795934B (en) * 2019-10-31 2023-09-19 北京金山数字娱乐科技有限公司 Sentence analysis model training method and device and sentence analysis method and device
CN111027320A (en) * 2019-11-15 2020-04-17 北京三快在线科技有限公司 Text similarity calculation method and device, electronic equipment and readable storage medium
CN110969006B (en) * 2019-12-02 2023-03-21 支付宝(杭州)信息技术有限公司 Training method and system of text sequencing model
CN111368903B (en) * 2020-02-28 2021-08-27 深圳前海微众银行股份有限公司 Model performance optimization method, device, equipment and storage medium
CN111898362A (en) * 2020-05-15 2020-11-06 联想(北京)有限公司 Data processing method and device
CN112560427B (en) * 2020-12-16 2023-09-22 平安银行股份有限公司 Problem expansion method, device, electronic equipment and medium
CN113935329B (en) * 2021-10-13 2022-12-13 昆明理工大学 Asymmetric text matching method based on adaptive feature recognition and denoising
CN114065729A (en) * 2021-11-16 2022-02-18 神思电子技术股份有限公司 Text sorting method based on deep text matching model

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509463A (en) * 2017-02-28 2018-09-07 华为技术有限公司 A kind of answer method and device of problem
CN109086423A (en) * 2018-08-08 2018-12-25 北京神州泰岳软件股份有限公司 A kind of text matching technique and device
CN109101537A (en) * 2018-06-27 2018-12-28 北京慧闻科技发展有限公司 More wheel dialogue data classification methods, device and electronic equipment based on deep learning
CN109145292A (en) * 2018-07-26 2019-01-04 黑龙江工程学院 Paraphrasing text depth Matching Model construction method and paraphrasing text Matching Method of Depth
CN109344404A (en) * 2018-09-21 2019-02-15 中国科学技术大学 The dual attention natural language inference method of context aware
CN109471945A (en) * 2018-11-12 2019-03-15 中山大学 Medical file classification method, device and storage medium based on deep learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10268646B2 (en) * 2017-06-06 2019-04-23 Facebook, Inc. Tensor-based deep relevance model for search on online social networks

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509463A (en) * 2017-02-28 2018-09-07 华为技术有限公司 A kind of answer method and device of problem
CN109101537A (en) * 2018-06-27 2018-12-28 北京慧闻科技发展有限公司 More wheel dialogue data classification methods, device and electronic equipment based on deep learning
CN109145292A (en) * 2018-07-26 2019-01-04 黑龙江工程学院 Paraphrasing text depth Matching Model construction method and paraphrasing text Matching Method of Depth
CN109086423A (en) * 2018-08-08 2018-12-25 北京神州泰岳软件股份有限公司 A kind of text matching technique and device
CN109344404A (en) * 2018-09-21 2019-02-15 中国科学技术大学 The dual attention natural language inference method of context aware
CN109471945A (en) * 2018-11-12 2019-03-15 中山大学 Medical file classification method, device and storage medium based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"面向问答领域的语义相关性计算的研究";周伟杰;《中国优秀硕士学位论文全文数据库 信息科技辑》;20180715(第7期);第I138-1877页 *

Also Published As

Publication number Publication date
CN110019685A (en) 2019-07-16

Similar Documents

Publication Publication Date Title
CN109992648B (en) Deep text matching method and device based on word migration learning
CN110019685B (en) Deep text matching method and device based on sequencing learning
CN107798140B (en) Dialog system construction method, semantic controlled response method and device
CN111427995B (en) Semantic matching method, device and storage medium based on internal countermeasure mechanism
CN108549658B (en) Deep learning video question-answering method and system based on attention mechanism on syntax analysis tree
CN106569998A (en) Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN109992788B (en) Deep text matching method and device based on unregistered word processing
CN111737426B (en) Method for training question-answering model, computer equipment and readable storage medium
CN110232113B (en) Method and system for improving question and answer accuracy of knowledge base
CN113282711B (en) Internet of vehicles text matching method and device, electronic equipment and storage medium
CN111814453B (en) Fine granularity emotion analysis method based on BiLSTM-textCNN
CN113569001A (en) Text processing method and device, computer equipment and computer readable storage medium
CN114428850B (en) Text retrieval matching method and system
CN112232053A (en) Text similarity calculation system, method and storage medium based on multi-keyword pair matching
CN114896377A (en) Knowledge graph-based answer acquisition method
CN111552773A (en) Method and system for searching key sentence of question or not in reading and understanding task
CN110597968A (en) Reply selection method and device
CN113011172A (en) Text processing method and device, computer equipment and storage medium
CN114298055B (en) Retrieval method and device based on multilevel semantic matching, computer equipment and storage medium
CN115964459B (en) Multi-hop reasoning question-answering method and system based on food safety cognition spectrum
CN111581364A (en) Chinese intelligent question-answer short text similarity calculation method oriented to medical field
CN113326374B (en) Short text emotion classification method and system based on feature enhancement
CN113934835A (en) Retrieval type reply dialogue method and system combining keywords and semantic understanding representation
CN113343118A (en) Hot event discovery method under mixed new media
Ye et al. A sentiment based non-factoid question-answering framework

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20190904

Address after: Room 630, 6th floor, Block A, Wanliu Xingui Building, 28 Wanquanzhuang Road, Haidian District, Beijing

Applicant after: China Science and Technology (Beijing) Co., Ltd.

Address before: Room 601, Block A, Wanliu Xingui Building, 28 Wanquanzhuang Road, Haidian District, Beijing

Applicant before: Beijing Shenzhou Taiyue Software Co., Ltd.

TA01 Transfer of patent application right
CB02 Change of applicant information

Address after: 230000 zone B, 19th floor, building A1, 3333 Xiyou Road, hi tech Zone, Hefei City, Anhui Province

Applicant after: Dingfu Intelligent Technology Co., Ltd

Address before: Room 630, 6th floor, Block A, Wanliu Xingui Building, 28 Wanquanzhuang Road, Haidian District, Beijing

Applicant before: DINFO (BEIJING) SCIENCE DEVELOPMENT Co.,Ltd.

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant