CN112307179A

CN112307179A - Text matching method, device, equipment and storage medium

Info

Publication number: CN112307179A
Application number: CN202011131760.8A
Authority: CN
Inventors: 傅向华; 韦炎志
Original assignee: Shenzhen Technology University
Current assignee: Shenzhen Technology University
Priority date: 2020-10-21
Filing date: 2020-10-21
Publication date: 2021-02-02

Abstract

The invention is suitable for the technical field of computers, and provides a text matching method, a device, equipment and a storage medium, wherein the method comprises the following steps: acquiring a first text and a second text; processing the first text and the second text to obtain a first text vector and a second text vector, wherein the first text vector comprises a plurality of first word embeddings corresponding to the first text, and the second text vector comprises a plurality of second word embeddings corresponding to the second text; in a pre-trained text matching model, semantic feature extraction, emotion feature extraction and context feature extraction are respectively carried out on a first text vector and a second text vector to obtain a first feature vector of a first text and a second feature vector of a second text; in the text matching model, the matching relation between the first text and the second text is determined according to the first feature vector and the second feature vector, so that the semantic features, the emotion features and the context information of the text are integrated to perform text matching, and the text matching effect is improved.

Description

Text matching method, device, equipment and storage medium

Technical Field

The invention belongs to the technical field of natural language processing, and particularly relates to a text matching method, a text matching device, text matching equipment and a storage medium.

Background

As a core problem of natural language understanding, text matching applies to a large number of natural language processing tasks. Information retrieval, automatic question answering, machine translation, dialogue systems and question paraphrasing can all be abstracted, to some extent, as text matching problems: information retrieval reduces to matching query terms against documents, automatic question answering to matching questions against candidate answers, machine translation to matching texts in two languages, dialogue systems to matching the preceding utterance against a reply, and question paraphrasing to matching two synonymous texts. These natural language processing tasks therefore all depend on a text matching model, and the performance of that model strongly influences their final performance.

With the development of deep learning technology, a text sentence matching model based on a neural network becomes a mainstream choice of the text matching model. According to different matching ideas, text sentence matching models based on a neural network can be divided into two categories, namely a text sentence matching model based on word interaction and a text sentence matching model based on sentence representation. The basic idea of the text sentence matching model based on word interaction is to generate interaction signals through direct interaction among words, and finally realize modeling and matching of semantic relations by using the interaction signals among the words as matching features.

The text matching model based on word interaction has the advantages that fine-grained interaction features at the word level can be obtained, and local semantics can be better modeled through an attention mechanism. However, when the text is matched, the text matching model based on word interaction cannot effectively capture the emotion information of the text, which results in poor text matching effect.

Disclosure of Invention

The invention aims to provide a text matching method, a text matching device, text matching equipment and a storage medium, and aims to solve the problem that the text matching effect is poor due to the fact that an effective text matching method cannot be provided in the prior art.

In one aspect, the present invention provides a text matching method, including the steps of:

acquiring a first text and a second text;

processing the first text and the second text to obtain a first text vector and a second text vector, wherein the first text vector comprises a plurality of first word embeddings corresponding to the first text, and the second text vector comprises a plurality of second word embeddings corresponding to the second text;

in a pre-trained text matching model, performing semantic feature extraction, emotion feature extraction and context feature extraction on the first text vector and the second text vector respectively to obtain a first feature vector of the first text and a second feature vector of the second text;

and in the text matching model, determining the matching relation between the first text and the second text according to the first feature vector and the second feature vector.

In another aspect, the present invention provides a text matching apparatus, including:

the text acquisition module is used for acquiring a first text and a second text;

a text processing module, configured to process the first text and the second text to obtain a first text vector and a second text vector, where the first text vector includes a plurality of first word embeddings corresponding to the first text, and the second text vector includes a plurality of second word embeddings corresponding to the second text;

the feature extraction module is used for performing semantic feature extraction, emotional feature extraction and context feature extraction on the first text vector and the second text vector respectively in a pre-trained text matching model to obtain a first feature vector of the first text and a second feature vector of the second text; and

a matching relation determining module, configured to determine, in the text matching model, the matching relation between the first text and the second text according to the first feature vector and the second feature vector.

In another aspect, the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the text matching method when executing the computer program.

In another aspect, the present invention further provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the steps of the text matching method.

When the text matching is carried out, the acquired first text and the acquired second text are processed to obtain a first text vector and a second text vector, semantic feature extraction, emotion feature extraction and context feature extraction are carried out on the first text vector and the second text vector in a text matching model to obtain a first feature vector of the first text and a second feature vector of the second text, and the matching relation between the first text and the second text is determined according to the first feature vector and the second feature vector, so that the emotion information of the text is fully considered when the text is matched, and the text matching effect is improved.

Drawings

Fig. 1 is an exemplary diagram of an application scenario of a text matching method according to an embodiment of the present invention;

Fig. 2 is a flowchart illustrating an implementation of a text matching method according to a first embodiment of the present invention;

Fig. 3 is a flowchart of an implementation of a text matching method according to a second embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a text matching apparatus according to a third embodiment of the present invention;

Fig. 5 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Text matching is a core problem in natural language understanding, so research on text matching applies to a large number of natural language processing tasks. Information retrieval, automatic question answering, machine translation, dialogue systems and question paraphrasing can all be abstracted, to a certain extent, as text matching problems: information retrieval reduces to matching query terms against documents, automatic question answering to matching questions against candidate answers, machine translation to matching texts in two languages, dialogue systems to matching the preceding utterance against a reply, and question paraphrasing to matching two synonymous texts. These natural language processing tasks therefore all depend on a text matching model, and the performance of that model strongly influences their final performance.

The traditional method mainly depends on feature engineering, linguistic tools (such as syntactic parsers, grammar parsers, dictionaries and the like) and external knowledge bases; it consumes a great deal of time and effort and also suffers from sparse training data. With the development of deep learning technology, text matching models that effectively improve the efficiency and effect of text matching have become the mainstream choice. Among text matching models based on deep learning, researchers have proposed text matching models based on word interaction. The basic idea of a text matching model based on word interaction is to generate interaction signals through the interaction between words, and to model and match semantic relations by using these interaction signals as matching features.

The text matching model based on word interaction has the advantages that fine-grained interaction features at the word level can be obtained, and local semantics can be modeled by using an attention mechanism. The inventors have found that such text matching models also have some drawbacks and deficiencies. The main defect is that when the existing matching model based on word interaction is used for word interaction through a traditional attention mechanism, subjective emotional difference information among words cannot be effectively captured, and emotional factors contained in the words are not considered. Emotional factors contained in the words play an important role in semantic understanding of the text, and therefore the text matching effect is still to be improved.

In order to solve the technical problem, according to the text matching method provided by the invention, in the process of text matching of a first text and a second text, the first text and the second text are processed to obtain a first text vector and a second text vector, in a text matching model, semantic feature extraction, emotion feature extraction and context feature extraction are respectively carried out on the first text vector and the second text vector to obtain a first feature vector and a second feature vector, and the matching relationship between the first text and the second text is determined according to the first feature vector and the second feature vector, so that the semantic features, the emotion features and the context features of the first text and the second text are comprehensively considered in the matching process of the first text and the second text, and the matching effect of the first text and the second text is effectively improved based on the feature information.

Fig. 1 is an exemplary diagram of an application scenario to which the present invention is applicable, and as shown in fig. 1, the application scenario to which the present invention is applicable includes a terminal device 101 and a server 102, a text matching model may be trained in advance on the terminal device 101 or the server 102, and the trained text matching model is stored on the terminal device 101 or the server 102. When the texts are matched, the user can input the first text and the second text to be matched on the terminal device 101, the terminal device 101 performs text matching of the first text and the second text according to the text matching model, or the terminal device 101 sends the first text and the second text to the server 102, and the server 102 performs text matching of the first text and the second text.

The terminal device 101 is, for example, a computer, a tablet computer, a mobile phone, and a wearable smart device (e.g., a smart band and a smart watch), the server 102 may be a single server or a server group, and the server group may be a distributed server or a centralized server. The terminal device 101 and the server 102 communicate, for example, via a wireless network.

The following detailed description of specific implementations of the present invention is provided in conjunction with specific embodiments:

Embodiment one:

Fig. 2 shows an implementation flow of a text matching method provided in the first embodiment of the present invention. For convenience of description, only the parts related to the first embodiment of the present invention are shown, which are detailed as follows:

In step S201, a first text and a second text are acquired.

The embodiment is applicable to electronic devices, such as the terminal device or the server shown in fig. 1.

In this embodiment, two texts to be matched with each other are acquired, and for convenience of distinction, the two texts are respectively referred to as a first text and a second text. The text may be a sentence or a plurality of sentences, for example, a segment of speech including a plurality of sentences. The language of the first text and the second text is the same language, such as Chinese, English, Japanese, etc.

In this embodiment, one or more text pairs may be obtained from a text library to perform text matching on a first text and a second text in the one or more text pairs, where the text library stores a plurality of pre-collected text pairs, and each text pair includes two texts. Alternatively, the first text and the second text input by the user on the current device may be acquired, or the first text and the second text sent to the current device by the third-party device may be acquired, for example, the current device is a server, the third-party device is a mobile phone, and the user sends the first text and the second text to the server through the mobile phone.

In step S202, the first text and the second text are processed to obtain a first text vector and a second text vector, where the first text vector includes a plurality of first word embeddings corresponding to the first text, and the second text vector includes a plurality of second word embeddings corresponding to the second text.

In this embodiment, the first text and the second text may be vectorized through a text encoding model to obtain a first text vector corresponding to the first text and a second text vector corresponding to the second text. The first text and the second text each include a plurality of words; the first text vector includes the word embeddings of the words in the first text, and the second text vector includes the word embeddings of the words in the second text. For convenience of distinction, the word embeddings in the first text vector are referred to as first word embeddings, and the word embeddings in the second text vector are referred to as second word embeddings. The text encoding model is, for example, a word2vec model or a Global Vectors for Word Representation (GloVe) model. Word embedding refers to converting words into vector representations; thus, a first word embedding may also be referred to as a first word vector, and a second word embedding as a second word vector. The vectorization process of the text is not described in detail here.
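As a minimal sketch of this vectorization step, the following Python snippet maps each word of a text to a word embedding through a lookup table. The table `TOY_EMBEDDINGS`, the zero-vector fallback `UNKNOWN` and the whitespace tokenization are illustrative assumptions; a real system would load word2vec or GloVe vectors and use a tokenizer suited to the language.

```python
from typing import Dict, List

# Hypothetical toy embedding table; stands in for pre-trained word2vec/GloVe vectors.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "li": [0.1, 0.3, -0.2],
    "likes": [0.7, 0.1, 0.4],
    "movies": [-0.3, 0.5, 0.2],
}
# Assumed fallback for words missing from the table.
UNKNOWN: List[float] = [0.0, 0.0, 0.0]


def text_to_vector(text: str, table: Dict[str, List[float]]) -> List[List[float]]:
    """Return one word embedding per whitespace-separated, lower-cased token."""
    return [list(table.get(token, UNKNOWN)) for token in text.lower().split()]


# The "first text vector" is the list of first word embeddings, in word order.
first_text_vector = text_to_vector("Li likes movies", TOY_EMBEDDINGS)
```

Each inner list plays the role of one first word embedding; the second text would be processed in the same way.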

In step S203, in the pre-trained text matching model, semantic feature extraction, emotion feature extraction, and context feature extraction are performed on the first text vector and the second text vector, respectively, to obtain a first feature vector of the first text and a second feature vector of the second text.

In this embodiment, a pre-trained text matching model is obtained. In the pre-trained text matching model, semantic feature extraction, emotion feature extraction and context feature extraction can be performed in sequence on each first word embedding in the first text vector and on each second word embedding in the second text vector. The semantic features, emotion features and context features of each first word embedding are combined to obtain the feature vector of that first word embedding, and the feature vector of the first text is then obtained from the feature vectors of the first word embeddings. Likewise, the semantic features, emotion features and context features of each second word embedding are combined to obtain the feature vector of that second word embedding, and the feature vector of the second text is then obtained from the feature vectors of the second word embeddings. For convenience of distinction, the feature vector of the first text is referred to as the first feature vector, and the feature vector of the second text is referred to as the second feature vector. The first feature vector comprises the feature vectors of the first word embeddings, and the second feature vector comprises the feature vectors of the second word embeddings.

In this embodiment, by extracting semantic features, emotion features and context features from the first text vector and the second text vector, not only the semantic and context information of the words in the first text and the second text is considered, but also their emotion information. This matches the way people ordinarily understand a sentence by combining multiple features; in particular, the emotion information in a sentence has a non-negligible influence on how people understand it. For example, the words in a sentence can express positive emotions or negative emotions. Taking the two sentences "Xiao Li likes watching movies" and "Xiao Li is not interested in TV dramas" as an example, the word "likes" expresses a positive emotion and the phrase "not interested" expresses a negative emotion. If the emotions expressed by "likes" and "not interested" are ignored, the text matching model's understanding of the two sentences can deviate considerably, which in turn degrades the text matching effect.

Therefore, the semantic features, emotion features and context features of the first word embeddings are extracted so that the first feature vector of the first text carries the semantics, emotion information and context information of the words in the first text, and the semantic features, emotion features and context features of the second word embeddings are extracted so that the second feature vector of the second text carries the semantics, emotion information and context information of the words in the second text. Performing text matching based on the first feature vector and the second feature vector can therefore effectively improve the text matching effect.
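The combination of the three kinds of features can be sketched as follows. The text only states that the features are "combined"; concatenation is used here as one simple, assumed way of combining them, and the function names `combine_features` and `text_feature_vector` are hypothetical.

```python
from typing import List, Sequence


def combine_features(semantic: Sequence[float],
                     emotion: Sequence[float],
                     context: Sequence[float]) -> List[float]:
    """Combine the three feature groups of one word embedding.

    Concatenation is an assumption; the source does not fix the combination.
    """
    return list(semantic) + list(emotion) + list(context)


def text_feature_vector(semantic_feats: Sequence[Sequence[float]],
                        emotion_feats: Sequence[Sequence[float]],
                        context_feats: Sequence[Sequence[float]]) -> List[List[float]]:
    """Build the per-text feature vector as a list of per-word feature vectors."""
    if not (len(semantic_feats) == len(emotion_feats) == len(context_feats)):
        raise ValueError("each word needs semantic, emotion and context features")
    return [combine_features(s, e, c)
            for s, e, c in zip(semantic_feats, emotion_feats, context_feats)]


# Two words, each with 2 semantic values, 1 emotion value and 1 context value.
first_feature_vector = text_feature_vector(
    [[0.1, 0.2], [0.3, 0.4]],
    [[1.0], [0.0]],
    [[0.5], [0.6]],
)
```

The resulting list holds one combined feature vector per word, mirroring the statement that the first feature vector comprises the feature vectors of the first word embeddings.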

In this embodiment, in the process of extracting the emotion feature of a first word embedding, the word corresponding to the first word embedding may be looked up in a pre-established emotion dictionary, and the emotion feature corresponding to that word is taken as the emotion feature of the first word embedding. The emotion dictionary comprises a plurality of words and the emotion feature corresponding to each of them, and the emotion features comprise positive emotion identifications and negative emotion identifications. If the emotion expressed by a word is positive, the emotion feature corresponding to the word in the emotion dictionary is a positive emotion identification (for example, the number 1); if the emotion expressed by the word is negative, the emotion feature corresponding to the word in the emotion dictionary is a negative emotion identification (for example, the number 0). In addition to positive and negative emotions, emotions can be further subdivided, for example to include neutral emotion. Taking the three sentences "Regarding learning English, Xiao Li appears calm", "Xiao Li appears happy" and "Xiao Li appears resistant" as examples, the emotion feature of the word "calm" is neutral emotion, the emotion feature of the word "happy" is positive emotion, and the emotion feature of the word "resistant" is negative emotion.

The emotion identifications in the emotion dictionary can also take values from a continuous interval.
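A dictionary-based emotion lookup of this kind might look like the sketch below. The dictionary contents, the continuous interval [0, 1], the value 0.5 for neutral words and the neutral default for words missing from the dictionary are all assumptions made for illustration; only the use of 1 for positive and 0 for negative follows the examples in the text.

```python
from typing import Dict, List

# Hypothetical emotion dictionary with continuous scores in [0, 1]:
# 1.0 = positive and 0.0 = negative (as in the text), 0.5 = neutral (assumed).
EMOTION_DICT: Dict[str, float] = {
    "happy": 1.0,
    "likes": 1.0,
    "calm": 0.5,
    "resistant": 0.0,
}
# Assumed default for words that are not in the dictionary.
NEUTRAL: float = 0.5


def emotion_features(words: List[str],
                     dictionary: Dict[str, float] = EMOTION_DICT) -> List[float]:
    """Look up one emotion score per word, falling back to the neutral score."""
    return [dictionary.get(word.lower(), NEUTRAL) for word in words]


scores = emotion_features(["Li", "appears", "resistant"])
```

Here `scores` carries one emotion feature per word, which can then be attached to the corresponding word embedding.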

In this embodiment, the semantic features and the context features of the first text, and the semantic features and the context features of the second text, may be extracted through a language model and a context information extraction model, respectively. The language model is, for example, a Global Vectors for Word Representation (GloVe) model; an existing language model may be used, without limitation. The context information extraction model is, for example, a Long Short-Term Memory (LSTM) network model or a Bidirectional Long Short-Term Memory (BLSTM) network model.

In step S204, in the text matching model, a matching relationship between the first text and the second text is determined according to the first feature vector and the second feature vector.

In this embodiment, the matching relationships included in different text matching tasks are different.

For example, in a paraphrase recognition task, three matching relationships may occur between a first text and a second text: implication, contradiction and irrelevance. If the matching relationship between the first text and the second text is implication, the meaning expressed by the first text includes the meaning expressed by the second text, or the meaning expressed by the second text includes the meaning expressed by the first text. For example, if the first text is "Xiao Li likes watching movies" and the second text is "Xiao Li especially likes watching the movie Titanic", the matching relationship between the first text and the second text is implication. If the matching relationship between the first text and the second text is contradiction, the meaning expressed by the first text and the meaning expressed by the second text contradict each other. If the matching relationship between the first text and the second text is irrelevance, the meaning expressed by the first text is unrelated to the meaning expressed by the second text.

As another example, in a natural language inference task, two matching relationships may occur between the first text and the second text: related and unrelated. If the matching relationship between the first text and the second text is related, the second text can be inferred from the first text, or the first text can be inferred from the second text. For example, if the first text is "Yesterday was the weekend and students did not have class" and the second text is "Xiao Li did not go to school for class yesterday", the matching relationship between the first text and the second text is related. If the matching relationship between the first text and the second text is unrelated, the second text cannot be inferred from the first text, and the first text cannot be inferred from the second text.

In this embodiment, after the first feature vector and the second feature vector are obtained, the matching relationship between the first feature vector and the second feature vector is obtained by matching the first feature vector and the second feature vector in the text matching model, and the matching relationship between the first feature vector and the second feature vector is determined as the matching relationship between the first text and the second text.
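To illustrate how a matching relationship can be read off two feature vectors, the sketch below mean-pools the per-word feature vectors of each text and thresholds their cosine similarity. This is a deliberately simple stand-in and not the trained decision model of the embodiment; the function names, the labels "related" and "unrelated" and the threshold 0.8 are assumptions.

```python
import math
from typing import List, Sequence


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-word feature vectors of one text into a single vector."""
    count = len(vectors)
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / count for i in range(dim)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_relationship(first_feature_vector: Sequence[Sequence[float]],
                       second_feature_vector: Sequence[Sequence[float]],
                       threshold: float = 0.8) -> str:
    """Label a text pair by thresholding the similarity of pooled features."""
    similarity = cosine(mean_pool(first_feature_vector),
                        mean_pool(second_feature_vector))
    return "related" if similarity >= threshold else "unrelated"
```

For example, `match_relationship([[1.0, 0.0]], [[0.0, 1.0]])` yields `"unrelated"` because the pooled vectors are orthogonal. A learned decision model would replace the fixed threshold with trained parameters and could output task-specific labels.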

In this embodiment, the text matching model includes a semantic feature extraction model, an emotion feature extraction model, a context information extraction model, and a decision model for determining a matching relationship between the first feature vector and the second feature vector. The models can be realized by adopting a neural network model, can be trained respectively or together during training, and can also directly acquire the existing trained semantic feature extraction model, emotion feature extraction model and context information extraction model and train only a decision model.

In this embodiment, during the training of the text matching model, training data may be acquired. The training data comprises a plurality of training text pairs and the relationship label of each training text pair. Each training text pair comprises two training texts which, for convenience of distinction, are called the first training text and the second training text; the relationship label of a training text pair is the matching relationship between its first training text and its second training text. The text matching model can thus be trained in a supervised manner on the training data. The training process may follow the above steps, adjusting the model parameters of the text matching model according to the error between the matching relationship output by the text matching model and the relationship label of the first training text and the second training text, for example by using stochastic gradient descent, adaptive moment estimation (Adam) or the Adamax algorithm, so as to obtain the trained text matching model. Among them, the Adamax algorithm is a commonly used gradient optimization algorithm and is widely used in the parameter training of neural network models.
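The supervised adjustment of parameters from labeled text pairs can be sketched with plain stochastic gradient descent on a logistic-regression decision model. The model form, the single similarity-like input feature, the toy data and the hyperparameters are assumptions chosen to keep the example small; the embodiment itself uses neural network models and may use Adam or Adamax instead.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_sgd(samples: Sequence[Sequence[float]], labels: Sequence[int],
              lr: float = 0.5, epochs: int = 200,
              seed: int = 0) -> Tuple[List[float], float]:
    """Fit weights and bias with per-sample stochastic gradient descent."""
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit training pairs in random order each epoch
        for i in order:
            x, y = samples[i], labels[i]
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = pred - y  # gradient of cross-entropy loss w.r.t. the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return 1 (match) if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5 else 0


# Hypothetical pair-level features (one similarity-like value) and relationship labels.
train_x = [[0.9], [0.8], [0.2], [0.1]]
train_y = [1, 1, 0, 0]
weights, bias = train_sgd(train_x, train_y)
```

After training, `predict(weights, bias, x)` applies the learned decision boundary to a new pair feature `x`.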

In the embodiment of the invention, a first text and a second text are processed to obtain a first text vector and a second text vector, in a text matching model, semantic feature extraction, emotion feature extraction and context feature extraction are respectively carried out on the first text vector and the second text vector to obtain a first feature vector and a second feature vector, and the first feature vector and the second feature vector are matched to obtain the matching relation of the first text and the second text, so that the semantic features, the emotion features and the context features are integrated, and the text matching effect is improved.

Embodiment two:

Fig. 3 shows an implementation flow of the text matching method provided by the second embodiment of the present invention. For convenience of description, only the parts related to the second embodiment of the present invention are shown, which are detailed as follows:

In step S301, a first text and a second text are acquired.

In this embodiment, step S301 may refer to the related description of step S201, and is not described again.

In step S302, the first text and the second text are processed to obtain a first text vector and a second text vector, where the first text vector includes a plurality of first word embeddings corresponding to the first text, and the second text vector includes a plurality of second word embeddings corresponding to the second text.

In this embodiment, step S302 may refer to the related description of step S202, and is not described again.

In a feasible implementation manner, in the process of processing the first text and the second text to obtain the first text vector and the second text vector, the first text and the second text may be preprocessed according to a preset external corpus, and the preprocessed first text and the preprocessed second text may be vectorized to obtain the first text vector and the second text vector. In this way, the first text and the second text are supplemented and preprocessed using external semantics, and the words in the preprocessed first text and second text are converted into word embeddings, which improves the information richness and accuracy of the first text vector and the second text vector.

The external corpus may be a corpus of a professional field existing on the internet, or a corpus constructed by collecting papers, terms, and the like of the professional field, for example, if matching is performed on a text of a medical field, a corpus of a medical field existing on the internet may be obtained. The corpus includes a plurality of terms in the field of expertise or words known to those skilled in the field of expertise.

When the first text and the second text are preprocessed and vectorized, word segmentation, removal of high-frequency auxiliary words and the like can be performed on the first text and the second text according to the external corpus to obtain a plurality of words in the first text and a plurality of words in the second text. The words in the first text and the words in the second text are then matched against the external corpus to find the corresponding words in the external corpus, and the word embeddings of those words in the external corpus are acquired, yielding the first word embeddings of the words in the first text and the second word embeddings of the words in the second text. In this way, the external corpus compensates for defects of the first text and the second text such as information fragmentation, small information content, noisy data, irregular expression and semantic ambiguity, and the accuracy of the subsequent extraction of semantic features, emotion features and context features is improved.
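The preprocessing described above can be sketched as follows: segment the text into words, drop high-frequency auxiliary words, keep the words that match the external corpus, and take their embeddings from the corpus. The stop-word set, the tiny medical-domain table `CORPUS_EMBEDDINGS` and the whitespace segmentation are assumptions; Chinese text would require a dedicated word segmenter.

```python
from typing import Dict, List, Tuple

# Assumed list of high-frequency auxiliary words to remove.
STOP_WORDS = {"the", "a", "an", "of", "is"}

# Hypothetical external corpus for a medical domain: term -> word embedding.
CORPUS_EMBEDDINGS: Dict[str, List[float]] = {
    "patient": [0.2, 0.1],
    "fever": [0.9, -0.4],
    "cough": [0.7, -0.1],
}


def preprocess(text: str) -> List[str]:
    """Segment on whitespace, strip punctuation, lower-case, drop stop words."""
    tokens = [token.strip(".,!?;:\"'").lower() for token in text.split()]
    return [token for token in tokens if token and token not in STOP_WORDS]


def embed_with_corpus(text: str,
                      corpus: Dict[str, List[float]]) -> Tuple[List[str], List[List[float]]]:
    """Keep the words found in the external corpus and return their embeddings."""
    words = [word for word in preprocess(text) if word in corpus]
    return words, [list(corpus[word]) for word in words]


words, first_text_vector = embed_with_corpus("The patient has a fever.", CORPUS_EMBEDDINGS)
```

In this sketch, words absent from the corpus are simply discarded; whether to drop them or back off to a general-purpose embedding is a design choice the text leaves open.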

In step S303, in the pre-trained text matching model, semantic feature extraction, emotion feature extraction, and context feature extraction are performed on the first text vector and the second text vector, respectively, to obtain a first feature vector of the first text and a second feature vector of the second text.

In this embodiment, step S303 may refer to the related description of step S203, and is not described again.

In a feasible implementation manner, in the process of performing semantic feature extraction, emotion feature extraction and context feature extraction on the first text vector and the second text vector in the text matching model, semantic feature extraction and emotion feature extraction may first be performed on each first word embedding in the first text vector and each second word embedding in the second text vector, respectively, to obtain the feature vector of each first word embedding and the feature vector of each second word embedding. Through a context information extraction model, the context information of each first word embedding in the first text vector is then introduced into the feature vector of that first word embedding, and the context information of each second word embedding in the second text vector is introduced into the feature vector of that second word embedding. The feature vectors of the first word embeddings, with context information introduced, are combined to obtain the first feature vector, and the feature vectors of the second word embeddings, with context information introduced, are combined to obtain the second feature vector. This process improves the combination of the semantic features, emotion features and context features. Context information can be introduced into the feature vector of each first word embedding by inputting the feature vector of each first word embedding, the first text vector and the position information of each first word embedding in the first text vector into a context information extraction model, for example a bidirectional long short-term memory network; the process of introducing context information into the feature vector of each second word embedding is analogous.
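The idea of introducing context from both directions can be sketched without a full LSTM. The snippet below uses a simplified elementwise tanh recurrence run forward and backward over the word sequence and concatenates the two states per word. It is a stand-in for the bidirectional long short-term memory network named in the text, not an LSTM; the function names and the `decay` parameter are assumptions, and position information is represented only implicitly by word order.

```python
import math
from typing import List, Sequence


def recurrent_pass(vectors: Sequence[Sequence[float]], decay: float = 0.5) -> List[List[float]]:
    """Run a simple recurrence: state_t = tanh(x_t + decay * state_{t-1})."""
    if not vectors:
        return []
    state = [0.0] * len(vectors[0])
    outputs = []
    for vec in vectors:
        state = [math.tanh(x + decay * s) for x, s in zip(vec, state)]
        outputs.append(state)
    return outputs


def add_context(word_features: Sequence[Sequence[float]], decay: float = 0.5) -> List[List[float]]:
    """Concatenate forward and backward states so each word sees both sides."""
    forward = recurrent_pass(word_features, decay)
    backward = recurrent_pass(list(reversed(word_features)), decay)[::-1]
    return [fwd + bwd for fwd, bwd in zip(forward, backward)]


# Two words with one feature each; the second word's own feature is zero.
contextual = add_context([[1.0], [0.0]])
```

In `contextual`, the forward half of the second word is non-zero even though its own feature is zero, because it inherits information from the first word; this is the effect the bidirectional network is meant to provide.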

Further, when context information is introduced into the feature vectors of the first word embeddings and the second word embeddings through the bidirectional long-short term memory network, the output of the bidirectional long-short term memory network can be compressed through an activation layer to control the feature dimensions of those feature vectors.

Furthermore, semantic feature extraction and emotion feature extraction are performed on each first word embedding in the first text vector and each second word embedding in the second text vector, respectively, to obtain the feature vector of each first word embedding and the feature vector of each second word embedding, which can be implemented through the following process:

(1) Performing semantic feature extraction on each first word embedding and each second word embedding, respectively, to obtain a semantic feature vector of each first word embedding and a semantic feature vector of each second word embedding;

In this embodiment, for the semantic feature extraction performed on each first word embedding and each second word embedding, reference may be made to the related content in the first embodiment, where the semantic feature vector corresponds to the semantic feature in the first embodiment; the details are not repeated.

(2) Performing emotion feature extraction on each first word embedding and each second word embedding, respectively, to obtain an emotion feature vector of each first word embedding and an emotion feature vector of each second word embedding;

In this embodiment, for the emotion feature extraction performed on each first word embedding and each second word embedding, reference may be made to the related content in the first embodiment, where the emotion feature vector corresponds to the emotion feature in the first embodiment; the details are not repeated.

(3) Determining the emotion spatial position sensing vector of each first word embedding according to the emotion feature vector of each first word embedding and the position of each first word embedding in the first text, and determining the emotion spatial position sensing vector of each second word embedding according to the emotion feature vector of each second word embedding and the position of each second word embedding in the second text;

In this embodiment, in order to accurately encode the emotional difference of each word in the text, the emotion spatial position sensing vector may be constructed based on a Gaussian distribution assumption, so that the text matching model can perceive the interaction dependency influence caused by emotional differences between words in the text. In other words, the text matching model can perceive both the emotional differences between different words in the text and the influence of those differences on the subsequent word interaction process.

In this embodiment, the formula for determining, according to the Gaussian distribution, the emotion spatial position sensing vector of each first word embedding from the emotion feature vector of each first word embedding and the position of each first word embedding in the first text may be expressed as:

$p(i, \gamma) \sim N(\mathrm{Gaussian}(n), \sigma')$,

where n denotes the emotion feature vector of the first word embedding, γ denotes the position of the first word embedding in the first text (or the first text vector), N denotes the Gaussian distribution, σ' denotes a preset standard deviation, and p(i, γ) denotes the emotion spatial position sensing dependency influence in the i-th dimension of the first word embedding at position γ. The influences of all dimensions of a first word embedding are combined to obtain the emotion spatial position sensing vector of that first word embedding, and the emotion spatial position sensing vectors of all first word embeddings are combined to obtain the emotion spatial position sensing vector P of the first text vector, where each column of P is the emotion spatial position sensing vector of one first word embedding.

In this embodiment, the emotion spatial position sensing vector of each second word embedding is determined according to the Gaussian distribution from the emotion feature vector of each second word embedding and the position of each second word embedding in the second text, in the same way as for each first word embedding; the details are not repeated.

(4) Determining the feature vector of each first word embedding according to the semantic feature vector and the emotion spatial position sensing vector of each first word embedding, and determining the feature vector of each second word embedding according to the semantic feature vector and the emotion spatial position sensing vector of each second word embedding.

In this embodiment, the semantic feature vector and the emotion spatial position sensing vector of each first word embedding may be spliced to obtain the feature vector of each first word embedding. Likewise, the semantic feature vector and the emotion spatial position sensing vector of each second word embedding may be spliced to obtain the feature vector of each second word embedding.
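The splicing in step (4) can be sketched as a concatenation along the feature axis. This is a minimal illustration only: the function name `splice_features`, the row-per-word layout, the dimensions, and the random inputs are assumptions made for the example and are not specified in the text.

```python
import numpy as np

def splice_features(semantic, emotion_pos):
    """Concatenate per-word semantic and emotion vectors along the feature axis.

    semantic:    array of shape (num_words, d_sem), one row per word embedding
    emotion_pos: array of shape (num_words, d_emo), one row per word embedding
    returns:     array of shape (num_words, d_sem + d_emo)
    """
    if semantic.shape[0] != emotion_pos.shape[0]:
        raise ValueError("both inputs must describe the same number of words")
    return np.concatenate([semantic, emotion_pos], axis=-1)

# Hypothetical inputs: 5 words, 8 semantic dimensions, 3 emotion dimensions.
rng = np.random.default_rng(0)
semantic = rng.normal(size=(5, 8))
emotion_pos = rng.normal(size=(5, 3))
features = splice_features(semantic, emotion_pos)
```

Each row of `features` keeps the semantic part in its first 8 entries and the emotion spatial position sensing part in its last 3, so later layers can still attend to either part.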

In this embodiment, after the emotion features of each first word embedding are extracted, they are combined with the position of each first word embedding in the first text and the Gaussian distribution to obtain the emotion spatial position sensing vector of each first word embedding; likewise, after the emotion features of each second word embedding are extracted, they are combined with the position of each second word embedding in the second text and the Gaussian distribution to obtain the emotion spatial position sensing vector of each second word embedding. The semantic feature vector and the emotion spatial position sensing vector of each first word embedding are then combined to obtain the feature vector of each first word embedding, and the semantic feature vector and the emotion spatial position sensing vector of each second word embedding are combined to obtain the feature vector of each second word embedding. Consequently, when emotion information is introduced into the feature vectors of the first word embeddings and the second word embeddings, the emotion features of each word embedding, its position in its text, and the influence of its emotion information on word interaction are all taken into account. This effectively improves the accuracy of the feature vectors of the first and second word embeddings, and in turn the accuracy of the first feature vector and the second feature vector.

In step S304, word interaction is performed on the first feature vector and the second feature vector to obtain a word interaction signal of the first text and a word interaction signal of the second text.

In this embodiment, the attention weight of each first word embedding in the first feature vector may be determined through a preset attention mechanism, each first word embedding is weighted by its attention weight to obtain the weighted first word embeddings, and the weighted first word embeddings are combined to obtain a weighted first text vector. Likewise, the attention weight of each second word embedding in the second feature vector is determined through the attention mechanism, each second word embedding is weighted by its attention weight to obtain the weighted second word embeddings, and the weighted second word embeddings are combined to obtain a weighted second text vector. The first text vector then interacts with the weighted second text vector to obtain the word interaction signal of the first text, and the second text vector interacts with the weighted first text vector to obtain the word interaction signal of the second text.

The preset attention mechanism is, for example, a dot-product attention mechanism.

In this embodiment, the attention weight of each first word embedding may first be calculated from the attention mechanism and the feature vector of each first word embedding, and the weighted value is then calculated from the attention weights by weighting. Taking the dot-product attention mechanism as an example, the weighted calculation for the i-th first word embedding $x_i$ in the first feature vector can be expressed as:

$$\tilde{x}_i = \sum_j p_{i,j}\, x_j$$

where $p_{i,j} = \mathrm{softmax}(x_i \cdot x_j)$ is the attention weight between the i-th first word embedding $x_i$ and the j-th first word embedding $x_j$, softmax() is the normalized exponential function, and $\tilde{x}_i$ is the weighted i-th first word embedding. The weighted first word embeddings are combined to obtain the word interaction signal of the first text. The weighted calculation for each second word embedding is analogous and is not described in detail.
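A minimal NumPy sketch of dot-product attention over the word embeddings of one text follows. It assumes one embedding per row and a softmax taken over j for each i; the function names and the toy input are illustrative and not taken from the text.

```python
import numpy as np

def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def dot_product_self_attention(x):
    """x: (num_words, dim), one word embedding per row.

    Returns the attention weights p[i, j] and the weighted embeddings,
    where row i of the result is sum_j p[i, j] * x[j].
    """
    scores = x @ x.T                   # scores[i, j] = x_i . x_j
    weights = softmax(scores, axis=1)  # each row sums to 1
    weighted = weights @ x
    return weights, weighted

# Toy input: three 2-dimensional word embeddings.
x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
weights, weighted = dot_product_self_attention(x)
```

Because the weights depend only on the dot products, two words with similar embeddings attend to each other strongly regardless of how their emotional features differ, which is the limitation the emotion-based variant addresses.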

The dependency relationship (i.e. the attention weight) between two word embeddings in the first text vector, as calculated above by the dot-product attention mechanism, does not take into account the subjective emotional difference or the positional distance between the two word embeddings; it only measures how much the two word embeddings differ in the word vector coding space. To introduce the emotional features of the words into the calculation of the attention weight, one possible implementation of S304 is: performing word interaction on the first feature vector and the second feature vector, respectively, through an attention mechanism and a Gaussian distribution hypothesis based on the emotional features, to obtain the word interaction signal of the first text and the word interaction signal of the second text.

In this embodiment, the dot-product attention mechanism may be adjusted based on the emotional features and the Gaussian distribution hypothesis to obtain an attention mechanism based on emotional features. The first text is taken as an example; the calculation for the second text is analogous. The Gaussian distribution hypothesis means that, when the word interaction signal of the first text is determined, one first word embedding in the first text vector may be selected as a central word, and the semantic influence on the central word of words at different positions and distances in the emotion space is assumed to obey a Gaussian distribution. The emotional features (which may be the emotion feature vectors or the emotion spatial position sensing vectors mentioned in the above embodiments) are used as prior knowledge to correct the weight of each word's semantic influence on the central word, and the weighted first word embedding is then obtained by computing the weights and taking a weighted sum. The variance of the standard Gaussian distribution is set to $\sigma^2 = 1/(2\pi)$, so that its probability density function is

$$\phi(d) = e^{-\pi d^2}$$

where d denotes the emotional difference. The weighted calculation formula for a first word embedding can be expressed as:

$$\tilde{x}_i = \sum_j \frac{\phi(d_{i,j})\, p_{i,j}}{\sum_k \phi(d_{i,k})\, p_{i,k}}\, x_j = \sum_j \frac{e^{-\pi d_{i,j}^2 + x_i \cdot x_j}}{\sum_k e^{-\pi d_{i,k}^2 + x_i \cdot x_k}}\, x_j = \sum_j \mathrm{softmax}\left(-\pi d_{i,j}^2 + x_i \cdot x_j\right) x_j$$

where $d_{i,j}$ denotes the emotional difference between the i-th first word embedding and the j-th first word embedding, which can be calculated from the emotional features of the two embeddings, the sums over k in the denominators are regularization factors, and k indexes the k-th first word embedding. The above formula converts the normal distribution into a Gaussian distribution with an offset term, which simplifies the product operation.
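The conversion of the product of a Gaussian prior and an exponential score into a single softmax with an offset term can be checked numerically, since $e^{-\pi d^2} \cdot e^{x_i \cdot x_j} = e^{-\pi d^2 + x_i \cdot x_j}$ once the variance is $1/(2\pi)$. The sketch below assumes, purely for illustration, that each word has a scalar emotion score and that $d_{i,j}$ is the difference of those scores; the text only states that $d_{i,j}$ is calculated from the emotional features. The function name and inputs are hypothetical.

```python
import numpy as np

def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def gaussian_prior_attention(x, emotion):
    """x: (n, dim) word embeddings; emotion: (n,) scalar emotion scores (assumed).

    Returns the attention weights computed in product form and in offset form.
    """
    scores = x @ x.T                         # scores[i, j] = x_i . x_j
    d = emotion[:, None] - emotion[None, :]  # ASSUMED emotional difference d[i, j]
    # Product form: Gaussian prior times exponential score, normalised over j.
    prior = np.exp(-np.pi * d ** 2)
    unnorm = prior * np.exp(scores - scores.max(axis=1, keepdims=True))
    product_form = unnorm / unnorm.sum(axis=1, keepdims=True)
    # Offset form: the prior becomes an additive term inside one softmax.
    offset_form = softmax(scores - np.pi * d ** 2, axis=1)
    return product_form, offset_form

# Identical embeddings, so only the emotional difference separates the weights.
x = np.ones((3, 2))
emotion = np.array([0.0, 0.1, 1.0])
product_form, offset_form = gaussian_prior_attention(x, emotion)
weighted = offset_form @ x  # weighted word embeddings
```

With identical embeddings, the weights for the first word decrease as the emotional difference grows, which is the intended effect of the prior.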

Further, since the variance of the assumed Gaussian distribution is not necessarily the same as that of the standard normal distribution, a variable ω can additionally be introduced to constrain the above formula:

so as to reduce the degree of difference between the assumed Gaussian distribution and the standard normal distribution. The variable ω can be obtained during training of the text matching model.

Further, the formula for calculating the attention weight between different word embeddings in the first text vector and the second text vector can also be expressed as:

where A is the attention weight matrix between each first word embedding in the first text vector and each second word embedding in the second text vector, dropout() is a function used for parameter smoothing, and the two remaining terms are the first feature vector processed by the activation function and the second feature vector processed by the activation function, respectively. The activation function is, for example, a rectified linear unit (ReLU). $f_{attention}()$ denotes the attention mechanism, which may be either the attention mechanism described above or the emotion-feature-based attention mechanism described above.

Further, assuming that the attention weight matrix of the first text is denoted A and the attention weight matrix of the second text is denoted A', the first text vector and the second text vector may be weighted according to A and A' to obtain a weighted first text vector and a weighted second text vector; the first text vector interacts with the weighted second text vector to obtain the word interaction signal of the first text, and the second text vector interacts with the weighted first text vector to obtain the word interaction signal of the second text. In other words, the first text vector and the weighted second text vector may be spliced to obtain the word interaction signal of the first text, and the second text vector and the weighted first text vector may be spliced to obtain the word interaction signal of the second text. For example, the interaction formula is:

$U^p = [C^p; C^h A']$, $U^h = [C^h; C^p A]$,

where $C^p$ denotes the first feature vector, $C^h$ denotes the second feature vector, $U^p$ denotes the word interaction signal of the first text, and $U^h$ denotes the word interaction signal of the second text.
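The interaction formula can be sketched with NumPy as follows. Several choices here are assumptions made only so that the shapes line up: word feature vectors are stored as columns, A and A' come from plain dot-product scores without the dropout or activation steps, and each column of A and A' is normalized by softmax. The function name `word_interaction` is hypothetical.

```python
import numpy as np

def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def word_interaction(c_p, c_h):
    """c_p: (dim, n_p), c_h: (dim, n_h); each column is one word's feature vector.

    Returns U^p of shape (2*dim, n_p) and U^h of shape (2*dim, n_h).
    """
    scores = c_p.T @ c_h                 # (n_p, n_h) cross-text dot products
    a = softmax(scores, axis=0)          # A:  (n_p, n_h), columns sum to 1
    a_prime = softmax(scores.T, axis=0)  # A': (n_h, n_p), columns sum to 1
    u_p = np.concatenate([c_p, c_h @ a_prime], axis=0)  # [C^p; C^h A']
    u_h = np.concatenate([c_h, c_p @ a], axis=0)        # [C^h; C^p A]
    return u_p, u_h

# Hypothetical sizes: 4 feature dimensions, 3 first words, 5 second words.
rng = np.random.default_rng(0)
c_p = rng.normal(size=(4, 3))
c_h = rng.normal(size=(4, 5))
u_p, u_h = word_interaction(c_p, c_h)
```

Each column of `u_p` stacks a first word's own features on top of an attention-weighted summary of the second text, so the two texts may have different lengths.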

In step S305, a matching relationship between the first text and the second text is determined through a decision model according to an interaction signal between the first text and the second text.

The decision model in the text matching model can adopt a bidirectional long-short term memory network model, and the model parameters of the decision model can be obtained by training in the training process of the text matching model.

In this embodiment, the first feature vector, the word interaction signal of the first text, the second feature vector, and the word interaction signal of the second text are input into the decision model. When the decision model is the bidirectional long-short term memory network model, the memory state of the first text and the memory state of the second text can be calculated in the bidirectional long-short term memory network model, iterative calculation can be performed according to the two memory states, a temporary decision can be calculated in each iteration, and the final decision is determined according to the temporary decisions of all iterations. Both the temporary decision and the final decision comprise the probabilities of a plurality of preset matching relations. Finally, the matching relation with the highest probability in the final decision is determined as the matching relation between the first text and the second text.

Further, in the bidirectional long-short term memory network model, the memory state of the first text and the memory state of the second text can be calculated as:

$M^p = \mathrm{BiLSTM}([U^p; C^p])$, $M^h = \mathrm{BiLSTM}([U^h; C^h])$,

where $M^p$ denotes the memory state of the first text, $M^h$ denotes the memory state of the second text, and BiLSTM() denotes the computation of the bidirectional long-short term memory network model.

Further, during each iteration, the formula for the tentative decision may be expressed as:

where r denotes the r-th matching relation, t denotes the t-th iteration, the formula gives the probability of the r-th matching relation in the tentative decision obtained in the t-th iteration, $x_t$ denotes the input of the t-th iteration, $s_t$ denotes the accumulated state of the t-th iteration, and $\theta_4$ is a parameter obtained by training. $x_t$ and $s_t$ can be expressed as:

$s_t = \mathrm{GRU}(s_{t-1}, x_t)$,

where $\beta_j = \mathrm{softmax}(s_{t-1}\theta_3 M^p)$, GRU denotes a gated recurrent unit model, the terms indexed by j denote the word interaction signal of the j-th first word embedding in the word interaction signal of the first text and the word interaction signal of the j-th second word embedding in the word interaction signal of the second text, respectively, and $\theta_2$ and $\theta_3$ are parameters obtained by training.
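The iteration can be sketched as below. Only $s_t = \mathrm{GRU}(s_{t-1}, x_t)$ and $\beta_j = \mathrm{softmax}(s_{t-1}\theta_3 M^p)$ are given in the text; everything else is an assumption labeled in the comments, namely the initial state built from the second text with $\theta_2$, the input $x_t$ as the $\beta$-weighted sum of the first text's states, and the tentative decision as a softmax over a $\theta_4$ projection of $s_t$ and $x_t$. All weights are random and untrained, and the names `GRUCell` and `tentative_decisions` are hypothetical.

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(v - v.max())
    return e / e.sum()

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

class GRUCell:
    """Minimal GRU cell with random, untrained weights (illustration only)."""

    def __init__(self, dim, rng):
        # Index 0: update gate, 1: reset gate, 2: candidate state.
        self.w = rng.normal(scale=0.1, size=(3, dim, dim))  # input weights
        self.u = rng.normal(scale=0.1, size=(3, dim, dim))  # state weights

    def __call__(self, s_prev, x):
        z = sigmoid(self.w[0] @ x + self.u[0] @ s_prev)
        r = sigmoid(self.w[1] @ x + self.u[1] @ s_prev)
        cand = np.tanh(self.w[2] @ x + self.u[2] @ (r * s_prev))
        return (1.0 - z) * s_prev + z * cand

def tentative_decisions(m_p, m_h, theta2, theta3, theta4, gru, steps):
    """m_p: (dim, n_p), m_h: (dim, n_h); columns are per-word states.

    Returns an array of shape (steps, num_relations), one row per iteration.
    """
    alpha = softmax(theta2 @ m_h)         # ASSUMED weights over second-text words
    s = m_h @ alpha                       # ASSUMED initial state s_0
    rows = []
    for _ in range(steps):
        beta = softmax(s @ theta3 @ m_p)  # beta_j = softmax(s_{t-1} theta_3 M^p)
        x = m_p @ beta                    # ASSUMED x_t = sum_j beta_j M^p_j
        s = gru(s, x)                     # s_t = GRU(s_{t-1}, x_t)
        joint = np.concatenate([s, x, np.abs(s - x), s * x])
        rows.append(softmax(theta4 @ joint))  # ASSUMED tentative decision form
    return np.stack(rows)

rng = np.random.default_rng(0)
dim, n_p, n_h, num_relations, steps = 4, 3, 5, 3, 4
decisions = tentative_decisions(
    m_p=rng.normal(size=(dim, n_p)),
    m_h=rng.normal(size=(dim, n_h)),
    theta2=rng.normal(size=dim),
    theta3=rng.normal(size=(dim, dim)),
    theta4=rng.normal(size=(num_relations, 4 * dim)),
    gru=GRUCell(dim, rng),
    steps=steps,
)
```

Each row of `decisions` is a probability distribution over the preset matching relations for one iteration.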

Further, after the tentative decision is obtained, all the tentative decisions may be averaged to obtain a final decision. The calculation formula of the final decision is, for example:

where T denotes the total number of iterations, p denotes the probability of the r-th matching relation in the final decision, and avg() denotes taking the average.
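Averaging the tentative decisions and then picking the most probable matching relation can be sketched as follows. The probabilities and the relation labels are made-up illustrative values; the text does not name the preset matching relations.

```python
import numpy as np

def final_decision(tentative, relations):
    """tentative: (T, R) array with one probability row per iteration.

    Returns the averaged probabilities and the most probable relation.
    """
    final = tentative.mean(axis=0)  # average over the T iterations
    return final, relations[int(np.argmax(final))]

# Hypothetical tentative decisions from T = 3 iterations over R = 3 relations.
tentative = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.4, 0.1],
])
relations = ["entailment", "neutral", "contradiction"]  # assumed labels
final, best = final_decision(tentative, relations)
```

Here the second iteration alone would favor the second relation, but the average over all three iterations favors the first, which illustrates how averaging smooths out a single divergent iteration.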

In the embodiment of the invention, the first text and the second text are processed, and semantic feature extraction, emotion feature extraction and context feature extraction are performed to obtain a first feature vector and a second feature vector; word interaction is performed on the first feature vector and the second feature vector based on an attention mechanism to obtain the word interaction signal of the first text and the word interaction signal of the second text; and the matching relation between the first text and the second text is determined through a decision model based on the first feature vector, the word interaction signal of the first text, the second feature vector and the word interaction signal of the second text. The emotion features are thus fully considered in the text matching process, which improves the text matching effect.

Example three:

fig. 4 shows a structure of a text matching apparatus provided in a third embodiment of the present invention, and for convenience of description, only the parts related to the third embodiment of the present invention are shown, which include:

a text obtaining module 41, configured to obtain a first text and a second text;

the text processing module 42 is configured to process the first text and the second text to obtain a first text vector and a second text vector, where the first text vector includes a plurality of first word embeddings corresponding to the first text, and the second text vector includes a plurality of second word embeddings corresponding to the second text;

the feature extraction module 43 is configured to perform semantic feature extraction, emotion feature extraction, and context feature extraction on the first text vector and the second text vector in a pre-trained text matching model, respectively, to obtain a first feature vector of the first text and a second feature vector of the second text; and

and the matching relation determining module 44 is configured to determine, in the text matching model, a matching relation between the first text and the second text according to the first feature vector and the second feature vector.

In one possible embodiment, the text processing module 42 includes:

the preprocessing module is used for preprocessing the first text and the second text according to a preset external corpus; and

and the vectorization module is used for vectorizing the preprocessed first text and the preprocessed second text to obtain a first text vector and a second text vector.

In one possible embodiment, the first feature vector includes a feature vector embedded in each first word, the second feature vector includes a feature vector embedded in each second word, and the feature extraction module 43 includes:

the feature extraction submodule is used for extracting semantic features and emotional features of the embedded first words and the embedded second words respectively to obtain feature vectors of the embedded first words and feature vectors of the embedded second words;

and the context information introduction module is used for respectively introducing context information into the feature vector embedded into each first word and the feature vector embedded into each second word through the context information extraction model.

In one possible embodiment, the feature extraction sub-module includes:

the semantic feature extraction module is used for respectively extracting semantic features of each first word embedding and each second word embedding to obtain a semantic feature vector of each first word embedding and a semantic feature vector of each second word embedding;

the emotion feature extraction module is used for respectively extracting emotion features of the embedded first words and the embedded second words to obtain emotion feature vectors of the embedded first words and emotion feature vectors of the embedded second words;

the emotion vector generation module is used for determining, according to a Gaussian distribution, the emotion space position sensing vector embedded by each first word from the emotion feature vector embedded by each first word and the position of each first word embedded in the first text, and determining the emotion space position sensing vector embedded by each second word from the emotion feature vector embedded by each second word and the position of each second word embedded in the second text; and

and the feature synthesis module is used for determining the feature vector embedded by each first word according to the semantic feature vector and the emotion spatial position sensing vector embedded by each first word, and determining the feature vector embedded by each second word according to the semantic feature vector and the emotion spatial position sensing vector embedded by each second word.

In one possible embodiment, the text matching model includes a decision model, and the matching relation determining module 44 includes:

the interaction module is used for carrying out word interaction on the first characteristic vector and the second characteristic vector to obtain a word interaction signal of the first text and a word interaction signal of the second text;

and the determining module is used for determining the matching relation between the first text and the second text through a decision model according to the word interaction signal of the first text and the word interaction signal of the second text.

In one possible implementation, the interaction module includes:

and the interaction submodule is used for carrying out word interaction on the first characteristic vector and the second characteristic vector through an attention mechanism and Gaussian distribution based on emotional characteristics to obtain a word interaction signal of the first text and a word interaction signal of the second text.

In one possible embodiment, the decision model is a bidirectional long-short term memory network, and the determining module includes:

a temporary decision generating module, configured to construct a memory state and perform iteration in the bidirectional long-short term memory network according to the first feature vector, the second feature vector, the word interaction signal of the first text, and the word interaction signal of the second text, and to determine a temporary decision according to the memory state in each iteration;

a final decision generation module for determining a final decision according to the provisional decision in each iteration; and

and the determining submodule is used for selecting the matching relation between the first text and the second text from a plurality of preset matching relations according to the final decision.

In the embodiment of the present invention, specific implementation and achieved technical effects of the text matching apparatus may refer to specific descriptions of the first and second embodiments of the corresponding methods, and are not described again.

In the embodiment of the present invention, each unit of the text matching apparatus may be implemented by a corresponding hardware or software unit, and each unit may be an independent software or hardware unit, or may be integrated into a software or hardware unit, which is not limited herein.

Example four:

fig. 5 shows a structure of an electronic device according to a fourth embodiment of the present invention, and for convenience of description, only a part related to the fourth embodiment of the present invention is shown.

The electronic device 5 of an embodiment of the present invention comprises a processor 50, a memory 51 and a computer program 52 stored in the memory 51 and executable on the processor 50. The processor 50, when executing the computer program 52, implements the steps in the various text matching method embodiments described above, such as the steps S201 to S204 shown in fig. 2. Alternatively, the processor 50, when executing the computer program 52, implements the functions of the units in the above-described device embodiments, such as the functions of the units 41 to 44 shown in fig. 4.

In the embodiment of the invention, a first text and a second text are obtained, the first text and the second text are processed to obtain a first text vector and a second text vector, in a text matching model, semantic feature extraction, emotion feature extraction and context feature extraction are respectively carried out on the first text vector and the second text vector to obtain a first feature vector and a second feature vector, the first feature vector and the second feature vector are matched to obtain the matching relation of the first text and the second text, so that the semantic feature, the emotion feature and the context feature are synthesized, and the text matching effect is improved.

The electronic device of the embodiment of the invention can be a computer, a server, or a mobile device (such as a mobile phone, a tablet computer, or an intelligent wearable device). The steps of the text matching method implemented by the processor 50 executing the computer program 52 in the electronic device 5 can refer to the description of the foregoing method embodiments, and are not described herein again.

Example five:

in an embodiment of the present invention, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the steps in the above-described text matching method embodiment, for example, steps S201 to S204 shown in fig. 2. Alternatively, the computer program realizes the functions of the units in the above-described device embodiments, such as the functions of the units 41 to 44 shown in fig. 4, when executed by the processor.

The computer-readable storage medium of the embodiments of the present invention may include any entity, device, or recording medium capable of carrying computer program code, such as a ROM/RAM, a magnetic disk, an optical disk, or a flash memory.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A method of text matching, the method comprising the steps of:

acquiring a first text and a second text;

processing the first text and the second text to obtain a first text vector and a second text vector, wherein the first text vector comprises a plurality of first word embeddings corresponding to the first text, and the second text vector comprises a plurality of second word embeddings corresponding to the second text;

2. The method of claim 1, wherein processing the first text and the second text to obtain a first text vector and a second text vector comprises:

preprocessing the first text and the second text according to a preset external corpus;

vectorizing the preprocessed first text and the preprocessed second text to obtain the first text vector and the second text vector.

3. The method of claim 1, wherein the first feature vector comprises a feature vector embedded in each of the first words, the second feature vector comprises a feature vector embedded in each of the second words, and performing semantic feature extraction, emotion feature extraction, and context feature extraction on the first text vector and the second text vector in a pre-trained text matching model to obtain the first feature vector of the first text and the second feature vector of the second text respectively comprises:

extracting semantic features and emotional features of the embedded first words and the embedded second words respectively to obtain feature vectors of the embedded first words and feature vectors of the embedded second words;

and respectively introducing context information for the feature vector embedded by each first word and the feature vector embedded by each second word through the context information extraction model.

4. The method of claim 3, wherein the extracting semantic features and emotional features from each of the first word embeddings and each of the second word embeddings to obtain feature vectors of each of the first word embeddings and feature vectors of each of the second word embeddings comprises:

semantic feature extraction is respectively carried out on each first word embedding and each second word embedding, so that semantic feature vectors embedded into each first word and semantic feature vectors embedded into each second word are obtained;

extracting emotional characteristics of each first word embedding and each second word embedding respectively to obtain an emotional characteristic vector of each first word embedding and an emotional characteristic vector of each second word embedding;

determining emotion space position sensing vectors embedded by the first words according to the emotion feature vectors embedded by the first words and the positions of the first words embedded in the first text, and determining emotion space position sensing vectors embedded by the second words according to the emotion feature vectors embedded by the second words and the positions of the second words embedded in the second text;

determining the feature vector embedded by each first word according to the semantic feature vector and the emotion spatial position sensing vector embedded by each first word, and determining the feature vector embedded by each second word according to the semantic feature vector and the emotion spatial position sensing vector embedded by each second word.

5. The method of claim 1, wherein the text matching model comprises a decision model, and wherein determining a matching relationship between the first text and the second text according to the first feature vector and the second feature vector in the text matching model comprises:

performing word interaction on the first feature vector and the second feature vector to obtain a word interaction signal of the first text and a word interaction signal of the second text;

and determining the matching relation between the first text and the second text through the decision model according to the word interaction signal of the first text and the word interaction signal of the second text.

6. The method of claim 5, wherein performing word interaction on the first feature vector and the second feature vector to obtain a word interaction signal of the first text and a word interaction signal of the second text comprises:

performing word interaction on the first feature vector and the second feature vector through an attention mechanism and a Gaussian distribution hypothesis based on emotion features to obtain a word interaction signal of the first text and a word interaction signal of the second text.
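
One possible reading of the word interaction in claims 5 and 6 is sketched below. The claim names an attention mechanism and a Gaussian distribution hypothesis based on emotion features but gives no formula, so the details are assumptions: attention scores are taken as dot products of word feature vectors, and the Gaussian hypothesis is modeled as a log-Gaussian bias that lowers the score of word pairs whose emotion feature vectors are far apart. The names `gaussian_bias`, `word_interaction` and `sigma` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_bias(emo_a, emo_b, sigma=1.0):
    """Log of an unnormalized Gaussian kernel on the emotion distance."""
    dist2 = sum((x - y) ** 2 for x, y in zip(emo_a, emo_b))
    return -dist2 / (2 * sigma ** 2)


def word_interaction(feats_a, emos_a, feats_b, emos_b, sigma=1.0):
    """For each word of text A, return an attention-weighted sum over text B."""
    dim = len(feats_b[0])
    signals = []
    for fa, ea in zip(feats_a, emos_a):
        # Dot-product attention score plus the emotion-based Gaussian bias.
        scores = [
            sum(x * y for x, y in zip(fa, fb)) + gaussian_bias(ea, eb, sigma)
            for fb, eb in zip(feats_b, emos_b)
        ]
        weights = softmax(scores)
        signals.append(
            [sum(w * fb[d] for w, fb in zip(weights, feats_b)) for d in range(dim)]
        )
    return signals
```

Calling `word_interaction(feats_a, emos_a, feats_b, emos_b)` would give the word interaction signal of the first text under these assumptions; swapping the two texts in the call gives the signal of the second text.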

7. The method of claim 5, wherein the decision model is a bidirectional long short-term memory network, and determining the matching relation between the first text and the second text through the decision model according to the word interaction signal of the first text and the word interaction signal of the second text comprises:

in the bidirectional long short-term memory network, constructing and iterating a memory state according to the first feature vector, the second feature vector, the word interaction signal of the first text and the word interaction signal of the second text, and determining a temporary decision according to the memory state in each iteration;

determining a final decision according to the temporary decision in each iteration;

and selecting the matching relation between the first text and the second text from a plurality of preset matching relations according to the final decision.
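
The decision loop of claim 7 (iterate a memory state, take a temporary decision per iteration, derive a final decision, select a preset relation) can be sketched in simplified form. This is not a bidirectional long short-term memory network: a plain tanh recurrence stands in for it so that the control flow stays readable. The label set `RELATIONS`, the weights `w_state` and `w_out`, the number of iterations `steps`, and the averaging of temporary decisions into the final decision are all assumptions, not details taken from the claim.

```python
import math

RELATIONS = ["match", "no_match"]  # hypothetical preset matching relations


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decide(inputs, w_state, w_out, steps=3):
    """Iterate a memory state and turn temporary decisions into a final one."""
    state = [0.0] * len(w_state)
    temporary = []
    for _ in range(steps):
        # Memory update: simplified tanh recurrence standing in for the
        # bidirectional LSTM named in the claim.
        state = [
            math.tanh(sum(w * x for w, x in zip(row, inputs)) + s)
            for row, s in zip(w_state, state)
        ]
        # Temporary decision: distribution over the preset relations.
        logits = [sum(w * s for w, s in zip(row, state)) for row in w_out]
        temporary.append(softmax(logits))
    # Final decision (assumed: mean of the temporary decisions).
    final = [sum(t[k] for t in temporary) / steps for k in range(len(RELATIONS))]
    best = max(range(len(final)), key=final.__getitem__)
    return RELATIONS[best], final, temporary


# `inputs` stands for the concatenated feature vectors and interaction signals.
label, final, temporary = decide([1.0, -1.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy call, `inputs` plays the role of the pooled feature vectors and word interaction signals of both texts; how they are pooled into one vector is left open here.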

8. A text matching apparatus, characterized in that the apparatus comprises:

9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 7 are implemented when the computer program is executed by the processor.

10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.