CN111611346A - Text matching method and device based on dynamic semantic coding and double attention - Google Patents

Text matching method and device based on dynamic semantic coding and double attention

Info

Publication number
CN111611346A
CN111611346A (application CN202010387348.6A)
Authority
CN
China
Prior art keywords
text
vector
semantic
character
context
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010387348.6A
Other languages
Chinese (zh)
Inventor
迟殿委
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN202010387348.6A priority Critical patent/CN111611346A/en
Publication of CN111611346A publication Critical patent/CN111611346A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3347 Query execution using vector based model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the invention provides a text matching method and a text matching device based on dynamic semantic coding and double attention. The text matching method comprises the following steps: acquiring a first text and a second text to be matched, and performing dynamic semantic coding and vectorization on the first text and the second text to obtain a first semantic text vector and a second semantic text vector; extracting context vectors of the first semantic text vector through a bidirectional LSTM, and forming all the context vectors into a first text global vector according to an attention mechanism; extracting context vectors of the second semantic text vector through a bidirectional LSTM, and forming all the context vectors into a second text global vector according to an attention mechanism; and calculating the similarity of the first text global vector and the second text global vector through a linear layer, and judging whether the first text and the second text to be matched match. The texts to be matched are dynamically semantically encoded and vectorized, and the context vectors are refined by a bidirectional LSTM assisted by double attention, so that higher text matching precision is obtained.

Description

Text matching method and device based on dynamic semantic coding and double attention
Technical Field
The invention relates to the field of language processing, in particular to a text matching method and device based on dynamic semantic coding and double attention.
Background
Semantic matching is a basic task in the field of natural language processing, and can be applied to many practical applications.
In the process of implementing the invention, the applicant finds that at least the following problems exist in the prior art:
Traditional semantic matching usually relies on extracting specific matching patterns or hand-crafted features; however, this approach often produces many mismatches and has difficulty covering all samples. Existing semantic matching methods have the following defects. First, many existing semantic matching models are trained with fixed word-vector coding: a batch of word vectors is either pre-trained in advance with word2vec or similar techniques, or a word-vector matrix is initialized and then iterated during training; in prediction, however, the word vectors obtained are the same regardless of the text sequence given, which can cause ambiguity. Second, many semantic matching models do not focus attention on the more important parts of the data, which is not conducive to improving the model's representation.
Disclosure of Invention
The embodiment of the invention provides a text matching method and device based on dynamic semantic coding and double attention, so as to improve the semantic matching precision of a text.
To achieve the above object, in one aspect, an embodiment of the present invention provides a text matching method based on dynamic semantic coding and double attention, including:
acquiring a first text and a second text to be matched, and performing dynamic semantic coding and vectorization on the first text and the second text to obtain a first semantic text vector and a second semantic text vector;
extracting context vectors of the first semantic text vector through a bidirectional model LSTM, and forming all the context vectors into a first text global vector according to an attention mechanism; extracting context vectors of the second semantic text vector through a bidirectional model LSTM, and forming all the context vectors into a second text global vector according to an attention mechanism;
and calculating the similarity of the first text global vector and the second text global vector through the linear layer, and judging whether the first text to be matched is matched with the second text.
On the other hand, an embodiment of the present invention provides a text matching device based on dynamic semantic coding and double attention, including:
the dynamic coding module is used for acquiring a first text and a second text to be matched, performing dynamic semantic coding on the first text and the second text and vectorizing to obtain a first semantic text vector and a second semantic text vector;
the double attention module is used for extracting context vectors of the first semantic text vector through a bidirectional model LSTM and forming a first text global vector from all the context vectors according to an attention mechanism; extracting context vectors of the second semantic text vector through a bidirectional model LSTM, and forming all the context vectors into a second text global vector according to an attention mechanism;
and the similarity judging module is used for calculating the similarity of the first text global vector and the second text global vector through the linear layer and judging whether the first text to be matched is matched with the second text.
The technical scheme has the following beneficial effects: dynamic semantic coding and vectorization are performed on the two texts to be matched, each coding vector is then passed into a bidirectional model LSTM, a dual attention mechanism is used to refine the context vectors, and the two texts to be matched generate their respective global vectors. Finally, the global vectors are passed into a linear layer to calculate the text similarity and obtain the semantic matching result, so that higher text matching accuracy can be achieved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of a dynamic semantic coding and dual attention based text matching method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a dynamic semantic code and dual attention based text matching apparatus according to an embodiment of the present invention;
FIG. 3 is a text matching model architecture of an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, in combination with the embodiment of the present invention, there is provided a text matching method based on dynamic semantic coding and double attention, including:
s101: acquiring a first text and a second text to be matched, and performing dynamic semantic coding and vectorization on the first text and the second text to obtain a first semantic text vector and a second semantic text vector;
s102: extracting context vectors of the first semantic text vector through a bidirectional model LSTM, and forming all the context vectors into a first text global vector according to an attention mechanism; extracting context vectors of the second semantic text vector through a bidirectional model LSTM, and forming all the context vectors into a second text global vector according to an attention mechanism;
s103: and calculating the similarity of the first text global vector and the second text global vector through the linear layer, and judging whether the first text to be matched is matched with the second text.
Preferably, in step 101, the method specifically includes:
s1011: sequentially combining the first text and the second text to obtain a combined text;
s1012: transmitting the merged text into a BERT model, and interactively coding the characters of the first text and the characters of the second text through the BERT model to form a merged text vector;
s1013: extracting vectors corresponding to all characters of the first text from the merged text vector, and splicing according to original positions of all the characters in the first text to obtain a first semantic text vector; and extracting vectors corresponding to the characters of the second text from the merged text vector, and splicing according to the original positions of the characters in the second text to obtain a second semantic text vector.
Preferably, step 102 specifically includes:
s1021: extracting an upper context vector positioned before a character vector time step in the first semantic text vector and extracting a lower context vector positioned after the character vector time step through a bidirectional model LSTM; according to an attention mechanism, transmitting an upper vector and a lower vector of the character vector to the character vector, and performing matrix calculation on the character vector, the upper vector and the lower vector of the character vector to obtain a context vector of the character;
s1022: extracting an upper vector positioned before a character vector time step in the second semantic text vector and extracting a lower vector positioned after the character vector time step through a bidirectional model LSTM; according to the attention mechanism, the upper vector and the lower vector of the character vector are transmitted to the character vector, and the character vector, the upper vector and the lower vector of the character vector are subjected to matrix calculation to obtain the context vector of the character.
Preferably, step 102 further comprises:
s1023: calculating to obtain a first text global vector according to the context vector of each character in the first text and the corresponding contribution weight;
s1024: and calculating to obtain a second text global vector according to the context vector of each character in the second text and the corresponding contribution weight.
Preferably, step 103 specifically includes:
s1031: splicing the first text global vector and the second text global vector to obtain a combined vector;
s1032: and carrying out normalization processing on the combined vector through a linear layer to obtain the matching similarity of the first text and the second text.
As shown in fig. 2, in conjunction with the embodiment of the present invention, there is provided a text matching apparatus based on dynamic semantic coding and double attention, including:
the dynamic coding module 21 is configured to obtain a first text and a second text to be matched, perform dynamic semantic coding on the first text and the second text, and perform vectorization on the first text and the second text to obtain a first semantic text vector and a second semantic text vector;
a double attention module 22, configured to extract context vectors of the first semantic text vector through a bidirectional model LSTM, and form a first text global vector from all the context vectors according to an attention mechanism; extracting context vectors of the second semantic text vector through a bidirectional model LSTM, and forming all the context vectors into a second text global vector according to an attention mechanism;
and the similarity judging module 23 is configured to calculate a similarity between the first text global vector and the second text global vector through the linear layer, and judge whether the first text and the second text to be matched are matched.
Preferably, the dynamic encoding module 21 includes:
the text merging submodule 211 is configured to sequentially merge the first text and the second text to obtain a merged text;
the encoding vectorization sub-module 212 is used for transmitting the combined text into a BERT model, and interactively encoding the characters of the first text and the characters of the second text through the BERT model to form a combined text vector;
the separation sub-module 213 is configured to extract a vector corresponding to each character of the first text from the merged text vector, and splice the vectors according to the original positions of the characters in the first text to obtain a first semantic text vector; and extracting vectors corresponding to the characters of the second text from the merged text vector, and splicing according to the original positions of the characters in the second text to obtain a second semantic text vector.
Preferably, the dual attention module 22 comprises:
a first text context vector submodule 221, configured to extract, through a bidirectional model LSTM, a context vector located before a character vector time step in a first semantic text vector, and extract a context vector located after the character vector time step; according to an attention mechanism, transmitting an upper vector and a lower vector of the character vector to the character vector, and performing matrix calculation on the character vector, the upper vector and the lower vector of the character vector to obtain a context vector of the character;
a second text context vector submodule 222, configured to extract, through a bidirectional model LSTM, a context vector located before a character vector time step in a second semantic text vector, and extract a context vector located after the character vector time step; according to the attention mechanism, the upper vector and the lower vector of the character vector are transmitted to the character vector, and the character vector, the upper vector and the lower vector of the character vector are subjected to matrix calculation to obtain the context vector of the character.
Preferably, the dual attention module 22 further comprises:
the first text global vector generation module 223 is configured to calculate a first text global vector according to the context vector of each character in the first text and the corresponding contribution weight thereof;
and a second text global vector generation module 224, configured to calculate a second text global vector according to the context vector of each character in the second text and the corresponding contribution weight thereof.
Preferably, the similarity judging module 23 includes:
the to-be-matched text vector combination submodule 231 is used for splicing the first text global vector and the second text global vector to obtain a combined vector;
and the normalization submodule 232 is configured to perform normalization processing on the combined vector through the linear layer to obtain matching similarity between the first text and the second text.
The technical scheme of the embodiment of the invention has the following beneficial effects: the method uses the BERT model to perform dynamic semantic coding and vectorization on the two texts to be matched and then obtains an independent vector for each text, thereby obtaining more meaningful coding vectors. Each coding vector is then passed into a bidirectional model LSTM, and a dual attention mechanism is used to refine, for each character, the above and below vectors that carry relevant meaning, so that the two texts to be matched generate their respective global vectors. That is: a bidirectional model LSTM is used to fuse context information into the vector of each time step of the text sequence to obtain the corresponding context coding vectors, and an attention mechanism is then applied to encode the obtained context coding vectors separately to obtain the global text vectors. Finally, the global vectors are passed into a linear layer to calculate the text similarity and obtain the semantic matching result, so that higher text matching accuracy can be achieved.
The above technical solutions of the embodiments of the present invention are described in detail below with reference to specific application examples, and reference may be made to the foregoing related descriptions for technical details that are not described in the implementation process.
Abbreviations and Key term definitions of the invention
NLP: natural language processing
Bidirectional LSTM: classical LSTM is a one-way model that always receives at each time instant a hidden state from the previous time instant and a new word vector as input. However, for a word in a sentence, it is possible that information before and after the word plays an important role in predicting the word, and the unidirectional model can only obtain information before, but the bidirectional LSTM can simultaneously obtain information before and after a word and be used for prediction, so the bidirectional LSTM effect is generally better than the unidirectional LSTM effect
The Attention mechanism: the Attention mechanism helps the model focus its computation on the key parts. Taking a Sequence-to-Sequence machine translation model as an example, the encoder is responsible for encoding the source text, and the decoder is responsible for decoding the encoding vectors produced by the encoder into the target text. During decoding at a certain moment, however, the next word to be predicted may be related to only one particular word of the source text; the Attention mechanism can then learn to increase the weight of the vector of that source-text word, helping the model output the next target word accurately.
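As an illustration of the weighting idea only, a minimal dot-product attention sketch in PyTorch; the dimensions and the dot-product scoring are assumptions, not the exact formulation used by the invention:

    import torch
    import torch.nn.functional as F

    encoder_states = torch.randn(6, 128)     # one vector per source-text token
    decoder_state = torch.randn(128)         # state at the current decoding step

    scores = encoder_states @ decoder_state  # relevance of each source token, shape (6,)
    weights = F.softmax(scores, dim=0)       # larger weight for the more relevant tokens
    context = weights @ encoder_states       # weighted summary used to predict the next target word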
Semantic dynamic coding: after training on a very large corpus, a model acquires the basic abilities of a language, such as guessing a missing word in a sentence. It can encode a text sequence according to its semantics and produce the corresponding word encoding vectors, so that during prediction different input text sequences are encoded dynamically. The BERT model adopted in the invention is such a dynamic semantic coding model.
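As an illustration only, a minimal sketch with the Hugging Face transformers library; the checkpoint name is an assumption, and any pre-trained BERT would behave similarly:

    import torch
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint
    model = BertModel.from_pretrained("bert-base-chinese")
    model.eval()

    with torch.no_grad():
        a = model(**tokenizer("我爱读书", return_tensors="pt")).last_hidden_state
        b = model(**tokenizer("书很好看", return_tensors="pt")).last_hidden_state
    # The vector produced for the character "书" differs between the two sentences,
    # because each character is encoded dynamically from its surrounding context,
    # unlike a fixed word2vec lookup table.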
The invention provides a text matching method and device based on dynamic semantic coding and double attention. As shown in fig. 3, the invention first uses a pre-trained BERT model to perform dynamic semantic coding on the two input texts to be matched, then passes the coding vectors of the two texts into bidirectional LSTMs with an attention mechanism to obtain their global text vectors, and finally obtains the text matching result through a linear-layer calculation.
The method proposed by the present invention is described in detail below. As shown in fig. 1, the method is roughly divided into six layers: an Input Layer, an Encoder Layer, a Context Layer, an Attention Layer, a Dropout Layer, and an MLP Layer. The functions of these layers are described in detail below.
Input Layer
The main function of this layer is to organize the text sequences into an input form suitable for the model. Assume the two text sequences to be matched are S1 = {x1, x2, x3, …, xm} and S2 = {x1, x2, x3, …, xn}, where each xi is a character of the text sequence. It is desirable for the two input text sequences to interact during encoding so as to generate encoding vectors with interactive meaning, so the two text sequences are organized together into one text character sequence and passed to the BERT model, which dynamically encodes each character; each character is encoded using the information of the other characters in the whole combined character sequence, which of course includes the characters of the other text. That is: when the characters of one text are encoded, the information of the other text is blended in through the interaction of the character sequences to help the task.
The input data (a sequence of text characters) is typically organized as: [CLS, S1, SEP, S2, SEP, PAD_S], wherein:
CLS is the starting token of the sequence and SEP is the separator token; a token is simply a character in the text, i.e. an element of the text sequence, and the x1 and x2 above are both tokens. PAD_S pads a sequence that does not reach the fixed sequence length and is typically filled with 0; the padding is done at the input layer. For example, if the maximum sequence length is 20, the first text has 3 characters, the second text has 5 characters, and there are 1 CLS and 2 SEPs, then 20 - 3 - 5 - 1 - 2 = 9 padding positions are needed. If a sequence length (text 1 length + text 2 length + 1 CLS + 2 SEPs) exceeds the predetermined maximum length, the sequence is truncated, and the longer of text 1 and text 2 is truncated preferentially.
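As an illustration of this length bookkeeping only, a hypothetical Python helper; the maximum length of 20 and the truncation rule follow the paragraph above:

    def pad_or_truncate(tokens1, tokens2, max_len=20):
        # Budget: 1 CLS + len(text1) + 1 SEP + len(text2) + 1 SEP, then pad with 0s.
        while 1 + len(tokens1) + 1 + len(tokens2) + 1 > max_len:
            # Preferentially truncate the longer of the two texts.
            if len(tokens1) >= len(tokens2):
                tokens1 = tokens1[:-1]
            else:
                tokens2 = tokens2[:-1]
        seq = ["CLS"] + tokens1 + ["SEP"] + tokens2 + ["SEP"]
        return seq + [0] * (max_len - len(seq))

    # With 3 characters in text 1 and 5 in text 2, 20 - 3 - 5 - 1 - 2 = 9 zeros are appended.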
According to this form, the model has three types of inputs: the token id (the position of each character in the original vocabulary), the segment id (which text segment the character belongs to), and the mask id (whether the character at that position really exists in the corresponding text).
The token id is the id of each token in a vocabulary (a pre-existing dictionary), obtained by splitting the text sequences S1 and S2 into characters to get the token sequence and converting each token in turn. For example, if the pre-existing dictionary is [tree, book, i, read, love, good] and the text is "i love to read", the corresponding token ids are [2, 4, 3, 1], i.e. the positions in the dictionary, counted from 0.
The segment id indicates whether a character's token belongs to S1 or S2: the position is 0 for the former and 1 for the latter.
The mask id indicates which tokens are real inputs rather than tokens added to pad the sentence to a uniform length: a padding token is marked 0 and a real token 1.
The specific operation is as follows. The first text and the second text are combined in a certain form to obtain a character sequence of token ids with a preset length, namely: each character of the first text is converted in turn into its position attribute in the word list, and each character of the second text is converted in turn into its position attribute in the word list, yielding the position attribute sequence of the combined text; the same character always has the same position id in the reference word list.
The text attribute segment id of each character in the combined text is set to mark whether the character comes from the first text or the second text (i.e. the segment id indicates whether a character in the token id sequence belongs to the first text or the second text), yielding the text attribute sequence of the combined text.
The presence attribute mask id of each character in the merged text is set to mark whether the character was added when padding the text (i.e. the mask id indicates whether a character of the token id sequence is a padding character), yielding the presence attribute sequence of the merged text.
The constructed segment id and mask id sequences have the same length as the token id sequence, i.e. the position attribute sequence, the text attribute sequence, and the presence attribute sequence of the merged text all have the same length.
The constructed position attribute sequence, text attribute sequence, and presence attribute sequence of the combined text are then passed into the BERT model, which interactively encodes the characters of the first text and the characters of the second text to form a combined text vector.
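As an illustration only, a minimal Python sketch that builds the three input sequences for the toy example used later in this description; the special-token ids are placeholders rather than BERT's real ids:

    vocab = ["tree", "book", "i", "read", "love", "good"]      # example dictionary, numbered from 0
    word2id = {w: i for i, w in enumerate(vocab)}

    text1 = ["i", "love", "read", "book"]                      # "i love to read"
    text2 = ["i", "love", "good", "read", "book"]              # second text of the later example
    max_len = 15
    CLS, SEP = "cls", "sep"                                    # placeholder special tokens

    tokens = [CLS] + text1 + [SEP] + text2 + [SEP]
    token_id = [t if t in (CLS, SEP) else word2id[t] for t in tokens]
    segment_id = [0] * (len(text1) + 2) + [1] * (len(text2) + 1)   # 0 = first text, 1 = second text
    mask_id = [1] * len(tokens)                                    # 1 = real character

    pad = max_len - len(tokens)                                # pad all three sequences with 0s
    token_id += [0] * pad
    segment_id += [0] * pad
    mask_id += [0] * pad
    # token_id is now [cls, 2, 4, 3, 1, sep, 2, 4, 5, 3, 1, sep, 0, 0, 0], matching the example below.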
Encoder Layer
This layer encodes the input sequence: after passing through several encoder layers, the input sequence yields a coding vector (a 312-dimensional vector) for each token. The coding vectors belonging to the sequences S1 and S2 are then separated in order, forming the vector sequences P = {p1, p2, p3, …, pm} and Q = {q1, q2, q3, …, qn} as the input of the next layer, where the dimensions of pi and qi are both 1 × h, and h is the number of hidden-layer units in the network (the hidden-layer unit count refers to the network's hidden_size, a parameter that defines the output size of the corresponding layer).
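As an illustration only, a minimal PyTorch sketch of this layer; the randomly initialised 312-dimensional BERT and the toy lengths are assumptions (the invention would load a pre-trained checkpoint), and the slicing assumes the [CLS, S1, SEP, S2, SEP, PAD] layout of the Input Layer:

    import torch
    from transformers import BertConfig, BertModel

    # Randomly initialised BERT with hidden size 312, purely to illustrate the shapes.
    bert = BertModel(BertConfig(hidden_size=312, num_hidden_layers=4,
                                num_attention_heads=12, intermediate_size=1248))

    m, n, max_len = 4, 5, 15                                 # lengths of S1, S2 and the padded sequence
    token_id = torch.randint(1, 100, (1, max_len))           # stand-in for the Input Layer token ids
    segment_id = torch.zeros(1, max_len, dtype=torch.long)
    segment_id[0, m + 2:m + 3 + n] = 1                       # S2 and its SEP belong to segment 1
    mask_id = torch.ones(1, max_len, dtype=torch.long)       # all positions real in this toy case

    out = bert(input_ids=token_id, token_type_ids=segment_id,
               attention_mask=mask_id).last_hidden_state     # (1, 15, 312): one vector per token
    P = out[:, 1:1 + m, :]                                   # p1 .. pm for S1 (skip CLS)
    Q = out[:, m + 2:m + 2 + n, :]                           # q1 .. qn for S2 (skip the first SEP)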
Context Layer
The layer is used for fusing context information (the content in front of the characters and the content behind the characters) of the characters in the text into a vector of each time step (character) of the text sequence to obtain a corresponding context coding vector.
For example, suppose the vocabulary is [tree, book, i, read, love, good], the maximum sequence length is 15, text 1 is "i love reading" and text 2 is "i like reading". Texts 1 and 2 are organized together to form a token id sequence: [cls, 2, 4, 3, 1, sep, 2, 4, 5, 3, 1, sep, 0, 0, 0]. The token id sequence is passed into the BERT model, whose output is 15 × 312, i.e. a total of 15 vectors of 312 dimensions, each corresponding to one position of the token id sequence. This output is then passed into a bidirectional LSTM, which processes the sequence over time, each time step corresponding to one character of the token id sequence.
That is, the input to the LSTM model is the corresponding BERT output vector, and the output of the LSTM model is the context vector. Each time step produces an output (a context vector), computed from the vector at that time step together with the vectors before and after it; the context vector of the current time step therefore integrates information from the preceding and following time steps. All vectors before a time step serve as its above vectors, and all vectors after it serve as its below vectors.
Compared with a unidirectional LSTM, a bidirectional LSTM can obtain information both before and after each time step, and therefore works better. The vector sequences P and Q are computed separately by two bidirectional LSTMs to obtain their respective context vectors. Taking the vector sequence P as an example, for the token at each time step i, the context vector is calculated as follows:
The above (forward) vector:

→h_i^p = LSTM_fw(p_i, →h_{i-1}^p)  (1)

The below (backward) vector:

←h_i^p = LSTM_bw(p_i, ←h_{i+1}^p)  (2)

The forward and backward outputs →h_i^p and ←h_i^p are then spliced according to formula (3):

h_i^p = [→h_i^p ; ←h_i^p]  (3)

where h_i^p denotes the hidden state (context vector) of the sequence P at time step i. As described above, all the context vectors of the sequence P can be obtained by formula (3).

Applying the same bidirectional LSTM logic to the sequence Q yields the context vector output h_i^q at each time step.
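As an illustration only, a minimal PyTorch sketch of this layer; the hidden size of 150 per direction is an assumed value, and P and Q stand in for the Encoder Layer outputs:

    import torch
    import torch.nn as nn

    P = torch.randn(1, 4, 312)                                # stand-ins for the Encoder Layer outputs
    Q = torch.randn(1, 5, 312)

    lstm_p = nn.LSTM(312, 150, batch_first=True, bidirectional=True)
    lstm_q = nn.LSTM(312, 150, batch_first=True, bidirectional=True)

    h_p, _ = lstm_p(P)    # (1, 4, 300): each time step is [forward ; backward], as in formula (3)
    h_q, _ = lstm_q(Q)    # the same bidirectional-LSTM logic applied to Q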
Attention Layer
This layer computes a global vector from the context vectors h^p = {h_1^p, …, h_l^p} and h^q = {h_1^q, …, h_l^q} obtained from the Context Layer, where l is the maximum input length after padding for the bidirectional LSTM, i.e. the maximum length of the sequence obtained by organizing text 1 and text 2 together. First, a linear transformation is applied to h^p and h^q; taking h^p as an example, the transformation is performed according to the following formula:

h^p = f(w_p h^p + b_p)  (4)

where f is an activation function such as tanh. The weight with which each vector of the sequences h^p and h^q contributes to the global vector is then calculated according to formula (5):

α_i = exp(u_p · h_i^p) / Σ_{j=1}^{l} exp(u_p · h_j^p)  (5)

where u_p is a trainable parameter vector. Finally, the global vector of the sequence h^p is calculated as the weighted sum:

p = Σ_{i=1}^{l} α_i h_i^p  (6)

Formula (6) yields the global vector p of the sequence h^p. Applying the same calculation to the sequence h^q yields its global vector q.
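As an illustration only, a minimal PyTorch sketch of the pooling in formulas (4)-(6); using tanh as the activation f and taking the weighted sum over the transformed vectors are assumptions consistent with the description above:

    import torch
    import torch.nn as nn

    class AttentionPool(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.linear = nn.Linear(dim, dim)         # formula (4): linear transformation w, b
            self.u = nn.Parameter(torch.randn(dim))   # trainable parameter vector u

        def forward(self, h):                         # h: (batch, seq_len, dim) context vectors
            h = torch.tanh(self.linear(h))            # formula (4) with f = tanh
            alpha = torch.softmax(h @ self.u, dim=1)  # formula (5): contribution weights
            return (alpha.unsqueeze(-1) * h).sum(1)   # formula (6): weighted sum -> global vector

    h_p = torch.randn(1, 4, 300)                      # stand-ins for the Context Layer outputs
    h_q = torch.randn(1, 5, 300)
    p, q = AttentionPool(300)(h_p), AttentionPool(300)(h_q)   # global vectors of the two texts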
Dropout Layer (random deactivation layer)
The Dropout mechanism makes neurons in the network stop working with a certain probability p during model training, which effectively prevents overfitting of the model (overfitting means the hypothesis is made overly strict in order to obtain a consistent fit) and improves the generalization capability of the model.
MLP Layer (multilayer perceptron)
This layer consists of three parts. First, the input global vectors p and q are spliced according to formula (7) to obtain a vector v:

v = [p; q]  (7)

The vector v is then passed into a linear layer and finally normalized by Softmax to obtain the final text matching result; the relevant formula is:

r = Softmax(vw + b)  (8)

where w is the weight of the linear layer, b is the bias of the linear layer, and r ∈ R^{1×2} is the text matching result, whose 2nd dimension represents the similarity of the two sequences. For example, if a sample's result is [0.1, 0.9], the first dimension 0.1 is the probability of label 0, i.e. not similar, and the second dimension 0.9 is the probability of label 1, i.e. similar. When the similarity probability is not smaller than a preset threshold, the first text and the second text are judged to match; otherwise, they are judged not to match.
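As an illustration only, a minimal PyTorch sketch of the Dropout and MLP Layers; the dropout probability and the 0.5 threshold are assumed values:

    import torch
    import torch.nn as nn

    p = torch.randn(1, 300)                            # stand-ins for the Attention Layer global vectors
    q = torch.randn(1, 300)

    dropout = nn.Dropout(p=0.1)                        # Dropout Layer, assumed probability
    linear = nn.Linear(600, 2)                         # linear layer over the spliced vector [p; q]

    v = torch.cat([p, q], dim=-1)                      # formula (7): v = [p; q], here 300 + 300 dims
    r = torch.softmax(linear(dropout(v)), dim=-1)      # formula (8): normalised matching result

    similarity = r[:, 1]                               # 2nd dimension: probability the texts match
    matched = similarity >= 0.5                        # assumed preset threshold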
In summary, the method uses the BERT model to perform dynamic semantic coding on the two texts to be matched and then obtains independent coding vectors, thereby obtaining more meaningful coding vectors. Each coding vector is then passed into a bidirectional model LSTM, and a dual attention mechanism is used to refine the effective data of each character, so that the two texts to be matched generate their respective global vectors. That is: a bidirectional model LSTM is used to fuse the context information of the text into the vector of each time step (character) to obtain the corresponding context coding vectors, and an attention mechanism is then applied to encode the obtained context coding vectors separately to obtain the global text vectors. Finally, the global vectors are passed into a linear layer to calculate the text similarity and obtain the semantic matching result, so that higher text matching accuracy can be achieved.
The proposed method is compared with other representative methods on the Chinese data set LCQMC, and the experimental results show a clear improvement. The following table reports the ablation experiments performed for the invention, in which, for example, the maximum length for BERT is 128 and the maximum length for the LSTM is 50. First, the proposed method trains the BERT encoder jointly; to verify the effectiveness of this choice, a variant in which only the model part after the Encoder Layer is trained (the BERT encoder is not fine-tuned) is used for comparison, and this experiment is labeled EXP1. Second, the linear transformation of the Attention Layer (formula 4) is removed and attention is calculated directly on the output of the bidirectional LSTM layer; this experiment is labeled EXP2. Finally, to verify the effect of the Dropout Layer on the model, the Dropout Layer is deleted; this experiment is labeled EXP3. The specific experimental data are shown in the following table:
Table: comparative results of the ablation experiments
Metrics EXP1 EXP2 EXP3 Proposed
F1 86.02 87.53 87.90 88.34
Accuracy 85.10 86.43 87.04 87.67
As can be seen from the data in the table, whether the BERT model participates in training has a great influence on the result; the improvement on the Accuracy metric in particular is obvious, at 2.57. In addition, the results of EXP2 are worse than those of EXP3, indicating that the linear transformation contributes more than the Dropout layer. The proposed method performs best of all and improves on every ablation variant, which shows that the model network structure adopted by the invention and the way the BERT model and the bidirectional model LSTM are trained on the text are reasonable, so the method performs better in semantic matching of texts and the matching precision is improved.
It should be understood that the specific order or hierarchy of steps in the processes disclosed is an example of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not intended to be limited to the specific order or hierarchy presented.
In the foregoing detailed description, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the subject matter require more features than are expressly recited in each claim. Rather, as the following claims reflect, invention lies in less than all features of a single disclosed embodiment. Thus, the following claims are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate preferred embodiment of the invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
What has been described above includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the aforementioned embodiments, but one of ordinary skill in the art may recognize that many further combinations and permutations of various embodiments are possible. Accordingly, the embodiments described herein are intended to embrace all such alterations, modifications and variations that fall within the scope of the appended claims. Furthermore, to the extent that the term "includes" is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term "comprising" as "comprising" is interpreted when employed as a transitional word in a claim. Furthermore, any use of the term "or" in the specification of the claims is intended to mean a "non-exclusive or".
Those of skill in the art will further appreciate that the various illustrative logical blocks, units, and steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate the interchangeability of hardware and software, various illustrative components, elements, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design requirements of the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present embodiments.
The various illustrative logical blocks, or elements, described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor, an Application Specific Integrated Circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a digital signal processor and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a digital signal processor core, or any other similar configuration.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. For example, a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, which may be located in a user terminal. In the alternative, the processor and the storage medium may reside in different components in a user terminal.
In one or more exemplary designs, the functions described above in connection with the embodiments of the invention may be implemented in hardware, software, firmware, or any combination of the three. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media that facilitate transfer of a computer program from one place to another. Storage media may be any available media that can be accessed by a general purpose or special purpose computer. For example, such computer-readable media can include, but is not limited to, RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store program code in the form of instructions or data structures and which can be read by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Additionally, any connection is properly termed a computer-readable medium, and, thus, is included if the software is transmitted from a website, server, or other remote source via a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wirelessly, e.g., infrared, radio, and microwave. Such discs (disk) and disks (disc) include compact disks, laser disks, optical disks, DVDs, floppy disks and blu-ray disks where disks usually reproduce data magnetically, while disks usually reproduce data optically with lasers. Combinations of the above may also be included in the computer-readable medium.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A text matching method based on dynamic semantic coding and double attention is characterized by comprising the following steps:
acquiring a first text and a second text to be matched, and performing dynamic semantic coding and vectorization on the first text and the second text to obtain a first semantic text vector and a second semantic text vector;
extracting context vectors of the first semantic text vector through a bidirectional model LSTM, and forming all the context vectors into a first text global vector according to an attention mechanism; extracting context vectors of the second semantic text vector through a bidirectional model LSTM, and forming all the context vectors into a second text global vector according to an attention mechanism;
and calculating the similarity of the first text global vector and the second text global vector through the linear layer, and judging whether the first text to be matched is matched with the second text.
2. The text matching method based on dynamic semantic coding and double attention of claim 1, wherein the dynamic semantic coding and vectorization is performed on the first text and the second text to obtain a semantic vector of the first text and a semantic vector of the second text, and specifically comprises:
sequentially combining the first text and the second text to obtain a combined text;
transmitting the merged text into a BERT model, and interactively coding the characters of the first text and the characters of the second text through the BERT model to form a merged text vector;
extracting vectors corresponding to all characters of the first text from the merged text vector, and splicing according to original positions of all the characters in the first text to obtain a first semantic text vector; and extracting vectors corresponding to the characters of the second text from the merged text vector, and splicing according to the original positions of the characters in the second text to obtain a second semantic text vector.
3. The dynamic semantic coding and bi-attention based text matching method according to claim 2, wherein the context vector of the first semantic text vector is extracted through a bidirectional model LSTM, and all the context vectors are formed into a first text global vector according to an attention mechanism; and extracting context vectors of the second semantic text vector through a bidirectional model LSTM, and forming all the context vectors into a second text global vector according to an attention mechanism, wherein the method specifically comprises the following steps:
extracting an upper context vector positioned before a character vector time step in the first semantic text vector and extracting a lower context vector positioned after the character vector time step through a bidirectional model LSTM; according to an attention mechanism, transmitting an upper vector and a lower vector of the character vector to the character vector, and performing matrix calculation on the character vector, the upper vector and the lower vector of the character vector to obtain a context vector of the character; and the number of the first and second groups,
extracting an upper vector positioned before a character vector time step in the second semantic text vector and extracting a lower vector positioned after the character vector time step through a bidirectional model LSTM; according to the attention mechanism, the upper vector and the lower vector of the character vector are transmitted to the character vector, and the character vector, the upper vector and the lower vector of the character vector are subjected to matrix calculation to obtain the context vector of the character.
4. The dynamic semantic coding and bi-attention based text matching method of claim 3, wherein the bi-directional model LSTM extracts context vectors of a first semantic text vector, and forms all context vectors into a first text global vector according to an attention mechanism; and extracting context vectors for the second semantic text vector through a bidirectional model LSTM, and forming all the context vectors into a second text global vector according to an attention mechanism, and the method further comprises the following steps:
calculating to obtain a first text global vector according to the context vector of each character in the first text and the corresponding contribution weight;
and calculating to obtain a second text global vector according to the context vector of each character in the second text and the corresponding contribution weight.
5. The text matching method based on dynamic semantic coding and bi-attention as claimed in claim 4, wherein the calculating the similarity of the first text global vector and the second text global vector through the linear layer specifically comprises:
splicing the first text global vector and the second text global vector to obtain a combined vector;
and carrying out normalization processing on the combined vector through a linear layer to obtain the matching similarity of the first text and the second text.
6. A text matching apparatus based on dynamic semantic coding and double attention, comprising:
the dynamic coding module is used for acquiring a first text and a second text to be matched, performing dynamic semantic coding on the first text and the second text and vectorizing to obtain a first semantic text vector and a second semantic text vector;
the double attention module is used for extracting context vectors of the first semantic text vector through a bidirectional model LSTM and forming a first text global vector from all the context vectors according to an attention mechanism; extracting context vectors of the second semantic text vector through a bidirectional model LSTM, and forming all the context vectors into a second text global vector according to an attention mechanism;
and the similarity judging module is used for calculating the similarity of the first text global vector and the second text global vector through the linear layer and judging whether the first text to be matched is matched with the second text.
7. The dynamic semantic code and dual attention based text matching device of claim 6 wherein the dynamic coding module comprises:
the text merging submodule is used for sequentially merging the first text and the second text to obtain a merged text;
the encoding vectorization sub-module is used for transmitting the combined text into a BERT model and interactively encoding the characters of the first text and the characters of the second text through the BERT model to form a combined text vector;
the separation submodule is used for extracting vectors corresponding to all characters of the first text from the merged text vectors, and splicing the vectors according to the original positions of all the characters in the first text to obtain a first semantic text vector; and extracting vectors corresponding to the characters of the second text from the merged text vector, and splicing according to the original positions of the characters in the second text to obtain a second semantic text vector.
8. The dynamic semantic code and dual attention based text matching device of claim 7 wherein the dual attention module comprises:
the first text context vector submodule is used for extracting an upper context vector positioned before a character vector time step in the first semantic text vector and extracting a lower context vector positioned after the character vector time step through a bidirectional model LSTM; according to an attention mechanism, transmitting an upper vector and a lower vector of the character vector to the character vector, and performing matrix calculation on the character vector, the upper vector and the lower vector of the character vector to obtain a context vector of the character;
the second text context vector submodule is used for extracting an upper context vector positioned before a character vector time step in the second semantic text vector and extracting a lower context vector positioned after the character vector time step through a bidirectional model LSTM; according to the attention mechanism, the upper vector and the lower vector of the character vector are transmitted to the character vector, and the character vector, the upper vector and the lower vector of the character vector are subjected to matrix calculation to obtain the context vector of the character.
9. The dynamic semantic code and dual attention based text matching device of claim 8 wherein the dual attention module further comprises:
the first text global vector generation module is used for calculating to obtain a first text global vector according to the context vector of each character in the first text and the corresponding contribution weight;
and the second text global vector generation module is used for calculating to obtain a second text global vector according to the context vector of each character in the second text and the corresponding contribution weight.
10. The dynamic semantic code and double attention based text matching device according to claim 9, wherein the similarity determination module comprises:
the text vector combination submodule to be matched is used for splicing the first text global vector and the second text global vector to obtain a combined vector;
and the normalization submodule is used for performing normalization processing on the combined vector through the linear layer to obtain the matching similarity of the first text and the second text.
CN202010387348.6A 2020-05-09 2020-05-09 Text matching method and device based on dynamic semantic coding and double attention Withdrawn CN111611346A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010387348.6A CN111611346A (en) 2020-05-09 2020-05-09 Text matching method and device based on dynamic semantic coding and double attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010387348.6A CN111611346A (en) 2020-05-09 2020-05-09 Text matching method and device based on dynamic semantic coding and double attention

Publications (1)

Publication Number Publication Date
CN111611346A true CN111611346A (en) 2020-09-01

Family

ID=72196524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010387348.6A Withdrawn CN111611346A (en) 2020-05-09 2020-05-09 Text matching method and device based on dynamic semantic coding and double attention

Country Status (1)

Country Link
CN (1) CN111611346A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112489740A (en) * 2020-12-17 2021-03-12 北京惠及智医科技有限公司 Medical record detection method, training method of related model, related equipment and device
CN112965062A (en) * 2021-02-09 2021-06-15 西安电子科技大学 Radar range profile target identification method based on LSTM-DAM network
CN112965062B (en) * 2021-02-09 2024-02-20 西安电子科技大学 Radar range profile target recognition method based on LSTM-DAM network
CN113626557A (en) * 2021-05-17 2021-11-09 四川大学 Intelligent law enforcement recommendation auxiliary system based on element labeling and BERT and RCNN algorithms
CN113505218A (en) * 2021-09-07 2021-10-15 科大讯飞(苏州)科技有限公司 Text extraction method, text extraction system, electronic device and storage device
CN113505218B (en) * 2021-09-07 2021-12-21 科大讯飞(苏州)科技有限公司 Text extraction method, text extraction system, electronic device and storage device
CN114492450A (en) * 2021-12-22 2022-05-13 马上消费金融股份有限公司 Text matching method and device
CN116842932A (en) * 2023-08-30 2023-10-03 腾讯科技(深圳)有限公司 Text feature decoding method and device, storage medium and electronic equipment
CN116842932B (en) * 2023-08-30 2023-11-14 腾讯科技(深圳)有限公司 Text feature decoding method and device, storage medium and electronic equipment
CN117669593A (en) * 2024-01-31 2024-03-08 山东省计算中心(国家超级计算济南中心) Zero sample relation extraction method, system, equipment and medium based on equivalent semantics
CN117669593B (en) * 2024-01-31 2024-04-26 山东省计算中心(国家超级计算济南中心) Zero sample relation extraction method, system, equipment and medium based on equivalent semantics

Similar Documents

Publication Publication Date Title
CN111611346A (en) Text matching method and device based on dynamic semantic coding and double attention
CN111694924A (en) Event extraction method and system
CN112989796B (en) Text naming entity information identification method based on syntactic guidance
CN111626062B (en) Text semantic coding method and system
CN109918681B (en) Chinese character-pinyin-based fusion problem semantic matching method
CN110309282B (en) Answer determination method and device
CN109522403A (en) A kind of summary texts generation method based on fusion coding
CN110837733A (en) Language model training method and system in self-reconstruction mode and computer readable medium
CN111625634A (en) Word slot recognition method and device, computer-readable storage medium and electronic device
CN113065358B (en) Text-to-semantic matching method based on multi-granularity alignment for bank consultation service
CN110196967A (en) Sequence labelling method and apparatus based on depth converting structure
CN110457713A (en) Interpretation method, device, equipment and storage medium based on Machine Translation Model
CN110222338B (en) Organization name entity identification method
CN115329766B (en) Named entity identification method based on dynamic word information fusion
CN112287672A (en) Text intention recognition method and device, electronic equipment and storage medium
CN110990555A (en) End-to-end retrieval type dialogue method and system and computer equipment
CN109637527A (en) The semantic analytic method and system of conversation sentence
CN115292463A (en) Information extraction-based method for joint multi-intention detection and overlapping slot filling
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN115935975A (en) Controllable-emotion news comment generation method
CN113254575B (en) Machine reading understanding method and system based on multi-step evidence reasoning
Park et al. Natural language generation using dependency tree decoding for spoken dialog systems
CN117877460A (en) Speech synthesis method, device, speech synthesis model training method and device
CN112732862A (en) Neural network-based bidirectional multi-section reading zero sample entity linking method and device
CN113807079A (en) End-to-end entity and relation combined extraction method based on sequence-to-sequence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20200901)