CN111666397B - Multi-model joint learning problem matching method and system - Google Patents

Multi-model joint learning problem matching method and system

Info

Publication number
CN111666397B
CN111666397B
Authority
CN
China
Prior art keywords
model
question
interaction
encoder
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010538105.8A
Other languages
Chinese (zh)
Other versions
CN111666397A (en)
Inventor
吴仁守
缪庆亮
俞凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sipic Technology Co Ltd
Original Assignee
Sipic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sipic Technology Co Ltd filed Critical Sipic Technology Co Ltd
Priority to CN202010538105.8A
Publication of CN111666397A
Application granted
Publication of CN111666397B

Classifications

    • G06F16/3329 Natural language query formulation or dialogue systems (G06F Electric digital data processing; G06F16/00 Information retrieval; G06F16/30 unstructured textual data; G06F16/33 Querying; G06F16/332 Query formulation)
    • G06F16/3344 Query execution using natural language analysis (G06F16/33 Querying; G06F16/3331 Query processing; G06F16/334 Query execution)
    • G06N3/045 Combinations of networks (G06N Computing arrangements based on specific computational models; G06N3/00 biological models; G06N3/02 Neural networks; G06N3/04 Architecture, e.g. interconnection topology)
    • G06N3/08 Learning methods (G06N3/02 Neural networks)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the invention provides a problem matching method for multi-model joint learning. The method comprises the following steps: establishing a coding-based first model; establishing an interaction-based second model; establishing a third model comprising an encoder, an interaction layer, a fusion unit and a classifier, wherein the encoder, the interaction layer and the classifier of the third model are the same as those of the second model and share parameters, and the fusion unit fuses the respective outputs of the encoder and the interaction layer, taking the output of the interaction layer as the main input and the output of the encoder as the auxiliary input during fusion; performing joint learning on the first, second and third models; and predicting the matching degree of a question pair using the learned third model. The embodiment of the invention also provides a problem matching system for multi-model joint learning. The embodiments improve prediction accuracy while maintaining prediction speed.

Description

Multi-model joint learning problem matching method and system
Technical Field
The invention relates to the field of problem matching, in particular to a problem matching method and system for multi-model joint learning.
Background
Question matching is a basic task in retrieval-based question-answering systems, also called the semantic matching or paraphrase identification task; its purpose is to search an existing database for questions whose intent is similar to the input question. Generally, given a pair of sentences, the problem matching model is required to determine whether the two sentences express the same meaning and to output a probability of match or mismatch. For example, given an input question such as "how to increase the credit line" and a candidate question, the model must judge whether the two questions have the same meaning; if so, both can be replied to with the same answer. Therefore, in a retrieval-based question-answering system, if a question matching the user's question exists in the question-answer library, that question's answer can be returned to answer the user. Problem matching models can generally be divided into two categories depending on whether cross-sentence features are used: (1) coding-based models, which compute the similarity of a sentence pair directly from the sentence vectors obtained by encoding; this type of model is generally simpler and easily generalizes to other natural language processing tasks; (2) interaction-based models, which consider word alignment and the interaction between the sentence pair on top of the sentence encoding vectors. Interaction-based models generally achieve better accuracy than coding-based models. However, to obtain better performance, an interaction-based model usually stacks multiple alignment layers that maintain intermediate states so as to gradually improve prediction accuracy; such deeper model structures are usually harder to train and slower at prediction, making them difficult to apply in real scenarios.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the related art:
the models with higher accuracy at present are usually interaction-based models, which typically include multiple alignment layers; such models are more complex and slower at prediction. Because practical applications place high demands on model response speed, a common solution is to use a simple model, sacrificing accuracy to improve prediction speed.
Disclosure of Invention
Embodiments of the invention aim to at least solve the problem in the prior art that interaction-based models are slow at prediction and therefore difficult to use in real scenarios.
In a first aspect, an embodiment of the present invention provides a problem matching method for multi-model joint learning, including:
establishing a first model based on coding, wherein the first model comprises a coder and a classifier;
establishing a second model based on interaction, wherein the second model comprises an encoder, an interaction layer and a classifier, the interaction layer uses a single-layer multi-head attention mechanism to interact sentence pairs, and the encoder and the classifier of the second model are the same as those of the first model and share parameters;
establishing a third model, wherein the third model comprises an encoder, an interaction layer, a fusion unit and a classifier, the encoder, the interaction layer and the classifier of the third model are the same as those of the second model and share parameters, the fusion unit fuses the outputs from the encoder and the interaction layer respectively, and the output of the interaction layer is taken as a main output and the output of the encoder is taken as an auxiliary output during fusion;
performing joint learning on the first model, the second model and the third model;
and predicting the matching degree of the problem pair by using the learned third model.
In a second aspect, an embodiment of the present invention provides a problem matching system for multi-model joint learning, including:
a coding model building program module for building a first model based on coding, the first model comprising an encoder and a classifier;
an interaction model building program module, configured to build a second model based on interaction, where the second model includes an encoder, an interaction layer, and a classifier, where the interaction layer uses a single-layer multi-head attention mechanism to interact with sentence pairs, and the encoder and the classifier of the second model are the same as those of the first model and share parameters;
the encoding interaction model building program module is used for building a third model, the third model comprises an encoder, an interaction layer, a fusion unit and a classifier, the encoder, the interaction layer and the classifier of the third model are the same as those of the second model and share parameters, the fusion unit fuses the outputs from the encoder and the interaction layer respectively, and the output of the interaction layer is taken as the main output and the output of the encoder as the auxiliary output during fusion;
a joint learning program module for performing joint learning on the first model, the second model, and the third model;
and the matching prediction program module is used for predicting the matching degree of the problem pair by utilizing the third model after learning.
In a third aspect, an electronic device is provided, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the multi-model joint learning problem matching method of any of the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention provides a storage medium, on which a computer program is stored, where the program is configured to, when executed by a processor, implement the steps of the problem matching method for multi-model joint learning according to any embodiment of the present invention.
The embodiment of the invention has the following beneficial effects: by performing joint learning on a coding-based model, an interaction-based model and a coding-interaction fusion model, and by sharing the same encoder, interaction layer and two-layer feedforward network at the final output, the problem matching model based on multi-model joint learning learns a more generalized text representation and understands the text better, thereby improving the generalization ability and accuracy of the matching model. Moreover, joint learning effectively avoids overfitting. Since the interaction-based problem matching model used is the simplest single-layer interaction model and the fusion method uses no complex network structure, the model improves prediction accuracy while keeping a prediction speed essentially the same as that of an interaction-based model performing only simple interaction.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flowchart of a problem matching method for multi-model joint learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a problem matching model based on multi-model fusion of a problem matching method for multi-model joint learning according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a multi-head attention mechanism of a problem matching method for multi-model joint learning according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a problem matching system for multi-model joint learning according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
Fig. 1 is a flowchart of a problem matching method for multi-model joint learning according to an embodiment of the present invention, which includes the following steps:
s11: establishing a first model based on coding, wherein the first model comprises a coder and a classifier;
s12: establishing a second model based on interaction, wherein the second model comprises an encoder, an interaction layer and a classifier, the interaction layer uses a single-layer multi-head attention mechanism to interact sentence pairs, and the encoder and the classifier of the second model are the same as those of the first model and share parameters;
s13: establishing a third model, wherein the third model comprises an encoder, an interaction layer, a fusion unit and a classifier, the encoder, the interaction layer and the classifier of the third model are the same as those of the second model and share parameters, the fusion unit fuses the outputs from the encoder and the interaction layer respectively, and the output of the interaction layer is taken as a main output and the output of the encoder is taken as an auxiliary output during fusion;
s14: performing joint learning on the first model, the second model and the third model;
s15: and predicting the matching degree of the problem pair by using the learned third model.
In this embodiment, the problem matching model based on multi-model joint learning performs joint learning on the coding-based model, the interaction-based model and the coding-interaction fusion model, sharing the same encoder, interaction layer (since the coding-based model needs no interaction, this layer is shared only by the interaction-based model and the coding-interaction fusion model) and two-layer feedforward network at the final output, so as to learn a more generalized text representation, help the model understand the text better, and thereby improve the generalization ability and accuracy of the matching model.
For step S11, the first model is a coding-based problem matching model that mainly includes two parts, an encoder and a classifier. The encoder does not use an LSTM (Long Short-Term Memory) or GRU (Gated Recurrent Unit), but a faster convolutional neural network: specifically, the input question pair can be encoded by convolutional networks with kernel sizes 1, 3 and 5, and the three resulting encoding vectors are concatenated as the final encoding vector.
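As a minimal sketch of such an encoder (PyTorch assumed; the 100-dimensional embeddings and 100 filters per kernel size are illustrative assumptions chosen so that the concatenated representation is 300-dimensional, matching the worked example later in this description):

```python
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    """Encodes a question with parallel convolutions of kernel sizes 1, 3 and 5."""
    def __init__(self, vocab_size, embed_dim=100, filters=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # One 1-D convolution per kernel size; padding keeps the sequence length.
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, filters, k, padding=k // 2) for k in (1, 3, 5)
        )

    def forward(self, token_ids):                    # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)    # (batch, embed_dim, seq_len)
        # Concatenate the three convolution outputs along the feature axis.
        feats = [torch.relu(conv(x)) for conv in self.convs]
        return torch.cat(feats, dim=1).transpose(1, 2)  # (batch, seq_len, 300)
```

Pooling over the sequence dimension (e.g. a max over time) then yields the single 300-dimensional sentence vector referred to below; whether pooling happens before or after interaction is not specified by the patent, so this is a design assumption.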
The classifier may use a simple two-layer feed-forward neural network, with the following details:

L1(x1, x2) = tanh(W_l1 [x1; x2; x1 ⊙ x2] + b_l1)

L2(x) = softmax(W_l2 x + b_l2)

where ⊙ denotes the dot product, tanh is the activation function, x1 and x2 are the encoding vectors of the input questions, and W and b are training parameters.
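A sketch of this two-layer classifier follows (PyTorch assumed; combining the pair as [x1; x2; x1 ⊙ x2] mirrors the reconstructed first-layer formula above and is an assumption, since the original equation is only partially legible):

```python
import torch
import torch.nn as nn

class PairClassifier(nn.Module):
    """Two-layer feed-forward classifier over a pair of sentence encodings."""
    def __init__(self, dim=300, hidden=300):
        super().__init__()
        self.l1 = nn.Linear(3 * dim, hidden)  # consumes [x1; x2; x1 * x2]
        self.l2 = nn.Linear(hidden, 2)        # two classes: match / mismatch

    def forward(self, x1, x2):
        h = torch.tanh(self.l1(torch.cat([x1, x2, x1 * x2], dim=-1)))
        return torch.softmax(self.l2(h), dim=-1)  # matching probabilities
```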
For step S12, the second model is an interaction-based problem matching model that mainly includes an encoder, an interaction layer and a classifier. The encoder and the classifier are the same as in the coding-based problem matching model and share its parameters, so they are not described again here. The interaction layer uses a single-layer multi-head attention mechanism to make the sentence pair interact.
For step S13, the problem matching model based on coding interaction mainly includes four parts, namely an encoder, an interaction layer, a fusion unit and a classifier, where the encoder, the interaction layer and the classifier are the same as in the interaction-based problem matching model and share parameters.
The method provides a new fusion unit for fusing the outputs of the coding-based and interaction-based problem matching models. Unlike common fusion units, which usually treat the pair of inputs to be fused as equally important, this fusion unit compares and fuses the pair around the relatively more important of the two. Since the interaction-based problem matching model generally performs better, its output is treated as the main input during fusion and the output of the coding-based problem matching model as the auxiliary input. The specific method is as follows:
o(a, b) = g(a)·m(b) + (1 − g(a))·a

m(x) = tanh(W_m x + b_m)

g(x) = sigmoid(W_g x + b_g)

where tanh and sigmoid are activation functions, W_m, b_m, W_g and b_g are training parameters, and a and b are the vectors to be fused: a is the output of the interaction-based problem matching model and b is the output of the coding-based problem matching model.
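A minimal sketch of this gated fusion unit, directly following the formulas above (PyTorch assumed; class and variable names are illustrative):

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """o(a, b) = g(a) * m(b) + (1 - g(a)) * a, with a as the main input."""
    def __init__(self, dim=300):
        super().__init__()
        self.m = nn.Linear(dim, dim)  # W_m, b_m
        self.g = nn.Linear(dim, dim)  # W_g, b_g

    def forward(self, a, b):
        # a: interaction-layer output (main); b: encoder output (auxiliary).
        gate = torch.sigmoid(self.g(a))
        return gate * torch.tanh(self.m(b)) + (1.0 - gate) * a
```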
For step S14, joint learning is performed on the coding-based problem matching model, the interaction-based problem matching model and the problem matching model based on coding interaction. As shown in FIG. 2, among the three models, the coding-interaction fusion model fuses the outputs of the coding-based model and the interaction-based model as the final output. Since the proposed model is a joint learning model, although it is complex during training, at prediction time it reduces to the single coding-interaction fusion model. Meanwhile, because the interaction-based problem matching model is the simplest single-layer interaction model and the fusion method uses no complex network structure, the model improves prediction accuracy while keeping a prediction speed essentially the same as that of an interaction-based model performing only simple interaction.
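The patent does not spell out the joint objective; a common choice, sketched here purely as an assumption, is to sum the classification losses of the three models so that the shared encoder, interaction layer and classifier receive gradients from all of them:

```python
import torch
import torch.nn.functional as F

def joint_loss(p1, p2, p3, labels):
    """Sum of the three models' losses over their softmax match probabilities.

    p1: coding-based model, p2: interaction-based model,
    p3: coding-interaction fusion model; labels: 0 = mismatch, 1 = match.
    """
    return sum(F.nll_loss(torch.log(p + 1e-9), labels) for p in (p1, p2, p3))
```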
For step S15, a matching degree prediction is performed on the question pair using the learned question matching model based on the coding interaction.
According to this embodiment, by performing joint learning on the coding-based model, the interaction-based model and the coding-interaction fusion model, and by sharing the same encoder, interaction layer and two-layer feedforward network at the final output, the problem matching model based on multi-model joint learning learns a more generalized text representation and understands the text better, improving the generalization ability and accuracy of the matching model. Moreover, joint learning effectively avoids overfitting. Since the interaction-based problem matching model used is the simplest single-layer interaction model and the fusion method uses no complex network structure, the model improves prediction accuracy while keeping a prediction speed essentially the same as that of an interaction-based model performing only simple interaction.
As an embodiment, in this embodiment, the predicting the degree of matching of the problem pair by using the learned third model includes:
determining, by the encoder, a coding vector for a question statement in the question pair;
inputting the coding vector of the question statement in the question pair into the interaction layer, and determining the interaction vector of the question statement in the question pair;
fusing the coding vector of the first weight and the interactive vector of the second weight of the question statement in the question pair based on the fusion unit to generate a fusion vector of the question statement in the question pair, wherein the second weight is larger than the first weight;
and inputting the fusion vector of the question statement in the question pair into the classifier, and determining the matching degree of the question statement in the question pair.
In this embodiment, consider a sentence pair whose first question sentence x1 is "what good chat rooms are there" and whose second question sentence x2 is "which chat rooms are good". First, the convolutional neural network encodes x1 and x2 separately with convolution kernels of sizes 1, 3 and 5. Each sentence thus has three encoding vectors, and concatenating them gives the final vector representation. The encoding of x1 is then a 300-dimensional vector, and likewise the encoding of x2 is another 300-dimensional vector.
The x1 and x2 encoding vectors then interact through the interaction layer: a multi-head attention mechanism performs interactive matching between the encoded representation c_x1 of x1 and the encoded representation c_x2 of x2. The multi-head attention mechanism is shown in FIG. 3, where the output of the top Linear layer is the semantic encoding obtained by the model that fuses the information of sentences x1 and x2. Setting V = Q = c_x1, K = c_x2 and V = Q = c_x2, K = c_x1 in turn yields the interacted sentence representations c_x1x2 and c_x2x1.
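A sketch of this single-layer interaction with PyTorch's built-in multi-head attention (the head count of 4 is an assumption, and the standard cross-attention assignment Q = c_x1, K = V = c_x2 is used so the sketch runs for sentences of different lengths; the patent's literal V = Q = c_x1, K = c_x2 assignment is noted in the comments):

```python
import torch.nn as nn

# Single-layer multi-head attention as the interaction layer (4 heads assumed;
# the embedding size 300 must be divisible by the head count).
attn = nn.MultiheadAttention(embed_dim=300, num_heads=4, batch_first=True)

def interact(c_x1, c_x2):
    """Cross-attend each sentence over the other.

    The patent describes V = Q = c_x1, K = c_x2 (and symmetrically for x2);
    the standard runnable form below queries one sentence over the other.
    """
    c_x1x2, _ = attn(query=c_x1, key=c_x2, value=c_x2)  # x1 attended over x2
    c_x2x1, _ = attn(query=c_x2, key=c_x1, value=c_x1)  # x2 attended over x1
    return c_x1x2, c_x2x1
```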
As an embodiment, before the fusion unit fuses the first-weighted coding vector and the second-weighted interaction vector of the question sentences in the question pair, the method further includes:

performing a non-linear transformation on the coding vector c_x1 of x1 and the coding vector c_x2 of x2 in the question pair:

m(c_x1) = tanh(W_m c_x1 + b_m)

m(c_x2) = tanh(W_m c_x2 + b_m)

where W_m and b_m are training parameters.
Next, the gating weights of the interaction vectors c_x1x2 and c_x2x1 of x1 and x2 are determined:

g(c_x1x2) = sigmoid(W_g c_x1x2 + b_g)

g(c_x2x1) = sigmoid(W_g c_x2x1 + b_g)
finally, the coded vector c of x1 is encodedx1And interaction vector cx1x2Performing fusion to obtain a coded interactive vector o1, and encoding the vector c of x2x2And interaction vector cx2x1And performing fusion to obtain a coded interaction vector o 2.
o1(cx1x2,cx1)=g(cx1x2)m(cx1)+(1-g(cx1x2))cx1x2
o2(cx2x1,cx2)=g(cx2x1)m(cx1)+(1-g(cx2x1))cx2x1
The resulting o1 and o2 are fed into the simple two-layer feed-forward neural network classifier described above, which is not repeated here. Through the classifier, the matching degree of "what good chat rooms are there" and "which chat rooms are good" is 0.9712.
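Putting the pieces of the worked example together, a sketch of the third model's full prediction path (component classes as sketched earlier in this description; max-pooling the token representations into fixed-size vectors before the classifier is an assumption):

```python
import torch

def predict_match(encoder, interact, fusion, classifier, q1_ids, q2_ids):
    """Coding-interaction model: encode, interact, fuse (interaction as main), classify."""
    c_x1, c_x2 = encoder(q1_ids), encoder(q2_ids)  # per-token encodings
    c_x1x2, c_x2x1 = interact(c_x1, c_x2)          # cross-sentence interaction
    o1 = fusion(c_x1x2, c_x1)                      # interaction output is the main input
    o2 = fusion(c_x2x1, c_x2)
    # Pool over the sequence dimension to obtain fixed-size vectors (assumed).
    return classifier(o1.max(dim=1).values, o2.max(dim=1).values)
```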
As another example, consider "how does a pilot with no money buy a house" and "parents have no money to buy a house". Both sentences contain "no money to buy a house", and this shared fragment makes up a large portion of each sentence. Because the method fuses "coding" and "interaction", it reduces the contribution of such shared surface words to the similarity and strengthens word alignment and the interaction between the sentences, further improving the matching accuracy for question sentences: the matching degree of these two questions is 0.0014.
As an embodiment, the question pair includes a first question statement and a second question statement, wherein the first question statement is from an input of a user, and the second question statement is from a question and answer library of a question and answer system;
and when the matching degree reaches a preset threshold value, acquiring a reply answer of the second question sentence from the question-answer library, and feeding back the first question sentence input by the user.
In this embodiment, the method is applied to a retrieval-based question-answering system, in which the first question sentence is input by the user and the second question sentence comes from the question-answer library of the question-answering system; the library contains various question sentences and the reply answers corresponding to them. Thus, when the matching degree between the question input by the user and a question sentence in the question-answer library is sufficiently high, the answer to that library question can be fed back to the user to answer the user's question.
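A sketch of how such a retrieval loop might use the learned third model (the threshold value, the model.match helper and all other names here are illustrative assumptions, not part of the patent):

```python
def answer(user_question, qa_library, model, threshold=0.9):
    """Return the stored answer of the best-matching library question, if any."""
    best_score, best_answer = 0.0, None
    for library_question, library_answer in qa_library:
        # model.match is assumed to return the matching degree in [0, 1].
        score = model.match(user_question, library_question)
        if score > best_score:
            best_score, best_answer = score, library_answer
    # Reply only when the matching degree reaches the preset threshold.
    return best_answer if best_score >= threshold else None
```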
This embodiment applies the method to a concrete technical scenario. In this scenario, prediction accuracy is improved while the prediction speed remains essentially the same as that of an interaction-based model performing only simple interaction.
Fig. 4 is a schematic structural diagram of a problem matching system for multi-model joint learning according to an embodiment of the present invention, which can execute the problem matching method for multi-model joint learning according to any of the above embodiments and is configured in a terminal.
The problem matching system for multi-model joint learning provided by the embodiment comprises: a coding model building program module 11, an interaction model building program module 12, a coding interaction model building program module 13, a joint learning program module 14 and a matching prediction program module 15.
Wherein, the coding model establishing program module 11 is used for establishing a first model based on coding, and the first model comprises a coder and a classifier; the interaction model building program module 12 is configured to build a second model based on interaction, where the second model includes an encoder, an interaction layer, and a classifier, where the interaction layer uses a single-layer multi-head attention mechanism to interact with sentence pairs, and the encoder and the classifier of the second model are the same as the encoder and the classifier of the first model and share parameters; the coding interaction model building program module 13 is configured to build a third model, where the third model includes an encoder, an interaction layer, a fusion unit, and a classifier, where the encoder, the interaction layer, and the classifier of the third model are the same as those of the second model and share parameters, and the fusion unit fuses outputs from the encoder and the interaction layer, where the fusion is mainly performed on the output of the interaction layer and is assisted by the output of the encoder; the joint learning program module 14 is configured to perform joint learning on the first model, the second model, and the third model; and the matching prediction program module 15 is used for predicting the matching degree of the problem pair by using the third model after learning.
Further, the match predictor module is to:
determining, by the encoder, a coding vector for a question statement in the question pair;
inputting the coding vector of the question statement in the question pair into the interaction layer, and determining the interaction vector of the question statement in the question pair;
fusing the coding vector of the first weight and the interactive vector of the second weight of the question statement in the question pair based on the fusion unit to generate a fusion vector of the question statement in the question pair, wherein the second weight is larger than the first weight;
and inputting the fusion vector of the question statement in the question pair into the classifier, and determining the matching degree of the question statement in the question pair.
Further, the question pair comprises a first question statement and a second question statement, wherein the first question statement is input by a user, and the second question statement is from a question and answer library of a question and answer system;
and when the matching degree reaches a preset threshold value, acquiring a reply answer of the second question sentence from the question-answer library, and feeding back the first question sentence input by the user.
Further, before the fusing the encoding vector of the first weight and the interaction vector of the second weight of the question statement in the question pair based on the fusing unit, the method further includes:
and carrying out nonlinear transformation on the coding vector of the question statement in the question pair.
Further, the encoder comprises a convolutional neural network based encoder, and the classifier comprises a two-layer feedforward neural network based classifier.
The embodiment of the invention also provides a nonvolatile computer storage medium, wherein the computer storage medium stores computer executable instructions which can execute the problem matching method of multi-model joint learning in any method embodiment;
as one embodiment, a non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
establishing a first model based on coding, wherein the first model comprises a coder and a classifier;
establishing a second model based on interaction, wherein the second model comprises an encoder, an interaction layer and a classifier, the interaction layer uses a single-layer multi-head attention mechanism to interact sentence pairs, and the encoder and the classifier of the second model are the same as those of the first model and share parameters;
establishing a third model, wherein the third model comprises an encoder, an interaction layer, a fusion unit and a classifier, the encoder, the interaction layer and the classifier of the third model are the same as those of the second model and share parameters, the fusion unit fuses the outputs from the encoder and the interaction layer respectively, and the output of the interaction layer is taken as a main output and the output of the encoder is taken as an auxiliary output during fusion;
performing joint learning on the first model, the second model and the third model;
and predicting the matching degree of the problem pair by using the learned third model.
The non-volatile computer-readable storage medium may be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present invention. One or more program instructions are stored in the non-volatile computer-readable storage medium and, when executed by a processor, perform the problem matching method of multi-model joint learning in any of the method embodiments described above.
The non-volatile computer-readable storage medium may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the device, and the like. Further, the non-volatile computer-readable storage medium may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the non-transitory computer readable storage medium optionally includes memory located remotely from the processor, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present invention further provides an electronic device, which includes: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the multi-model joint learning problem matching method of any of the embodiments of the present invention.
The client of the embodiment of the present application exists in various forms, including but not limited to:
(1) mobile communication devices, which are characterized by mobile communication capabilities and are primarily targeted at providing voice and data communications. Such terminals include smart phones, multimedia phones, functional phones, and low-end phones, among others.
(2) The ultra-mobile personal computer equipment belongs to the category of personal computers, has calculation and processing functions and generally has the characteristic of mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as tablet computers.
(3) Portable entertainment devices such devices may display and play multimedia content. Such devices include audio and video players, handheld game consoles, electronic books, as well as smart toys and portable vehicle navigation devices.
(4) Other electronic devices with data processing capabilities.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. A problem matching method for multi-model joint learning comprises the following steps:
establishing a first model based on coding, wherein the first model comprises a coder and a classifier;
establishing a second model based on interaction, wherein the second model comprises an encoder, an interaction layer and a classifier, the interaction layer uses a single-layer multi-head attention mechanism to interact sentence pairs, and the encoder and the classifier of the second model are the same as those of the first model and share parameters;
establishing a third model, wherein the third model comprises an encoder, an interaction layer, a fusion unit and a classifier, the encoder, the interaction layer and the classifier of the third model are the same as those of the second model and share parameters, the fusion unit fuses the outputs of the encoder of the third model and the interaction layer of the third model respectively, and the output of the interaction layer of the third model is taken as a main output and the output of the encoder of the third model is taken as an auxiliary output during fusion;
performing joint learning on the first model, the second model and the third model;
predicting the matching degree of the problem pair by using the learned third model;
wherein the predicting the matching degree of the problem pair by using the learned third model comprises:
determining, by an encoder of the third model, a coding vector for a question statement in the question pair;
inputting the coding vector of the question statement in the question pair to an interaction layer of the third model, and determining the interaction vector of the question statement in the question pair;
fusing the coding vector of the first weight and the interactive vector of the second weight of the question statement in the question pair based on the fusion unit to generate a fusion vector of the question statement in the question pair, wherein the second weight is greater than the first weight;
and inputting the fusion vector of the question statement in the question pair into the classifier of the third model, and determining the matching degree of the question statement in the question pair.
2. The method of claim 1, wherein the question pair comprises a first question statement from an input of a user and a second question statement from a question and answer library of a question and answer system;
and when the matching degree reaches a preset threshold value, acquiring a reply answer of the second question sentence from the question-answer library, and feeding back the first question sentence input by the user.
3. The method of claim 1, wherein prior to the fusing the encoded vector of the first weight and the interaction vector of the second weight of the question statement in the question pair based on the fusion unit, the method further comprises:
and carrying out nonlinear transformation on the coding vector of the question statement in the question pair.
4. The method of claim 1, wherein the encoder comprises a convolutional neural network based encoder and the classifier comprises a two-layer feedforward neural network based classifier.
5. A problem matching system for multi-model joint learning, comprising:
a coding model building program module for building a first model based on coding, the first model comprising an encoder and a classifier;
an interaction model building program module, configured to build a second model based on interaction, where the second model includes an encoder, an interaction layer, and a classifier, where the interaction layer uses a single-layer multi-head attention mechanism to interact with sentence pairs, and the encoder and the classifier of the second model are the same as those of the first model and share parameters;
the encoding interaction model building program module is used for building a third model, the third model comprises an encoder, an interaction layer, a fusion unit and a classifier, the encoder, the interaction layer and the classifier of the third model are the same as those of the second model and share parameters, the fusion unit fuses the outputs of the encoder of the third model and the interaction layer of the third model respectively, and the output of the interaction layer of the third model is taken as a main output and the output of the encoder of the third model is taken as an auxiliary output during fusion;
a joint learning program module for performing joint learning on the first model, the second model and the third model;
the matching prediction program module is used for predicting the matching degree of the problem pair by utilizing the learned third model;
wherein the match predictor module is to:
determining, by an encoder of the third model, a coding vector for a question statement in the question pair;
inputting the coding vector of the question statement in the question pair to an interaction layer of the third model, and determining the interaction vector of the question statement in the question pair;
fusing the coding vector of the first weight and the interactive vector of the second weight of the question statement in the question pair based on the fusion unit to generate a fusion vector of the question statement in the question pair, wherein the second weight is larger than the first weight;
and inputting the fusion vector of the question statement in the question pair into the classifier of the third model, and determining the matching degree of the question statement in the question pair.
6. The system of claim 5, wherein the question pair comprises a first question statement from an input from a user and a second question statement from a question and answer library of a question and answer system;
and when the matching degree reaches a preset threshold value, acquiring a reply answer of the second question sentence from the question-answer library, and feeding back the first question sentence input by the user.
7. The system of claim 5, wherein prior to the fusing the encoded vector of the first weight and the interaction vector of the second weight of the question statement in the question pair based on the fusing unit, the system further comprises:
and carrying out nonlinear transformation on the coding vector of the question statement in the question pair.
8. The system of claim 5, wherein the encoder comprises a convolutional neural network based encoder and the classifier comprises a two-layer feedforward neural network based classifier.
CN202010538105.8A 2020-06-12 2020-06-12 Multi-model joint learning problem matching method and system Active CN111666397B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010538105.8A CN111666397B (en) 2020-06-12 2020-06-12 Multi-model joint learning problem matching method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010538105.8A CN111666397B (en) 2020-06-12 2020-06-12 Multi-model joint learning problem matching method and system

Publications (2)

Publication Number Publication Date
CN111666397A CN111666397A (en) 2020-09-15
CN111666397B true CN111666397B (en) 2022-07-12

Family

ID=72387371

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010538105.8A Active CN111666397B (en) 2020-06-12 2020-06-12 Multi-model joint learning problem matching method and system

Country Status (1)

Country Link
CN (1) CN111666397B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763284A (en) * 2018-04-13 2018-11-06 华南理工大学 A kind of question answering system implementation method based on deep learning and topic model
CN109885671A (en) * 2019-02-28 2019-06-14 重庆邮电大学 Answering method based on multi-task learning


Also Published As

Publication number Publication date
CN111666397A (en) 2020-09-15

Similar Documents

Publication Publication Date Title
CN110516253B (en) Chinese spoken language semantic understanding method and system
CN111914067B (en) Chinese text matching method and system
CN108153913B (en) Training method of reply information generation model, reply information generation method and device
CN111625660A (en) Dialog generation method, video comment method, device, equipment and storage medium
CN111753076B (en) Dialogue method, dialogue device, electronic equipment and readable storage medium
CN111782787B (en) Problem generation model training method and problem generation method
CN112214591A (en) Conversation prediction method and device
CN112364148B (en) Deep learning method-based generative chat robot
WO2023231513A1 (en) Conversation content generation method and apparatus, and storage medium and terminal
CN111324736B (en) Man-machine dialogue model training method, man-machine dialogue method and system
CN111522925A (en) Dialog state generation method and device
CN113177393B (en) Method and apparatus for pre-training language model for improved understanding of web page structure
CN112132075B (en) Method and medium for processing image-text content
CN117271745A (en) Information processing method and device, computing equipment and storage medium
CN110909179B (en) Method and system for optimizing text generation model
CN111666397B (en) Multi-model joint learning problem matching method and system
CN116975288A (en) Text processing method and text processing model training method
CN110851580A (en) Personalized task type dialog system based on structured user attribute description
CN115221306A (en) Automatic response evaluation method and device
CN111723185B (en) Question generation method
CN112149426B (en) Reading task processing method and related equipment
CN116414951A (en) Intelligent dialogue method, model training method, device, storage medium and equipment
CN114860908A (en) Task-based dialogue state tracking method fusing slot association and semantic association
CN111783434A (en) Method and system for improving anti-noise capability of reply generation model
CN113010662A (en) Hierarchical conversational machine reading understanding system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Applicant after: Sipic Technology Co.,Ltd.

Address before: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Applicant before: AI SPEECH Co.,Ltd.

GR01 Patent grant