CN113886550A - Question-answer matching method, device, equipment and storage medium based on attention mechanism - Google Patents


Info

Publication number
CN113886550A
CN113886550A
Authority
CN
China
Prior art keywords
answer
question
vectors
vector
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111182254.6A
Other languages
Chinese (zh)
Inventor
杨修远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An International Smart City Technology Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An International Smart City Technology Co Ltd filed Critical Ping An International Smart City Technology Co Ltd
Priority to CN202111182254.6A
Publication of CN113886550A


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/3329 - Natural language query formulation or dialogue systems
    • G06F 16/338 - Presentation of query results
    • G06F 16/35 - Clustering; Classification
    • G06F 18/22 - Matching criteria, e.g. proximity measures
    • G06F 40/284 - Lexical analysis, e.g. tokenisation or collocates
    • G06F 40/295 - Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention belongs to the field of artificial intelligence, relates to the field of blockchain, and discloses a question-answer matching method, device, equipment and storage medium based on an attention mechanism. The method comprises the following steps: acquiring a user question input by a user, and acquiring answer vectors corresponding to a plurality of candidate answers; inputting a word vector sequence of the user question into a BERT model to obtain a plurality of hidden-state question vectors output by the BERT model; converting the plurality of hidden-state question vectors based on an attention mechanism to obtain m question feature vectors characterizing the user question; converting the m question feature vectors according to each answer vector to obtain the user question vector corresponding to that answer vector; and determining the correct answer to the user question among the plurality of candidate answers according to the matching value between each corresponding user question vector and answer vector. The invention improves the matching effect between candidate answers and user questions, reduces the data processing amount in the matching process, and improves question-answer matching efficiency.

Description

Question-answer matching method, device, equipment and storage medium based on attention mechanism
Technical Field
The invention relates to the field of artificial intelligence, in particular to a question-answer matching method, a question-answer matching device, question-answer matching equipment and a storage medium based on an attention mechanism.
Background
The question-answering system is an advanced form of information retrieval system that can answer questions posed by users in accurate, concise natural language. A question-answering system generally performs question-answer matching through a matching model based on a BERT network: the matching model pre-trains the BERT model on a massive general-domain corpus, and then fine-tunes an encoder built on top of the BERT model to improve the matching effect of the matching model, thereby ensuring the retrieval precision of the question-answering system.
In the traditional question-answer matching approach, the encoder built on top of the BERT model is generally a bi-encoder or a cross-encoder, and each has drawbacks, so the traditional approach cannot achieve both a good matching effect and high matching efficiency. The core of a bi-encoder matching model is to encode the user question and each candidate answer into vectors separately and finally compute the similarity between the two vectors through a relevance discriminant function; because the question and the answers never interact during encoding, its matching effect is limited. A cross-encoder can achieve finer-grained matching between questions and candidate answers, so a cross-encoder matching model has a better matching effect; however, this approach must traverse every combination of the user question and the candidate answers and compute the relevance of each question-answer pair, which involves a large amount of data processing, takes a long time, and reduces question-answer matching efficiency.
Disclosure of Invention
The invention provides a question-answer matching method, device, equipment and storage medium based on an attention mechanism, aiming to solve the problem that the traditional question-answer matching approach cannot achieve both a good matching effect and high matching efficiency.
The question-answer matching method based on the attention mechanism comprises the following steps:
acquiring a user question input by a user, and determining a plurality of candidate answers of the user question and answer vectors corresponding to the candidate answers;
converting the user question into a word vector sequence, and inputting the word vector sequence of the user question into a BERT model to obtain a plurality of hidden-state question vectors output by the BERT model;
converting the plurality of hidden-state question vectors based on an attention mechanism to obtain m question feature vectors for characterizing the user question, wherein m is an integer greater than 1;
converting the m question feature vectors according to the answer vectors to obtain user question vectors corresponding to the answer vectors;
and determining a correct answer of the user question in the plurality of candidate answers according to the matching value of the corresponding user question vector and the answer vector.
Provided is a question-answer matching device based on an attention mechanism, comprising:
an acquisition module, used for acquiring a user question input by a user and determining a plurality of candidate answers to the user question and the answer vectors corresponding to the candidate answers;
an encoding module, used for converting the user question into a word vector sequence and inputting the word vector sequence of the user question into the BERT model to obtain a plurality of hidden-state question vectors output by the BERT model;
a first conversion module, used for converting the hidden-state question vectors based on an attention mechanism to obtain m question feature vectors for characterizing the user question, wherein m is an integer greater than 1;
the second conversion module is used for converting the m question feature vectors according to the answer vectors so as to obtain user question vectors corresponding to the answer vectors;
and the determining module is used for determining the correct answer of the user question in the candidate answers according to the matching value of the corresponding user question vector and the answer vector.
There is provided a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above-mentioned attention-based question-answer matching method when executing the computer program.
There is provided a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above-described attention-based question-answer matching method.
In the scheme provided by the above question-answer matching method, device, equipment and storage medium based on the attention mechanism, a user question input by a user is acquired, and a plurality of candidate answers to the user question and the answer vectors corresponding to the candidate answers are determined; the user question is then converted into a word vector sequence, which is input into a BERT model to obtain a plurality of hidden-state question vectors output by the BERT model; the hidden-state question vectors are converted based on an attention mechanism to obtain m question feature vectors characterizing the user question, where m is an integer greater than 1; the m question feature vectors are converted according to each answer vector to obtain the user question vector corresponding to that answer vector; finally, the correct answer to the user question is determined among the plurality of candidate answers according to the matching value between each corresponding user question vector and answer vector. Improving the user question representation based on the attention mechanism to obtain m question feature vectors allows more global features to be represented; weighting the m question feature vectors into a corresponding user question vector according to each answer vector fuses the candidate answers and the user question with each other, which improves the accuracy of the matching values between corresponding user question vectors and answer vectors and thus the matching effect between candidate answers and user questions. On this basis, the data processing amount in the matching process is reduced and question-answer matching efficiency is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
FIG. 1 is a schematic diagram of an application environment of the question-answer matching method based on an attention mechanism according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for matching questions and answers based on the attention mechanism according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating an implementation of step S30 in FIG. 2;
FIG. 4 is a flowchart illustrating an implementation of step S40 in FIG. 2;
FIG. 5 is a flowchart illustrating an implementation of step S50 in FIG. 2;
FIG. 6 is a flowchart illustrating an implementation of step S10 in FIG. 2;
FIG. 7 is a schematic flow chart of another implementation of step S10 in FIG. 2;
FIG. 8 is a flowchart illustrating an implementation of step S03 in FIG. 7;
FIG. 9 is a schematic diagram of an embodiment of a device for matching questions and answers based on an attention mechanism;
FIG. 10 is a schematic diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The question-answer matching method based on the attention mechanism provided by the embodiment of the invention can be applied to the application environment shown in FIG. 1, in which a terminal device communicates with a server through a network. The server acquires a user question input by a user through the terminal device, and determines a plurality of candidate answers to the user question and the answer vectors corresponding to the candidate answers; the user question is then converted into a word vector sequence, which is input into a BERT model to obtain a plurality of hidden-state question vectors output by the BERT model; the hidden-state question vectors are converted based on an attention mechanism to obtain m question feature vectors characterizing the user question, where m is an integer greater than 1; the m question feature vectors are converted according to each answer vector to obtain the user question vector corresponding to that answer vector; finally, the correct answer to the user question is determined among the candidate answers according to the matching value between each corresponding user question vector and answer vector. Improving the user question representation based on the attention mechanism to obtain m question feature vectors allows more global features to be represented; weighting the m question feature vectors into a corresponding user question vector according to each answer vector fuses the candidate answers and the user question with each other, improving the accuracy of the matching values and thus the matching effect, while reducing the data processing amount in the matching process and improving question-answer matching efficiency.
Finally, the intelligence of the question-answering system is further improved, and the user experience is improved.
The multiple candidate answers, the answer vectors corresponding to the candidate answers, and other related data are stored in the database of the server; when answer matching needs to be performed for a user question, the related data are obtained directly from the database, which improves question-answer matching efficiency.
The database in this embodiment is stored in a blockchain network and is used to store data used and generated by the question-answer matching method based on the attention mechanism, such as the candidate answers and their corresponding answer vectors. The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, each containing a batch of network transaction information used to verify the validity (anti-counterfeiting) of the information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like. Deploying the database in the blockchain improves the security of data storage.
The terminal device may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices. The server may be implemented as a stand-alone server or as a server cluster consisting of a plurality of servers.
In an embodiment, as shown in fig. 2, a question-answer matching method based on an attention mechanism is provided, which is described by taking the server in fig. 1 as an example, and includes the following steps:
s10: the method comprises the steps of obtaining a user question input by a user, and determining a plurality of candidate answers of the user question and answer vectors corresponding to the candidate answers.
The server acquires the user question input by the user through the terminal device, obtains a plurality of candidate standard answers from a database, performs keyword matching between the user question and the standard answers to obtain a plurality of candidate answers, and then obtains the answer vectors corresponding to the candidate answers. An answer vector is obtained by extracting the feature vector of a candidate answer.
An answer vector can be obtained by either of the following methods:
First method: obtain a candidate answer from the database, input the candidate answer into the BERT model to obtain a plurality of hidden-state answer vectors output by the BERT model, and then aggregate the hidden-state answer vectors into the answer vector of the candidate answer. If the answer vector does not match the user question vector, the next candidate answer is obtained from the database and converted into an answer vector to be matched against the user question vector.
Second method: after the plurality of standard answers are determined, they are converted off-line to obtain the answer vectors corresponding to them, and the standard answers and their answer vectors are stored in the database in one-to-one correspondence. After a user question input by a user is acquired, the standard answers corresponding to the user question are determined in the database as the candidate answers of the user question, and the answer vectors corresponding to the candidate answers are then pulled directly from the database. No on-line conversion of the candidate answers is needed, which reduces the amount of computation and improves question-answer matching efficiency. In the off-line conversion, the standard answers are input into the BERT model in advance, in an off-line state, to obtain a plurality of hidden-state answer vectors output by the BERT model, and the hidden-state answer vectors are then aggregated into the answer vector of each standard answer, so that the answer vectors of all standard answers are obtained off-line.
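The second, off-line method can be sketched as follows. Here `encode_answer` is a stand-in for the BERT encoding plus aggregation (mean pooling over pseudo hidden states); all names, the toy dimension, and the pooling choice are assumptions for illustration, not the patent's implementation:

```python
import numpy as np

HIDDEN_DIM = 8  # toy size; BERT-base would use 768

def encode_answer(answer: str, dim: int = HIDDEN_DIM) -> np.ndarray:
    """Stand-in for BERT: one pseudo hidden state per character,
    aggregated (mean-pooled) into a single answer vector."""
    rng = np.random.default_rng(abs(hash(answer)) % (2 ** 32))
    hidden_states = rng.normal(size=(len(answer), dim))  # per-token vectors
    return hidden_states.mean(axis=0)                    # aggregation step

def build_answer_index(standard_answers: list) -> dict:
    """Off-line pass: encode every standard answer once and cache the result
    so no on-line conversion is needed at query time."""
    return {a: encode_answer(a) for a in standard_answers}

index = build_answer_index(["answer one", "answer two"])
```

At query time, the cached vectors are simply looked up by candidate answer, which is what saves the on-line computation described above.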
S20: and converting the user question into a word vector sequence, and inputting the word vector sequence of the user question into the BERT model to obtain a plurality of hidden state question vectors output by the BERT model.
After the user question input by the user is acquired, single-word segmentation is performed on the user question to obtain its word vector sequence; the word vector sequence is then input into the BERT model for encoding, and the hidden states of the BERT model are taken as the plurality of hidden-state question vectors output after the BERT model encodes the user question. Using the hidden states of the BERT model as the representation of the user question gives the vectors a higher correlation with the user question information, which facilitates the subsequent feature extraction based on the hidden-state question vectors and ensures the accuracy of the extracted feature vectors.
For example, let the user question be q; the word vector sequence of the user question is represented as

$x^q = (x^q_1, x^q_2, \ldots, x^q_{N_x})$

The word vector sequence $x^q$ of the user question is input into the BERT model for encoding, and the hidden states of the last Transformer layer of the BERT model are taken as the characterization vectors $h_j$ of the user question, i.e., the plurality of hidden-state question vectors output by the BERT model, expressed by the following formula:

$(h_1, h_2, \ldots, h_{N_x}) = \mathrm{BERT}(x^q_1, x^q_2, \ldots, x^q_{N_x})$

where $N_x$ is the length of the word vector sequence of the user question, $x^q_{N_x}$ is the $N_x$-th word vector in the user question, $h_j$ is the hidden-state question vector of the j-th word in the user question, and $j \in [0, N_x]$.
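As an illustration of the per-character segmentation described above, the following sketch maps each character of a question to a vector; the random embedding table is a stand-in for BERT's input embeddings, and every name here is illustrative rather than from the patent:

```python
import numpy as np

def to_word_vector_sequence(question: str, dim: int = 8) -> np.ndarray:
    """Split the question into single characters x_1 ... x_Nx and map each
    to a vector, giving the (N_x, dim) input sequence for the encoder."""
    rng = np.random.default_rng(0)
    vocab = {ch: i for i, ch in enumerate(dict.fromkeys(question))}
    embeddings = rng.normal(size=(len(vocab), dim))  # stand-in embedding table
    return np.stack([embeddings[vocab[ch]] for ch in question])

seq = to_word_vector_sequence("what is AI")  # N_x = 10 characters
```

Repeated characters map to the same vector, mirroring a shared embedding table.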
S30: a plurality of hidden state problem vectors are transformed based on an attention mechanism to obtain m problem feature vectors for characterizing a user problem.
After obtaining a plurality of hidden state problem vectors output by the BERT model, converting the plurality of hidden state problem vectors based on an attention mechanism to obtain m problem feature vectors for representing user problems in a screening mode. Wherein m is a whole number larger than 1, and can be valued according to actual needs.
Converting a plurality of hidden state problem vectors based on an attention mechanism to obtain m problem feature vectors for representing user problems, wherein the specific process comprises the following steps: firstly, determining an attention weight matrix according to a plurality of hidden state problem vectors, carrying out weighted summation on the plurality of hidden state problem vectors according to a plurality of attention weights in the attention weight matrix, and screening to obtain a problem feature vector for representing a user problem; and then updating the attention weight matrix, carrying out weighted summation on a plurality of hidden state problem vectors according to a plurality of attention weights in the updated attention weight matrix, screening to obtain the next problem feature vector, and carrying out multiple screening in a circulating manner in sequence until m problem feature vectors used for representing the user problem are obtained. The m problem feature vectors are obtained based on the attention mechanism, the problem feature vectors can have good semantic relevance, and on the basis of ensuring the accuracy of the problem feature vectors, the m problem feature vectors are captured, so that the calculation amount is reduced.
S40: and converting the m question feature vectors according to the answer vectors to obtain user question vectors corresponding to the answer vectors.
After the m question feature vectors characterizing the user question are obtained, they are weighted and summed according to each answer vector to obtain the user question vector corresponding to that answer vector. The calculation of the user question vector corresponding to an answer vector is also a vector calculation based on the attention mechanism: an attention weight matrix is first determined according to the answer vector and the m question feature vectors, and the m question feature vectors are then combined according to the m attention weights in the matrix to obtain the user question vector corresponding to the answer vector. This two-layer attention design means that, in the second attention-based vector calculation, the answer vector and the m question feature vectors are interactively fused into the user question vector, which improves the correlation between the user question vector and the answer vector and facilitates matching.
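Under the assumption that the combination weights are a softmax of dot products between the answer vector and each question feature vector, this second attention step can be sketched as follows (illustrative names throughout, not the patent's code):

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_question_with_answer(feature_vecs: np.ndarray,
                              answer_vec: np.ndarray) -> np.ndarray:
    """feature_vecs: (m, d) question feature vectors; answer_vec: (d,).
    Returns the answer-conditioned user question vector of shape (d,)."""
    weights = softmax(feature_vecs @ answer_vec)  # one attention weight per feature vector
    return weights @ feature_vecs                 # weighted sum of the m vectors

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))   # m = 4 question feature vectors
answer = rng.normal(size=8)          # answer vector of one candidate
y_q = fuse_question_with_answer(features, answer)
```

Each candidate answer yields its own `y_q`, which is what makes the subsequent matching answer-aware.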
S50: and determining a correct answer of the user question in the plurality of candidate answers according to the matching value of the corresponding user question vector and the answer vector.
After the user question vector corresponding to the answer vector is obtained, the matching value of the corresponding user question vector and the answer vector is determined in a dot product operation mode, and then the correct answer of the user question is determined in a plurality of candidate answers according to the matching value of the corresponding user question vector and the answer vector.
The matching value (matching score) between the user question vector corresponding to an answer vector and that answer vector is calculated by the following formula:

$s(q, a_i) = y_q \cdot y_{a_i}$

where q is the user question; $y_q$ is the user question vector of the user question; $a_i$ is the i-th candidate answer; $y_{a_i}$ is the answer vector of the i-th candidate answer; and $s(q, a_i)$ is the matching value between the answer vector of the i-th candidate answer and the user question vector.
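The dot-product scoring and answer selection can be sketched as follows, with toy two-dimensional vectors standing in for the real question and answer vectors (all names illustrative):

```python
import numpy as np

def match_score(question_vec: np.ndarray, answer_vec: np.ndarray) -> float:
    """s(q, a_i): dot product of the paired question and answer vectors."""
    return float(question_vec @ answer_vec)

def pick_answer(question_vecs: list, answer_vecs: list) -> int:
    """Each candidate answer a_i has its own answer-conditioned question
    vector; return the index of the highest-scoring candidate."""
    scores = [match_score(q, a) for q, a in zip(question_vecs, answer_vecs)]
    return int(np.argmax(scores))

a1, a2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # toy answer vectors
q1, q2 = np.array([0.9, 0.1]), np.array([0.2, 0.8])   # paired question vectors
best = pick_answer([q1, q2], [a1, a2])  # index of the best candidate answer
```

Note that each score pairs a candidate's answer vector with that candidate's own user question vector, reflecting the per-answer fusion described above.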
In this embodiment, a user question input by a user is acquired, and a plurality of candidate answers to the user question and the answer vectors corresponding to the candidate answers are determined; the user question is then converted into a word vector sequence, which is input into a BERT model to obtain a plurality of hidden-state question vectors output by the BERT model; the hidden-state question vectors are converted based on an attention mechanism to obtain m question feature vectors characterizing the user question, where m is an integer greater than 1; the m question feature vectors are then converted, by weighted summation according to each answer vector, into the user question vector corresponding to that answer vector; finally, the correct answer to the user question is determined among the candidate answers according to the matching value between each corresponding user question vector and answer vector. In this embodiment, the user question representation is improved based on the attention mechanism to obtain m question feature vectors, which can represent more global features; the m question feature vectors are then combined into corresponding user question vectors according to the answer vectors, so that the candidate answers and the user question are fused with each other, which improves the accuracy of the matching values between the user question vectors and the answer vectors and thus the matching effect between candidate answers and user questions.
In an embodiment, as shown in fig. 3, step S30, i.e., converting the plurality of hidden-state question vectors based on the attention mechanism to obtain m question feature vectors characterizing the user question, specifically includes the following steps:
a. Determine an initial weight, and determine a first weight matrix according to the plurality of hidden-state question vectors and the initial weight.
After the hidden-state question vectors are obtained, an initial weight is randomly determined so that a first weight matrix can be formed from the hidden-state question vectors and the initial weight. Because m question feature vectors are needed, m initial weights are taken in a loop; the initial weight of each iteration is recorded as $c_i$, so that after all m values are taken, the initial weight matrix formed by the m initial weights is $(c_1, \ldots, c_m)$. Each initial weight is combined with the plurality of hidden-state question vectors to generate a first weight matrix. The initial weight matrix $(c_1, \ldots, c_m)$ is used to weigh the hidden states of the last Transformer layer of the BERT model.
Each weight in the first weight matrix is the product of the initial weight and one hidden-state question vector, i.e., the first weight matrix is $(c_i \cdot h_1, \ldots, c_i \cdot h_j)$, $i \in [0, m]$,
where $c_i$ is the i-th initial weight in the initial weight matrix, $h_j$ is the j-th hidden-state question vector, and m is the number of initial weights.
b. Normalize the first weight matrix using a normalized exponential function to obtain a plurality of attention weights.
After the first weight matrix is determined, it is normalized with a normalized exponential function (softmax) to obtain the attention weights corresponding to the initial weight. The attention weights are given by:

$w^i_j = \mathrm{softmax}(c_i \cdot h_j) = \dfrac{\exp(c_i \cdot h_j)}{\sum_{k=1}^{N_x} \exp(c_i \cdot h_k)}$

where softmax is the normalized exponential function, $c_i$ is the i-th initial weight, $h_j$ is the j-th hidden-state question vector, and $w^i_j$ is the j-th attention weight corresponding to the i-th initial weight.
c. Weight and sum the plurality of hidden-state question vectors according to the attention weights to obtain a question feature vector characterizing the user question.
After the attention weights are obtained, the hidden-state question vectors are weighted and summed according to them to obtain one question feature vector characterizing the user question. The question feature vector is calculated as:

$y^q_i = \sum_{j=1}^{N_x} w^i_j \, h_j$

where $w^i_j$ is the j-th attention weight corresponding to the i-th initial weight, $i \in [0, m]$; $h_j$ is the j-th hidden-state question vector; and $y^q_i$ is the i-th question feature vector characterizing the user question.
d. And repeating the steps a-c to obtain m problem feature vectors.
Repeating the steps a to c to obtain m problem feature vectors.
In this embodiment, the following steps are performed: a. determining an initial weight, and determining a first weight matrix according to a plurality of hidden state problem vectors and the initial weight; b. performing normalization processing on the first weight matrix by adopting a normalization index function to obtain a plurality of attention weights; c. weighting and summing a plurality of hidden state problem vectors according to a plurality of attention weights to obtain a problem feature vector for representing a user problem; d. repeating the steps a to c to obtain m problem feature vectors, defining a specific process of converting a plurality of hidden state problem vectors based on an attention mechanism to obtain m problem feature vectors for representing user problems, determining attention weight according to the hidden state problem vectors to screen out the m problem feature vectors, and providing a basis for subsequent calculation.
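Steps a-d above can be sketched as one vectorized loop (a toy NumPy illustration under invented dimensions; the helper name and seed are assumptions, not the patent's code):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable normalized exponential function
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def question_feature_vectors(h, c):
    """h: (j, d) hidden-state question vectors; c: (m, d) initial weights.
    Returns (m, d) question feature vectors v_i = sum_j a_j^i h_j."""
    scores = c @ h.T               # first weight matrix rows: c_i . h_j
    a = softmax(scores, axis=-1)   # attention weights per initial weight (step b)
    return a @ h                   # weighted sums of hidden states (step c)

rng = np.random.default_rng(1)
h = rng.normal(size=(6, 8))   # six hidden-state question vectors, dim 8
c = rng.normal(size=(4, 8))   # m = 4 randomly determined initial weights
v = question_feature_vectors(h, c)   # m question feature vectors (step d)
```

Stacking the m initial weights into a matrix lets the repetition of steps a-c happen as a single matrix product instead of an explicit loop.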
In an embodiment, as shown in fig. 4, in step S40, the method for converting m question feature vectors according to answer vectors to obtain user question vectors corresponding to the answer vectors includes the following steps:
s41: and determining a second weight matrix corresponding to the answer vector according to the answer vector and the m question feature vectors.
After obtaining the answer vector of the candidate answer and the m question feature vectors, determining a second weight matrix corresponding to the answer vector according to the answer vector of the candidate answer and the m question feature vectors.
In the second weight matrix corresponding to the answer vector, each weight is the product of the answer vector of the candidate answer and a question feature vector; that is, the second weight matrix is (v_1 · y_a, ..., v_m · y_a), where v_i is the i-th of the m question feature vectors and y_a is the answer vector of the candidate answer being processed.
S42: and carrying out normalization processing on the second weight matrix by adopting a normalization index function so as to obtain a plurality of target weights.
After determining a second weight matrix corresponding to the answer vector, performing normalization processing on the second weight matrix by using a normalization exponential function softmax to obtain a plurality of target weights.
Wherein the plurality of target weights is given by:

w_i = softmax(v_i · y_a)

where softmax is the normalized exponential function, v_i is the i-th of the m question feature vectors, y_a is the answer vector of the candidate answer being processed, and w_i is the i-th target weight, i ∈ [0, m].
S43: and summing the m question feature vectors according to the plurality of target weights to obtain a user question vector corresponding to the answer vector.
After obtaining the multiple target weights, performing weighted summation on the m question feature vectors according to the multiple target weights to obtain user question vectors corresponding to the answer vectors.
The user question vector corresponding to the answer vector is calculated by the following formula:

y_q = Σ_i w_i · v_i

where y_q is the user question vector corresponding to the answer vector, v_i is the i-th of the m question feature vectors, and w_i is the i-th target weight, i ∈ [0, m].
In this embodiment, a second weight matrix corresponding to the answer vector is determined according to the answer vector and the m question feature vectors; performing normalization processing on the second weight matrix by adopting a normalization index function to obtain a plurality of target weights; the method comprises the steps of summing m question feature vectors according to a plurality of target weights to obtain user question vectors corresponding to answer vectors, defining the process of converting the m question feature vectors according to the answer vectors to obtain the user question vectors corresponding to the answer vectors, determining the user question vectors corresponding to the answer vectors of candidate answers based on an attention mechanism, providing a basis for candidates according to matching values of the user question vectors and the answer vectors of the candidate answers, introducing the answer vectors of the candidate answers into the user question vectors, increasing interaction between user questions and the candidate answers, improving accuracy of the user question vectors, further improving accuracy of candidate matching values, and being beneficial to increasing matching effects.
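The answer-conditioned pooling in steps S41-S43 can be sketched as follows (a minimal NumPy illustration with invented sizes; not the patent's implementation):

```python
import numpy as np

def softmax(x):
    # numerically stable normalized exponential function
    e = np.exp(x - x.max())
    return e / e.sum()

def user_question_vector(v, y_a):
    """v: (m, d) question feature vectors; y_a: (d,) candidate answer vector.
    Returns y_q = sum_i w_i v_i with target weights w = softmax(v_i . y_a)."""
    w = softmax(v @ y_a)   # S41 + S42: second weight matrix, then softmax
    return w @ v           # S43: weighted sum of question feature vectors

rng = np.random.default_rng(2)
v = rng.normal(size=(4, 8))    # m = 4 question feature vectors
y_a = rng.normal(size=8)       # answer vector of one candidate answer
y_q = user_question_vector(v, y_a)
```

Because the weights depend on y_a, each candidate answer yields its own user question vector, which is what introduces the question-answer interaction described above.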
In one embodiment, as shown in fig. 5, in step S50, determining a correct answer to the user question from the multiple candidate answers according to the matching value of the corresponding user question vector and answer vector, the method specifically includes the following steps:
s51: and calculating the matching value of the corresponding user question vector and answer vector.
After the user question vector corresponding to the answer vector is obtained, the matching value between that user question vector and the answer vector is calculated by a dot-product operation:

s(q, a_i) = y_q · y_a^i

where s(q, a_i) is the matching value between the user question q and the i-th candidate answer a_i; y_q is the user question vector of the user question q; and y_a^i is the answer vector of the i-th candidate answer.
S52: and taking the matching value as a target matching value of the candidate answer and the user question.
After calculating the matching value of the corresponding user question vector and answer vector, the matching value is used as the target matching value of the candidate answer and the user question.
S53: and sorting the candidate answers in an ascending order according to the size of the target matching value to obtain a candidate answer list.
After the matching value is used as a target matching value between the candidate answer and the user question, the candidate answers are sorted in an ascending order according to the target matching value to obtain a candidate answer list, namely, the candidate answers in the candidate answer list are sorted according to the size of the corresponding target matching value.
S54: and sequencing the first candidate answer in the candidate answer list to serve as a correct answer of the user question.
After the candidate answers are sorted in an ascending order according to the target matching value to obtain the candidate answer list, the target matching value of the first-ranked candidate answer in the candidate answer list is the largest, which means that the first-ranked candidate answer is the best matched with the user question, and then the first-ranked candidate answer in the candidate answer list is used as the correct answer of the user question, so that the user can obtain the most accurate answer, and the user experience is improved.
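The ranking in steps S51-S54 can be sketched with toy vectors (an illustration, not the patent's data; the candidate with the largest dot-product matching value is ranked first):

```python
import numpy as np

def rank_candidates(question_vecs, answer_vecs):
    """question_vecs[i]: user question vector conditioned on candidate i;
    answer_vecs[i]: answer vector of candidate i.
    Returns indices sorted so the largest matching value comes first, plus scores."""
    scores = np.einsum('id,id->i', question_vecs, answer_vecs)  # s(q, a_i)
    return np.argsort(scores)[::-1], scores

q = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
a = np.array([[1.0, 0.0], [2.0, 1.0], [0.1, 0.1]])
order, scores = rank_candidates(q, a)
correct = order[0]   # first-ranked candidate is taken as the correct answer
```

Here the matching values are 1.0, 1.5, and 0.1, so candidate 1 is ranked first and returned as the correct answer.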
In this embodiment, the matching value between the corresponding user question vector and answer vector is calculated; the matching value is taken as the target matching value of the candidate answer and the user question; the candidate answers are sorted by the size of the target matching value to obtain a candidate answer list; and the first-ranked candidate answer in the candidate answer list is taken as the correct answer to the user question. This defines the process of determining the correct answer to the user question among the plurality of candidate answers according to the matching values of the corresponding user question vectors and answer vectors.
In one embodiment, as shown in fig. 6, the step S10 of determining a plurality of candidate answers to the user question and an answer vector corresponding to the candidate answer specifically includes the following steps:
s11: a plurality of standard answers stored in a database is obtained.
After obtaining the user question input by the user, a plurality of standard answers stored in the database need to be obtained.
S12: the named entities in the user question are determined, and the standard answers containing the named entities are used as candidate answers of the user question.
After obtaining a plurality of standard answers stored in a database, a plurality of candidate answers are determined among the plurality of standard answers according to a user question, wherein the plurality of candidate answers are determined among the plurality of standard answers by way of keyword (entity) matching.
First, the named entity in the user question is determined; it is then checked whether a standard answer contains that named entity, and any standard answer containing the named entity of the user question is taken as a candidate answer to the user question. By selecting, among the plurality of standard answers, only those that share a named entity with the question as candidate answers, the amount of candidate-matching computation can be reduced, so that question-answer matching efficiency is improved, the user is replied to quickly, and user experience is improved.
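A minimal sketch of this entity-based filtering (the entity list is assumed to come from an upstream NER step, which is not shown; the helper name and sample answers are invented):

```python
def filter_candidates(question_entities, standard_answers):
    """Keep only standard answers that contain at least one named entity
    extracted from the user question."""
    return [ans for ans in standard_answers
            if any(ent in ans for ent in question_entities)]

answers = ["The capital of China is Beijing.",
           "Paris is the capital of France.",
           "Beijing hosted the 2008 Olympics."]
cands = filter_candidates(["China", "Beijing"], answers)
```

Only answers mentioning "China" or "Beijing" survive, so the later BERT-based matching runs on a much smaller candidate set.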
S13: and obtaining answer vectors corresponding to the candidate answers in the database so as to obtain the answer vectors corresponding to a plurality of candidate answers.
In this embodiment, the database stores standard answers and answer vectors of the standard answers, and the standard answers and the answer vectors of the standard answers correspond to each other one by one. After the candidate answers of the user questions are determined, answer vectors corresponding to the candidate answers are obtained in the database to obtain answer vectors corresponding to a plurality of candidate answers, the standard answers are converted into the answer vectors in advance in an off-line mode to be stored, so that a subsequent server can rapidly determine the answer vectors of the candidate answers according to the actual user questions, the answer vectors of the candidate answers are not required to be calculated on line, the calculation amount of the server is reduced, and the effects of reducing the load of the server and improving the question-answer matching efficiency and the response speed of the server are achieved.
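The offline-encode-then-lookup pattern can be sketched as follows (the `encode` function is a hypothetical stand-in for the BERT pipeline described later, and the dictionary stands in for the database; both are assumptions for illustration):

```python
import zlib
import numpy as np

def encode(text, d=8):
    """Hypothetical stand-in for offline BERT encoding: deterministic
    pseudo-random vector seeded from the text."""
    rng = np.random.default_rng(zlib.crc32(text.encode("utf-8")))
    return rng.normal(size=d)

# Offline: every standard answer is encoded once and stored with its vector.
database = {ans: encode(ans) for ans in
            ["The capital of China is Beijing.",
             "Paris is the capital of France."]}

# Online: candidate answers need only a dictionary lookup, no BERT forward pass.
candidate_vecs = [database[c] for c in ["The capital of China is Beijing."]]
```

At serving time only the user question is encoded online, which is the load reduction the paragraph above describes.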
In this embodiment, a named entity in a user question is determined by obtaining a plurality of standard answers stored in a database, the standard answers including the named entity are used as candidate answers of the user question, answer vectors corresponding to the candidate answers are obtained in the database to obtain answer vectors corresponding to the plurality of candidate answers, a specific process of obtaining the answer vectors corresponding to the plurality of candidate answers is defined, the answer vectors of the candidate answers do not need to be calculated on line, the calculation amount of a server is reduced, and the effects of reducing the load of the server, improving the question-answer matching efficiency and improving the response speed of the server are achieved.
In an embodiment, as shown in fig. 7, obtaining answer vectors corresponding to the candidate answers specifically includes the following steps:
s01: and carrying out single-word segmentation on the candidate answers to obtain a word vector sequence of the candidate answers.
After the candidate answer is obtained, single-word segmentation is performed on the candidate answer to obtain the word vector sequence of the candidate answer, which serves as the input of the BERT model. This character-level segmentation is finer than traditional word-level segmentation, so the resulting vectors are more accurate.
S02: and inputting the word vector sequence of the candidate answers into a BERT model for coding so as to obtain a plurality of hidden-state answer vectors output by the BERT model.
After the word vector sequence of the candidate answer is obtained, the word vector sequence of the candidate answer is input into a BERT model for encoding, so that the hidden state of the last layer of a transformer in the BERT model is obtained and is used as a plurality of hidden state answer vectors of the candidate answer output by the BERT model.
S03: and aggregating the plurality of hidden-state answer vectors based on a preset aggregation mode to obtain answer vectors corresponding to the candidate answers.
After a plurality of hidden-state answer vectors output by the BERT model are obtained, the hidden-state answer vectors are aggregated based on a preset aggregation mode to obtain answer vectors corresponding to the candidate answers.
Where the word vector sequence of candidate answer a is (a_1, ..., a_{N_y}), the answer vector corresponding to the candidate answer is calculated as:

y_a = Concat(BERT(a_1, ..., a_{N_y}))

where BERT denotes the BERT model, Concat denotes the aggregation function (the preset aggregation mode), N_y is the length of the word vector sequence of candidate answer a, a_{N_y} is the N_y-th word vector of the candidate answer, and y_a is the answer vector corresponding to the candidate answer.
For example, the user question is: "Where is the capital of China?" The candidate answer (the correct answer) is: "The capital of China is Beijing." Single-word segmentation is performed on "The capital of China is Beijing" to obtain the word vector sequence of the candidate answer; the word vector sequence of the candidate answer is input into the BERT model for encoding to obtain a plurality of hidden-state answer vectors output by the BERT model; and the plurality of hidden-state answer vectors is aggregated based on the preset aggregation mode to obtain the answer vector corresponding to the candidate answer. Following this process, the normalized answer vector of the candidate answer "The capital of China is Beijing" is: (0.2341, 0.4353, 0.2352, 0.6436, …, 0.3453); the answer vector has length 1 × 128.
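Steps S01-S03 can be sketched with the BERT forward pass mocked out (an assumption for illustration: `mock_bert` returns random vectors, whereas real hidden states come from the last transformer layer, and the dimension 8 is invented):

```python
import numpy as np

def mock_bert(tokens, d=8):
    """Hypothetical stand-in for BERT: one hidden-state vector per character."""
    rng = np.random.default_rng(len(tokens))
    return rng.normal(size=(len(tokens), d))

answer = "The capital of China is Beijing"
tokens = list(answer)                # S01: single-word (character) segmentation
hidden = mock_bert(tokens)           # S02: hidden-state answer vectors
y_a = np.concatenate(hidden)         # S03: Concat-style aggregation
```

With concatenation the answer vector's length grows with the answer, which is why the alternative pooling modes described below are often preferred.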
In the embodiment, a word vector sequence of the candidate answer is obtained by performing single word segmentation on the candidate answer, and then the word vector sequence of the candidate answer is input into a BERT model for coding to obtain a plurality of hidden state answer vectors output by the BERT model; and finally, aggregating the plurality of hidden state answer vectors based on a preset aggregation mode to obtain answer vectors corresponding to the candidate answers, so that the obtaining process of the answer vectors is defined, and a basis is provided for determining correct answers according to the matching values of the answer vectors corresponding to the candidate answers and the user question vectors.
In an embodiment, as shown in fig. 8, in the step S03, the step of aggregating the hidden-state answer vectors based on a preset aggregation manner to obtain answer vectors corresponding to the candidate answers includes the following steps:
s031: determining a first-bit hidden-state answer vector output by the BERT model from the plurality of hidden-state answer vectors.
After obtaining a plurality of hidden state answer vectors output by the BERT model, determining a first hidden state answer vector output by the BERT model from the hidden state answer vectors, wherein the first hidden state answer vector is a vector corresponding to [ CLS ] in the BERT model.
S032: and taking the first hidden-state answer vector output by the BERT model as an answer vector corresponding to the candidate answer.
After the first hidden-state answer vector output by the BERT model is determined, it is taken as the answer vector corresponding to the candidate answer. In the BERT model, the vector corresponding to [CLS] fuses the semantic information of every character/word in the text more evenly and can represent the candidate answer better than the other characters/words; directly using the first hidden-state answer vector output by the BERT model as the answer vector corresponding to the candidate answer reduces the amount of computation, speeds up obtaining the answer vector corresponding to the candidate answer, and improves question-answer matching efficiency.
In other embodiments, the preset aggregation manner may also be: determining the mean value of a plurality of hidden state answer vectors, and taking the mean value of the plurality of hidden state answer vectors as the answer vector corresponding to the candidate answer so as to improve the accuracy of the answer vector corresponding to the candidate answer; or determining the first m hidden-state answer vectors in a plurality of hidden-state answer vectors output by the BERT model, determining the mean value of the first m hidden-state answer vectors, and taking the mean value of the first m hidden-state answer vectors as the answer vector corresponding to the candidate answer, thereby reducing the meaningless vector processing and reducing the calculation amount.
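The three aggregation modes described above ([CLS] pooling, mean pooling, and mean of the first m vectors) can be sketched in one helper (toy values; the function name and mode strings are assumptions for illustration):

```python
import numpy as np

def aggregate(hidden, mode="cls", m=4):
    """Aggregate hidden-state answer vectors (n, d) into one answer vector (d,).
    'cls': first ([CLS]) vector; 'mean': mean of all; 'first_m': mean of first m."""
    if mode == "cls":
        return hidden[0]
    if mode == "mean":
        return hidden.mean(axis=0)
    if mode == "first_m":
        return hidden[:m].mean(axis=0)
    raise ValueError(f"unknown aggregation mode: {mode}")

h = np.arange(12.0).reshape(4, 3)   # four toy hidden-state vectors, dim 3
cls_vec = aggregate(h, "cls")       # -> [0, 1, 2]
mean_vec = aggregate(h, "mean")     # -> [4.5, 5.5, 6.5]
```

[CLS] pooling is the cheapest; mean pooling trades a little computation for robustness; the first-m variant bounds the work on very long answers.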
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In an embodiment, an attention-based question-answer matching device is provided, and the attention-based question-answer matching device is in one-to-one correspondence with the attention-based question-answer matching method in the above embodiment. As shown in fig. 9, the apparatus for matching questions and answers based on the attention mechanism includes an obtaining module 901, an encoding module 902, a converting module 903, a second converting module 904, and a determining module 905. The functional modules are explained in detail as follows:
an obtaining module 901, configured to obtain a user question input by a user, and determine multiple candidate answers to the user question and answer vectors corresponding to the candidate answers;
the encoding module 902 is configured to convert the user question into a word vector sequence, and input the word vector sequence of the user question into the BERT model to obtain a plurality of hidden-state question vectors output by the BERT model;
a first conversion module 903, configured to convert, based on an attention mechanism, a plurality of hidden-state problem vectors to obtain m problem feature vectors for characterizing a user problem, where m is an integer greater than 1;
a second conversion module 904, configured to convert the m question feature vectors according to the answer vector to obtain user question vectors corresponding to the answer vector;
a determining module 905, configured to determine a correct answer to the user question from the multiple candidate answers according to a matching value between the corresponding user question vector and the answer vector.
Further, the first conversion module 903 is specifically configured to:
a. determining an initial weight, and determining a first weight matrix according to a plurality of hidden state problem vectors and the initial weight;
b. performing normalization processing on the first weight matrix by adopting a normalization index function to obtain a plurality of attention weights;
c. weighting and summing a plurality of hidden state problem vectors according to a plurality of attention weights to obtain a problem feature vector for representing a user problem;
d. and repeating the steps a-c to obtain m problem feature vectors.
Further, the second conversion module 904 is specifically configured to:
determining a second weight matrix corresponding to the answer vector according to the answer vector and the m question feature vectors;
performing normalization processing on the second weight matrix by adopting a normalization index function to obtain a plurality of target weights;
and summing the m question feature vectors according to the plurality of target weights to obtain a user question vector corresponding to the answer vector.
Further, the determining module 905 is specifically configured to:
calculating a matching value of the corresponding user question vector and answer vector;
taking the matching value as a target matching value of the candidate answer and the user question;
according to the size of the target matching value, sorting the plurality of candidate answers in descending order to obtain a candidate answer list;
and sequencing the first candidate answer in the candidate answer list to serve as a correct answer of the user question.
Further, the obtaining module 901 is specifically configured to:
acquiring a plurality of standard answers stored in a database;
determining a named entity in the user question, and taking a standard answer containing the named entity as a candidate answer of the user question;
and obtaining answer vectors corresponding to the candidate answers in the database so as to obtain the answer vectors corresponding to a plurality of candidate answers.
Further, the obtaining module 901 is further specifically configured to obtain an answer vector corresponding to the candidate answer by:
carrying out single word segmentation on the candidate answers to obtain a word vector sequence of the candidate answers;
inputting a word vector sequence of the candidate answers into a BERT model for coding so as to obtain a plurality of hidden state answer vectors output by the BERT model;
and aggregating the plurality of hidden-state answer vectors based on a preset aggregation mode to obtain answer vectors corresponding to the candidate answers.
Further, the obtaining module 901 is further specifically configured to:
determining a first hidden-state answer vector to be output by a BERT model from a plurality of hidden-state answer vectors;
and taking the first hidden-state answer vector output by the BERT model as an answer vector corresponding to the candidate answer.
For specific limitations of the attention-based question-answer matching device, reference may be made to the above limitations of the attention-based question-answer matching method, which are not described herein again. The various modules in the attention-based question matching apparatus described above may be implemented in whole or in part by software, hardware, and combinations thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in fig. 10. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used for storing data such as the candidate answers and the answer vectors corresponding to the candidate answers. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a question-answer matching method based on an attention mechanism.
In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
acquiring a user question input by a user, and determining a plurality of candidate answers of the user question and answer vectors corresponding to the candidate answers;
converting the user problem into a word vector sequence, and inputting the word vector sequence of the user problem into a BERT model to obtain a plurality of hidden state problem vectors output by the BERT model;
converting a plurality of hidden state problem vectors based on an attention mechanism to obtain m problem feature vectors for representing user problems, wherein m is an integer greater than 1;
converting the m question feature vectors according to the answer vectors to obtain user question vectors corresponding to the answer vectors;
and determining a correct answer of the user question in the plurality of candidate answers according to the matching value of the corresponding user question vector and the answer vector.
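Putting the five steps together, an end-to-end toy sketch (all vectors are random stand-ins for the real BERT outputs and stored answer vectors; dimensions, seeds, and the function name are assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable normalized exponential function
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def match(hidden_q, answer_vecs, c):
    """hidden_q: (j, d) hidden-state question vectors; answer_vecs: (n, d)
    precomputed candidate answer vectors; c: (m, d) initial weights.
    Returns the index of the best-matching candidate answer."""
    v = softmax(c @ hidden_q.T) @ hidden_q      # m question feature vectors
    scores = []
    for y_a in answer_vecs:
        y_q = softmax(v @ y_a) @ v              # answer-conditioned question vector
        scores.append(y_q @ y_a)                # matching value s(q, a_i)
    return int(np.argmax(scores))

rng = np.random.default_rng(3)
hq = rng.normal(size=(6, 8))    # mocked hidden states of the user question
ans = rng.normal(size=(3, 8))   # three candidate answer vectors
c = rng.normal(size=(4, 8))     # m = 4 initial weights
best = match(hq, ans, c)        # index of the returned correct answer
```

Each candidate gets its own answer-conditioned question vector before scoring, which is the interaction the method relies on.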
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring a user question input by a user, and determining a plurality of candidate answers of the user question and answer vectors corresponding to the candidate answers;
converting the user problem into a word vector sequence, and inputting the word vector sequence of the user problem into a BERT model to obtain a plurality of hidden state problem vectors output by the BERT model;
converting a plurality of hidden state problem vectors based on an attention mechanism to obtain m problem feature vectors for characterizing user problems;
converting the m question feature vectors according to the answer vectors to obtain user question vectors corresponding to the answer vectors, wherein m is an integer larger than 1;
and determining a correct answer of the user question in the plurality of candidate answers according to the matching value of the corresponding user question vector and the answer vector.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A question-answer matching method based on an attention mechanism is characterized by comprising the following steps:
acquiring a user question input by a user, and determining a plurality of candidate answers of the user question and answer vectors corresponding to the candidate answers;
converting the user question into a word vector sequence, and inputting the word vector sequence of the user question into a BERT model to obtain a plurality of hidden state question vectors output by the BERT model;
converting the plurality of hidden state problem vectors based on an attention mechanism to obtain m problem feature vectors for characterizing the user problem, wherein m is an integer greater than 1;
converting the m question feature vectors according to the answer vectors to obtain user question vectors corresponding to the answer vectors;
and determining a correct answer of the user question in the plurality of candidate answers according to the matching value of the corresponding user question vector and the answer vector.
2. The question-answer matching method based on an attention mechanism as claimed in claim 1, wherein the converting the plurality of hidden-state question vectors based on an attention mechanism to obtain m question feature vectors for characterizing the user question comprises:
a. determining an initial weight, and determining a first weight matrix according to the hidden state problem vectors and the initial weight;
b. performing normalization processing on the first weight matrix by adopting a normalization index function to obtain a plurality of attention weights;
c. performing weighted summation on the hidden-state question vectors according to the attention weights to obtain a question feature vector for representing the user question;
d. repeating the steps a-c to obtain m problem feature vectors.
3. The method according to claim 1, wherein the converting m question feature vectors according to the answer vector to obtain the user question vector corresponding to the answer vector comprises:
determining a second weight matrix corresponding to the answer vector according to the answer vector and the m question feature vectors;
performing normalization processing on the second weight matrix by adopting a normalization index function to obtain a plurality of target weights;
and summing the m question feature vectors according to the target weights to obtain a user question vector corresponding to the answer vector.
4. The method according to claim 1, wherein the determining the correct answer to the user question from the plurality of candidate answers according to the matching value of the corresponding user question vector and the answer vector comprises:
calculating a matching value of the corresponding user question vector and the answer vector;
using the matching value as a target matching value of the candidate answer and the user question;
ranking the plurality of candidate answers in descending order of the target matching value to obtain a candidate answer list;
and taking the first ranked candidate answer in the candidate answer list as a correct answer of the user question.
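A toy version of claim 4's ranking step: cosine similarity stands in for the matching value (the claim does not fix a particular measure), candidates are sorted so the best match comes first, and the top-ranked answer is returned. The vectors and answer strings below are made up for illustration.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity as an assumed matching value; the claim
    # leaves the exact similarity measure open.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def pick_answer(question_vectors, answer_vectors, answers):
    """Compute a matching value per candidate from its answer-conditioned
    user question vector, rank candidates so the highest match comes
    first, and return the first-ranked answer."""
    scored = sorted(
        zip(question_vectors, answer_vectors, answers),
        key=lambda t: cosine(t[0], t[1]),
        reverse=True,  # best match ranked first
    )
    return scored[0][2]

Q = [np.array([1.0, 0.0]), np.array([0.5, 0.5])]  # user question vectors (toy)
A = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # answer vectors (toy)
best = pick_answer(Q, A, ["pay online", "office hours"])
print(best)  # pay online
```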
5. The method according to claim 1, wherein the determining a plurality of candidate answers to the user question and answer vectors corresponding to the candidate answers comprises:
acquiring a plurality of standard answers stored in a database;
determining a named entity in the user question, and taking a standard answer containing the named entity as a candidate answer of the user question;
and obtaining answer vectors corresponding to the candidate answers in the database so as to obtain a plurality of answer vectors corresponding to the candidate answers.
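Claim 5's candidate retrieval can be sketched as an entity-based filter over stored standard answers. A real system would run a named-entity recognizer over the question; the hard-coded entity list and in-memory answer store below are assumptions for the sketch, as is the fallback to all answers when no entity is found.

```python
# Toy entity inventory standing in for an NER model (assumption).
ENTITIES = {"social insurance", "housing fund"}

def find_entity(question):
    """Return the first known entity mentioned in the question, if any."""
    return next((e for e in ENTITIES if e in question.lower()), None)

def candidates(question, standard_answers):
    """Keep only the standard answers that contain the named entity
    found in the user question; fall back to all answers when no
    entity is recognized (fallback chosen for this sketch)."""
    entity = find_entity(question)
    if entity is None:
        return list(standard_answers)
    return [a for a in standard_answers if entity in a.lower()]

answers = [
    "Social insurance can be paid online via the city portal.",
    "The housing fund office opens at 9 am.",
]
print(candidates("How do I pay social insurance?", answers))
```

Filtering by entity keeps the expensive vector matching of claims 3-4 restricted to a small candidate set instead of the whole answer database.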
6. The method for question-answer matching based on the attention mechanism as claimed in any one of claims 1-5, wherein the answer vector corresponding to the candidate answer is obtained by:
performing character-level segmentation on the candidate answer to obtain a word vector sequence of the candidate answer;
inputting the word vector sequence of the candidate answer into the BERT model for encoding to obtain a plurality of hidden-state answer vectors output by the BERT model;
and aggregating the plurality of hidden-state answer vectors based on a preset aggregation mode to obtain answer vectors corresponding to the candidate answers.
7. The method according to claim 6, wherein the aggregating the hidden-state answer vectors based on a predetermined aggregation manner to obtain the answer vector corresponding to the candidate answer comprises:
determining the first hidden-state answer vector output by the BERT model from the plurality of hidden-state answer vectors;
and taking the first hidden-state answer vector output by the BERT model as an answer vector corresponding to the candidate answer.
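The aggregation in claims 6-7 reduces to keeping only the first hidden-state answer vector the encoder outputs; in BERT this is the vector at the [CLS] position. The sketch below mocks the encoder output with random vectors, since loading a real BERT model is beside the point; all shapes are assumptions.

```python
import numpy as np

def aggregate_first(hidden_answer_vectors):
    """Of the hidden-state answer vectors the encoder emits (one per
    input token), keep only the first one as the answer vector --
    the position BERT reserves for the [CLS] token."""
    return hidden_answer_vectors[0]

# Stand-in for BERT output: 5 tokens, hidden size 8 (assumed shapes).
rng = np.random.default_rng(2)
H = rng.normal(size=(5, 8))
answer_vec = aggregate_first(H)
print(answer_vec.shape)  # (8,)
```

Since candidate answers are fixed, these answer vectors can be computed once and stored in the database, as claim 5 assumes.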
8. A question-answer matching device based on an attention mechanism, comprising:
an acquisition module, configured to acquire a user question input by a user and determine a plurality of candidate answers to the user question and answer vectors corresponding to the candidate answers;
an encoding module, configured to convert the user question into a word vector sequence and input the word vector sequence of the user question into a BERT model to obtain a plurality of hidden-state question vectors output by the BERT model;
a first conversion module, configured to convert the hidden-state question vectors based on an attention mechanism to obtain m question feature vectors for characterizing the user question;
a second conversion module, configured to convert the m question feature vectors according to the answer vectors to obtain user question vectors corresponding to the answer vectors, wherein m is an integer greater than 1;
and a determining module, configured to determine a correct answer of the user question from the plurality of candidate answers according to the matching value of the corresponding user question vector and the answer vector.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the question-answer matching method based on an attention mechanism according to any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the question-answer matching method based on an attention mechanism according to any one of claims 1 to 7.
CN202111182254.6A 2021-10-11 2021-10-11 Question-answer matching method, device, equipment and storage medium based on attention mechanism Pending CN113886550A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111182254.6A CN113886550A (en) 2021-10-11 2021-10-11 Question-answer matching method, device, equipment and storage medium based on attention mechanism


Publications (1)

Publication Number Publication Date
CN113886550A true CN113886550A (en) 2022-01-04

Family

ID=79006016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111182254.6A Pending CN113886550A (en) 2021-10-11 2021-10-11 Question-answer matching method, device, equipment and storage medium based on attention mechanism

Country Status (1)

Country Link
CN (1) CN113886550A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200151542A1 (en) * 2018-11-12 2020-05-14 Alibaba Group Holding Limited Question and answer matching method, system and storage medium
US11599769B2 (en) * 2018-11-12 2023-03-07 Alibaba Group Holding Limited Question and answer matching method, system and storage medium
CN114358023A (en) * 2022-01-11 2022-04-15 平安科技(深圳)有限公司 Intelligent question-answer recall method and device, computer equipment and storage medium
CN114358023B (en) * 2022-01-11 2023-08-22 平安科技(深圳)有限公司 Intelligent question-answer recall method, intelligent question-answer recall device, computer equipment and storage medium
CN115422934A (en) * 2022-07-08 2022-12-02 中国科学院空间应用工程与技术中心 Entity identification and linking method and system for space text data
CN116383491A (en) * 2023-03-21 2023-07-04 北京百度网讯科技有限公司 Information recommendation method, apparatus, device, storage medium, and program product
CN116383491B (en) * 2023-03-21 2024-05-24 北京百度网讯科技有限公司 Information recommendation method, apparatus, device, storage medium, and program product

Similar Documents

Publication Publication Date Title
CN111581229B (en) SQL statement generation method and device, computer equipment and storage medium
CN108959246B (en) Answer selection method and device based on improved attention mechanism and electronic equipment
CN111814466A (en) Information extraction method based on machine reading understanding and related equipment thereof
CN111241304B (en) Answer generation method based on deep learning, electronic device and readable storage medium
CN113886550A (en) Question-answer matching method, device, equipment and storage medium based on attention mechanism
CN110765785B (en) Chinese-English translation method based on neural network and related equipment thereof
CN111859986A (en) Semantic matching method, device, equipment and medium based on multitask twin network
CN112085091B (en) Short text matching method, device, equipment and storage medium based on artificial intelligence
CN111767375A (en) Semantic recall method and device, computer equipment and storage medium
CN113239176B (en) Semantic matching model training method, device, equipment and storage medium
CN112307168A (en) Artificial intelligence-based inquiry session processing method and device and computer equipment
CN115146068B (en) Method, device, equipment and storage medium for extracting relation triples
CN113836192B (en) Parallel corpus mining method and device, computer equipment and storage medium
CN112699213A (en) Speech intention recognition method and device, computer equipment and storage medium
CN113722512A (en) Text retrieval method, device and equipment based on language model and storage medium
CN114358023A (en) Intelligent question-answer recall method and device, computer equipment and storage medium
CN113869068A (en) Scene service recommendation method, device, equipment and storage medium
CN116469111B (en) Character generation model training method and target character generation method
CN118069857A (en) Knowledge graph completion method, system and device based on Transfomer and progressive distillation
CN116777646A (en) Artificial intelligence-based risk identification method, apparatus, device and storage medium
CN116955797A (en) Resource recommendation method and device, electronic equipment and storage medium
CN113918696A (en) Question-answer matching method, device, equipment and medium based on K-means clustering algorithm
CN113420869B (en) Translation method based on omnidirectional attention and related equipment thereof
CN113901821A (en) Entity naming identification method, device, equipment and storage medium
CN113987154A (en) Similar sentence generation model training method based on UniLM and contrast learning and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination