CN112836035A

CN112836035A - Method, device, equipment and computer readable medium for matching data

Info

Publication number: CN112836035A
Application number: CN202110241934.4A
Authority: CN
Inventors: 黄凯; 刘设伟
Original assignee: Taikang Insurance Group Co Ltd; Taikang Online Property Insurance Co Ltd
Current assignee: Taikang Insurance Group Co Ltd; Taikang Online Property Insurance Co Ltd
Priority date: 2021-03-04
Filing date: 2021-03-04
Publication date: 2021-05-25

Abstract

The invention discloses a method, an apparatus, a device and a computer-readable medium for matching data, and relates to the field of computer technology. One embodiment of the method comprises: acquiring a query question of a user; matching the query question with the questions in question-answer pairs, and matching the answers in the question-answer pairs with the query question and with the questions in the question-answer pairs respectively, to obtain the matching degree between the query question and each question-answer pair, wherein each question-answer pair comprises an existing question in a database and the answer corresponding to that question; and if the maximum value of the matching degree meets the matching condition, taking the answer in the question-answer pair corresponding to the maximum value as the answer to the query question. This embodiment can improve the accuracy of matching customer questions with knowledge points.

Description

Method, device, equipment and computer readable medium for matching data

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a computer-readable medium for matching data.

Background

An intelligent customer service robot provides intelligent question-and-answer services to customers across different channels and products. In the intelligent question-and-answer process, a customer asks a question, and the robot searches and matches in the back-end knowledge base and then returns the corresponding answer. Such a system is highly convenient, timely and accurate.

In the intelligent question-answering process, the accuracy of the returned answers is an important factor for measuring the quality of the intelligent customer service robot, and the accuracy of the answers is often closely related to the accuracy of matching of customer questions and knowledge points.

In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art: the accuracy of matching the customer questions with the knowledge points is low.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method, an apparatus, a device, and a computer-readable medium for matching data, which can improve the accuracy of matching customer questions with knowledge points.

To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a method of matching data, including:

acquiring a query question of a user;

matching the query question with questions in question-answer pairs, and matching answers in the question-answer pairs with the query question and the questions in the question-answer pairs respectively to obtain the matching degree of the query question and each question-answer pair, wherein the question-answer pairs comprise existing questions in a database and answers corresponding to the existing questions in the database;

and if the maximum value of the matching degree meets the matching condition, taking the answer in the question-answer pair corresponding to the maximum value as the answer of the query question.

The matching of the query question with the question in the question-answer pair and the matching of the answer in the question-answer pair with the query question and the question in the question-answer pair respectively to obtain the matching degree of the query question and each question-answer pair includes:

determining a first similarity of the query question and a question in the question-answer pair;

determining a second similarity according to the similarity between the answer in the question-answer pair and the query question and the similarity between the answer and the question in the question-answer pair;

and obtaining the matching degree of the query question and each question-answer pair according to the first similarity and the second similarity.

The determining a first similarity between the query question and the question in the question-answer pair comprises:

determining the first similarity between the query question and the question in the question-answer pair by adopting a Bert model;

the determining a second similarity according to the similarity between the answer in the question-answer pair and the query question and the similarity between the answer and the question in the question-answer pair comprises:

determining the second similarity, by adopting the Bert model, according to the similarity between the answer in the question-answer pair and the query question and the similarity between the answer and the question in the question-answer pair.

The obtaining the matching degree between the query question and each question-answer pair according to the first similarity and the second similarity comprises:

splicing the first similarity and the second similarity to obtain a third similarity;

and inputting the third similarity into a multilayer perceptron to obtain the matching probability of the query question and each question-answer pair, and taking the matching probability as the matching degree.

The determining the second similarity, by adopting the Bert model, according to the similarity between the answer in the question-answer pair and the query question and the similarity between the answer and the question in the question-answer pair comprises:

matching the answers in the question-answer pairs with the query questions to obtain a first matching vector;

matching the answers in the question-answer pairs with the questions in the question-answer pairs to obtain second matching vectors;

and taking the similarity between the first matching vector and the second matching vector as the second similarity.

The method further comprises the following steps:

and taking the query question and the answer of the query question as a newly added question-answer pair, and storing the newly added question-answer pair in the database.

The obtaining of the query question of the user comprises:

and acquiring the query question of the user from the browser and/or the mobile terminal application.

The method further comprises the following steps:

and training to obtain the Bert model by adopting a training set and a test set, wherein each sample of the training set and the test set comprises a user's query question, a question-answer pair and a label, and the label is used for representing the matching result between the user's query question and the question-answer pair.

After the answer in the question-answer pair corresponding to the maximum value is taken as the answer to the query question, the method further includes:

and outputting the answer of the query question by one or more of the following modes, wherein the modes comprise texts, voice and images.

According to a second aspect of the embodiments of the present invention, there is provided an apparatus for matching data, including:

the query module is used for acquiring query questions of a user;

the matching module is used for matching the query question with questions in question-answer pairs and matching answers in the question-answer pairs with the query question and the questions in the question-answer pairs respectively to obtain the matching degree of the query question and each question-answer pair, and the question-answer pairs comprise existing questions of a database and answers corresponding to the existing questions of the database;

and the output module is used for taking the answer in the question-answer pair corresponding to the maximum value as the answer of the query question if the maximum value of the matching degree meets the matching condition.

According to a third aspect of embodiments of the present invention, there is provided an electronic device for matching data, including:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described above.

According to a fourth aspect of embodiments of the present invention, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the method as described above.

One embodiment of the above invention has the following advantages or benefits: acquiring a query question of a user; matching the query question with questions in question-answer pairs, and matching answers in the question-answer pairs with the query question and the questions in the question-answer pairs respectively to obtain the matching degree of the query question and each question-answer pair, wherein the question-answer pairs comprise existing questions in a database and answers corresponding to the existing questions in the database; and if the maximum value of the matching degree meets the matching condition, taking the answer in the question-answer pair corresponding to the maximum value as the answer of the query question. The answers can be matched not only from the angle of question matching, but also from the angle of questions and answers, so that the accuracy of matching the query questions and the knowledge points can be improved.

Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

FIG. 1 is a schematic diagram of a Q-Q semantic matching model;

FIG. 2 is a schematic diagram of a Q-Q-A single-sided matching model;

FIG. 3 is a schematic diagram of a main flow of a method of matching data according to an embodiment of the invention;

FIG. 4 is a schematic flow chart of obtaining the matching degree between the query question and each question-answer pair according to the embodiment of the present invention;

FIG. 5 is a schematic diagram of obtaining a degree of match of a query question with each question-answer pair in an embodiment in accordance with the invention;

FIG. 6 is a schematic diagram of a detailed process for obtaining the matching degree according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of the main structure of an apparatus for matching data according to an embodiment of the present invention;

FIG. 8 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;

FIG. 9 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Intelligent customer service robots have become important and convenient tools for acquiring knowledge. To improve the efficiency and accuracy of robot service, similar-question determination is gradually becoming the core task of intelligent customer service robots. This task requires constructing a semantic matching model, finding the database question that is highly similar to the query question, and finally returning the answer corresponding to that database question to the customer.

Referring to FIG. 1, FIG. 1 is a schematic diagram of a Q-Q semantic matching model. A Q-Q (Question-Question) semantic matching model is commonly used as the semantic matching model. Its input is a query question and a database question to be matched. However, the Q-Q semantic matching model has some defects.

As an example, the customer's context information cannot be obtained, which leaves the semantics of the query question unclear. In a real environment, different query questions may also express the same meaning, such as "Which of these insurance products should I buy?" and "Can you help me recommend a product?".

Currently, it has been considered to enrich the semantic information of the database question with the answer to the question when determining the matched knowledge point.

Referring to FIG. 2, FIG. 2 is a schematic diagram of a Q-Q-A single-sided matching model. Traditional methods typically employ a Q-A one-sided matching model, i.e., they use the answer as an extended representation of the corresponding database question. However, because answers are verbose and diverse, noise may be introduced into the similarity calculation, so that the accuracy of matching the query question with the knowledge points is low.

In order to solve the technical problem that the accuracy of matching between the query question and the knowledge point is low, the following technical scheme in the embodiment of the invention can be adopted.

Referring to fig. 3, fig. 3 is a schematic diagram of a main flow of a method for matching data according to an embodiment of the present invention, which not only matches a query question with a question in a question-answer pair, but also matches the query question, the question in the question-answer pair, and an answer in the question-answer pair. As shown in fig. 3, the method specifically includes the following steps:

S301, acquiring the query question of the user.

The technical scheme in the embodiment of the invention can be applied to the server. As one example, a user enters a query question through a browser in a computer. As another example, a user enters a query question through a mobile terminal Application (APP). That is, the server may obtain the user's query questions from the browser and/or the mobile terminal application.

It should be noted that the query question input by the user may be voice and/or text. Where the query question includes speech, the speech may be converted into text using a speech recognition model. Where the query question includes text, the text may be preliminarily screened to delete stop words and/or misspelled words, thereby improving the matching accuracy.

S302, matching the query question with the question in the question-answer pair, and matching the answer in the question-answer pair with the query question and the question in the question-answer pair respectively to obtain the matching degree of the query question and each question-answer pair, wherein the question-answer pair comprises the existing questions of the database and the corresponding answer of the existing questions of the database.

Because the information shared between a query question and a database question is sparse, a query question often cannot be matched to the correct answer until it is discovered by the database administrator and added to the question set of the knowledge point.

In this embodiment, matching an answer based on the query question may be divided into two parts. The first part is matching between questions; the second part is matching among the query question, the question in the question-answer pair, and the answer in the question-answer pair. Finally, the similarity between the query question and each question-answer pair is measured by the matching degree.

Referring to fig. 4, fig. 4 is a schematic flowchart of obtaining a matching degree between a query question and each question-answer pair according to an embodiment of the present invention, which specifically includes the following steps:

S401, determining a first similarity between the query question and the question in the question-answer pair.

In the embodiment of the invention, the database comprises a plurality of questions and the answers corresponding to the questions. A question together with its corresponding answer is called a question-answer pair. That is, each question-answer pair includes an existing question of the database and the answer corresponding to that question.

First, the first similarity is determined according to the user's query question and the question in the question-answer pair. The first similarity measures the similarity between the user's query question and the question in the question-answer pair: the larger the value of the first similarity, the higher the similarity between them; the smaller the value, the lower the similarity.

S402, determining a second similarity according to the similarity between the answer in the question-answer pair and the query question and the similarity between the answer and the question in the question-answer pair.

In addition to calculating the first similarity, the embodiment of the present invention also needs to obtain the similarity between the answer in the question-answer pair and the query question, and the similarity between the answer in the question-answer pair and the question in the question-answer pair. Similarity is thus obtained from two perspectives: between questions, and between questions and answers.

It should be noted that the question and the answer involved in determining the second similarity belong to the same question-answer pair. The second similarity determines the similarity to the query question from the perspective of the question-answer pair as a whole.

S403, obtaining the matching degree between the query question and each question-answer pair according to the first similarity and the second similarity.

After the first similarity and the second similarity are determined, the matching degree between the query question and each question-answer pair can be obtained according to them. As an example, the first similarity and the second similarity are spliced (concatenated) to obtain a third similarity.

In the embodiment of fig. 4, the matching degree between the query question and each question-answer pair is obtained from the first similarity, between questions, and the second similarity, between questions and answers, which lays a foundation for determining the answer to the query question.

In one embodiment of the present invention, the similarity calculation may be implemented using a Bert model. That is, the first similarity between the query question and the question in the question-answer pair is determined using the Bert model, and the second similarity is determined, also using the Bert model, according to the similarity between the answer in the question-answer pair and the query question and the similarity between the answer and the question in the question-answer pair.

It should be noted that the Bert model is a trained model. The training process for obtaining the Bert model is exemplarily described below.

In the embodiment of the invention, a training set and a test set are used to train the Bert model. Each sample of the training set and the test set comprises a user's query question, a question-answer pair and a label, and the label represents the matching result between the user's query question and the question-answer pair.

Specifically, the training samples are derived from the knowledge base system and external corpora. First, the training samples need to be cleaned and labeled to improve data quality. Each training sample is composed of a question, a knowledge point (a question and its answer), and a label. A label of 0 indicates that the question does not match the knowledge point; a label of 1 indicates that the question matches the knowledge point.

As an example, the training samples are divided into a training set and a test set in a 4:1 ratio. The training set is used for training the Bert model, and the test set is used for testing the matching effect of the trained Bert model.
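
The 4:1 division can be sketched as follows. This is only an illustration: the function name `split_samples`, the fixed random seed, the shuffling step and the toy samples are assumptions, not details given in the source.

```python
import random

def split_samples(samples, train_parts=4, test_parts=1, seed=0):
    """Shuffle the samples and split them train_parts:test_parts (4:1 by default)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # shuffling is an assumption, not stated in the source
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]

# Each toy sample: (query question, (database question, answer), label),
# where label 1 means "matches" and label 0 means "does not match".
samples = [(f"query {i}", (f"question {i}", f"answer {i}"), i % 2) for i in range(10)]
train_set, test_set = split_samples(samples)
print(len(train_set), len(test_set))
```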

Referring to fig. 5, fig. 5 is a schematic diagram of obtaining a matching degree of the query question and each question-answer pair according to the embodiment of the present invention.

In fig. 5, three steps are involved, namely question matching, question answer matching and aggregation.

In the question matching process, two questions are taken as input and the question similarity is determined. The two questions are the query question and the question in the question-answer pair. The question similarity is a vector.

In the question-answer matching process, a structure similar to a Siamese neural network can be adopted. The structure comprises three independent parts:

(1) Matching interaction layer: the query question is matched with the answer in the question-answer pair, and the question in the question-answer pair is matched with the answer in the question-answer pair, and interactive training is carried out on each.

(2) Similarity calculation layer: similarity values are computed between the matching vectors obtained by interactive training, yielding a similarity vector map.

(3) Pooling layer: the similarity vector map is compressed into a similarity vector, which is the question-answer similarity. The question-answer similarity is a vector.

In the aggregation process, the question similarity and the question-answer similarity are spliced to obtain the answer similarity. The answer similarity is input into a multilayer perceptron (MLP) to obtain the matching probability of the query question and each question-answer pair, and the matching probability is taken as the matching degree.

After the steps in fig. 5 are set up, training is performed on the training set by combining the Bert model with the steps in fig. 5.

First, the training set and the mapping dictionary of the Bert model are imported. The question sentence and the answer sentence to be matched are spliced and processed into a tokens array that starts with [CLS], uses [SEP] as the separator between the two sentences, and ends with [SEP].

As an example: [[CLS] hello, does not require an identification card for insuring a child [SEP] the identification number of a 6 year old child is the written user's notebook number [SEP]]. Here [CLS] is at the beginning, [SEP] is at the end of each sentence, and the tokens are separated by spaces.

Further, the tokens array is aligned to a fixed length, assumed to be m, and then converted into three groups of vectors: token_id, segment_id and input_mask.

The token_id of the tokens array is obtained first. The tokens array in the above example may be represented as: [101 872 1962 8024 711 2207 2111 2832 924 7444 6206 6716 819 6395 1408 102 127 2259 2207 2111 4638 6716 819 6395 1384 4772 3221 1091 2787 1366 3315 1384 4772 1408 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]. The array is separated by spaces, and each number is the id to which the corresponding character is mapped, e.g., "you" maps to "872". To align the text, the number array is padded with zeros at the end.

Next, the segment_id of the tokens array is obtained. [CLS], the characters of the first sentence and the corresponding [SEP], and the text-alignment padding are filled with 0; the characters of the second sentence and the corresponding [SEP] are filled with 1.

Continuing with the above example, the segment_id of the above tokens array is [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0].

For input_mask, all characters in the tokens array are filled with 1 and the text-alignment padding is filled with 0.

Continuing with the above example, the input_mask of the above tokens array is [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0].
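
The construction of the three vectors can be sketched in plain Python as below. The function `encode_pair`, the toy vocabulary and its character ids are hypothetical and serve only to illustrate the layout; a real system would use the mapping dictionary shipped with the Bert model.

```python
def encode_pair(first, second, vocab, max_len):
    """Build token_id, segment_id and input_mask for two character sequences."""
    tokens = ["[CLS]"] + list(first) + ["[SEP]"] + list(second) + ["[SEP]"]
    if len(tokens) > max_len:
        raise ValueError("sequence longer than max_len")
    first_len = len(first) + 2               # [CLS] + first sentence + its [SEP]
    pad = max_len - len(tokens)              # number of zero-padding positions
    token_id = [vocab.get(t, vocab["[UNK]"]) for t in tokens] + [0] * pad
    segment_id = [0] * first_len + [1] * (len(tokens) - first_len) + [0] * pad
    input_mask = [1] * len(tokens) + [0] * pad
    return token_id, segment_id, input_mask

# Toy vocabulary: the ids for "a", "b", "c" are made up for illustration.
vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 101, "[SEP]": 102, "a": 5, "b": 6, "c": 7}
token_id, segment_id, input_mask = encode_pair("ab", "c", vocab, max_len=8)
print(token_id)    # [101, 5, 6, 102, 7, 102, 0, 0]
print(segment_id)  # [0, 0, 0, 0, 1, 1, 0, 0]
print(input_mask)  # [1, 1, 1, 1, 1, 1, 0, 0]
```

As in the example in the text, padding positions receive 0 in all three vectors, and only the second sentence with its closing [SEP] receives segment value 1.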

In the question matching process, the representation of the question is obtained with the encoder part of a stacked Transformer model, where each Transformer layer comprises a multi-head attention network and a feed-forward network. The Transformer model has two outputs:

$B^p, B^s = \mathrm{StackedTransformer}(Q^u, Q^a)$ (formula 1)

where $B^p$ is the text representation used in classification tasks, i.e., the vector corresponding to [CLS]; $B^s$ is the vector corresponding to [CLS] at each Transformer layer; $B^p$ is used as the question similarity; $Q^u$ is the user's query question; and $Q^a$ is the question in the question-answer pair.

The Bert model is composed of multiple Transformer layers, and the base Bert model has 12 layers, so that more comprehensive information can be acquired. The output vector $v^q$ at the [CLS] position of the last layer of the Bert model is used as the question similarity. The dimension of the question similarity vector is [batch size, hidden size], where batch size is the size of a batch and hidden size is the number of hidden-layer neurons.

Secondly, in the question-answer matching process, the Bert model is adopted to obtain two interaction vectors for matching questions with answers. Here the sequence feature vector $B^s$ is used. Taking the query question $Q^u$ and the answer $A^a$ in the question-answer pair as an example, $Q^u$ and $A^a$ are spliced and processed into the corresponding input format, and then fed into the stacked Transformer model to obtain the matching sequence vector $V^u$ of $Q^u$ and $A^a$:

where l is the number of Transformer layers in the Bert model.

Likewise, the matching sequence vector $V^a$ of $Q^a$ and $A^a$ can be obtained:

$V^u$ and $V^a$ are combined by Cartesian combination, and the similarity between two vectors is calculated with the cosine function:

$s_{\cos}(x, y) = \dfrac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}$ (formula 4)

where $x = V^u$, $y = V^a$, and $s_{\cos}$ is the cosine similarity.

Through formula 4, the cosine similarities form a similarity vector map $p^f$. $p^f$ is flattened to finally obtain the similarity vector $v^f$. $v^q$ and $v^f$ are spliced to obtain a one-dimensional vector $v^0$, namely: $v^0 = [v^q : v^f]$.
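
A minimal sketch of the similarity map, flattening and splicing steps follows. It assumes that the two matching sequence vectors are each a list of vectors (one per Transformer layer) and that "Cartesian combination" means taking the cosine of every pair; the toy numbers and the names `similarity_map` and `flatten` are illustrative only.

```python
import math

def cosine(x, y):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def similarity_map(v_u, v_a):
    """Cosine of every vector in v_u against every vector in v_a."""
    return [[cosine(x, y) for y in v_a] for x in v_u]

def flatten(matrix):
    """Tile a 2-D map row by row into a one-dimensional vector."""
    return [value for row in matrix for value in row]

V_u = [[1.0, 0.0], [0.0, 1.0]]   # toy matching vectors for (query question, answer)
V_a = [[1.0, 0.0], [1.0, 1.0]]   # toy matching vectors for (database question, answer)
p_f = similarity_map(V_u, V_a)   # similarity vector map
v_f = flatten(p_f)               # question-answer similarity vector
v_q = [0.3, 0.7]                 # toy question similarity vector
v_0 = v_q + v_f                  # spliced vector fed to the MLP
print(len(v_0))
```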

The vector $v^0$ is fed into an MLP followed by a softmax function, so that the matching probability of the query question and the question-answer pair can be obtained. The softmax formula is as follows:

$v_i = x \cdot w_i^T + b_i$ (formula 6)

$S_i = \dfrac{e^{v_i}}{\sum_j e^{v_j}}$ (formula 7)

where $v_i$ is the output of the last hidden layer, $S_i$ is the ratio of the exponential of the current element to the sum of the exponentials of all elements, $x$ is the input vector of the last hidden layer, $w_i$ is the corresponding weight matrix, and $b_i$ is the bias. Softmax converts the output values into relative probabilities, which are easier to understand and compare.

Further, at the stage of training the Bert model, the model is trained end to end using a cross-entropy loss. Question matching and question-answer matching are combined: question matching is the main task, aiming at measuring the similarity between two questions; question-answer matching is an auxiliary task, aiming at evaluating the matching relationship between the query question and the answer, and between the question and the answer in the question-answer pair.

In the question-answer matching process, the [CLS] vectors are respectively taken and fed into the MLP to obtain the predicted values $y^u$ and $y^a$ and the corresponding loss function values $loss^u$ and $loss^a$. Finally, the loss function value of the model is obtained:

$Loss = r \cdot loss^q + \dfrac{1-r}{2}\, loss^u + \dfrac{1-r}{2}\, loss^a$ (formula 8)

where $loss^q$ is the loss value obtained after aggregation and $r \in [0, 1]$. When $r = 1$, only the aggregation loss is considered.
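
The weighted combination in formula 8 can be sketched as below. The binary form of the cross-entropy is an assumption based on the 0/1 labels described earlier, and the probabilities and the value of r are illustrative only.

```python
import math

def cross_entropy(p_match, label):
    """Binary cross-entropy for a predicted match probability and a 0/1 label."""
    eps = 1e-12
    p = min(max(p_match, eps), 1 - eps)     # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

def total_loss(loss_q, loss_u, loss_a, r):
    """Loss = r*loss_q + (1-r)/2*loss_u + (1-r)/2*loss_a, with r in [0, 1]."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    return r * loss_q + (1 - r) / 2 * loss_u + (1 - r) / 2 * loss_a

label = 1
loss_q = cross_entropy(0.8, label)   # aggregated (main task) prediction
loss_u = cross_entropy(0.6, label)   # query question vs. answer
loss_a = cross_entropy(0.9, label)   # database question vs. answer
print(total_loss(loss_q, loss_u, loss_a, r=0.5))
```

With r = 1 the two auxiliary terms vanish and only the aggregation loss remains, as stated in the text.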

Further, parameters of the Bert model such as batch size or epoch are adjusted, the fine-tuned Bert model is saved at a certain training step interval, and the accuracy is verified on the test set at the same time. If the test result does not improve over several evaluations, training is stopped, and the Bert model parameters and matching model parameters with the highest accuracy are saved.
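
One possible reading of this stopping rule is sketched below. The `patience` parameter, the choice to count consecutive evaluations without improvement, and the stand-in `evaluate` callback are all assumptions; the source only says that training stops when the test result fails to improve several times.

```python
def train_with_early_stopping(evaluate, max_steps, eval_every, patience):
    """Evaluate every eval_every steps; stop after `patience` evaluations without improvement."""
    best_acc, best_step, stale = -1.0, None, 0
    for step in range(eval_every, max_steps + 1, eval_every):
        acc = evaluate(step)                 # accuracy on the test set at this step
        if acc > best_acc:
            best_acc, best_step, stale = acc, step, 0   # a real run would save parameters here
        else:
            stale += 1
            if stale >= patience:
                break
    return best_step, best_acc

# Toy accuracy curve standing in for real evaluation results.
accs = {100: 0.80, 200: 0.85, 300: 0.84, 400: 0.85, 500: 0.83, 600: 0.90}
best = train_with_early_stopping(lambda s: accs[s], max_steps=600, eval_every=100, patience=3)
print(best)  # (200, 0.85)
```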

Specifically, the Adam optimization algorithm can be adopted, with the following formulas:

$m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t$ (formula 10)

$v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2$ (formula 11)

where $g_t$ is the gradient of the loss function at the parameter $\theta_{t-1}$; $m_t$, $v_t$ and $\theta_t$ are updated in each optimization step; $m_t$ and $v_t$ are the exponential moving averages of the gradient and of the squared gradient, respectively; $\beta_1$ and $\beta_2$ are the exponential decay rates; $\alpha$ is the learning step size; $\varepsilon$ is a very small number that prevents division by 0; and $\theta_t$ is the updated parameter value. The parameters in the model are updated in this way, which achieves the purpose of parameter training.
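
For illustration, one Adam update can be written in plain Python as follows. The bias-correction and parameter-update lines follow the standard formulation of Adam and are an assumption here, since the corresponding formulas are not legible in this text; the default hyperparameters and the toy objective are likewise illustrative.

```python
import math

def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a list of parameters; t is the 1-based step count."""
    m = [beta1 * m_i + (1 - beta1) * g for m_i, g in zip(m, grad)]        # formula 10
    v = [beta2 * v_i + (1 - beta2) * g * g for v_i, g in zip(v, grad)]    # formula 11
    m_hat = [m_i / (1 - beta1 ** t) for m_i in m]    # bias correction (standard Adam)
    v_hat = [v_i / (1 - beta2 ** t) for v_i in v]
    theta = [p - alpha * mh / (math.sqrt(vh) + eps)
             for p, mh, vh in zip(theta, m_hat, v_hat)]
    return theta, m, v

# Single step on f(theta) = theta^2 starting from theta = 1 (gradient 2*theta).
theta1, m1, v1 = adam_step([1.0], [2.0], [0.0], [0.0], t=1)

# A short run on the same toy objective with a larger step size.
theta, m, v = [1.0], [0.0], [0.0]
for t in range(1, 201):
    grad = [2 * theta[0]]
    theta, m, v = adam_step(theta, grad, m, v, t, alpha=0.05)
print(theta1[0], theta[0])
```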

So far, the Bert model is obtained through the training in the above way.

The following exemplarily describes how the trained Bert model is used.

Referring to fig. 6, fig. 6 is a schematic diagram of a specific process for obtaining the matching degree in the embodiment of the present invention, which specifically includes the following steps:

S601, splicing the first similarity and the second similarity to obtain a third similarity.

The first similarity is the similarity between the query question and the question in the question-answer pair, and measures the similarity between questions. The second similarity is the similarity among the query question, the question in the question-answer pair and the answer in the question-answer pair, and measures the similarity between questions and answers.

The first similarity is a vector and the second similarity is also a vector; the two are spliced to obtain the third similarity.

S602, inputting the third similarity into the multilayer perceptron to obtain the matching probability of the query question and each question-answer pair, and taking the matching probability as the matching degree.

A Multilayer Perceptron (MLP) is a feedforward artificial neural network model that maps multiple input data sets onto a single output data set.

In the embodiment of the invention, the third similarity can be input into the multilayer perceptron. The output value of the multi-layer perceptron is the matching probability of the query question and each question-answer pair, and the matching probability is used as the matching degree of the query question and each question-answer pair.

It will be appreciated that the database includes multiple question-answer pairs, so the matching probability between the query question and each question-answer pair can be obtained. The number of matching probabilities is the same as the number of question-answer pairs.

In the embodiment of fig. 6, the similarity between questions and the similarity between questions and answers are used to obtain the matching degree between the query question and each question-answer pair.

In one embodiment of the present invention, the question-answer similarity is obtained through a question-answer matching process similar to that used in training the Bert model in fig. 5. In the process of matching data by using the trained Bert model, the second similarity is obtained by the following steps:

First, the answer in the question-answer pair is matched with the query question to obtain a first matching vector. Second, the answer in the question-answer pair is matched with the question in the question-answer pair to obtain a second matching vector. Finally, the similarity between the first matching vector and the second matching vector is taken as the second similarity.

In calculating the second similarity, both the matching between the answer in the question-answer pair and the query question and the matching between the answer in the question-answer pair and the question in the question-answer pair are considered. The higher the similarity between the first matching vector and the second matching vector, the more likely the answer in the question-answer pair is the answer to the query question.
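The comparison of the two matching vectors can be sketched as follows. The embodiment does not name the similarity function, so cosine similarity is assumed here purely for illustration. Cosine similarity yields a single number, whereas the embodiment treats the second similarity as a vector, so an actual implementation may use a vector-valued comparison instead. The vector values are made up.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical matching vectors, e.g. as produced by a trained Bert model.
first_matching_vector = [0.2, 0.7, 0.1]     # answer matched with the query question
second_matching_vector = [0.25, 0.65, 0.05]  # answer matched with the stored question

second_similarity = cosine_similarity(first_matching_vector, second_matching_vector)
```

A value close to 1 indicates that the answer relates to the query question in much the same way as it relates to the stored question.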

S303, if the maximum value of the matching degree meets the matching condition, taking the answer in the question-answer pair corresponding to the maximum value as the answer of the query question.

A question-answer pair comprises an existing question in the database and the answer corresponding to that question. A matching degree is calculated for each question-answer pair against the query question, so each question-answer pair corresponds to one matching degree. The matching degrees are arranged in descending order to obtain the maximum value.

The maximum value of the matching degree is then used as the basis for judging whether the matching condition is met.

In the embodiment of the present invention, the matching condition may be preset. As one example, the match condition includes being greater than a preset match threshold. That is, in the case where the maximum value of the matching degree is greater than the preset matching threshold, it is determined that the matching condition is satisfied.

When the maximum value of the matching degree meets the matching condition, the answer in the corresponding question-answer pair is successfully matched with the query question, and that answer is used as the answer to the query question.

When the maximum value of the matching degree does not satisfy the matching condition, no answer is successfully matched with the query question, and a message indicating the unsuccessful match is fed back to the user.
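The selection logic of S303 can be sketched as below, assuming the matching condition is "greater than a preset matching threshold" as in the example above. The function name `select_answer`, the threshold of 0.5, the pair identifiers, and the fallback message text are all hypothetical.

```python
from typing import Dict, Optional


def select_answer(matching_degrees: Dict[str, float],
                  answers: Dict[str, str],
                  threshold: float = 0.5) -> Optional[str]:
    """Return the answer of the best-matching pair, or None if no pair qualifies."""
    if not matching_degrees:
        return None
    # Pair with the maximum matching degree.
    best_id = max(matching_degrees, key=matching_degrees.get)
    if matching_degrees[best_id] > threshold:
        return answers[best_id]
    return None


# Made-up matching degrees and answers for three question-answer pairs.
matching_degrees = {"qa_1": 0.31, "qa_2": 0.87, "qa_3": 0.12}
answers = {
    "qa_1": "Renew the policy in the app.",
    "qa_2": "Submit the claim form online.",
    "qa_3": "Call the service hotline.",
}

result = select_answer(matching_degrees, answers, threshold=0.5)
# Feed back a failure message when nothing meets the matching condition.
reply = result if result is not None else "No matching answer was found."
```

Taking the maximum directly gives the same result as sorting the matching degrees in descending order and reading the first entry.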

In one embodiment of the present invention, to make it easier for the user to receive the answer to the query question, the answer may be output in one or more of the following ways: text, voice, and image.

As an example, after the user inputs the query question by voice, the text corresponding to the answer and the voice corresponding to the answer may be output in a browser and/or a mobile terminal application. This prompts the user to receive the answer.

As another example, after the user inputs a query question by voice, the text and an image corresponding to the answer may be output in a browser and/or a mobile terminal application, where the image is generated based on the answer. For example, if the answer is a driving route, the image shows the driving route on a map.

In an embodiment of the present invention, after the answer to the query question is obtained by using the technical solution in the embodiment of the present invention, the query question and the answer to the query question may be used as a newly added question-answer pair, and the newly added question-answer pair is stored in the database.

This is because the number of question-answer pairs in the database is small at the beginning, and the more question-answer pairs there are, the more accurate the answer to a query question is. To improve the accuracy of the answers, the query question and its answer are added to the database, so that the accuracy of answers to subsequent query questions is further improved.
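Storing a newly answered query as a question-answer pair can be sketched as follows. The in-memory list standing in for the database, the `QuestionAnswerPair` type, and the duplicate check are assumptions added for illustration; the embodiment only states that the new pair is stored in the database.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QuestionAnswerPair:
    question: str
    answer: str


def store_new_pair(database: List[QuestionAnswerPair],
                   query_question: str, answer: str) -> QuestionAnswerPair:
    """Append the (query question, answer) pair unless an identical pair exists."""
    pair = QuestionAnswerPair(query_question, answer)
    if pair not in database:  # duplicate check is an added assumption
        database.append(pair)
    return pair


# Hypothetical starting contents of the database.
database = [QuestionAnswerPair("how do i file a claim", "Submit the claim form online.")]

store_new_pair(database, "how can i submit a claim", "Submit the claim form online.")
```

After the call, later query questions can also be matched against the newly stored wording of the question.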

In the embodiment of the present invention, the query question of the user is obtained; matching the query question with questions in question-answer pairs, and matching answers in the question-answer pairs with the query question and the questions in the question-answer pairs respectively to obtain the matching degree of the query question and each question-answer pair, wherein the question-answer pairs comprise existing questions in a database and answers corresponding to the existing questions in the database; and if the maximum value of the matching degree meets the matching condition, taking the answer in the question-answer pair corresponding to the maximum value as the answer of the query question. The answers can be matched not only from the angle of question matching, but also from the angle of questions and answers, so that the accuracy of matching the query questions and the knowledge points can be improved.

In the embodiment of the invention, the answer is used as a bridge between the query question and the question in the question-answer pair. This is because observation and analysis show that questions with the same semantics have the same answer, while questions with different semantics have different answers. The matching degree between the query question and the question-answer pair is measured by matching the query question with the question in the question-answer pair, and by matching the answer in the question-answer pair with the query question and with the question in the question-answer pair. This scheme enriches the semantic information of the query question while making full use of the corpus information in the database. The accuracy of matching the query question with the knowledge point is thereby improved, and ultimately the service quality of the intelligent customer service robot is improved.

Referring to fig. 7, fig. 7 is a schematic diagram of the main structure of a data matching apparatus according to an embodiment of the present invention. The data matching apparatus may implement the data matching method and, as shown in fig. 7, specifically includes:

a query module 701, configured to obtain a query question of a user;

a matching module 702, configured to match the query question with questions in question-answer pairs, and match answers in the question-answer pairs with the query question and the questions in the question-answer pairs, respectively, to obtain a matching degree between the query question and each question-answer pair, where the question-answer pairs include existing questions in a database and answers corresponding to the existing questions in the database;

an output module 703, configured to, if the maximum value of the matching degree satisfies the matching condition, use an answer in the question-answer pair corresponding to the maximum value as an answer to the query question.

In an embodiment of the present invention, the matching module 702 is specifically configured to determine a first similarity between the query question and the question-answer pair;

In an embodiment of the present invention, the matching module 702 is specifically configured to determine a first similarity between the query question and the question in the question-answer pair by using a Bert model;

In an embodiment of the present invention, the matching module 702 is specifically configured to splice the first similarity and the second similarity to obtain a third similarity;

In an embodiment of the present invention, the matching module 702 is specifically configured to match the answer in the question-answer pair with the query question to obtain a first matching vector;

In an embodiment of the present invention, the query module 701 is specifically configured to obtain a query question of a user from a browser and/or a mobile terminal application.

In an embodiment of the present invention, the output module 703 outputs the answer to the query question in one or more of the following ways: text, voice, and image.
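The division of work among the three modules can be sketched as three small classes. This is an illustrative skeleton only: the class and method names are hypothetical, and `toy_score` is a word-overlap placeholder standing in for the Bert model and multilayer perceptron, which it does not reproduce.

```python
from typing import Callable, List, Optional, Tuple

QAPair = Tuple[str, str]  # (existing question, corresponding answer)


def toy_score(query: str, question: str, answer: str) -> float:
    """Placeholder matching degree: word overlap between the two questions.

    Assumption for illustration only; it ignores the answer entirely.
    """
    q_words, s_words = set(query.lower().split()), set(question.lower().split())
    union = q_words | s_words
    return len(q_words & s_words) / len(union) if union else 0.0


class QueryModule:
    """Counterpart of query module 701: obtains the user's query question."""

    def get_query(self, raw_input: str) -> str:
        return raw_input.strip()


class MatchingModule:
    """Counterpart of matching module 702: one matching degree per pair."""

    def __init__(self, qa_pairs: List[QAPair],
                 score_fn: Callable[[str, str, str], float]):
        self.qa_pairs = qa_pairs
        self.score_fn = score_fn

    def matching_degrees(self, query: str) -> List[float]:
        return [self.score_fn(query, q, a) for q, a in self.qa_pairs]


class OutputModule:
    """Counterpart of output module 703: applies the matching condition."""

    def __init__(self, threshold: float):
        self.threshold = threshold

    def choose(self, qa_pairs: List[QAPair], degrees: List[float]) -> Optional[str]:
        if not degrees:
            return None
        best = max(range(len(degrees)), key=degrees.__getitem__)
        return qa_pairs[best][1] if degrees[best] > self.threshold else None


# Made-up question-answer pairs and query.
pairs = [
    ("how do i renew my policy", "Renew the policy in the app."),
    ("how do i file a claim", "Submit the claim form online."),
]
query = QueryModule().get_query("  how do i file a claim today ")
degrees = MatchingModule(pairs, toy_score).matching_degrees(query)
answer = OutputModule(threshold=0.5).choose(pairs, degrees)
```

Passing the scoring function into the matching module keeps the module boundaries visible while leaving the actual similarity model open.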

Fig. 8 shows an exemplary system architecture 800 to which the method of matching data or the apparatus of matching data of an embodiment of the present invention may be applied.

As shown in fig. 8, the system architecture 800 may include terminal devices 801, 802, 803, a network 804, and a server 805. The network 804 serves to provide a medium for communication links between the terminal devices 801, 802, 803 and the server 805. The network 804 may include various types of connections, such as wired or wireless communication links, or fiber optic cables.

A user may use the terminal devices 801, 802, 803 to interact with the server 805 over the network 804 to receive or send messages and the like. The terminal devices 801, 802, 803 may have installed thereon various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, and social platform software (by way of example only).

The terminal devices 801, 802, 803 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.

The server 805 may be a server that provides various services, such as a back-office management server (for example only) that supports shopping-like websites browsed by users using the terminal devices 801, 802, 803. The back-office management server may analyze and otherwise process received data such as a product information query request, and feed back a processing result (for example only, target push information or product information) to the terminal device.

It should be noted that the method for matching data provided by the embodiment of the present invention is generally performed by the server 805, and accordingly, the device for matching data is generally disposed in the server 805.

It should be understood that the number of terminal devices, networks, and servers in fig. 8 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to fig. 9, a block diagram of a computer system 900 suitable for implementing a terminal device of an embodiment of the present invention is shown. The terminal device shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.

As shown in fig. 9, the computer system 900 includes a Central Processing Unit (CPU)901 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the system 900 are also stored. The CPU 901, ROM 902, and RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to bus 904.

The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output section 907 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. A drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read out therefrom is installed into the storage section 908 as necessary.

In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The above-described functions defined in the system of the present invention are executed when the computer program is executed by a Central Processing Unit (CPU) 901.

It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor including a query module, a matching module, and an output module. The names of these modules do not, in some cases, constitute a limitation on the modules themselves; for example, the query module may also be described as "a module for obtaining a query question of a user".

As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform:

acquiring a query question of a user;

According to the technical scheme of the embodiment of the invention, the query question of the user is obtained; the query question is matched with questions in question-answer pairs, and answers in the question-answer pairs are matched with the query question and the questions in the question-answer pairs respectively, to obtain the matching degree of the query question and each question-answer pair, wherein the question-answer pairs comprise existing questions in a database and answers corresponding to the existing questions in the database; and if the maximum value of the matching degree meets the matching condition, the answer in the question-answer pair corresponding to the maximum value is taken as the answer of the query question. Answers can be matched not only from the angle of question matching, but also from the angle of questions and answers, so that the accuracy of matching query questions with knowledge points can be improved.

The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method of matching data, comprising:

acquiring a query question of a user;

2. The method of matching data according to claim 1, wherein the matching the query question with questions in question-answer pairs, and matching answers in the question-answer pairs with the query question and the questions in the question-answer pairs respectively, to obtain the matching degree between the query question and each question-answer pair comprises:

3. The method of matching data of claim 2, wherein said determining a first similarity of said query question and said question-answer pair question comprises:

4. The method of matching data according to claim 2, wherein the obtaining the matching degree between the query question and each question-answer pair according to the first similarity and the second similarity comprises:

5. The method of claim 3, wherein the determining a second similarity according to the similarity between the answer in the question-answer pair and the query question and the question in the question-answer pair by using the Bert model comprises:

6. The method of matching data as recited in claim 1, the method further comprising:

7. The method of matching data as recited in claim 4, the method further comprising:

8. An apparatus for matching data, comprising:

the query module is used for acquiring query questions of a user;

9. An electronic device that matches data, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.

10. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.