CN112836035A - Method, device, equipment and computer readable medium for matching data - Google Patents

Info

Publication number
CN112836035A
Authority
CN
China
Prior art keywords
question
answer
matching
query
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110241934.4A
Other languages
Chinese (zh)
Inventor
黄凯
刘设伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taikang Insurance Group Co Ltd
Taikang Online Property Insurance Co Ltd
Original Assignee
Taikang Insurance Group Co Ltd
Taikang Online Property Insurance Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taikang Insurance Group Co Ltd, Taikang Online Property Insurance Co Ltd filed Critical Taikang Insurance Group Co Ltd
Priority to CN202110241934.4A priority Critical patent/CN112836035A/en
Publication of CN112836035A publication Critical patent/CN112836035A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, a device, equipment and a computer readable medium for matching data, and relates to the technical field of computers. One embodiment of the method comprises: acquiring a query question of a user; matching the query question with questions in question-answer pairs, and matching answers in the question-answer pairs with the query question and the questions in the question-answer pairs respectively to obtain the matching degree of the query question and each question-answer pair, wherein the question-answer pairs comprise existing questions in a database and answers corresponding to the existing questions in the database; and if the maximum value of the matching degree meets the matching condition, taking the answer in the question-answer pair corresponding to the maximum value as the answer of the query question. The method and the system can improve the accuracy of matching the customer problem with the knowledge point.

Description

Method, device, equipment and computer readable medium for matching data
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a computer-readable medium for matching data.
Background
An intelligent customer service robot provides intelligent question-and-answer services for customers across different channels and products. In the question-and-answer process, a client asks a question, and the robot searches and matches in the background knowledge base and then returns the corresponding answer, which completes the process. Such a system offers high convenience, timeliness and accuracy.
In the intelligent question-answering process, the accuracy of the returned answers is an important factor for measuring the quality of the intelligent customer service robot, and the accuracy of the answers is often closely related to the accuracy of matching of customer questions and knowledge points.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art: the accuracy of matching the customer questions with the knowledge points is low.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, an apparatus, a device, and a computer readable medium for matching data, which can improve accuracy of matching between a client problem and a knowledge point.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a method of matching data, including:
acquiring a query question of a user;
matching the query question with questions in question-answer pairs, and matching answers in the question-answer pairs with the query question and the questions in the question-answer pairs respectively to obtain the matching degree of the query question and each question-answer pair, wherein the question-answer pairs comprise existing questions in a database and answers corresponding to the existing questions in the database;
and if the maximum value of the matching degree meets the matching condition, taking the answer in the question-answer pair corresponding to the maximum value as the answer of the query question.
The matching of the query question with the question in the question-answer pair and the matching of the answer in the question-answer pair with the query question and the question in the question-answer pair respectively to obtain the matching degree of the query question and each question-answer pair includes:
determining a first similarity of the query question and a question in the question-answer pair;
determining a second similarity according to the similarity between the answer in the question-answer pair and the query question, and the similarity between the answer in the question-answer pair and the question in the question-answer pair;
and obtaining the matching degree of the query question and each question-answer pair according to the first similarity and the second similarity.
The determining a first similarity between the query question and the question in the question-answer pair comprises:
determining the first similarity between the query question and the question in the question-answer pair by adopting a Bert model;
the determining a second similarity according to the similarity between the answer in the question-answer pair and the query question, and the similarity between the answer in the question-answer pair and the question in the question-answer pair, comprises:
determining the second similarity by adopting the Bert model, according to the similarity between the answer in the question-answer pair and the query question, and the similarity between the answer in the question-answer pair and the question in the question-answer pair.
The obtaining the matching degree between the query question and each question-answer pair according to the first similarity and the second similarity comprises:
splicing the first similarity and the second similarity to obtain a third similarity;
and inputting the third similarity into a multilayer perceptron to obtain the matching probability of the query question and each question-answer pair, and taking the matching probability as the matching degree.
Determining a second similarity according to the similarity between the answer in the question-answer pair and the query question and the similarity between the answer in the question-answer pair and the question in the question-answer pair by adopting the Bert model, wherein the determining comprises the following steps:
matching the answers in the question-answer pairs with the query questions to obtain a first matching vector;
matching the answers in the question-answer pairs with the questions in the question-answer pairs to obtain second matching vectors;
and taking the similarity between the first matching vector and the second matching vector as the second similarity.
The method further comprises the following steps:
and taking the query question and the answer of the query question as a newly added question-answer pair, and storing the newly added question-answer pair in the database.
The obtaining of the query question of the user comprises:
and acquiring the query question of the user from the browser and/or the mobile terminal application.
The method further comprises the following steps:
and training to obtain the Bert model by adopting a training set and a testing set, wherein each sample of the training set and the testing set comprises a user's query question, a question-answer pair and a label, and the label is used for representing the matching result of the user's query question and the question-answer pair.
After the answer in the question-answer pair corresponding to the maximum value is taken as the answer to the query question, the method further includes:
and outputting the answer of the query question by one or more of the following modes, wherein the modes comprise texts, voice and images.
According to a second aspect of the embodiments of the present invention, there is provided an apparatus for matching data, including:
the query module is used for acquiring query questions of a user;
the matching module is used for matching the query question with questions in question-answer pairs and matching answers in the question-answer pairs with the query question and the questions in the question-answer pairs respectively to obtain the matching degree of the query question and each question-answer pair, and the question-answer pairs comprise existing questions of a database and answers corresponding to the existing questions of the database;
and the output module is used for taking the answer in the question-answer pair corresponding to the maximum value as the answer of the query question if the maximum value of the matching degree meets the matching condition.
According to a third aspect of embodiments of the present invention, there is provided an electronic device for matching data, including:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described above.
According to a fourth aspect of embodiments of the present invention, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the method as described above.
One embodiment of the above invention has the following advantages or benefits: acquiring a query question of a user; matching the query question with questions in question-answer pairs, and matching answers in the question-answer pairs with the query question and the questions in the question-answer pairs respectively to obtain the matching degree of the query question and each question-answer pair, wherein the question-answer pairs comprise existing questions in a database and answers corresponding to the existing questions in the database; and if the maximum value of the matching degree meets the matching condition, taking the answer in the question-answer pair corresponding to the maximum value as the answer of the query question. The answers can be matched not only from the angle of question matching, but also from the angle of questions and answers, so that the accuracy of matching the query questions and the knowledge points can be improved.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of a Q-Q semantic matching model;
FIG. 2 is a schematic diagram of a Q-Q-A single-sided matching model;
FIG. 3 is a schematic diagram of a main flow of a method of matching data according to an embodiment of the invention;
FIG. 4 is a schematic flow chart of obtaining the matching degree between the query question and each question-answer pair according to the embodiment of the present invention;
FIG. 5 is a schematic diagram of obtaining a degree of match of a query question with each question-answer pair in an embodiment in accordance with the invention;
FIG. 6 is a schematic diagram of a detailed process for obtaining the matching degree according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the main structure of an apparatus for matching data according to an embodiment of the present invention;
FIG. 8 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 9 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Intelligent customer service robots have become important and convenient tools for acquiring knowledge. To improve the efficiency and accuracy of the service, determining similar questions is gradually becoming a core task of intelligent customer service robots. This task requires constructing a semantic matching model, finding a database question that is highly similar to the query question, and finally returning the answer corresponding to that database question to the client.
Referring to FIG. 1, FIG. 1 is a schematic diagram of a Q-Q semantic matching model. A Q-Q (Question-Question) semantic matching model is commonly used as the semantic matching model. The input of the Q-Q semantic matching model is a query question and a database question to be matched, but the Q-Q semantic matching model has some defects.
As an example, the model cannot obtain the client's context information, so a query question may be semantically unclear. In a real environment, different query questions may also express the same meaning, such as "Which of these insurance products should I buy?" and "Can you help me recommend a product?".
One existing approach uses the semantic information of answers to enrich the database questions when determining the matched knowledge points.
Referring to FIG. 2, FIG. 2 is a schematic diagram of a Q-Q-A single-sided matching model. Traditional methods typically employ a Q-A one-sided matching model, i.e., they use answers as an extended representation of the corresponding database questions. However, because answers are verbose and diverse, noise may be introduced in the similarity calculation, so that the accuracy of matching the query question with the knowledge points is low.
In order to solve the technical problem that the accuracy of matching between the query problem and the knowledge point is low, the following technical scheme in the embodiment of the invention can be adopted.
Referring to fig. 3, fig. 3 is a schematic diagram of a main flow of a method for matching data according to an embodiment of the present invention, which not only matches a query question with a question in a question-answer pair, but also matches the query question, the question in the question-answer pair, and an answer in the question-answer pair. As shown in fig. 3, the method specifically includes the following steps:
S301, acquiring the query question of the user.
The technical scheme in the embodiment of the invention can be applied to the server. As one example, a user enters a query question through a browser in a computer. As another example, a user enters a query question through a mobile terminal Application (APP). That is, the server may obtain the user's query questions from the browser and/or the mobile terminal application.
It should be noted that the query question input by the user may be voice and/or text. If the query question includes speech, the speech may be converted into text using a speech recognition model. If the query question includes text, the text may be preliminarily screened to delete stop words and/or wrongly written words, thereby improving the matching accuracy.
S302, matching the query question with the question in the question-answer pair, and matching the answer in the question-answer pair with the query question and the question in the question-answer pair respectively to obtain the matching degree of the query question and each question-answer pair, wherein the question-answer pair comprises the existing questions of the database and the corresponding answer of the existing questions of the database.
Because the information linking query questions to database questions is sparse, a query question often may not match the correct answer until a database manager discovers it and adds it to the question set of the corresponding knowledge point.
In this embodiment, matching answers based on the query question may be divided into two parts. The first part is matching between questions; the second part is matching between the questions and the answers. Finally, the matching degree measures the similarity between the query question and each question-answer pair.
Referring to fig. 4, fig. 4 is a schematic flowchart of obtaining a matching degree between a query question and each question-answer pair according to an embodiment of the present invention, which specifically includes the following steps:
S401, determining a first similarity between the query question and the question in the question-answer pair.
In the embodiment of the invention, the database comprises a plurality of questions and answers corresponding to the questions. A question together with its corresponding answer is called a question-answer pair. That is, each question-answer pair includes an existing question of the database and an answer corresponding to the question.
First, a first similarity is determined according to the user's query question and the question in the question-answer pair. The first similarity measures how similar the two questions are: the larger its value, the higher the similarity between the user's query question and the question in the question-answer pair; the smaller its value, the lower that similarity.
S402, determining a second similarity according to the similarity between the answer in the question-answer pair and the query question, and the similarity between the answer in the question-answer pair and the question in the question-answer pair.
In addition to calculating the first similarity, in the embodiment of the present invention, it is also necessary to obtain the similarity between the answer in the question-answer pair and the query question, and the similarity between the answer in the question-answer pair and the question in the question-answer pair. Similarity is thus assessed from two perspectives: between questions, and between questions and answers.
It should be noted that the question and the answer used in determining the second similarity belong to the same question-answer pair. The second similarity assesses similarity to the query question from the perspective of the question-answer pair as a whole.
S403, obtaining the matching degree of the query question and each question-answer pair according to the first similarity and the second similarity.
After the first similarity and the second similarity are determined, the matching degree between the query question and each question-answer pair can be obtained from them. As an example, the first similarity and the second similarity are spliced (concatenated) to obtain a third similarity.
In the embodiment of fig. 4, the matching degree between the query question and each question-answer pair is obtained through the first similarity between the questions and the second similarity between the questions and the answers to the questions, so as to lay a foundation for determining the answers to the query question.
In one embodiment of the present invention, the similarity calculation may be implemented using a Bert model. That is, the Bert model is used to determine the first similarity between the query question and the question in the question-answer pair. The Bert model is also used to determine the second similarity according to the similarity between the answer in the question-answer pair and the query question, and the similarity between the answer in the question-answer pair and the question in the question-answer pair.
It should be noted that the Bert model is a trained model. The concrete implementation process of training to get the Bert model is exemplarily described below.
In the embodiment of the invention, the Bert model is obtained through training with a training set and a testing set. Each sample of the training set and the testing set comprises a user's query question, a question-answer pair and a label, and the label represents the matching result of the user's query question and the question-answer pair.
Specifically, the training samples are derived from the knowledge base system and external corpora. First, the training samples need to be cleaned and labeled to improve data quality. Each training sample is composed of a question, a knowledge point (question and answer), and a label. A label of 0 indicates that the question does not match the knowledge point; a label of 1 indicates that the question matches the knowledge point.
As an example, the training samples are divided into a training set and a testing set at a ratio of 4:1. The training set is used for training the Bert model, and the testing set is used for testing the matching effect of the Bert model obtained through training.
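The 4:1 split described above can be sketched in Python as follows. This is only an illustration: the `Sample` structure, its field names, the shuffling step and the fixed random seed are assumptions, since the text only states that each sample carries a query question, a question-answer pair and a 0/1 label.

```python
import random
from dataclasses import dataclass


@dataclass
class Sample:
    """One labeled sample (field names are hypothetical)."""
    query: str      # user's query question
    question: str   # existing database question
    answer: str     # answer paired with the database question
    label: int      # 1 = query matches the knowledge point, 0 = no match


def split_samples(samples, train_ratio=0.8, seed=0):
    """Shuffle the samples and split them 4:1 into training and testing sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Ten toy samples, split into 8 training and 2 testing samples.
samples = [Sample(f"q{i}", f"db_q{i}", f"a{i}", i % 2) for i in range(10)]
train_set, test_set = split_samples(samples)
print(len(train_set), len(test_set))
```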
Referring to fig. 5, fig. 5 is a schematic diagram of obtaining a matching degree of the query question and each question-answer pair according to the embodiment of the present invention.
In fig. 5, three steps are involved, namely question matching, question answer matching and aggregation.
In the question matching process, two questions are taken as input and the question similarity is determined. The two questions are the query question and the question in the question-answer pair. The question similarity is a vector.
In the question-answer matching process, a Siamese-like neural network structure can be adopted, which comprises three independent layers:
(1) Matching interaction layer: the query question is matched with the answer in the question-answer pair, and the question in the question-answer pair is matched with the answer in the question-answer pair; interactive training is carried out for each of the two matchings.
(2) Similarity calculation layer: similarity values are computed for the matching vectors obtained by interactive training, giving a similarity vector map.
(3) Pooling layer: the similarity vector map is compressed into a similarity vector, which is the question-answer similarity. The question-answer similarity is a vector.
In the aggregation process, the question similarity and the question-answer similarity are spliced to obtain the answer similarity. The answer similarity is input into a multilayer perceptron (MLP) to obtain the matching probability of the query question and each question-answer pair, and the matching probability is taken as the matching degree.
After the steps in fig. 5 are established, training is performed by using a training set in combination with the Bert model and the steps in fig. 5.
First, the Bert model training set and the mapping dictionary are imported. Each matched question sentence and answer sentence are spliced and processed into a tokens array: the array starts with [CLS], uses [SEP] as the separator between the two sentences, and ends with [SEP].
As an example: [ [CLS] hello, does not require an identification card for insuring a child [SEP] the identification number of a 6 year old child is the written user's notebook number [SEP] ]. Here [CLS] is at the beginning, [SEP] is at the end of each sentence, and the tokens are separated by spaces.
Further, the tokens array is aligned to a fixed length, assumed to be m, and then converted into three groups of vectors: token_id, segment_id and input_mask.
The token_id of the tokens array is obtained first. The tokens array in the above example may be represented as: [ 101872196280247112207211128329247444620667168196395140810212722592207211146386716819639513844772322110912787136633151384477214081020000000000000000000000000 ]. The array is separated by spaces, and each number is the ID to which the corresponding character is mapped, e.g., "you" maps to "872". To align the text, zeros are padded after the number array.
The segment_id of the tokens array is obtained next. [CLS], the characters of the first sentence and its [SEP], as well as the alignment padding, are occupied by 0; the characters of the second sentence and its [SEP] are occupied by 1.
Continuing with the above example, the segment_id of the above tokens array is [ 000000000000000011111111111111111110000000000000000000000000 ].
For input_mask, all characters in the tokens array are occupied by 1 and the alignment padding is occupied by 0.
Continuing with the example above, the input_mask of the tokens array described above is [ 111111111111111111111111111111111110000000000000000000000000 ].
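The construction of the three input vectors can be sketched as follows. The function name `build_bert_inputs` and the tiny `vocab` dictionary are hypothetical; only the mapping of "你" ("you") to 872 and the leading 101 come from the example above, while the other IDs are illustrative stand-ins for a real mapping dictionary.

```python
def build_bert_inputs(question_tokens, answer_tokens, vocab, max_len):
    """Build token_id, segment_id and input_mask for one question/answer pair."""
    tokens = ["[CLS]"] + question_tokens + ["[SEP]"] + answer_tokens + ["[SEP]"]
    if len(tokens) > max_len:
        raise ValueError("sequence longer than max_len")
    first_len = len(question_tokens) + 2          # [CLS] + first sentence + [SEP]
    token_id = [vocab.get(t, vocab["[UNK]"]) for t in tokens]
    segment_id = [0] * first_len + [1] * (len(tokens) - first_len)
    input_mask = [1] * len(tokens)
    pad = max_len - len(tokens)                   # zero padding for alignment
    return token_id + [0] * pad, segment_id + [0] * pad, input_mask + [0] * pad


# Toy mapping dictionary (assumed values, not a real vocabulary file).
vocab = {"[PAD]": 0, "[UNK]": 100, "[CLS]": 101, "[SEP]": 102, "你": 872, "好": 1962}
token_id, segment_id, input_mask = build_bert_inputs(["你", "好"], ["好"], vocab, max_len=8)
print(token_id)
print(segment_id)
print(input_mask)
```

Padding positions receive 0 in all three vectors, which matches the description that alignment padding is occupied by 0 in both segment_id and input_mask.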
In the question matching process, the question representation is obtained using the encoder part of a stacked Transformer model, where each Transformer layer comprises a multi-head attention network and a feedforward network. The Transformer model has two outputs:
$$B_p, B_s = \mathrm{StackedTransformer}(Q_u, Q_a)$$ (formula 1)
Here, $B_p$ is the text representation used in classification tasks, i.e. the vector corresponding to [CLS]. $B_s$ is the vector corresponding to [CLS] at each Transformer layer. $B_p$ is used as the question similarity. $Q_u$ is the user's query question, and $Q_a$ is the question in the question-answer pair.
The Bert model is composed of multiple Transformer layers; the basic Bert model has 12 layers, so that more comprehensive information can be acquired. The output vector $v_q$ at the [CLS] position of the last layer of the Bert model is used as the question similarity. The vector dimension of the question similarity is [batch size, hidden size], where batch size is the size of a batch and hidden size is the number of hidden-layer neurons.
Secondly, in the question-answer matching process, the Bert model is used to obtain two interaction vectors for matching questions with answers. Here, the $B_s$ sequence feature vectors are used. Taking the query question $Q_u$ and the answer $A_a$ in the question-answer pair as an example, $Q_u$ and $A_a$ are spliced, processed into the corresponding input format, and then fed into the stacked Transformer model to obtain the matching sequence vector $V_u$ of $Q_u$ and $A_a$:
[Formula 2, defining $V_u$: equation image not reproduced in the source text]
where $l$ is the number of Transformer layers in the Bert model.
Likewise, the matching sequence vector $V_a$ of $Q_a$ and $A_a$ can be obtained:
[Formula 3, defining $V_a$: equation image not reproduced in the source text]
A Cartesian combination of $V_u$ and $V_a$ is then formed, and the similarity between each pair of vectors is calculated using the cosine function:
$$s_{cos}(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}$$ (formula 4)
where $x$ is taken from $V_u$, $y$ is taken from $V_a$, and $s_{cos}$ is the cosine similarity.
Applying formula 4 yields a similarity vector map $p_f$. Flattening $p_f$ finally gives the similarity vector $v_f$. Splicing $v_q$ and $v_f$ gives a one-dimensional vector $v_0$, namely $v_0 = [v_q : v_f]$.
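The cosine similarity map, the flattening step and the final splicing can be sketched as follows. The toy vectors stand in for the per-layer vectors that the Bert model would produce, the function names are hypothetical, and the pooling (compression) step mentioned earlier is left out; this is an illustration under those assumptions rather than the patented implementation.

```python
import math


def cosine(x, y):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def similarity_vector(v_u, v_a):
    """Cosine similarity for every (x, y) in the Cartesian product, flattened."""
    return [cosine(x, y) for x in v_u for y in v_a]


# Toy stand-ins for the matching sequence vectors (assumed values).
v_u = [[1.0, 0.0], [0.0, 1.0]]
v_a = [[1.0, 0.0], [1.0, 1.0]]
v_f = similarity_vector(v_u, v_a)   # flattened similarity map
v_q = [0.5, 0.5]                    # placeholder question-similarity vector
v_0 = v_q + v_f                     # spliced one-dimensional vector
print(len(v_0))
```

With two vectors on each side, the similarity map has four entries, so the spliced vector here has six elements.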
The vector $v_0$ is fed into an MLP followed by a softmax function, which gives the matching probability of the query question and the question-answer pair. The softmax formula is as follows:
$$S_i = \frac{e^{v_i}}{\sum_j e^{v_j}}$$ (formula 5)
$$v_i = x \cdot w_i^T + b_i$$ (formula 6)
[Formula 7: equation image not reproduced in the source text]
Here, $v_i$ is the output vector of the last hidden layer, $S_i$ is the ratio of the exponential of the current element to the sum of the exponentials of all elements, $x$ is the input vector of the last hidden layer, $w_i$ is the corresponding weight matrix, and $b_i$ is the bias. Softmax converts the output values into relative probabilities that are easier to understand and compare.
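A small sketch of the last linear layer (formula 6) followed by softmax is given below. The weights, biases and input are made-up numbers, and subtracting the maximum before exponentiating is a common numerical-stability measure that is added here as an assumption; it does not change the resulting probabilities.

```python
import math


def linear(x, weights, biases):
    """Compute v_i = x . w_i^T + b_i for each output unit i."""
    return [sum(a * w for a, w in zip(x, w_i)) + b_i
            for w_i, b_i in zip(weights, biases)]


def softmax(v):
    """Ratio of each element's exponential to the sum of all exponentials."""
    m = max(v)                                  # shift for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


x = [1.0, 2.0]                                  # input of the last layer (toy)
weights = [[0.5, -0.5], [0.25, 0.75]]           # hypothetical weight rows w_i
biases = [0.0, 0.1]                             # hypothetical biases b_i
v = linear(x, weights, biases)
probs = softmax(v)
print(probs)
```

Treating index 1 as the "match" class is a convention of this sketch; the text does not specify the output layout.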
Further, the model is trained in an end-to-end manner using cross-entropy loss at the stage of training the Bert model. Question matching and question answer matching are combined. Question matching is a major task, aiming at measuring the similarity between two questions; question-answer matching is an auxiliary task aimed at evaluating the matching relationship between the query question and answer, and between the question and answer in the question-answer pair.
In the question-answer matching process, the [CLS] vectors of the two matchings are each fed into an MLP to obtain the predicted values $y_u$ and $y_a$ and the corresponding loss function values $loss_u$ and $loss_a$. The loss function value of the model is finally obtained as:
$$Loss = r \cdot loss_q + \frac{1-r}{2}\, loss_u + \frac{1-r}{2}\, loss_a$$ (formula 8)
Here, $loss_q$ is the loss value obtained after aggregation, and $r \in [0, 1]$. When $r = 1$, only the aggregation loss function value is considered.
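The weighting in formula 8 can be illustrated as follows. The binary form of the cross-entropy is an assumption based on the 0/1 labels, and the loss values passed in are arbitrary numbers chosen only to show the arithmetic.

```python
import math


def cross_entropy(p_match, label):
    """Binary cross-entropy for one predicted match probability (assumed form)."""
    eps = 1e-12
    p = min(max(p_match, eps), 1.0 - eps)       # keep log() away from zero
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def total_loss(loss_q, loss_u, loss_a, r):
    """Loss = r*loss_q + (1-r)/2*loss_u + (1-r)/2*loss_a."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    return r * loss_q + (1.0 - r) / 2.0 * loss_u + (1.0 - r) / 2.0 * loss_a


mixed = total_loss(0.2, 0.4, 0.6, r=0.5)        # 0.1 + 0.1 + 0.15
main_only = total_loss(0.2, 0.4, 0.6, r=1.0)    # auxiliary losses ignored
print(mixed, main_only)
```

A larger r puts more weight on the main aggregation task, while a smaller r gives the two auxiliary question-answer matching losses more influence.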
Further, the training parameters of the Bert model, such as batch size or epoch, are adjusted; the fine-tuned Bert model is saved at a certain training step interval, and its accuracy is verified on the testing set at the same time. If the test results do not improve for several consecutive evaluations, training is stopped, and the Bert model parameters and matching model parameters with the highest accuracy are saved.
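The stopping rule described above can be sketched as a simple patience counter. The function name, the `patience` parameter and the list of accuracies are hypothetical; in practice `evaluate_step` would train up to a checkpoint and measure accuracy on the testing set.

```python
def train_with_early_stopping(evaluate_step, max_steps, patience):
    """Return (best_step, best_accuracy); stop after `patience` evaluations
    in a row without improvement."""
    best_acc, best_step, stale = -1.0, None, 0
    for step in range(max_steps):
        acc = evaluate_step(step)               # accuracy at this checkpoint
        if acc > best_acc:
            # A real run would save the model parameters here.
            best_acc, best_step, stale = acc, step, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_step, best_acc


# Toy accuracy curve: the last value is never reached because training stops.
accuracies = [0.70, 0.75, 0.74, 0.73, 0.72, 0.90]
best_step, best_acc = train_with_early_stopping(
    lambda s: accuracies[s], len(accuracies), patience=3)
print(best_step, best_acc)
```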
Specifically, an Adam optimization algorithm can be adopted, and the specific formula is as follows:
$$g_t = \nabla_\theta Loss(\theta_{t-1})$$ (formula 9)
$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t$$ (formula 10)
$$v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2$$ (formula 11)
$$\theta_t = \theta_{t-1} - \alpha \frac{m_t}{\sqrt{v_t} + \epsilon}$$ (formula 12)
Here, $g_t$ is the gradient of the loss function at parameter $\theta_{t-1}$, and $m_t$, $v_t$ and $\theta_t$ are updated at each optimization step. $m_t$ and $v_t$ are the exponential moving averages of the gradient and of the squared gradient, $\beta_1$ and $\beta_2$ are the exponential decay rates, $\alpha$ is the learning step size, $\epsilon$ is a very small number that prevents division by zero, and $\theta_t$ is the updated parameter value. The parameters in the model are updated in this way, which achieves the purpose of parameter training.
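One update step of the Adam algorithm can be sketched as follows. The second-moment update uses the squared gradient as in the standard algorithm, the bias-correction terms of standard Adam are omitted because the surrounding text does not mention them, and the default hyperparameter values are commonly used ones; all three points are assumptions of this sketch.

```python
import math


def adam_step(theta, grad, m, v, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (no bias correction) for a list of parameters."""
    # Exponential moving average of the gradient.
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    # Exponential moving average of the squared gradient.
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]
    # Parameter update; eps avoids division by zero.
    theta = [t - alpha * mi / (math.sqrt(vi) + eps)
             for t, mi, vi in zip(theta, m, v)]
    return theta, m, v


# Toy problem: minimise f(theta) = theta^2, whose gradient is 2 * theta.
theta, m, v = [1.0], [0.0], [0.0]
grad = [2.0 * theta[0]]
theta, m, v = adam_step(theta, grad, m, v, alpha=0.1)
print(theta, m, v)
```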
At this point, the Bert model has been obtained through the training described above.
The following is an exemplary description of the trained Bert model.
Referring to fig. 6, fig. 6 is a schematic diagram of a specific process for obtaining the matching degree in the embodiment of the present invention, which specifically includes the following steps:
S601, splicing (concatenating) the first similarity and the second similarity to obtain a third similarity.
The first similarity is the similarity between the query question and the question in the question-answer pair, and measures the similarity between questions. The second similarity is determined from the similarity between the answer in the question-answer pair and each of the query question and the question in the question-answer pair, and measures the similarity between questions and answers.
The first similarity and the second similarity are both vectors, and the two vectors are concatenated to obtain the third similarity.
S602, inputting the third similarity into the multilayer perceptron to obtain the matching probability of the query question and each question-answer pair, and taking the matching probability as the matching degree.
A Multilayer Perceptron (MLP) is a feedforward artificial neural network model that maps multiple input data sets onto a single output data set.
In the embodiment of the invention, the third similarity can be input into the multilayer perceptron. The output value of the multi-layer perceptron is the matching probability of the query question and each question-answer pair, and the matching probability is used as the matching degree of the query question and each question-answer pair.
It will be appreciated that the database includes multiple question-answer pairs, so a matching probability between the query question and each question-answer pair can be obtained. The number of matching probabilities is the same as the number of question-answer pairs.
In the embodiment of fig. 6, the similarity between questions and the similarity between questions and answers are used together to obtain the matching degree between the query question and each question-answer pair.
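Steps S601 and S602 can be sketched as follows. The text does not specify the perceptron's depth, activation functions, or weights, so the single hidden layer, the ReLU activation, the sigmoid output, and all parameter names here are assumptions chosen only to make the data flow concrete.

```python
import math
from typing import List, Sequence


def concat(first_similarity: Sequence[float], second_similarity: Sequence[float]) -> List[float]:
    """S601: concatenate the two similarity vectors into the third similarity."""
    return list(first_similarity) + list(second_similarity)


def mlp_match_probability(x: Sequence[float], w1: Sequence[Sequence[float]],
                          b1: Sequence[float], w2: Sequence[float], b2: float) -> float:
    """S602: map the third similarity to a matching probability in (0, 1)."""
    # Hidden layer with ReLU activation (assumed).
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    # Single output unit followed by a sigmoid (assumed).
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))


# Usage with made-up vectors and weights.
third_similarity = concat([0.2, 0.8], [0.5])
probability = mlp_match_probability(
    third_similarity,
    w1=[[1.0, 1.0, 1.0], [0.5, -0.5, 0.0]],
    b1=[0.0, 0.0],
    w2=[1.0, 1.0],
    b2=-1.0,
)
```

In practice this would be evaluated once per question-answer pair, giving as many probabilities as there are pairs.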
In one embodiment of the present invention, the question-answer similarity is obtained through a question-answer matching process similar to that used in training the Bert model in fig. 5. In the process of matching data by using the trained Bert model, the second similarity is obtained by the following steps:
First, the answer in the question-answer pair is matched with the query question to obtain a first matching vector. Second, the answer in the question-answer pair is matched with the question in the question-answer pair to obtain a second matching vector. Finally, the similarity between the first matching vector and the second matching vector is taken as the second similarity.
In the process of calculating the second similarity, not only the matching between the answer in the question-answer pair and the query question, but also the matching between the answer in the question-answer pair and the question in the question-answer pair is considered. It follows that the higher the similarity between the first matching vector and the second matching vector, the more likely the answer in the question-answer pair is the answer to the query question.
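The text does not state which operation produces the similarity between the two matching vectors. Because the second similarity is described elsewhere as a vector, the sketch below uses an element-wise product purely as an assumed stand-in; a scalar measure such as cosine similarity would be another plausible reading. The function name is hypothetical.

```python
from typing import List, Sequence


def second_similarity(first_matching_vector: Sequence[float],
                      second_matching_vector: Sequence[float]) -> List[float]:
    """Combine the two matching vectors into a similarity vector.

    first_matching_vector: answer matched against the query question.
    second_matching_vector: answer matched against the stored question.
    The element-wise product is an assumption, not the patented operation.
    """
    if len(first_matching_vector) != len(second_matching_vector):
        raise ValueError("matching vectors must have the same length")
    return [a * b for a, b in zip(first_matching_vector, second_matching_vector)]
```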
S303, if the maximum value of the matching degree meets the matching condition, taking the answer in the question-answer pair corresponding to the maximum value as the answer of the query question.
A question-answer pair comprises an existing question in the database and the answer corresponding to that question. A matching degree is calculated between each question-answer pair and the query question; that is, each question-answer pair corresponds to one matching degree. The obtained matching degrees are arranged in descending order to obtain the maximum value.
The maximum value of the matching degree is then used as the basis for judging whether the matching condition is met.
In the embodiment of the present invention, the matching condition may be preset. As one example, the match condition includes being greater than a preset match threshold. That is, in the case where the maximum value of the matching degree is greater than the preset matching threshold, it is determined that the matching condition is satisfied.
And under the condition that the maximum value of the matching degree meets the matching condition, the answer in the question-answer pair corresponding to the maximum value of the matching degree is successfully matched with the query question. Then, the answer in the question-answer pair corresponding to the maximum value is used as the answer of the query question.
Conversely, if the maximum value of the matching degree does not satisfy the matching condition, the answer and the query question have not been matched successfully, and a message indicating the unsuccessful match is fed back to the user.
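Step S303 with a threshold as the matching condition can be sketched as follows. The function name, the representation of question-answer pairs as (question, answer) tuples, and the use of `None` to signal an unsuccessful match are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def select_answer(qa_pairs: Sequence[Tuple[str, str]],
                  matching_degrees: Sequence[float],
                  threshold: float) -> Optional[str]:
    """Return the answer of the best-matching pair, or None if no pair qualifies."""
    if len(qa_pairs) != len(matching_degrees):
        raise ValueError("one matching degree is required per question-answer pair")
    if not qa_pairs:
        return None
    # Index of the question-answer pair with the maximum matching degree.
    best = max(range(len(matching_degrees)), key=lambda i: matching_degrees[i])
    # Matching condition: the maximum must exceed the preset threshold.
    if matching_degrees[best] > threshold:
        return qa_pairs[best][1]
    return None  # caller feeds back an "unsuccessful match" message


# Usage with made-up data.
pairs: List[Tuple[str, str]] = [
    ("How do I reset my password?", "Use the reset link on the login page."),
    ("How do I cancel my policy?", "Contact customer service."),
]
answer = select_answer(pairs, [0.91, 0.12], threshold=0.8)
```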
In one embodiment of the present invention, to make it easier for the user to receive the answer to the query question, the answer may be output in one or more of the following ways: text, speech and images.
As an example, after the user inputs the query question by voice, the text corresponding to the answer and the speech corresponding to the answer may be output in a browser and/or a mobile terminal application. This prompts the user to receive the answer.
As another example, after the user inputs the query question by voice, the text and an image corresponding to the answer may be output in a browser and/or a mobile terminal application, where the image is generated based on the answer. For example, if the answer is a driving route, the image shows the driving route on a map.
In an embodiment of the present invention, after the answer to the query question is obtained by using the technical solution in the embodiment of the present invention, the query question and the answer to the query question may be used as a newly added question-answer pair, and the newly added question-answer pair is stored in the database.
This is because the number of question-answer pairs in the database is small at the beginning, and the greater the number of question-answer pairs, the more accurate the answers to query questions become. To improve the accuracy of the answers, the query question and its answer are added to the database. In this way, each time the answer to a query question is obtained, the accuracy of the answers to subsequent queries is further improved.
In the embodiment of the present invention, the query question of the user is obtained; matching the query question with questions in question-answer pairs, and matching answers in the question-answer pairs with the query question and the questions in the question-answer pairs respectively to obtain the matching degree of the query question and each question-answer pair, wherein the question-answer pairs comprise existing questions in a database and answers corresponding to the existing questions in the database; and if the maximum value of the matching degree meets the matching condition, taking the answer in the question-answer pair corresponding to the maximum value as the answer of the query question. The answers can be matched not only from the angle of question matching, but also from the angle of questions and answers, so that the accuracy of matching the query questions and the knowledge points can be improved.
In the embodiment of the invention, the answer serves as a bridge between the query question and the question in the question-answer pair. This is because observation and analysis show that questions with the same semantics have the same answer, while questions with different semantics have different answers. The matching degree between the query question and the question-answer pair is measured by matching the query question with the question in the question-answer pair, and by matching the answer in the question-answer pair with both the query question and the question in the question-answer pair. This scheme enriches the semantic information of the query question while making full use of the corpus information in the database. The accuracy of matching query questions to knowledge points is thereby improved, which ultimately improves the service quality of the intelligent customer service robot.
Referring to fig. 7, fig. 7 is a schematic diagram of a main structure of a data matching apparatus according to an embodiment of the present invention, where the data matching apparatus may implement a data matching method, as shown in fig. 7, the data matching apparatus specifically includes:
a query module 701, configured to obtain a query question of a user;
a matching module 702, configured to match the query question with the questions in question-answer pairs, and match the answers in the question-answer pairs with the query question and with the questions in the question-answer pairs, respectively, to obtain a matching degree between the query question and each question-answer pair, where each question-answer pair includes an existing question in a database and the answer corresponding to that question;
an output module 703, configured to, if the maximum value of the matching degree satisfies the matching condition, use an answer in the question-answer pair corresponding to the maximum value as an answer to the query question.
In an embodiment of the present invention, the matching module 702 is specifically configured to determine a first similarity between the query question and the question in the question-answer pair;
determine a second similarity according to the similarity between the answer in the question-answer pair and each of the query question and the question in the question-answer pair;
and obtain the matching degree of the query question and each question-answer pair according to the first similarity and the second similarity.
In an embodiment of the present invention, the matching module 702 is specifically configured to determine a first similarity between the query question and the question in the question-answer pair by using a Bert model;
and determine a second similarity, by using the Bert model, according to the similarity between the answer in the question-answer pair and each of the query question and the question in the question-answer pair.
In an embodiment of the present invention, the matching module 702 is specifically configured to splice the first similarity and the second similarity to obtain a third similarity;
and input the third similarity into a multilayer perceptron to obtain the matching probability of the query question and each question-answer pair, and take the matching probability as the matching degree.
In an embodiment of the present invention, the matching module 702 is specifically configured to match the answer in the question-answer pair with the query question to obtain a first matching vector;
matching the answers in the question-answer pairs with the questions in the question-answer pairs to obtain second matching vectors;
and taking the similarity between the first matching vector and the second matching vector as the second similarity.
In an embodiment of the present invention, the query module 701 is specifically configured to obtain a query question of a user from a browser and/or a mobile terminal application.
In an embodiment of the present invention, the output module 703 outputs the answer to the query question by one or more of the following ways, including text, voice and image.
Fig. 8 shows an exemplary system architecture 800 to which the method of matching data or the apparatus of matching data of an embodiment of the present invention may be applied.
As shown in fig. 8, the system architecture 800 may include terminal devices 801, 802, 803, a network 804, and a server 805. The network 804 serves to provide a medium for communication links between the terminal devices 801, 802, 803 and the server 805. Network 804 may include various types of connections, such as wired or wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 801, 802, 803 to interact with a server 805 over a network 804 to receive or send messages or the like. The terminal devices 801, 802, 803 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 801, 802, 803 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 805 may be a server that provides various services, such as a back-office management server (for example only) that supports shopping-like websites browsed by users using the terminal devices 801, 802, 803. The backend management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (for example, target push information, product information — just an example) to the terminal device.
It should be noted that the method for matching data provided by the embodiment of the present invention is generally performed by the server 805, and accordingly, the device for matching data is generally disposed in the server 805.
It should be understood that the number of terminal devices, networks, and servers in fig. 8 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 9, shown is a block diagram of a computer system 900 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 9, the computer system 900 includes a Central Processing Unit (CPU)901 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the system 900 are also stored. The CPU 901, ROM 902, and RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to bus 904.
The following components are connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output section 907 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read out therefrom is mounted into the storage section 908 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The above-described functions defined in the system of the present invention are executed when the computer program is executed by a Central Processing Unit (CPU) 901.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a query module, a matching module, and an output module. The names of these modules do not, in some cases, constitute a limitation on the modules themselves; for example, the query module may also be described as "a module for obtaining a query question of a user".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform:
acquiring a query question of a user;
matching the query question with questions in question-answer pairs, and matching answers in the question-answer pairs with the query question and the questions in the question-answer pairs respectively to obtain the matching degree of the query question and each question-answer pair, wherein the question-answer pairs comprise existing questions in a database and answers corresponding to the existing questions in the database;
and if the maximum value of the matching degree meets the matching condition, taking the answer in the question-answer pair corresponding to the maximum value as the answer of the query question.
According to the technical scheme of the embodiment of the invention, the query question of the user is obtained; the query question is matched with questions in question-answer pairs, and answers in the question-answer pairs are matched with the query question and the questions in the question-answer pairs respectively to obtain the matching degree of the query question and each question-answer pair, wherein the question-answer pairs comprise existing questions in a database and answers corresponding to the existing questions in the database; and if the maximum value of the matching degree meets the matching condition, the answer in the question-answer pair corresponding to the maximum value is taken as the answer of the query question. The answers can be matched not only from the angle of question matching, but also from the angle of questions and answers, so that the accuracy of matching the query questions and the knowledge points can be improved.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method of matching data, comprising:
acquiring a query question of a user;
matching the query question with questions in question-answer pairs, and matching answers in the question-answer pairs with the query question and the questions in the question-answer pairs respectively to obtain the matching degree of the query question and each question-answer pair, wherein the question-answer pairs comprise existing questions in a database and answers corresponding to the existing questions in the database;
and if the maximum value of the matching degree meets the matching condition, taking the answer in the question-answer pair corresponding to the maximum value as the answer of the query question.
2. The method of matching data according to claim 1, wherein the matching the query question with questions in question-answer pairs, and matching answers in the question-answer pairs with the query question and the questions in the question-answer pairs respectively, to obtain the matching degree of the query question and each question-answer pair comprises:
determining a first similarity of the query question and a question in the question-answer pair;
determining a second similarity according to the similarity between the answer in the question-answer pair and each of the query question and the question in the question-answer pair;
and obtaining the matching degree of the query question and each question-answer pair according to the first similarity and the second similarity.
3. The method of matching data of claim 2, wherein said determining a first similarity of the query question and a question in the question-answer pair comprises:
determining a first similarity between the query question and a question in the question-answer pair by adopting a Bert model;
and the determining a second similarity according to the similarity between the answer in the question-answer pair and each of the query question and the question in the question-answer pair comprises:
determining the second similarity, by adopting the Bert model, according to the similarity between the answer in the question-answer pair and each of the query question and the question in the question-answer pair.
4. The method of matching data according to claim 2, wherein the obtaining the matching degree of the query question and each question-answer pair according to the first similarity and the second similarity comprises:
splicing the first similarity and the second similarity to obtain a third similarity;
and inputting the third similarity into a multilayer perceptron to obtain the matching probability of the query question and each question-answer pair, and taking the matching probability as the matching degree.
5. The method of claim 3, wherein the determining a second similarity according to the similarity between the answer in the question-answer pair and the query question and the question in the question-answer pair by using the Bert model comprises:
matching the answers in the question-answer pairs with the query questions to obtain a first matching vector;
matching the answers in the question-answer pairs with the questions in the question-answer pairs to obtain second matching vectors;
and taking the similarity between the first matching vector and the second matching vector as the second similarity.
6. The method of matching data as recited in claim 1, the method further comprising:
and taking the query question and the answer of the query question as a newly added question-answer pair, and storing the newly added question-answer pair in the database.
7. The method of matching data as recited in claim 4, the method further comprising:
and training to obtain the Bert model by adopting a training set and a testing set, wherein each sample of the training set and the testing set comprises a query question of a user, a question-answer pair and a label, and the label is used for representing a matching result of the query question of the user and the question-answer pair.
8. An apparatus for matching data, comprising:
the query module is used for acquiring query questions of a user;
the matching module is used for matching the query question with questions in question-answer pairs and matching answers in the question-answer pairs with the query question and the questions in the question-answer pairs respectively to obtain the matching degree of the query question and each question-answer pair, and the question-answer pairs comprise existing questions of a database and answers corresponding to the existing questions of the database;
and the output module is used for taking the answer in the question-answer pair corresponding to the maximum value as the answer of the query question if the maximum value of the matching degree meets the matching condition.
9. An electronic device that matches data, comprising:
one or more processors;
a storage device for storing one or more programs,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN202110241934.4A 2021-03-04 2021-03-04 Method, device, equipment and computer readable medium for matching data Pending CN112836035A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110241934.4A CN112836035A (en) 2021-03-04 2021-03-04 Method, device, equipment and computer readable medium for matching data

Publications (1)

Publication Number Publication Date
CN112836035A true CN112836035A (en) 2021-05-25

Family

ID=75934677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110241934.4A Pending CN112836035A (en) 2021-03-04 2021-03-04 Method, device, equipment and computer readable medium for matching data

Country Status (1)

Country Link
CN (1) CN112836035A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110837586A (en) * 2018-08-15 2020-02-25 阿里巴巴集团控股有限公司 Question-answer matching method, system, server and storage medium
CN110287296A (en) * 2019-05-21 2019-09-27 平安科技(深圳)有限公司 A kind of problem answers choosing method, device, computer equipment and storage medium
WO2020232877A1 (en) * 2019-05-21 2020-11-26 平安科技(深圳)有限公司 Question answer selection method and apparatus, computer device, and storage medium
CN111753062A (en) * 2019-11-06 2020-10-09 北京京东尚科信息技术有限公司 Method, device, equipment and medium for determining session response scheme

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113515932A (en) * 2021-07-28 2021-10-19 北京百度网讯科技有限公司 Method, device, equipment and storage medium for processing question and answer information
CN113515932B (en) * 2021-07-28 2023-11-10 北京百度网讯科技有限公司 Method, device, equipment and storage medium for processing question and answer information
CN113420139A (en) * 2021-08-24 2021-09-21 北京明略软件系统有限公司 Text matching method and device, electronic equipment and storage medium
CN114372122A (en) * 2021-12-08 2022-04-19 阿里云计算有限公司 Information acquisition method, computing device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination