WO2020133360A1

WO2020133360A1 - Question text matching method and apparatus, computer device and storage medium

Info

Publication number: WO2020133360A1
Application number: PCT/CN2018/125360
Authority: WO
Inventors: 熊友军; 熊为星; 廖洪涛
Original assignee: 深圳市优必选科技有限公司
Priority date: 2018-12-29
Filing date: 2018-12-29
Publication date: 2020-07-02

Abstract

A question text matching method and apparatus, a computer device and a storage medium, the method comprising: obtaining a question text to be matched (S102); combining the question text to be matched with each preset question text in a question text library respectively to obtain a plurality of input question texts (S104); inputting the plurality of input question texts into a question matching model to obtain similarity labels between the question text to be matched and each preset question text (S106); and obtaining a target question text having the highest similarity to the question text to be matched according to the similarity labels (S108). By means of the described method, the question matching accuracy can be improved to a certain extent.

Description

Question sentence matching method, device, computer equipment and storage medium

Technical field

The invention relates to the technical field of customer service robots, in particular to a question sentence matching method, device, computer equipment and storage medium.

Background art

A customer service robot is mainly responsible for the after-sales service of a product. It provides functions such as group messaging, transfer to a human agent, call recording, interruption support, and conversion of recordings to text. Because the customer service robot can answer customers' questions on its own, it substantially reduces the workload of customer service staff. Usually, the customer service robot matches the customer's question against each question in a question library, finds the question closest to the customer's question, and pushes the answer to that question to the customer.

In question-and-answer matching for customer service robots, a supervised learning model is usually used. Such a model requires the entities and non-entities in the customer's question to be labeled in order to calculate the similarity between questions, and the answer to the matching question with the greatest similarity is then pushed to the customer. However, this approach requires professional personnel to label entities and non-entities, which consumes manpower and is inefficient. The labeling results may also be incorrect because of the varying skill of the labeling personnel, which leads to low accuracy of the final matched question.

Summary of the invention

Based on this, it is necessary to propose a method, device, computer equipment, and storage medium for question text matching with high accuracy for the above problems.

A method for matching question text, the method includes:

Get the question text to be matched;

Combining the question text to be matched and each preset question text in the question text library to obtain multiple input question texts;

Inputting a plurality of the input question texts into a question matching model to obtain a similarity label of the question text to be matched and each of the preset question texts;

According to the similarity label, the target question text with the highest similarity to the question text to be matched is obtained.

A matching device for question text is provided, including:

The acquisition module is used to obtain the question text to be matched;

A combination module, configured to combine the question text to be matched and each preset question text in the question text library to obtain multiple input question texts;

A label module, configured to input a plurality of the input question texts into a question matching model to obtain a similarity label of the question text to be matched and each of the preset question texts;

The matching module is configured to obtain the target question text with the highest similarity to the question text to be matched according to the similarity label.

A computer device includes a memory and a processor. The memory stores a computer program. When the computer program is executed by the processor, the processor is caused to perform the following steps:

Get the question text to be matched;

Input a plurality of the input question texts into a question matching model to obtain a similarity label between the question text to be matched and each of the preset question texts;

A computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the processor is caused to perform the following steps:

Get the question text to be matched;

The implementation of the embodiments of the present invention will have the following beneficial effects:

The present invention proposes a question text matching method, device, computer equipment and storage medium. With the method described in the embodiments of the present invention, entity keywords no longer need to be labeled manually, which saves a large amount of labeling time. Professional labeling personnel are no longer needed to label the entities and non-entities in the question text, which also reduces cost. Finally, because the questions are simply combined to obtain a similarity label between one question and another, the target question text can be obtained according to the similarity label without distinguishing entities from non-entities in advance, and the accuracy of question matching is also improved. The reason is that the entity labeling workload is large and repetitive labeling work easily leads to errors, so a model trained on such labels cannot predict entities accurately. When the similarity between questions is judged, it is the similarity of the overall meaning of the two sentences that is judged, and the probability of error is smaller. The model is therefore trained on sentence pairs (that is, two sentences), and the final prediction accuracy is higher.

Brief description of the drawings

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative effort.

In the drawings:

FIG. 1 is a schematic diagram of an implementation process of a method for matching question text in an embodiment;

FIG. 2 is a schematic diagram of an implementation process of step 101 in an embodiment;

FIG. 3 is a schematic diagram of an implementation process of a method for matching question text in an embodiment;

FIG. 4 is a schematic diagram of an implementation process of a method for matching question text in an embodiment;

FIG. 5 is a structural block diagram of an apparatus for matching question text in an embodiment;

FIG. 6 is a structural block diagram of a computer device in an embodiment.

Detailed description

The technical solutions in the embodiments of the present invention will be described clearly and completely in conjunction with the drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, but not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without making creative efforts fall within the protection scope of the present invention.

As shown in FIG. 1, in one embodiment, a question text matching method is provided. The execution body of the question text matching method described in the embodiment of the present invention may be a server; it may also be another terminal device, for example, a robot device. The question text matching method specifically includes the following steps:

Step S102: Obtain the question text to be matched.

The question text to be matched is the question text used for matching. After obtaining the original question text to be matched, the stop words in the original question text to be matched need to be removed.

Step S104, combining the question text to be matched and each preset question text in the question text library to obtain multiple input question texts.

The question text library includes a plurality of preset question texts, that is, question texts stored in advance. For example, the question text to be matched is "How big is Wukong", and the question text library contains two preset question texts: "How high is Wukong" and "How much is Wukong". Combining the question text to be matched with each preset question text gives two input question texts: [How big is Wukong, How high is Wukong] and [How big is Wukong, How much is Wukong].

Step S106: Input a plurality of the input question texts into a question matching model to obtain a similarity label between the question text to be matched and each of the preset question texts.

The similarity label reflects the similarity between the question text to be matched and a preset question text. The similarity label may be a number. In the above example, suppose the number 1 indicates that the question text to be matched is very similar to the preset question text, and the number 0 indicates that the two are not similar. Then, after prediction by the question matching model, the similarity label of the question text to be matched "How big is Wukong" and the preset question text "How high is Wukong" will be 1, and the similarity label of the question text to be matched "How big is Wukong" and the preset question text "How much is Wukong" will be 0.

Step S108: Obtain the target question text with the highest similarity to the question text to be matched according to the similarity label.

In the above example, since the number 1 indicates that the question text to be matched and the preset question text are very similar, and the number 0 indicates that they are not similar, the target question text with the highest similarity to the question text to be matched "How big is Wukong" is determined, according to the similarity label, to be "How high is Wukong".

As an optional embodiment of the present invention, after the target question text with the highest similarity to the question text to be matched is obtained in step S108, the method further includes: obtaining the target answer text corresponding to the target question text.

The target answer text is the answer to the target question text. The question text library holds the preset question texts. Correspondingly, the preset answer text of each preset question text may also be stored in the question text library, or a separate question answer library may be set up in which each preset question text and its preset answer text carry the same identifier, so that once the preset question text is known, its answer is known as well. Because the target answer text corresponding to the target question text is obtained, the answer to the question asked by the user can be presented to the user directly.
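The flow of steps S102 to S108, followed by the optional answer lookup, might be sketched in Python roughly as follows. This is only an illustrative sketch under stated assumptions: `STOP_WORDS`, the identifiers `q1` and `q2`, the placeholder answer strings, and `toy_model` are all hypothetical, and `toy_model` merely hard-codes the labels of the example above in place of a trained question matching model.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stop-word list; the source does not say which words are removed.
STOP_WORDS = {"please", "um"}

# Hypothetical library: a preset question and its answer share a text identifier.
PRESET_QUESTIONS: Dict[str, str] = {
    "q1": "How high is Wukong",
    "q2": "How much is Wukong",
}
PRESET_ANSWERS: Dict[str, str] = {
    "q1": "(answer text for q1)",
    "q2": "(answer text for q2)",
}


def remove_stop_words(text: str) -> str:
    """S102: drop stop words from the raw question text."""
    return " ".join(w for w in text.split() if w.lower() not in STOP_WORDS)


def toy_model(query: str, preset: str) -> int:
    """Stand-in for the trained model: returns the labels used in the example."""
    return 1 if "high" in preset else 0


def match_question(raw_query: str,
                   model: Callable[[str, str], int]) -> Tuple[str, str, str]:
    query = remove_stop_words(raw_query)
    # S104: one input pair per preset question text.
    pairs = {qid: (query, preset) for qid, preset in PRESET_QUESTIONS.items()}
    # S106: a similarity label for every pair.
    labels = {qid: model(q, p) for qid, (q, p) in pairs.items()}
    # S108: the preset question with the highest label is the target.
    best_id = max(labels, key=lambda qid: labels[qid])
    # Optional step: look up the answer through the shared identifier.
    return best_id, PRESET_QUESTIONS[best_id], PRESET_ANSWERS[best_id]
```

Calling `match_question("please How big is Wukong", toy_model)` selects the entry with identifier `q1`. When several preset questions share the highest label, `max` returns the first one encountered, which is a design choice of this sketch rather than something the source specifies.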

In the embodiment of the present invention, before the question text to be matched is obtained in step S102, the method further includes:

Step 101: Train the question matching model.

Specifically, as shown in FIG. 2, training the question matching model in step 101 includes:

Step 101A: Obtain a preset question training text set including a plurality of preset question training texts.

Step 101B: Obtain, for each of the preset question training texts, multiple preset question training texts of different similarity levels.

Here, a given preset question training text is used as the main question, and the similarity level of each other preset question training text is determined by its similarity to the main question. For example, "How to operate a building block robot" and "What is the convenient way to operate a building block robot" are similar, so the similarity level can be set higher, whereas "How to operate a building block robot" and "How much does the robot cost" are not very similar, so the similarity level can be set lower.

Step 101C: Combine each preset question training text with its corresponding preset question training texts of different similarity levels to obtain multiple input training texts.

Triples are constructed, each consisting of the main question, another preset question training text, and the similarity label corresponding to the similarity level. For example, take "How to operate the building block robot" as the main question, with the other preset question training texts "What is the convenient way to operate the building block robot", "How is the building block robot operated", "Operation process of the building block robot", "Cannot scan the Bluetooth of the building block robot", "What is the use of the building block robot", "How to edit the official model actions", "How to buy accessories", and "How much does the robot cost". Multiple triples can then be constructed:

[How to operate the building block robot, What is the convenient way to operate the building block robot, 4]
[How to operate the building block robot, How is the building block robot operated, 4]
[How to operate the building block robot, Operation process of the building block robot, 4]
[How to operate the building block robot, Cannot scan the Bluetooth of the building block robot, 3]
[How to operate the building block robot, What is the use of the building block robot, 2]
[How to operate the building block robot, How to edit the official model actions, 1]
[How to operate the building block robot, How to buy accessories, 0]
[How to operate the building block robot, How much does the robot cost, 0]

When the model is trained, the pair [main question, other preset question training text] in each triple is used as the input, and the similarity label is used as the desired output. The number of similarity levels can be set according to actual needs and is not specifically limited here.
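The triple construction described above might be sketched as follows. The helper names `build_triples` and `to_training_data` are hypothetical and chosen only for illustration; the sketch shows how graded question pairs could be turned into model inputs and desired outputs, not how the patent's implementation is written.

```python
from typing import List, Tuple

Triple = Tuple[str, str, int]


def build_triples(main_question: str,
                  graded: List[Tuple[str, int]]) -> List[Triple]:
    """Pair the main question with every graded training question."""
    return [(main_question, other, label) for other, label in graded]


def to_training_data(triples: List[Triple]):
    """Split triples into model inputs (text pairs) and desired outputs (labels)."""
    inputs = [(main, other) for main, other, _ in triples]
    targets = [label for _, _, label in triples]
    return inputs, targets


triples = build_triples(
    "How to operate the building block robot",
    [
        ("What is the convenient way to operate the building block robot", 4),
        ("Cannot scan the Bluetooth of the building block robot", 3),
        ("How much does the robot cost", 0),
    ],
)
inputs, targets = to_training_data(triples)
```

Keeping the label as the third element of each triple makes it straightforward to change the number of similarity levels without touching the pairing logic.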

Step 101D: Use the multiple input training texts as the input of the question matching model, and use the similarity labels between each preset question training text and its corresponding preset question training texts of different similarity levels as the desired output, so as to train the question matching model and obtain a trained question matching model.

Since a machine cannot recognize a sentence directly, the question text must be segmented into words, and the words are then converted into word vectors as the input of the model; a word vector expresses a word in vector form. For example, the question text "How to operate the building block robot" is segmented into: building block, robot, how to operate. The word vectors of these words are then obtained, and the input is organized in word-vector form and fed to the model for training. First, the cross product of the obtained word vector matrices is computed, and the first K values after the cross product are selected (Equation 1). Next, a simple mapping is applied to the word vectors of the question text to be matched (Equation 2). Then, according to the mapping result, a weight is assigned to the output result after the activation function (Equation 3) to obtain the final matching degree (Equation 4), and the matching degrees are weighted to obtain the final label output values (Equation 5). The label output values are fed into the softmax layer (Equation 6) and compared with the similarity label to form the loss function of the question matching model (Equation 7). Finally, the gradient is updated according to the value of the loss function to complete the model training. The details are as follows. It should be noted that, to speed up model training, the Adam algorithm can also be used to perform the gradient update.

Suppose q_1 = (x_1, x_2, x_3, ..., x_m) is the word vectors of the question text to be matched and q_2 = (y_1, y_2, y_3, ..., y_n) is the word vectors of the preset question training text. Then:

[Equations 1 to 7 are not reproduced in this text.]

Here, m is the length of the question text to be matched after word segmentation, and n is the length of the preset question training text after word segmentation; x_i is the word vector corresponding to the i-th word of the segmented question text to be matched, and y_i is the word vector corresponding to the i-th word of the segmented preset question training text; the product operator in Equation 1 denotes the cross product of vectors; the f function selects the first K values after the cross product; w_p is the weight parameter of the mapping and b_p is the offset parameter of the mapping; H = [h_1, h_2, ..., h_m], where h_i is the mapped value corresponding to the i-th word of the question text to be matched; relu is the ReLU activation function; W^(l) is the weight matrix of layer l and b^(l) is the bias matrix of layer l; L is the total number of layers of the neural network; O = [o_1, o_2, ..., o_C], where C is the number of similarity levels (each similarity level corresponds to one similarity label) and o_i is the output value of the i-th level label; e is a constant, e ≈ 2.71828; M is the total number of training samples; and t_gj is the true similarity label of the g-th training sample for the j-th similarity level.
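The final two stages described above, a softmax over the C label output values followed by a loss averaged over the M training samples, might look like the sketch below. Since the source equations are not reproduced in this text, the sketch assumes the standard softmax and a standard cross-entropy loss with one-hot true labels; the function names and the use of an integer label as the index of the true level are assumptions made only for illustration.

```python
import math
from typing import List, Sequence


def softmax(outputs: Sequence[float]) -> List[float]:
    """Turn the C label output values o_1..o_C into probabilities."""
    shift = max(outputs)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(o - shift) for o in outputs]
    total = sum(exps)
    return [value / total for value in exps]


def cross_entropy(batch_outputs: Sequence[Sequence[float]],
                  batch_labels: Sequence[int]) -> float:
    """Average negative log-probability of the true level over M samples.

    Assumes each true label is an integer in 0..C-1, which is equivalent
    to a one-hot vector t_g with t_gj = 1 only at the true level j.
    """
    m = len(batch_outputs)
    loss = 0.0
    for outputs, label in zip(batch_outputs, batch_labels):
        loss -= math.log(softmax(outputs)[label])
    return loss / m
```

In practice the gradient of such a loss would be computed by a deep learning framework and applied with an optimizer such as Adam, as the text notes.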

As shown in FIG. 3, a method for matching question text is provided, which specifically includes:

Step 302: Obtain the product category label.

The product category label identifies a product and is composed of numbers and/or characters and/or letters. For example, for the robots "Wukong Robot", "Alpha Robot", and "Jimu Robot", the product category label of "Wukong Robot" may be set to wukong, the product category label of "Alpha Robot" may be set to alpha, and the product category label of "Jimu Robot" may be set to jimu.

Step 304: Obtain the question text to be matched.

Step 306: Determine a target question text sub-library according to the product category label, and obtain a plurality of preset question texts in the target question text sub-library.

In the embodiment of the present invention, the question text library is divided into multiple question text sub-libraries according to the product category label, and each question text sub-library stores the questions relevant to the corresponding robot product. For example, the question text sub-library of "Wukong Robot" contains questions about "Wukong Robot", and the question text sub-library of "Alpha Robot" contains questions about "Alpha Robot".

Step 308: Combine the question text to be matched with each of the multiple preset question texts in the target question text sub-library to obtain multiple input question texts.

Step 310: Input a plurality of the input question texts into a question matching model to obtain a similarity label between the question text to be matched and each of the preset question texts.

Step 312: Acquire the target question text with the highest similarity to the question text to be matched according to the similarity label.
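Steps 302 to 308 amount to a keyed lookup followed by pairing, which might be sketched as follows. The contents of `SUB_LIBRARIES` are hypothetical sample data, and `build_input_pairs` is a name chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical sub-libraries keyed by product category label.
SUB_LIBRARIES: Dict[str, List[str]] = {
    "wukong": ["How high is Wukong", "How much is Wukong"],
    "alpha": ["How to charge the Alpha robot"],
    "jimu": ["How to operate the building block robot"],
}


def build_input_pairs(product_label: str, query: str) -> List[Tuple[str, str]]:
    """Steps 306 and 308: pick the sub-library, then pair the query with each entry."""
    if product_label not in SUB_LIBRARIES:
        raise KeyError(f"unknown product category label: {product_label!r}")
    return [(query, preset) for preset in SUB_LIBRARIES[product_label]]
```

Restricting the pairing to one sub-library means the model is evaluated only on questions about the relevant product, which reduces the number of pairs to score.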

In order to further ensure the accuracy of the matching answer, as shown in FIG. 4, the matching method of the question text further includes:

Step 312: Acquire preset answer text corresponding to each of the preset question texts in the target question text sub-library.

Preset question texts are stored in the question text sub-library. Correspondingly, the preset answer texts of the preset question texts may also be stored in the question text sub-library, or a separate question answer sub-library may be set up and associated with the question text sub-library, so that once a preset question text is known, its answer can be found through the association.

Step 314: Combine the question text to be matched and the preset answer text of each preset question text in the target question text sub-library to obtain multiple input question and answer texts.

Here, the question text to be matched is combined only with the preset answer texts of the preset question texts in the target question text sub-library; it no longer needs to be combined with the preset answer texts of all preset question texts in the question text library, which greatly reduces program overhead. For example, the question text to be matched is "How to operate the building block robot", and the preset answer texts of the preset question texts in the target question text sub-library are "The building block robot is operated as follows", "The operation process of the building block robot is as follows", "Scan the Bluetooth of the building block robot by the following method", "The building block robot can be used to sweep the floor", "The official model actions are edited as follows", "Accessories can be purchased in the mall", and "2000 yuan". Combining the question text to be matched with the preset answer text of each preset question text in the target question text sub-library gives multiple input question-and-answer texts:

[How to operate the building block robot, The building block robot is operated as follows]
[How to operate the building block robot, The operation process of the building block robot is as follows]
[How to operate the building block robot, Scan the Bluetooth of the building block robot by the following method]
[How to operate the building block robot, The building block robot can be used to sweep the floor]
[How to operate the building block robot, The official model actions are edited as follows]
[How to operate the building block robot, Accessories can be purchased in the mall]
[How to operate the building block robot, 2000 yuan]

Step 316: Input the multiple input question-and-answer texts into a question-and-answer matching model to obtain a matching value between the question text to be matched and the preset answer text of each preset question text in the target question text sub-library.

The matching value indicates the degree of matching between the question text to be matched and a preset answer text: the better the answer fits the question, the higher the matching value. The question-and-answer matching model must be trained in advance. During training, a preset question training text is used as the question and each preset answer training text is used as an answer, and two-tuples of question and answer are constructed as the input of the question-and-answer matching model. At the same time, a constraint is imposed on the output, and model training is complete when the constraint is satisfied. The constraint is set according to the similarity labels between the main question and the preset question training texts. Specifically, the matching value of the two-tuple with the largest similarity label must be greater than the matching values of the other two-tuples.

For example, take the following combinations of main question, other preset question training text, similarity label, and preset answer training text:

[How to operate the building block robot, What is the convenient way to operate the building block robot, 4, The convenient way to operate the building block robot is as follows]
[How to operate the building block robot, Cannot scan the Bluetooth of the building block robot, 3, Scan the robot's Bluetooth in this way]
[How to operate the building block robot, What is the use of the building block robot, 2, The building block robot is used to sweep the floor]
[How to operate the building block robot, How to edit the official model actions, 1, The official model actions are edited in this way]
[How to operate the building block robot, How to buy accessories, 0, Accessories can be purchased in the mall]

According to the similarity labels, the following constraint is obtained:

matching value of [How to operate the building block robot, The convenient way to operate the building block robot is as follows]
> matching value of [How to operate the building block robot, Scan the robot's Bluetooth in this way]
> matching value of [How to operate the building block robot, The building block robot is used to sweep the floor]
> matching value of [How to operate the building block robot, The official model actions are edited in this way]
> matching value of [How to operate the building block robot, Accessories can be purchased in the mall]

In the embodiment of the present invention, the question-and-answer matching model is trained as follows. Training is completed by performing gradient updates on the L function. To speed up model training, the Adam algorithm can be used to perform the gradient update.

Let q_1 = (x_1, x_2, x_3, ..., x_m) be the word vectors of the preset question training text and q_2 = (y_1, y_2, y_3, ..., y_n) be the word vectors of a preset answer training text. Then:

[The equations are not reproduced in this text.]

Here, m is the length of the preset question training text after word segmentation, and n is the length of the preset answer training text after word segmentation; x_i is the word vector corresponding to the i-th word of the segmented preset question training text, and y_i is the word vector corresponding to the i-th word of the segmented preset answer training text; the product operator denotes the cross product of vectors; the f function selects the first K values after the cross product; relu is the ReLU activation function; W^(l) is the weight matrix of layer l and b^(l) is the offset matrix of layer l; L is the total number of layers of the neural network; W_p is the weight matrix of the preset question training text and b_p is the offset of the preset question training text; h is the output value of the preset question training text after mapping; Margin is set to 1; s(q_1, q_2) and s(q_1, q_3) are the predicted matching values output for the preset question training text and a preset answer training text; and Θ is a parameter given in advance.
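The ordering constraint on matching values, together with a margin of 1, is commonly expressed as a pairwise hinge (margin ranking) loss. The sketch below assumes that standard form; it is not taken from the source, whose loss equation is not reproduced in this text. Here `scores` plays the role of the predicted matching values s(q_1, ·), `labels` holds the similarity labels that define the required ordering, and both function names are hypothetical.

```python
from typing import Sequence


def pairwise_hinge(s_better: float, s_worse: float, margin: float = 1.0) -> float:
    """Zero once the better answer outscores the worse one by at least the margin."""
    return max(0.0, margin - s_better + s_worse)


def ranking_loss(scores: Sequence[float], labels: Sequence[int],
                 margin: float = 1.0) -> float:
    """Sum the hinge penalty over every pair whose labels impose an order."""
    total = 0.0
    for i, label_i in enumerate(labels):
        for j, label_j in enumerate(labels):
            if label_i > label_j:
                total += pairwise_hinge(scores[i], scores[j], margin)
    return total
```

A loss of zero means every answer attached to a higher similarity label outscores every answer attached to a lower one by at least the margin, which corresponds to the chain of inequalities in the example.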

Correspondingly, obtaining the target question text with the highest similarity to the question text to be matched according to the similarity label in step 312 includes:

Step 318: Acquire target preset answer text that matches the question text to be matched according to the similarity label and the matching value.

In this embodiment of the present invention, the preset question text and the preset answer text corresponding to the preset question text have the same text identifier. For example, first obtain the one with the largest similarity label and the largest matching value, and then see if their text identifiers are the same. If they are the same, the corresponding preset answer text with the largest matching value is used as the target preset answer text. If they are not the same, Then, the preset answer text corresponding to the preset question text with the largest similarity label is used as the target preset answer text, or the preset answer text with the largest matching value is used as the target preset answer text. As an embodiment of the present invention, obtaining target preset answer text matching the question text to be matched according to the similarity label and the matching value in step 318 includes: step 318A, according to the A similarity label of the question text to be matched and each of the preset question texts in the target question text sub-library, and selecting the question text to be matched from the plurality of preset question texts The preferred preset question text with the highest number of similarities. Among them, the preset question text is preferred, which is the preset question text with the highest similarity predicted by the model among the multiple preset question texts. For example, assuming that there are 10 preset question texts and the preset number is set to 3, the similarity labels are obtained by sorting the similarity labels as 4, 4, 3, 3, 2, 2, 1, 0, 0 , 0, from which the preferred preset question texts of the similarity tags 4, 4, and 3 can be selected. 
Step 318B, according to the matching value of the question text to be matched and the preset answer text of each preset question text in the target question text sub-library, from multiple preset question texts The preset number of preferred preset answer texts that match the question text to be matched are selected from the preset answer texts of. Step 318B selects the preferred preset question text in the same way as step 318A, and will not be described in detail here. Step 318C: Acquire target preset answer text that matches the question text to be matched according to the text identifier of each preferred preset question text and the text identifier of each preferred preset answer text. Assuming that three preferred preset question texts and three preferred preset answer texts are selected, then the target preset answer text is selected from the three preferred preset answer texts according to the text identifier. As an embodiment of the present invention, in step 318C, according to the text identifier of each of the preferred preset question texts and the text identifier of each of the preferred preset answer texts, the question text to be matched is obtained The matched target preset answer text includes: Step 318C1, according to the text identifier of each of the preferred preset question texts and the text identifier of each of the preferred preset answer texts, to obtain a match with the question to be matched At least one preferred preset question text. Here, at least one preferred preset question text that matches the question to be matched is mainly obtained by taking an intersection of the text identifiers. 
For example, the text identifiers of the three preferred preset question texts are jimu10, jimu11, and jimu15, and the text identifiers of the three preferred preset answer texts are jimu10, jimu11, and jimu17, so the text identifiers corresponding to the text identifiers jimu10 and jimu1 are The question text is determined to be the preferred preset question text. Step 318C2: Divide the question text to be matched to obtain a word segmentation result that includes multiple words. For example, the text of the question to be matched is "Does Wukong have a golden one", and the result of the word segmentation to be matched is: [Wu, Kong, Yes, Jin, Se, ye]. Step 318C3: Segment at least one preferred preset question text that matches the question to be matched to obtain a plurality of preferred word segmentation results including multiple words. For example, we finally get two preferred preset question texts: "What is the size of Wukong" and "What colors does Wukong have?" The corresponding two preferred word segmentation results are: [Wu, Kong, De, Chi, Chi, Yes, Yes, More, less] and [Enlightenment, empty, have, which, some, face, color]. In step 318C4, according to the word segmentation result to be matched and the preferred word segmentation result, a text matching value of the question text to be matched and each preferred preset question text matching the question to be matched is calculated. Here, first count the total number of non-repeating words of the word segmentation result to be matched and the preferred word segmentation result, then confirm the same number of the same words of the word segmentation result to be matched and the preferred word segmentation result, and finally use the same number/total number The text matching value of the question text to be matched and each preferred preset question text matching the question to be matched can be obtained. 
Continuing the example above, the total number of non-repeated words in "Does Wukong have a golden one" and "What is the size of Wukong" is 12, the number of identical words is 3, and the text matching value is 3/12; the total number of non-repeated words in "Does Wukong have a golden one" and "What colors does Wukong have?" is 11, the number of identical words is 4, and the text matching value is 4/11.

Of course, to improve the validity and accuracy of the calculation, some irrelevant words may first be removed from the question to be matched and the preset question texts before the calculation is performed. After such meaningless words are removed, "Does Wukong have a golden one" becomes "golden", "What is the size of Wukong" becomes "how big is the size", and "What colors does Wukong have?" becomes "which colors". After removal, for "Does Wukong have a golden one" and "What is the size of Wukong", the total number is 7, the number of identical words is 0, and the text matching value is 0; for "Does Wukong have a golden one" and "What colors does Wukong have?", the total number is 5, the number of identical words is 1, and the text matching value is 0.2.

Step 318C5: obtain the target preset answer text that matches the question text to be matched according to the text matching value between the question text to be matched and each preferred preset question text that matches the question to be matched. Continuing the above example, the text matching values are compared and the preferred preset question text with the larger value is taken as the target question text; in this example the target question text is given as "What is the size of Wukong", and, according to the text identifier of the target question text, the target preset answer text that matches the question text to be matched is obtained: 1 meter.
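
Steps 318C1 to 318C5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names are hypothetical, the single-letter tokens are stand-ins chosen only to reproduce the 3-out-of-12 count from the example, and ties are resolved by first occurrence, which the text does not specify.

```python
from typing import Dict, List


def shared_identifiers(question_ids: List[str], answer_ids: List[str]) -> List[str]:
    """Step 318C1: keep identifiers present in both preferred lists, in question order."""
    answer_set = set(answer_ids)
    return [text_id for text_id in question_ids if text_id in answer_set]


def text_matching_value(query_tokens: List[str], candidate_tokens: List[str]) -> float:
    """Step 318C4: number of common words divided by number of distinct words."""
    query_set, candidate_set = set(query_tokens), set(candidate_tokens)
    total = len(query_set | candidate_set)
    same = len(query_set & candidate_set)
    return same / total if total else 0.0


def pick_target_id(query_tokens: List[str], candidates: Dict[str, List[str]]) -> str:
    """Step 318C5: identifier of the candidate with the largest text matching value.

    Raises ValueError if `candidates` is empty.
    """
    return max(
        candidates,
        key=lambda text_id: text_matching_value(query_tokens, candidates[text_id]),
    )


# Identifiers taken from the example in the text.
kept = shared_identifiers(
    ["jimu10", "jimu11", "jimu15"],
    ["jimu10", "jimu11", "jimu17"],
)

# Hypothetical stand-in tokens: 7 and 8 distinct tokens sharing 3, so 12 in total.
query = list("abcdefg")
candidate = list("abfhijkl")
value = text_matching_value(query, candidate)  # 3 common / 12 distinct
```

The ratio computed here is the Jaccard similarity of the two token sets. Removing irrelevant words before the calculation, as the text suggests, would amount to filtering both token lists before calling `text_matching_value`.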

As shown in FIG. 5, an apparatus 500 for matching question text is provided, which specifically includes:

The obtaining module 502 is used to obtain the question text to be matched; the combination module 504 is used to combine the question text to be matched with each preset question text in the question text library, respectively, to obtain multiple input question texts; the label module 506 is used to input the multiple input question texts into a question matching model to obtain a similarity label between the question text to be matched and each of the preset question texts; and the matching module 508 is used to obtain, according to the similarity labels, the target question text with the highest similarity to the question text to be matched.
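
The data flow through modules 502 to 508 can be sketched as a single function. This is only an illustration: `match_question` and `word_overlap` are hypothetical names, the similarity label is assumed to be a number where larger means more similar, and `word_overlap` is a toy stand-in for the trained question matching model.

```python
from typing import Callable, List, Tuple


def match_question(
    query: str,
    preset_questions: List[str],
    similarity_model: Callable[[str, str], float],
) -> Tuple[str, float]:
    """Return the preset question with the highest similarity label, and that label."""
    if not preset_questions:
        raise ValueError("the question text library is empty")
    # Combination module: pair the query with every preset question.
    input_pairs = [(query, preset) for preset in preset_questions]
    # Label module: one similarity label per input pair.
    labels = [similarity_model(q, p) for q, p in input_pairs]
    # Matching module: keep the preset question with the largest label.
    best = max(range(len(labels)), key=labels.__getitem__)
    return preset_questions[best], labels[best]


def word_overlap(a: str, b: str) -> float:
    """Toy stand-in for the trained question matching model (an assumption)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


# Hypothetical two-entry question text library.
library = ["how do I reset the robot", "what colors are available"]
target, label = match_question("which colors are available", library, word_overlap)
```

Passing the model in as a callable keeps the sketch independent of how the question matching model is actually built or trained.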

In one of the embodiments, the apparatus 500 further includes: a product label acquisition module for acquiring a product category label; correspondingly, the combination module 504 includes: a first combination module for determining the target question text sub-library according to the product category label and obtaining multiple preset question texts in the target question text sub-library; and a second combination module for combining the question text to be matched with the multiple preset question texts in the target question text sub-library, respectively, to obtain multiple input question texts.
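
A minimal sketch of the two combination sub-modules is given below, assuming the question text library is held as a mapping from product category label to a list of preset question texts. The function names, the labels `robot_a` and `robot_b`, and the sample questions are all hypothetical.

```python
from typing import Dict, List, Tuple


def select_sub_library(library: Dict[str, List[str]], product_label: str) -> List[str]:
    """First combination module: look up the sub-library for a product category label."""
    if product_label not in library:
        raise KeyError(f"no question text sub-library for label {product_label!r}")
    return library[product_label]


def build_inputs(query: str, presets: List[str]) -> List[Tuple[str, str]]:
    """Second combination module: pair the query with each preset question."""
    return [(query, preset) for preset in presets]


# Hypothetical library keyed by product category label.
library = {
    "robot_a": ["what colors does robot_a have", "what is the size of robot_a"],
    "robot_b": ["how long does robot_b take to charge"],
}
inputs = build_inputs("does robot_a come in gold", select_sub_library(library, "robot_a"))
```

Restricting the pairing to one sub-library means the model only scores questions about the relevant product, which reduces the number of input pairs per query.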

In one of the embodiments, the device 500 further includes: an answer text acquisition module, configured to acquire a preset answer text corresponding to each of the preset question texts in the target question text sub-library; a module for combining the question text to be matched with the preset answer text of each preset question text in the target question text sub-library, respectively, to obtain multiple input question-answer texts; and a matching value acquisition module for inputting the multiple input question-answer texts into a question-answer matching model to obtain a matching value between the question text to be matched and the preset answer text of each of the preset question texts in the target question text sub-library; correspondingly, the matching module 508 includes: a target answer matching module, configured to obtain the target preset answer text that matches the question text to be matched according to the similarity labels and the matching values.

In one of the embodiments, the preset question text and the preset answer text corresponding to the preset question text have the same text identifier; the target answer matching module includes: a preferred question module for selecting, from the multiple preset question texts and according to the similarity label between the question text to be matched and each of the preset question texts in the target question text sub-library, a preset number of preferred preset question texts with the highest similarity to the question text to be matched; a preferred answer module for selecting, from the preset answer texts of the multiple preset question texts and according to the matching value between the question text to be matched and the preset answer text of each preset question text in the target question text sub-library, the preset number of preferred preset answer texts that match the question text to be matched; and a target preset answer text module for obtaining the target preset answer text that matches the question text to be matched according to the text identifier of each of the preferred preset question texts and the text identifier of each of the preferred preset answer texts.
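
The selection performed by the preferred question module and the preferred answer module can be sketched as a top-N pick over scores keyed by text identifier. The identifiers below come from the earlier example, while the function name `preferred_ids` and all numeric scores are hypothetical values chosen for illustration.

```python
import heapq
from typing import Dict, List


def preferred_ids(scores: Dict[str, float], preset_number: int) -> List[str]:
    """Return the identifiers with the `preset_number` largest scores, best first."""
    return heapq.nlargest(preset_number, scores, key=scores.__getitem__)


# Hypothetical scores keyed by text identifier.
similarity_labels = {"jimu10": 0.9, "jimu11": 0.8, "jimu15": 0.7, "jimu17": 0.2}
matching_values = {"jimu10": 0.85, "jimu11": 0.75, "jimu15": 0.1, "jimu17": 0.6}

preferred_questions = preferred_ids(similarity_labels, 3)
preferred_answers = preferred_ids(matching_values, 3)
```

Because a preset question and its preset answer share one text identifier, the two resulting lists can be compared directly by identifier in the target preset answer text module.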

In one of the embodiments, the target preset answer text module includes: a first answer text module for obtaining at least one preferred preset question text that matches the question to be matched according to the text identifier of each preferred preset question text and the text identifier of each preferred preset answer text; a second answer text module for segmenting the question text to be matched into words to obtain a to-be-matched word segmentation result containing multiple words; a third answer text module for segmenting the at least one preferred preset question text that matches the question to be matched to obtain preferred word segmentation results, each containing multiple words; a fourth answer text module for calculating, according to the to-be-matched word segmentation result and the preferred word segmentation results, the text matching value between the question text to be matched and each preferred preset question text that matches the question to be matched; and a fifth answer text module for obtaining the target preset answer text that matches the question text to be matched according to the text matching value between the question text to be matched and each preferred preset question text that matches the question to be matched.

In one of the embodiments, the device 500 further includes: a training module for training the question matching model; the training module includes: a first training module for acquiring a preset question training text set containing multiple preset question training texts; a second training module for obtaining multiple preset question training texts of different similarity levels corresponding to each of the preset question training texts; a third training module for combining each preset question training text with the multiple preset question training texts of different similarity levels corresponding to it to obtain multiple input training texts; and a fourth training module for using the multiple input training texts as the input of the question matching model and the similarity labels between the preset question training texts and the corresponding preset question training texts of different similarity levels as the desired output, so as to train the question matching model and obtain a trained question matching model.
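
How the first three training modules might assemble the training data can be sketched as follows. The representation is an assumption: similarity levels are encoded as integers (a larger number meaning more similar), the sample questions are invented, and the model fitting done by the fourth training module is not shown because the text here does not describe the model.

```python
from typing import Dict, List, Tuple

# Base question -> list of (related question, similarity level).
TrainingSet = Dict[str, List[Tuple[str, int]]]


def build_training_examples(
    training_set: TrainingSet,
) -> Tuple[List[Tuple[str, str]], List[int]]:
    """Pair each base question with its related questions; labels are the levels."""
    inputs: List[Tuple[str, str]] = []
    labels: List[int] = []
    for base, related in training_set.items():
        for other, level in related:
            inputs.append((base, other))  # one input training text
            labels.append(level)          # its desired output
    return inputs, labels


# Hypothetical training set with three similarity levels.
training_set = {
    "what colors does the robot have": [
        ("which colors is the robot available in", 2),
        ("does the robot come in gold", 1),
        ("how do I charge the robot", 0),
    ],
}
inputs, labels = build_training_examples(training_set)
```

The two returned lists are aligned by index, so `inputs[i]` is the model input whose desired output is `labels[i]`.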

FIG. 6 shows an internal structure diagram of a computer device in an embodiment. The computer device may be a server or a robot. As shown in FIG. 6, the computer device includes a processor, a memory, and a network interface connected by a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program. When the computer program is executed by the processor, the processor is caused to implement the question text matching method. A computer program may also be stored in the internal memory; when that computer program is executed by the processor, the processor is caused to execute the question text matching method. Those skilled in the art can understand that the structure shown in FIG. 6 is only a block diagram of the part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution of the present application is applied. A specific computer device may include more or fewer components than shown in the figure, combine some components, or have a different component arrangement.

In one embodiment, the question text matching method provided in this application may be implemented in the form of a computer program, and the computer program may run on the computer device shown in FIG. 6. The program modules constituting the question text matching apparatus 500, for example the obtaining module 502, the combination module 504, the label module 506, and the matching module 508, may be stored in the memory of the computer device.

A computer device includes a memory and a processor. The memory stores a computer program. When the computer program is executed by the processor, the processor is caused to perform the following steps: obtaining the question text to be matched; combining the question text to be matched with each preset question text in the question text library, respectively, to obtain multiple input question texts; inputting the multiple input question texts into a question matching model to obtain a similarity label between the question text to be matched and each of the preset question texts; and obtaining, according to the similarity labels, the target question text with the highest similarity to the question text to be matched.

In one embodiment, a computer-readable storage medium is proposed, which stores a computer program. When the computer program is executed by a processor, the processor is caused to perform the following steps: obtaining the question text to be matched; combining the question text to be matched with each preset question text in the question text library, respectively, to obtain multiple input question texts; inputting the multiple input question texts into a question matching model to obtain a similarity label between the question text to be matched and each of the preset question texts; and obtaining, according to the similarity labels, the target question text with the highest similarity to the question text to be matched.

It should be noted that the above question text matching method, question text matching device, computer device, and computer-readable storage medium belong to one general inventive concept. The content of the embodiments of the question text matching method, the question text matching device, the computer device, and the computer-readable storage medium may be applied to one another.

A person of ordinary skill in the art may understand that all or part of the processes in the methods of the foregoing embodiments may be completed by instructing relevant hardware through a computer program. The program may be stored in a non-volatile computer-readable storage medium, and when the program is executed, it may include the flows of the above method embodiments. Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. To keep the description concise, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in a combination of these technical features, the combination should be considered to be within the scope described in this specification.

The above-mentioned embodiment only expresses several implementation manners of the present application, and its description is more specific and detailed, but it cannot be understood as a limitation of the patent scope of the present application. It should be noted that, for those of ordinary skill in the art, without departing from the concept of the present application, a number of modifications and improvements can also be made, which all fall within the protection scope of the present application. Therefore, the protection scope of the patent of this application shall be subject to the appended claims.

Claims

1. A method for matching question text, comprising:

obtaining a question text to be matched;

combining the question text to be matched with each preset question text in a question text library, respectively, to obtain multiple input question texts;

inputting the multiple input question texts into a question matching model to obtain a similarity label between the question text to be matched and each of the preset question texts; and

obtaining, according to the similarity labels, the target question text with the highest similarity to the question text to be matched.
2. The method of claim 1, wherein the question text library includes a plurality of question text sub-libraries; before the obtaining of the question text to be matched, the method further comprises:

obtaining a product category label;

and the combining of the question text to be matched with each preset question text in the question text library to obtain multiple input question texts includes:

determining a target question text sub-library according to the product category label, and obtaining a plurality of preset question texts in the target question text sub-library; and

combining the question text to be matched with the plurality of preset question texts in the target question text sub-library, respectively, to obtain multiple input question texts.
3. The method according to claim 2, wherein the method further comprises:

acquiring a preset answer text corresponding to each of the preset question texts in the target question text sub-library;

combining the question text to be matched with the preset answer text of each of the preset question texts in the target question text sub-library, respectively, to obtain multiple input question-answer texts; and

inputting the multiple input question-answer texts into a question-answer matching model to obtain a matching value between the question text to be matched and the preset answer text of each of the preset question texts in the target question text sub-library;

and the obtaining of the target question text with the highest similarity to the question text to be matched according to the similarity labels includes:

obtaining, according to the similarity labels and the matching values, a target preset answer text that matches the question text to be matched.
4. The method according to claim 3, wherein the preset question text and the preset answer text corresponding to the preset question text have the same text identifier;

and the obtaining of the target preset answer text that matches the question text to be matched according to the similarity labels and the matching values includes:

selecting, from the plurality of preset question texts and according to the similarity label between the question text to be matched and each of the preset question texts in the target question text sub-library, a preset number of preferred preset question texts with the highest similarity to the question text to be matched;

selecting, from the preset answer texts of the plurality of preset question texts and according to the matching value between the question text to be matched and the preset answer text of each of the preset question texts in the target question text sub-library, the preset number of preferred preset answer texts that match the question text to be matched; and

obtaining, according to the text identifier of each of the preferred preset question texts and the text identifier of each of the preferred preset answer texts, a target preset answer text that matches the question text to be matched.
5. The method according to claim 4, characterized in that the obtaining, according to the text identifier of each of the preferred preset question texts and the text identifier of each of the preferred preset answer texts, of the target preset answer text that matches the question text to be matched includes:

acquiring at least one preferred preset question text that matches the question to be matched according to the text identifier of each preferred preset question text and the text identifier of each preferred preset answer text;

segmenting the question text to be matched into words to obtain a to-be-matched word segmentation result that contains multiple words;

segmenting the at least one preferred preset question text that matches the question to be matched to obtain preferred word segmentation results, each containing multiple words;

calculating, according to the to-be-matched word segmentation result and the preferred word segmentation results, the text matching value between the question text to be matched and each preferred preset question text that matches the question to be matched; and

obtaining, according to the text matching value between the question text to be matched and each preferred preset question text that matches the question to be matched, a target preset answer text that matches the question text to be matched.
6. The method according to any one of claims 1 to 5, characterized in that, before the obtaining of the question text to be matched, the method further comprises training the question matching model, the training including the following steps:

obtaining a preset question training text set that includes multiple preset question training texts;

acquiring a plurality of preset question training texts of different similarity levels corresponding to each of the preset question training texts;

combining each preset question training text with the plurality of preset question training texts of different similarity levels corresponding to it to obtain multiple input training texts; and

using the multiple input training texts as the input of the question matching model, and the similarity labels between the preset question training texts and the corresponding preset question training texts of different similarity levels as the desired output, training the question matching model to obtain a trained question matching model.
7. A question text matching device, characterized in that it includes:

an acquisition module, configured to obtain a question text to be matched;

a combination module, configured to combine the question text to be matched with each preset question text in a question text library, respectively, to obtain multiple input question texts;

a label module, configured to input the multiple input question texts into a question matching model to obtain a similarity label between the question text to be matched and each of the preset question texts; and

a matching module, configured to obtain, according to the similarity labels, the target question text with the highest similarity to the question text to be matched.
8. The apparatus of claim 7, further comprising:

a product label acquisition module, configured to acquire a product category label;

correspondingly, the combination module includes:

a first combination module, configured to determine a target question text sub-library according to the product category label, and obtain multiple preset question texts in the target question text sub-library; and

a second combination module, configured to combine the question text to be matched with the multiple preset question texts in the target question text sub-library, respectively, to obtain multiple input question texts.
9. A computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, characterized in that, when the processor executes the computer program, the steps of the method for matching question text according to any one of claims 1 to 6 are implemented.
10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the steps of the method for matching question text according to any one of claims 1 to 6 are implemented.