WO2020133360A1 - Question text matching method and apparatus, computer device and storage medium - Google Patents

Publication number
WO2020133360A1
Authority
WO
WIPO (PCT)
Prior art keywords
question
text
preset
texts
matched
Application number
PCT/CN2018/125360
Other languages
French (fr)
Chinese (zh)
Inventor
熊友军
熊为星
廖洪涛
Original Assignee
深圳市优必选科技有限公司
Application filed by 深圳市优必选科技有限公司
Priority to PCT/CN2018/125360
Publication of WO2020133360A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data

Definitions

  • the invention relates to the technical field of customer service robots, in particular to a question sentence matching method, device, computer equipment and storage medium.
  • the customer service robot is mainly responsible for the after-sales service of a product. It has functions such as group message sending, transfer to manual service, call recording, interruption support, and conversion of recordings to text. Because the customer service robot can answer customers' questions on its own, it considerably reduces the workload of customer service staff. Usually, the customer service robot matches the customer's question against each question in the question library, finds the question closest to the customer's question, and finally pushes the answer to that question to the customer.
  • in the question-and-answer matching of customer service robots, a supervised learning model is usually selected. Such a model requires the entities and non-entities in the customer's question to be labeled in order to calculate the similarity between questions, and the answer to the matched question with the maximum similarity is pushed to the customer.
  • this method requires professional personnel to label entities and non-entities, which consumes manpower and is inefficient; moreover, labeling errors that depend on the skill of the labeling personnel can lower the accuracy of the final question matching.
  • a method for matching question text includes:
  • the target question text with the highest similarity to the question text to be matched is obtained.
  • a matching device for question text including:
  • the acquisition module is used to obtain the question text to be matched
  • a combination module configured to combine the question text to be matched and each preset question text in the question text library to obtain multiple input question texts;
  • a label module configured to input a plurality of the input question texts into a question matching model to obtain a similarity label of the question text to be matched and each of the preset question texts;
  • the matching module is configured to obtain the target question text with the highest similarity to the question text to be matched according to the similarity label.
  • a computer device includes a memory and a processor.
  • the memory stores a computer program.
  • the processor is caused to perform the following steps:
  • Input a plurality of the input question texts into a question matching model to obtain a similarity label between the question text to be matched and each of the preset question texts;
  • the target question text with the highest similarity to the question text to be matched is obtained.
  • a computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the processor is caused to perform the following steps:
  • the target question text with the highest similarity to the question text to be matched is obtained.
  • the present invention proposes a question sentence text matching method, device, computer equipment and storage medium.
  • it is no longer necessary to manually tag entity keywords, which saves a great deal of tagging time, and it is no longer necessary to find professional labeling personnel to label the entities and non-entities in the question text, which also reduces cost.
  • the similarity label between questions is obtained, so that the target question text can be obtained according to the similarity label without distinguishing entities from non-entities in advance. The accuracy of question matching is also improved: the entity labeling workload is large, repetitive labeling work is likely to introduce errors, and a model trained on such labels cannot accurately predict entities.
  • by contrast, judging the similarity of the overall meaning of two sentences is less error-prone, so training the model on sentence pairs (that is, two sentences) yields higher final prediction accuracy.
  • FIG. 1 is a schematic diagram of an implementation process of a method for matching question text in an embodiment
  • FIG. 2 is a schematic diagram of an implementation process of step 101 in an embodiment
  • FIG. 3 is a schematic diagram of an implementation process of a method for matching question text in an embodiment
  • FIG. 4 is a schematic diagram of an implementation process of a method for matching question text in an embodiment
  • FIG. 5 is a structural block diagram of an apparatus for matching question text in an embodiment
  • FIG. 6 is a structural block diagram of a computer device in an embodiment.
  • a question text matching method is provided.
  • the execution body of the question text matching method described in the embodiment of the present invention may be a server; it may of course also be another terminal device, for example, a robot device.
  • the matching method of the question text specifically includes the following steps:
  • Step S102 Obtain the question text to be matched.
  • the question text to be matched is the question text used for matching. After obtaining the original question text to be matched, the stop words in the original question text to be matched need to be removed.
  • Step S104 combining the question text to be matched and each preset question text in the question text library to obtain multiple input question texts.
  • the question text library includes a plurality of preset question texts, that is, question texts set in advance.
  • for example, the question text to be matched is "how big is Goku", and there are two preset question texts in the question text library: "how high is Goku" and "how much is Goku". Combining the question text to be matched with each preset question text gives two input question texts: [how big is Goku, how high is Goku] and [how big is Goku, how much is Goku].
  • Step S106 Input a plurality of the input question texts into a question matching model to obtain a similarity label between the question text to be matched and each of the preset question texts.
  • the similarity label is used to reflect the similarity between the question text to be matched and the preset question text.
  • the similarity label may be set to a number. In the above example, suppose the number 1 indicates that the question text to be matched is very similar to the preset question text, and the number 0 indicates that it is not similar. After prediction by the question matching model, the similarity label of the question text to be matched "How big is Goku" and the preset question text "How high is Goku" will be 1, and the similarity label of the question text to be matched "How big is Goku" and the preset question text "How much is Goku?" will be 0.
  • Step 108 Acquire the target question text with the highest similarity to the question text to be matched according to the similarity label.
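Steps S102 to S108 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and `toy_predict_label` is a hand-written stand-in for the trained question matching model, which the text does not specify at code level.

```python
from typing import Callable, List, Tuple


def build_input_pairs(query: str, preset_questions: List[str]) -> List[Tuple[str, str]]:
    """Pair the question text to be matched with every preset question text."""
    return [(query, preset) for preset in preset_questions]


def match_question(
    query: str,
    preset_questions: List[str],
    predict_label: Callable[[str, str], int],
) -> Tuple[str, int]:
    """Return the preset question with the highest similarity label and that label."""
    if not preset_questions:
        raise ValueError("the question text library is empty")
    pairs = build_input_pairs(query, preset_questions)
    labels = [predict_label(q, p) for q, p in pairs]
    # On ties, max() keeps the first preset question with the highest label.
    best = max(range(len(pairs)), key=lambda i: labels[i])
    return preset_questions[best], labels[best]


def toy_predict_label(query: str, preset: str) -> int:
    """Hypothetical stand-in for the trained model: 1 = similar, 0 = not similar."""
    return 1 if preset == "how high is Goku" else 0


presets = ["how high is Goku", "how much is Goku"]
target, label = match_question("how big is Goku", presets, toy_predict_label)
print(target, label)
```

Passing the model in as a callable keeps the pairing and selection logic independent of how the similarity label is predicted.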
  • the method further includes: acquiring the target answer text corresponding to the target question text .
  • the target answer text is the answer to the target question text.
  • the question text library is provided with preset question texts.
  • the preset answer texts of the preset question texts can also be set in the question text library, or a question answer library can be set separately.
  • a preset question text and its preset answer text are set with the same identifier, so that as long as the preset question text is known, its answer can be known.
  • after the target answer text corresponding to the target question text is obtained, the answer to the question asked by the user can be presented to the user directly.
  • before obtaining the question text to be matched in step 102, the method further includes:
  • Step 101 Train the question matching model.
  • training the question matching model in step 101 includes: step 101A, obtaining a preset question training text set including a plurality of preset question training texts.
  • step 101B Acquire multiple preset question training texts of different similar levels corresponding to each of the preset question training texts.
  • a certain preset question training text is used as the main question, and the similarity level of each other preset question training text is determined according to its similarity to the main question. For example, "How to operate a building robot" and "What is the convenient operation of a building robot?" are similar, so their similarity level can be set higher, while "How to operate a building robot" and "How much does the robot cost" are not very similar, so their similarity level can be set lower.
  • Step 101C Combine the preset question training text with a plurality of preset question training texts of different similar levels corresponding to the preset question training text to obtain multiple input training texts.
  • each training sample forms a triple [main question, other preset question training text, similarity label]; the pair [main question, other preset question training text] in the triple is taken as the input, and the similarity label is used as the desired output.
  • the number of similarity levels can be determined according to actual needs, and no specific limitation is made here.
  • Step 101D: use the plurality of input training texts as the input of the question matching model, and use the similarity labels of the preset question training texts and the corresponding preset question training texts of different similarity levels as the desired output, to train the question matching model and obtain a trained question matching model.
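The training data described in steps 101A to 101D can be sketched as below. The triple layout follows the [main question, other preset question training text, similarity label] form used in the text; the label values 4 and 0 and the name `to_training_examples` are assumptions made for illustration.

```python
from typing import List, Tuple

# (main question, other preset question training text, similarity label)
Triple = Tuple[str, str, int]


def to_training_examples(triples: List[Triple]) -> Tuple[List[Tuple[str, str]], List[int]]:
    """Split triples into model inputs (sentence pairs) and desired outputs (labels)."""
    inputs = [(main, other) for main, other, _ in triples]
    targets = [label for _, _, label in triples]
    return inputs, targets


# Assumed labels: a higher number means a higher similarity level.
triples: List[Triple] = [
    ("How to operate a building robot",
     "What is the convenient operation of a building robot?", 4),
    ("How to operate a building robot",
     "How much does the robot cost", 0),
]

inputs, targets = to_training_examples(triples)
print(inputs[0], targets[0])
```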
  • since a machine cannot recognize a sentence directly, the question text must be segmented into words, which are then converted into word vectors as the input of the model; a word vector expresses a word as a vector.
  • for example, the question text is "how does the building block robot operate", and word segmentation gives: building blocks, robots, how to operate. The word vectors of these words are then obtained, and the input is organized in word vector form for model training. First, the word vector matrices are cross-multiplied, and the first K values after the cross-multiplication are selected (Equation 1). Then a simple mapping is applied to the word vectors of the question text to be matched (Equation 2).
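The cross-multiplication and top-K selection can be illustrated as follows. This sketch rests on two assumptions that the text does not confirm: that "cross-multiplied" means taking the dot product of every word vector of one text with every word vector of the other, and that the first K values are chosen per word of the first text. The toy vectors are made up.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def interaction_matrix(q1: List[Vector], q2: List[Vector]) -> List[List[float]]:
    """m x n matrix whose (i, j) entry is the dot product of x_i and y_j."""
    return [[dot(x, y) for y in q2] for x in q1]


def top_k_per_row(matrix: List[List[float]], k: int) -> List[List[float]]:
    """Keep the k largest interaction values for each word of the first text."""
    return [sorted(row, reverse=True)[:k] for row in matrix]


# Toy 2-dimensional word vectors: two words in q1, three words in q2.
q1 = [[1.0, 0.0], [0.0, 1.0]]
q2 = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]

features = top_k_per_row(interaction_matrix(q1, q2), k=2)
print(features)  # [[1.0, 0.5], [2.0, 0.5]]
```

Keeping a fixed K per row gives every question a feature matrix of the same width regardless of the length of the other text.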
  • $q_1 = (x_1, x_2, x_3, \ldots, x_m)$ is the word vector of the question text to be matched
  • $q_2 = (y_1, y_2, y_3, \ldots, y_n)$ is the word vector of the preset question training text, so there are:
  • $m$ is the length of the question text to be matched after word segmentation
  • $n$ is the length of the preset question training text after word segmentation
  • $x_i$ is the word vector corresponding to the i-th word of the question text to be matched after word segmentation
  • $y_i$ is the word vector corresponding to the i-th word of the preset question training text after word segmentation
  • the function $f$ selects the first $K$ values after the cross product
  • $w_p$ is the weight parameter of the mapping
  • $b_p$ is the offset parameter of the mapping
  • $H = [h_1, h_2, \ldots, h_m]$
  • $h_i$ is the mapped value corresponding to the i-th word of the question text to be matched
  • relu is the ReLU activation function
  • $W^{(l)}$ is the weight matrix of layer $l$
  • $b^{(l)}$ is the bias matrix of layer $l$
  • $L$ is the total number of layers of the neural network
  • $O = [o_1, \ldots, o_C]$
  • $C$ is the number of similarity levels (that is, how many similarity levels are defined; each similarity level corresponds to one similarity label)
  • $o_i$ is the output value of the i-th level label
  • $e$ is a constant, $e \approx 2.71828$
  • $M$ is the total number of training samples
  • $t_{gj}$ is the true similarity label of the j-th similarity level for the g-th training sample.
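The symbols defined above are consistent with a feed-forward network whose output is normalized by softmax and trained with a cross-entropy loss. The equations themselves are not reproduced in this text, so the following is a sketch under that assumption rather than the patent's exact formulas; $z^{(l)}$ (the output of layer $l$) and $p_{gj}$ (the predicted probability of level $j$ for sample $g$) are symbols introduced here for illustration.

```latex
% Assumed hidden layers, l = 1, ..., L
z^{(l)} = \mathrm{relu}\left(W^{(l)} z^{(l-1)} + b^{(l)}\right)

% Assumed softmax over the C similarity levels
p_j = \frac{e^{o_j}}{\sum_{i=1}^{C} e^{o_i}}, \qquad j = 1, \ldots, C

% Assumed cross-entropy loss over the M training samples
\mathrm{loss} = -\frac{1}{M} \sum_{g=1}^{M} \sum_{j=1}^{C} t_{gj} \log p_{gj}
```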
  • a method for matching question text which specifically includes:
  • Step 302 Obtain the product category label.
  • the product category label is used to indicate different products and is composed of numbers and/or characters and/or letters.
  • for example, the product category label of "Goku Robot" may be set to: wukong
  • the product category label of "Alpha Robot" may be set to: alpha
  • the product category label of "jimu robot" may be set to: jimu.
  • Step 304 Obtain the question text to be matched.
  • Step 306 Determine a target question text sub-library according to the product category label, and obtain a plurality of preset question texts in the target question text sub-library.
  • the question text library is divided into multiple question text sub-libraries according to the product category label, and each question text sub-library stores the questions related to the corresponding robot product.
  • for example, the question text sub-library of "Goku Robot" contains questions about "Goku Robot"
  • the question text sub-library of "Alpha Robot" contains questions about "Alpha Robot".
  • step 308 the question text to be matched and the multiple preset question texts in the target question text sub-library are combined to obtain multiple input question texts.
  • Step 310 Input a plurality of the input question texts into a question matching model to obtain a similarity label between the question text to be matched and each of the preset question texts.
  • Step 312 Acquire the target question text with the highest similarity to the question text to be matched according to the similarity label.
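Step 306 amounts to a lookup keyed by the product category label. A minimal sketch follows; the labels wukong, alpha and jimu come from the text, while the dictionary layout, the sample questions and the function name are assumptions.

```python
from typing import Dict, List

# Hypothetical sub-libraries keyed by product category label.
SUB_LIBRARIES: Dict[str, List[str]] = {
    "wukong": ["how high is Goku", "how much is Goku"],
    "alpha": ["how to charge the Alpha robot"],
    "jimu": ["how does the building block robot operate"],
}


def preset_questions_for(label: str, sub_libraries: Dict[str, List[str]]) -> List[str]:
    """Return the preset question texts of the sub-library selected by the label."""
    if label not in sub_libraries:
        raise ValueError(f"unknown product category label: {label!r}")
    return sub_libraries[label]


print(preset_questions_for("jimu", SUB_LIBRARIES))
```

Restricting the candidates to one sub-library reduces the number of input question texts the matching model has to score per query.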
  • the matching method of the question text further includes:
  • Step 312 Acquire preset answer text corresponding to each of the preset question texts in the target question text sub-library.
  • the preset question texts are set in the question text sub-library.
  • the preset answer texts of the preset question texts can also be set in the question text sub-library, or a question answer sub-library can be set separately.
  • the question text sub-library is associated with the question answer sub-library, so that as long as the preset question text is known, its answer can be known according to the association relationship.
  • Step 314 Combine the question text to be matched and the preset answer text of each preset question text in the target question text sub-library to obtain multiple input question and answer texts.
  • for example, the question text to be matched is "how does the building block robot operate", and the preset answer texts of the preset question texts in the target question text sub-library are "the operation mode of the building block robot is as follows", "the operation process of the building block robot is as follows", "the building block robot can be used to sweep the floor", "the official model action is edited as follows" and "2000 blocks". Combining the question text to be matched with the preset answer text of each preset question text in the target question text sub-library gives multiple input question and answer texts: [How to operate the building block robot, the operation mode of the building block robot is as follows], [How to operate the building block robot, the operation process of the building block robot is as follows], and so on.
  • Step 316: input the plurality of input question and answer texts into a question and answer matching model to obtain a matching value between the question text to be matched and the preset answer text of each preset question text in the target question text sub-library.
  • the matching value is used to indicate the degree of matching between the question text to be matched and the preset answer text. The closer the answer matches the question, the higher the matching value.
  • the question answering matching model needs to be trained in advance.
  • the preset question training text is used as a question and each preset answer training text is used as an answer, so that binary groups (pairs), each including a question and an answer, are constructed.
  • the binary groups are the input of the question and answer matching model; at the same time, a restriction condition is set for the output, and model training is complete when the condition is met.
  • the restriction condition is set according to the value of the similarity label between the main question and the preset question training text: the matching value of the binary group with the largest similarity label must be greater than the matching values of the other binary groups.
  • the existing main question, other preset question training text, similarity label and preset answer training text [how to operate the building block robot, what is the convenient operation of the building block robot, 4, the convenient operation method of the building block robot is as follows] ,[How to operate the building block robot, the Bluetooth of the building block robot cannot be scanned, 3.
  • $q_1 = (x_1, x_2, x_3, \ldots, x_m)$ is the word vector of the preset question training text
  • $q_2 = (y_1, y_2, y_3, \ldots, y_n)$ is the word vector of the preset answer training text; then there are:
  • $n$ is the length of the preset question training text after word segmentation
  • $x_i$ is the word vector corresponding to the i-th word of the preset question training text after word segmentation
  • $y_i$ is the word vector corresponding to the i-th word of the preset answer training text after word segmentation
  • the function $f$ selects the first $K$ values after the cross product
  • relu is the ReLU activation function
  • $W^{(l)}$ is the weight matrix of layer $l$
  • $b^{(l)}$ is the offset matrix of layer $l$
  • $L$ is the total number of layers of the neural network
  • $W_p$ is the weight matrix of the preset question training text
  • $b_p$ is the offset parameter of the preset question training text
  • $h$ is the output value of the preset question training text after mapping
  • Margin is set to 1, $s(q_1, q_2)$ and $s(q_1, q_3)$
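A margin of 1 applied to two scores $s(q_1, q_2)$ and $s(q_1, q_3)$ matches the restriction that the best pair must score higher than the others. One common way to express such a constraint is a pairwise hinge (margin ranking) loss. The text does not give the formula, so the hinge form below, and the reading of $q_3$ as an answer from a lower-ranked pair, are assumptions.

```python
def margin_ranking_loss(score_best: float, score_other: float, margin: float = 1.0) -> float:
    """Hinge loss that is zero once score_best exceeds score_other by at least margin."""
    return max(0.0, margin - score_best + score_other)


# s(q1, q2): matching value of the pair with the largest similarity label.
# s(q1, q3): matching value of another, lower-ranked pair.
print(margin_ranking_loss(2.5, 1.0))   # constraint satisfied, loss is 0.0
print(margin_ranking_loss(0.5, 0.25))  # constraint violated, loss is 0.75
```

Under this reading, training is complete when the loss is zero for every pair of binary groups, which is one way to interpret the condition being met.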
  • obtaining the target question text with the highest similarity to the question text to be matched according to the similarity label in step 312 includes:
  • Step 318 Acquire target preset answer text that matches the question text to be matched according to the similarity label and the matching value.
  • the preset question text and the preset answer text corresponding to it have the same text identifier. For example, first obtain the preset question text with the largest similarity label and the preset answer text with the largest matching value, and then check whether their text identifiers are the same. If they are the same, the preset answer text with the largest matching value is used as the target preset answer text. If they are not the same, either the preset answer text corresponding to the preset question text with the largest similarity label or the preset answer text with the largest matching value is used as the target preset answer text.
  • obtaining the target preset answer text that matches the question text to be matched according to the similarity label and the matching value in step 318 includes: step 318A, according to the similarity label of the question text to be matched and each of the preset question texts in the target question text sub-library, selecting from the plurality of preset question texts a preset number of preferred preset question texts with the highest similarity to the question text to be matched.
  • the preferred preset question texts are the preset question texts that the model predicts to be most similar among the multiple preset question texts.
  • for example, sorting the similarity labels gives 4, 4, 3, 3, 2, 2, 1, 0, 0, 0, from which the preferred preset question texts with similarity labels 4, 4 and 3 can be selected.
  • Step 318B: according to the matching value of the question text to be matched and the preset answer text of each preset question text in the target question text sub-library, select from the preset answer texts of the multiple preset question texts a preset number of preferred preset answer texts that match the question text to be matched.
  • Step 318B selects the preferred preset answer texts in the same way as step 318A selects the preferred preset question texts, and this will not be described in detail here.
  • Step 318C Acquire target preset answer text that matches the question text to be matched according to the text identifier of each preferred preset question text and the text identifier of each preferred preset answer text. Assuming that three preferred preset question texts and three preferred preset answer texts are selected, then the target preset answer text is selected from the three preferred preset answer texts according to the text identifier.
  • in step 318C, obtaining the target preset answer text that matches the question text to be matched according to the text identifier of each preferred preset question text and the text identifier of each preferred preset answer text includes: Step 318C1, according to the text identifier of each preferred preset question text and the text identifier of each preferred preset answer text, obtaining at least one preferred preset question text that matches the question to be matched.
  • the at least one preferred preset question text is mainly obtained by taking the intersection of the text identifiers.
  • for example, the text identifiers of the three preferred preset question texts are jimu10, jimu11, and jimu15, and the text identifiers of the three preferred preset answer texts are jimu10, jimu11, and jimu17; the preset question texts corresponding to the shared text identifiers jimu10 and jimu11 are then determined to be the preferred preset question texts that match the question to be matched.
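Steps 318A, 318B and 318C1 can be sketched as a top-N selection followed by an intersection of text identifiers. The identifiers are the ones from the example; the scores and the function names are hypothetical.

```python
from typing import Dict, List


def top_n_ids(scores: Dict[str, float], n: int) -> List[str]:
    """Identifiers of the n highest-scoring texts (ties keep insertion order)."""
    return sorted(scores, key=lambda text_id: scores[text_id], reverse=True)[:n]


def shared_ids(question_ids: List[str], answer_ids: List[str]) -> List[str]:
    """Identifiers present in both lists, in the order of question_ids."""
    answer_set = set(answer_ids)
    return [text_id for text_id in question_ids if text_id in answer_set]


# Hypothetical similarity labels (questions) and matching values (answers).
similarity_labels = {"jimu10": 4, "jimu11": 4, "jimu15": 3, "jimu17": 2}
matching_values = {"jimu10": 0.9, "jimu11": 0.8, "jimu17": 0.7, "jimu15": 0.1}

preferred_questions = top_n_ids(similarity_labels, 3)  # jimu10, jimu11, jimu15
preferred_answers = top_n_ids(matching_values, 3)      # jimu10, jimu11, jimu17
print(shared_ids(preferred_questions, preferred_answers))  # ['jimu10', 'jimu11']
```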
  • Step 318C2: segment the question text to be matched to obtain a to-be-matched word segmentation result that includes multiple words.
  • Step 318C3 Segment at least one preferred preset question text that matches the question to be matched to obtain a plurality of preferred word segmentation results including multiple words.
  • two preferred preset question texts “What is the size of Wukong” and "What colors does Wukong have?”
  • the corresponding two preferred word segmentation results are: [Wu, Kong, De, Chi, Chi, Yes, Yes, More, less] and [Enlightenment, empty, have, which, some, face, color].
  • step 318C4 according to the word segmentation result to be matched and the preferred word segmentation result, a text matching value of the question text to be matched and each preferred preset question text matching the question to be matched is calculated.
  • first count the total number of distinct words across the to-be-matched word segmentation result and the preferred word segmentation result, then count the number of words the two results have in common, and finally divide the number of shared words by the total number of distinct words.
  • this gives the text matching value of the question text to be matched and each preferred preset question text that matches the question to be matched.
  • for example, the total number of distinct words in "Does Wukong have a golden color" and "What is the size of Wukong" is 12 and the number of shared words is 3, so the text matching value is 3/12; the total number of distinct words in "Does Wukong have a golden color" and "What colors does Wukong have?" is 11 and the number of shared words is 4, so the text matching value is 4/11.
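The ratio described above (shared words divided by distinct words across both segmentation results) is the Jaccard index of the two token sets. A minimal sketch follows; the English tokens are hypothetical and do not reproduce the segmentation used in the example.

```python
from typing import Iterable


def text_matching_value(query_tokens: Iterable[str], candidate_tokens: Iterable[str]) -> float:
    """Number of shared distinct tokens divided by number of distinct tokens overall."""
    query_set, candidate_set = set(query_tokens), set(candidate_tokens)
    union = query_set | candidate_set
    if not union:
        return 0.0  # two empty segmentation results share nothing
    return len(query_set & candidate_set) / len(union)


# Hypothetical tokenizations for illustration only.
query = ["wukong", "has", "gold", "color"]
candidate = ["wukong", "size"]
print(text_matching_value(query, candidate))  # 1 shared / 5 distinct = 0.2
```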
  • Step 318C5 Obtain the target preset answer text that matches the question text to be matched according to the question text to be matched and the text matching value of each preferred preset question text that matches the question to be matched .
  • for example, if the target question text is "Wukong's size", the target preset answer text that matches the question text to be matched is obtained according to the text identifier of the target question text: 1 meter.
  • an apparatus 500 for matching question text which specifically includes:
  • the obtaining module 502 is used to obtain the question text to be matched; the combination module 504 is used to combine the question text to be matched with each preset question text in the question text library to obtain multiple input question texts; the label module 506 is used to input the plurality of input question texts into a question matching model to obtain a similarity label of the question text to be matched and each of the preset question texts; the matching module 508 is used to obtain the target question text with the highest similarity to the question text to be matched according to the similarity label.
  • the apparatus 500 further includes: a product label acquisition module for acquiring a product category label; correspondingly, the combination module 504 includes: a first combination module for determining the target question text sub-library according to the product category label and obtaining a plurality of preset question texts in the target question text sub-library; and a second combination module for combining the question text to be matched with each of the multiple preset question texts in the target question text sub-library to obtain multiple input question texts.
  • the device 500 further includes: an answer text acquisition module, configured to acquire the preset answer text corresponding to each of the preset question texts in the target question text sub-library; a module for combining the question text to be matched with the preset answer text of each preset question text in the target question text sub-library to obtain multiple input question and answer texts; a matching value acquisition module for inputting the plurality of input question and answer texts into a question and answer matching model to obtain a matching value between the question text to be matched and the preset answer text of each of the preset question texts in the target question text sub-library; correspondingly, the matching module 508 includes: a target answer matching module, configured to obtain the target preset answer text that matches the question text to be matched according to the similarity label and the matching value.
  • the preset question text and the preset answer text corresponding to the preset question text have the same text identifier;
  • the target answer matching module includes: a preferred question module for selecting, from the plurality of preset question texts and according to the similarity label of the question text to be matched and each of the preset question texts in the target question text sub-library, a preset number of preferred preset question texts with the highest similarity to the question text to be matched; a preferred answer module for selecting, from the preset answer texts of the plurality of preset question texts and according to the matching value of the question text to be matched and the preset answer text of each preset question text in the target question text sub-library, a preset number of preferred preset answer texts that match the question text to be matched; and a target preset answer text module, used to obtain the target preset answer text that matches the question text to be matched according to the text identifier of each of the preferred preset question texts and the text identifier of each of the preferred preset answer texts.
  • the target preset answer text module includes: a first answer text module for obtaining, according to the text identifier of each preferred preset question text and the text identifier of each preferred preset answer text, at least one preferred preset question text that matches the question to be matched; a second answer text module for segmenting the question text to be matched to obtain a to-be-matched word segmentation result containing multiple words; a third answer text module for segmenting the at least one preferred preset question text that matches the question to be matched to obtain multiple preferred word segmentation results, each containing multiple words; a fourth answer text module for calculating, according to the to-be-matched word segmentation result and the preferred word segmentation results, the text matching value of the question text to be matched and each preferred preset question text that matches the question to be matched; and a fifth answer text module for obtaining the target preset answer text that matches the question text to be matched according to those text matching values.
  • the device 500 further includes: a training module for training the question matching model; the training module includes: a first training module for acquiring a preset question training text set including multiple preset question training texts; a second training module for obtaining a plurality of preset question training texts of different similarity levels corresponding to each of the preset question training texts; a third training module for combining each preset question training text with the plurality of preset question training texts of different similarity levels corresponding to it to obtain multiple input training texts; and a fourth training module for using the plurality of input training texts as the input of the question matching model, and the similarity labels of the preset question training texts and the corresponding preset question training texts of different similarity levels as the desired output, to train the question matching model and obtain a trained question matching model.
  • FIG. 6 shows an internal structure diagram of a computer device in an embodiment.
  • the computer device may be a server or a robot.
  • the computer device includes a processor, a memory, and a network interface connected by a system bus.
  • the memory includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium of the computer device stores an operating system, and may also store a computer program.
  • when the computer program is executed by the processor, it may cause the processor to implement a question text matching method.
  • a computer program may also be stored in the internal memory.
  • when the computer program in the internal memory is executed by the processor, it may cause the processor to execute a question text matching method.
  • FIG. 6 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer equipment to which the solution of the present application is applied.
  • the specific computer equipment may include more or fewer components than shown in the figure, may combine some components, or may have a different component arrangement.
  • the question text matching method provided in this application may be implemented in the form of a computer program, and the computer program may run on the computer device shown in FIG. 6.
  • the program templates constituting the question text matching device 500 can be stored in the memory of the computer device.
  • a computer device includes a memory and a processor.
  • the memory stores a computer program.
  • when the computer program is executed by the processor, the processor is caused to perform the following steps: obtaining the question text to be matched; combining the question text to be matched with each preset question text in the question text library to obtain multiple input question texts; inputting the multiple input question texts into a question matching model to obtain a similarity label between the question text to be matched and each of the preset question texts; and obtaining, according to the similarity labels, the target question text with the highest similarity to the question text to be matched.
  • a computer-readable storage medium which stores a computer program; when the computer program is executed by a processor, the processor is caused to perform the following steps: obtaining the question text to be matched; combining the question text to be matched with each preset question text in the question text library to obtain multiple input question texts; inputting the multiple input question texts into the question matching model to obtain a similarity label between the question text to be matched and each of the preset question texts; and obtaining, according to the similarity labels, the target question text with the highest similarity to the question text to be matched.
  • the question text matching method, the question text matching device, the computer device and the computer-readable storage medium belong to one general inventive concept.
  • the content of the embodiments of the question text matching method, the question text matching device, the computer device and the computer-readable storage medium may be applied to one another.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM random access memory
  • DRAM dynamic RAM
  • SDRAM synchronous DRAM
  • DDRSDRAM double data rate SDRAM
  • ESDRAM enhanced SDRAM
  • SLDRAM synchronous link (Synchlink) DRAM
  • RDRAM Rambus direct RAM
  • DRDRAM direct Rambus dynamic RAM
  • RDRAM Rambus dynamic RAM

Abstract

A question text matching method and apparatus, a computer device and a storage medium, the method comprising: obtaining a question text to be matched (S102); combining the question text to be matched with each preset question text in a question text library respectively to obtain a plurality of input question texts (S104); inputting the plurality of input question texts into a question matching model to obtain similarity labels between the question text to be matched and each preset question text (S106); and obtaining a target question text having the highest similarity to the question text to be matched according to the similarity labels (S108). By means of the described method, the question matching accuracy can be improved to a certain extent.

Description

Question text matching method and apparatus, computer device and storage medium
Technical field
The invention relates to the technical field of customer service robots, and in particular to a question text matching method and apparatus, a computer device and a storage medium.
Background
A customer service robot is mainly responsible for after-sales service of a product and offers functions such as bulk text messaging, transfer to a human agent, call recording, interruption support and conversion of recordings to text. Because it can help customers resolve questions on their own, it takes a substantial share of the workload off customer service staff. Typically, the customer service robot matches the customer's question against each question in a question library, finds the question closest to the customer's, and pushes the answer to that question to the customer.
In the question-and-answer matching of customer service robots, a supervised learning model is usually chosen. Such a model requires the entities and non-entities in the customer's question to be labeled so that the similarity between questions can be calculated, and the answer to the matching question with the greatest similarity is pushed to the customer. However, this approach requires professional personnel to label entities and non-entities. It is labor-intensive and inefficient, and labeling errors caused by the skill level of the labeling personnel can lower the accuracy of the finally matched question.
Summary of the invention
In view of the above problems, it is necessary to provide a question text matching method and apparatus, a computer device and a storage medium with high accuracy.
A question text matching method, the method including:
obtaining the question text to be matched;
combining the question text to be matched with each preset question text in a question text library to obtain multiple input question texts;
inputting the multiple input question texts into a question matching model to obtain a similarity label between the question text to be matched and each of the preset question texts;
obtaining, according to the similarity labels, the target question text with the highest similarity to the question text to be matched.
A question text matching apparatus is provided, including:
an acquisition module, configured to obtain the question text to be matched;
a combination module, configured to combine the question text to be matched with each preset question text in a question text library to obtain multiple input question texts;
a label module, configured to input the multiple input question texts into a question matching model to obtain a similarity label between the question text to be matched and each of the preset question texts;
a matching module, configured to obtain, according to the similarity labels, the target question text with the highest similarity to the question text to be matched.
A computer device, including a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps:
obtaining the question text to be matched;
combining the question text to be matched with each preset question text in a question text library to obtain multiple input question texts;
inputting the multiple input question texts into a question matching model to obtain a similarity label between the question text to be matched and each of the preset question texts;
obtaining, according to the similarity labels, the target question text with the highest similarity to the question text to be matched.
A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps:
obtaining the question text to be matched;
combining the question text to be matched with each preset question text in a question text library to obtain multiple input question texts;
inputting the multiple input question texts into a question matching model to obtain a similarity label between the question text to be matched and each of the preset question texts;
obtaining, according to the similarity labels, the target question text with the highest similarity to the question text to be matched.
Implementing the embodiments of the present invention has the following beneficial effects:
The present invention proposes a question text matching method and apparatus, a computer device and a storage medium. With the approach described in the embodiments, entity keywords no longer need to be labeled manually, which saves a large amount of labeling time. Professional labeling personnel are no longer needed to label the entities and non-entities in the question text, which reduces cost. Finally, the questions only need to be combined to obtain a similarity label between question and question, and the target question text is obtained from the similarity labels without first distinguishing entities from non-entities, which also improves the accuracy of question matching. The reason is that entity labeling is a heavy, repetitive workload that is likely to introduce errors, so a model trained on it may not predict entities accurately. Judging the similarity between questions, by contrast, means judging how similar the overall meanings of two sentences are, which is less error-prone. Training the model on sentence pairs (that is, two sentences) therefore yields higher final prediction accuracy.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
In the drawings:
FIG. 1 is a schematic flowchart of a question text matching method in an embodiment;
FIG. 2 is a schematic flowchart of step 101 in an embodiment;
FIG. 3 is a schematic flowchart of a question text matching method in an embodiment;
FIG. 4 is a schematic flowchart of a question text matching method in an embodiment;
FIG. 5 is a structural block diagram of a question text matching apparatus in an embodiment;
FIG. 6 is a structural block diagram of a computer device in an embodiment.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
As shown in FIG. 1, in one embodiment a question text matching method is provided. The method may be executed by a server, or by another terminal device such as a robot device. The method specifically includes the following steps:
Step S102: obtain the question text to be matched.
The question text to be matched is the question text that is to be matched against the library. After the original question text to be matched is obtained, its stop words need to be removed.
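The stop-word removal mentioned for step S102 can be illustrated with a minimal sketch. The stop-word list, the whitespace tokenization and the function name `remove_stop_words` are assumptions made for illustration only; the source does not say which stop words or which tokenizer are used.

```python
# Hypothetical stop-word list; a real system would load a curated list.
STOP_WORDS = {"the", "a", "an", "is", "please", "of"}


def remove_stop_words(tokens):
    """Return the tokens that are not stop words, preserving their order."""
    return [token for token in tokens if token.lower() not in STOP_WORDS]


# Whitespace splitting is an assumption; Chinese text would need a word segmenter.
raw_question = "Please tell me how big is the Wukong robot"
cleaned_tokens = remove_stop_words(raw_question.split())
print(" ".join(cleaned_tokens))
```

The cleaned token list, rather than the raw string, would then be passed on to the combination step.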
Step S104: combine the question text to be matched with each preset question text in the question text library to obtain multiple input question texts.
The question text library includes multiple preset question texts, and a preset question text is a question text that has been set in advance. For example, suppose the question text to be matched is "How big is Wukong" and the question text library holds two preset question texts, "How tall is Wukong" and "How much does a Wukong cost". Combining the question text to be matched with each preset question text gives two input question texts: [How big is Wukong, How tall is Wukong] and [How big is Wukong, How much does a Wukong cost].
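The combination in step S104 amounts to pairing the query with every library entry. The sketch below reproduces the example from the text; the function name `build_input_pairs` and the tuple representation are illustrative assumptions.

```python
def build_input_pairs(query, preset_questions):
    """Pair the question to be matched with every preset question."""
    return [(query, preset) for preset in preset_questions]


question_library = ["How tall is Wukong", "How much does a Wukong cost"]
pairs = build_input_pairs("How big is Wukong", question_library)
for pair in pairs:
    print(pair)
```

One pair is produced per preset question, so the number of model inputs grows linearly with the size of the library.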
Step S106: input the multiple input question texts into a question matching model to obtain a similarity label between the question text to be matched and each of the preset question texts.
The similarity label reflects how similar the question text to be matched is to a preset question text, and may be set to a number. In the example above, suppose the number 1 indicates that the two texts are very similar and the number 0 indicates that they are not similar. After prediction by the question matching model, the similarity label between the question text to be matched "How big is Wukong" and the preset question text "How tall is Wukong" will be 1, and the similarity label between "How big is Wukong" and the preset question text "How much does a Wukong cost" will be 0.
Step S108: obtain, according to the similarity labels, the target question text with the highest similarity to the question text to be matched.
In the example above, because 1 indicates that the texts are very similar and 0 indicates that they are not, the similarity labels determine that the target question text with the highest similarity to "How big is Wukong" is "How tall is Wukong".
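Steps S106 and S108 can be sketched as scoring each pair and keeping the preset question with the largest label. The trained model is not available here, so `toy_predict_label` is a stand-in that simply returns the labels given in the example; it and `select_target` are hypothetical names.

```python
def select_target(query, preset_questions, predict_label):
    """Return the (preset question, label) pair with the highest label."""
    scored = [(preset, predict_label(query, preset)) for preset in preset_questions]
    # max() keeps the first item on ties, so library order breaks ties.
    return max(scored, key=lambda item: item[1])


# Stand-in for the trained question matching model: labels from the example.
EXAMPLE_LABELS = {
    ("How big is Wukong", "How tall is Wukong"): 1,
    ("How big is Wukong", "How much does a Wukong cost"): 0,
}


def toy_predict_label(query, preset):
    """Look up the example label; unknown pairs are treated as dissimilar."""
    return EXAMPLE_LABELS.get((query, preset), 0)


target, label = select_target(
    "How big is Wukong",
    ["How tall is Wukong", "How much does a Wukong cost"],
    toy_predict_label,
)
print(target, label)
```

Passing the predictor in as a function keeps the selection logic independent of how the labels are produced.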
As an optional embodiment of the present invention, after the target question text with the highest similarity to the question text to be matched is obtained in step S108, the method further includes: obtaining the target answer text corresponding to the target question text.
The target answer text is the answer to the target question text. The question text library holds preset question texts; correspondingly, a preset answer text for each preset question text may also be stored in the question text library, or a separate question answer library may be set up, with each preset question text and its preset answer text given the same identifier. In this way, once a preset question text is known, its answer is known as well. Because the target answer text corresponding to the target question text is obtained, the answer to the user's question can be presented to the user directly.
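The shared-identifier lookup described above could look like the following sketch. The identifiers `q1` and `q2`, the placeholder answer strings and the function name `find_answer` are assumptions; the source only states that a question and its answer carry the same identifier.

```python
# Two separate libraries keyed by the same (hypothetical) identifiers.
question_library = {
    "q1": "How tall is Wukong",
    "q2": "How much does a Wukong cost",
}
answer_library = {
    "q1": "<answer text for q1>",
    "q2": "<answer text for q2>",
}


def find_answer(target_question, questions, answers):
    """Find the identifier of the target question and return its answer."""
    for identifier, text in questions.items():
        if text == target_question:
            return answers.get(identifier)
    return None  # the target question is not in the library


print(find_answer("How tall is Wukong", question_library, answer_library))
```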
In the embodiment of the present invention, before the question text to be matched is obtained in step S102, the method further includes:
Step 101: train the question matching model.
Specifically, as shown in FIG. 2, training the question matching model in step 101 includes: step 101A, obtaining a preset question training text set that includes multiple preset question training texts; and step 101B, obtaining, for each preset question training text, multiple corresponding preset question training texts of different similarity levels.
Here, a given preset question training text is taken as the main question, and the similarity level of each other preset question training text is determined by how similar it is to the main question. For example, "How does the building-block robot operate" and "What is the convenient operation of the building-block robot" are fairly similar, so their similarity level can be set higher, whereas "How does the building-block robot operate" and "How much does the robot cost" are not very similar, so their similarity level can be set lower.
Step 101C: combine each preset question training text with its corresponding preset question training texts of different similarity levels to obtain multiple input training texts.
Triples are constructed, each consisting of the main question, another preset question training text, and the similarity label corresponding to the similarity level. For example, take "How does the building-block robot operate" as the main question, with the other preset question training texts "What is the convenient operation of the building-block robot", "How is the building-block robot operated", "The operating procedure of the building-block robot", "The Bluetooth of the building-block robot cannot be found by scanning", "What is the building-block robot used for", "How are the official model actions edited", "How are accessories purchased" and "How much does the robot cost". Multiple triples can then be constructed: [How does the building-block robot operate, What is the convenient operation of the building-block robot, 4], [How does the building-block robot operate, How is the building-block robot operated, 4], [How does the building-block robot operate, The operating procedure of the building-block robot, 4], [How does the building-block robot operate, The Bluetooth of the building-block robot cannot be found by scanning, 3], [How does the building-block robot operate, What is the building-block robot used for, 2], [How does the building-block robot operate, How are the official model actions edited, 1], [How does the building-block robot operate, How are accessories purchased, 0], [How does the building-block robot operate, How much does the robot cost, 0].
During model training, the [main question, other preset question training text] part of each triple is used as the input and the similarity label as the desired output. The number of similarity levels can be chosen according to actual needs and is not specifically limited here.
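The triple construction can be sketched directly from the example. The variable names and the split into `inputs` and `targets` are illustrative; the questions and labels are the ones listed in the text.

```python
main_question = "How does the building-block robot operate"

# (other preset question training text, similarity label) from the example.
labelled_questions = [
    ("What is the convenient operation of the building-block robot", 4),
    ("How is the building-block robot operated", 4),
    ("The operating procedure of the building-block robot", 4),
    ("The Bluetooth of the building-block robot cannot be found by scanning", 3),
    ("What is the building-block robot used for", 2),
    ("How are the official model actions edited", 1),
    ("How are accessories purchased", 0),
    ("How much does the robot cost", 0),
]

# Triples of (main question, other question, similarity label).
triples = [(main_question, other, label) for other, label in labelled_questions]

# Model input is the question pair; the desired output is the label.
inputs = [(main, other) for main, other, _ in triples]
targets = [label for _, _, label in triples]

print(len(triples), targets)
```

With five levels (0 to 4), the model is effectively trained as a five-class classifier over question pairs.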
Step 101D: take the multiple input training texts as the input of the question matching model and the similarity labels between each preset question training text and its corresponding preset question training texts of different similarity levels as the desired output, and train the question matching model to obtain a trained question matching model.
Because a machine cannot process sentences directly, the question text must first be segmented into words, which are then converted into word vectors as the model input; a word vector expresses a word in vector form. For example, the question text "How does the building-block robot operate" is segmented into: building-block, robot, how, operate. The word vectors of these words are then obtained, and the input, arranged as word vectors, is fed into model training. First, the resulting word-vector matrices are combined by cross product, and the top K values after the cross product are selected (Equation 1). Next, a simple mapping is applied to the word vectors of the question text to be matched (Equation 2). Then, according to the mapping result, a weight is assigned to the output of the activation function (Equation 3) to obtain the final matching degree (Equation 4). A weight operation on this matching degree gives the final label output value (Equation 5). The label output value is passed through a softmax layer (Equation 6) and compared with the similarity label to form the loss function of the question matching model (Equation 7). Finally, gradient updates based on the value of the loss function complete the training of the model, as detailed below. Note that, to speed up model training, the Adam algorithm may be used to perform the gradient updates.
Suppose $q_1 = (x_1, x_2, x_3, \ldots, x_m)$ is the word-vector sequence of the question text to be matched and $q_2 = (y_1, y_2, y_3, \ldots, y_n)$ is the word-vector sequence of the preset question training text. Then:
[Equations (1) to (7) are given in the source as image PCTCN2018125360-appb-000001, which is not reproduced here.]
Here, $m$ is the length of the question text to be matched after word segmentation, $n$ is the length of the preset question training text after word segmentation, $x_i$ is the word vector of the $i$-th word of the segmented question text to be matched, and $y_i$ is the word vector of the $i$-th word of the segmented preset question training text. The operator shown in the source as image PCTCN2018125360-appb-000002 denotes the cross product of vectors, and the function $f$ selects the top $K$ values after the cross product. $w_p$ is the weight parameter of the mapping and $b_p$ is its bias parameter; $H = [h_1, h_2, \ldots, h_m]$, where $h_i$ is the mapped value for the $i$-th word of the question text to be matched. $\mathrm{relu}$ is the ReLU activation function, $W^{(l)}$ is the weight matrix of layer $l$, $b^{(l)}$ is the bias matrix of layer $l$, and $L$ is the total number of layers of the neural network. $O = [o_1, o_2, \ldots, o_C]$, where $C$ is the number of similarity levels (each similarity level corresponds to one similarity label) and $o_i$ is the label output value for the $i$-th level. $e$ is the constant $e \approx 2.71828$, $M$ is the total number of training samples, and $t_{gj}$ is the true similarity label of the $g$-th training sample for the $j$-th similarity level.
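Because Equations 1 to 7 survive only as an image reference, the following is a rough sketch of the forward pass and loss as described in words, not a reconstruction of the patent's formulas. Several choices are assumptions: the word-vector interaction is computed as a dot product, the mapping of Equation 2 is normalized with a softmax, a single ReLU layer stands in for the $L$-layer network, and the `params` keys, `forward` and `cross_entropy` are hypothetical names. The sketch also assumes $n \ge K$.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_interactions(q1, q2, k):
    """For each word vector in q1, keep its k largest products with q2."""
    return [sorted((dot(x, y) for y in q2), reverse=True)[:k] for x in q1]


def forward(q1, q2, params, k):
    """Return one probability per similarity level for the pair (q1, q2)."""
    top_k = top_k_interactions(q1, q2, k)                                # cf. Equation 1
    gates = softmax([dot(params["w_p"], x) + params["b_p"] for x in q1])  # cf. Equation 2
    hidden = [max(0.0, dot(params["w_1"], row) + params["b_1"])          # cf. Equation 3
              for row in top_k]
    match = sum(g * h for g, h in zip(gates, hidden))                    # cf. Equation 4
    logits = [w * match + b                                              # cf. Equation 5
              for w, b in zip(params["w_o"], params["b_o"])]
    return softmax(logits)                                               # cf. Equation 6


def cross_entropy(probabilities, true_level):
    """Loss for one sample; averaging over M samples gives the total loss."""
    return -math.log(probabilities[true_level])                          # cf. Equation 7


# Toy word vectors and parameters, chosen only to make the sketch runnable.
q1 = [[1.0, 0.0], [0.0, 1.0]]
q2 = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
params = {"w_p": [0.1, 0.2], "b_p": 0.0, "w_1": [0.5, 0.5], "b_1": 0.0,
          "w_o": [1.0, -1.0], "b_o": [0.0, 0.0]}
probs = forward(q1, q2, params, k=2)
print(probs, cross_entropy(probs, 0))
```

In a real implementation the parameters would be learned by gradient updates on the loss, for which the text suggests the Adam algorithm.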
As shown in FIG. 3, a question text matching method is provided, which specifically includes:
Step 302: obtain the product category label.
The product category label indicates a particular product and consists of digits and/or characters and/or letters. For example, the robots may include the "Wukong robot", the "alpha robot" and the "jimu robot". Correspondingly, the product category label of the "Wukong robot" may be set to wukong, that of the "alpha robot" to alpha, and that of the "jimu robot" to jimu.
Step 304: obtain the question text to be matched.
Step 306: determine the target question text sub-library according to the product category label, and obtain the multiple preset question texts in the target question text sub-library.
In the embodiment of the present invention, the question text library is divided into multiple question text sub-libraries according to the product category labels, and each sub-library stores the questions related to the corresponding robot product. For example, the question text sub-library of the "Wukong robot" stores questions about the "Wukong robot", and the question text sub-library of the "alpha robot" stores questions about the "alpha robot".
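Selecting a sub-library by product category label can be sketched as a dictionary lookup. The labels `wukong`, `alpha` and `jimu` come from the text; the sample questions, the placeholder entry and the function name `get_sub_library` are illustrative assumptions.

```python
# Question text library split into sub-libraries keyed by product category label.
sub_libraries = {
    "wukong": ["How tall is Wukong", "How much does a Wukong cost"],
    "alpha": ["<preset question about the alpha robot>"],
    "jimu": ["How does the building-block robot operate"],
}


def get_sub_library(product_label, libraries):
    """Return the preset questions for one product, or an empty list."""
    return libraries.get(product_label, [])


candidates = get_sub_library("wukong", sub_libraries)
print(candidates)
```

Only the questions in the selected sub-library are paired with the query, so the model scores far fewer pairs than it would against the whole library.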
Step 308: combine the question text to be matched with each of the multiple preset question texts in the target question text sub-library to obtain multiple input question texts.
Step 310: input the multiple input question texts into the question matching model to obtain a similarity label between the question text to be matched and each of the preset question texts.
Step 312: obtain, according to the similarity labels, the target question text with the highest similarity to the question text to be matched.
To further ensure the accuracy of the matched answer, as shown in FIG. 4, the question text matching method further includes:
Step 312: obtain the preset answer text corresponding to each preset question text in the target question text sub-library.
The question text sub-library holds preset question texts; correspondingly, a preset answer text for each preset question text may also be stored in the question text sub-library, or a separate question answer sub-library may be set up and associated with the question text sub-library. In this way, once a preset question text is known, its answer can be found through the association.
步骤314,分别将所述待匹配问句文本和所述目标问句文本子库中每个所述预置问句文本的预置答案文本进行组合,得到多个输入问答文本。Step 314: Combine the question text to be matched and the preset answer text of each preset question text in the target question text sub-library to obtain multiple input question and answer texts.
Here, the question text to be matched is combined only with the preset answer texts of the preset question texts in the target question text sub-library, rather than with the preset answer texts of every preset question text in the whole question text library, which greatly reduces program overhead. For example, suppose the question text to be matched is "How is the building block robot operated", and the preset answer texts of the preset question texts in the target question text sub-library are "The building block robot is operated in the following way", "The operating procedure of the building block robot is as follows", "The building block robot's Bluetooth can be found by scanning as follows", "The building block robot can be used to sweep the floor", "Official model actions are edited as follows", "Accessories can be purchased in the mall", and "2000 yuan". Combining the question text to be matched with each of these preset answer texts yields multiple input question-answer texts: [How is the building block robot operated, The building block robot is operated in the following way], [How is the building block robot operated, The operating procedure of the building block robot is as follows], [How is the building block robot operated, The building block robot's Bluetooth can be found by scanning as follows], [How is the building block robot operated, The building block robot can be used to sweep the floor], [How is the building block robot operated, Official model actions are edited as follows], [How is the building block robot operated, Accessories can be purchased in the mall], [How is the building block robot operated, 2000 yuan].
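The combination step can be illustrated with a minimal Python sketch. The function name `build_qa_inputs` and the English renderings of the example texts are assumptions made for illustration; the patent does not prescribe a data structure for the input question-answer texts.

```python
def build_qa_inputs(question, preset_answers):
    """Pair the question to be matched with every preset answer text.

    Hypothetical helper: each pair is one "input question-answer text".
    """
    return [(question, answer) for answer in preset_answers]


question = "How is the building block robot operated"

# Preset answer texts of the target sub-library only, not the whole library.
sub_library_answers = [
    "The building block robot is operated in the following way",
    "The operating procedure of the building block robot is as follows",
    "The building block robot's Bluetooth can be found by scanning as follows",
    "The building block robot can be used to sweep the floor",
    "Official model actions are edited as follows",
    "Accessories can be purchased in the mall",
    "2000 yuan",
]

qa_inputs = build_qa_inputs(question, sub_library_answers)
for pair in qa_inputs:
    print(pair)
```

Because only the sub-library is paired, the number of model inputs grows with the size of the sub-library rather than with the size of the full question text library.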
Step 316: input the multiple input question-answer texts into a question-answer matching model to obtain a matching value between the question text to be matched and the preset answer text of each preset question text in the target question text sub-library.
The matching value indicates how well a preset answer text answers the question text to be matched: the better the answer matches the question, the higher the matching value. The question-answer matching model must be trained in advance. During training, a preset question training text is used as the question and each preset answer training text is used as an answer, forming question-answer pairs that serve as the input of the model. A constraint is set on the model's output for each question, and training is complete when the constraint is satisfied. The constraint is derived from the similarity labels between the main question and the other preset question training texts: the matching value of the pair with the largest similarity label must be greater than the matching values of the other pairs. For example, take the following tuples of main question, other preset question training text, similarity label, and preset answer training text: [How is the building block robot operated, What is the convenient way to operate the building block robot, 4, The convenient way to operate the building block robot is as follows], [How is the building block robot operated, The building block robot's Bluetooth cannot be found by scanning, 3, The robot's Bluetooth can be found by scanning in this way], [How is the building block robot operated, What is the building block robot used for, 2, The building block robot is used to sweep the floor], [How is the building block robot operated, How are official model actions edited, 1, Official model actions are edited in this way], [How is the building block robot operated, How are accessories purchased, 0, Accessories can be purchased in the mall]. From the similarity labels, the following constraint is obtained: matching value of [How is the building block robot operated, The convenient way to operate the building block robot is as follows] > matching value of [How is the building block robot operated, The robot's Bluetooth can be found by scanning in this way] > matching value of [How is the building block robot operated, The building block robot is used to sweep the floor] > matching value of [How is the building block robot operated, Official model actions are edited in this way] > matching value of [How is the building block robot operated, Accessories can be purchased in the mall]. In this embodiment of the present invention, the question-answer matching model is trained by applying gradient updates to the loss function L; to speed up training, the Adam algorithm may be used to perform the gradient updates.
Let $q_1 = (x_1, x_2, x_3, \dots, x_m)$ be the word vectors of the preset question training text and $q_2 = (y_1, y_2, y_3, \dots, y_n)$ be the word vectors of one preset answer training text. Then:
Figure PCTCN2018125360-appb-000003
Here $m$ is the length of the preset question training text after word segmentation, $n$ is the length of the preset answer training text after word segmentation, $x_i$ is the word vector of the $i$-th word of the segmented preset question training text, and $y_i$ is the word vector of the $i$-th word of the segmented preset answer training text. The operator shown in
Figure PCTCN2018125360-appb-000004
denotes the cross product of vectors. The function $f$ selects the top $K$ values after the cross product, relu is the ReLU activation function, $W^{(l)}$ is the weight matrix of layer $l$, $b^{(l)}$ is the bias of layer $l$, $L$ is the total number of layers of the neural network, $W_p$ is the weight matrix for the preset question training text, $b_p$ is the corresponding bias term for the preset question training text, and $h$ is the output value of the preset question training text after mapping. The margin is set to 1. $s(q_1, q_2)$ and $s(q_1, q_3)$ are the predicted matching values between the preset question training text and a preset answer training text ($q_2$ and another answer text $q_3$, respectively), and $\Theta$ denotes parameters given in advance.
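The formula images are not reproduced in this text, so the exact loss cannot be confirmed from it. As a hedged illustration only, the sketch below assumes a standard pairwise hinge loss, which is consistent with the quantities named above (a margin of 1 and two matching values $s(q_1, q_2)$ and $s(q_1, q_3)$). The helper names and the toy scores are hypothetical, and the regularization over $\Theta$ and the Adam update are omitted.

```python
from itertools import combinations


def ranking_pairs(labelled_answers):
    """Return (better, worse) answer pairs implied by the similarity labels.

    labelled_answers: list of (answer_id, similarity_label) tuples.
    """
    pairs = []
    for (ans_a, lab_a), (ans_b, lab_b) in combinations(labelled_answers, 2):
        if lab_a > lab_b:
            pairs.append((ans_a, ans_b))
        elif lab_b > lab_a:
            pairs.append((ans_b, ans_a))
    return pairs


def pairwise_hinge_loss(score_better, score_worse, margin=1.0):
    """Assumed loss form: zero once the better answer leads by the margin."""
    return max(0.0, margin - score_better + score_worse)


def total_loss(scores, labelled_answers, margin=1.0):
    """Sum the assumed hinge loss over every label-ordered pair."""
    return sum(
        pairwise_hinge_loss(scores[better], scores[worse], margin)
        for better, worse in ranking_pairs(labelled_answers)
    )


# Five answers with similarity labels 4..0, as in the training example.
labelled = [("a4", 4), ("a3", 3), ("a2", 2), ("a1", 1), ("a0", 0)]

# Hypothetical model outputs that respect the label ordering.
good_scores = {"a4": 5.0, "a3": 4.0, "a2": 3.0, "a1": 2.0, "a0": 1.0}
# Hypothetical outputs where the two best answers are swapped.
bad_scores = {"a4": 4.0, "a3": 5.0, "a2": 3.0, "a1": 2.0, "a0": 1.0}

print(total_loss(good_scores, labelled))
print(total_loss(bad_scores, labelled))
```

Under this assumed loss, the ordering constraint on matching values is satisfied exactly when every pair contributes zero, which is one way to read "training is complete when the constraint is satisfied".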
Correspondingly, step 312 of obtaining, according to the similarity labels, the target question text with the highest similarity to the question text to be matched includes:
Step 318: obtain, according to the similarity labels and the matching values, the target preset answer text that matches the question text to be matched.
In this embodiment of the present invention, a preset question text and its corresponding preset answer text share the same text identifier. For example, the preset question text with the largest similarity label and the preset answer text with the largest matching value are obtained first, and their text identifiers are compared. If the identifiers are the same, the preset answer text with the largest matching value is used as the target preset answer text. If they differ, either the preset answer text corresponding to the preset question text with the largest similarity label or the preset answer text with the largest matching value is used as the target preset answer text.
As an embodiment of the present invention, step 318 of obtaining, according to the similarity labels and the matching values, the target preset answer text that matches the question text to be matched includes:
Step 318A: according to the similarity label between the question text to be matched and each preset question text in the target question text sub-library, select from the multiple preset question texts a preset number of preferred preset question texts with the highest similarity to the question text to be matched. The preferred preset question texts are those that the model predicts to be most similar to the question text to be matched. For example, suppose there are 10 preset question texts and the preset number is 3. Sorting the similarity labels gives 4, 4, 3, 3, 2, 2, 1, 0, 0, 0, so the preferred preset question texts with similarity labels 4, 4, and 3 are selected.
Step 318B: according to the matching value between the question text to be matched and the preset answer text of each preset question text in the target question text sub-library, select from the preset answer texts of the multiple preset question texts the preset number of preferred preset answer texts that match the question text to be matched. Step 318B selects the preferred preset answer texts in the same way that step 318A selects the preferred preset question texts, so the details are not repeated here.
Step 318C: according to the text identifier of each preferred preset question text and the text identifier of each preferred preset answer text, obtain the target preset answer text that matches the question text to be matched. Suppose 3 preferred preset question texts and 3 preferred preset answer texts have been selected; the target preset answer text is then chosen from the 3 preferred preset answer texts according to the text identifiers.
As an embodiment of the present invention, step 318C of obtaining the target preset answer text according to the text identifiers of the preferred preset question texts and the preferred preset answer texts includes:
Step 318C1: according to the text identifier of each preferred preset question text and the text identifier of each preferred preset answer text, obtain at least one preferred preset question text that matches the question to be matched. This is done mainly by taking the intersection of the text identifiers. For example, if the text identifiers of the 3 preferred preset question texts are jimu10, jimu11, and jimu15, and the text identifiers of the 3 preferred preset answer texts are jimu10, jimu11, and jimu17, then the preset question texts with identifiers jimu10 and jimu11 are determined to be the preferred preset question texts that match the question to be matched.
Step 318C2: split the question text to be matched into individual characters to obtain a to-be-matched segmentation result containing multiple characters. For example, if the question text to be matched is "悟空有金色的吗" ("Does Wukong come in gold"), the to-be-matched segmentation result is [悟, 空, 有, 金, 色, 的, 吗].
Step 318C3: split each of the at least one preferred preset question text that matches the question to be matched into individual characters to obtain multiple preferred segmentation results, each containing multiple characters. For example, if two preferred preset question texts are finally obtained, "悟空的尺寸是多少" ("What is the size of Wukong") and "悟空有哪些颜色" ("What colors does Wukong come in"), the two corresponding preferred segmentation results are [悟, 空, 的, 尺, 寸, 是, 多, 少] and [悟, 空, 有, 哪, 些, 颜, 色].
Step 318C4: according to the to-be-matched segmentation result and the preferred segmentation results, calculate the text matching value between the question text to be matched and each preferred preset question text that matches the question to be matched. First, count the total number of distinct characters across the to-be-matched segmentation result and the preferred segmentation result. Then count the number of characters the two results share. Finally, dividing the shared count by the total count gives the text matching value. Continuing the example above, "悟空有金色的吗" and "悟空的尺寸是多少" have 12 distinct characters in total and share 3 characters, so the text matching value is 3/12; "悟空有金色的吗" and "悟空有哪些颜色" have 10 distinct characters in total and share 4 characters, so the text matching value is 4/10. To make the calculation more effective and accurate, unimportant words may first be removed from the question to be matched and from the preset question texts. After removing such words, "悟空有金色的吗" becomes "金色", "悟空的尺寸是多少" becomes "尺寸是多少", and "悟空有哪些颜色" becomes "哪些颜色". For "悟空有金色的吗" and "悟空的尺寸是多少", the total count is then 7 and the shared count is 0, so the text matching value is 0; for "悟空有金色的吗" and "悟空有哪些颜色", the total count is 5 and the shared count is 1, so the text matching value is 0.2.
Step 318C5: according to the text matching value between the question text to be matched and each preferred preset question text that matches the question to be matched, obtain the target preset answer text that matches the question text to be matched. For example, if the question text to be matched is "悟空的大小" ("How big is Wukong"), its text matching value with "悟空的尺寸" ("Wukong's size") is greater than its text matching value with "悟空有哪些颜色" ("What colors does Wukong come in"), so the target question text is "悟空的尺寸". The target preset answer text that matches the question text to be matched is then obtained from the text identifier of the target question text: 1 meter.
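Steps 318A through 318C5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the identifiers `q1` to `q4`, the scores, and the texts of `q3` and `q4` are hypothetical, ties in the top-k selection are broken by insertion order, and returning `None` when the identifier intersection is empty is a choice made here because the text does not specify that case.

```python
def top_k_ids(scores_by_id, k):
    """IDs of the k highest-scoring entries (ties keep insertion order)."""
    ranked = sorted(scores_by_id, key=scores_by_id.get, reverse=True)
    return ranked[:k]


def char_overlap(text_a, text_b):
    """Shared distinct characters divided by all distinct characters."""
    chars_a, chars_b = set(text_a), set(text_b)
    union = chars_a | chars_b
    return len(chars_a & chars_b) / len(union) if union else 0.0


def pick_target(query, questions, similarity_labels, answer_scores, k=3):
    """Return the text identifier of the target preset question, or None."""
    top_questions = top_k_ids(similarity_labels, k)      # step 318A
    top_answers = set(top_k_ids(answer_scores, k))       # step 318B
    # Step 318C1: keep identifiers present in both selections.
    candidates = [tid for tid in top_questions if tid in top_answers]
    if not candidates:
        return None  # assumption: the empty case is not described in the text
    # Steps 318C2 to 318C5: the highest character overlap wins.
    return max(candidates, key=lambda tid: char_overlap(query, questions[tid]))


questions = {
    "q1": "悟空的尺寸是多少",
    "q2": "悟空有哪些颜色",
    "q3": "悟空怎么充电",   # hypothetical preset question
    "q4": "悟空多少钱",     # hypothetical preset question
}
similarity_labels = {"q1": 4, "q2": 4, "q3": 3, "q4": 1}
answer_scores = {"q1": 0.9, "q2": 0.8, "q3": 0.1, "q4": 0.7}

target_id = pick_target("悟空有金色的吗", questions, similarity_labels, answer_scores)
print(target_id, questions[target_id])
```

Since preset questions and preset answers share text identifiers, the returned identifier can be used directly to look up the target preset answer text.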
As shown in FIG. 5, an apparatus 500 for matching question text is provided, which specifically includes:
an obtaining module 502, configured to obtain the question text to be matched; a combination module 504, configured to combine the question text to be matched with each preset question text in the question text library, respectively, to obtain multiple input question texts; a label module 506, configured to input the multiple input question texts into a question matching model to obtain a similarity label between the question text to be matched and each preset question text; and a matching module 508, configured to obtain, according to the similarity labels, the target question text with the highest similarity to the question text to be matched.
In one embodiment, the apparatus 500 further includes a product label obtaining module, configured to obtain a product category label. Correspondingly, the combination module 504 includes: a first combination module, configured to determine a target question text sub-library according to the product category label and obtain the multiple preset question texts in the target question text sub-library; and a second combination module, configured to combine the question text to be matched with each of the multiple preset question texts in the target question text sub-library, respectively, to obtain multiple input question texts.
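The role of the first and second combination modules can be sketched as a lookup followed by pairing. The dictionary layout, the function names, and the category labels used as keys are assumptions for illustration; the patent does not specify how sub-libraries are stored.

```python
def select_sub_library(question_library, product_label):
    """First combination module: pick the sub-library for a product label.

    Returns an empty list for an unknown label (an assumption of this sketch).
    """
    return question_library.get(product_label, [])


def build_question_inputs(question, question_library, product_label):
    """Second combination module: pair the question with each preset question."""
    presets = select_sub_library(question_library, product_label)
    return [(question, preset) for preset in presets]


# Hypothetical library with one sub-library per product category label.
question_library = {
    "building block robot": [
        "What is the convenient way to operate the building block robot",
        "What is the building block robot used for",
        "How are accessories purchased",
    ],
    "Wukong": [
        "What is the size of Wukong",
        "What colors does Wukong come in",
    ],
}

inputs = build_question_inputs(
    "How is the building block robot operated",
    question_library,
    "building block robot",
)
print(len(inputs))
```

Only the preset questions of the selected sub-library are paired with the incoming question, so questions about other products never reach the question matching model.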
In one embodiment, the apparatus 500 further includes: an answer text obtaining module, configured to obtain the preset answer text corresponding to each preset question text in the target question text sub-library; a question-answer combination module, configured to combine the question text to be matched with the preset answer text of each preset question text in the target question text sub-library, respectively, to obtain multiple input question-answer texts; and a matching value obtaining module, configured to input the multiple input question-answer texts into a question-answer matching model to obtain a matching value between the question text to be matched and the preset answer text of each preset question text in the target question text sub-library. Correspondingly, the matching module 508 includes a target answer matching module, configured to obtain, according to the similarity labels and the matching values, the target preset answer text that matches the question text to be matched.
In one embodiment, a preset question text and its corresponding preset answer text share the same text identifier. The target answer matching module includes: a preferred question module, configured to select from the multiple preset question texts, according to the similarity label between the question text to be matched and each preset question text in the target question text sub-library, a preset number of preferred preset question texts with the highest similarity to the question text to be matched; a preferred answer module, configured to select from the preset answer texts of the multiple preset question texts, according to the matching value between the question text to be matched and the preset answer text of each preset question text in the target question text sub-library, the preset number of preferred preset answer texts that match the question text to be matched; and a target preset answer text module, configured to obtain, according to the text identifier of each preferred preset question text and the text identifier of each preferred preset answer text, the target preset answer text that matches the question text to be matched.
In one embodiment, the target preset answer text module includes: a first answer text module, configured to obtain, according to the text identifier of each preferred preset question text and the text identifier of each preferred preset answer text, at least one preferred preset question text that matches the question to be matched; a second answer text module, configured to split the question text to be matched into individual characters to obtain a to-be-matched segmentation result containing multiple characters; a third answer text module, configured to split the at least one preferred preset question text that matches the question to be matched into individual characters to obtain multiple preferred segmentation results, each containing multiple characters; a fourth answer text module, configured to calculate, according to the to-be-matched segmentation result and the preferred segmentation results, the text matching value between the question text to be matched and each preferred preset question text that matches the question to be matched; and a fifth answer text module, configured to obtain, according to those text matching values, the target preset answer text that matches the question text to be matched.
In one embodiment, the apparatus 500 further includes a training module, configured to train the question matching model. The training module includes: a first training module, configured to obtain a preset question training text set containing multiple preset question training texts; a second training module, configured to obtain, for each preset question training text, multiple corresponding preset question training texts of different similarity levels; a third training module, configured to combine each preset question training text with each of its corresponding preset question training texts of different similarity levels to obtain multiple input training texts; and a fourth training module, configured to train the question matching model using the multiple input training texts as its input and the similarity labels between the preset question training text and its corresponding preset question training texts of different similarity levels as the desired output, thereby obtaining a trained question matching model.
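The work of the first three training modules, together with the labels used by the fourth, can be sketched as follows. The function name `build_training_examples` and the dictionary layout are assumptions for illustration; the questions and labels are English renderings of the building block robot example given in the description, and the model and its training loop are not shown.

```python
def build_training_examples(training_set):
    """Pair each main question with its graded variants.

    training_set maps a preset question training text to a list of
    (other preset question training text, similarity label) tuples.
    Returns a list of ((main, other), label): model input, desired output.
    """
    examples = []
    for main_question, graded_questions in training_set.items():
        for other_question, label in graded_questions:
            examples.append(((main_question, other_question), label))
    return examples


training_set = {
    "How is the building block robot operated": [
        ("What is the convenient way to operate the building block robot", 4),
        ("The building block robot's Bluetooth cannot be found by scanning", 3),
        ("What is the building block robot used for", 2),
        ("How are official model actions edited", 1),
        ("How are accessories purchased", 0),
    ],
}

examples = build_training_examples(training_set)
for model_input, desired_label in examples:
    print(desired_label, model_input)
```

Each resulting tuple holds one input training text and the similarity label that serves as its desired output.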
FIG. 6 shows the internal structure of a computer device in one embodiment. The computer device may be a server or a robot. As shown in FIG. 6, the computer device includes a processor, a memory, and a network interface connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by the processor, causes the processor to implement the method for matching question text. The internal memory may also store a computer program which, when executed by the processor, causes the processor to perform the method for matching question text. Those skilled in the art will understand that the structure shown in FIG. 6 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, the method for matching question text provided in the present application may be implemented as a computer program that can run on the computer device shown in FIG. 6. The memory of the computer device may store the program modules that make up the apparatus 500 for matching question text, for example the obtaining module 502, the combination module 504, the label module 506, and the matching module 508.
A computer device includes a memory and a processor. The memory stores a computer program which, when executed by the processor, causes the processor to perform the following steps: obtaining a question text to be matched; combining the question text to be matched with each preset question text in a question text library, respectively, to obtain multiple input question texts; inputting the multiple input question texts into a question matching model to obtain a similarity label between the question text to be matched and each preset question text; and obtaining, according to the similarity labels, the target question text with the highest similarity to the question text to be matched.
In one embodiment, a computer-readable storage medium is provided that stores a computer program which, when executed by a processor, causes the processor to perform the following steps: obtaining a question text to be matched; combining the question text to be matched with each preset question text in a question text library, respectively, to obtain multiple input question texts; inputting the multiple input question texts into a question matching model to obtain a similarity label between the question text to be matched and each preset question text; and obtaining, according to the similarity labels, the target question text with the highest similarity to the question text to be matched.
It should be noted that the method for matching question text, the apparatus for matching question text, the computer device, and the computer-readable storage medium described above belong to one general inventive concept, and the content of their respective embodiments is mutually applicable.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be carried out by a computer program instructing the relevant hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided in the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, they should not be construed as limiting the patent scope of the present application. It should be noted that a person of ordinary skill in the art may make a number of variations and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

  1. A method for matching question text, characterized by comprising:
    obtaining a question text to be matched;
    combining the question text to be matched with each preset question text in a question text library, respectively, to obtain multiple input question texts;
    inputting the multiple input question texts into a question matching model to obtain a similarity label between the question text to be matched and each preset question text;
    obtaining, according to the similarity labels, the target question text with the highest similarity to the question text to be matched.
  2. The method according to claim 1, characterized in that the question text library comprises multiple question text sub-libraries, and before the obtaining of the question text to be matched, the method further comprises:
    obtaining a product category label;
    the combining of the question text to be matched with each preset question text in the question text library, respectively, to obtain multiple input question texts comprises:
    determining a target question text sub-library according to the product category label, and obtaining multiple preset question texts in the target question text sub-library;
    combining the question text to be matched with each of the multiple preset question texts in the target question text sub-library, respectively, to obtain multiple input question texts.
  3. The method according to claim 2, characterized in that the method further comprises:
    obtaining the preset answer text corresponding to each preset question text in the target question text sub-library;
    combining the question text to be matched with the preset answer text of each preset question text in the target question text sub-library, respectively, to obtain multiple input question-answer texts;
    inputting the multiple input question-answer texts into a question-answer matching model to obtain a matching value between the question text to be matched and the preset answer text of each preset question text in the target question text sub-library;
    the obtaining, according to the similarity labels, of the target question text with the highest similarity to the question text to be matched comprises:
    obtaining, according to the similarity labels and the matching values, the target preset answer text that matches the question text to be matched.
  4. The method according to claim 3, characterized in that each preset question text and its corresponding preset answer text have the same text identifier;
    wherein the acquiring, according to the similarity labels and the matching values, of the target preset answer text that matches the question text to be matched comprises:
    selecting, from the plurality of preset question texts and according to the similarity label between the question text to be matched and each preset question text in the target question text sub-library, a preset number of preferred preset question texts having the highest similarity to the question text to be matched;
    selecting, from the preset answer texts of the plurality of preset question texts and according to the matching value between the question text to be matched and the preset answer text of each preset question text in the target question text sub-library, the preset number of preferred preset answer texts that match the question text to be matched; and
    acquiring, according to the text identifier of each preferred preset question text and the text identifier of each preferred preset answer text, the target preset answer text that matches the question text to be matched.
  5. The method according to claim 4, characterized in that the acquiring, according to the text identifier of each preferred preset question text and the text identifier of each preferred preset answer text, of the target preset answer text that matches the question text to be matched comprises:
    acquiring, according to the text identifier of each preferred preset question text and the text identifier of each preferred preset answer text, at least one preferred preset question text that matches the question to be matched;
    segmenting the question text to be matched into characters to obtain a to-be-matched segmentation result comprising a plurality of characters;
    performing word segmentation on the at least one preferred preset question text that matches the question to be matched, to obtain a plurality of preferred segmentation results comprising a plurality of characters;
    calculating, according to the to-be-matched segmentation result and the preferred segmentation results, a text matching value between the question text to be matched and each preferred preset question text that matches the question to be matched; and
    acquiring, according to the text matching value between the question text to be matched and each preferred preset question text that matches the question to be matched, the target preset answer text that matches the question text to be matched.
  6. The method according to any one of claims 1 to 5, characterized in that, before the acquiring of the question text to be matched, the method further comprises: training the question matching model, the training comprising the following steps:
    acquiring a preset question training text set comprising a plurality of preset question training texts;
    acquiring, for each preset question training text, a plurality of corresponding preset question training texts of different similarity levels;
    combining the preset question training text with each of its corresponding preset question training texts of different similarity levels, respectively, to obtain a plurality of input training texts; and
    training the question matching model by using the plurality of input training texts as the input of the question matching model and the similarity labels between the preset question training text and its corresponding preset question training texts of different similarity levels as the expected output, to obtain a trained question matching model.
  7. An apparatus for matching questions, characterized by comprising:
    an acquisition module, configured to acquire a question text to be matched;
    a combination module, configured to combine the question text to be matched with each preset question text in a question text library, respectively, to obtain a plurality of input question texts;
    a label module, configured to input the plurality of input question texts into a question matching model to obtain a similarity label between the question text to be matched and each preset question text; and
    a matching module, configured to acquire, according to the similarity labels, a target question text having the highest similarity to the question text to be matched.
  8. The apparatus according to claim 7, characterized by further comprising:
    a product label acquisition module, configured to acquire a product category label;
    correspondingly, the combination module comprises:
    a first combination module, configured to determine a target question text sub-library according to the product category label, and to acquire a plurality of preset question texts in the target question text sub-library; and
    a second combination module, configured to combine the question text to be matched with each of the plurality of preset question texts in the target question text sub-library, respectively, to obtain a plurality of input question texts.
  9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method for matching question text according to any one of claims 1 to 6.
  10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method for matching question text according to any one of claims 1 to 6.
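
The matching flow recited in claims 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the library contents and the names `QUESTION_LIBRARY`, `char_jaccard`, and `match_question` are hypothetical, and a simple character-overlap score stands in for the trained question matching model, whose architecture the claims do not specify.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical question text library: each product category label maps to a
# sub-library of (text identifier, preset question text, preset answer text).
Entry = Tuple[str, str, str]
QUESTION_LIBRARY: Dict[str, List[Entry]] = {
    "robot": [
        ("q1", "how do I charge the robot", "Place the robot on its charging dock."),
        ("q2", "how do I reset the robot", "Hold the power button for ten seconds."),
    ],
    "speaker": [
        ("q3", "how do I pair the speaker", "Enable Bluetooth and select the speaker."),
    ],
}


def char_jaccard(query: str, preset: str) -> float:
    """Stand-in scorer (an assumption): Jaccard overlap of the character sets.

    A real system would call the trained question matching model here.
    """
    a = set(query.replace(" ", ""))
    b = set(preset.replace(" ", ""))
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_question(
    query: str,
    category: str,
    library: Dict[str, List[Entry]] = QUESTION_LIBRARY,
    model: Callable[[str, str], float] = char_jaccard,
) -> Tuple[str, str, str, float]:
    """Return (text id, preset question, preset answer, score) of the best match."""
    # Claim 2: the product category label selects the target sub-library.
    sub_library = library[category]
    # Claim 1: combine the query with each preset question into input pairs.
    pairs = [(query, preset) for _, preset, _ in sub_library]
    # Claim 1: score every pair with the (stand-in) question matching model.
    scores = [model(q, p) for q, p in pairs]
    # Claim 1: pick the preset question with the highest similarity.
    best = max(range(len(scores)), key=scores.__getitem__)
    text_id, preset, answer = sub_library[best]
    return text_id, preset, answer, scores[best]


if __name__ == "__main__":
    print(match_question("how can I charge my robot", "robot"))
```

Restricting the comparison to one sub-library means the model is called once per preset question in that product category rather than once per entry in the whole library, which is the motivation for the category label in claim 2.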
PCT/CN2018/125360 2018-12-29 2018-12-29 Question text matching method and apparatus, computer device and storage medium WO2020133360A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/125360 WO2020133360A1 (en) 2018-12-29 2018-12-29 Question text matching method and apparatus, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/125360 WO2020133360A1 (en) 2018-12-29 2018-12-29 Question text matching method and apparatus, computer device and storage medium

Publications (1)

Publication Number Publication Date
WO2020133360A1 true WO2020133360A1 (en) 2020-07-02

Family

ID=71126274

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/125360 WO2020133360A1 (en) 2018-12-29 2018-12-29 Question text matching method and apparatus, computer device and storage medium

Country Status (1)

Country Link
WO (1) WO2020133360A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114398968A (en) * 2022-01-06 2022-04-26 北京博瑞彤芸科技股份有限公司 Method and device for labeling similar customer-obtaining files based on file similarity

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150074112A1 (en) * 2012-05-14 2015-03-12 Huawei Technologies Co., Ltd. Multimedia Question Answering System and Method
CN106649868A (en) * 2016-12-30 2017-05-10 首都师范大学 Method and device for matching between questions and answers
CN106815311A (en) * 2016-12-21 2017-06-09 杭州朗和科技有限公司 A kind of problem matching process and device
CN107305578A (en) * 2016-04-25 2017-10-31 北京京东尚科信息技术有限公司 Human-machine intelligence's answering method and device
CN107688608A (en) * 2017-07-28 2018-02-13 合肥美的智能科技有限公司 Intelligent sound answering method, device, computer equipment and readable storage medium storing program for executing
CN108536708A (en) * 2017-03-03 2018-09-14 腾讯科技(深圳)有限公司 A kind of automatic question answering processing method and automatically request-answering system

Similar Documents

Publication Publication Date Title
CN112328742B (en) Training method and device based on artificial intelligence, computer equipment and storage medium
CN111581229B (en) SQL statement generation method and device, computer equipment and storage medium
CN110717034A (en) Ontology construction method and device
WO2022134421A1 (en) Multi-knowledge graph based intelligent reply method and apparatus, computer device and storage medium
CN103593412B (en) A kind of answer method and system based on tree structure problem
CN106095842B (en) Online course searching method and device
CN112163424A (en) Data labeling method, device, equipment and medium
WO2019232893A1 (en) Method and device for text emotion analysis, computer apparatus and storage medium
US10496751B2 (en) Avoiding sentiment model overfitting in a machine language model
CN110263326B (en) User behavior prediction method, prediction device, storage medium and terminal equipment
CN111182162A (en) Telephone quality inspection method, device, equipment and storage medium based on artificial intelligence
Noguti et al. Legal document classification: An application to law area prediction of petitions to public prosecution service
CN111382250A (en) Question text matching method and device, computer equipment and storage medium
CN111860669A (en) Training method and device of OCR recognition model and computer equipment
CN114691891A (en) Knowledge graph-oriented question-answer reasoning method
CN113656547A (en) Text matching method, device, equipment and storage medium
CN110569507B (en) Semantic recognition method, device, equipment and storage medium
CN115659226A (en) Data processing system for acquiring APP label
WO2020133360A1 (en) Question text matching method and apparatus, computer device and storage medium
CN111401038B (en) Text processing method, device, electronic equipment and storage medium
CN112966076A (en) Intelligent question and answer generating method and device, computer equipment and storage medium
CN111145053A (en) Enterprise law consultant management system and method based on artificial intelligence
CN114842982B (en) Knowledge expression method, device and system for medical information system
CN112989007B (en) Knowledge base expansion method and device based on countermeasure network and computer equipment
CN108959327B (en) Service processing method, device and computer readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18945052; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 18945052; Country of ref document: EP; Kind code of ref document: A1)