CN112101423A - Multi-model fused FAQ matching method and device - Google Patents
- Publication number
- CN112101423A (application CN202010852824.7A)
- Authority
- CN
- China
- Prior art keywords
- model
- matching
- faq
- training
- sentence pair
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
Abstract
The invention is applicable to the technical field of natural language processing and provides a multi-model fused FAQ matching method and device. The method sequentially obtains a training text set of the questions to be processed; extracts and summarizes financial education knowledge points with the assistance of Bert-encoder + DBSCAN clustering, thereby constructing a financial education FAQ; manually labels a small number of similar questions; generates a large number of similar questions from the labeled ones and checks them manually to construct a sentence-pair matching data set; trains a sentence-pair matching model using an unsupervised model and a supervised deep learning model; and, after training is finished, receives a question input by a user, identifies the question that best matches the input, and outputs the corresponding answer. By fusing multiple trained or pre-trained models, extracting text, matching standard questions, and replying with corresponding answers, the invention solves the problems of tedious FAQ lookup for users and low efficiency of manual customer service.
Description
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to a multi-model fused FAQ matching method and device.
Background
Practical applications of artificial intelligence in the financial education industry remain few. In particular, FAQ corpora in this industry are private, which makes them inconvenient to open-source and limits development.
In recent years the NLP field has advanced rapidly, yet these advances have not been applied to financial education with good results, and even state-of-the-art FAQ sentence-pair matching algorithms have not been successfully deployed in this field.
An intelligent FAQ has difficulty achieving good results when there are many knowledge points and the question semantics are extremely similar.
Disclosure of Invention
The invention provides a multi-model fused FAQ matching method and device, and aims to solve the problems in the prior art.
The invention is realized as follows: a multi-model fused FAQ matching method and device, the method comprising the following steps:
S1, obtaining a training text set of the questions to be processed, and extracting and summarizing financial education knowledge points with the assistance of Bert-encoder + DBSCAN clustering, thereby constructing a financial education FAQ and manually labeling a small number of similar questions;
S2, generating a large number of similar questions from the small number of labeled similar questions using a similar question generation module, checking them manually, and constructing a sentence-pair matching data set;
S3, constructing a pre-training model data set;
S4, training a sentence-pair matching model using an unsupervised model and a supervised deep learning model;
and S5, after the sentence-pair matching model is trained, receiving a question input by the user, feeding the question text into the sentence-pair matching model, identifying the question that best matches the input and its corresponding answer, outputting them, and replying with the answer to the user.
Preferably, the training text set comprises texts whose length is limited to 3-50 characters, with emoticons, numbers and email text deleted.
Preferably, the unsupervised model comprises a WMD model and a SIF model;
the supervised model comprises a BERT model, an ALBERT model and a RoBERTa model.
The invention also provides a multi-model fused FAQ matching device, which comprises:
a financial education corpus database for storing pre-entered FAQ corpus data and generating the training text set;
a manual labeling module for an operator to manually label a small number of similar questions in the training text set;
a similar question generation module for generating a large number of similar questions from the small number of labeled similar questions, performing manual review, and constructing a sentence-pair matching data set;
and an NLU module for training the sentence-pair matching model, matching a question input by the user with the trained model, finding the best-matching question, outputting it together with its corresponding answer, and replying to the user.
Preferably, the training text set comprises texts whose length is limited to 3-50 characters, with emoticons, numbers and email text deleted.
Preferably, the NLU module comprises an unsupervised model and a supervised model;
the unsupervised model comprises a WMD model and a SIF model;
the supervised model comprises a BERT model, an ALBERT model and a RoBERTa model.
Compared with the prior art, the invention has the following beneficial effects. The multi-model fused FAQ matching method and device sequentially obtain a training text set of the questions to be processed; extract and summarize financial education knowledge points with the assistance of Bert-encoder + DBSCAN clustering, thereby constructing a financial education FAQ; manually label a small number of similar questions; generate a large number of similar questions from the labeled ones, check them manually, and construct a sentence-pair matching data set; train a sentence-pair matching model using an unsupervised model and a supervised deep learning model; and finally, after training, receive a question input by a user, identify the best-matching question, and output the corresponding answer. By fusing multiple trained or pre-trained models, extracting text, matching standard questions, and replying with corresponding answers, the invention solves the problems of tedious FAQ lookup for users and low efficiency of manual customer service.
Drawings
Fig. 1 is an overall system schematic diagram of a multi-model fused FAQ matching apparatus according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, the present invention provides the following technical solution: a multi-model fused FAQ matching method and device, the method comprising the following steps:
s1, obtaining a training text set of the problems to be processed, extracting and summarizing the finance education knowledge points by combining Bert-encoder + DBSCAn clustering assistance, thereby constructing a finance education FAQ and manually labeling a small number of similar problems. Wherein the training text set comprises text with the text length limit of 3-50, deleted expressions, numbers and mails.
In the present embodiment, an example question is: What is the difference between A shares and B shares?
Manually added similar questions: How do I distinguish A shares from B shares? Are A shares better, or B shares?
And S2, generating a large number of similar problems according to the marked small number of similar problems by using a similar problem generation module, manually checking and constructing a sentence pair matching data set.
In the present embodiment, the sentence-pair matching data set is shown in Table 1 below:
TABLE 1
Standard question | Similar question |
---|---|
What is the difference between A shares and B shares? | How do I distinguish A shares from B shares? |
I want to know whether A shares are the same as B shares. | Are A shares the same as B shares? |
Hello, may I ask what concept stocks are? | What are concept stocks? |
What is a concept stock? | What are concept stocks? |
What is the meaning of "concept stock"? | May I ask what a concept stock is? |
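The sentence-pair data set of S2 could be assembled as sketched below, under the assumption that similar questions labeled under the same standard question form positive pairs while questions from different standard questions form negative pairs; this pairing policy is an illustration, not stated in the patent.

```python
from itertools import combinations

# Standard question -> manually checked similar questions (as in Table 1).
faq = {
    "What is the difference between A shares and B shares?":
        ["How do I distinguish A shares from B shares?"],
    "What is a concept stock?":
        ["What are concept stocks?"],
}

pairs = []  # (sentence_a, sentence_b, label) rows of the matching data set
for std, sims in faq.items():
    for sim in sims:
        pairs.append((std, sim, 1))            # matching pair
for (a, _), (b, _) in combinations(faq.items(), 2):
    pairs.append((a, b, 0))                    # non-matching pair
```

The `pairs` list is then the supervision signal for the sentence-pair matching model in S4.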
And S3, constructing a pre-training model data set according to the sentence pair matching data set.
And S4, training a sentence-pair matching model using an unsupervised model and a supervised deep learning model. The unsupervised model comprises a WMD model and a SIF model; the supervised model comprises a BERT model, an ALBERT model and a RoBERTa model. Further, according to the models' predicted probabilities, a fusion scheme of linear regression and XGBoost is applied online: the linear regression component, weighted 40%, can be trained in real time to update the model, while the XGBoost component, weighted 60%, uses a model trained in advance, so that the XGBoost weight ensures the stability of the fused model and the linear regression weight ensures its flexibility.
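The 40/60 online fusion described above reduces to a weighted blend of the two probability outputs; the function name and interface below are illustrative assumptions.

```python
def fused_score(lr_prob: float, xgb_prob: float,
                w_lr: float = 0.4, w_xgb: float = 0.6) -> float:
    """Blend the real-time-trainable linear regression probability (40%,
    flexibility) with the pre-trained XGBoost probability (60%, stability)."""
    return w_lr * lr_prob + w_xgb * xgb_prob

# Example: linear regression predicts 0.9, XGBoost predicts 0.7.
score = fused_score(0.9, 0.7)  # 0.4 * 0.9 + 0.6 * 0.7 = 0.78
```

Because the XGBoost model is fixed, a drifting linear regression component can shift the fused score by at most 40%, which bounds the instability introduced by real-time updates.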
And S5, after the sentence-pair matching model is trained, receiving a question input by the user, feeding the question text into the sentence-pair matching model, identifying the question that best matches the input and its corresponding answer, outputting them, and replying with the answer to the user.
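The matching step in S5 amounts to scoring the user's question against every standard question and returning the answer of the best match. The sketch below substitutes a toy token-overlap score for the trained sentence-pair model; the score function and the FAQ contents are assumptions for illustration.

```python
def overlap_score(a: str, b: str) -> float:
    # Toy stand-in for the trained sentence-pair matching model's probability.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

def best_match(user_q: str, faq: dict, score_fn=overlap_score) -> str:
    # faq maps each standard question to its answer; return the answer of
    # the standard question scoring highest against the user's input.
    best_q = max(faq, key=lambda q: score_fn(user_q, q))
    return faq[best_q]

faq = {
    "what is a concept stock": "A concept stock is ...",
    "what is the difference between a shares and b shares": "A shares are ...",
}
answer = best_match("please tell me what a concept stock is", faq)
```

In the deployed system the same loop would call the fused model scores instead of `overlap_score`.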
The multi-model fused FAQ matching device disclosed by the invention comprises a financial education corpus database, a manual labeling module, a similar question generation module and an NLU module.
The financial education corpus database is used for storing pre-entered FAQ corpus data and generating the training text set. The training text set comprises texts whose length is limited to 3-50 characters, with emoticons, numbers and email text deleted.
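The construction of the training text set (length window 3-50 characters, with emoticons, numbers and email text removed) can be sketched with simple regular expressions; the exact patterns, in particular the crude email pattern and the emoji ranges, are assumptions.

```python
import re

EMAIL = re.compile(r"\S+@\S+")          # crude email pattern (assumption)
DIGITS = re.compile(r"\d+")
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")  # rough emoji ranges (assumption)

def clean(text: str):
    """Strip emails, digits, and emoticons; keep only texts whose cleaned
    length falls within the 3-50 character window, otherwise return None."""
    t = EMAIL.sub("", text)
    t = DIGITS.sub("", t)
    t = EMOJI.sub("", t).strip()
    return t if 3 <= len(t) <= 50 else None
```

Texts that survive `clean` form the training text set fed to the clustering and labeling steps.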
The manual labeling module is used for an operator to manually label a small number of similar questions in the training text set.
The similar question generation module comprises a similar question generation model, which generates a large number of similar questions from the small number of labeled similar questions; the generated questions undergo manual review, and a sentence-pair matching data set is constructed.
The NLU module is used for training the sentence-pair matching model, matching a question input by the user with the trained model, finding the best-matching question, outputting it together with its corresponding answer, and replying to the user. The NLU module comprises an unsupervised model and a supervised model: the unsupervised model comprises a WMD model and a SIF model, and the supervised model comprises a BERT model, an ALBERT model and a RoBERTa model.
The above modules are deployed in an online environment on two GPU servers equipped with RTX 6000 24 GB cards. The online environment optimizes service performance to handle high concurrency, keeping the response time within 300 ms. The optimizations include parallel computation in the BERT preprocessing stage, hot loading of trained models, and parallel computation across multiple models.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (6)
1. A multi-model fused FAQ matching method, characterized in that the method comprises the following steps:
S1, obtaining a training text set of the questions to be processed, and extracting and summarizing financial education knowledge points with the assistance of Bert-encoder + DBSCAN clustering, thereby constructing a financial education FAQ and manually labeling a small number of similar questions;
S2, generating a large number of similar questions from the small number of labeled similar questions using a similar question generation module, checking them manually, and constructing a sentence-pair matching data set;
S3, constructing a pre-training model data set;
S4, training a sentence-pair matching model using an unsupervised model and a supervised deep learning model;
and S5, after the sentence-pair matching model is trained, receiving a question input by the user, feeding the question text into the sentence-pair matching model, identifying the question that best matches the input and its corresponding answer, outputting them, and replying with the answer to the user.
2. The multi-model fused FAQ matching method according to claim 1, characterized in that: the training text set comprises texts whose length is limited to 3-50 characters, with emoticons, numbers and email text deleted.
3. The multi-model fused FAQ matching method according to claim 1, characterized in that: the unsupervised model comprises a WMD model and a SIF model;
the supervised model comprises a BERT model, an ALBERT model and a RoBERTa model.
4. A multi-model fused FAQ matching device, characterized in that it comprises:
a financial education corpus database for storing pre-entered FAQ corpus data and generating a training text set;
a manual labeling module for an operator to manually label a small number of similar questions in the training text set;
a similar question generation module for generating a large number of similar questions from the small number of labeled similar questions, performing manual review, and constructing a sentence-pair matching data set;
and an NLU module for training the sentence-pair matching model, matching a question input by the user with the trained model, finding the best-matching question, outputting it together with its corresponding answer, and replying to the user.
5. The multi-model fused FAQ matching device according to claim 4, characterized in that: the training text set comprises texts whose length is limited to 3-50 characters, with emoticons, numbers and email text deleted.
6. The multi-model fused FAQ matching device according to claim 4, characterized in that: the NLU module comprises an unsupervised model and a supervised model;
the unsupervised model comprises a WMD model and a SIF model;
the supervised model comprises a BERT model, an ALBERT model and a RoBERTa model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010852824.7A CN112101423A (en) | 2020-08-22 | 2020-08-22 | Multi-model fused FAQ matching method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010852824.7A CN112101423A (en) | 2020-08-22 | 2020-08-22 | Multi-model fused FAQ matching method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112101423A true CN112101423A (en) | 2020-12-18 |
Family
ID=73754202
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010852824.7A Pending CN112101423A (en) | 2020-08-22 | 2020-08-22 | Multi-model fused FAQ matching method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112101423A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113505207A (en) * | 2021-07-02 | 2021-10-15 | 中科苏州智能计算技术研究院 | Machine reading understanding method and system for financial public opinion research and report |
CN114117022A (en) * | 2022-01-26 | 2022-03-01 | 杭州远传新业科技有限公司 | FAQ similarity problem generation method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110347835A (en) * | 2019-07-11 | 2019-10-18 | 招商局金融科技有限公司 | Text Clustering Method, electronic device and storage medium |
CN110727779A (en) * | 2019-10-16 | 2020-01-24 | 信雅达系统工程股份有限公司 | Question-answering method and system based on multi-model fusion |
CN111191442A (en) * | 2019-12-30 | 2020-05-22 | 杭州远传新业科技有限公司 | Similar problem generation method, device, equipment and medium |
- 2020
- 2020-08-22 CN CN202010852824.7A patent/CN112101423A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110347835A (en) * | 2019-07-11 | 2019-10-18 | 招商局金融科技有限公司 | Text Clustering Method, electronic device and storage medium |
CN110727779A (en) * | 2019-10-16 | 2020-01-24 | 信雅达系统工程股份有限公司 | Question-answering method and system based on multi-model fusion |
CN111191442A (en) * | 2019-12-30 | 2020-05-22 | 杭州远传新业科技有限公司 | Similar problem generation method, device, equipment and medium |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113505207A (en) * | 2021-07-02 | 2021-10-15 | 中科苏州智能计算技术研究院 | Machine reading understanding method and system for financial public opinion research and report |
CN113505207B (en) * | 2021-07-02 | 2024-02-20 | 中科苏州智能计算技术研究院 | Machine reading understanding method and system for financial public opinion research report |
CN114117022A (en) * | 2022-01-26 | 2022-03-01 | 杭州远传新业科技有限公司 | FAQ similarity problem generation method and system |
CN114117022B (en) * | 2022-01-26 | 2022-05-06 | 杭州远传新业科技有限公司 | FAQ similarity problem generation method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US12010073B2 (en) | Systems and processes for operating and training a text-based chatbot | |
CN110555095B (en) | Man-machine conversation method and device | |
CN111708869B (en) | Processing method and device for man-machine conversation | |
CN117009490A (en) | Training method and device for generating large language model based on knowledge base feedback | |
US11487952B2 (en) | Method and terminal for generating a text based on self-encoding neural network, and medium | |
CN110175229B (en) | Method and system for on-line training based on natural language | |
CN110781681B (en) | Automatic first-class mathematic application problem solving method and system based on translation model | |
CN112101423A (en) | Multi-model fused FAQ matching method and device | |
CN113434688B (en) | Data processing method and device for public opinion classification model training | |
CN114676255A (en) | Text processing method, device, equipment, storage medium and computer program product | |
CN109508367A (en) | Automatically extract the method, on-line intelligence customer service system and electronic equipment of question and answer corpus | |
CN113326367A (en) | Task type dialogue method and system based on end-to-end text generation | |
CN112287085A (en) | Semantic matching method, system, device and storage medium | |
CN111523328A (en) | Intelligent customer service semantic processing method | |
CN114330318A (en) | Method and device for recognizing Chinese fine-grained entities in financial field | |
CN113051388A (en) | Intelligent question and answer method and device, electronic equipment and storage medium | |
Kaviya et al. | Artificial intelligence based farmer assistant chatbot | |
Sawant et al. | Analytical and Sentiment based text generative chatbot | |
CN111488448A (en) | Method and device for generating machine reading marking data | |
CN114579706B (en) | Automatic subjective question review method based on BERT neural network and multi-task learning | |
CN116362331A (en) | Knowledge point filling method based on man-machine cooperation construction knowledge graph | |
CN114610743A (en) | Structured query language statement processing method, system, device, and medium | |
CN113886521A (en) | Text relation automatic labeling method based on similar vocabulary | |
Liang et al. | Intelligent chat robot in digital campus based on deep learning | |
Buyrukoğlu et al. | A Novel Semi-Automated Chatbot Model: Providing Consistent Response of Students’ Email in Higher Education based on Case-Based Reasoning and Latent Semantic Analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |