CN112101423A - Multi-model fused FAQ matching method and device - Google Patents
- Publication number
- CN112101423A (application CN202010852824.7A)
- Authority
- CN
- China
- Prior art keywords
- model
- matching
- faq
- training
- sentence pair
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
Abstract
The invention is applicable to the technical field of natural language processing and provides a multi-model fused FAQ matching method and device. The method sequentially obtains a training text set of the questions to be processed; extracts and summarizes financial education knowledge points with the assistance of Bert-encoder + DBSCAN clustering, thereby constructing a financial education FAQ; manually labels a small number of similar questions; generates a large number of similar questions from the labeled ones and checks them manually to construct a sentence-pair matching data set; trains a sentence-pair matching model using an unsupervised model and a supervised deep learning model; and, after training is finished, receives a question input by a user, identifies the question that best matches the input, and outputs the corresponding answer. By fusing multiple trained or pre-trained models, extracting text, matching standard questions, and replying with corresponding answers, the invention solves the problems of tedious FAQ lookup for users and low efficiency of manual customer service.
Description
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to a multi-model fused FAQ matching method and device.
Background
Practical applications of artificial intelligence in the financial education industry remain few. In particular, FAQ corpora in this industry are private, which makes them inconvenient to open-source and limits development.
In recent years the NLP field has advanced rapidly, yet these advances have not been applied to financial education with good results, and even state-of-the-art FAQ sentence-pair matching algorithms have not been successfully deployed in this field.
An intelligent FAQ has difficulty achieving good results when there are many knowledge points and the question semantics are extremely similar.
Disclosure of Invention
The invention provides a multi-model fused FAQ matching method and device, and aims to solve the problems in the prior art.
The invention is realized as follows: a multi-model fused FAQ matching method and device, the method comprising the following steps:
S1, obtaining a training text set of the questions to be processed, and extracting and summarizing financial education knowledge points with the assistance of Bert-encoder + DBSCAN clustering, thereby constructing a financial education FAQ and manually labeling a small number of similar questions;
S2, generating a large number of similar questions from the small number of labeled similar questions using a similar question generation module, checking them manually, and constructing a sentence-pair matching data set;
S3, constructing a pre-training model data set;
S4, training a sentence-pair matching model using an unsupervised model and a supervised deep learning model;
and S5, after the sentence-pair matching model is trained, receiving a question input by the user, feeding the question text into the sentence-pair matching model, identifying the question that best matches the input and its corresponding answer, outputting them, and replying with the answer to the user.
Preferably, the training text set comprises texts whose length is limited to 3-50 characters, with emoticons, numbers and email text deleted.
Preferably, the unsupervised model comprises a WMD model and a SIF model;
the supervised model comprises a BERT model, an ALBERT model and a RoBERTa model.
The invention also provides a multi-model fused FAQ matching device, which comprises:
a financial education corpus database for storing pre-entered FAQ corpus data and generating the training text set;
a manual labeling module for an operator to manually label a small number of similar questions in the training text set;
a similar question generation module for generating a large number of similar questions from the small number of labeled similar questions, performing manual review, and constructing a sentence-pair matching data set;
and an NLU module for training the sentence-pair matching model, matching a question input by the user with the trained model, finding the best-matching question, outputting it together with its corresponding answer, and replying to the user.
Preferably, the training text set comprises texts whose length is limited to 3-50 characters, with emoticons, numbers and email text deleted.
Preferably, the NLU module comprises an unsupervised model and a supervised model;
the unsupervised model comprises a WMD model and a SIF model;
the supervised model comprises a BERT model, an ALBERT model and a RoBERTa model.
Compared with the prior art, the invention has the following beneficial effects. The multi-model fused FAQ matching method and device sequentially obtain a training text set of the questions to be processed; extract and summarize financial education knowledge points with the assistance of Bert-encoder + DBSCAN clustering, thereby constructing a financial education FAQ; manually label a small number of similar questions; generate a large number of similar questions from the labeled ones, check them manually, and construct a sentence-pair matching data set; train a sentence-pair matching model using an unsupervised model and a supervised deep learning model; and finally, after training, receive a question input by a user, identify the best-matching question, and output the corresponding answer. By fusing multiple trained or pre-trained models, extracting text, matching standard questions, and replying with corresponding answers, the invention solves the problems of tedious FAQ lookup for users and low efficiency of manual customer service.
Drawings
Fig. 1 is an overall system schematic diagram of a multi-model fused FAQ matching apparatus according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, the present invention provides the following technical solution: a multi-model fused FAQ matching method and device, the method comprising the following steps:
s1, obtaining a training text set of the problems to be processed, extracting and summarizing the finance education knowledge points by combining Bert-encoder + DBSCAn clustering assistance, thereby constructing a finance education FAQ and manually labeling a small number of similar problems. Wherein the training text set comprises text with the text length limit of 3-50, deleted expressions, numbers and mails.
In the present embodiment, an example question is: What is the difference between A shares and B shares?
Manually added similar questions: How do I distinguish A shares from B shares? Are A shares better, or B shares?
And S2, generating a large number of similar problems according to the marked small number of similar problems by using a similar problem generation module, manually checking and constructing a sentence pair matching data set.
In the present embodiment, the sentence-pair matching data set is shown in Table 1 below:
TABLE 1
Standard question | Similar question |
---|---|
What is the difference between A shares and B shares? | How do I distinguish A shares from B shares? |
I want to know whether A shares are the same as B shares. | Are A shares the same as B shares? |
Hello, may I ask what concept stocks are? | What are concept stocks? |
What is a concept stock? | What are concept stocks? |
What is the meaning of "concept stock"? | May I ask what a concept stock is? |
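The sentence-pair data set of S2 could be assembled as sketched below, under the assumption that similar questions labeled under the same standard question form positive pairs while questions from different standard questions form negative pairs; this pairing policy is an illustration, not stated in the patent.

```python
from itertools import combinations

# Standard question -> manually checked similar questions (as in Table 1).
faq = {
    "What is the difference between A shares and B shares?":
        ["How do I distinguish A shares from B shares?"],
    "What is a concept stock?":
        ["What are concept stocks?"],
}

pairs = []  # (sentence_a, sentence_b, label) rows of the matching data set
for std, sims in faq.items():
    for sim in sims:
        pairs.append((std, sim, 1))            # matching pair
for (a, _), (b, _) in combinations(faq.items(), 2):
    pairs.append((a, b, 0))                    # non-matching pair
```

The `pairs` list is then the supervision signal for the sentence-pair matching model in S4.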
And S3, constructing a pre-training model data set according to the sentence pair matching data set.
And S4, training a sentence-pair matching model using an unsupervised model and a supervised deep learning model. The unsupervised model comprises a WMD model and a SIF model; the supervised model comprises a BERT model, an ALBERT model and a RoBERTa model. Further, according to the models' predicted probabilities, a fusion scheme of linear regression and XGBoost is applied online: the linear regression component, weighted 40%, can be trained in real time to update the model, while the XGBoost component, weighted 60%, uses a model trained in advance, so that the XGBoost weight ensures the stability of the fused model and the linear regression weight ensures its flexibility.
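The 40/60 online fusion described above reduces to a weighted blend of the two probability outputs; the function name and interface below are illustrative assumptions.

```python
def fused_score(lr_prob: float, xgb_prob: float,
                w_lr: float = 0.4, w_xgb: float = 0.6) -> float:
    """Blend the real-time-trainable linear regression probability (40%,
    flexibility) with the pre-trained XGBoost probability (60%, stability)."""
    return w_lr * lr_prob + w_xgb * xgb_prob

# Example: linear regression predicts 0.9, XGBoost predicts 0.7.
score = fused_score(0.9, 0.7)  # 0.4 * 0.9 + 0.6 * 0.7 = 0.78
```

Because the XGBoost model is fixed, a drifting linear regression component can shift the fused score by at most 40%, which bounds the instability introduced by real-time updates.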
And S5, after the sentence-pair matching model is trained, receiving a question input by the user, feeding the question text into the sentence-pair matching model, identifying the question that best matches the input and its corresponding answer, outputting them, and replying with the answer to the user.
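The matching step in S5 amounts to scoring the user's question against every standard question and returning the answer of the best match. The sketch below substitutes a toy token-overlap score for the trained sentence-pair model; the score function and the FAQ contents are assumptions for illustration.

```python
def overlap_score(a: str, b: str) -> float:
    # Toy stand-in for the trained sentence-pair matching model's probability.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

def best_match(user_q: str, faq: dict, score_fn=overlap_score) -> str:
    # faq maps each standard question to its answer; return the answer of
    # the standard question scoring highest against the user's input.
    best_q = max(faq, key=lambda q: score_fn(user_q, q))
    return faq[best_q]

faq = {
    "what is a concept stock": "A concept stock is ...",
    "what is the difference between a shares and b shares": "A shares are ...",
}
answer = best_match("please tell me what a concept stock is", faq)
```

In the deployed system the same loop would call the fused model scores instead of `overlap_score`.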
The multi-model fused FAQ matching device disclosed by the invention comprises a financial education corpus database, a manual labeling module, a similar question generation module and an NLU module.
The financial education corpus database is used for storing pre-entered FAQ corpus data and generating the training text set. The training text set comprises texts whose length is limited to 3-50 characters, with emoticons, numbers and email text deleted.
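The construction of the training text set (length window 3-50 characters, with emoticons, numbers and email text removed) can be sketched with simple regular expressions; the exact patterns, in particular the crude email pattern and the emoji ranges, are assumptions.

```python
import re

EMAIL = re.compile(r"\S+@\S+")          # crude email pattern (assumption)
DIGITS = re.compile(r"\d+")
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")  # rough emoji ranges (assumption)

def clean(text: str):
    """Strip emails, digits, and emoticons; keep only texts whose cleaned
    length falls within the 3-50 character window, otherwise return None."""
    t = EMAIL.sub("", text)
    t = DIGITS.sub("", t)
    t = EMOJI.sub("", t).strip()
    return t if 3 <= len(t) <= 50 else None
```

Texts that survive `clean` form the training text set fed to the clustering and labeling steps.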
The manual labeling module is used for an operator to manually label a small number of similar questions in the training text set.
The similar question generation module comprises a similar question generation model, which generates a large number of similar questions from the small number of labeled similar questions; the generated questions undergo manual review, and a sentence-pair matching data set is constructed.
The NLU module is used for training the sentence-pair matching model, matching a question input by the user with the trained model, finding the best-matching question, outputting it together with its corresponding answer, and replying to the user. The NLU module comprises an unsupervised model and a supervised model: the unsupervised model comprises a WMD model and a SIF model, and the supervised model comprises a BERT model, an ALBERT model and a RoBERTa model.
The above modules are deployed in an online environment on two GPU servers equipped with RTX 6000 24 GB cards. The online environment optimizes service performance to handle high concurrency, keeping the response time within 300 ms. The optimizations include parallel computation in the BERT preprocessing stage, hot loading of trained models, and parallel computation across multiple models.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (6)
1. A multi-model fused FAQ matching method, characterized in that the method comprises the following steps:
S1, obtaining a training text set of the questions to be processed, and extracting and summarizing financial education knowledge points with the assistance of Bert-encoder + DBSCAN clustering, thereby constructing a financial education FAQ and manually labeling a small number of similar questions;
S2, generating a large number of similar questions from the small number of labeled similar questions using a similar question generation module, checking them manually, and constructing a sentence-pair matching data set;
S3, constructing a pre-training model data set;
S4, training a sentence-pair matching model using an unsupervised model and a supervised deep learning model;
and S5, after the sentence-pair matching model is trained, receiving a question input by the user, feeding the question text into the sentence-pair matching model, identifying the question that best matches the input and its corresponding answer, outputting them, and replying with the answer to the user.
2. The multi-model fused FAQ matching method according to claim 1, characterized in that: the training text set comprises texts whose length is limited to 3-50 characters, with emoticons, numbers and email text deleted.
3. The multi-model fused FAQ matching method according to claim 1, characterized in that: the unsupervised model comprises a WMD model and a SIF model;
the supervised model comprises a BERT model, an ALBERT model and a RoBERTa model.
4. A multi-model fused FAQ matching device, characterized in that it comprises:
a financial education corpus database for storing pre-entered FAQ corpus data and generating a training text set;
a manual labeling module for an operator to manually label a small number of similar questions in the training text set;
a similar question generation module for generating a large number of similar questions from the small number of labeled similar questions, performing manual review, and constructing a sentence-pair matching data set;
and an NLU module for training the sentence-pair matching model, matching a question input by the user with the trained model, finding the best-matching question, outputting it together with its corresponding answer, and replying to the user.
5. The multi-model fused FAQ matching device according to claim 4, characterized in that: the training text set comprises texts whose length is limited to 3-50 characters, with emoticons, numbers and email text deleted.
6. The multi-model fused FAQ matching device according to claim 4, characterized in that: the NLU module comprises an unsupervised model and a supervised model;
the unsupervised model comprises a WMD model and a SIF model;
the supervised model comprises a BERT model, an ALBERT model and a RoBERTa model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010852824.7A CN112101423A (en) | 2020-08-22 | 2020-08-22 | Multi-model fused FAQ matching method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010852824.7A CN112101423A (en) | 2020-08-22 | 2020-08-22 | Multi-model fused FAQ matching method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112101423A true CN112101423A (en) | 2020-12-18 |
Family
ID=73754202
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010852824.7A Pending CN112101423A (en) | 2020-08-22 | 2020-08-22 | Multi-model fused FAQ matching method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112101423A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113505207A (en) * | 2021-07-02 | 2021-10-15 | 中科苏州智能计算技术研究院 | Machine reading understanding method and system for financial public opinion research and report |
CN114117022A (en) * | 2022-01-26 | 2022-03-01 | 杭州远传新业科技有限公司 | FAQ similarity problem generation method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110347835A (en) * | 2019-07-11 | 2019-10-18 | 招商局金融科技有限公司 | Text Clustering Method, electronic device and storage medium |
CN110727779A (en) * | 2019-10-16 | 2020-01-24 | 信雅达系统工程股份有限公司 | Question-answering method and system based on multi-model fusion |
CN111191442A (en) * | 2019-12-30 | 2020-05-22 | 杭州远传新业科技有限公司 | Similar problem generation method, device, equipment and medium |
- 2020
- 2020-08-22 CN CN202010852824.7A patent/CN112101423A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110347835A (en) * | 2019-07-11 | 2019-10-18 | 招商局金融科技有限公司 | Text Clustering Method, electronic device and storage medium |
CN110727779A (en) * | 2019-10-16 | 2020-01-24 | 信雅达系统工程股份有限公司 | Question-answering method and system based on multi-model fusion |
CN111191442A (en) * | 2019-12-30 | 2020-05-22 | 杭州远传新业科技有限公司 | Similar problem generation method, device, equipment and medium |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113505207A (en) * | 2021-07-02 | 2021-10-15 | 中科苏州智能计算技术研究院 | Machine reading understanding method and system for financial public opinion research and report |
CN113505207B (en) * | 2021-07-02 | 2024-02-20 | 中科苏州智能计算技术研究院 | Machine reading understanding method and system for financial public opinion research report |
CN114117022A (en) * | 2022-01-26 | 2022-03-01 | 杭州远传新业科技有限公司 | FAQ similarity problem generation method and system |
CN114117022B (en) * | 2022-01-26 | 2022-05-06 | 杭州远传新业科技有限公司 | FAQ similarity problem generation method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US12010073B2 (en) | Systems and processes for operating and training a text-based chatbot | |
CN110555095B (en) | Man-machine conversation method and device | |
CN111708869B (en) | Processing method and device for man-machine conversation | |
CN117009490A (en) | Training method and device for generating large language model based on knowledge base feedback | |
US11487952B2 (en) | Method and terminal for generating a text based on self-encoding neural network, and medium | |
CN110175229B (en) | Method and system for on-line training based on natural language | |
CN110781681B (en) | Automatic first-class mathematic application problem solving method and system based on translation model | |
CN112101423A (en) | Multi-model fused FAQ matching method and device | |
CN113434688B (en) | Data processing method and device for public opinion classification model training | |
CN114676255A (en) | Text processing method, device, equipment, storage medium and computer program product | |
CN109508367A (en) | Automatically extract the method, on-line intelligence customer service system and electronic equipment of question and answer corpus | |
CN113326367A (en) | Task type dialogue method and system based on end-to-end text generation | |
CN112287085A (en) | Semantic matching method, system, device and storage medium | |
CN111523328A (en) | Intelligent customer service semantic processing method | |
CN114330318A (en) | Method and device for recognizing Chinese fine-grained entities in financial field | |
CN113051388A (en) | Intelligent question and answer method and device, electronic equipment and storage medium | |
Kaviya et al. | Artificial intelligence based farmer assistant chatbot | |
Sawant et al. | Analytical and Sentiment based text generative chatbot | |
CN111488448A (en) | Method and device for generating machine reading marking data | |
CN114579706B (en) | Automatic subjective question review method based on BERT neural network and multi-task learning | |
CN116362331A (en) | Knowledge point filling method based on man-machine cooperation construction knowledge graph | |
CN114610743A (en) | Structured query language statement processing method, system, device, and medium | |
CN113886521A (en) | Text relation automatic labeling method based on similar vocabulary | |
Liang et al. | Intelligent chat robot in digital campus based on deep learning | |
Buyrukoğlu et al. | A Novel Semi-Automated Chatbot Model: Providing Consistent Response of Students’ Email in Higher Education based on Case-Based Reasoning and Latent Semantic Analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |