CN110377713A - Method for improving context of question-answering system based on probability transition - Google Patents
Method for improving context of question-answering system based on probability transition
- Publication number
- CN110377713A (application CN201910641706.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- probability
- question answering
- transfer matrix
- answering system
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3346—Query execution using probabilistic model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0281—Customer communication at a business location, e.g. providing product or service information, consulting
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Finance (AREA)
- Mathematical Physics (AREA)
- Strategic Management (AREA)
- Artificial Intelligence (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Probability & Statistics with Applications (AREA)
- Entrepreneurship & Innovation (AREA)
- Game Theory and Decision Science (AREA)
- Human Computer Interaction (AREA)
- Economics (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Image Analysis (AREA)
- Machine Translation (AREA)
Abstract
A method for improving the context of a question-answering system based on probability transition, belonging to the technical field of data processing methods. User question data is processed with a classification algorithm and a probability transfer matrix A: (1) labeled data is preset for the system; (2) a user question is received and preprocessed to obtain processing data; (3) the labeled data is trained with the classification algorithm to obtain an intent classification model, and the labeled data is then passed to the probability transfer matrix A for training to obtain the initialized probability transfer matrix A; (4) the processing data is predicted to obtain the distribution P of predicted labels. The invention provides an intent recognition method that takes context into account. With this method, labeled data with context does not need to be prepared, which saves labor cost; and through the self-learning ability of the probability transfer matrix A, the accuracy of the whole system increases as the system is used over time.
Description
Technical field
The invention belongs to the technical field of data processing methods, and in particular relates to a method for improving the context of a question-answering system based on probability transition.
Background technique
In e-commerce, when a user (i.e. a buyer) shops online, the buyer consults customer service. In many automatic question-answering systems, the first thing to do is to identify and classify the buyer's intent. The common practice is to treat intent recognition as short-text classification, but this isolates the classification from the user's context (the dialog history). For example, a buyer says "160". The buyer may be confirming a price, or may be giving their height or weight. The specific intent has to be confirmed from the preceding dialog.
Some existing schemes do combine context for intent recognition. For example, five consecutive buyer questions are taken as input and classified with hierarchical attention, or the label of the preceding utterance is used as a feature and computed together with the current sentence. However, this kind of approach needs the user's continuous chat records, and annotators must pay attention to the context when labeling every piece of data, which brings extra labeling workload and easily causes labeling errors. Another defect of this kind of scheme is that the samples are easily imbalanced, so data must be supplemented manually.
Another kind of scheme is rule-based: the contexts that may occur are specified by hand-written rules. Obviously, this kind of scheme is time-consuming and laborious, and it is also hard to guarantee that all possibilities are enumerated.
Summary of the invention
The object of the invention is to overcome the defects and deficiencies mentioned above and to provide a method for improving the context of a question-answering system based on probability transition.
To solve the above technical problem, the following technical scheme is adopted:
A method for improving the context of a question-answering system based on probability transition, in which the question-answering system is improved by combining a classification algorithm with a probability transfer matrix A. The specific steps are as follows:
(1) preset a series of data for the system and label it manually to obtain labeled data;
(2) receive a user question and preprocess it to obtain processing data, which is convenient for the subsequent steps;
(3) process the labeled data as follows:
(3-1) train a TEXT CNN model on the obtained labeled data to obtain an intent classification model;
(3-2) input the labeled data into the probability transfer matrix for training to obtain the initialized probability transfer matrix A;
(4) predict the processing data of step (2) with the classification algorithm to obtain the distribution P of predicted labels;
(5) perform a series of calculations on the distribution P of predicted labels to screen out the sentences Q_i that are missing information, i = 1, 2, 3, ..., n;
(6) where the processing data of step (2) forms one complete conversation, process it with the probability transfer matrix A obtained in step (3-2) to obtain the exact sentences that are missing information and their corresponding context.
Further, the classification algorithm may be, for example, TEXT CNN, LSTM, BERT or SVM; specifically, any classification algorithm whose prediction result has a probability distribution is applicable to this system.
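As an illustration of what "a prediction result with a probability distribution" can look like, the following minimal sketch converts raw classifier scores into a distribution P with a softmax. The softmax, the intent label names and the score values are assumptions made for the example; the source does not say how the classifier produces its distribution.

```python
import math

def softmax(scores):
    """Turn raw classifier scores into a probability distribution."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical intent labels and raw scores for the buyer message "160".
labels = ["confirm_price", "give_height", "give_weight"]
P = softmax([1.2, 1.0, 0.9])
```

Any model that yields such a vector over the intent labels, whether a neural network or an SVM with calibrated scores, could feed the later steps.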
Further, the series of calculations in step (5) is specifically:
(5-1) calculate the mean M of the distribution P with the mean formula,
$$M = \frac{P_1 + P_2 + \cdots + P_i}{i}$$
where P_1, P_2, ..., P_i are the specific values and i is the number of values in this group of data;
(5-2) then calculate the variance with the variance formula, for use in screening,
$$s^2 = \frac{(P_1 - M)^2 + (P_2 - M)^2 + \cdots + (P_i - M)^2}{i}$$
where i is the number of values, M is the mean and s^2 is the variance; the smaller the variance s^2, the harder it is to judge the intent expressed by that item of the processing data.
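The screening in step (5) can be sketched as follows. This is a minimal illustration, assuming the population variance (dividing by i) and a hypothetical cut-off `threshold`; the source states only that a smaller variance means the intent is harder to judge, and gives no threshold value.

```python
def mean_and_variance(P):
    """Return the mean M and variance s^2 of a label distribution P."""
    i = len(P)
    M = sum(P) / i
    s2 = sum((p - M) ** 2 for p in P) / i  # dividing by i is an assumption
    return M, s2

def is_ambiguous(P, threshold=0.01):
    """Flag P as missing information when its variance is small.

    The threshold value is hypothetical and would need tuning.
    """
    _, s2 = mean_and_variance(P)
    return s2 < threshold

confident = [0.90, 0.05, 0.05]  # one label clearly dominates
unclear = [0.36, 0.33, 0.31]    # nearly flat: intent is hard to judge
```

Under these assumptions `is_ambiguous(unclear)` is true and `is_ambiguous(confident)` is false, so only the second message would be passed on to step (6).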
Further, the calculation with the probability transfer matrix A in step (6) is specifically:
(6-1) with the processing data there are n dialogue items; each sentence Q_i judged in step (4) to be missing information has a distribution P_i. Combined with the random probability transfer matrix A initialized by the system itself, an objective function is constructed: $f = \lVert Q_{i-1} - A Q_i \rVert$, where the n items of processing data are a data set with a complete dialogue scene. The probability transfer matrix A thereby learns the prediction results of the current processing data, so that the sentences that are missing information and their corresponding context are determined.
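One way to read the objective above is as a fitting problem over consecutive utterances of one conversation. The sketch below is an illustration under stated assumptions, not the patented procedure: each Q_i is represented by its predicted label distribution, A starts from random values, and plain gradient descent minimises the mean of the squared norms (a smooth surrogate chosen here for simplicity). The function names, step count and learning rate are hypothetical.

```python
import random

def mat_vec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def objective(A, dialogue):
    """Sum of f = ||Q_{i-1} - A Q_i|| over consecutive utterances."""
    total = 0.0
    for prev, cur in zip(dialogue, dialogue[1:]):
        pred = mat_vec(A, cur)
        total += sum((p - q) ** 2 for p, q in zip(prev, pred)) ** 0.5
    return total

def fit_transfer_matrix(dialogue, steps=500, lr=0.5, seed=0):
    """Fit A by gradient descent on the mean squared residual."""
    k = len(dialogue[0])
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(k)] for _ in range(k)]  # random init
    pairs = list(zip(dialogue, dialogue[1:]))
    for _ in range(steps):
        grad = [[0.0] * k for _ in range(k)]
        for prev, cur in pairs:
            pred = mat_vec(A, cur)
            for a in range(k):
                r = prev[a] - pred[a]  # residual of Q_{i-1} - A Q_i
                for b in range(k):
                    grad[a][b] += -2.0 * r * cur[b] / len(pairs)
        for a in range(k):
            for b in range(k):
                A[a][b] -= lr * grad[a][b]
    return A

# Hypothetical label distributions for a four-utterance conversation.
dialogue = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6], [0.6, 0.3, 0.1]]
A = fit_transfer_matrix(dialogue)
```

Because the fit uses only the classifier's own predictions on a complete conversation, it needs no context-annotated labels, which matches the benefit the text claims for the method.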
By adopting the above scheme, the following beneficial effects are obtained:
(1) The invention provides an intent recognition method that takes context into account. With this method, labeled data with context does not need to be prepared, which saves labor cost.
(2) Through the self-learning ability of the probability transfer matrix A, the accuracy of the whole system increases as the system is used over time.
Brief description of the drawings
Fig. 1 is a flow diagram of the whole system.
Specific embodiment
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings in the embodiments.
A method for improving the context of a question-answering system based on probability transition, in which the question-answering system is improved by combining a classification algorithm with a probability transfer matrix A. The specific steps are as follows:
(1) preset a series of data for the system and label it manually to obtain labeled data;
(2) receive a user question and preprocess it to obtain processing data, which is convenient for the subsequent steps;
(3) process the labeled data as follows:
(3-1) train a TEXT CNN model on the obtained labeled data to obtain an intent classification model;
(3-2) input the labeled data into the probability transfer matrix for training to obtain the initialized probability transfer matrix A;
(4) predict the processing data of step (2) with the classification algorithm to obtain the distribution P of predicted labels;
(5) perform a series of calculations on the distribution P of predicted labels to screen out the sentences Q_i that are missing information, i = 1, 2, 3, ..., n. The series of calculations is specifically:
(5-1) calculate the mean M of the distribution P with the mean formula,
$$M = \frac{P_1 + P_2 + \cdots + P_i}{i}$$
where P_1, P_2, ..., P_i are the specific values and i is the number of values in this group of data;
(5-2) then calculate the variance with the variance formula, for use in screening,
$$s^2 = \frac{(P_1 - M)^2 + (P_2 - M)^2 + \cdots + (P_i - M)^2}{i}$$
where i is the number of values, M is the mean and s^2 is the variance; the smaller the variance s^2, the harder it is to judge the intent expressed by that item of the processing data.
(6) where the processing data of step (2) forms one complete conversation, process it with the probability transfer matrix A obtained in step (3-2) to obtain the exact sentences that are missing information and their corresponding context. The calculation with the probability transfer matrix A is specifically as follows:
(6-1) with the processing data there are n dialogue items; each sentence Q_i judged in step (4) to be missing information has a distribution P_i. Combined with the random probability transfer matrix A initialized by the system itself, an objective function is constructed: $f = \lVert Q_{i-1} - A Q_i \rVert$. The probability transfer matrix A thereby learns the prediction results of the current processing data, so that the sentences that are missing information and their corresponding context are determined.
Preferably, the classification algorithm may be, for example, TEXT CNN, LSTM, BERT or SVM; specifically, any classification algorithm whose prediction result has a probability distribution is applicable to this system.
Preferably, the n items of processing data in step (6-1) are a data set with a complete dialogue scene.
The working principle of the system: as shown in Fig. 1, TEXT CNN is first trained on the labeled data to obtain the intent classification model. The trained TEXT CNN then receives a user question and predicts the intent label. If the probability distribution of the predicted label is smooth, the predicted label is taken directly. If the probability distribution of the predicted label is not smooth, the initialized probability transfer matrix A is trained on the unlabeled data (i.e. the user question data of a complete conversation), combined with the label probability distribution P that TEXT CNN obtains for each user question: the obtained label probability distribution P is multiplied by the probability transfer matrix A to obtain a new probability distribution, and the intent label corresponding to the top-1 entry of the new probability distribution is output, completing the flow of the whole system.
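The final decision described above can be sketched as follows. This is a minimal illustration under assumptions: whether a question needs context is passed in as a flag (the screening itself is described in step (5)), the product is taken as A times P, and the result is clipped and renormalised so that it remains a distribution. The clipping, the renormalisation, the label names and the matrix values are all hypothetical.

```python
def top1(labels, P):
    """Return the label with the highest probability."""
    return labels[max(range(len(P)), key=P.__getitem__)]

def refine(P, A):
    """Multiply P by the transfer matrix A and renormalise the result."""
    raw = [max(0.0, sum(a * p for a, p in zip(row, P))) for row in A]
    total = sum(raw)
    return [x / total for x in raw] if total > 0 else list(P)

def predict_intent(labels, P, A, needs_context):
    """Output the classifier's label, or the label after applying A."""
    if not needs_context:
        return top1(labels, P)
    return top1(labels, refine(P, A))

labels = ["confirm_price", "give_height"]
P = [0.52, 0.48]              # classifier output for the message "160"
A = [[0.2, 0.1], [0.8, 0.9]]  # hypothetical learned transfer matrix
```

With these example values the classifier alone picks `confirm_price`, while the distribution after multiplying by A picks `give_height`, showing how the transfer matrix can overturn a near-tie.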
The invention has been described with reference to the embodiments. Without departing from its principle, several modifications and improvements may also be made. It should be noted that all technical solutions obtained by equivalent substitution or equivalent transformation fall within the protection scope of the invention.
Claims (7)
1. A method for improving the context of a question-answering system based on probability transition, in which user question data is processed with a classification algorithm and a probability transfer matrix A, characterized in that the specific processing steps are as follows:
(1) preset labeled data for the system;
(2) receive a user question and preprocess it to obtain processing data;
(3) train the labeled data with the classification algorithm to obtain an intent classification model; then pass the labeled data to the probability transfer matrix A for training to obtain the initialized probability transfer matrix A;
(4) predict the processing data to obtain the distribution P of predicted labels;
(5) perform a series of calculations on the distribution P of predicted labels to screen out the sentences Q_i that are missing information, i = 1, 2, 3, ..., n;
(6) for the data set within the processing data that belongs to one complete conversation, perform the calculation with the initialized probability transfer matrix A to obtain the exact sentences that are missing information and their corresponding context.
2. The method for improving the context of a question-answering system based on probability transition according to claim 1, characterized in that the labeled data is a series of manually labeled questions.
3. The method for improving the context of a question-answering system based on probability transition according to claim 1, characterized in that the processing data covers the course of one complete conversation.
4. The method for improving the context of a question-answering system based on probability transition according to claim 1, characterized in that the classification algorithm includes TEXT CNN, LSTM, BERT and SVM; specifically, any classification algorithm whose prediction result has a probability distribution is applicable to this system.
5. The method for improving the context of a question-answering system based on probability transition according to claim 1, characterized in that the series of calculations in step (5) is specifically:
(5-1) calculate the mean M of the distribution P with the mean formula,
$$M = \frac{P_1 + P_2 + \cdots + P_i}{i}$$
where P_1, P_2, ..., P_i are the specific values and i is the number of values in this group of data;
(5-2) then calculate the variance with the variance formula, for use in screening,
$$s^2 = \frac{(P_1 - M)^2 + (P_2 - M)^2 + \cdots + (P_i - M)^2}{i}$$
where i is the number of values, M is the mean and s^2 is the variance; the smaller the variance s^2, the harder it is to judge the intent expressed by that item of the processing data.
6. The method for improving the context of a question-answering system based on probability transition according to claim 1, characterized in that the calculation with the probability transfer matrix A in step (6) is specifically:
(6-1) with the processing data there are n dialogue items; each sentence Q_i judged in step (4) to be missing information has a distribution P_i. Combined with the random probability transfer matrix A initialized by the system itself, an objective function is constructed: $f = \lVert Q_{i-1} - A Q_i \rVert$. The probability transfer matrix A thereby learns the prediction results of the current processing data, so that the sentences that are missing information and their corresponding context are determined.
7. The method for improving the context of a question-answering system based on probability transition according to claim 6, characterized in that the n items of processing data combined in step (6-1) are a data set with a complete dialogue scene.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910641706.9A CN110377713B (en) | 2019-07-16 | 2019-07-16 | Method for improving context of question-answering system based on probability transition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910641706.9A CN110377713B (en) | 2019-07-16 | 2019-07-16 | Method for improving context of question-answering system based on probability transition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110377713A (en) | 2019-10-25 |
CN110377713B (en) | 2023-09-15 |
Family
ID=68253502
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910641706.9A Active CN110377713B (en) | 2019-07-16 | 2019-07-16 | Method for improving context of question-answering system based on probability transition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110377713B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111694952A (en) * | 2020-04-16 | 2020-09-22 | 国家计算机网络与信息安全管理中心 | Big data analysis model system based on microblog and implementation method thereof |
CN115018656A (en) * | 2022-08-08 | 2022-09-06 | 太平金融科技服务(上海)有限公司深圳分公司 | Risk identification method, and training method, device and equipment of risk identification model |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106649694A (en) * | 2016-12-19 | 2017-05-10 | 北京云知声信息技术有限公司 | Method and device for identifying user's intention in voice interaction |
CN107679234A (en) * | 2017-10-24 | 2018-02-09 | 上海携程国际旅行社有限公司 | Customer service information providing method, device, electronic equipment, storage medium |
US20180067981A1 (en) * | 2016-09-06 | 2018-03-08 | International Business Machines Corporation | Automatic Detection and Cleansing of Erroneous Concepts in an Aggregated Knowledge Base |
CN108897896A (en) * | 2018-07-13 | 2018-11-27 | 深圳追科技有限公司 | Keyword abstraction method based on intensified learning |
US20200334553A1 (en) * | 2019-04-22 | 2020-10-22 | Electronics And Telecommunications Research Institute | Apparatus and method for predicting error of annotation |
WO2022095573A1 (en) * | 2020-11-09 | 2022-05-12 | 西安交通大学 | Community question answering website answer sorting method and system combined with active learning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108829662A (en) * | 2018-05-10 | 2018-11-16 | 浙江大学 | A kind of conversation activity recognition methods and system based on condition random field structuring attention network |
-
2019
- 2019-07-16 CN CN201910641706.9A patent/CN110377713B/en active Active
Non-Patent Citations (2)
Title |
---|
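Zhou Xiaoqiang (周小强) et al.: "Relation structure system and annotation for interactive question answering" (交互式问答的关系结构体系及标注), Journal of Chinese Information Processing (《中文信息学报》), no. 05, 15 May 2018 (2018-05-15) * |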
Also Published As
Publication number | Publication date |
---|---|
CN110377713B (en) | 2023-09-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105677831B (en) | Method and device for determining recommended merchants | |
CN107742107A (en) | Facial image sorting technique, device and server | |
CN109740155A (en) | A kind of customer service system artificial intelligence quality inspection rule self concludes the method and system of model | |
CN110377713A (en) | Method for improving context of question-answering system based on probability transition | |
CN107306306A (en) | Communicating number processing method and processing device | |
CN106846082A (en) | Tourism cold start-up consumer products commending system and method based on hardware information | |
CN109064023A (en) | A kind of method and apparatus of manpower potency management system | |
CN113780342A (en) | Intelligent detection method and device based on self-supervision pre-training and robot | |
CN114882228B (en) | Fitness place layout optimization method based on knowledge distillation | |
CN109272312A (en) | Method and apparatus for transaction risk detecting real-time | |
CN108876053A (en) | Travelling route Intelligent planning method and system | |
CN115994767A (en) | Product supply chain management system based on electronic commerce | |
CN106897282A (en) | The sorting technique and equipment of a kind of customer group | |
CN107145758A (en) | A kind of outgoing method of servicing and device of accompanying and attending to based on domestic robot | |
CN110704803A (en) | Target object evaluation value calculation method and device, storage medium and electronic device | |
CN111368131B (en) | User relationship identification method and device, electronic equipment and storage medium | |
CN116151840B (en) | User service data intelligent management system and method based on big data | |
CN115953080B (en) | Engineer service level determination method, apparatus and storage medium | |
CN109214400A (en) | Classifier training method, apparatus, equipment and computer readable storage medium | |
CN109858900A (en) | A kind of payment information method for pushing, device and terminal device | |
CN116757855A (en) | Intelligent insurance service method, device, equipment and storage medium | |
CN116859842A (en) | Chemical production line safety evaluation system | |
CN110532394A (en) | The processing method and system of Order Remarks text | |
CN107169854A (en) | A kind of method and device of data processing | |
CN116089578A (en) | Automatic labeling method, system and storage medium for intelligent question-answering data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 20230331. Address after: 104058, No. 2-10, No. 311 Huangpu Avenue Middle, Tianhe District, Guangzhou City, Guangdong Province, 510000. Applicant after: Guangzhou Tanyu Technology Co.,Ltd. Address before: 601-5, 1382 Wenyi West Road, Cangqian street, Yuhang District, Hangzhou City, Zhejiang Province, 310012. Applicant before: Hangzhou Weier Network Technology Co.,Ltd. |
GR01 | Patent grant | ||