CN110377713A - Method for improving context of question-answering system based on probability transition - Google Patents

Method for improving context of question-answering system based on probability transition

Info

Publication number
CN110377713A
Authority
CN
China
Prior art keywords
data
probability
question answering
transfer matrix
answering system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910641706.9A
Other languages
Chinese (zh)
Other versions
CN110377713B (en)
Inventor
谢铁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Tanyu Technology Co ltd
Original Assignee
Hangzhou Weier Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Weier Network Technology Co Ltd
Priority to CN201910641706.9A
Publication of CN110377713A
Application granted
Publication of CN110377713B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0281 Customer communication at a business location, e.g. providing product or service information, consulting

Abstract

A method for improving the context of a question answering system based on probability transition, belonging to the technical field of data processing methods. User question data is processed using a classification algorithm and a probability transition matrix A: (1) labeled data is preset for the system; (2) a user question is received and preprocessed to obtain processing data; (3) the classification algorithm is trained on the labeled data to obtain an intent classification model, and the labeled data is then fed to the probability transition matrix A for training to obtain the initialized probability transition matrix A; (4) predictions are made on the processing data to obtain the distribution P of predicted labels. The invention provides a context-aware intent recognition method that requires no labeled data with context, which saves labor cost. Because the probability transition matrix A learns by itself, the accuracy of the whole system increases the longer it is in use.

Description

Method for improving context of question-answering system based on probability transition
Technical field
The invention belongs to the technical field of data processing methods, and in particular relates to a method for improving the context of a question answering system based on probability transition.
Background
In e-commerce, users (buyers) shopping online consult customer service. In many automatic question answering systems, the first task is to identify and classify the buyer's intent. The common practice treats intent recognition as short-text classification, but this ignores the influence of the user's context (dialogue history) on intent classification. For example, a buyer says "160". The buyer may be confirming a price, or may be giving their own height or weight. The specific intent has to be determined from the preceding dialogue. Some existing schemes do combine context with intent recognition: for example, taking the buyer's 5 consecutive questions as input and classifying them with hierarchical attention, or feeding the label of the preceding utterance into the computation for the current sentence as a feature. However, these approaches need the user's continuous chat records, and annotators must attend to the context of every item they label, which adds annotation workload and easily leads to labeling errors. A further drawback is that the samples easily become imbalanced, so data must be supplemented manually. Another class of schemes is rule-based: hand-written rules specify how contexts may occur. This is obviously time-consuming and laborious, and it is hard to guarantee that all possibilities have been enumerated.
Summary of the invention
The object of the invention is to overcome the defects and deficiencies mentioned above by providing a method for improving the context of a question answering system based on probability transition.
To solve the above technical problem, the following technical scheme is adopted:
A method for improving the context of a question answering system based on probability transition improves the question answering system by combining a classification algorithm with a probability transition matrix A. The specific steps are as follows:
(1) Preset a series of data for the system and label it manually to obtain labeled data;
(2) Receive a user question and preprocess it to obtain processing data for the subsequent steps;
(3) Process the labeled data as follows:
(3-1) Train a TextCNN model on the labeled data to obtain an intent classification model;
(3-2) Feed the labeled data into the probability transition matrix for training to obtain the initialized probability transition matrix A;
(4) Predict on the processing data from step (2) with the classification algorithm to obtain the distribution P of predicted labels;
(5) Perform a series of calculations on the distribution P of predicted labels to screen out the sentences $Q_i$ that are missing information, $i = 1, 2, 3, \ldots, n$;
(6) For processing data from step (2) that forms one complete conversation, process it with the probability transition matrix A obtained in step (3-2) to determine exactly which sentences are missing information and their corresponding context.
Further, the classification algorithm may be TextCNN, LSTM, BERT, or SVM; specifically, any classification algorithm that outputs a probability distribution over its predictions is applicable to this system.
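The requirement that the classifier expose a probability distribution can be sketched as a small interface. The `ProbabilisticClassifier` protocol, the `predict_proba` method name, and the toy `KeywordClassifier` below are all hypothetical stand-ins for illustration; they are not part of the patent and do not reproduce TextCNN, LSTM, BERT, or SVM.

```python
import math
from typing import Protocol, Sequence


class ProbabilisticClassifier(Protocol):
    """Anything that maps a sentence to a probability per intent label."""

    def predict_proba(self, text: str) -> Sequence[float]: ...


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class KeywordClassifier:
    """Toy classifier: scores each intent by counting keyword hits."""

    def __init__(self, keywords):
        self.labels = list(keywords)   # intent labels, in insertion order
        self.keywords = keywords       # label -> set of trigger words

    def predict_proba(self, text):
        tokens = text.lower().split()
        scores = [
            float(sum(tok in self.keywords[label] for tok in tokens))
            for label in self.labels
        ]
        return softmax(scores)


clf = KeywordClassifier({
    "confirm_price": {"price", "cost"},
    "give_size": {"height", "weight", "size"},
})
print(clf.predict_proba("what is the price"))  # first intent scores higher
print(clf.predict_proba("160"))                # no keyword hit: flat distribution
```

A bare "160" gives this toy model nothing to go on, so it returns a flat distribution, which is the situation the later steps are meant to handle.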
Further, the series of calculations in step (5) is specifically:
(5-1) Calculate the mean M of the distribution P with the mean formula,
$$M = \frac{P_1 + P_2 + \cdots + P_i}{i}$$
where $P_1, P_2, \ldots, P_i$ are the individual values and $i$ is the number of values in the group;
(5-2) Then calculate the variance with the variance formula and screen the results for later use,
$$s^2 = \frac{(P_1 - M)^2 + (P_2 - M)^2 + \cdots + (P_i - M)^2}{i}$$
where $i$ is the number of values, $M$ is the mean, and $s^2$ is the variance. The smaller the variance $s^2$, the harder it is to judge the intent expressed by that item in the processing data.
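The screening in steps (5-1) and (5-2) can be sketched in a few lines. The cut-off `threshold` and the function names are assumptions made for illustration; the text gives no numeric threshold. The variance here divides by the number of values (population variance).

```python
from statistics import fmean, pvariance


def mean_and_variance(p):
    """Return (M, s^2) for one predicted-label distribution P."""
    m = fmean(p)                 # step (5-1): mean of the probabilities
    s2 = pvariance(p, mu=m)      # step (5-2): variance, dividing by the count
    return m, s2


def screen_ambiguous(distributions, threshold=0.02):
    """Indices of sentences whose distribution is too flat to judge.

    `threshold` is a hypothetical cut-off, not a value from the text.
    """
    return [
        idx
        for idx, p in enumerate(distributions)
        if mean_and_variance(p)[1] < threshold
    ]


# Two sentences scored over three intents.
confident = [0.90, 0.05, 0.05]   # one intent dominates: larger variance
flat = [0.40, 0.30, 0.30]        # no clear winner: smaller variance
print(screen_ambiguous([confident, flat]))  # prints [1]
```

For a fixed number of labels the mean is always 1 divided by the label count, so the variance alone carries the signal: it is about 0.161 for the confident sentence and about 0.002 for the flat one.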
Further, the calculation with the probability transition matrix A in step (6) is specifically:
(6-1) Taking the processing data, suppose there are n dialogue sentences. Each sentence $Q_i$ judged in step (4) to be missing information has a distribution $P_i$. Together with the randomly initialized probability transition matrix A of the system, construct the objective function $f = \lVert Q_{i-1} - A Q_i \rVert$, where the n items of processing data form a data set with a complete dialogue scene. The probability transition matrix A thereby learns from the prediction results of the current processing data, which determines the sentences that are missing information and their corresponding context.
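As a rough sketch of how A might be fitted to this objective, the code below runs plain gradient descent on the squared norm over all consecutive sentence pairs of one conversation. Treating $Q_i$ as the label distribution of sentence i, squaring the norm, and the optimizer settings (`steps`, `lr`, `seed`) are all assumptions; the text does not specify them.

```python
import random


def mat_vec(a, v):
    """Multiply matrix a (list of rows) by vector v."""
    return [sum(a_rc * v_c for a_rc, v_c in zip(row, v)) for row in a]


def objective(a, dists):
    """Sum over consecutive pairs of ||Q_{i-1} - A Q_i||^2."""
    total = 0.0
    for prev, curr in zip(dists, dists[1:]):
        pred = mat_vec(a, curr)
        total += sum((p - q) ** 2 for p, q in zip(prev, pred))
    return total


def fit_transition_matrix(dists, steps=500, lr=0.5, seed=0):
    """Fit A by gradient descent, starting from a random matrix."""
    k = len(dists[0])
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(k)] for _ in range(k)]
    pairs = list(zip(dists, dists[1:]))
    for _ in range(steps):
        grad = [[0.0] * k for _ in range(k)]
        for prev, curr in pairs:
            pred = mat_vec(a, curr)
            for r in range(k):
                err = prev[r] - pred[r]          # residual of row r
                for c in range(k):
                    # mean gradient of the squared residual w.r.t. A[r][c]
                    grad[r][c] += -2.0 * err * curr[c] / len(pairs)
        for r in range(k):
            for c in range(k):
                a[r][c] -= lr * grad[r][c]
    return a


# Label distributions of four consecutive sentences over two intents.
dialogue = [[0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.5, 0.5]]
A = fit_transition_matrix(dialogue)
identity = [[1.0, 0.0], [0.0, 1.0]]
print(objective(identity, dialogue), objective(A, dialogue))
```

The fitted matrix lowers the objective well below that of the identity matrix on this toy conversation. Note that this sketch leaves A unconstrained, so its entries need not be non-negative or sum to 1; a production system would have to decide whether to enforce that.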
The above scheme has the following beneficial effects:
(1) The invention provides a context-aware intent recognition method that requires no labeled data with context, which saves labor cost.
(2) Because the probability transition matrix A learns by itself, the accuracy of the whole system increases the longer it is in use.
Brief description of the drawings
Fig. 1 is a flow diagram of the whole system.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments.
A method for improving the context of a question answering system based on probability transition improves the question answering system by combining a classification algorithm with a probability transition matrix A. The specific steps are as follows:
(1) Preset a series of data for the system and label it manually to obtain labeled data;
(2) Receive a user question and preprocess it to obtain processing data for the subsequent steps;
(3) Process the labeled data as follows:
(3-1) Train a TextCNN model on the labeled data to obtain an intent classification model;
(3-2) Feed the labeled data into the probability transition matrix for training to obtain the initialized probability transition matrix A;
(4) Predict on the processing data from step (2) with the classification algorithm to obtain the distribution P of predicted labels;
(5) Perform a series of calculations on the distribution P of predicted labels to screen out the sentences $Q_i$ that are missing information, $i = 1, 2, 3, \ldots, n$. The series of calculations is specifically:
(5-1) Calculate the mean M of the distribution P with the mean formula,
$$M = \frac{P_1 + P_2 + \cdots + P_i}{i}$$
where $P_1, P_2, \ldots, P_i$ are the individual values and $i$ is the number of values in the group;
(5-2) Then calculate the variance with the variance formula and screen the results for later use,
$$s^2 = \frac{(P_1 - M)^2 + (P_2 - M)^2 + \cdots + (P_i - M)^2}{i}$$
where $i$ is the number of values, $M$ is the mean, and $s^2$ is the variance. The smaller the variance $s^2$, the harder it is to judge the intent expressed by that item in the processing data.
(6) For processing data from step (2) that forms one complete conversation, process it with the probability transition matrix A obtained in step (3-2) to determine exactly which sentences are missing information and their corresponding context. The calculation with the probability transition matrix A is as follows:
(6-1) Taking the processing data, suppose there are n dialogue sentences. Each sentence $Q_i$ judged in step (4) to be missing information has a distribution $P_i$. Together with the randomly initialized probability transition matrix A of the system, construct the objective function $f = \lVert Q_{i-1} - A Q_i \rVert$, so that the probability transition matrix A learns from the prediction results of the current processing data, which determines the sentences that are missing information and their corresponding context.
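The text does not say how A learns from this objective. One possible reading, assuming a squared Euclidean norm summed over the conversation and plain gradient descent with a hypothetical step size $\eta$ (none of which is stated in the source), and treating $Q_i$ as the label-distribution column vector of the i-th sentence, is:

```latex
\begin{aligned}
F(A) &= \sum_{i=2}^{n} \lVert Q_{i-1} - A\,Q_i \rVert_2^2, \\
\nabla_A F(A) &= -2 \sum_{i=2}^{n} \bigl(Q_{i-1} - A\,Q_i\bigr)\, Q_i^{\top}, \\
A &\leftarrow A - \eta\, \nabla_A F(A).
\end{aligned}
```

Each update nudges A so that applying it to a sentence's distribution better reproduces the distribution of the sentence before it; repeating the update over the conversations the system sees is one way to interpret the self-learning behaviour described above.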
Preferably, the classification algorithm may be TextCNN, LSTM, BERT, or SVM; specifically, any classification algorithm that outputs a probability distribution over its predictions is applicable to this system.
Preferably, the n items of processing data in step (6-1) form a data set with a complete dialogue scene.
Working principle of the system: as shown in Fig. 1, TextCNN is first trained on the labeled data to obtain the intent classification model. When a user question is received, the trained TextCNN predicts its intent label. If the probability distribution of the predicted label is smooth, the predicted label is taken as the result. If the probability distribution of the predicted label is not smooth, the unlabeled data (the user questions in the complete conversation) is used to train the initialized probability transition matrix A. The label probability distribution P that TextCNN produced for each user question is then multiplied by the probability transition matrix A to obtain a new probability distribution, and the intent label corresponding to the top-1 entry of the new distribution is output, which completes the flow of the whole system.
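The final stage of this flow, multiplying P by A and returning the top-1 label, can be sketched as follows. The function names, the example labels, the example matrix, and the `needs_context` predicate are hypothetical; clipping negative values and renormalising the product are also assumptions, since the text only says that P is multiplied by A.

```python
def top1(labels, dist):
    """Label with the highest probability (first one wins on ties)."""
    best = max(range(len(dist)), key=lambda idx: dist[idx])
    return labels[best]


def apply_transition(a, p):
    """Multiply matrix A by distribution P, then renormalise to sum to 1."""
    raw = [max(sum(a_rc * p_c for a_rc, p_c in zip(row, p)), 0.0) for row in a]
    total = sum(raw)
    return [x / total for x in raw] if total > 0 else list(p)


def resolve_intent(labels, p, a, needs_context):
    """Return the classifier's top-1 label, or the context-corrected one."""
    if not needs_context(p):
        return top1(labels, p)
    return top1(labels, apply_transition(a, p))


def is_undecided(p):
    """Hypothetical trigger: the two extremes are within 0.2 of each other."""
    return max(p) - min(p) < 0.2


labels = ["confirm_price", "give_height"]
# Hypothetical learned matrix that favours "give_height" in this conversation.
A = [[0.2, 0.2],
     [0.8, 0.8]]

print(resolve_intent(labels, [0.95, 0.05], A, is_undecided))  # classifier is sure
print(resolve_intent(labels, [0.50, 0.50], A, is_undecided))  # corrected by A
```

The decision of when a distribution needs correction is passed in as a predicate so that whichever screening rule the system uses can be plugged in without changing the rest of the flow.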
The present invention has been described by way of embodiments. Several modifications and improvements may be made without departing from its principle. It should be noted that all technical solutions obtained through equivalent substitution, equivalent transformation, or similar means fall within the protection scope of the present invention.

Claims (7)

1. A method for improving the context of a question answering system based on probability transition, in which user question data is processed using a classification algorithm and a probability transition matrix A, characterized in that the specific processing steps are as follows:
(1) preset labeled data for the system;
(2) receive a user question and preprocess it to obtain processing data;
(3) train the classification algorithm on the labeled data to obtain an intent classification model; then feed the labeled data to the probability transition matrix A for training to obtain the initialized probability transition matrix A;
(4) predict on the processing data to obtain the distribution P of predicted labels;
(5) perform a series of calculations on the distribution P of predicted labels to screen out the sentences $Q_i$ that are missing information, $i = 1, 2, 3, \ldots, n$;
(6) for processing data that is the data set of one complete conversation, calculate with the initialized probability transition matrix A to determine exactly which sentences are missing information and their corresponding context.
2. The method for improving the context of a question answering system based on probability transition according to claim 1, characterized in that the labeled data is a series of manually labeled questions.
3. The method for improving the context of a question answering system based on probability transition according to claim 1, characterized in that the processing data constitutes one complete conversation.
4. The method for improving the context of a question answering system based on probability transition according to claim 1, characterized in that the classification algorithm may be TextCNN, LSTM, BERT, or SVM; specifically, any classification algorithm that outputs a probability distribution over its predictions is applicable to this system.
5. The method for improving the context of a question answering system based on probability transition according to claim 1, characterized in that the series of calculations in step (5) is specifically:
(5-1) calculate the mean M of the distribution P with the mean formula,
$$M = \frac{P_1 + P_2 + \cdots + P_i}{i}$$
where $P_1, P_2, \ldots, P_i$ are the individual values and $i$ is the number of values in the group;
(5-2) then calculate the variance with the variance formula and screen the results for later use,
$$s^2 = \frac{(P_1 - M)^2 + (P_2 - M)^2 + \cdots + (P_i - M)^2}{i}$$
where $i$ is the number of values, $M$ is the mean, and $s^2$ is the variance; the smaller the variance $s^2$, the harder it is to judge the intent expressed by that item in the processing data.
6. The method for improving the context of a question answering system based on probability transition according to claim 1, characterized in that the calculation with the probability transition matrix A in step (6) is specifically:
(6-1) taking the processing data, suppose there are n dialogue sentences; each sentence $Q_i$ judged in step (4) to be missing information has a distribution $P_i$; together with the randomly initialized probability transition matrix A of the system, construct the objective function
$f = \lVert Q_{i-1} - A Q_i \rVert$, so that the probability transition matrix A learns from the prediction results of the current processing data, which determines the sentences that are missing information and their corresponding context.
7. The method for improving the context of a question answering system based on probability transition according to claim 6, characterized in that the n items of processing data in step (6-1) form a data set with a complete dialogue scene.
CN201910641706.9A 2019-07-16 2019-07-16 Method for improving context of question-answering system based on probability transition Active CN110377713B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910641706.9A CN110377713B (en) 2019-07-16 2019-07-16 Method for improving context of question-answering system based on probability transition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910641706.9A CN110377713B (en) 2019-07-16 2019-07-16 Method for improving context of question-answering system based on probability transition

Publications (2)

Publication Number Publication Date
CN110377713A (en) 2019-10-25
CN110377713B (en) 2023-09-15

Family

ID=68253502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910641706.9A Active CN110377713B (en) 2019-07-16 2019-07-16 Method for improving context of question-answering system based on probability transition

Country Status (1)

Country Link
CN (1) CN110377713B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111694952A (en) * 2020-04-16 2020-09-22 国家计算机网络与信息安全管理中心 Big data analysis model system based on microblog and implementation method thereof
CN115018656A (en) * 2022-08-08 2022-09-06 太平金融科技服务(上海)有限公司深圳分公司 Risk identification method, and training method, device and equipment of risk identification model

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649694A (en) * 2016-12-19 2017-05-10 北京云知声信息技术有限公司 Method and device for identifying user's intention in voice interaction
CN107679234A (en) * 2017-10-24 2018-02-09 上海携程国际旅行社有限公司 Customer service information providing method, device, electronic equipment, storage medium
US20180067981A1 (en) * 2016-09-06 2018-03-08 International Business Machines Corporation Automatic Detection and Cleansing of Erroneous Concepts in an Aggregated Knowledge Base
CN108897896A (en) * 2018-07-13 2018-11-27 深圳追科技有限公司 Keyword abstraction method based on intensified learning
US20200334553A1 (en) * 2019-04-22 2020-10-22 Electronics And Telecommunications Research Institute Apparatus and method for predicting error of annotation
WO2022095573A1 (en) * 2020-11-09 2022-05-12 西安交通大学 Community question answering website answer sorting method and system combined with active learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108829662A (en) * 2018-05-10 2018-11-16 浙江大学 A kind of conversation activity recognition methods and system based on condition random field structuring attention network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
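Zhou Xiaoqiang (周小强) et al.: "交互式问答的关系结构体系及标注" (Relation structure system and annotation for interactive question answering), 《中文信息学报》 (Journal of Chinese Information Processing), no. 05, 15 May 2018 (2018-05-15) *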

Also Published As

Publication number Publication date
CN110377713B (en) 2023-09-15

Similar Documents

Publication Publication Date Title
CN105677831B (en) Method and device for determining recommended merchants
CN108228706A (en) For identifying the method and apparatus of abnormal transaction corporations
CN107103005A (en) The collection method and device of question and answer language material
CN109740155A (en) A kind of customer service system artificial intelligence quality inspection rule self concludes the method and system of model
CN110377713A (en) A method of being shifted based on probability improves question answering system context
CN107306306A (en) Communicating number processing method and processing device
CN107301229A (en) Feedback assigning method and system based on semantic analysis
CN110415086A (en) Intelligence financing recommended method based on user's Continuous behavior sequence signature
CN106875076A (en) Set up the method and system that outgoing call quality model, outgoing call model and outgoing call are evaluated
CN106846082A (en) Tourism cold start-up consumer products commending system and method based on hardware information
CN114882228B (en) Fitness place layout optimization method based on knowledge distillation
CN110827432A (en) Class attendance checking method and system based on face recognition
CN111080087A (en) Calling center scheduling method based on customer emotion analysis
CN113780342A (en) Intelligent detection method and device based on self-supervision pre-training and robot
CN108876053A (en) Travelling route Intelligent planning method and system
CN116402399A (en) Business data processing method and system based on artificial intelligence and electronic mall
CN107665423A (en) The tutoring system and method that a kind of rapid field is called the roll
CN109949162A (en) A kind of management of investment assessment system based on Cloud Server
CN115953080B (en) Engineer service level determination method, apparatus and storage medium
CN116757855A (en) Intelligent insurance service method, device, equipment and storage medium
CN110532394A (en) The processing method and system of Order Remarks text
CN115994767A (en) Product supply chain management system based on electronic commerce
CN115439265A (en) Intelligent insurance industry compensation abnormal transaction risk control system
Liang et al. Clef newsreel 2017: Contextual bandit news recommendation
CN112365302B (en) Product recommendation network training method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230331

Address after: 104058, No. 2-10, No. 311 Huangpu Avenue Middle, Tianhe District, Guangzhou City, Guangdong Province, 510000

Applicant after: Guangzhou Tanyu Technology Co.,Ltd.

Address before: 601-5, 1382 Wenyi West Road, Cangqian street, Yuhang District, Hangzhou City, Zhejiang Province, 310012

Applicant before: Hangzhou Weier Network Technology Co.,Ltd.

GR01 Patent grant