CN105760965A - Pre-estimated model parameter training method, service quality pre-estimation method and corresponding devices - Google Patents

Info

Publication number
CN105760965A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610147605.2A
Other languages
Chinese (zh)
Inventor
张军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201610147605.2A
Publication of CN105760965A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395 Quality analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Tourism & Hospitality (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a training method for pre-estimation model parameters, a service quality pre-estimation method and corresponding devices. The training method comprises: constructing a training data set corresponding to a preset type of service, and training on the training data set to obtain pre-estimation model parameters corresponding to the preset type of service. The service quality pre-estimation method comprises: obtaining the communication records between a user and a service provider of a certain type of service; querying the pre-estimation model parameters corresponding to the service, and obtaining a word vector matrix corresponding to the communication records; calculating the service quality score of the service provider according to the word vector matrix corresponding to the communication records and the pre-estimation model parameters; and presenting the service quality of the service provider to the user according to the service quality score. The method can rapidly pre-estimate the service quality of a service provider according to the user's individual demands, making it easier for the user to select an appropriate service provider.

Description

Training method for pre-estimation model parameters, service quality pre-estimation method and corresponding devices
[Technical field]
The present invention relates to the field of computer application technology, and in particular to a training method for pre-estimation model parameters, a service quality pre-estimation method, and corresponding devices.
[Background]
Among O2O (Online To Offline) services, some belong to ultra-low-frequency industries, such as the home decoration industry, the wedding industry, and the second-hand housing agency industry. Users use such O2O services very rarely, typically only once or twice in a lifetime, and the quality of such services often depends on the personal qualities and professionalism of the individual practitioner. Moreover, compared with other, high-frequency O2O services, the service content is highly personalized to the needs of the specific user, so evaluations of earlier service cases are not a good reference for the current service. When deciding whether to use a particular service provider, a user needs to be able to quickly estimate the quality of service this provider would deliver for this particular service, so that the user can enjoy a more satisfying service experience.
At present, the service quality of a given service in an ultra-low-frequency O2O industry is mainly estimated in one of two ways. Method one: the user learns of a particular practitioner's reputation by asking friends or acquaintances, and judges the quality of the service intuitively. Method two: the user judges the service quality by browsing indicators such as the review content and star ratings that past users have given to service cases provided by a particular practitioner. Although these two traditional methods give the user some reference for understanding the practitioner's service quality, such data is generally scarce because of the particular nature of ultra-low-frequency O2O service industries, so its reference value is limited. In addition, a specific service item is highly personalized, and past service cases are not consistent with the content of the current service, which can mislead the user's expectations of the service.
[Summary of the invention]
The present invention provides a training method for pre-estimation model parameters, a service quality pre-estimation method, and corresponding devices, which can quickly estimate the service quality of a service provider according to the user's individual needs, make it easier for the user to select a suitable service provider, and improve the user experience.
The specific technical solution is as follows:
A training method for pre-estimation model parameters, comprising:
constructing a training data set corresponding to a preset type of service, the training data set comprising sample data screened from communication records between users and service providers relevant to the preset type of service;
training with the training data set to obtain pre-estimation model parameters corresponding to the preset type of service, the pre-estimation model parameters comprising a word vector matrix composed of the word vectors of the words contained in the training data set, and a weight distribution vector corresponding to the word vector matrix.
According to a preferred embodiment of the present invention, the sample data comprises a user question, a service provider answer, and a service quality score.
According to a preferred embodiment of the present invention, training with the training data set to obtain the pre-estimation model parameters relevant to the preset type of service comprises:
establishing a parameterized word vector matrix and a parameterized weight distribution vector;
initializing the parameters in the word vector matrix and the parameters in the weight distribution vector;
iterating the parameters in the word vector matrix and the parameters in the weight distribution vector with a preset iterative algorithm until a preset iteration termination condition is reached.
According to a preferred embodiment of the present invention, training with the training data set to obtain the pre-estimation model parameters relevant to the preset type of service further comprises:
initializing an estimation bias parameter;
iterating the estimation bias parameter with the preset iterative algorithm until the preset iteration termination condition is reached.
According to a preferred embodiment of the present invention, establishing the parameterized word vector matrix comprises:
segmenting the sample data into words;
parameterizing the word vector of each word obtained by segmentation;
forming the word vector matrix from the parameterized word vectors of the words.
According to a preferred embodiment of the present invention, the iteration termination condition comprises:
a preset number of iterations is reached; or
the value of the loss function obtained after the current iteration is less than a preset target value; or
the difference between the value of the loss function obtained after the current iteration and the value obtained after the previous iteration is less than a preset threshold; wherein the loss function is determined by the distance between the service quality scores estimated for the training data set and the service quality scores in the training data set.
A service quality pre-estimation method, comprising:
obtaining the communication record between a user and a service provider of a certain type of service;
querying the pre-estimation model parameters relevant to the service, and obtaining the word vector matrix corresponding to the communication record;
calculating the service quality score of the service provider according to the word vector matrix corresponding to the communication record and the pre-estimation model parameters;
wherein the pre-estimation model parameters are obtained by training with the above training method for pre-estimation model parameters.
According to a preferred embodiment of the present invention, obtaining the word vector matrix corresponding to the communication record comprises:
extracting from the communication record at least one pair consisting of a user question and a service provider answer;
segmenting the user question QU and the service provider answer AN of each pair into words;
querying the word vector matrix relevant to the service to obtain the word vectors of all words in each pair.
According to a preferred embodiment of the present invention, calculating the service quality score of the service provider comprises:
computing, with a preset vector calculation method, the vector corresponding to each pair from the word vector matrix corresponding to that pair;
calculating the service quality score of each pair from the vector corresponding to the pair and the pre-estimation model parameters;
calculating the service quality score of the service provider from the service quality scores of the pairs.
According to a preferred embodiment of the present invention, the formula for calculating the service quality score of each pair from the vector corresponding to the pair and the pre-estimation model parameters is:
$$\mathrm{Score}_i = \mathrm{rep}(QU\text{-}AN) \cdot V + b,$$
where $\mathrm{Score}_i$ is the service quality score of the i-th pair, $\mathrm{rep}(QU\text{-}AN) \cdot V$ is the inner product of rep(QU-AN) and the weight distribution vector V, rep(QU-AN) is the vector corresponding to the i-th pair, and b is the estimation bias parameter.
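The scoring formula above is an inner product plus a bias, which can be sketched in a few lines of Python. This is a minimal illustration only: the function name `pair_score` and the toy numbers are assumptions made for the example and do not come from the patent.

```python
def pair_score(rep_qu_an, weights, bias=0.0):
    """Compute Score_i = rep(QU-AN) . V + b for one question-answer pair."""
    if len(rep_qu_an) != len(weights):
        raise ValueError("rep(QU-AN) and V must have the same length")
    # Inner product of the pair vector and the weight distribution vector.
    inner = sum(r * v for r, v in zip(rep_qu_an, weights))
    return inner + bias


# Toy values (hypothetical): a pair vector of length 2*emb_size with emb_size = 2.
rep = [0.5, 0.25, 1.0, 0.0]
V = [2.0, 4.0, 1.0, 3.0]
print(pair_score(rep, V, bias=0.5))
```

Passing `bias=0.0` corresponds to the embodiments in which the estimation bias parameter is omitted.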
A training device for pre-estimation model parameters, comprising:
a construction unit, configured to construct a training data set corresponding to a preset type of service, the training data set comprising sample data screened from communication records between users and service providers relevant to the preset type of service;
a training unit, configured to train with the training data set to obtain pre-estimation model parameters corresponding to the preset type of service, the pre-estimation model parameters comprising a word vector matrix composed of the word vectors of the words contained in the training data set, and a weight distribution vector corresponding to the word vector matrix.
According to a preferred embodiment of the present invention, the sample data comprises a user question, a service provider answer, and a service quality score.
According to a preferred embodiment of the present invention, training with the training data set to obtain the pre-estimation model parameters relevant to the preset type of service comprises:
establishing a parameterized word vector matrix and a parameterized weight distribution vector;
initializing the parameters in the word vector matrix and the parameters in the weight distribution vector;
iterating the parameters in the word vector matrix and the parameters in the weight distribution vector with a preset iterative algorithm until a preset iteration termination condition is reached.
According to a preferred embodiment of the present invention, training with the training data set to obtain the pre-estimation model parameters relevant to the preset type of service further comprises:
initializing an estimation bias parameter;
iterating the estimation bias parameter with the preset iterative algorithm until the preset iteration termination condition is reached.
According to a preferred embodiment of the present invention, establishing the parameterized word vector matrix comprises:
segmenting the sample data into words;
parameterizing the word vector of each word obtained by segmentation;
forming the word vector matrix from the parameterized word vectors of the words.
According to a preferred embodiment of the present invention, the iteration termination condition comprises:
a preset number of iterations is reached; or
the value of the loss function obtained after the current iteration is less than a preset target value; or
the difference between the value of the loss function obtained after the current iteration and the value obtained after the previous iteration is less than a preset threshold; wherein the loss function is determined by the distance between the service quality scores estimated for the training data set and the service quality scores in the training data set.
A service quality pre-estimation device, comprising:
a first acquisition unit, configured to obtain the communication record between a user and a service provider of a certain type of service;
a second acquisition unit, configured to query the pre-estimation model parameters relevant to the service and obtain the word vector matrix corresponding to the communication record;
a computing unit, configured to calculate the service quality score of the service provider according to the word vector matrix corresponding to the communication record and the pre-estimation model parameters;
wherein the pre-estimation model parameters are obtained by training with the above training device for pre-estimation model parameters.
According to a preferred embodiment of the present invention, obtaining the word vector matrix corresponding to the communication record comprises:
extracting from the communication record at least one pair consisting of a user question and a service provider answer;
segmenting the user question QU and the service provider answer AN of each pair into words;
querying the word vector matrix relevant to the service to obtain the word vectors of all words in each pair.
According to a preferred embodiment of the present invention, calculating the service quality score of the service provider comprises:
computing, with a preset vector calculation method, the vector corresponding to each pair from the word vector matrix corresponding to that pair;
calculating the service quality score of each pair from the vector corresponding to the pair and the pre-estimation model parameters;
calculating the service quality score of the service provider from the service quality scores of the pairs.
According to a preferred embodiment of the present invention, the formula for calculating the service quality score of each pair from the vector corresponding to the pair and the pre-estimation model parameters is:
$$\mathrm{Score}_i = \mathrm{rep}(QU\text{-}AN) \cdot V + b,$$
where $\mathrm{Score}_i$ is the service quality score of the i-th pair, $\mathrm{rep}(QU\text{-}AN) \cdot V$ is the inner product of rep(QU-AN) and the weight distribution vector V, rep(QU-AN) is the vector corresponding to the i-th pair, and b is the estimation bias parameter.
As can be seen from the above technical solutions, the present invention can quickly and quantitatively estimate the service quality of a service provider from the communication records between the user and the service provider, making it easier for the user to select a suitable service provider and improving the user experience.
[Brief description of the drawings]
Fig. 1 is a block diagram of the general principle of the embodiments of the present invention.
Fig. 2 is a flow chart of the training method for pre-estimation model parameters of Embodiment 1 of the present invention.
Fig. 3 is a schematic diagram of the nonlinear transformation function.
Fig. 4 is a flow chart of the service quality pre-estimation method of Embodiment 2 of the present invention.
Fig. 5 is a structural schematic of the training device for pre-estimation model parameters of Embodiment 3 of the present invention.
Fig. 6 is a structural schematic of the service quality pre-estimation device of Embodiment 4 of the present invention.
[Detailed description]
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a block diagram of the general principle of the embodiments of the present invention. As shown in Fig. 1, training data relevant to a preset type of service is first mined from the Internet, and a training data set is built. Training on this data set yields the optimal pre-estimation model parameters. Finally, the communication record between a user and a service provider of a certain type of service is obtained; the pre-estimation model parameters relevant to the service are queried and the word vector matrix corresponding to the communication record is obtained; the service quality score of the service provider is calculated from that word vector matrix and the pre-estimation model parameters; and the service quality of the service provider is presented to the user according to the service quality score.
Fig. 2 is a flow chart of the training method for pre-estimation model parameters of Embodiment 1 of the present invention. The training method comprises:
S10: construct the training data set corresponding to the preset type of service.
In this embodiment, each type of service corresponds to one training data set. Existing data mining techniques are used to mine sample data relevant to the preset type of service from a search engine's web page library (for example, review pages or forum pages). Multiple triples are generated from the sample data, and these triples form the training data set. Each triple is expressed as <user question, service provider answer, service quality score>, abbreviated as <Q, A, S>. For example, in the home decoration industry, a triple might be <"How skilled are your bricklayers?", "Our bricklayers are highly skilled and have worked on several homes, so please rest assured.", 9>. The larger the amount of sample data, the larger the training data set that is built and the more accurate the model parameters obtained by subsequently training the pre-estimation model. Typically, the sample data is at least on the order of ten million items.
When multiple triples are generated from the sample data, each item of sample data corresponds to one triple. For each item of sample data, data extraction techniques are used to extract the user question and the service provider answer from it, and these become the user question Q and the service provider answer A of the generated triple.
The service quality score S of a triple can be determined in the following ways: (1) a person manually annotates the service quality score based on the user question and the provider answer of the triple; (2) when the sample data includes an evaluation of the service provider, the service quality score of the triple is generated from that evaluation. For example, when the evaluation in the sample data is a star rating, the different star levels can be standardized into scores: one star is 1 point, two stars are 3 points, and so on. When the evaluation is a numeric score, for example a score between 0 and 100, that score is standardized: for example, 90 points is standardized to 9 points.
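The triple structure and the score standardization described above might be sketched as follows. The text only gives two star examples (one star is 1 point, two stars are 3 points) and one numeric example (90 becomes 9), so the linear rule `2 * stars - 1`, the five-star upper bound, and the names `Triple`, `score_from_stars` and `score_from_percent` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Triple:
    question: str  # user question Q
    answer: str    # service provider answer A
    score: float   # service quality score S


def score_from_stars(stars, max_stars=5):
    """Map a star rating to a score.

    Assumed rule: 2 * stars - 1, which matches the two examples in the text
    (1 star -> 1 point, 2 stars -> 3 points). The 5-star cap is an assumption.
    """
    if not 1 <= stars <= max_stars:
        raise ValueError("star rating out of range")
    return 2 * stars - 1


def score_from_percent(value):
    """Map a 0-100 evaluation to a 0-10 score (90 -> 9, as in the text)."""
    if not 0 <= value <= 100:
        raise ValueError("evaluation out of range")
    return value / 10


triple = Triple(
    question="How skilled are your bricklayers?",
    answer="Our bricklayers are highly skilled, please rest assured.",
    score=score_from_percent(90),
)
print(triple.score)
```

Whatever mapping is chosen, the point is that star ratings and numeric evaluations end up on one common scale before training.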
S11: train with the training data set to obtain the pre-estimation model parameters corresponding to the preset type of service.
In this embodiment, the pre-estimation model parameters comprise the word vector matrix E composed of the word vectors of the words contained in the training data set, the weight distribution vector V corresponding to the word vector matrix, and the estimation bias parameter b. In other embodiments, the pre-estimation model parameters may omit the estimation bias parameter b.
The word vector matrix E is expressed in terms of parameters. It is composed of the word vectors of multiple words; a word vector maps a word into a vector space and represents the word's distribution in that space as a vector. In this embodiment, a word's distribution in the vector space represents how good or bad the service provider's service quality is.
The weight distribution vector represents the importance of the word vector matrix in the vector space. In this embodiment it can be understood as the importance, in the vector space, of the word vectors corresponding to all triples in the training data set. The estimation bias parameter represents the offset of the evaluation of the service provider. In this embodiment it can be understood as the range over which the service provider's service quality score fluctuates up and down.
Preferably, as one implementation of S11, S11 comprises:
S110: establish a parameterized word vector matrix, a parameterized weight distribution vector, and the estimation bias parameter.
Establishing the parameterized word vector matrix specifically comprises:
(1) Segmenting each triple in the training data set into words.
A word segmentation technique is used to segment the user question Q and the service provider answer A of each triple in the training data set. For example, suppose the user question Q is segmented into a sequence of M words and the service provider answer A into a sequence of N words. Then Q in each triple is expressed as (q1, q2, ..., qM) and A as (a1, a2, ..., aN).
(2) Parameterizing the word vector of each word obtained by segmentation.
The word vectors emb(Q) of all words of the user question Q of each triple are expressed as (emb_q1, emb_q2, ..., emb_qM), and the word vectors emb(A) of all words of the service provider answer A of each triple are expressed as (emb_a1, emb_a2, ..., emb_aN). emb(Q) is a matrix with M rows and emb_size columns. emb(A) is a matrix with N rows and emb_size columns.
(3) Forming the word vector matrix from the parameterized word vectors of the words.
All words of all triples after segmentation are contained in one word vector matrix with |V| rows and emb_size columns. Here |V| is the number of all words that may occur, i.e. the dictionary size (not to be confused with the weight distribution vector V). emb_size is a preset empirical value, usually set between 50 and 1000. Each row of the matrix is a vector of length emb_size, called the word vector of the word corresponding to that row. The optimal solution of the word vector matrix, and therefore of the word vector in each row, can subsequently be obtained by training the pre-estimation model parameters.
The parameterized weight distribution vector is a vector of length 2*emb_size composed of parameters.
S111: initialize the parameters in the word vector matrix, the parameters in the weight distribution vector, and the estimation bias parameter.
In this embodiment, a set of random numbers is generated to initialize the parameters in the word vector matrix, the parameters in the weight distribution vector, and the estimation bias parameter. Preset initial values can of course also be used to initialize each parameter, for example preset values between 0 and 0.01.
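Steps S110 and S111 can be sketched as building a dictionary from already segmented triples and then allocating E, V and b with small random values. This is only an illustration: the word segmentation itself is not shown (the triples are assumed to arrive as word lists), and the helper names `build_vocab` and `init_params`, the fixed seed, and the choice of drawing uniformly from [0, 0.01] are assumptions.

```python
import random


def build_vocab(tokenized_triples):
    """Assign each distinct word a row index in the word vector matrix."""
    vocab = {}
    for q_words, a_words, _score in tokenized_triples:
        for word in list(q_words) + list(a_words):
            vocab.setdefault(word, len(vocab))
    return vocab


def init_params(vocab_size, emb_size, high=0.01, seed=0):
    """Randomly initialize E (|V| x emb_size), V (2*emb_size) and b."""
    rng = random.Random(seed)
    E = [[rng.uniform(0.0, high) for _ in range(emb_size)]
         for _ in range(vocab_size)]
    V = [rng.uniform(0.0, high) for _ in range(2 * emb_size)]
    b = rng.uniform(0.0, high)
    return E, V, b


# Hypothetical, already segmented training triples <Q, A, S>.
triples = [(["how", "good"], ["very", "good"], 9.0)]
vocab = build_vocab(triples)
E, V, b = init_params(len(vocab), emb_size=4)
print(len(E), len(E[0]), len(V))
```

The weight distribution vector has length 2*emb_size because it is multiplied with the concatenation of the question vector and the answer vector.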
S112: iterate the parameters in the word vector matrix, the parameters in the weight distribution vector, and the estimation bias parameter with a preset iterative algorithm until a preset iteration termination condition is reached.
In this embodiment, the preset iterative algorithm is applied to the training data set, using a loss function, to iteratively obtain the parameter values in the word vector matrix, the parameter values in the weight distribution vector, and the value of the estimation bias parameter. The loss function is determined by the distance between the service quality scores estimated for the training data set and the service quality scores in the training data set (i.e. the actual service quality scores).
The expression formula of loss function is as follows:
Wherein (rep (QA) V+b, Score represent in training study Score=, concentrate ternary according to training data The customer problem of group and ISP answer the service quality mark estimated.S is to represent in triple Service quality mark, i.e. actual service quality mark.It is described that (rep (QA) V represents rep (QA) and V's Inner product.Described rep (QA) represents vector rep (Q) and rep (A) splicing obtained, and one a length of The vector of 2*emb_size.Rep (Q) represents the term vector in each triple corresponding to customer problem, rep (A) Represent that in each triple, ISP answers corresponding term vector.
In other embodiments, those skilled in the art can use other representations as required Loss function, such as logarithm loss function, average loss function, absolute loss function etc..
The preset iterative algorithm is stochastic gradient descent (SGD) combined with the back propagation (BP) algorithm. Because the data set that is built exceeds one hundred million items, training the pre-estimation model parameters on it can yield optimized pre-estimation model parameters. SGD and BP are within the knowledge of those skilled in the art and are only summarized here. BP is an efficient method for computing the gradients of parameters.
In this embodiment, following the iterative idea of SGD, the word vector matrix E, the weight distribution vector V, and the estimation bias parameter b are each initialized. On one batch of training data (its size is called the mini-batch size), the gradients of E, V, and b are computed. The initialized E is then updated according to its gradient: at each update, E is reduced by the set learning rate multiplied by the computed gradient of E. V and b are updated in the same way. After multiple iterations, when the preset iteration termination condition is reached, the optimal word vector matrix E, weight distribution vector V, and estimation bias parameter b are obtained.
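The update rule described above (parameter minus learning rate times gradient, computed on a mini-batch) can be sketched in plain Python. Several things here are assumptions rather than the patent's method: the loss is taken to be the mean of 0.5 * (Score - S)^2 over the batch, since the exact loss expression is not reproduced in this text; rep(QA) is computed as the sigmoid of the summed word vectors, as the description states further on; and the names `forward` and `sgd_step` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(E, V, b, q_ids, a_ids):
    """Return rep(QA) and Score = rep(QA) . V + b for one triple."""
    emb_size = len(E[0])
    sum_q = [sum(E[i][k] for i in q_ids) for k in range(emb_size)]
    sum_a = [sum(E[i][k] for i in a_ids) for k in range(emb_size)]
    rep = [sigmoid(x) for x in sum_q + sum_a]  # length 2*emb_size
    score = sum(r * v for r, v in zip(rep, V)) + b
    return rep, score


def sgd_step(E, V, b, batch, lr):
    """One mini-batch update; assumed loss: mean of 0.5 * (Score - S)^2."""
    emb_size = len(E[0])
    grad_E = {}  # row index -> gradient row (only rows used in the batch)
    grad_V = [0.0] * len(V)
    grad_b = 0.0
    for q_ids, a_ids, s in batch:
        rep, score = forward(E, V, b, q_ids, a_ids)
        err = (score - s) / len(batch)  # dLoss/dScore
        grad_b += err
        for j in range(len(V)):
            grad_V[j] += err * rep[j]
        # Back-propagate through the inner product and the sigmoid.
        d_pre = [err * V[j] * rep[j] * (1.0 - rep[j]) for j in range(len(V))]
        for ids, offset in ((q_ids, 0), (a_ids, emb_size)):
            for i in ids:
                row = grad_E.setdefault(i, [0.0] * emb_size)
                for k in range(emb_size):
                    row[k] += d_pre[offset + k]
    # parameter <- parameter - learning_rate * gradient
    for i, row in grad_E.items():
        for k in range(emb_size):
            E[i][k] -= lr * row[k]
    V = [v - lr * g for v, g in zip(V, grad_V)]
    b -= lr * grad_b
    return E, V, b
```

Each triple in `batch` is given as (question word indices, answer word indices, S), where the indices are rows of E. Only the rows of E that occur in the batch receive a gradient, which keeps each step cheap when the dictionary is large.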
The preset iteration termination condition may be that a preset number of iterations is reached, that the difference between the value of the loss function obtained after the current iteration and the value obtained after the previous iteration is less than a preset threshold, or that the value of the loss function is less than a preset target value. The preset number of iterations, the preset threshold, and the preset target value are all preset empirical values.
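The three termination conditions can be combined in one small check that a training loop calls after every iteration. This is a sketch only; the function name `should_stop` and its parameter names are hypothetical, and the thresholds are the empirical presets the text mentions.

```python
def should_stop(iteration, loss_history, max_iterations, target_loss, min_delta):
    """Return True if any of the three termination conditions holds."""
    # (a) the preset number of iterations has been reached
    if iteration >= max_iterations:
        return True
    # (b) the current loss is below the preset target value
    if loss_history and loss_history[-1] < target_loss:
        return True
    # (c) the loss changed by less than the preset threshold since last iteration
    if len(loss_history) >= 2 and abs(loss_history[-2] - loss_history[-1]) < min_delta:
        return True
    return False


# Example: the loss is still falling quickly, so training continues.
print(should_stop(3, [5.0, 4.0], max_iterations=100, target_loss=0.1, min_delta=1e-6))
```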
Preferably, a preset vector calculation method is used to process, respectively, the term vectors of all words in the user question and the term vectors of all words in the service provider answer of each triple, so as to obtain the term vector rep(Q) corresponding to the user question and the term vector rep(A) corresponding to the service provider answer in each triple.
Specifically, they are obtained as follows:
(1) The term vectors of all words in the user question of each triple are added to obtain rep(Q), and the term vectors of all words in the service provider answer of each triple are added to obtain rep(A).
That is, $rep(Q) = emb\_q_1 + emb\_q_2 + \dots + emb\_q_M$,
$rep(A) = emb\_a_1 + emb\_a_2 + \dots + emb\_a_N$.
(2) rep(Q) and rep(A) in each triple are each normalized, and the normalized values replace rep(Q) and rep(A) in that triple. In this embodiment, the sigmoid function is applied as a nonlinear transformation to every element of the vectors rep(Q) and rep(A), so that every element is normalized into an interval.
The sigmoid function is a function used for nonlinear transformation; its definition and curve are shown in Fig. 3. Of course, other nonlinear functions such as tanh may also be used as the nonlinear transformation function.
$Sigmoid(z) = \frac{1}{1 + e^{-z}}$.
The transformed rep(Q) and rep(A) of each triple are then concatenated into a vector rep(QA) of length 2*emb_size.
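The three steps (summation, sigmoid normalization, concatenation) can be sketched in plain Python as below. The helper names `sum_vectors` and `build_rep_qa` and the sample vectors are assumptions made for illustration; emb_size is 2 in the example.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Sigmoid(z) = 1 / (1 + e^(-z)); math.exp may overflow for very negative z."""
    return 1.0 / (1.0 + math.exp(-z))


def sum_vectors(vectors: List[List[float]]) -> List[float]:
    """Element-wise sum of equally long term vectors."""
    return [sum(column) for column in zip(*vectors)]


def build_rep_qa(emb_q: List[List[float]], emb_a: List[List[float]]) -> List[float]:
    """Sum, normalize with sigmoid, and concatenate into a vector of length 2*emb_size."""
    rep_q = [sigmoid(x) for x in sum_vectors(emb_q)]
    rep_a = [sigmoid(x) for x in sum_vectors(emb_a)]
    return rep_q + rep_a


# Hypothetical term vectors: two words in the question, one word in the answer.
emb_q = [[0.5, -1.0], [-0.5, 1.0]]
emb_a = [[1.0, 2.0]]
rep_qa = build_rep_qa(emb_q, emb_a)  # length 4, every element in (0, 1)
```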
After the prediction model parameters for the preset type of service are obtained, they can be stored in a storage device, so that when a user subsequently searches for this service, the service quality of a service provider can be presented to the user according to the prediction model parameters of the service, which helps the user weigh whether to use that service provider.
Fig. 4 is a flowchart of the service quality estimation method of Embodiment 2 of the present invention. The service quality estimation method includes:
S21: obtain the communication record between a user and a service provider of a certain type of service.
When selecting a certain type of service, a user usually communicates with providers of that service over the Internet by voice or text in order to choose a suitable service provider. Therefore, in this embodiment, the voice or text communication records between the user and the service provider of the service within a preset time period can be obtained. The communication records include communication records in text form (such as chat records) and communication records in voice form. A communication record in voice form is first converted into text form by a speech recognition technique.
S22: query the prediction model parameters related to the service, and obtain the term vector matrix corresponding to the communication record.
In this embodiment, the prediction model parameters related to this type of service are obtained by training with the method described in Embodiment 1. The trained prediction model parameters related to the service, namely the term vector matrix E, the distribution weight vector V, and the estimation offset parameter b, can be obtained from the storage device according to the type of the service. The matrix has |V| rows and emb_size columns, where |V| is the number of all words that may occur (i.e. the dictionary size). Each row of the matrix is a vector of length emb_size, called the term vector of the word corresponding to that row.
The term vector matrix E is represented by parameters. The term vector matrix is composed of the term vectors of multiple words; a term vector maps a word into a vector space and expresses the distribution of the word in that vector space as a vector. In this embodiment, the distribution of words in the vector space represents how good or bad the service quality of the service provider is.
The distribution weight vector represents the importance of the term vector matrix in the vector space. In this embodiment, it can be understood as the importance in the vector space of the term vectors corresponding to all triples in the training data set. The estimation offset parameter represents the offset of the evaluation of the service provider; in this embodiment, it can be understood as the fluctuation range of the service provider's service quality score. In other embodiments, the prediction model parameters may omit the estimation offset parameter.
Preferably, as one implementation of S22, S22 includes:
(1) Extract from the communication record at least one binary pair consisting of a user question QU and a service provider answer AN, abbreviated as <QU, AN>.
In this embodiment, a data extraction technique is used to extract K binary pairs from the communication record, where K is a positive integer.
(2) Perform word segmentation on the user question QU and the service provider answer AN of each binary pair.
Specifically, a word segmentation technique is applied to each extracted binary pair. The QU of each binary pair is segmented into a text string of length m consisting of the words qu_1, qu_2, …, qu_m. The AN of each binary pair is segmented into a text string of length n consisting of the words an_1, an_2, …, an_n.
(3) Query the term vector matrix related to the service to obtain the term vectors of all words in each binary pair.
The term vectors of the user question QU of each binary pair, obtained by querying the term vector matrix, are denoted:
$emb(QU) = (emb\_qu_1, emb\_qu_2, \dots, emb\_qu_m)$,
Similarly, the term vectors of the service provider answer AN of each binary pair are denoted:
$emb(AN) = (emb\_an_1, emb\_an_2, \dots, emb\_an_n)$.
The term vector matrix corresponding to the communication record thus consists of the term vectors of the user question QU and the term vectors of the service provider answer AN of each binary pair.
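Querying the term vector matrix amounts to a row lookup by word index. The sketch below assumes a hypothetical dictionary `vocab` that maps each word to its row in E, and it skips words missing from the dictionary; the text does not say how unknown words are handled, so that choice is an assumption.

```python
from typing import Dict, List

# Hypothetical dictionary and trained term vector matrix E (|V| = 3, emb_size = 2).
vocab: Dict[str, int] = {"bricklayer": 0, "skill": 1, "high": 2}
E: List[List[float]] = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]


def lookup(words: List[str], vocab: Dict[str, int], E: List[List[float]]) -> List[List[float]]:
    """Return the term vector (row of E) for each segmented word found in the dictionary."""
    return [E[vocab[w]] for w in words if w in vocab]  # assumption: unknown words are skipped


emb_qu = lookup(["bricklayer", "skill"], vocab, E)  # rows 0 and 1 of E
```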
S23: calculate the service quality score of the service provider according to the term vector matrix corresponding to the communication record and the prediction model parameters.
Preferably, calculating the service quality score of the service provider includes:
(1) Use the preset vector calculation method to compute, from the term vector matrix corresponding to each binary pair, the term vector corresponding to that binary pair.
Specifically, the term vectors in the term vector matrix corresponding to the user question of each binary pair are added to obtain rep(QU), and those corresponding to the service provider answer are added to obtain rep(AN); rep(QU) and rep(AN) are then normalized and replaced by the normalized values. The details are as follows:
For each binary pair, emb(QU) and emb(AN) are each summed, i.e.
$rep(QU) = emb\_qu_1 + emb\_qu_2 + \dots + emb\_qu_m$,
$rep(AN) = emb\_an_1 + emb\_an_2 + \dots + emb\_an_n$.
A nonlinear transformation function is then used to normalize the vectors rep(QU) and rep(AN) into an interval, for example the range 0 to 1, and rep(QU) and rep(AN) are replaced by the normalized vectors. Preferably, the nonlinear transformation function is the sigmoid function.
The updated rep(QU) and rep(AN) of each binary pair are concatenated to form the term vector rep(QU-AN) corresponding to that binary pair. rep(QU-AN) is a representation vector of length 2*emb_size.
In other embodiments, other nonlinear functions such as tanh may also be used.
(2) Calculate the service quality score corresponding to each binary pair from the term vector corresponding to that binary pair and the prediction model parameters.
Specifically, for a binary pair, its service quality score is calculated from the rep(QU-AN) corresponding to the pair and from the distribution weight vector V and the estimation offset parameter b in the prediction model parameters, using the following formula:
$Score_i = rep(QU\text{-}AN) \cdot V + b$,
where Score_i denotes the service quality score corresponding to the i-th binary pair, and rep(QU-AN)·V denotes the inner product of rep(QU-AN) and V. In other embodiments, the formula may instead be Score_i = rep(QU-AN)·V, without the offset.
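The score of one binary pair is an inner product plus an optional offset, which can be sketched as follows. The function name `score_pair` and the numbers are illustrative assumptions; passing `b=0.0` corresponds to the variant of the formula without the offset parameter.

```python
from typing import List


def score_pair(rep_qu_an: List[float], V: List[float], b: float = 0.0) -> float:
    """Score_i = rep(QU-AN) . V + b, where both vectors have length 2*emb_size."""
    if len(rep_qu_an) != len(V):
        raise ValueError("rep(QU-AN) and V must have the same length")
    return sum(r * v for r, v in zip(rep_qu_an, V)) + b


# Hypothetical values with emb_size = 2.
rep_qu_an = [0.5, 0.5, 0.5, 0.5]
V = [2.0, 4.0, 6.0, 4.0]
score = score_pair(rep_qu_an, V, b=1.0)  # 1 + 2 + 3 + 2 + 1 = 9.0
```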
(3) Calculate the service quality score of the service provider from the service quality scores corresponding to the binary pairs.
Assume that K binary pairs in total are extracted from the communication record, so that K service quality scores Score_1, …, Score_K are obtained. In this embodiment, the mean of the K service quality scores is taken as the service quality score of the service provider. The higher this score, the higher the quality of the service provided by this service provider.
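Aggregating the K per-pair scores into the provider's score is a plain arithmetic mean. In the sketch below the function name `provider_score` and the sample scores are assumptions, and raising an error when no pair was extracted is a design choice of the example, not something the text specifies.

```python
from typing import List


def provider_score(pair_scores: List[float]) -> float:
    """Mean of Score_1 ... Score_K for one service provider."""
    if not pair_scores:
        raise ValueError("at least one binary pair score is required")
    return sum(pair_scores) / len(pair_scores)


overall = provider_score([9.0, 7.0, 8.0])  # (9 + 7 + 8) / 3 = 8.0
```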
S24: present the service quality of the service provider to the user according to the service provider's service quality score.
Preferably, the score of the service provider can be presented to the user directly. In other embodiments, the service of the service provider can be divided into grades according to the score, and the grade is presented to the user. For example, if the service quality score is 9 points, the corresponding grade is "very good".
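A grade mapping of this kind might look like the sketch below. Only the example that 9 points corresponds to the top grade comes from the text; the other thresholds and grade labels are hypothetical and would have to be chosen empirically.

```python
def score_to_grade(score: float) -> str:
    """Map a service quality score to a grade label (thresholds are hypothetical)."""
    if score >= 9:
        return "very good"  # the only mapping taken from the text
    if score >= 7:
        return "good"       # hypothetical threshold and label
    if score >= 5:
        return "average"    # hypothetical threshold and label
    return "poor"           # hypothetical label
```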
After the user has communicated with a service provider through a web page for a preset time, the present invention estimates the service quality of that service provider within the preset time period and displays it on the web page the user is viewing. Alternatively, after the user has communicated with a service provider through a web page, the service quality of that service provider estimated by the present invention is presented to the user at the user's request. Alternatively, after the user has communicated with multiple service providers, the present invention estimates the service quality of each service provider and presents the service providers with higher service quality to the user. This helps the user weigh whether to have the service provider provide the service subsequently.
Fig. 5 is a schematic structural diagram of the training device for prediction model parameters of Embodiment 3 of the present invention. The device includes a construction unit 100 and a training unit 101.
The construction unit 100 is configured to build the training data set corresponding to a preset type of service.
In this embodiment, one type of service corresponds to one training data set. Existing data mining techniques are used to mine sample data related to the preset type of service from the web page library of a search engine (for example, review pages or forum pages). Multiple triples are generated from the sample data, and these triples constitute the training data set. Each triple is expressed as <user question, service provider answer, service quality score>, abbreviated as <Q, A, S>. For example, in the home decoration industry, a triple may be <How skilled are your bricklayers? Our bricklayers are highly skilled and have worked on several homes; please rest assured. 9>. The larger the amount of sample data, the larger the constructed training data set, and the more accurate the model parameters obtained by subsequently training the prediction model. As a rule, the sample data is at least on the order of ten million.
When multiple triples are generated from the sample data, one piece of sample data corresponds to one triple. For a piece of sample data, a data extraction technique is used to extract the user question and the service provider answer from it, and the extracted user question and service provider answer form the user question Q and the service provider answer A of one triple.
The service quality score S of a triple can be determined in the following ways: (1) the service quality score is annotated manually according to the user question and the service provider answer of the triple; (2) when the sample data includes an evaluation of the service provider, the service quality score of the triple is generated from that evaluation. For example, when the evaluation in the sample data is given as a star rating, the different star ratings can be normalized to scores, e.g. one star is 1 point, two stars are 3 points, and so on. When the evaluation in the sample data is given as a score, for example a score between 0 and 100, the score is standardized, e.g. 90 points is standardized to 9 points.
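The two normalization rules can be sketched as below. The text gives only the data points "one star is 1 point, two stars are 3 points" and "90 points becomes 9 points"; the linear rule 2*stars - 1, the five-star upper limit, and the division by 10 are assumptions that merely agree with those data points.

```python
def normalize_stars(stars: int) -> int:
    """Map a star rating to a score; 2*stars - 1 is an assumed rule (1 -> 1, 2 -> 3)."""
    if not 1 <= stars <= 5:  # assumption: ratings range from one to five stars
        raise ValueError("stars must be between 1 and 5")
    return 2 * stars - 1


def normalize_percent(score: float) -> float:
    """Map a 0-100 evaluation score to a 0-10 scale (90 -> 9)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    return score / 10.0
```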
The training unit 101 is configured to obtain, by training on the training data set, the prediction model parameters corresponding to the preset type of service.
In this embodiment, the prediction model parameters include the term vector matrix E formed by the term vectors of the words contained in the training data set, the distribution weight vector V corresponding to the term vector matrix, and the estimation offset parameter b. In other embodiments, the prediction model parameters may omit the estimation offset parameter b.
The term vector matrix E is represented by parameters. The term vector matrix is composed of the term vectors of multiple words; a term vector maps a word into a vector space and expresses the distribution of the word in that vector space as a vector. In this embodiment, the distribution of words in the vector space represents how good or bad the service quality of the service provider is.
The distribution weight vector represents the importance of the term vector matrix in the vector space. In this embodiment, it can be understood as the importance in the vector space of the term vectors corresponding to all triples in the training data set. The estimation offset parameter represents the offset of the evaluation of the service provider; in this embodiment, it can be understood as the fluctuation range of the service provider's service quality score.
Preferably, obtaining the prediction model parameters corresponding to the preset type of service by training on the training data set includes:
The training unit 101 is further configured to set up a parameterized term vector matrix, a parameterized distribution weight vector, and the estimation offset parameter.
Setting up the parameterized term vector matrix specifically includes:
(1) Perform word segmentation on each triple in the training data set.
A word segmentation technique is used to segment the user question Q and the service provider answer A of each triple in the training data set. For example, assume that the user question Q is segmented into a text string of length M and the service provider answer A into a text string of length N. Then Q in each triple is expressed as (q_1, q_2, …, q_M) and A as (a_1, a_2, …, a_N).
(2) Parameterize the term vector of each word obtained by word segmentation.
The term vectors emb(Q) of all words of the user question Q of each triple are expressed as (emb_q_1, emb_q_2, …, emb_q_M), and the term vectors emb(A) of all words of the service provider answer A of each triple are expressed as (emb_a_1, emb_a_2, …, emb_a_N). emb(Q) is a matrix with M rows and emb_size columns. emb(A) is a matrix with N rows and emb_size columns.
(3) Form the term vector matrix from the parameterized term vectors of the words.
Every word of all triples after word segmentation is included in one term vector matrix, which has |V| rows and emb_size columns. Here |V| is the number of all words that may occur, i.e. the dictionary size. The value of emb_size is a preset empirical value, usually set between 50 and 1000. Each row of parameters in the matrix is a vector of length emb_size, called the term vector of the word corresponding to that row. The optimal solution of the term vector matrix, and hence the optimal term vector of the word corresponding to each row, is subsequently obtained by training the prediction model parameters.
The parameterized distribution weight vector is a vector of length 2*emb_size composed of parameters.
The training unit 101 is further configured to initialize the parameters in the term vector matrix, the parameters in the distribution weight vector, and the estimation offset parameter.
In this embodiment, a set of randomly generated numbers is used to initialize the parameters in the term vector matrix, the parameters in the distribution weight vector, and the estimation offset parameter. Preset initial values can of course also be used, for example preset values between 0 and 0.01.
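Initialization of the three parameter groups with their stated shapes can be sketched as follows. Drawing random values uniformly from the range 0 to 0.01 combines the two options mentioned above and is an assumption of this example, as are the function name `init_parameters` and the fixed seed.

```python
import random
from typing import List, Tuple


def init_parameters(vocab_size: int, emb_size: int,
                    low: float = 0.0, high: float = 0.01,
                    seed: int = 0) -> Tuple[List[List[float]], List[float], float]:
    """Return (E, V, b): E is |V| x emb_size, V has length 2*emb_size, b is a scalar."""
    rng = random.Random(seed)  # fixed seed only to make the example reproducible
    E = [[rng.uniform(low, high) for _ in range(emb_size)] for _ in range(vocab_size)]
    V = [rng.uniform(low, high) for _ in range(2 * emb_size)]
    b = rng.uniform(low, high)
    return E, V, b


E, V, b = init_parameters(vocab_size=5, emb_size=3)
```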
The training unit 101 is further configured to use the preset iterative algorithm to iterate over the parameters in the term vector matrix, the parameters in the distribution weight vector, and the estimation offset parameter until the preset iteration termination condition is reached.
In this embodiment, the preset iterative algorithm is used with a loss function on the training data set to iteratively obtain the parameter values in the term vector matrix, the parameter values in the distribution weight vector, and the value of the estimation offset parameter. The loss function is determined by the distance between the service quality score estimated from the training data set and the service quality score contained in the training data set (i.e. the actual service quality score).
The expression of the loss function is as follows:
where $Score = rep(QA) \cdot V + b$. Score denotes the service quality score estimated during training from the user question and the service provider answer of a triple in the training data set, and S denotes the service quality score in the triple, i.e. the actual service quality score. rep(QA)·V denotes the inner product of rep(QA) and V. rep(QA) denotes the vector of length 2*emb_size obtained by concatenating rep(Q) and rep(A). rep(Q) denotes the term vector corresponding to the user question in each triple, and rep(A) denotes the term vector corresponding to the service provider answer in each triple.
In other embodiments, those skilled in the art may use loss functions of other forms as needed, such as a logarithmic loss function, an average loss function, or an absolute loss function.
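As one possible instance of a distance-based loss, the sketch below uses the squared error between Score and S and derives the gradients with respect to V and b by hand. The squared-error form is an assumption (the exact expression is not reproduced in this text), and the function names and sample values are hypothetical; the gradient with respect to E, which back propagation would also supply, is omitted.

```python
from typing import List, Tuple


def predict(rep_qa: List[float], V: List[float], b: float) -> float:
    """Score = rep(QA) . V + b."""
    return sum(r * v for r, v in zip(rep_qa, V)) + b


def squared_loss_and_grads(rep_qa: List[float], V: List[float], b: float,
                           S: float) -> Tuple[float, List[float], float]:
    """Return (loss, dLoss/dV, dLoss/db) for the assumed loss (Score - S)^2."""
    error = predict(rep_qa, V, b) - S
    loss = error ** 2
    grad_V = [2.0 * error * r for r in rep_qa]  # chain rule through the inner product
    grad_b = 2.0 * error
    return loss, grad_V, grad_b


# Hypothetical triple: rep(QA) of length 2*emb_size = 2, actual score S = 2.0.
loss, grad_V, grad_b = squared_loss_and_grads([0.5, 0.5], [1.0, 1.0], 0.0, 2.0)
```

Here the estimated score is 1.0, so the error is -1.0 and a gradient step would increase both V and b.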
The preset iterative algorithm is stochastic gradient descent (SGD) combined with the back propagation (BP) algorithm. Because the constructed data set exceeds one hundred million samples, training the prediction model parameters on it yields a set of optimized prediction model parameters. SGD and BP are common knowledge for those skilled in the art and are only summarized here. The BP algorithm is an efficient method for computing the gradients of parameters.
In this embodiment, following the iterative idea of SGD, the term vector matrix E, the distribution weight vector V, and the estimation offset parameter b are first initialized. In each iteration, a group of training data (the group size is referred to as the mini-batch size) is used to compute the gradients of E, V, and b. The term vector matrix E is then updated according to its gradient: each update subtracts from E the product of a set learning rate and the computed gradient of E. The distribution weight vector V and the estimation offset parameter b are updated in the same way. After multiple iterations, when the preset iteration termination condition is reached, the optimal term vector matrix E, distribution weight vector V, and estimation offset parameter b are obtained.
The preset iteration termination condition may be that a preset number of iterations is reached, that the difference between the loss function value obtained after the current iteration and the value obtained after the previous iteration is less than a preset threshold, or that the loss function value is less than a preset target value. The preset number of iterations, the preset threshold, and the preset target value are all preset empirical values.
Preferably, a preset vector calculation method is used to process, respectively, the term vectors of all words in the user question and the term vectors of all words in the service provider answer of each triple, so as to obtain the term vector rep(Q) corresponding to the user question and the term vector rep(A) corresponding to the service provider answer in each triple.
Specifically, they are obtained as follows:
(1) The term vectors of all words in the user question of each triple are added to obtain rep(Q), and the term vectors of all words in the service provider answer of each triple are added to obtain rep(A).
That is, $rep(Q) = emb\_q_1 + emb\_q_2 + \dots + emb\_q_M$,
$rep(A) = emb\_a_1 + emb\_a_2 + \dots + emb\_a_N$.
(2) rep(Q) and rep(A) in each triple are each normalized, and the normalized values replace rep(Q) and rep(A) in that triple. In this embodiment, the sigmoid function is applied as a nonlinear transformation to every element of the vectors rep(Q) and rep(A), so that every element is normalized into an interval.
The sigmoid function is a function used for nonlinear transformation; its definition and curve are shown in Fig. 3. Of course, other nonlinear functions such as tanh may also be used as the nonlinear transformation function.
$Sigmoid(z) = \frac{1}{1 + e^{-z}}$.
The transformed rep(Q) and rep(A) of each triple are then concatenated into a vector rep(QA) of length 2*emb_size.
After the prediction model parameters for the preset type of service are obtained, they can be stored in a storage device, so that when a user subsequently searches for this service, the service quality of a service provider can be presented to the user according to the prediction model parameters of the service, which helps the user weigh whether to use that service provider.
Fig. 6 is a schematic structural diagram of the service quality estimation device of Embodiment 4 of the present invention. The device includes a first acquisition unit 201, a second acquisition unit 202, a computing unit 203, and a display unit 204.
The first acquisition unit 201 is configured to obtain the communication record between a user and a service provider of a certain type of service.
When selecting a certain type of service, a user usually communicates with providers of that service over the Internet by voice or text in order to choose a suitable service provider. Therefore, in this embodiment, the voice or text communication records between the user and the service provider of the service within a preset time period can be obtained. The communication records include communication records in text form (such as chat records) and communication records in voice form. A communication record in voice form is first converted into text form by a speech recognition technique.
The second acquisition unit 202 is configured to query the prediction model parameters related to the service and to obtain the term vector matrix corresponding to the communication record.
In this embodiment, the prediction model parameters related to this type of service are obtained by training with the method described in Embodiment 1. The trained prediction model parameters related to the service, namely the term vector matrix E, the distribution weight vector V, and the estimation offset parameter b, can be obtained from the storage device according to the type of the service. The matrix has |V| rows and emb_size columns, where |V| is the number of all words that may occur (i.e. the dictionary size). Each row of the matrix is a vector of length emb_size, called the term vector of the word corresponding to that row.
The term vector matrix E is represented by parameters. The term vector matrix is composed of the term vectors of multiple words; a term vector maps a word into a vector space and expresses the distribution of the word in that vector space as a vector. In this embodiment, the distribution of words in the vector space represents how good or bad the service quality of the service provider is.
The distribution weight vector represents the importance of the term vector matrix in the vector space. In this embodiment, it can be understood as the importance in the vector space of the term vectors corresponding to all triples in the training data set. The estimation offset parameter represents the offset of the evaluation of the service provider; in this embodiment, it can be understood as the fluctuation range of the service provider's service quality score. In other embodiments, the prediction model parameters may omit the estimation offset parameter.
Preferably, obtaining the term vector matrix corresponding to the communication record includes:
The second acquisition unit 202 is further configured to extract from the communication record at least one binary pair consisting of a user question QU and a service provider answer AN, abbreviated as <QU, AN>.
In this embodiment, a data extraction technique is used to extract K binary pairs from the communication record, where K is a positive integer.
The second acquisition unit 202 is further configured to perform word segmentation on the user question QU and the service provider answer AN of each binary pair.
Specifically, a word segmentation technique is applied to each extracted binary pair. The QU of each binary pair is segmented into a text string of length m consisting of the words qu_1, qu_2, …, qu_m. The AN of each binary pair is segmented into a text string of length n consisting of the words an_1, an_2, …, an_n.
The second acquisition unit 202 is further configured to query the term vector matrix related to the service to obtain the term vectors of all words in each binary pair.
The term vectors of the user question QU of each binary pair, obtained by querying the term vector matrix, are denoted:
$emb(QU) = (emb\_qu_1, emb\_qu_2, \dots, emb\_qu_m)$,
Similarly, the term vectors of the service provider answer AN of each binary pair are denoted:
$emb(AN) = (emb\_an_1, emb\_an_2, \dots, emb\_an_n)$.
The term vector matrix corresponding to the communication record thus consists of the term vectors of the user question QU and the term vectors of the service provider answer AN of each binary pair.
The computing unit 203 is configured to calculate the service quality score of the service provider according to the term vector matrix corresponding to the communication record and the prediction model parameters.
Preferably, calculating the service quality score of the service provider includes:
(1) Use the preset vector calculation method to compute, from the term vector matrix corresponding to each binary pair, the term vector corresponding to that binary pair.
Specifically, the term vectors in the term vector matrix corresponding to the user question of each binary pair are added to obtain rep(QU), and those corresponding to the service provider answer are added to obtain rep(AN); rep(QU) and rep(AN) are then normalized and replaced by the normalized values. The details are as follows:
For each binary pair, emb(QU) and emb(AN) are each summed, i.e.
$rep(QU) = emb\_qu_1 + emb\_qu_2 + \dots + emb\_qu_m$,
$rep(AN) = emb\_an_1 + emb\_an_2 + \dots + emb\_an_n$.
A nonlinear transformation function is then used to normalize the vectors rep(QU) and rep(AN) into an interval, for example the range 0 to 1, and rep(QU) and rep(AN) are replaced by the normalized vectors. Preferably, the nonlinear transformation function is the sigmoid function.
The updated rep(QU) and rep(AN) of each binary pair are concatenated to form the term vector rep(QU-AN) corresponding to that binary pair. rep(QU-AN) is a representation vector of length 2*emb_size.
In other embodiments, other nonlinear functions such as tanh may also be used.
(2) Calculate the service quality score corresponding to each binary pair from the term vector corresponding to that binary pair and the prediction model parameters.
Specifically, for a binary pair, its service quality score is calculated from the rep(QU-AN) corresponding to the pair and from the distribution weight vector V and the estimation offset parameter b in the prediction model parameters, using the following formula:
$Score_i = rep(QU\text{-}AN) \cdot V + b$,
where Score_i denotes the service quality score corresponding to the i-th binary pair, and rep(QU-AN)·V denotes the inner product of rep(QU-AN) and V. In other embodiments, the formula may instead be Score_i = rep(QU-AN)·V, without the offset.
(3) Calculate the service quality score of the service provider from the service quality scores corresponding to the binary pairs.
Assume that K binary pairs in total are extracted from the communication record, so that K service quality scores Score_1, …, Score_K are obtained. In this embodiment, the mean of the K service quality scores is taken as the service quality score of the service provider. The higher this score, the higher the quality of the service provided by this service provider.
The display unit 204 is configured to present the service quality of the service provider to the user according to the service provider's service quality score.
Preferably, the score of the service provider can be presented to the user directly. In other embodiments, the service of the service provider can be divided into grades according to the score, and the grade is presented to the user. For example, if the service quality score is 9 points, the corresponding grade is "very good".
After the user has communicated with a service provider through a web page for a preset time, the present invention estimates the service quality of that service provider within the preset time period and displays it on the web page the user is viewing. Alternatively, after the user has communicated with a service provider through a web page, the service quality of that service provider estimated by the present invention is presented to the user at the user's request. Alternatively, after the user has communicated with multiple service providers, the present invention estimates the service quality of each service provider and presents the service providers with higher service quality to the user. This helps the user weigh whether to have the service provider provide the service subsequently.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in an actual implementation.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (20)

1. A training method for prediction model parameters, characterized by comprising:
building a training data set corresponding to a service of a preset type, the training data set comprising sample data selected from communication records between users and service providers relating to the service of the preset type;
training on the training data set to obtain prediction model parameters corresponding to the service of the preset type, the prediction model parameters comprising a term vector matrix composed of the term vectors of the words contained in the training data set, and a distribution weight vector corresponding to the term vector matrix.
2. The method according to claim 1, characterized in that the sample data comprises a user question, a service provider answer, and a service quality score.
3. The method according to claim 1, characterized in that training on the training data set to obtain the prediction model parameters relating to the service of the preset type comprises:
establishing a parameterized term vector matrix and a parameterized distribution weight vector;
initializing the parameters in the term vector matrix and the parameters in the distribution weight vector;
iterating the parameters in the term vector matrix and the parameters in the distribution weight vector using a preset iterative algorithm until a preset iteration stopping criterion is reached.
4. The method according to claim 1, characterized in that training on the training data set to obtain the prediction model parameters relating to the service of the preset type further comprises:
initializing an estimation offset parameter;
iterating the estimation offset parameter using a preset iterative algorithm until a preset iteration stopping criterion is reached.
5. The method according to claim 3, characterized in that establishing the parameterized term vector matrix comprises:
performing word segmentation on the sample data;
parameterizing the term vector of each word obtained by the segmentation;
forming the term vector matrix from the parameterized term vectors of the words.
6. The method according to claim 3, characterized in that the iteration stopping criterion comprises:
a preset number of iterations being reached; or
the value of the loss function obtained after the current iteration being less than a preset target value; or
the difference between the value of the loss function obtained after the current iteration and the value obtained after the previous iteration being less than a preset threshold; wherein the loss function is determined by the distance between the service quality scores estimated for the training data set and the service quality scores in the training data set.
7. A service quality estimation method, characterized in that the method comprises:
obtaining a communication record between a user and a service provider of a service of a certain type;
querying the prediction model parameters relating to the service to obtain the term vector matrix corresponding to the communication record;
calculating the service quality score of the service provider according to the term vector matrix corresponding to the communication record and the prediction model parameters;
wherein the prediction model parameters are obtained by training with the method of any one of claims 1 to 6.
8. The method according to claim 7, characterized in that obtaining the term vector matrix corresponding to the communication record comprises:
extracting from the communication record at least one binary pair consisting of a user question and a service provider answer;
performing word segmentation on the user question QU and the service provider answer AN of each binary pair;
querying the term vector matrix relating to the service to obtain the term vectors of all words in each binary pair.
9. The method according to claim 7, characterized in that calculating the service quality score of the service provider comprises:
computing, from the term vector matrix corresponding to each binary pair and using a preset vector calculation, the term vector corresponding to each binary pair;
calculating the service quality score corresponding to each binary pair according to the term vector corresponding to that binary pair and the prediction model parameters;
calculating the service quality score of the service provider according to the service quality scores corresponding to the binary pairs.
10. The method according to claim 8, characterized in that the service quality score corresponding to each binary pair is calculated from the term vector corresponding to that binary pair and the prediction model parameters by the following formula:
Score_i = rep(QU-AN) · V + b,
where Score_i denotes the service quality score corresponding to the i-th binary pair, rep(QU-AN) · V denotes the inner product of rep(QU-AN) and the distribution weight vector V, rep(QU-AN) denotes the term vector corresponding to the i-th binary pair, and b denotes the estimation offset parameter.
11. A training device for prediction model parameters, characterized by comprising:
a construction unit, configured to build a training data set corresponding to a service of a preset type, the training data set comprising sample data selected from communication records between users and service providers relating to the service of the preset type;
a training unit, configured to train on the training data set to obtain prediction model parameters corresponding to the service of the preset type, the prediction model parameters comprising a term vector matrix composed of the term vectors of the words contained in the training data set, and a distribution weight vector corresponding to the term vector matrix.
12. The device according to claim 11, characterized in that the sample data comprises a user question, a service provider answer, and a service quality score.
13. The device according to claim 11, characterized in that training on the training data set to obtain the prediction model parameters relating to the service of the preset type comprises:
establishing a parameterized term vector matrix and a parameterized distribution weight vector;
initializing the parameters in the term vector matrix and the parameters in the distribution weight vector;
iterating the parameters in the term vector matrix and the parameters in the distribution weight vector using a preset iterative algorithm until a preset iteration stopping criterion is reached.
14. The device according to claim 11, characterized in that training on the training data set to obtain the prediction model parameters relating to the service of the preset type further comprises:
initializing an estimation offset parameter;
iterating the estimation offset parameter using a preset iterative algorithm until a preset iteration stopping criterion is reached.
15. The device according to claim 13, characterized in that establishing the parameterized term vector matrix comprises:
performing word segmentation on the sample data;
parameterizing the term vector of each word obtained by the segmentation;
forming the term vector matrix from the parameterized term vectors of the words.
16. The device according to claim 13, characterized in that the iteration stopping criterion comprises:
a preset number of iterations being reached; or
the value of the loss function obtained after the current iteration being less than a preset target value; or
the difference between the value of the loss function obtained after the current iteration and the value obtained after the previous iteration being less than a preset threshold; wherein the loss function is determined by the distance between the service quality scores estimated for the training data set and the service quality scores in the training data set.
17. A service quality estimation device, characterized in that the device comprises:
a first acquiring unit, configured to obtain a communication record between a user and a service provider of a service of a certain type;
a second acquiring unit, configured to query the prediction model parameters relating to the service and obtain the term vector matrix corresponding to the communication record;
a computing unit, configured to calculate the service quality score of the service provider according to the term vector matrix corresponding to the communication record and the prediction model parameters;
wherein the prediction model parameters are obtained by training with the device of any one of claims 11 to 16.
18. The device according to claim 17, characterized in that obtaining the term vector matrix corresponding to the communication record comprises:
extracting from the communication record at least one binary pair consisting of a user question and a service provider answer;
performing word segmentation on the user question QU and the service provider answer AN of each binary pair;
querying the term vector matrix relating to the service to obtain the term vectors of all words in each binary pair.
19. The device according to claim 18, characterized in that calculating the service quality score of the service provider comprises:
computing, from the term vector matrix corresponding to each binary pair and using a preset vector calculation, the term vector corresponding to each binary pair;
calculating the service quality score corresponding to each binary pair according to the term vector corresponding to that binary pair and the prediction model parameters;
calculating the service quality score of the service provider according to the service quality scores corresponding to the binary pairs.
20. The device according to claim 19, characterized in that the service quality score corresponding to each binary pair is calculated from the term vector corresponding to that binary pair and the prediction model parameters by the following formula:
Score_i = rep(QU-AN) · V + b,
where Score_i denotes the service quality score corresponding to the i-th binary pair, rep(QU-AN) · V denotes the inner product of rep(QU-AN) and the distribution weight vector V, rep(QU-AN) denotes the term vector corresponding to the i-th binary pair, and b denotes the estimation offset parameter.
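
The three stopping criteria recited in claims 6 and 16 can be illustrated with a short sketch. It assumes plain gradient descent on a mean squared error loss and, for brevity, updates only the weight vector V and the offset b while treating each pair representation as fixed; the claims also iterate the term vector matrix and do not fix the iterative algorithm or the distance measure. All names and default values below are hypothetical.

```python
def train(reps, targets, lr=0.01, max_iters=1000,
          target_loss=1e-4, min_delta=1e-9):
    """Fit V and b so that rep . V + b approximates the labelled scores.

    Returns (V, b, reason); `reason` names the stopping criterion that
    fired. Gradient descent and squared error are assumptions.
    """
    dim = len(reps[0])
    n = len(reps)
    V = [0.0] * dim   # distribution weight vector, initialised to zero
    b = 0.0           # estimation offset parameter
    prev_loss = None
    for _ in range(max_iters):
        preds = [sum(r * v for r, v in zip(rep, V)) + b for rep in reps]
        errs = [p - t for p, t in zip(preds, targets)]
        loss = sum(e * e for e in errs) / n
        # Criterion 2: loss below the preset target value.
        if loss < target_loss:
            return V, b, "target_loss"
        # Criterion 3: change in loss since the previous iteration
        # below the preset threshold.
        if prev_loss is not None and abs(prev_loss - loss) < min_delta:
            return V, b, "loss_delta"
        prev_loss = loss
        # One gradient step on the mean squared error.
        grad_V = [2.0 * sum(e * rep[j] for e, rep in zip(errs, reps)) / n
                  for j in range(dim)]
        grad_b = 2.0 * sum(errs) / n
        V = [v - lr * g for v, g in zip(V, grad_V)]
        b -= lr * grad_b
    # Criterion 1: preset number of iterations reached.
    return V, b, "max_iters"


# Hypothetical two-dimensional pair representations and labelled scores.
V, b, reason = train([[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0], max_iters=5)
```

Checking the loss-based criteria before the update means the returned parameters are the ones that produced the reported loss.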

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610147605.2A CN105760965A (en) 2016-03-15 2016-03-15 Pre-estimated model parameter training method, service quality pre-estimation method and corresponding devices

Publications (1)

Publication Number Publication Date
CN105760965A true CN105760965A (en) 2016-07-13

Family

ID=56333188

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610147605.2A Pending CN105760965A (en) 2016-03-15 2016-03-15 Pre-estimated model parameter training method, service quality pre-estimation method and corresponding devices

Country Status (1)

Country Link
CN (1) CN105760965A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101227435A (en) * 2008-01-28 2008-07-23 浙江大学 Method for filtering Chinese junk mail based on Logistic regression
CN102508907A (en) * 2011-11-11 2012-06-20 北京航空航天大学 Dynamic recommendation method based on training set optimization for recommendation system
CN104750674A (en) * 2015-02-17 2015-07-01 北京京东尚科信息技术有限公司 Man-machine conversation satisfaction degree prediction method and system
CN104915446A (en) * 2015-06-29 2015-09-16 华南理工大学 Automatic extracting method and system of event evolving relationship based on news

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171524A (en) * 2018-01-09 2018-06-15 安徽润谷网络科技有限公司 One kind is based on small-loan company's customer experience evaluation system
CN110782221A (en) * 2019-09-19 2020-02-11 丁玥 Intelligent interview evaluation system and method
CN111461340A (en) * 2020-03-10 2020-07-28 北京百度网讯科技有限公司 Weight matrix updating method and device and electronic equipment
CN111461340B (en) * 2020-03-10 2023-03-31 北京百度网讯科技有限公司 Weight matrix updating method and device and electronic equipment
CN116614431A (en) * 2023-07-19 2023-08-18 中国电信股份有限公司 Data processing method, device, electronic equipment and computer readable storage medium
CN116614431B (en) * 2023-07-19 2023-10-03 中国电信股份有限公司 Data processing method, device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160713
