CN105760965A - Pre-estimated model parameter training method, service quality pre-estimation method and corresponding devices - Google Patents
Pre-estimated model parameter training method, service quality pre-estimation method and corresponding devices
- Publication number
- CN105760965A (CN201610147605.2A)
- Authority
- CN
- China
- Prior art keywords
- term vector
- service
- service quality
- binary
- isp
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06395—Quality analysis or management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/067—Enterprise or organisation modelling
Abstract
The invention discloses a method for training prediction model parameters, a method for predicting service quality, and corresponding devices. The training method comprises: constructing a training data set corresponding to a preset type of service, and training on that data set to obtain the prediction model parameters corresponding to the preset type of service. The service quality prediction method comprises: obtaining the communication records between a user and a service provider of a certain type of service; querying the prediction model parameters corresponding to the service, and obtaining a word vector matrix corresponding to the communication records; calculating the service quality score of the service provider from that word vector matrix and the prediction model parameters; and presenting the service quality of the service provider to the user according to the score. The method can quickly estimate a service provider's service quality according to the user's individual needs and helps the user select a suitable service provider.
Description
[technical field]
The present invention relates to the field of computer application technology, and in particular to a method for training prediction model parameters, a method for predicting service quality, and corresponding devices.
[background technology]
Among O2O (Online To Offline) services, some belong to ultra-low-frequency industries, such as home decoration, wedding services, and second-hand housing agencies. Users consume such services very rarely, typically only once or twice in a lifetime, and the quality of such a service often depends on the personal qualities and professionalism of the individual practitioner. Moreover, compared with high-frequency O2O services, the service content is highly personalized to the needs of the specific user, so evaluations of previous service cases do not serve well as a reference for the current service. When deciding whether to use a particular service provider, the user needs to quickly estimate the quality of service that this provider would deliver for this user's particular request, so that the user can enjoy a more satisfying service experience.
At present, service quality in ultra-low-frequency O2O industries is mainly estimated in one of two ways. Method one: the user asks friends or acquaintances about the reputation of a particular practitioner and judges the quality of the service intuitively. Method two: the user browses indicators such as the review content and star ratings that past users left for a service provider's previous service cases, and judges the service quality from them. Although these two traditional methods give the user some reference for understanding the provider's service quality, such data are generally scarce because of the particular nature of ultra-low-frequency O2O industries, so their reference value is limited. In addition, a specific service project is highly personalized; past service cases do not match the content of the current service and can therefore mislead the user's expectations of it.
[summary of the invention]
The present invention provides a method for training prediction model parameters, a method for predicting service quality, and corresponding devices, which can quickly estimate a service provider's service quality according to the user's individual needs, make it easier for the user to select a suitable service provider, and improve the user experience.
The specific technical solution is as follows:
A method for training prediction model parameters, comprising:
building a training data set corresponding to a preset type of service, the training data set comprising sample data filtered from communication records between users and service providers related to the preset type of service;
training on the training data set to obtain prediction model parameters corresponding to the preset type of service, the prediction model parameters comprising a word vector matrix composed of the word vectors of the words contained in the training data set, and a weight distribution vector corresponding to the word vector matrix.
According to a preferred embodiment of the present invention, the sample data comprise a user question, a service provider answer, and a service quality score.
According to a preferred embodiment of the present invention, training on the training data set to obtain the prediction model parameters related to the preset type of service comprises:
establishing a parameterized word vector matrix and a parameterized weight distribution vector;
initializing the parameters in the word vector matrix and the parameters in the weight distribution vector;
iterating on the parameters in the word vector matrix and the parameters in the weight distribution vector with a preset iterative algorithm until a preset iteration termination condition is reached.
According to a preferred embodiment of the present invention, training on the training data set to obtain the prediction model parameters related to the preset type of service further comprises:
initializing a prediction offset parameter;
iterating on the prediction offset parameter with the preset iterative algorithm until the preset iteration termination condition is reached.
According to a preferred embodiment of the present invention, establishing the parameterized word vector matrix comprises:
performing word segmentation on the sample data;
parameterizing the word vector of each word obtained by segmentation;
forming the word vector matrix from the parameterized word vectors of the words.
According to a preferred embodiment of the present invention, the iteration termination condition comprises:
a preset number of iterations is reached; or
the value of the loss function obtained after the current iteration is less than a preset target value; or
the difference between the value of the loss function obtained after the current iteration and the value obtained after the previous iteration is less than a preset threshold; wherein the loss function is determined by the distance between the service quality scores predicted for the training data set and the service quality scores in the training data set.
A service quality prediction method, comprising:
obtaining the communication record between a user and a service provider of a certain type of service;
querying the prediction model parameters related to the service, and obtaining the word vector matrix corresponding to the communication record;
calculating the service quality score of the service provider from the word vector matrix corresponding to the communication record and the prediction model parameters;
wherein the prediction model parameters are obtained by the above method for training prediction model parameters.
According to a preferred embodiment of the present invention, obtaining the word vector matrix corresponding to the communication record comprises:
extracting from the communication record at least one two-tuple consisting of a user question and a service provider answer;
performing word segmentation on the user question QU and the service provider answer AN in each two-tuple;
querying the word vector matrix related to the service to obtain the word vectors of all words in each two-tuple.
According to a preferred embodiment of the present invention, calculating the service quality score of the service provider comprises:
applying a preset vector calculation method to the word vector matrix corresponding to each two-tuple to obtain the word vector corresponding to that two-tuple;
calculating the service quality score corresponding to each two-tuple from its word vector and the prediction model parameters;
calculating the service quality score of the service provider from the service quality scores corresponding to the two-tuples.
According to a preferred embodiment of the present invention, the service quality score corresponding to each two-tuple is calculated from its word vector and the prediction model parameters by the following formula:
$\mathrm{Score}_i = \mathrm{rep}(\text{QU-AN}) \cdot V + b$,
where $\mathrm{Score}_i$ denotes the service quality score corresponding to the i-th two-tuple, rep(QU-AN) · V denotes the inner product of rep(QU-AN) and the weight distribution vector V, rep(QU-AN) denotes the word vector corresponding to the i-th two-tuple, and b denotes the prediction offset parameter.
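The scoring formula can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and averaging the per-pair scores into a provider score is an assumption, since the text above does not state how the per-pair scores are combined.

```python
def pair_score(rep_qu_an, V, b=0.0):
    """Score_i = rep(QU-AN) . V + b, i.e. an inner product plus the offset b."""
    if len(rep_qu_an) != len(V):
        raise ValueError("rep(QU-AN) and V must have the same length")
    return sum(r * v for r, v in zip(rep_qu_an, V)) + b


def provider_score(pair_reps, V, b=0.0):
    """Combine the per-pair scores into one provider score.

    Assumption: the arithmetic mean is used; the source does not fix this rule.
    """
    scores = [pair_score(rep, V, b) for rep in pair_reps]
    return sum(scores) / len(scores)
```

With V of length 2*emb_size, each rep(QU-AN) passed in must have the same length.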
A device for training prediction model parameters, comprising:
a construction unit, configured to build a training data set corresponding to a preset type of service, the training data set comprising sample data filtered from communication records between users and service providers related to the preset type of service;
a training unit, configured to train on the training data set to obtain prediction model parameters corresponding to the preset type of service, the prediction model parameters comprising a word vector matrix composed of the word vectors of the words contained in the training data set, and a weight distribution vector corresponding to the word vector matrix.
According to a preferred embodiment of the present invention, the sample data comprise a user question, a service provider answer, and a service quality score.
According to a preferred embodiment of the present invention, training on the training data set to obtain the prediction model parameters related to the preset type of service comprises:
establishing a parameterized word vector matrix and a parameterized weight distribution vector;
initializing the parameters in the word vector matrix and the parameters in the weight distribution vector;
iterating on the parameters in the word vector matrix and the parameters in the weight distribution vector with a preset iterative algorithm until a preset iteration termination condition is reached.
According to a preferred embodiment of the present invention, training on the training data set to obtain the prediction model parameters related to the preset type of service further comprises:
initializing a prediction offset parameter;
iterating on the prediction offset parameter with the preset iterative algorithm until the preset iteration termination condition is reached.
According to a preferred embodiment of the present invention, establishing the parameterized word vector matrix comprises:
performing word segmentation on the sample data;
parameterizing the word vector of each word obtained by segmentation;
forming the word vector matrix from the parameterized word vectors of the words.
According to a preferred embodiment of the present invention, the iteration termination condition comprises:
a preset number of iterations is reached; or
the value of the loss function obtained after the current iteration is less than a preset target value; or
the difference between the value of the loss function obtained after the current iteration and the value obtained after the previous iteration is less than a preset threshold; wherein the loss function is determined by the distance between the service quality scores predicted for the training data set and the service quality scores in the training data set.
A service quality prediction device, comprising:
a first obtaining unit, configured to obtain the communication record between a user and a service provider of a certain type of service;
a second obtaining unit, configured to query the prediction model parameters related to the service and obtain the word vector matrix corresponding to the communication record;
a calculation unit, configured to calculate the service quality score of the service provider from the word vector matrix corresponding to the communication record and the prediction model parameters;
wherein the prediction model parameters are obtained by the above device for training prediction model parameters.
According to a preferred embodiment of the present invention, obtaining the word vector matrix corresponding to the communication record comprises:
extracting from the communication record at least one two-tuple consisting of a user question and a service provider answer;
performing word segmentation on the user question QU and the service provider answer AN in each two-tuple;
querying the word vector matrix related to the service to obtain the word vectors of all words in each two-tuple.
According to a preferred embodiment of the present invention, calculating the service quality score of the service provider comprises:
applying a preset vector calculation method to the word vector matrix corresponding to each two-tuple to obtain the word vector corresponding to that two-tuple;
calculating the service quality score corresponding to each two-tuple from its word vector and the prediction model parameters;
calculating the service quality score of the service provider from the service quality scores corresponding to the two-tuples.
According to a preferred embodiment of the present invention, the service quality score corresponding to each two-tuple is calculated from its word vector and the prediction model parameters by the following formula:
$\mathrm{Score}_i = \mathrm{rep}(\text{QU-AN}) \cdot V + b$,
where $\mathrm{Score}_i$ denotes the service quality score corresponding to the i-th two-tuple, rep(QU-AN) · V denotes the inner product of rep(QU-AN) and the weight distribution vector V, rep(QU-AN) denotes the word vector corresponding to the i-th two-tuple, and b denotes the prediction offset parameter.
As can be seen from the above technical solution, the present invention can quickly and quantitatively estimate a service provider's service quality from the communication record between the user and the service provider, making it easier for the user to select a suitable service provider and improving the user experience.
[brief description of the drawings]
Fig. 1 is a general schematic block diagram of an embodiment of the present invention.
Fig. 2 is a flowchart of the method for training prediction model parameters according to Embodiment 1 of the present invention.
Fig. 3 is a schematic diagram of the nonlinear transformation function.
Fig. 4 is a flowchart of the service quality prediction method according to Embodiment 2 of the present invention.
Fig. 5 is a structural diagram of the device for training prediction model parameters according to Embodiment 3 of the present invention.
Fig. 6 is a structural diagram of the service quality prediction device according to Embodiment 4 of the present invention.
[detailed description of the invention]
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a general schematic block diagram of an embodiment of the present invention. As shown in Fig. 1, training data related to a preset type of service are first mined from the Internet and used to build a training data set. Optimal prediction model parameters are then learned by training on this data set. Finally, the communication record between a user and a service provider of a certain type of service is obtained; the prediction model parameters related to the service are queried; the word vector matrix corresponding to the communication record is obtained; the service quality score of the service provider is calculated from that word vector matrix and the prediction model parameters; and the service quality of the service provider is presented to the user according to the score.
Fig. 2 is a flowchart of the method for training prediction model parameters according to Embodiment 1 of the present invention. The training method comprises:
S10: build the training data set corresponding to a preset type of service.
In this embodiment, each type of service corresponds to one training data set. Existing data mining techniques are used to mine sample data related to the preset type of service from a search engine's web page library (for example review pages or forum pages). Multiple triples are generated from the sample data, and these triples constitute the training data set. Each triple is expressed as <user question, service provider answer, service quality score>, abbreviated <Q, A, S>. For example, in the home decoration industry a triple might be <How skilled are your bricklayers?, Our bricklayers are highly skilled and have done several homes, so you can rest assured., 9>. The larger the amount of sample data, the larger the resulting training data set and the more accurate the model parameters obtained by subsequently training the prediction model. Typically, the sample data are at least on the order of ten million.
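As a minimal sketch, a <Q, A, S> triple can be held in a small record type. The class and field names here are illustrative choices, not taken from the patent, and the example strings are an English rendering of the home decoration example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    question: str  # Q: the user question
    answer: str    # A: the service provider answer
    score: float   # S: the service quality score


# A training data set is then simply a collection of such triples.
training_set = [
    Triple(
        question="How skilled are your bricklayers?",
        answer="Our bricklayers are highly skilled and have done several homes.",
        score=9.0,
    ),
]
```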
When multiple triples are generated from the sample data, each sample datum corresponds to one triple. For each sample datum, data extraction techniques are used to extract the user question and the service provider answer from it, and these become the user question Q and the service provider answer A of the generated triple.
The service quality score S of a triple can be determined in either of the following ways: (1) the service quality score is annotated manually according to the user question and the service provider answer of the triple; (2) when the sample data include an evaluation of the service provider, the triple's service quality score is generated from that evaluation. For example, when the evaluation in the sample data is a star rating, the different star ratings can be normalized into scores: one star is 1 point, two stars are 3 points, and so on. When the evaluation in the sample data is a numeric score, such as a score between 0 and 100, the score is normalized: for example, 90 points is normalized to 9 points, and so on.
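The two normalization examples can be sketched as follows. The text only gives the mappings 1 star to 1 point, 2 stars to 3 points, and 90 to 9; extending the star mapping linearly as 2*stars - 1 over a five-star scale and dividing 0-100 scores by 10 are assumptions made for this illustration, and the function names are hypothetical.

```python
def normalize_stars(stars: int) -> int:
    """Map a star rating to a score: 1 star -> 1, 2 stars -> 3 (from the text).

    Assumption: the pattern continues as 2 * stars - 1 on a 1-5 star scale.
    """
    if not 1 <= stars <= 5:
        raise ValueError("expected a rating between 1 and 5 stars")
    return 2 * stars - 1


def normalize_hundred(score: float) -> float:
    """Map a 0-100 score onto a 0-10 scale, e.g. 90 -> 9 (from the text)."""
    if not 0 <= score <= 100:
        raise ValueError("expected a score between 0 and 100")
    return score / 10.0
```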
S11: train on the training data set to obtain the prediction model parameters corresponding to the preset type of service.
In this embodiment, the prediction model parameters comprise the word vector matrix E composed of the word vectors of the words contained in the training data set, the weight distribution vector V corresponding to the word vector matrix, and the prediction offset parameter b. In other embodiments, the prediction model parameters may omit the prediction offset parameter b.
The word vector matrix E is represented by parameters. It is composed of the word vectors of multiple words; a word vector maps a word into a vector space and represents the word's distribution in that space as a vector. In this embodiment, a word's distribution in the vector space represents how good or bad the service provider's service quality is.
The weight distribution vector represents the importance of the word vector matrix in the vector space. In this embodiment, it can be understood as the importance in the vector space of the word vectors corresponding to all triples in the training data set. The prediction offset parameter represents the offset of the evaluation of the service provider. In this embodiment, it can be understood as the range within which the service provider's service quality score fluctuates.
Preferably, as one implementation of S11, S11 comprises:
S110: establish a parameterized word vector matrix, a parameterized weight distribution vector, and a prediction offset parameter.
Establishing the parameterized word vector matrix specifically comprises:
(1) Perform word segmentation on each triple in the training data set.
A word segmentation technique is applied to the user question Q and the service provider answer A of each triple in the training data set. For example, suppose the user question Q is segmented into a text string of length M and the service provider answer A into a text string of length N. Then Q in each triple is expressed as (q1, q2, ..., qM) and A as (a1, a2, ..., aN).
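To make the segmentation step concrete, here is a deliberately simple stand-in. It splits English text on word characters and lowercases it; this is an assumption for illustration only, since the patent does not name a segmenter and Chinese text would need a dedicated word segmentation tool.

```python
import re


def segment(text):
    """Split text into a list of lowercase word tokens (q1, q2, ..., qM)."""
    return re.findall(r"\w+", text.lower())


q_tokens = segment("How skilled are your bricklayers?")
M = len(q_tokens)  # length of the segmented user question
```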
(2) Parameterize the word vector of each word obtained by segmentation.
The word vectors emb(Q) of all words of the user question Q of each triple are expressed as (emb_q1, emb_q2, ..., emb_qM), and the word vectors emb(A) of all words of the service provider answer A of each triple are expressed as (emb_a1, emb_a2, ..., emb_aN). emb(Q) is a matrix with M rows and emb_size columns; emb(A) is a matrix with N rows and emb_size columns.
(3) Form the word vector matrix from the parameterized word vectors of the words.
Every word of all triples after segmentation is included in one word vector matrix, which has |V| rows and emb_size columns. Here |V| is the number of all words that may occur, i.e. the dictionary size. emb_size is a preset empirical value, usually set between 50 and 1000. Each row of the matrix is a vector of length emb_size, called the word vector of the word corresponding to that row. Subsequent training of the prediction model parameters yields the optimal solution of the word vector matrix, i.e. the optimal word vector for the word corresponding to each row.
The parameterized weight distribution vector is a vector of length 2*emb_size composed of parameters.
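The shapes described above can be sketched in plain Python. The helper names `build_vocab` and `lookup`, the tiny `emb_size`, and the zero placeholder values are assumptions for illustration; note that |V| (dictionary size) and V (weight distribution vector) are different quantities despite the shared letter.

```python
def build_vocab(token_lists):
    """Assign each distinct word a row index; len(vocab) is the dictionary size |V|."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def lookup(E, vocab, tokens):
    """Return emb(Q) or emb(A): one row of E per token, len(tokens) x emb_size."""
    return [E[vocab[token]] for token in tokens]


emb_size = 4  # illustrative only; the text suggests a value between 50 and 1000
vocab = build_vocab([["bricklayer", "skill"], ["skill", "high"]])
E = [[0.0] * emb_size for _ in vocab]  # |V| rows, emb_size columns (placeholders)
V = [0.0] * (2 * emb_size)             # weight distribution vector
emb_q = lookup(E, vocab, ["bricklayer", "skill"])
```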
S111: initialize the parameters in the word vector matrix, the parameters in the weight distribution vector, and the prediction offset parameter.
In this embodiment, a set of randomly generated numbers is used to initialize the parameters in the word vector matrix, the parameters in the weight distribution vector, and the prediction offset parameter. Preset initial values may of course be used instead, for example preset values between 0 and 0.01.
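A minimal initialization sketch follows. Drawing every parameter uniformly from the 0 to 0.01 range combines the two options in the text (random values, or preset values in that range) and is an assumption; the function name and the `seed` argument are likewise illustrative.

```python
import random


def init_parameters(vocab_size, emb_size, low=0.0, high=0.01, seed=None):
    """Return randomly initialized (E, V, b).

    E: vocab_size x emb_size word vector matrix
    V: weight distribution vector of length 2 * emb_size
    b: prediction offset parameter
    """
    rng = random.Random(seed)
    E = [[rng.uniform(low, high) for _ in range(emb_size)]
         for _ in range(vocab_size)]
    V = [rng.uniform(low, high) for _ in range(2 * emb_size)]
    b = rng.uniform(low, high)
    return E, V, b
```

Passing a fixed `seed` makes a run reproducible, which is convenient when comparing training settings.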
S112: using a preset iterative algorithm, iterate on the parameters in the word vector matrix, the parameters in the weight distribution vector, and the prediction offset parameter until a preset iteration termination condition is reached.
In this embodiment, the preset iterative algorithm is applied to the training data set with a loss function to iteratively obtain the parameter values in the word vector matrix, the parameter values in the weight distribution vector, and the value of the prediction offset parameter. The loss function is determined by the distance between the service quality score predicted for the training data set and the service quality score in the training data set (i.e. the actual service quality score).
The expression of the loss function is as follows:
[formula not reproduced in this text]
where $\mathrm{Score} = \mathrm{rep}(QA) \cdot V + b$. Score denotes the service quality score predicted during training from the user question and the service provider answer of a triple in the training data set. S denotes the service quality score in the triple, i.e. the actual service quality score. rep(QA) · V denotes the inner product of rep(QA) and V. rep(QA) denotes the vector of length 2*emb_size obtained by concatenating the vectors rep(Q) and rep(A). rep(Q) denotes the word vector corresponding to the user question in each triple, and rep(A) denotes the word vector corresponding to the service provider answer in each triple.
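The loss expression itself does not survive in this text. One form consistent with the description (a distance between the predicted score and the actual score S over the training data set D) is the squared error below; this particular choice, the factor 1/2, and the symbols $\mathcal{L}$ and $D$ are assumptions, not the patent's stated formula.

```latex
\mathrm{Score} = \mathrm{rep}(QA) \cdot V + b,
\qquad
\mathcal{L}(E, V, b) = \frac{1}{2} \sum_{\langle Q, A, S \rangle \in D} \left( \mathrm{Score} - S \right)^{2}
```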
In other embodiments, those skilled in the art may use other forms of loss function as needed, such as a logarithmic loss function, an average loss function, or an absolute loss function.
The preset iterative algorithm is stochastic gradient descent (SGD) combined with the back propagation (BP) algorithm. Because the data set that has been built exceeds one hundred million entries, training the prediction model parameters on it can yield well-optimized prediction model parameters. SGD and BP are known to those skilled in the art and are only summarized here. The BP algorithm is an efficient method for computing the gradients of parameters.
In this embodiment, following the iterative idea of SGD, the word vector matrix E, the weight distribution vector V, and the prediction offset parameter b are each initialized. The gradients of E, V, and b are then computed on a group of training data (the group size is called the mini-batch size). The initialized word vector matrix E is updated according to its gradient: at each update, the set learning rate multiplied by the computed gradient of E is subtracted from E. The weight distribution vector V and the prediction offset parameter b are updated in the same way. After multiple iterations, when the preset iteration termination condition is reached, the optimal word vector matrix E, weight distribution vector V, and prediction offset parameter b are obtained.
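One mini-batch update can be sketched in plain Python as below. This is a sketch under stated assumptions, not the patent's implementation: it assumes a squared-error loss 0.5 * (Score - S)^2 averaged over the batch, sigmoid-normalized sums of word vectors for rep(Q) and rep(A), and hypothetical names (`sgd_step`, `rep`); each batch item holds token row indices into E plus the actual score S.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rep(E, token_ids):
    """Sum the word vectors of the tokens, then apply sigmoid element-wise."""
    emb_size = len(E[0])
    summed = [sum(E[t][k] for t in token_ids) for k in range(emb_size)]
    return [sigmoid(x) for x in summed]


def sgd_step(E, V, b, batch, learning_rate):
    """Apply one mini-batch SGD update; batch items are (q_ids, a_ids, S).

    E and V are updated in place. Returns (new b, mean loss before the update).
    """
    emb_size = len(E[0])
    grad_E = {}                  # sparse: only rows of words seen in the batch
    grad_V = [0.0] * len(V)
    grad_b = 0.0
    loss = 0.0
    for q_ids, a_ids, S in batch:
        rep_qa = rep(E, q_ids) + rep(E, a_ids)     # concatenation, 2 * emb_size
        score = sum(r * v for r, v in zip(rep_qa, V)) + b
        err = score - S                            # d(loss) / d(score)
        loss += 0.5 * err * err
        grad_b += err
        for j, r in enumerate(rep_qa):
            grad_V[j] += err * r
        # Back-propagate through the sigmoid into the word vectors.
        for offset, ids in ((0, q_ids), (emb_size, a_ids)):
            for k in range(emb_size):
                r = rep_qa[offset + k]
                g = err * V[offset + k] * r * (1.0 - r)
                for t in ids:
                    grad_E.setdefault(t, [0.0] * emb_size)[k] += g
    n = len(batch)
    # Update rule from the text: parameter minus learning rate times gradient.
    for t, row in grad_E.items():
        for k in range(emb_size):
            E[t][k] -= learning_rate * row[k] / n
    for j in range(len(V)):
        V[j] -= learning_rate * grad_V[j] / n
    b -= learning_rate * grad_b / n
    return b, loss / n
```

Calling `sgd_step` repeatedly over successive mini-batches, and checking a termination condition after each call, gives the training loop described above.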
The preset iteration termination condition may be that a preset number of iterations is reached, or that the difference between the value of the loss function obtained after the current iteration and the value obtained after the previous iteration is less than a preset threshold, or that the value of the loss function is less than a preset target value. The preset number of iterations, the preset threshold, and the preset target value are all preset empirical values.
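The three termination conditions can be checked with a small helper such as the following sketch. The function and parameter names are hypothetical, and using the absolute difference between successive loss values is an assumption about how "difference" is measured.

```python
def should_stop(iteration, loss, prev_loss, max_iterations, target_loss, min_delta):
    """Return True when any of the three termination conditions holds."""
    if iteration >= max_iterations:      # preset number of iterations reached
        return True
    if loss < target_loss:               # loss below the preset target value
        return True
    if prev_loss is not None and abs(prev_loss - loss) < min_delta:
        return True                      # loss change below the preset threshold
    return False
```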
Preferentially, default vector calculation owning the customer problem in each triple is wherein utilized
The term vector of word, the term vector of all words that server answers is respectively processed and obtains each ternary
Term vector rep (Q) corresponding to customer problem in group and the server in each triple answer corresponding
Term vector rep (A).
Obtain especially by the following manner:
(1), the term vector of words all in the customer problem in each triple is added and obtains rep (Q),
In server's answer in each triple, the term vector of all words carries out addition of vectors and obtains rep (A).
I.e. rep (Q)=emb_q1+emb_q2...+emb_qM,
Rep (A)=emb_a1+emb_a2...+emb_aN。
(2) rep(Q) and rep(A) in each triple are each normalized, and the normalized values replace the original rep(Q) and rep(A). In this embodiment, the sigmoid function is applied as a nonlinear transformation to each element of the vectors rep(Q) and rep(A), so that each element is normalized into an interval.
The sigmoid function is a function for nonlinear transformation; its definition and curve are shown in Figure 3. Other nonlinear functions such as tanh may of course also be used as the nonlinear transformation function.
The transformed rep(Q) and rep(A) in each triple are then concatenated into a vector rep(QA) of length 2*emb_size.
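The sum, sigmoid, and concatenation steps above can be sketched in a few lines of plain Python. The function names `rep` and `rep_qa` and the toy values are assumptions introduced for illustration; only the three operations themselves come from the description.

```python
import math

def sigmoid(x):
    """Standard logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def rep(term_vectors):
    """Sum the term vectors element-wise, then apply sigmoid to each element."""
    summed = [sum(column) for column in zip(*term_vectors)]
    return [sigmoid(x) for x in summed]

def rep_qa(question_vectors, answer_vectors):
    """Concatenate rep(Q) and rep(A) into one vector of length 2 * emb_size."""
    return rep(question_vectors) + rep(answer_vectors)

# Toy example with emb_size = 2 (values are made up for illustration).
q = [[0.5, -1.0], [-0.5, 1.0]]   # two question words, element-wise sum is [0.0, 0.0]
a = [[1.0, 2.0]]                 # one answer word
vec = rep_qa(q, a)               # length 4 = 2 * emb_size
```

Swapping `sigmoid` for `math.tanh` gives the tanh variant the text mentions; the output interval then becomes (-1, 1) instead of (0, 1).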
After the prediction model parameters for the preset type of service are obtained, they can be stored in a storage device, so that when a user subsequently searches for this service, the service quality of a service provider can be presented to the user according to the prediction model parameters of the service, helping the user decide whether to use that service provider.
Figure 4 is a flow chart of the service quality estimation method according to embodiment two of the present invention. The service quality estimation method includes:
S21: obtaining the communication record between a user and a service provider of a certain type of service.
When selecting a certain type of service, a user usually communicates over the Internet, by voice or by text, with service providers of that service in order to choose a suitable one. Therefore, in this embodiment, the voice or text communication record between the user and the service provider of the service within a preset time period can be obtained. The communication record includes communication records in text form (such as chat records) and communication records in voice form. A communication record in voice form is first converted into a communication record in text form by speech recognition technology.
S22: querying the prediction model parameters related to the service, and obtaining the term vector matrix corresponding to the communication record.
In this embodiment, the prediction model parameters related to this type of service are obtained by training with the method described in embodiment one. The trained prediction model parameters related to the service can be obtained from the storage device according to the type of the service; they are the term vector matrix E related to the service, the distribution weight vector V, and the estimation offset parameter b. The matrix E has |V| rows and emb_size columns, where |V| is the number of all words that may occur (i.e. the dictionary size). Each row of the matrix is a vector of length emb_size, called the term vector of the word corresponding to that row.
The term vector matrix E is represented by parameters. It is composed of the term vectors of multiple words; a term vector maps a word into a vector space, so that the distribution of the word in that vector space is represented by a vector. In this embodiment, the distribution of a word in the vector space represents how good or bad the service quality of the service provider is.
The distribution weight vector represents the importance of the term vector matrix in the vector space. In this embodiment, it can be understood as the importance, in the vector space, of the term vectors corresponding to all triples in the training data set. The estimation offset parameter represents the offset of the evaluation of the service provider. In this embodiment, it can be understood as the range over which the service quality score of the service provider fluctuates.
In other embodiments, the prediction model parameters may omit the estimation offset parameter.
Preferably, as one implementation of S22, S22 includes:
(1) Extracting, from the communication record, at least one pair consisting of a user question QU and a service provider answer AN, abbreviated as <QU, AN>.
In this embodiment, a data extraction technique is used to extract K pairs from the communication record, where K is a positive integer.
(2) Segmenting the user question QU and the service provider answer AN of each pair into words.
Specifically, a word segmentation technique is applied to each extracted pair. The QU of each pair is segmented into a text string of length m composed of the words qu1, qu2, …, qum. The AN of each pair is segmented into a text string of length n composed of the words an1, an2, …, ann.
(3) Querying the term vector matrix related to the service to obtain the term vectors of all words in each pair.
The term vectors of the user question QU of each pair, obtained by querying the term vector matrix, are denoted:
emb(QU) = emb_qu1, emb_qu2, …, emb_qum,
Similarly, the term vectors of the service provider answer AN of each pair are denoted:
emb(AN) = emb_an1, emb_an2, …, emb_ann.
The term vector matrix corresponding to the communication record is thus composed of the term vectors of the user question QU and of the service provider answer AN of each pair.
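The lookup in step (3) amounts to selecting rows of E by word index. The sketch below is an illustration under stated assumptions: the dictionary `vocab`, the 3 x 2 matrix `E`, and the whitespace split that stands in for a real word segmenter are all hypothetical, and skipping words that are missing from the dictionary is a choice made here, since the text does not say how such words are handled.

```python
def lookup(words, vocab, E):
    """Return the rows of E for the given words.

    Words that are not in the dictionary are skipped (an assumption; the
    source does not describe out-of-dictionary handling).
    """
    return [E[vocab[w]] for w in words if w in vocab]

# Hypothetical dictionary and term vector matrix: |V| = 3 rows, emb_size = 2.
vocab = {"bricklayer": 0, "skilled": 1, "very": 2}
E = [[0.1, 0.2],
     [0.3, 0.4],
     [0.5, 0.6]]

# A whitespace split stands in for the word segmentation technique.
qu = "bricklayer skilled ?".split()
an = "very skilled".split()

emb_qu = lookup(qu, vocab, E)   # term vectors of the user question
emb_an = lookup(an, vocab, E)   # term vectors of the service provider answer
```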
S23: calculating the service quality score of the service provider according to the term vector matrix corresponding to the communication record and the prediction model parameters.
Preferably, calculating the service quality score of the service provider includes:
(1) Using a preset vector calculation method to compute, from the term vector matrix corresponding to each pair, the term vector corresponding to that pair.
Specifically, the term vectors of the user question and the term vectors of the service provider answer in each pair are summed separately to obtain rep(QU) and rep(AN); rep(QU) and rep(AN) are then normalized, and the normalized values replace the original rep(QU) and rep(AN). In detail:
For each pair, emb(QU) and emb(AN) are summed separately, i.e.
rep(QU) = emb_qu1 + emb_qu2 + ... + emb_qum;
rep(AN) = emb_an1 + emb_an2 + ... + emb_ann;
A nonlinear transformation function is then used to normalize the vectors rep(QU) and rep(AN) into an interval, for example the range 0 to 1, and the normalized rep(QU) and rep(AN) replace the original rep(QU) and rep(AN). Preferably, the nonlinear transformation function is the sigmoid function.
The updated rep(QU) and rep(AN) of each pair are concatenated to form the term vector rep(QU-AN) corresponding to that pair. rep(QU-AN) is a representation vector of length 2*emb_size.
In other embodiments, other nonlinear functions such as tanh may also be used.
(2) Calculating the service quality score corresponding to each pair according to the term vector corresponding to that pair and the prediction model parameters.
Specifically, for a given pair, the service quality score corresponding to the pair is calculated from its rep(QU-AN), the distribution weight vector V in the prediction model parameters, and the estimation offset parameter b, using the following formula:
Score_i = rep(QU-AN)·V + b,
where Score_i is the service quality score corresponding to the i-th pair and rep(QU-AN)·V denotes the inner product of rep(QU-AN) and V. In other embodiments, the formula may instead be Score_i = rep(QU-AN)·V.
(3) Calculating the service quality score of the service provider according to the service quality scores corresponding to the pairs.
Suppose K pairs are extracted in total from the communication record. K service quality scores are then obtained: Score_1, …, Score_K. In this embodiment, the mean of the K service quality scores is used as the service quality score of the service provider. The higher this score, the higher the quality of the service offered by the service provider.
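Steps (2) and (3) can be sketched as an inner product plus offset for each pair, followed by a mean over the K pairs. The function names and the numeric values below are assumptions chosen for illustration; the per-pair representation vectors are taken as given, already summed, normalized, and concatenated.

```python
def pair_score(rep_qu_an, V, b=0.0):
    """Score of one pair: inner product of rep(QU-AN) and V, plus the offset b.

    Passing b=0.0 gives the variant of the formula without the offset.
    """
    return sum(r * v for r, v in zip(rep_qu_an, V)) + b

def provider_score(pair_reps, V, b=0.0):
    """Mean of the K per-pair scores, used as the provider's service quality score."""
    scores = [pair_score(r, V, b) for r in pair_reps]
    return sum(scores) / len(scores)

# Hypothetical trained parameters for emb_size = 2 (so V has length 4).
V = [1.0, 2.0, 3.0, 4.0]
b = 0.5

# Two already-normalized rep(QU-AN) vectors, i.e. K = 2.
pair_reps = [[0.5, 0.5, 0.5, 0.5],
             [1.0, 0.0, 1.0, 0.0]]

overall = provider_score(pair_reps, V, b)
```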
S24: presenting the service quality of the service provider to the user according to the service quality score of the service provider.
Preferably, the score of the service provider may be presented to the user directly. In other embodiments, the service of the service provider may be assigned a grade according to the score, and the grade is presented to the user. For example, if the service quality score is 9 points, the corresponding grade is "very good".
When a user has communicated with a service provider through a web page for a preset time, the present invention estimates the service quality of that service provider within the preset time period and displays it on the web page the user is viewing. Alternatively, after the user has finished communicating with a service provider through a web page, the service quality of that service provider estimated by the present invention is presented at the user's request. Alternatively, after the user has communicated with multiple service providers, the present invention estimates the service quality of each service provider and presents the service providers with higher service quality to the user. This makes it easier for the user to decide whether a given service provider should continue to provide the service.
Figure 5 is a structural schematic diagram of the training device for prediction model parameters according to embodiment three of the present invention. The device includes a construction unit 100 and a training unit 101.
The construction unit 100 is used to build the training data set corresponding to a preset type of service.
In this embodiment, each type of service corresponds to one training data set. Existing data mining techniques are used to mine sample data related to the preset type of service from the web page library of a search engine (for example, review pages or forum pages). Multiple triples are generated from the sample data, and these triples constitute the training data set. Each triple is expressed as <user question, service provider answer, service quality score>, abbreviated as <Q, A, S>. For example, in the home decoration industry, a triple might be <"How skilled are your bricklayers?", "Our bricklayers are highly skilled and have worked on a number of homes; please rest assured.", 9>. The larger the amount of sample data, the larger the constructed training data set, and the more accurate the model parameters obtained when the prediction model is subsequently trained. In general, the sample data is at least on the order of ten million.
When multiple triples are generated from the sample data, each piece of sample data corresponds to one triple. For a given piece of sample data, a data extraction technique is used to extract the user question and the service provider answer from it, and these become the user question Q and the service provider answer A of the generated triple.
The service quality score S of a triple can be determined in the following ways: (1) the score is assigned manually according to the user question and the service provider answer of the triple; (2) when the sample data includes an evaluation of the service provider, the service quality score of the triple is generated from that evaluation. For example, when the evaluation in the sample data is a star rating, the different star ratings can be standardized into scores, e.g. one star is 1 point and two stars are 3 points. When the evaluation in the sample data is a numeric score, for example a score between 0 and 100, the score is standardized, e.g. 90 points is standardized to 9 points.
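The two standardization rules can be sketched as small helper functions. The text gives only two star examples (one star is 1 point, two stars are 3 points), so the linear rule `2 * stars - 1` used here is an assumption that happens to agree with both; the function names and the `scale` parameter are likewise hypothetical.

```python
def score_from_stars(stars):
    """Map a star rating to a score.

    Assumed linear rule 2 * stars - 1, which matches the two examples in
    the text (1 star -> 1 point, 2 stars -> 3 points); the mapping for
    higher star ratings is not stated in the source.
    """
    return 2 * stars - 1

def score_from_points(points, scale=100):
    """Rescale a numeric evaluation on 0..scale to the 0..10 range."""
    return points * 10 / scale
```

With these helpers, a 90-point evaluation on a 0 to 100 scale becomes `score_from_points(90)`, matching the 9-point example in the text.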
The training unit 101 is used to obtain, by training on the training data set, the prediction model parameters corresponding to the preset type of service.
In this embodiment, the prediction model parameters include the term vector matrix E composed of the term vectors of the words contained in the training data set, the distribution weight vector V corresponding to the term vector matrix, and the estimation offset parameter b. In other embodiments, the prediction model parameters may omit the estimation offset parameter b.
The term vector matrix E is represented by parameters. It is composed of the term vectors of multiple words; a term vector maps a word into a vector space, so that the distribution of the word in that vector space is represented by a vector. In this embodiment, the distribution of a word in the vector space represents how good or bad the service quality of the service provider is.
The distribution weight vector represents the importance of the term vector matrix in the vector space. In this embodiment, it can be understood as the importance, in the vector space, of the term vectors corresponding to all triples in the training data set. The estimation offset parameter represents the offset of the evaluation of the service provider. In this embodiment, it can be understood as the range over which the service quality score of the service provider fluctuates.
Preferably, obtaining the prediction model parameters corresponding to the preset type of service by training on the training data set includes the following.
The training unit 101 is further used to establish a parameterized term vector matrix, a parameterized distribution weight vector, and an estimation offset parameter.
Establishing the parameterized term vector matrix specifically includes:
(1) Segmenting each triple in the training data set into words.
A word segmentation technique is used to segment the user question Q and the service provider answer A of each triple in the training data set. For example, suppose the user question Q is segmented into a text string of length M and the service provider answer A into a text string of length N. Then in each triple Q is expressed as (q1, q2, ..., qM) and A as (a1, a2, ..., aN).
(2) Parameterizing the term vector of each word obtained by segmentation.
The term vectors emb(Q) of all words of the user question Q of each triple are expressed as (emb_q1, emb_q2, ..., emb_qM), and the term vectors emb(A) of all words of the service provider answer A of each triple are expressed as (emb_a1, emb_a2, ..., emb_aN). emb(Q) is a matrix with M rows and emb_size columns; emb(A) is a matrix with N rows and emb_size columns.
(3) Forming the term vector matrix from the parameterized term vectors of the words.
Every word of all triples after segmentation is included in one term vector matrix, which has |V| rows and emb_size columns. Here |V| is the number of all words that may occur, i.e. the dictionary size. The value of emb_size is preset; it is an empirical value, usually set between 50 and 1000. Each row of parameters in this matrix is a vector of length emb_size, called the term vector of the word corresponding to that row. The optimal solution of the term vector matrix, and hence the optimal term vector of the word in each row, can subsequently be obtained by training the prediction model parameters.
The parameterized distribution weight vector is a vector of length 2*emb_size composed of parameters.
The training unit 101 is further used to initialize the parameters in the term vector matrix, the parameters in the distribution weight vector, and the estimation offset parameter.
In this embodiment, a set of numbers is generated randomly to initialize the parameters in the term vector matrix, the parameters in the distribution weight vector, and the estimation offset parameter. Preset initial values may of course also be used to initialize each parameter, for example preset values between 0 and 0.01.
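The shapes of the three parameter groups and a random initialization can be sketched as follows. Drawing uniformly from the interval 0 to 0.01 combines the two options in the text (random numbers, or preset values between 0 and 0.01) and is an assumption made here; the function name `init_params` and the fixed `seed` are also hypothetical.

```python
import random

def init_params(vocab_size, emb_size, low=0.0, high=0.01, seed=0):
    """Randomly initialize E (vocab_size x emb_size), V (2 * emb_size), and b.

    Uniform sampling in [low, high] is an illustrative choice; the source
    only says the parameters are initialized randomly or with preset values.
    """
    rng = random.Random(seed)
    E = [[rng.uniform(low, high) for _ in range(emb_size)]
         for _ in range(vocab_size)]
    V = [rng.uniform(low, high) for _ in range(2 * emb_size)]
    b = rng.uniform(low, high)
    return E, V, b

# Tiny example: dictionary of 5 words, emb_size = 3.
E, V, b = init_params(vocab_size=5, emb_size=3)
```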
The training unit 101 is further used to iterate, with a preset iterative algorithm, over the parameters in the term vector matrix, the parameters in the distribution weight vector, and the estimation offset parameter until the preset iteration stopping criterion is reached.
In this embodiment, the preset iterative algorithm is applied to the training data set with a loss function to iteratively obtain the parameter values in the term vector matrix, the parameter values in the distribution weight vector, and the value of the estimation offset parameter. The loss function is determined by the distance between the service quality score estimated from the training data set and the service quality score recorded in the training data set (i.e. the actual service quality score).
The expression formula of loss function is as follows:
Wherein (rep (QA) V+b, Score represent in training study Score=, concentrate ternary according to training data
The customer problem of group and ISP answer the service quality mark estimated.S is to represent in triple
Service quality mark, i.e. actual service quality mark.It is described that (rep (QA) V represents rep (QA) and V's
Inner product.Described rep (QA) represents vector rep (Q) and rep (A) splicing obtained, and one a length of
The vector of 2*emb_size.Rep (Q) represents the term vector in each triple corresponding to customer problem, rep (A)
Represent that in each triple, ISP answers corresponding term vector.
In other embodiments, those skilled in the art may use a loss function of another form as required, such as a logarithmic loss function, a mean loss function, or an absolute loss function.
The preset iterative algorithm is stochastic gradient descent (SGD) combined with the backpropagation (BP) algorithm. Because the scale of the constructed data set exceeds one hundred million, training the prediction model parameters on it can yield optimized prediction model parameters. The SGD and BP algorithms are known to those skilled in the art and are only summarized here. The BP algorithm is an efficient method for calculating the gradients of parameters.
In this embodiment, following the iterative idea of SGD, the term vector matrix E, the distribution weight vector V, and the estimation offset parameter b are first initialized. In each iteration, a small batch of the training data set (its size is referred to as the mini-batch size) is used to calculate the gradients of E, V, and b. The term vector matrix E is then updated according to its gradient: in each update, the product of a set learning rate and the calculated gradient of E is subtracted from E. V and b are updated in the same way. After multiple iterations, once the preset iteration stopping criterion is reached, the optimal term vector matrix E, distribution weight vector V, and estimation offset parameter b are obtained.
The preset iteration stopping criterion may be that a preset number of iterations is reached, that the difference between the loss value obtained after the current iteration and the loss value obtained after the previous iteration is less than a preset threshold, or that the loss value is less than a preset target value. The preset number of iterations, the preset threshold, and the preset target value are all empirical values.
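The update rule (parameter minus learning rate times gradient) can be illustrated with a minimal training step in plain Python. Several things here are assumptions rather than the patent's stated method: the loss is taken to be the squared error 0.5 * (Score - S)^2, because the loss formula itself is not reproduced in this text; the mini-batch size is 1 for brevity; and the names `forward`, `sgd_step`, and the toy values are hypothetical. The gradients are derived by hand, backpropagating through the inner product and the sigmoid.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(E, V, b, q_ids, a_ids):
    """Compute rep(QA) and Score = rep(QA) . V + b for one triple."""
    emb_size = len(E[0])
    zq = [sum(E[i][k] for i in q_ids) for k in range(emb_size)]  # summed question vectors
    za = [sum(E[i][k] for i in a_ids) for k in range(emb_size)]  # summed answer vectors
    r = [sigmoid(z) for z in zq + za]                            # rep(QA), length 2 * emb_size
    score = sum(rj * vj for rj, vj in zip(r, V)) + b
    return r, score

def sgd_step(E, V, b, q_ids, a_ids, s, lr=0.05):
    """One SGD update on a single triple (mini-batch size 1).

    E and V are updated in place; the new b and the loss measured before
    the update are returned. Assumes loss = 0.5 * (score - s) ** 2.
    """
    emb_size = len(E[0])
    r, score = forward(E, V, b, q_ids, a_ids)
    err = score - s  # d(loss)/d(score)
    # Gradient w.r.t. the pre-sigmoid sums, using the current (old) V.
    dz = [err * V[j] * r[j] * (1.0 - r[j]) for j in range(2 * emb_size)]
    # d(loss)/dV_j = err * r_j
    for j in range(2 * emb_size):
        V[j] -= lr * err * r[j]
    # Each word occurrence receives the gradient of its side's summed vector.
    for i in q_ids:
        for k in range(emb_size):
            E[i][k] -= lr * dz[k]
    for i in a_ids:
        for k in range(emb_size):
            E[i][k] -= lr * dz[emb_size + k]
    # d(loss)/db = err
    return b - lr * err, 0.5 * err * err

# Toy run: dictionary of 3 words, emb_size = 2, one triple with S = 9.
E = [[0.01, 0.02], [0.03, 0.01], [0.02, 0.02]]
V = [0.01, 0.01, 0.01, 0.01]
b = 0.0
losses = []
for _ in range(200):
    b, loss = sgd_step(E, V, b, q_ids=[0, 1], a_ids=[2], s=9.0)
    losses.append(loss)
```

Fitting a single triple only shows that the update moves the estimated score toward S; a realistic run would loop over mini-batches of the whole training data set and stop on the criteria described above.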
Preferably, a preset vector calculation method is used to process the term vectors of all words of the user question and the term vectors of all words of the service provider answer in each triple, yielding the term vector rep(Q) corresponding to the user question and the term vector rep(A) corresponding to the service provider answer in each triple.
Specifically, they are obtained as follows:
(1) The term vectors of all words of the user question in each triple are added together to obtain rep(Q), and the term vectors of all words of the service provider answer in each triple are added together to obtain rep(A). That is,
rep(Q) = emb_q1 + emb_q2 + ... + emb_qM,
rep(A) = emb_a1 + emb_a2 + ... + emb_aN.
(2) rep(Q) and rep(A) in each triple are each normalized, and the normalized values replace the original rep(Q) and rep(A). In this embodiment, the sigmoid function is applied as a nonlinear transformation to each element of the vectors rep(Q) and rep(A), so that each element is normalized into an interval.
The sigmoid function is a function for nonlinear transformation; its definition and curve are shown in Figure 3. Other nonlinear functions such as tanh may of course also be used as the nonlinear transformation function.
The transformed rep(Q) and rep(A) in each triple are then concatenated into a vector rep(QA) of length 2*emb_size.
After the prediction model parameters for the preset type of service are obtained, they can be stored in a storage device, so that when a user subsequently searches for this service, the service quality of a service provider can be presented to the user according to the prediction model parameters of the service, helping the user decide whether to use that service provider.
Figure 6 is a structural schematic diagram of the service quality estimation device according to embodiment four of the present invention. The device includes a first acquisition unit 201, a second acquisition unit 202, a calculation unit 203, and a display unit 204.
The first acquisition unit 201 is used to obtain the communication record between a user and a service provider of a certain type of service.
When selecting a certain type of service, a user usually communicates over the Internet, by voice or by text, with service providers of that service in order to choose a suitable one. Therefore, in this embodiment, the voice or text communication record between the user and the service provider of the service within a preset time period can be obtained. The communication record includes communication records in text form (such as chat records) and communication records in voice form. A communication record in voice form is first converted into a communication record in text form by speech recognition technology.
The second acquisition unit 202 is used to query the prediction model parameters related to the service and to obtain the term vector matrix corresponding to the communication record.
In this embodiment, the prediction model parameters related to this type of service are obtained by training with the method described in embodiment one. The trained prediction model parameters related to the service can be obtained from the storage device according to the type of the service; they are the term vector matrix E related to the service, the distribution weight vector V, and the estimation offset parameter b. The matrix E has |V| rows and emb_size columns, where |V| is the number of all words that may occur (i.e. the dictionary size). Each row of the matrix is a vector of length emb_size, called the term vector of the word corresponding to that row.
The term vector matrix E is represented by parameters. It is composed of the term vectors of multiple words; a term vector maps a word into a vector space, so that the distribution of the word in that vector space is represented by a vector. In this embodiment, the distribution of a word in the vector space represents how good or bad the service quality of the service provider is.
The distribution weight vector represents the importance of the term vector matrix in the vector space. In this embodiment, it can be understood as the importance, in the vector space, of the term vectors corresponding to all triples in the training data set. The estimation offset parameter represents the offset of the evaluation of the service provider. In this embodiment, it can be understood as the range over which the service quality score of the service provider fluctuates.
In other embodiments, the prediction model parameters may omit the estimation offset parameter.
Preferably, obtaining the term vector matrix corresponding to the communication record includes the following.
The second acquisition unit 202 is further used to extract, from the communication record, at least one pair consisting of a user question QU and a service provider answer AN, abbreviated as <QU, AN>.
In this embodiment, a data extraction technique is used to extract K pairs from the communication record, where K is a positive integer.
The second acquisition unit 202 is further used to segment the user question QU and the service provider answer AN of each pair into words.
Specifically, a word segmentation technique is applied to each extracted pair. The QU of each pair is segmented into a text string of length m composed of the words qu1, qu2, …, qum. The AN of each pair is segmented into a text string of length n composed of the words an1, an2, …, ann.
The second acquisition unit 202 is further used to query the term vector matrix related to the service to obtain the term vectors of all words in each pair.
The term vectors of the user question QU of each pair, obtained by querying the term vector matrix, are denoted:
emb(QU) = emb_qu1, emb_qu2, …, emb_qum,
Similarly, the term vectors of the service provider answer AN of each pair are denoted:
emb(AN) = emb_an1, emb_an2, …, emb_ann.
The term vector matrix corresponding to the communication record is thus composed of the term vectors of the user question QU and of the service provider answer AN of each pair.
The calculation unit 203 is used to calculate the service quality score of the service provider according to the term vector matrix corresponding to the communication record and the prediction model parameters.
Preferably, calculating the service quality score of the service provider includes:
(1) Using a preset vector calculation method to compute, from the term vector matrix corresponding to each pair, the term vector corresponding to that pair.
Specifically, the term vectors of the user question and the term vectors of the service provider answer in each pair are summed separately to obtain rep(QU) and rep(AN); rep(QU) and rep(AN) are then normalized, and the normalized values replace the original rep(QU) and rep(AN). In detail:
For each pair, emb(QU) and emb(AN) are summed separately, i.e.
rep(QU) = emb_qu1 + emb_qu2 + ... + emb_qum;
rep(AN) = emb_an1 + emb_an2 + ... + emb_ann;
A nonlinear transformation function is then used to normalize the vectors rep(QU) and rep(AN) into an interval, for example the range 0 to 1, and the normalized rep(QU) and rep(AN) replace the original rep(QU) and rep(AN). Preferably, the nonlinear transformation function is the sigmoid function.
The updated rep(QU) and rep(AN) of each pair are concatenated to form the term vector rep(QU-AN) corresponding to that pair. rep(QU-AN) is a representation vector of length 2*emb_size.
In other embodiments, other nonlinear functions such as tanh may also be used.
(2) Calculating the service quality score corresponding to each pair according to the term vector corresponding to that pair and the prediction model parameters.
Specifically, for a given pair, the service quality score corresponding to the pair is calculated from its rep(QU-AN), the distribution weight vector V in the prediction model parameters, and the estimation offset parameter b, using the following formula:
Score_i = rep(QU-AN)·V + b,
where Score_i is the service quality score corresponding to the i-th pair and rep(QU-AN)·V denotes the inner product of rep(QU-AN) and V. In other embodiments, the formula may instead be Score_i = rep(QU-AN)·V.
(3) Calculating the service quality score of the service provider according to the service quality scores corresponding to the pairs.
Suppose K pairs are extracted in total from the communication record. K service quality scores are then obtained: Score_1, …, Score_K. In this embodiment, the mean of the K service quality scores is used as the service quality score of the service provider. The higher this score, the higher the quality of the service offered by the service provider.
The display unit 204 is used to present the service quality of the service provider to the user according to the service quality score of the service provider.
Preferably, the score of the service provider may be presented to the user directly. In other embodiments, the service of the service provider may be assigned a grade according to the score, and the grade is presented to the user. For example, if the service quality score is 9 points, the corresponding grade is "very good".
When a user has communicated with a service provider through a web page for a preset time, the present invention estimates the service quality of that service provider within the preset time period and displays it on the web page the user is viewing. Alternatively, after the user has finished communicating with a service provider through a web page, the service quality of that service provider estimated by the present invention is presented at the user's request. Alternatively, after the user has communicated with multiple service providers, the present invention estimates the service quality of each service provider and presents the service providers with higher service quality to the user. This makes it easier for the user to decide whether a given service provider should continue to provide the service.
In the several embodiments provided by the present invention, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in an actual implementation.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of software functional units may be stored in a computer-readable storage medium. The software functional units are stored in a storage medium and include several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (20)
1. A training method for prediction model parameters, characterized by comprising:
building a training dataset corresponding to a preset type of service, the training dataset including sample data filtered from communication records between users and service providers related to the preset type of service;
training with the training dataset to obtain prediction model parameters corresponding to the preset type of service, the prediction model parameters including a term vector matrix composed of the term vectors of the words contained in the training dataset, and a weight distribution vector corresponding to the term vector matrix.
2. The method according to claim 1, characterized in that the sample data includes a user question, a service provider answer, and a service quality score.
3. The method according to claim 1, characterized in that training with the training dataset to obtain the prediction model parameters related to the preset type of service includes:
establishing a parameterized term vector matrix and a parameterized weight distribution vector;
initializing the parameters in the term vector matrix and the parameters in the weight distribution vector;
iterating the parameters in the term vector matrix and the parameters in the weight distribution vector with a preset iterative algorithm until a preset iteration termination condition is reached.
4. The method according to claim 1, characterized in that training with the training dataset to obtain the prediction model parameters related to the preset type of service further includes:
initializing an estimation offset parameter;
iterating the estimation offset parameter with a preset iterative algorithm until a preset iteration termination condition is reached.
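One way to read claims 3 and 4 is as a plain gradient-descent loop. The sketch below is an illustration only: averaging the term vectors of a sample, the squared-error loss, the learning rate, and every name (`train`, `predict`, `mean_loss`) are assumptions, since the claims leave the iterative algorithm open. It stops after a fixed number of iterations, which is one of the termination conditions the claims allow.

```python
import random
from typing import List, Tuple

# A sample is (row indices of its words in the term vector matrix, labeled score).
Sample = Tuple[List[int], float]


def predict(word_ids: List[int], E: List[List[float]], V: List[float], b: float) -> float:
    """Average the term vectors of the words, then take the inner product with V and add b."""
    n = len(word_ids)
    rep = [sum(E[i][k] for i in word_ids) / n for k in range(len(V))]
    return sum(r * v for r, v in zip(rep, V)) + b


def mean_loss(samples: List[Sample], E, V, b) -> float:
    """Mean squared distance between estimated and labeled scores."""
    return sum((predict(ids, E, V, b) - y) ** 2 for ids, y in samples) / len(samples)


def train(samples: List[Sample], vocab_size: int, dim: int = 4,
          lr: float = 0.1, max_iters: int = 300, seed: int = 0):
    rng = random.Random(seed)
    # Initialize the term vector matrix E, the weight vector V, and the offset b.
    E = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
    V = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    b = 0.0
    for _ in range(max_iters):  # assumed termination: fixed iteration count
        for word_ids, target in samples:
            n = len(word_ids)
            rep = [sum(E[i][k] for i in word_ids) / n for k in range(dim)]
            err = sum(r * v for r, v in zip(rep, V)) + b - target
            old_V = list(V)  # gradients for E use V from before this update
            for k in range(dim):
                V[k] -= lr * err * rep[k]
            for i in word_ids:
                for k in range(dim):
                    E[i][k] -= lr * err * old_V[k] / n
            b -= lr * err
    return E, V, b


# Toy data: words 0 and 1 appear in a well-rated exchange, words 2 and 3 in a poorly rated one.
samples = [([0, 1], 1.0), ([2, 3], 0.0)]
E, V, b = train(samples, vocab_size=4)
```

All three parameter groups are updated in the same pass here; the claims would equally cover other schedules or optimizers.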
5. The method according to claim 3, characterized in that establishing the parameterized term vector matrix includes:
performing word segmentation on the sample data;
parameterizing the term vector of each word obtained by the word segmentation;
forming the term vector matrix from the parameterized term vectors of the words.
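The three steps of claim 5 can be sketched as follows. This is a hedged illustration: `segment` uses lowercase whitespace splitting as a stand-in for a real word segmenter (Chinese text would need a dedicated one), and the names `build_term_vector_matrix`, `index`, and `matrix`, the dimension, and the random initialization range are all assumptions.

```python
import random
from typing import Dict, List, Tuple


def segment(text: str) -> List[str]:
    """Stand-in word segmentation: lowercase and split on whitespace."""
    return text.lower().split()


def build_term_vector_matrix(texts: List[str], dim: int = 4,
                             seed: int = 0) -> Tuple[Dict[str, int], List[List[float]]]:
    """Segment the sample texts, give each distinct word one row of free parameters,
    and stack the rows into a term vector matrix."""
    rng = random.Random(seed)
    index: Dict[str, int] = {}
    for text in texts:
        for word in segment(text):
            if word not in index:
                index[word] = len(index)  # next free row
    matrix = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in index]
    return index, matrix


index, matrix = build_term_vector_matrix(["Where is my order", "It ships today"])
```

Keeping a word-to-row `index` next to the matrix makes the later lookup of term vectors for new communication records a simple dictionary access.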
6. The method according to claim 3, characterized in that the iteration termination condition includes:
a preset number of iterations is reached; or
the value of the loss function obtained after the current iteration is less than a preset target value; or
the difference between the value of the loss function obtained after the current iteration and the value of the loss function obtained after the previous iteration is less than a preset threshold; wherein the loss function is determined by the distance between the service quality scores estimated for the training dataset and the service quality scores recorded in the training dataset.
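The three alternative termination conditions of claim 6 can be expressed as a single check. The sketch below assumes a mean squared distance as the loss, which is only one possible distance; the names `squared_distance_loss` and `should_stop` and all threshold values are hypothetical.

```python
from typing import List, Optional


def squared_distance_loss(estimated: List[float], labeled: List[float]) -> float:
    """Mean squared distance between estimated and labeled service quality scores."""
    return sum((e - y) ** 2 for e, y in zip(estimated, labeled)) / len(labeled)


def should_stop(iteration: int, max_iters: int, loss: float,
                prev_loss: Optional[float], target_loss: float, min_delta: float) -> bool:
    """Return True if any of the three termination conditions holds."""
    if iteration >= max_iters:  # preset number of iterations reached
        return True
    if loss < target_loss:      # loss below the preset target value
        return True
    if prev_loss is not None and abs(prev_loss - loss) < min_delta:
        return True             # loss changed by less than the preset threshold
    return False
```

The absolute value in the third condition is a choice made here so that a slight increase in loss between iterations is treated the same as a slight decrease.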
7. A service quality estimation method, characterized in that the method comprises:
obtaining a communication record between a user and a service provider of a certain type of service;
querying the prediction model parameters related to the service to obtain the term vector matrix corresponding to the communication record;
calculating the service quality score of the service provider according to the term vector matrix corresponding to the communication record and the prediction model parameters;
wherein the prediction model parameters are obtained by training with the method according to any one of claims 1 to 6.
8. The method according to claim 7, characterized in that obtaining the term vector matrix corresponding to the communication record includes:
extracting from the communication record at least one binary pair consisting of a user question and a service provider answer;
performing word segmentation on the user question QU and the service provider answer AN in each binary pair;
querying the term vector matrix related to the service to obtain the term vectors of all the words in each binary pair.
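The extraction and lookup steps of claim 8 can be sketched as below. Several things are assumed for illustration: the communication record is a list of `(speaker, text)` tuples, each user message is paired with the next provider message, words are segmented by whitespace, and words missing from the trained matrix are skipped. None of these choices is stated in the claim, and the names `extract_pairs` and `lookup_pair_vectors` are hypothetical.

```python
from typing import Dict, List, Tuple

Record = List[Tuple[str, str]]  # (speaker, text), speaker is "user" or "provider"


def extract_pairs(record: Record) -> List[Tuple[str, str]]:
    """Pair each user question QU with the next service provider answer AN."""
    pairs = []
    pending_question = None
    for speaker, text in record:
        if speaker == "user":
            pending_question = text
        elif speaker == "provider" and pending_question is not None:
            pairs.append((pending_question, text))
            pending_question = None
    return pairs


def lookup_pair_vectors(pair: Tuple[str, str], index: Dict[str, int],
                        matrix: List[List[float]]) -> List[List[float]]:
    """Segment QU and AN, then fetch the term vector of every known word."""
    qu, an = pair
    words = qu.lower().split() + an.lower().split()
    return [matrix[index[w]] for w in words if w in index]


record = [("user", "Where is my order"), ("provider", "It ships today"),
          ("user", "Thanks"), ("provider", "You are welcome")]
pairs = extract_pairs(record)
```

The list returned by `lookup_pair_vectors` plays the role of the term vector matrix corresponding to one binary pair.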
9. The method according to claim 7, characterized in that calculating the service quality score of the service provider includes:
applying a preset vector calculation to the term vector matrix corresponding to each binary pair to obtain the term vector corresponding to that binary pair;
calculating the service quality score corresponding to each binary pair according to the term vector corresponding to that binary pair and the prediction model parameters;
calculating the service quality score of the service provider according to the service quality scores corresponding to the binary pairs.
10. The method according to claim 8, characterized in that the service quality score corresponding to each binary pair is calculated from the term vector corresponding to that binary pair and the prediction model parameters using the following formula:
$\mathrm{Score}_i = \mathrm{rep}(QU\text{-}AN) \cdot V + b$,
where $\mathrm{Score}_i$ denotes the service quality score corresponding to the i-th binary pair, $\mathrm{rep}(QU\text{-}AN) \cdot V$ denotes the inner product of $\mathrm{rep}(QU\text{-}AN)$ and the weight distribution vector $V$, $\mathrm{rep}(QU\text{-}AN)$ denotes the term vector corresponding to the i-th binary pair, and $b$ denotes the estimation offset parameter.
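A small numeric sketch of the scoring steps in claims 9 and 10 follows. The formula itself (inner product of rep(QU-AN) with V, plus b) is taken from claim 10; averaging the word vectors to obtain rep(QU-AN) and averaging the per-pair scores to obtain the provider's score are assumptions, because the claims only refer to a "preset vector calculation" and do not fix the aggregation. All function names are hypothetical.

```python
from typing import List


def pair_representation(vectors: List[List[float]]) -> List[float]:
    """Assumed preset vector calculation: element-wise mean of the word vectors."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def pair_score(rep: List[float], V: List[float], b: float) -> float:
    """Score_i = rep(QU-AN) . V + b, as given in claim 10."""
    return sum(r * w for r, w in zip(rep, V)) + b


def provider_score(pair_scores: List[float]) -> float:
    """Assumed aggregation: mean of the per-pair scores."""
    return sum(pair_scores) / len(pair_scores)


rep = pair_representation([[1.0, 0.0], [0.0, 1.0]])   # [0.5, 0.5]
score = pair_score(rep, V=[2.0, 4.0], b=1.0)          # 0.5*2 + 0.5*4 + 1 = 4.0
overall = provider_score([score, 2.0])                # (4.0 + 2.0) / 2 = 3.0
```

Other aggregations, such as weighting later exchanges more heavily, would fit the wording of claim 9 equally well.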
11. A training device for prediction model parameters, characterized by comprising:
a construction unit, configured to build a training dataset corresponding to a preset type of service, the training dataset including sample data filtered from communication records between users and service providers related to the preset type of service;
a training unit, configured to train with the training dataset to obtain prediction model parameters corresponding to the preset type of service, the prediction model parameters including a term vector matrix composed of the term vectors of the words contained in the training dataset, and a weight distribution vector corresponding to the term vector matrix.
12. The device according to claim 11, characterized in that the sample data includes a user question, a service provider answer, and a service quality score.
13. The device according to claim 11, characterized in that training with the training dataset to obtain the prediction model parameters related to the preset type of service includes:
establishing a parameterized term vector matrix and a parameterized weight distribution vector;
initializing the parameters in the term vector matrix and the parameters in the weight distribution vector;
iterating the parameters in the term vector matrix and the parameters in the weight distribution vector with a preset iterative algorithm until a preset iteration termination condition is reached.
14. The device according to claim 11, characterized in that training with the training dataset to obtain the prediction model parameters related to the preset type of service further includes:
initializing an estimation offset parameter;
iterating the estimation offset parameter with a preset iterative algorithm until a preset iteration termination condition is reached.
15. The device according to claim 13, characterized in that establishing the parameterized term vector matrix includes:
performing word segmentation on the sample data;
parameterizing the term vector of each word obtained by the word segmentation;
forming the term vector matrix from the parameterized term vectors of the words.
16. The device according to claim 13, characterized in that the iteration termination condition includes:
a preset number of iterations is reached; or
the value of the loss function obtained after the current iteration is less than a preset target value; or
the difference between the value of the loss function obtained after the current iteration and the value of the loss function obtained after the previous iteration is less than a preset threshold; wherein the loss function is determined by the distance between the service quality scores estimated for the training dataset and the service quality scores recorded in the training dataset.
17. A service quality estimation device, characterized in that the device comprises:
a first acquisition unit, configured to obtain a communication record between a user and a service provider of a certain type of service;
a second acquisition unit, configured to query the prediction model parameters related to the service and obtain the term vector matrix corresponding to the communication record;
a calculation unit, configured to calculate the service quality score of the service provider according to the term vector matrix corresponding to the communication record and the prediction model parameters;
wherein the prediction model parameters are obtained by training with the device according to any one of claims 11 to 16.
18. The device according to claim 17, characterized in that obtaining the term vector matrix corresponding to the communication record includes:
extracting from the communication record at least one binary pair consisting of a user question and a service provider answer;
performing word segmentation on the user question QU and the service provider answer AN in each binary pair;
querying the term vector matrix related to the service to obtain the term vectors of all the words in each binary pair.
19. The device according to claim 18, characterized in that calculating the service quality score of the service provider includes:
applying a preset vector calculation to the term vector matrix corresponding to each binary pair to obtain the term vector corresponding to that binary pair;
calculating the service quality score corresponding to each binary pair according to the term vector corresponding to that binary pair and the prediction model parameters;
calculating the service quality score of the service provider according to the service quality scores corresponding to the binary pairs.
20. The device according to claim 19, characterized in that the service quality score corresponding to each binary pair is calculated from the term vector corresponding to that binary pair and the prediction model parameters using the following formula:
$\mathrm{Score}_i = \mathrm{rep}(QU\text{-}AN) \cdot V + b$,
where $\mathrm{Score}_i$ denotes the service quality score corresponding to the i-th binary pair, $\mathrm{rep}(QU\text{-}AN) \cdot V$ denotes the inner product of $\mathrm{rep}(QU\text{-}AN)$ and the weight distribution vector $V$, $\mathrm{rep}(QU\text{-}AN)$ denotes the term vector corresponding to the i-th binary pair, and $b$ denotes the estimation offset parameter.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610147605.2A CN105760965A (en) | 2016-03-15 | 2016-03-15 | Pre-estimated model parameter training method, service quality pre-estimation method and corresponding devices |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610147605.2A CN105760965A (en) | 2016-03-15 | 2016-03-15 | Pre-estimated model parameter training method, service quality pre-estimation method and corresponding devices |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105760965A true CN105760965A (en) | 2016-07-13 |
Family
ID=56333188
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610147605.2A Pending CN105760965A (en) | 2016-03-15 | 2016-03-15 | Pre-estimated model parameter training method, service quality pre-estimation method and corresponding devices |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105760965A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108171524A (en) * | 2018-01-09 | 2018-06-15 | 安徽润谷网络科技有限公司 | One kind is based on small-loan company's customer experience evaluation system |
CN110782221A (en) * | 2019-09-19 | 2020-02-11 | 丁玥 | Intelligent interview evaluation system and method |
CN111461340A (en) * | 2020-03-10 | 2020-07-28 | 北京百度网讯科技有限公司 | Weight matrix updating method and device and electronic equipment |
CN116614431A (en) * | 2023-07-19 | 2023-08-18 | 中国电信股份有限公司 | Data processing method, device, electronic equipment and computer readable storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101227435A (en) * | 2008-01-28 | 2008-07-23 | 浙江大学 | Method for filtering Chinese junk mail based on Logistic regression |
CN102508907A (en) * | 2011-11-11 | 2012-06-20 | 北京航空航天大学 | Dynamic recommendation method based on training set optimization for recommendation system |
CN104750674A (en) * | 2015-02-17 | 2015-07-01 | 北京京东尚科信息技术有限公司 | Man-machine conversation satisfaction degree prediction method and system |
CN104915446A (en) * | 2015-06-29 | 2015-09-16 | 华南理工大学 | Automatic extracting method and system of event evolving relationship based on news |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108171524A (en) * | 2018-01-09 | 2018-06-15 | 安徽润谷网络科技有限公司 | One kind is based on small-loan company's customer experience evaluation system |
CN110782221A (en) * | 2019-09-19 | 2020-02-11 | 丁玥 | Intelligent interview evaluation system and method |
CN111461340A (en) * | 2020-03-10 | 2020-07-28 | 北京百度网讯科技有限公司 | Weight matrix updating method and device and electronic equipment |
CN111461340B (en) * | 2020-03-10 | 2023-03-31 | 北京百度网讯科技有限公司 | Weight matrix updating method and device and electronic equipment |
CN116614431A (en) * | 2023-07-19 | 2023-08-18 | 中国电信股份有限公司 | Data processing method, device, electronic equipment and computer readable storage medium |
CN116614431B (en) * | 2023-07-19 | 2023-10-03 | 中国电信股份有限公司 | Data processing method, device, electronic equipment and computer readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3862893A1 (en) | Recommendation model training method, recommendation method, device, and computer-readable medium | |
CN110427560B (en) | Model training method applied to recommendation system and related device | |
EP4016432A1 (en) | Method and apparatus for training fusion ordering model, search ordering method and apparatus, electronic device, storage medium, and program product | |
CN106251174A (en) | Information recommendation method and device | |
WO2020140073A1 (en) | Neural architecture search through a graph search space | |
US9110923B2 (en) | Ranking over hashes | |
CN109829775A (en) | A kind of item recommendation method, device, equipment and readable storage medium storing program for executing | |
WO2020082561A1 (en) | Text input prediction method and apparatus, computer device, and storage medium | |
US11403700B2 (en) | Link prediction using Hebbian graph embeddings | |
US20150356658A1 (en) | Systems And Methods For Serving Product Recommendations | |
CN111291165B (en) | Method and device for embedding training word vector into model | |
US20210366006A1 (en) | Ranking of business object | |
CN105760965A (en) | Pre-estimated model parameter training method, service quality pre-estimation method and corresponding devices | |
CN111400615B (en) | Resource recommendation method, device, equipment and storage medium | |
CN112084307B (en) | Data processing method, device, server and computer readable storage medium | |
NL2024312B1 (en) | System and method for job profile matching | |
KR102412158B1 (en) | Keyword extraction and analysis method to expand market share in the open market | |
CN109670161A (en) | Commodity similarity calculating method and device, storage medium, electronic equipment | |
CN111125348A (en) | Text abstract extraction method and device | |
CN112396492A (en) | Conversation recommendation method based on graph attention network and bidirectional long-short term memory network | |
WO2020065611A1 (en) | Recommendation method and system and method and system for improving a machine learning system | |
WO2016122575A1 (en) | Product, operating system and topic based recommendations | |
CN110019563B (en) | Portrait modeling method and device based on multi-dimensional data | |
CN113409157B (en) | Cross-social network user alignment method and device | |
JP2021064132A (en) | Question sentence output method, computer program and information processing device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20160713 |