CN106384197A

CN106384197A - Service quality evaluation method and device based on big data

Info

Publication number: CN106384197A
Application number: CN201610818704.9A
Authority: CN
Inventors: 李明; 崔岩; 沈雷
Original assignee: Beijing To Build A Financial Information Service Ltd By Share Ltd
Current assignee: Beijing To Build A Financial Information Service Ltd By Share Ltd
Priority date: 2016-09-13
Filing date: 2016-09-13
Publication date: 2017-02-08

Abstract

The invention provides a service quality evaluation method and device based on big data. The method comprises the following steps: feature data related to a service is acquired, and the storage mode of the feature data is determined according to the characteristics of the service; the feature data is preprocessed; a basic scoring model of service quality is determined, in which the feature weight vector is to be determined; an industry weighting coefficient is determined for each feature of the feature data; the feature data is sampled by industry to obtain training samples; a machine-learning stochastic gradient descent algorithm is applied to the training samples to obtain an optimal feature weight vector that achieves a local minimum of the error function of the algorithm; the industry weighting coefficients are combined with the optimal feature weight vector obtained by machine learning to obtain a final service quality model; and the final service quality model is quantitatively evaluated against a manually compiled quality ranking of different services.

Description

Service quality evaluation method and device based on big data

Technical field

The present invention relates to the field of machine learning, and in particular to a service quality evaluation method and device based on big data.

Background art

With the advance of informatization, the big data industry has emerged and is developing at a remarkable pace. Big data has the "4V" characteristics: Volume, Velocity, Variety and Value. According to forecasts by authoritative institutions, the global big data market is expected to reach 53 billion US dollars in 2017. As big data applications deepen, big data is also being applied in ordinary Internet companies, for example in business intelligence and data operations. Owing to the nature of Internet companies, the operating condition of a service can be reflected comprehensively by its online quality (for example, daily active users). The prior art, however, lacks a method and device for accurately evaluating service quality on the basis of massive data.

Summary of the invention

In view of this, embodiments of the present invention provide a service quality evaluation method and device based on big data, so as to evaluate service quality more accurately.

In a first aspect, a service quality evaluation method based on big data is provided, including:

Step 1: acquire feature data related to a service, and determine the storage mode of the feature data according to the characteristics of the service itself;

Step 2: preprocess the feature data;

Step 3: determine a basic scoring model of service quality, in which the feature weight vector is to be determined; determine an industry weighting coefficient for each feature in the feature data; sample the feature data by industry to obtain training samples; apply a machine-learning stochastic gradient descent algorithm to the training samples to obtain an optimal feature weight vector, which achieves a local minimum of the error function of the stochastic gradient descent algorithm; and combine the industry weighting coefficients with the optimal feature weight vector obtained by machine learning to obtain a final service quality model;

Step 4: quantitatively evaluate the final service quality model obtained in Step 3 according to a manually compiled quality ranking of different services.

With reference to the first aspect, in a first possible implementation of the first aspect, Step 2 includes: performing data cleaning, data integration, data normalization and data reduction on the feature data.

With reference to the first aspect or its first possible implementation, in a second possible implementation of the first aspect, the basic scoring model of service quality is h(X) = θX = θ₁x₁ + θ₂x₂ + … + θ_n x_n, where h(X) represents the service quality, X = (x₁, …, x_n) is the input feature value vector, n denotes the n-th feature, θ_n is the weight of the n-th feature, x_n is the input value of the n-th feature, and θ = (θ₁, …, θ_n) is the feature weight vector to be determined.

With reference to any of the above possible implementations, in a third possible implementation of the first aspect, determining an industry weighting coefficient for each feature in the feature data includes: for different industries, manually compiling industry weighting coefficients that match the characteristics of each industry, and determining the industry weighting coefficient of each feature in the feature data from the manually compiled coefficients.

With reference to any of the above possible implementations, in a fourth possible implementation of the first aspect, sampling the feature data by industry includes: sampling different industries according to a predetermined ratio, and adjusting the proportion of training samples taken from each industry according to the emphasis of the specific service.

With reference to the second possible implementation of the first aspect, in a fifth possible implementation of the first aspect, the error function of the machine-learning stochastic gradient descent algorithm is J(θ) = \frac{1}{2p} \sum_{i=1}^{p} (h(X^i) - y^i)^2, where θ is the feature weight vector to be determined, p is the total number of samples, h(Xⁱ) is the predicted value of the i-th sample, and yⁱ is the actual value of the i-th sample;

Applying the machine-learning stochastic gradient descent algorithm to the training samples to obtain the optimal feature weight vector includes: (1) randomly initializing the vector θ; (2) setting the descent learning rate α; (3) updating θ to θ - α·∂J(θ)/∂θ so that J(θ) decreases, where ∂J(θ)/∂θ is the partial derivative of J with respect to θ; (4) repeating (3) until J(θ) converges, the value of θ after convergence being the optimal feature weight vector θ*, which achieves a local minimum of the error function.

With reference to the fifth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, combining the industry weighting coefficients with the optimal feature weight vector obtained by machine learning to obtain the final service quality model includes: from the parameter θ′_m for industry m obtained in the step of determining an industry weighting coefficient for each feature in the feature data, the optimal feature weight vector θ*, and the input feature value vector X = (x₁, …, x_n), the final service quality model is obtained as h^*(X)=θ^*·θ′_m·X^T.

With reference to any of the above possible implementations, in a seventh possible implementation of the first aspect, Step 4 is specifically: a quality ranking of different services is compiled manually; the service quality of each service is calculated with the final service quality model determined in Step 3; the quality value of each service is then compared with the quality values of the other services; a comparison whose result is consistent with the manually compiled ranking is a positive example, and otherwise it is a negative example; with P the total number of positive examples and N the total number of negative examples, the evaluation quality L is defined as L = \frac{P}{P + N}.

In a second aspect, a service quality evaluation device based on big data is provided, including:

a first module, configured to acquire feature data related to a service and to determine the storage mode of the feature data according to the characteristics of the service itself;

a second module, configured to preprocess the feature data;

a third module, configured to determine a basic scoring model of service quality, in which the feature weight vector is to be determined; to determine an industry weighting coefficient for each feature in the feature data; to sample the feature data by industry to obtain training samples; to apply a machine-learning stochastic gradient descent algorithm to the training samples to obtain an optimal feature weight vector, which achieves a local minimum of the error function of the stochastic gradient descent algorithm; and to combine the industry weighting coefficients with the optimal feature weight vector obtained by machine learning to obtain a final service quality model;

a fourth module, configured to quantitatively evaluate the final service quality model obtained by the third module according to a manually compiled quality ranking of different services.

In a third aspect, a service quality evaluation device based on big data is provided, including:

a memory, in which program instructions are stored;

at least one processor, configured to execute the program instructions;

the program instructions, when executed by the processor, cause the processor to perform the method of the first aspect.

The beneficial effects of the present invention are as follows:

According to the characteristics of different industries, the present invention specifies the sampling ratio of the training samples, so that the machine learning algorithm can train a more accurate and more stable service quality evaluation model. It also introduces a per-industry feature weight correction, which ensures that the service quality evaluated by the model can be compared across industries.

Brief description of the drawings

In order to illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flow chart of a service quality evaluation method based on big data according to an embodiment of the present invention;

Fig. 2 is an example flow chart of feature data preprocessing according to an embodiment of the present invention;

Fig. 3 is a flow chart of determining the service quality model according to an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

Fig. 1 is a flow chart of a service quality evaluation method based on big data according to an embodiment of the present invention. The method includes the following steps:

S101: acquire feature data related to a service, and determine the storage mode of the feature data according to the characteristics of the service itself.

The feature data related to the service may be acquired in the following ways: 1) obtaining data from relevant websites with a data crawler; 2) accumulating the company's own service data; 3) obtaining data supplied by third-party data companies. During acquisition, the accuracy and timeliness of the overall data must be ensured, and time-series data must be kept continuously updated.

Different services have different characteristics; for example, services can be classified as compute-intensive, I/O-intensive, and so on, so different types of services call for different storage technologies. The storage mode of the feature data therefore needs to be selected according to the characteristics of the service itself, so that the acquired feature data can be stored, updated, modified and used efficiently. For a service whose data volume is small and whose updates are infrequent, the related feature data may be stored in files. For a service with a high access frequency and somewhat higher real-time requirements, an in-memory database (for example, redis) may be selected. For a service with an ordinary access frequency but a somewhat larger data volume and a complex data structure, a relational database (for example, mysql) may be selected. For a service with a very large data volume (for example, each data update is on the order of GB or more) and low real-time requirements, a distributed data warehouse (for example, Hadoop) may be selected.
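As a non-limiting illustration, the four storage cases above can be sketched as a simple rule table. The class ServiceProfile, the function choose_storage, the returned labels and the 1 GB threshold are hypothetical names and values introduced only for this sketch; the embodiment does not prescribe them.

```python
from dataclasses import dataclass


@dataclass
class ServiceProfile:
    """Hypothetical summary of a service's data characteristics."""
    update_size_gb: float        # size of one data update, in GB
    high_access_frequency: bool  # True if the data is read very often
    needs_real_time: bool        # True if real-time requirements are high
    complex_structure: bool      # True if the data structure is complex


def choose_storage(profile: ServiceProfile) -> str:
    """Pick a storage mode following the four cases in the description."""
    # Very large updates with relaxed latency: distributed warehouse (e.g. Hadoop).
    # The 1 GB threshold is an assumption standing in for "GB or more".
    if profile.update_size_gb >= 1.0 and not profile.needs_real_time:
        return "distributed_warehouse"
    # Frequent access with real-time needs: in-memory database (e.g. redis).
    if profile.high_access_frequency and profile.needs_real_time:
        return "in_memory_db"
    # Larger, structurally complex data with ordinary access: relational DB (e.g. mysql).
    if profile.complex_structure:
        return "relational_db"
    # Small, rarely updated data: plain files.
    return "file"
```

In practice the decision would likely weigh more factors than these four flags; the sketch only mirrors the order of the cases listed above.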

S102: preprocess the feature data. In this step, the feature data acquired and stored in S101 undergoes preprocessing such as data cleaning, data integration, data normalization and data reduction, yielding structured feature data that is more complete and more accurate and thus supports a more accurate subsequent quality evaluation.

S103: determine a basic scoring model of service quality, in which the feature weight vector is to be determined; determine an industry weighting coefficient for each feature in the feature data; sample the feature data by industry to obtain training samples; apply a machine-learning stochastic gradient descent algorithm to the training samples to obtain an optimal feature weight vector, which achieves a local minimum of the error function of the stochastic gradient descent algorithm; and combine the industry weighting coefficients with the optimal feature weight vector obtained by machine learning to obtain a final service quality model.

S104: quantitatively evaluate the final service quality model obtained in S103 according to a manually compiled quality ranking of different services. In this step, a quality ranking of different services is compiled manually (only the order matters, for example the top 100). The service quality of each service is calculated with the final service quality model determined in S103, and the quality value of each service is then compared with the quality values of the other services. A comparison whose result is consistent with the manually compiled ranking is a positive example; otherwise it is a negative example. With P the total number of positive examples and N the total number of negative examples, the evaluation quality L is defined as:

L = \frac{P}{P + N} .

The evaluation quality L gives a quantitative assessment of the effectiveness of the final service quality model and thereby provides a baseline for subsequent optimization.
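A minimal sketch of this evaluation is given below, assuming that the comparison is made over every pair of services, that the manual ranking is given as a position (1 is best) and that a higher model score means higher quality. The function name evaluation_quality and both dictionaries are hypothetical.

```python
from itertools import combinations


def evaluation_quality(manual_rank, model_score):
    """Return L = P / (P + N) over all pairs of services.

    manual_rank: service name -> manually compiled rank (1 = best).
    model_score: service name -> quality value from the final model.
    """
    positives = negatives = 0
    for a, b in combinations(sorted(manual_rank), 2):
        manual_says_a_better = manual_rank[a] < manual_rank[b]
        # A tie in model score is treated as "a is not better than b".
        model_says_a_better = model_score[a] > model_score[b]
        if manual_says_a_better == model_says_a_better:
            positives += 1   # comparison agrees with the manual ranking
        else:
            negatives += 1   # comparison contradicts the manual ranking
    total = positives + negatives
    return positives / total if total else 0.0
```

For three services ranked 1, 2, 3 manually and scored 0.9, 0.5, 0.7 by the model, two of the three pairwise comparisons agree with the manual ranking, so L is 2/3.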

Fig. 2 is an example flow chart of feature data preprocessing according to an embodiment of the present invention. As shown in Fig. 2, feature data preprocessing may specifically include the following steps:

S201: data cleaning. Data cleaning includes removing noisy and irrelevant data, handling missing data, and identifying and deleting isolated points (outliers). Noisy data, for example, can be judged by its degree of deviation from the expected value (or mean). An isolated point is a point far from the mean of the corresponding random variable; the distance is measured against a given threshold, usually an integer multiple of the standard deviation. Missing data is filled in with a default value (which may differ according to the business meaning of the data), the mean, the mode, the median, or the like. Noisy data such as duplicate records is filtered out.
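The cleaning operations of S201 might be sketched as follows for a single numeric feature column and a list of records. The function names clean_column and drop_duplicates, the use of None for missing values and the default threshold k = 3 are assumptions made for illustration only.

```python
from statistics import mean, median, pstdev


def clean_column(values, k=3, fill="median"):
    """Fill missing entries, then drop isolated points beyond k standard deviations."""
    present = [v for v in values if v is not None]
    if not present:
        return []
    # Missing data is completed with the mean or the median of the present values.
    fill_value = {"mean": mean(present), "median": median(present)}[fill]
    filled = [fill_value if v is None else v for v in values]
    mu, sigma = mean(filled), pstdev(filled)
    if sigma == 0:
        return filled
    # Keep only points whose distance from the mean is within k * sigma.
    return [v for v in filled if abs(v - mu) <= k * sigma]


def drop_duplicates(records):
    """Remove repeated records (dicts) while preserving first-seen order."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

A default-value fill that depends on the business meaning of the data, as mentioned above, would replace the fill_value lookup with a per-feature constant.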

S202: data integration. The same data entity may be represented differently in data from different sources, which can also lead to data redundancy; redundancy removal and data integration are carried out as the situation requires.

S203: data normalization. That is, the feature data is made dimensionless so that its values fall within a specified range.
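One common way to make a feature dimensionless is min-max scaling, sketched below. The embodiment does not name a particular normalization, so the choice of min-max scaling, the function name min_max_normalize and the default range [0, 1] are assumptions.

```python
def min_max_normalize(values, lo=0.0, hi=1.0):
    """Linearly rescale values so that they fall within [lo, hi]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant column carries no scale information; map it to lo.
        return [lo for _ in values]
    scale = (hi - lo) / (v_max - v_min)
    return [lo + (v - v_min) * scale for v in values]
```

For example, the column [2.0, 4.0, 6.0] is mapped to [0.0, 0.5, 1.0].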

S204: data reduction. The massive data in the database is reduced, for example by dimensionality reduction or by restricting the range of data values. The reduced data stays close to the integrity of the original data while being much smaller in volume, so the performance and efficiency of mining it are greatly improved.

Fig. 3 is a flow chart of determining the service quality model according to an embodiment of the present invention. The steps are described in detail as follows:

S301: determine the basic scoring model of service quality.

In one example, a linear model is used as the basic scoring model:

h(X) = θX = θ₁x₁ + θ₂x₂ + … + θ_n x_n

where h(X) represents the service quality, X = (x₁, …, x_n) is the input feature value vector, n denotes the n-th feature, θ_n is the weight of the n-th feature, x_n is the input value of the n-th feature, and θ = (θ₁, …, θ_n) is the feature weight vector to be determined.
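The linear scoring model is an inner product of the weight vector and the feature value vector, which might be written as in the following sketch. The function name score is hypothetical.

```python
def score(theta, x):
    """Compute h(X) = theta_1 * x_1 + ... + theta_n * x_n."""
    if len(theta) != len(x):
        raise ValueError("theta and x must have the same number of features")
    return sum(t * v for t, v in zip(theta, x))
```

With weights (0.5, 2.0) and feature values (4.0, 1.0), the service quality is 0.5 * 4.0 + 2.0 * 1.0 = 4.0.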

S302: determine the industry weighting coefficient of each feature. For each industry, a set of weighting coefficients matching the characteristics of that industry is compiled manually. If the industry is denoted m, the weight vector of industry m is denoted θ′_m.

S303: sample the acquired feature data by industry to obtain training samples. Different industries are sampled according to a ratio specified in advance, and the proportion of training samples taken from each industry can be adjusted according to the emphasis of the specific service. The benefit is that the influence of different industries can be controlled for different services.
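Sampling by industry at a predetermined ratio can be sketched as a stratified sample. The function name sample_by_industry, the (industry, sample) record layout, the ratios dictionary and the fixed seed are hypothetical and serve only to illustrate S303.

```python
import random
from collections import defaultdict


def sample_by_industry(records, ratios, total, seed=0):
    """Draw about `total` training samples, split across industries by `ratios`.

    records: list of (industry, sample) pairs.
    ratios:  industry -> fraction of the training set taken from that industry.
    """
    rng = random.Random(seed)
    by_industry = defaultdict(list)
    for industry, sample in records:
        by_industry[industry].append(sample)
    training = []
    for industry, ratio in ratios.items():
        pool = by_industry.get(industry, [])
        # Never request more samples than the industry actually has.
        k = min(len(pool), round(total * ratio))
        training.extend((industry, s) for s in rng.sample(pool, k))
    return training
```

Changing the ratios dictionary is the point at which the emphasis of a specific service would be expressed, for instance by raising the share of one industry from 0.3 to 0.5.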

S304: obtain the optimal feature weight vector with a machine-learning stochastic gradient descent algorithm. To determine the feature weight vector of the basic scoring model in S301, the training samples from S303 are used for learning with the stochastic gradient descent algorithm. The algorithm requires an error function J(θ) = \frac{1}{2p} \sum_{i=1}^{p} (h(X^i) - y^i)^2, where θ is the feature weight vector, p is the total number of samples, h(Xⁱ) is the predicted value of the i-th sample, and yⁱ is the actual value of the i-th sample. The optimal feature weight vector is therefore the θ that minimizes J(θ). S304 may specifically include the following steps:

1) Randomly initialize the vector θ.

2) Set the descent learning rate α.

3) Update θ to θ - α·∂J(θ)/∂θ so that J(θ) decreases, where α is the learning rate that was set and ∂J(θ)/∂θ is the partial derivative of J with respect to θ, that is, the gradient direction of J(θ).

4) Repeat 3) until J(θ) converges. The value of θ after convergence is the optimal feature weight vector θ*, which achieves a local minimum of the error function.
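Steps 1) to 4) might be realized as in the following sketch, which assumes a halved mean squared error as J(θ), a fixed learning rate alpha, one weight update per training sample, and convergence judged by the change in J(θ) between passes over the data. The names predict, cost and sgd, and all default values, are hypothetical.

```python
import random


def predict(theta, x):
    """Linear scoring model h(X) = theta . X."""
    return sum(t * v for t, v in zip(theta, x))


def cost(theta, samples):
    """Assumed error function: halved mean squared error over all p samples."""
    p = len(samples)
    return sum((predict(theta, x) - y) ** 2 for x, y in samples) / (2 * p)


def sgd(samples, alpha=0.1, tol=1e-10, max_epochs=5000, seed=0):
    """Return a weight vector at which the cost has stopped decreasing."""
    rng = random.Random(seed)
    n = len(samples[0][0])
    theta = [rng.uniform(-0.1, 0.1) for _ in range(n)]  # 1) random initialization
    order = list(range(len(samples)))
    previous = cost(theta, samples)
    for _ in range(max_epochs):                         # 4) repeat until convergence
        rng.shuffle(order)
        for i in order:
            x, y = samples[i]
            error = predict(theta, x) - y
            # 3) step against the gradient of this sample's squared error,
            #    scaled by the learning rate alpha set in 2)
            theta = [t - alpha * error * v for t, v in zip(theta, x)]
        current = cost(theta, samples)
        if abs(previous - current) < tol:
            break
        previous = current
    return theta
```

As a usage example, training on noise-free samples generated from y = 2·x₁ + 3·x₂ should return weights close to (2, 3); with real feature data the learning rate and tolerance would need tuning.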

S305: combine the industry weighting coefficients with the optimal feature weight vector obtained by machine learning to obtain the final service quality model. From the parameter θ′_m for industry m obtained in S302, the optimal feature weight vector θ* obtained in S304, and the input feature value vector X = (x₁, …, x_n), the final service quality model is obtained as:

h^*(X)=θ^*·θ′_m·X^T
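One possible reading of this formula, adopted here only as an assumption, is that each learned weight is first scaled by the industry weighting coefficient of the same feature (an element-wise product of θ* and θ′_m) and the result is then multiplied with the feature values and summed. The function name final_score is hypothetical.

```python
def final_score(theta_star, theta_m, x):
    """Compute the sum over features of theta*_j * theta'_m,j * x_j.

    theta_star: optimal feature weights learned by gradient descent.
    theta_m:    manually compiled weighting coefficients for industry m.
    x:          input feature values of the service being scored.
    """
    if not (len(theta_star) == len(theta_m) == len(x)):
        raise ValueError("all three vectors must have the same length")
    return sum(ts * tm * v for ts, tm, v in zip(theta_star, theta_m, x))
```

For θ* = (2.0, 3.0), θ′_m = (1.0, 0.5) and X = (1.0, 2.0), this reading gives 2.0·1.0·1.0 + 3.0·0.5·2.0 = 5.0.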

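An embodiment of the present invention further provides a service quality evaluation device based on big data, which can be used to perform the method described in the foregoing embodiments and Figs. 1-3. The device includes:

a first module, configured to acquire feature data related to a service and to determine the storage mode of the feature data according to the characteristics of the service itself;

a second module, configured to preprocess the feature data;

a third module, configured to determine a basic scoring model of service quality, in which the feature weight vector is to be determined; to determine an industry weighting coefficient for each feature in the feature data; to sample the feature data by industry to obtain training samples; to apply a machine-learning stochastic gradient descent algorithm to the training samples to obtain an optimal feature weight vector; and to combine the industry weighting coefficients with the optimal feature weight vector to obtain a final service quality model;

a fourth module, configured to quantitatively evaluate the final service quality model obtained by the third module according to a manually compiled quality ranking of different services.

An embodiment of the present invention further provides another service quality evaluation device based on big data, which can likewise be used to perform the method described in the foregoing embodiments and Figs. 1-3. The device includes a memory and at least one processor. Program instructions are stored in the memory, the at least one processor is configured to execute the program instructions, and the program instructions, when executed by the processor, cause the processor to perform the method described in the foregoing embodiments and Figs. 1-3.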

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may be understood with reference to the corresponding processes in the foregoing method embodiments, and are not repeated here.

Those of ordinary skill in the art will appreciate that the modules and method steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.

Those of ordinary skill in the art will understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing a processor. If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The above are merely specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be defined by the scope of the claims.

Claims

1. A service quality evaluation method based on big data, characterized by including:

Step 1: acquire feature data related to a service, and determine the storage mode of the feature data according to the characteristics of the service itself;

Step 2: preprocess the feature data;

Step 3: determine a basic scoring model of service quality, in which the feature weight vector is to be determined; determine an industry weighting coefficient for each feature in the feature data; sample the feature data by industry to obtain training samples; apply a machine-learning stochastic gradient descent algorithm to the training samples to obtain an optimal feature weight vector, which achieves a local minimum of the error function of the stochastic gradient descent algorithm; and combine the industry weighting coefficients with the optimal feature weight vector obtained by machine learning to obtain a final service quality model;

Step 4: quantitatively evaluate the final service quality model obtained in Step 3 according to a manually compiled quality ranking of different services.

2. The service quality evaluation method based on big data according to claim 1, characterized in that Step 2 includes: performing data cleaning, data integration, data normalization and data reduction on the feature data.

3. The service quality evaluation method based on big data according to claim 1 or 2, characterized in that the basic scoring model of service quality is h(X) = θX = θ₁x₁ + θ₂x₂ + … + θ_n x_n, where h(X) represents the service quality, X = (x₁, …, x_n) is the input feature value vector, n denotes the n-th feature, θ_n is the weight of the n-th feature, x_n is the input value of the n-th feature, and θ = (θ₁, …, θ_n) is the feature weight vector to be determined.

4. The service quality evaluation method based on big data according to any one of claims 1-3, characterized in that determining an industry weighting coefficient for each feature in the feature data includes: for different industries, manually compiling industry weighting coefficients that match the characteristics of each industry, and determining the industry weighting coefficient of each feature in the feature data from the manually compiled coefficients.

5. The service quality evaluation method based on big data according to any one of claims 1-4, characterized in that sampling the feature data by industry includes: sampling different industries according to a predetermined ratio, and adjusting the proportion of training samples taken from each industry according to the emphasis of the specific service.

6. The service quality evaluation method based on big data according to claim 3, characterized in that the error function of the machine-learning stochastic gradient descent algorithm is J(θ) = \frac{1}{2p} \sum_{i=1}^{p} (h(X^i) - y^i)^2, where θ is the feature weight vector to be determined, p is the total number of samples, h(Xⁱ) is the predicted value of the i-th sample, and yⁱ is the actual value of the i-th sample;

applying the machine-learning stochastic gradient descent algorithm to the training samples to obtain the optimal feature weight vector includes: (1) randomly initializing the vector θ; (2) setting the descent learning rate α; (3) updating θ to θ - α·∂J(θ)/∂θ so that J(θ) decreases, where ∂J(θ)/∂θ is the partial derivative of J with respect to θ; (4) repeating (3) until J(θ) converges, the value of θ after convergence being the optimal feature weight vector θ*, which achieves a local minimum of the error function.

7. The service quality evaluation method based on big data according to claim 6, characterized in that combining the industry weighting coefficients with the optimal feature weight vector obtained by machine learning to obtain the final service quality model includes: from the parameter θ′_m for industry m obtained in the step of determining an industry weighting coefficient for each feature in the feature data, the optimal feature weight vector θ*, and the input feature value vector X = (x₁, …, x_n), obtaining the final service quality model as h^*(X)=θ^*·θ′_m·X^T.

8. The service quality evaluation method based on big data according to any one of claims 1-7, characterized in that Step 4 is specifically: a quality ranking of different services is compiled manually; the service quality of each service is calculated with the final service quality model determined in Step 3; the quality value of each service is then compared with the quality values of the other services; a comparison whose result is consistent with the manually compiled ranking is a positive example, and otherwise it is a negative example; with P the total number of positive examples and N the total number of negative examples, the evaluation quality L is defined as L = \frac{P}{P + N}.

9. A service quality evaluation device based on big data, characterized in that the device includes:

a first module, configured to acquire feature data related to a service and to determine the storage mode of the feature data according to the characteristics of the service itself;

a second module, configured to preprocess the feature data;

a third module, configured to determine a basic scoring model of service quality, in which the feature weight vector is to be determined; to determine an industry weighting coefficient for each feature in the feature data; to sample the feature data by industry to obtain training samples; to apply a machine-learning stochastic gradient descent algorithm to the training samples to obtain an optimal feature weight vector, which achieves a local minimum of the error function of the stochastic gradient descent algorithm; and to combine the industry weighting coefficients with the optimal feature weight vector obtained by machine learning to obtain a final service quality model;

a fourth module, configured to quantitatively evaluate the final service quality model obtained by the third module according to a manually compiled quality ranking of different services.

10. A service quality evaluation device based on big data, characterized in that the device includes:

a memory, in which program instructions are stored;

at least one processor, configured to execute the program instructions;

the program instructions, when executed by the processor, cause the processor to perform the method according to any one of claims 1-8.