CN108876098A - Method and device for determining user quality - Google Patents

Publication number
CN108876098A
Authority
CN
China
Prior art keywords
user
data
quality
users
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810402323.1A
Other languages
Chinese (zh)
Inventor
胡军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201810402323.1A
Publication of CN108876098A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395 Quality analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Educational Administration (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Primary Health Care (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a method and device for determining user quality, and relates to the field of Internet technology. The method includes: obtaining quality evaluation features of multiple users, where the quality evaluation features include at least one of user features and data features of the program data belonging to each user; clustering the multiple users according to the quality evaluation features; and determining user quality data of the multiple users according to the clustering result. The present invention can improve the accuracy and efficiency of determining user quality data.

Description

Method and device for determining user quality
Technical field
The present invention relates to the field of Internet technology, and in particular to a method and device for determining user quality.
Background
With the development of Internet technology, networks provide users with more and more convenience. A user can typically upload data to a server and thereby share it with other users. If the user's quality is low, the quality of the data the user uploads is also likely to be low, which adversely affects other users and the network environment.
In the prior art, a large number of users are obtained in advance as sample users, annotators label the sample users according to a labeling standard, a machine learning model is trained on the labeled sample users, and the trained model is used to determine the user quality of other users. However, different application scenarios place different requirements on user quality, and the labeling standard applied to the sample users often fails to match the actual requirements of determining user quality, so the trained model has difficulty determining user quality data accurately. The large amount of labeling work also makes determining user quality inefficient.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a method and device for determining user quality that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a method for determining user quality is provided, including:
obtaining quality evaluation features of multiple users, where the quality evaluation features include at least one of user features and data features of the program data belonging to each user;
clustering the multiple users according to the quality evaluation features;
determining user quality data of the multiple users according to the clustering result.
Optionally, the program data includes multimedia data, and the data features include at least one of play completeness, average number of fans, average number of likes, average number of comments, average play count, average clarity, and average comment sentiment analysis data.
Optionally, clustering the multiple users according to the quality evaluation features includes:
performing a PCA (principal components analysis) transformation on the quality evaluation features;
clustering the multiple users according to the PCA-transformed quality evaluation features.
Optionally, determining the user quality data of the multiple users according to the clustering result includes:
extracting at least one user from a cluster as a sample user;
receiving user quality data submitted for the sample user;
using the user quality data of the sample user as the user quality data of each user in the cluster.
Optionally, the multiple users include a sample user, and determining the user quality data of the multiple users according to the clustering result includes:
determining the cluster that includes the sample user;
using the user quality data of the sample user as the user quality data of each user in the cluster.
Optionally, after the user quality data of the multiple users is determined according to the clustering result, the method further includes:
determining, according to the user quality data of the multiple users, the order in which the program data is provided;
providing the program data in that order.
According to another aspect of the present invention, a device for determining user quality is provided, including:
an obtaining module, configured to obtain quality evaluation features of multiple users, where the quality evaluation features include at least one of user features and data features of the program data belonging to each user;
a clustering module, configured to cluster the multiple users according to the quality evaluation features;
a first determining module, configured to determine user quality data of the multiple users according to the clustering result.
Optionally, the program data includes multimedia data, and the data features include at least one of play completeness, average number of fans, average number of likes, average number of comments, average play count, average clarity, and average comment sentiment analysis data.
Optionally, the clustering module includes:
a transformation submodule, configured to perform a PCA transformation on the quality evaluation features;
a clustering submodule, configured to cluster the multiple users according to the PCA-transformed quality evaluation features.
Optionally, the first determining module includes:
an extraction submodule, configured to extract at least one user from a cluster as a sample user;
a receiving submodule, configured to receive user quality data submitted for the sample user;
a first determining submodule, configured to use the user quality data of the sample user as the user quality data of each user in the cluster.
Optionally, the multiple users include a sample user, and the first determining module includes:
a second determining submodule, configured to determine the cluster that includes the sample user;
a third determining submodule, configured to use the user quality data of the sample user as the user quality data of each user in the cluster.
Optionally, the device further includes:
a second determining module, configured to determine, according to the user quality data of the multiple users, the order in which the program data is provided;
a providing module, configured to provide the program data in that order.
In embodiments of the present invention, user features and/or data features of the program data belonging to each user can be obtained as quality evaluation features, the multiple users can be clustered according to the quality evaluation features, and the user quality data of the multiple users can then be determined from the clustering result. Because a large number of sample users no longer need to be labeled in advance according to a labeling standard, the inefficiency caused by a large amount of labeling is reduced, and so is the loss of accuracy caused by a labeling standard that fails to match actual requirements. In other words, the accuracy and efficiency of determining user quality data are improved.
The above is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features, and advantages of the present invention are more apparent, specific embodiments of the present invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the present invention. Throughout the drawings, the same reference numbers denote the same parts. In the drawings:
Fig. 1 is a flow chart of the steps of a method for determining user quality according to Embodiment One of the present invention;
Fig. 2 is a flow chart of the steps of a method for determining user quality according to Embodiment Two of the present invention;
Fig. 3 is a flow chart of the steps of another method for determining user quality according to Embodiment Two of the present invention;
Fig. 4 is a structural block diagram of a device for determining user quality according to Embodiment Three of the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope conveyed fully to those skilled in the art.
Embodiment one
Referring to Fig. 1, a flow chart of the steps of a method for determining user quality in an embodiment of the present invention is shown. The specific steps include:
Step 101: obtain quality evaluation features of multiple users, where the quality evaluation features include at least one of user features and data features of the program data belonging to each user.
Because factors such as education level, environment, and social experience differ from user to user, user quality also differs. A user's quality may directly affect the quality of the data that the user uploads to the server, and in turn affect the network environment in which the data resides and the other users who obtain the data. Therefore, to make it easier to determine a user's quality subsequently, so that the user or the data the user uploads can be managed and the network environment optimized, the user features of the user and/or the data features of the program data belonging to the user can be obtained as that user's quality evaluation features.
The server may include a server that provides program data in a C2C (Customer to Customer) manner. A user can register with the server and upload program data to it, and other users can obtain that program data. Of course, in other application scenarios the server may only receive the uploaded program data without providing it to other users.
Program data may include multimedia data, such as audio or video; it may of course also include other types of data, such as text data like novels, computer program code, or web pages.
The program data belonging to a user may be the program data uploaded by that user.
A quality evaluation feature is a feature used to determine user quality. The quality evaluation features may include at least one of user features and data features of the program data belonging to the user. The more features the quality evaluation features include, the more comprehensively user quality can be evaluated, and the more accurate the subsequently determined user quality data.
User features are features of the user himself or herself, such as registration duration, frequency of uploading program data, number of uploaded program data, or number of fans. Of course, in practical applications, user features can also include other features, such as age, education, or nationality.
Registration duration is the time from the moment the user registered with the server to the current moment. The longer the registration duration, the more likely the user is a loyal user and, correspondingly, the higher the user's quality.
In addition, the more frequently a user uploads program data, the more program data the user has uploaded, and the more fans the user has, the higher the user's quality.
User events such as registration events, program data upload events, and follow or unfollow events can be monitored to determine a user's user features. Of course, in practical applications, user features such as age, education, or nationality can be submitted directly by the user; a user feature submission entry can be provided to the user, and the submitted user features received through that entry.
A user event is an event that records a user operation. For example, a user registration event records that a user registered and may include information such as the registration time; a program data upload event records that a user uploaded data to the server and may include the upload time and the uploaded program data.
Data features are features of the program data, such as data size, play duration, play count, or number of likes. Of course, in practical applications, data features can also include other features.
The larger the data size, the longer the play duration, the higher the play count, and the more likes, the higher the quality of the program data belonging to the user and, correspondingly, the higher the user's quality.
The data feature corresponding to a user may be the sum of the data features of the multiple program data belonging to that user, or the mean of those data features.
For example, the program data belonging to user 1 includes program data 1, program data 2, and program data 3. Program data 1 has a play duration of 1 minute, a play count of 1000, and 1000 likes; program data 2 has a play duration of 60 minutes, a play count of 12000, and 10000 likes; program data 3 has a play duration of 139 minutes, a play count of 2000, and 1000 likes. If the data feature corresponding to a user is the sum of the data features of the multiple program data belonging to the user, the data features of user 1 are the sums over program data 1, 2, and 3: a play duration of 200 minutes, a play count of 15000, and 12000 likes. If the data feature corresponding to a user is the mean of the data features of the multiple program data belonging to the user, the data features of user 1 are the means over program data 1, 2, and 3: a play duration of about 67 minutes, a play count of 5000, and 4000 likes.
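The sum and mean aggregation in the example above can be sketched as follows. This is only an illustration: the dictionary keys and the `aggregate` helper are hypothetical names, and only the figures come from the example.

```python
from statistics import mean

# Per-program features for user 1, taken from the example above.
programs = [
    {"play_minutes": 1, "play_count": 1000, "likes": 1000},
    {"play_minutes": 60, "play_count": 12000, "likes": 10000},
    {"play_minutes": 139, "play_count": 2000, "likes": 1000},
]

def aggregate(programs, how="sum"):
    """Combine per-program features into one per-user feature dict."""
    reducer = sum if how == "sum" else mean
    keys = programs[0].keys()
    return {k: reducer([p[k] for p in programs]) for k in keys}

totals = aggregate(programs, "sum")     # per-user sums
averages = aggregate(programs, "mean")  # per-user means
```

The mean play duration is 200 / 3, which rounds to the 67 minutes quoted in the example.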
User events for a piece of program data can be counted to obtain the data features of that program data. For example, the number of play events for the program data can be taken as its play count, and the number of like events for the program data as its number of likes. Of course, in practical applications, data features such as data size or play duration can also be obtained by inspecting the program data directly or from what the uploading user submits.
Step 102: cluster the multiple users according to the quality evaluation features.
Clustering divides a set of objects into clusters made up of similar objects. In any application scenario, users with similar quality evaluation features can therefore be placed in the same cluster, and users in the same cluster can share the same user quality data. The multiple users can thus be clustered according to their quality evaluation features so that the user quality data of each user can subsequently be determined from the clustering result. Compared with determining user quality data through a machine learning model, this reduces the loss of accuracy caused by the difficulty of formulating a labeling standard that matches actual requirements, as well as the inefficiency caused by a large amount of labeling work, and so improves the accuracy and efficiency of determining user quality.
Clustering may include K-means clustering, hierarchical clustering, GMM (Gaussian Mixture Model) clustering, or spectral clustering. The clustering result may include at least one cluster.
K-means clustering randomly selects a preset number (for example k) of objects, each of which initially represents the mean or center of a class; that is, it selects k initial centroids. Each remaining object is assigned to the nearest class according to its distance from each class center, and the mean of each class is then recalculated. This process repeats until the criterion function converges, that is, until the centroids no longer change significantly. K-means clustering is simple in principle, easy to implement, and low in time complexity, so it can improve clustering efficiency.
A hierarchical clustering algorithm first calculates the distances between objects and merges the nearest objects into one class, then calculates the distances between classes and merges the nearest classes into a larger class, until no classes satisfy the merging condition. Hierarchical clustering does not require the number of clusters to be set in advance, and its clustering rules are easy to define.
GMM clustering is a probabilistic clustering method. It assumes that all objects are generated by multivariate Gaussian distributions with certain parameters and, for a given number of clusters K, solves for the clustering result using the EM (Expectation Maximization) algorithm.
Spectral clustering treats each object as a vertex V in a graph and quantifies the similarity between vertices as the weight of the edge E connecting them, which yields an undirected weighted graph G(V, E) based on similarity and turns clustering into a graph partitioning problem. An optimal partitioning criterion from graph theory is then applied so that similarity is greatest within each resulting subgraph and smallest between subgraphs, which completes the clustering.
Of course, in practical applications, the multiple users can also be clustered by other clustering methods.
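As a rough illustration of the K-means procedure described above, the following standard-library sketch clusters a handful of users. The feature vectors, the `kmeans` function name, and the fixed random seed are assumptions made for the example; a practical system would more likely rely on an established clustering library.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means: returns (labels, centroids) for a list of feature vectors."""
    rng = random.Random(seed)
    # Pick k distinct objects at random as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each object joins the class with the nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty class
        if new_centroids == centroids:  # centroids stopped moving: converged
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical two-dimensional quality evaluation features for six users.
users = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]]
labels, centroids = kmeans(users, k=2)
```

The cluster numbering depends on the random initialization, so only the grouping of users is meaningful, not the label values themselves.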
User quality data is data that describes user quality, and can be represented by numbers, letters, or symbols.
Step 103: determine the user quality data of the multiple users according to the clustering result.
Because the users in the same cluster can have the same user quality data, for each cluster in the clustering result, the user quality data of any one user in the cluster can be used as the user quality data of all users in that cluster.
The clustering result may include at least one cluster, and each cluster may include at least one user.
The clustering result can be provided to relevant technical personnel. For each cluster in the clustering result, the user quality data that the technical personnel submit for any one user in the cluster is received and used as the user quality data of all users in that cluster.
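Propagating one rated user's quality data to the rest of the cluster can be sketched as below. The function name, the user identifiers, and the "high"/"low" values are hypothetical; the text only requires that each cluster take the quality data submitted for one of its users.

```python
def propagate_quality(cluster_of, sample_quality):
    """Give every user the quality data of the rated sample user in its cluster.

    cluster_of: user id -> cluster id (the clustering result)
    sample_quality: user id -> quality data, for the few users that were rated
    """
    quality_of_cluster = {}
    for user, quality in sample_quality.items():
        # The first rated user found in a cluster decides that cluster's quality data.
        quality_of_cluster.setdefault(cluster_of[user], quality)
    # Users in a cluster with no rated sample user get None.
    return {user: quality_of_cluster.get(cluster) for user, cluster in cluster_of.items()}

cluster_of = {"u1": 0, "u2": 0, "u3": 1, "u4": 1, "u5": 1}
sample_quality = {"u1": "high", "u4": "low"}  # one rated user per cluster
user_quality = propagate_quality(cluster_of, sample_quality)
```

Only two users are rated here, yet all five receive quality data, which is the labeling saving the embodiment relies on.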
In embodiments of the present invention, user features and/or data features of the program data belonging to each user can be obtained as quality evaluation features, the multiple users can be clustered according to the quality evaluation features, and the user quality data of the multiple users can then be determined from the clustering result. Because a large number of sample users no longer need to be labeled in advance according to a labeling standard, the inefficiency caused by a large amount of labeling is reduced, and so is the loss of accuracy caused by a labeling standard that fails to match actual requirements. In other words, the accuracy and efficiency of determining user quality data are improved.
Embodiment two
Referring to Fig. 2, a flow chart of the steps of a method for determining user quality in an embodiment of the present invention is shown. The specific steps include:
Step 201: obtain quality evaluation features of multiple users, where the quality evaluation features include at least one of user features and data features of the program data belonging to each user.
For how the quality evaluation features of multiple users are obtained, refer to the related description above; it is not repeated here.
In embodiments of the present invention, optionally, to obtain quality evaluation features in more dimensions, improve the accuracy of the subsequent clustering of users, and thereby improve the accuracy of the determined user quality data, if the program data includes multimedia data, the data features may include at least one of play completeness, average number of fans, average number of likes, average number of comments, average play count, average clarity, and average comment sentiment analysis data.
Play completeness is the degree to which multimedia data is played through. The higher the play completeness of program data, the more satisfied other users are with that program data, and the higher the quality of the user to whom the program data belongs.
The play duration of each play of the multimedia data belonging to the same user can be obtained and compared with a play duration threshold. The ratio of the number of plays whose duration exceeds the play duration threshold to the total play count of those multimedia data is used as the play completeness of the corresponding user.
The play duration threshold can be determined in advance; for example, it may be 30 seconds or 60 seconds.
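A minimal sketch of the play completeness ratio follows, assuming one recorded duration in seconds per play event. The function name and the sample durations are hypothetical; the 30-second threshold is one of the values mentioned above.

```python
def play_completeness(play_durations_s, threshold_s=30):
    """Share of plays whose duration exceeded the play duration threshold."""
    if not play_durations_s:
        return 0.0  # no plays recorded: treat completeness as zero (an assumption)
    complete = sum(1 for d in play_durations_s if d > threshold_s)
    return complete / len(play_durations_s)

# Durations of every play of one user's multimedia data (hypothetical values).
plays = [5, 45, 120, 10, 31, 30]
ratio = play_completeness(plays)
```

Three of the six plays (45, 120, and 31 seconds) are strictly longer than the threshold, so the ratio here is 0.5; a play of exactly 30 seconds does not count.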
Average number of fans is the ratio of the user's number of fans to the number of program data belonging to the user, and indicates the average number of the user's fans per program data belonging to the user.
Average number of likes is the mean of the number of likes of each program data among the multiple program data belonging to the user.
Average number of comments is the mean of the number of comments of each program data among the multiple program data belonging to the user.
Average play count is the mean of the play count of each program data among the multiple program data belonging to the user.
Average clarity is the mean of the clarity of each program data among the multiple program data belonging to the user. The higher the clarity of program data, the higher the quality of the user to whom the program data belongs.
Average comment sentiment analysis data is the mean of the comment sentiment analysis data of each program data among the multiple program data belonging to the user.
Sentiment analysis can be performed on the comments on program data based on NLP (natural language processing) to determine the comment sentiment analysis data for that program data. Comment sentiment analysis data indicates whether the comments on the program data are mainly negative or mainly positive; the more positive comments there are, the higher the comment sentiment analysis data.
Here, NLP refers to techniques for analyzing human-language text computationally.
In addition, in another optional embodiment of the present invention, if the program data includes multimedia data, the data features may also include at least one of total number of likes, total play count, and total number of comments.
Total number of likes is the sum of the likes of the multimedia data belonging to the user, total play count is the sum of the play counts of the multimedia data belonging to the user, and total number of comments is the sum of the comments on the multimedia data belonging to the user.
Step 202: perform a PCA transformation on the quality evaluation features.
The quality evaluation features may include more than one feature, and different features may be linearly correlated to some degree, that is, collinear. Collinear features may interfere with each other during clustering: the influence of one feature may be relatively weakened while the influence of another is relatively strengthened, which makes the clustering result inaccurate. Therefore, to eliminate possible collinearity between features and further improve the accuracy of the clustering result, a PCA transformation can be performed on the quality evaluation features.
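One simple way to see whether two features are collinear is their Pearson correlation coefficient. The text does not prescribe this check, so the sketch below is only an illustration, and the feature values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long feature columns."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # raises ZeroDivisionError if a column is constant

# Hypothetical per-user features: likes are exactly one tenth of plays.
total_plays = [1000, 12000, 2000, 500]
total_likes = [100, 1200, 200, 50]
r = pearson(total_plays, total_likes)
```

A coefficient close to 1 or -1 means the two features carry nearly the same information, which is the situation the PCA transformation is meant to remove.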
PCA transformation, also known as principal component analysis is a kind of technology of simplified data set, can will be counted by linear transformation According to being converted into new coordinate system, make the first big variance of any data projection in first coordinate (referred to as first principal component) On, the second largest variance on the second coordinate (referred to as Second principal component), and so on, later, calculate the variance contribution of principal component Rate calculates variance and accumulates contribution rate according to the variance contribution ratio of principal component, selects variance accumulation contribution rate to be greater than default variance tired Principal component corresponding to included variance contribution ratio when product contribution rate, due to principal component be according to the big minispread of variance, because This, selected principal component is biggish principal component, and the corresponding data of selected principal component are the number after PCA transformation According to.PCA transformation can be used in reducing the dimension of data set, while keep in data set to the maximum feature of variance contribution.
The variance contribution rate is the proportion of the total variation that is attributable to a single common factor; it indicates how strongly that common factor influences the dependent variable. In embodiments of the present invention, the variance contribution rate can serve as the degree of importance of the corresponding user quality evaluation feature.
The cumulative variance contribution rate is the sum of the variance contribution rates of multiple common factors, i.e., the proportion of the total variation attributable to those factors; it indicates their combined influence on the dependent variable. The preset cumulative variance contribution rate is a preset proportion of the total variation to be accounted for by the multiple common factors. In embodiments of the present invention, when the cumulative variance contribution rate is greater than the preset cumulative variance contribution rate, the user quality evaluation features corresponding to the variance contribution rates included in the cumulative rate are the main user quality evaluation features, and the smallest of those variance contribution rates can serve as the importance threshold.
The preset cumulative variance contribution rate can be determined in advance, for example by receiving a submitted value.
For example, the preset cumulative variance contribution rate may be 80%, 85%, or 90%.
In embodiments of the present invention, the quality evaluation features of the multiple users obtained above can be used as the input of the PCA transformation, and the output is the main quality evaluation features. A main quality evaluation feature is a quality evaluation feature whose degree of importance is greater than the importance threshold, and the importance threshold can be determined in advance.
For example, the quality evaluation features of 30 users are collected, with 20 quality evaluation features per user. A PCA transformation is performed on the 30*20 user quality features to obtain 30*5 user quality features; that is, after the PCA transformation each user has 5 quality evaluation features. These 5 features may be those of the 20 quality evaluation features whose degree of importance is greater than the importance threshold, i.e., the main quality evaluation features.
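The selection rule described above (keep principal components until the cumulative variance contribution rate is greater than the preset value, and take the smallest kept rate as the importance threshold) can be sketched as follows. This is a minimal illustration rather than the implementation of the specification: it assumes the component variances have already been computed (for example as eigenvalues of the feature covariance matrix), and the function name and sample numbers are hypothetical.

```python
def select_components(variances, preset_cumulative_rate):
    """Return (k, importance_threshold) for a list of component variances.

    variances: variance of each principal component (assumed precomputed).
    preset_cumulative_rate: e.g. 0.80, 0.85 or 0.90.
    """
    if not variances:
        raise ValueError("at least one component variance is required")
    ordered = sorted(variances, reverse=True)   # components ranked by variance
    total = sum(ordered)
    rates = [v / total for v in ordered]        # variance contribution rates
    cumulative = 0.0
    for k, rate in enumerate(rates, start=1):
        cumulative += rate                      # cumulative contribution rate
        if cumulative > preset_cumulative_rate:
            # The smallest kept rate serves as the importance threshold.
            return k, rate
    # The preset rate was never exceeded: keep every component.
    return len(rates), rates[-1]


# Illustrative variances only; not data from the specification.
k, importance_threshold = select_components([5.0, 3.0, 1.0, 1.0], 0.85)
```

With these illustrative variances the contribution rates are 0.5, 0.3, 0.1, and 0.1, so the first three components are kept because their cumulative rate of about 0.9 is the first to exceed 0.85.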
Step 203, cluster the multiple users according to the PCA-transformed quality evaluation features.
The multiple users are clustered according to the PCA-transformed quality evaluation features, thereby obtaining a clustering result.
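As an illustration of this clustering step, the sketch below groups users by their transformed feature vectors with a basic k-means loop. The choice of k-means, the number of clusters, and the deterministic initialization from the first k points are assumptions made for the example; this passage does not prescribe a particular clustering algorithm.

```python
import math


def kmeans(points, k, iterations=20):
    """Assign each point (a list of floats) to one of k clusters.

    Assumes len(points) >= k. Returns one cluster index per point.
    """
    # Deterministic initialization for this sketch: the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical PCA-transformed features for five users (two components each).
transformed = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9], [0.1, 0.2]]
cluster_labels = kmeans(transformed, k=2)
```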
Step 204, determine the user quality data of the multiple users according to the clustering result.
For how the user quality data of the multiple users is determined according to the clustering result, refer to the related description above; it is not repeated here.
In embodiments of the present invention, optionally, so that user quality data can be determined without first labeling a large number of samples according to a labeling standard, thereby improving the accuracy and efficiency of determining user quality data, at least one user can be extracted from a cluster as a sample user, submitted user quality data for the sample user can be received, and the user quality data of the sample user can be used as the user quality data of each user in the cluster.
The extracted sample user can be presented to relevant technical personnel, and the user quality data they submit for the sample user is received, thereby determining the sample user's user quality data. If one user is extracted from the cluster as the sample user, that user's user quality data can be used as the user quality data of each user in the cluster; if more than one user is extracted from the cluster as sample users, the average of those sample users' user quality data can be used as the user quality data of each user in the cluster.
For example, a cluster contains 15 users. User 1 is randomly extracted from the cluster and assessed by relevant technical personnel, who determine that user 1's user quality data is 80; the user quality data of all 15 users in the cluster can then be determined to be 80.
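The propagation rule just described (one sample user's value, or the average over several sample users, becomes the value for the whole cluster) can be sketched as below. The function name, the dictionary-based data layout, and the user identifiers are hypothetical choices made for the example.

```python
from statistics import mean


def label_clusters(cluster_of, sample_scores):
    """Propagate sample users' quality data to every user in their cluster.

    cluster_of: maps user id -> cluster id (the clustering result).
    sample_scores: maps sample user id -> submitted user quality data.
    Users in a cluster without any sample user are mapped to None.
    """
    scores_by_cluster = {}
    for user, score in sample_scores.items():
        scores_by_cluster.setdefault(cluster_of[user], []).append(score)
    # One sample user: its value is used; several: their average is used.
    cluster_quality = {c: mean(s) for c, s in scores_by_cluster.items()}
    return {user: cluster_quality.get(c) for user, c in cluster_of.items()}


cluster_of = {"u1": 0, "u2": 0, "u3": 1, "u4": 1, "u5": 1}
quality = label_clusters(cluster_of, {"u1": 80, "u3": 60, "u4": 70})
```

In this example cluster 0 has one sample user with a value of 80, so both of its users receive 80, while cluster 1 has two sample users and every user in it receives their average, 65.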
Of course, in practical applications the user quality data of the sample user can also be determined in other ways.
In embodiments of the present invention, optionally, so that user quality data can be determined without first labeling a large number of samples according to a labeling standard, thereby improving the accuracy and efficiency of determining user quality data, the multiple users include sample users. Accordingly, it can be determined that a cluster includes a sample user, and the user quality data of that sample user is used as the user quality data of each user in the cluster. Of course, if it is determined that the cluster does not include a sample user, at least one user can be extracted from the cluster as a sample user, submitted user quality data for the sample user can be received, and the user quality data of the sample user can be used as the user quality data of each user in the cluster.
Multiple users can be obtained in advance and their user quality data determined; these users, as sample users, are then clustered together with other users whose user quality data has not been determined, yielding clusters that include the sample users.
As can be seen from the above, embodiments of the present invention can determine the user quality data of the users in the clustering result from the user quality data submitted for the sample users. On the one hand, each cluster needs only one sample user to determine the user quality data of every user in the cluster; that is, user quality data needs to be determined for only a very small number of sample users (as few as the number of clusters in the clustering result). This is entirely different from determining user quality data with a machine learning model, which requires a large number of sample users to be labeled in advance. On the other hand, the sample users in embodiments of the present invention can be determined before clustering or extracted after clustering, so their role is also entirely different from that of the sample users used to train a machine learning model before that model determines user quality data.
Step 205, determine the order in which the program data is provided according to the user quality data of the multiple users, and provide the program data in that order.
Because users of higher user quality can provide program data of higher quality, in order to provide higher-quality program data first and improve the effect of providing program data, the order in which program data is provided can be determined according to the user quality data, and the program data provided in that order.
The order of the multiple users can be determined according to their user quality data, and the order of the program data is then determined according to the users to which the provided program data belongs and the order of those users.
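One way to read this ordering step is as a sort of program items by the user quality data of the user each item belongs to. The sketch below assumes that a higher value means higher quality and that ties keep their original order; the function name and the identifiers are hypothetical.

```python
def order_programs(programs, user_quality):
    """Return program ids ordered so that higher-quality owners come first.

    programs: list of (program_id, owner_id) pairs.
    user_quality: maps owner_id -> user quality data (higher is better,
    which is an assumption of this sketch).
    """
    # sorted() is stable, so items of equal-quality owners keep their order.
    ranked = sorted(programs, key=lambda item: user_quality[item[1]], reverse=True)
    return [program_id for program_id, _ in ranked]


programs = [("v1", "a"), ("v2", "b"), ("v3", "a"), ("v4", "c")]
ordered = order_programs(programs, {"a": 80, "b": 95, "c": 60})
```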
For example, a flowchart of the steps of a method for determining user quality applied to a video sharing platform can be as shown in Fig. 3.
Step 301, feature extraction;
The extracted features may include video play completeness, average number of fans, number of fans, video upload frequency, average play count, total play count, average number of likes, total number of likes, average sentiment analysis data, clarity, and so on; in practical applications other features may of course also be included.
Step 302, PCA transformation, i.e., perform a PCA transformation on the features extracted above;
Step 303, clustering, i.e., cluster the users according to the PCA-transformed features;
Step 304, category quality calibration.
For each cluster, a preset number (for example, several dozen) of sample users is randomly selected, and the user quality data submitted for the sample users is received and used as the user quality data of all users in the cluster.
The preset number can be determined in advance.
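The random selection in step 304 can be sketched as follows. The function name, the fixed random seed, and the rule of taking every user when a cluster is smaller than the preset number are assumptions made for this example; the specification only states that a preset number of sample users is randomly selected from each cluster.

```python
import random


def draw_samples(cluster_of, preset_number, seed=0):
    """Randomly pick up to preset_number sample users from each cluster.

    cluster_of: maps user id -> cluster id.
    Returns a dict mapping cluster id -> list of sampled user ids.
    """
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    members = {}
    for user, cluster in cluster_of.items():
        members.setdefault(cluster, []).append(user)
    # A cluster smaller than preset_number contributes all of its users.
    return {cluster: rng.sample(users, min(preset_number, len(users)))
            for cluster, users in members.items()}


samples = draw_samples({"u1": 0, "u2": 0, "u3": 0, "u4": 1}, preset_number=2)
```

The sampled users would then be shown to reviewers, and the submitted values applied to each cluster as described for step 304.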
In embodiments of the present invention, first, user features and/or data features of the program data belonging to each user can be obtained as quality evaluation features, the multiple users are clustered according to the quality evaluation features, and the user quality data of the multiple users is then determined based on the clustering result. Because a large number of sample users need not be labeled in advance according to a labeling standard, the inefficiency caused by extensive labeling is reduced, as is the low accuracy caused by labeling standards that are hard to match to actual needs; that is, the accuracy and efficiency of determining user quality data are improved.
Second, a PCA transformation can be performed on the obtained quality evaluation features to reduce the influence of linearly correlated quality evaluation features on the clustering result, further improving the accuracy of the clustering result.
Furthermore, at least one user can be extracted from a cluster as a sample user, the sample user's user quality data determined, and that user quality data then used as the user quality data of each user in the cluster; alternatively, an existing sample user in the cluster is identified and its user quality data is used as the user quality data of each user in the cluster. This ensures that user quality data can be determined without first labeling a large number of samples according to a labeling standard, improving the accuracy and efficiency of determining user quality data.
It should be noted that, for simplicity of description, the foregoing method embodiments are each described as a series of combined actions, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. In addition, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the present invention.
Embodiment five
Referring to Fig. 4, a structural block diagram of a device for determining user quality in an embodiment of the present invention is shown. The device may include:
An obtaining module 401, configured to obtain quality evaluation features of multiple users, the quality evaluation features including at least one of user features and data features of program data belonging to each user;
A clustering module 402, configured to cluster the multiple users according to the quality evaluation features;
A first determining module 403, configured to determine the user quality data of the multiple users according to the clustering result.
Optionally, the program data includes multimedia data, and the data features include at least one of play completeness, average number of fans, average number of likes, average number of comments, average play count, average clarity, and average comment sentiment analysis data.
Optionally, the clustering module includes:
A transformation submodule, configured to perform a PCA transformation on the quality evaluation features;
A clustering submodule, configured to cluster the multiple users according to the PCA-transformed quality evaluation features.
Optionally, the first determining module includes:
An extraction submodule, configured to extract at least one user from a cluster as a sample user;
A receiving submodule, configured to receive submitted user quality data for the sample user;
A first determining submodule, configured to use the user quality data of the sample user as the user quality data of each user in the cluster.
Optionally, the multiple users include a sample user, and the first determining module includes:
A second determining submodule, configured to determine that a cluster includes the sample user;
A third determining submodule, configured to use the user quality data of the sample user as the user quality data of each user in the cluster.
Optionally, the device further includes:
A second determining module, configured to determine, according to the user quality data of the multiple users, the order in which the program data is provided;
A providing module, configured to provide the program data in that order.
In embodiments of the present invention, user features and/or data features of the program data belonging to each user can be obtained as quality evaluation features, the multiple users are clustered according to the quality evaluation features, and the user quality data of the multiple users is then determined based on the clustering result. Because a large number of sample users need not be labeled in advance according to a labeling standard, the inefficiency caused by extensive labeling is reduced, as is the low accuracy caused by labeling standards that are hard to match to actual needs; that is, the accuracy and efficiency of determining user quality data are improved.
Because the device embodiment for determining user quality described above is substantially similar to the method embodiments, it is described relatively briefly; for relevant details, refer to the description of the method embodiments.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments can be referred to one another.
Those skilled in the art will readily appreciate that any combination of the above embodiments is feasible, so any combination of the above embodiments is an embodiment of the present invention; owing to space limitations, these combinations are not described one by one here.
The method and device for determining user quality provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems can also be used with the teachings herein. From the description above, the structure required to construct a system embodying the present solution is apparent. Moreover, the present invention is not directed at any particular programming language. It should be understood that the content of the invention described herein can be implemented in various programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.
Numerous specific details are set forth in the specification provided here. It is to be understood, however, that embodiments of the invention can be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this specification.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the invention the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. Therefore, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules or units or components in an embodiment can be combined into one module or unit or component, and they can furthermore be divided into multiple submodules or subunits or subcomponents. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed can be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) can be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
In addition, those skilled in the art will understand that although some embodiments described herein include certain features included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the claims, any of the claimed embodiments can be used in any combination.
The various component embodiments of the invention can be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) can be used in practice to implement some or all of the functions of some or all of the components of the method and device for determining user quality according to embodiments of the present invention. The invention can also be implemented as a device or device program (for example, a computer program and a computer program product) for performing part or all of the method described here. Such a program implementing the invention can be stored on a computer-readable medium or can take the form of one or more signals. Such a signal can be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices can be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order; these words can be interpreted as names.

Claims (12)

1. A method for determining user quality, characterized by comprising:
obtaining quality evaluation features of multiple users, the quality evaluation features including at least one of user features and data features of program data belonging to each user;
clustering the multiple users according to the quality evaluation features;
determining the user quality data of the multiple users according to the clustering result.
2. The method according to claim 1, characterized in that the program data includes multimedia data, and the data features include at least one of play completeness, average number of fans, average number of likes, average number of comments, average play count, average clarity, and average comment sentiment analysis data.
3. The method according to claim 1, characterized in that clustering the multiple users according to the quality evaluation features comprises:
performing a PCA transformation on the quality evaluation features;
clustering the multiple users according to the PCA-transformed quality evaluation features.
4. The method according to claim 1, characterized in that determining the user quality data of the multiple users according to the clustering result comprises:
extracting at least one user from a cluster as a sample user;
receiving submitted user quality data for the sample user;
using the user quality data of the sample user as the user quality data of each user in the cluster.
5. The method according to claim 1, characterized in that the multiple users include a sample user, and determining the user quality data of the multiple users according to the clustering result comprises:
determining that a cluster includes the sample user;
using the user quality data of the sample user as the user quality data of each user in the cluster.
6. The method according to claim 1, characterized in that, after determining the user quality data of the multiple users according to the clustering result, the method further comprises:
determining, according to the user quality data of the multiple users, the order in which the program data is provided;
providing the program data in that order.
7. A device for determining user quality, characterized by comprising:
an obtaining module, configured to obtain quality evaluation features of multiple users, the quality evaluation features including at least one of user features and data features of program data belonging to each user;
a clustering module, configured to cluster the multiple users according to the quality evaluation features;
a first determining module, configured to determine the user quality data of the multiple users according to the clustering result.
8. The device according to claim 7, characterized in that the program data includes multimedia data, and the data features include at least one of play completeness, average number of fans, average number of likes, average number of comments, average play count, average clarity, and average comment sentiment analysis data.
9. The device according to claim 7, characterized in that the clustering module includes:
a transformation submodule, configured to perform a PCA transformation on the quality evaluation features;
a clustering submodule, configured to cluster the multiple users according to the PCA-transformed quality evaluation features.
10. The device according to claim 7, characterized in that the first determining module includes:
an extraction submodule, configured to extract at least one user from a cluster as a sample user;
a receiving submodule, configured to receive submitted user quality data for the sample user;
a first determining submodule, configured to use the user quality data of the sample user as the user quality data of each user in the cluster.
11. The device according to claim 7, characterized in that the multiple users include a sample user, and the first determining module includes:
a second determining submodule, configured to determine that a cluster includes the sample user;
a third determining submodule, configured to use the user quality data of the sample user as the user quality data of each user in the cluster.
12. The device according to claim 7, characterized in that the device further includes:
a second determining module, configured to determine, according to the user quality data of the multiple users, the order in which the program data is provided;
a providing module, configured to provide the program data in that order.
CN201810402323.1A 2018-04-28 2018-04-28 Determine the method and device of user quality Pending CN108876098A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810402323.1A CN108876098A (en) 2018-04-28 2018-04-28 Determine the method and device of user quality

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810402323.1A CN108876098A (en) 2018-04-28 2018-04-28 Determine the method and device of user quality

Publications (1)

Publication Number Publication Date
CN108876098A true CN108876098A (en) 2018-11-23

Family

ID=64326991

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810402323.1A Pending CN108876098A (en) 2018-04-28 2018-04-28 Determine the method and device of user quality

Country Status (1)

Country Link
CN (1) CN108876098A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102984191A (en) * 2011-09-07 2013-03-20 百度在线网络技术(北京)有限公司 Method and device and equipment used for determining behavior related quality information
CN103856789A (en) * 2014-03-13 2014-06-11 赛特斯信息科技股份有限公司 System and method for achieving OTT service quality guarantee based on user behavior analysis
US9454729B2 (en) * 2011-03-29 2016-09-27 Manyworlds, Inc. Serendipity generating method, system, and device
CN106446078A (en) * 2016-09-08 2017-02-22 乐视控股(北京)有限公司 Information recommendation method and recommendation apparatus
CN107426177A (en) * 2017-06-13 2017-12-01 努比亚技术有限公司 A kind of user behavior clustering method and terminal, computer-readable recording medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181123