CN104991899B - Method and device for identifying user attributes - Google Patents


Info

Publication number
CN104991899B
Authority
CN
China
Prior art keywords
user
sample
broadcasting
feature vector
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510296833.1A
Other languages
Chinese (zh)
Other versions
CN104991899A (en
Inventor
林锡雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu kugou business incubator management Co.,Ltd.
Original Assignee
Guangzhou Kugou Computer Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Kugou Computer Technology Co Ltd filed Critical Guangzhou Kugou Computer Technology Co Ltd
Priority to CN201510296833.1A priority Critical patent/CN104991899B/en
Publication of CN104991899A publication Critical patent/CN104991899A/en
Application granted granted Critical
Publication of CN104991899B publication Critical patent/CN104991899B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and device for identifying user attributes, belonging to the field of network technology. The method includes: obtaining a first sample user set; obtaining a first play record set of the users in the first sample user set; screening the first sample user set and the first play record set to obtain a second sample user set and a second play record set; generating a feature matrix based on the second sample user set and the second play record set; building a classification model based on the feature vectors in the feature matrix and the attribute information of the feature vectors; generating a feature vector of a user to be identified according to the play records of that user; and inputting the feature vector of the user to be identified into the classification model to output the user attribute of that user. The invention uses the historical play records of the user to be identified to predict attribute information such as the user's gender and age, which provides a basis for user services and can improve the accuracy of user services such as multimedia recommendation.

Description

Method and device for identifying user attributes
Technical field
The present invention relates to the field of network technology, and in particular to a method and device for identifying user attributes.
Background art
With the development of network technology, more and more users use networks for multimedia-related entertainment, such as listening to songs or watching movies online. Because of the explosive growth of information, however, it is difficult for users to quickly find the multimedia files they are interested in among massive amounts of information.
To solve this problem, many network services provide a recommendation function, for example making targeted recommendations to a user according to the user's attributes and preference information. In real life, differences in attributes such as gender and age lead to large differences in users' preferences for types of multimedia files; user attributes can therefore be regarded as a factor with a strong influence on recommendation accuracy.
Usually, user attributes are reflected in the user's personal information. In actual use, however, most users do not complete their personal information, so recommendation accuracy for these users is low. This indirectly affects users' perception of the application and the user stickiness of the network service. A method for identifying user attributes is therefore urgently needed to solve this problem in the prior art.
Summary of the invention
To solve the problems of the prior art, embodiments of the present invention provide a method and device for identifying user attributes. The technical solution is as follows:
In one aspect, an embodiment of the present invention provides a method for identifying user attributes, the method including:
obtaining a first sample user set, the first sample user set including users who have registered on a platform and saved attribute information;
obtaining a first play record set of the users in the first sample user set, the first play record set including information on the multimedia files played by the users;
screening the first sample user set and the first play record set to obtain a second sample user set and a second play record set;
generating a feature matrix based on the second sample user set and the second play record set, the feature matrix including a feature vector for each user in the second sample user set, the feature vector of each user being generated from the information on the multimedia files played by that user;
building a classification model based on the feature vectors in the feature matrix and the attribute information of the feature vectors;
generating a feature vector of a user to be identified according to the play records of the user to be identified;
inputting the feature vector of the user to be identified into the classification model, and outputting the user attribute of the user to be identified.
Optionally, obtaining the first play record set of the users in the first sample user set includes:
obtaining information on the multimedia files played by each user in the first sample user set within a preset time period.
Optionally, screening the first sample user set and the first play record set to obtain the second sample user set and the second play record set includes:
removing from the first sample user set the users who played fewer multimedia files within the preset time period than a first preset threshold, to obtain the second sample user set;
removing from the first play record set the multimedia files whose play count within the preset time period is less than a second preset threshold, to obtain the second play record set.
Optionally, generating the feature matrix based on the second sample user set and the second play record set includes:
for any user in the second sample user set, counting the term frequency and inverse document frequency, in the second play record set, of each multimedia file the user has played;
generating a vector element for each multimedia file according to the term frequency and inverse document frequency obtained for the user;
combining the vector elements of the multimedia files to obtain the play score vector of the user;
combining the play score vectors of the users in the second sample user set to obtain a play score matrix;
reducing the dimensionality of the play score matrix, sorting by the eigenvalues after dimensionality reduction in descending order, and selecting the leading vectors, a first preset number of them, to form the feature matrix.
Optionally, building the classification model based on the feature vectors in the feature matrix and the attribute information of the feature vectors includes:
training on first feature vectors in the feature matrix and the attribute information of the first feature vectors to generate an initial classification model, the first feature vectors being the leading feature vectors, a second preset number of them;
verifying and adjusting the initial classification model based on second feature vectors in the feature matrix and the attribute information of the second feature vectors to obtain the classification model, the second feature vectors being the feature vectors in the feature matrix other than the first feature vectors.
In another aspect, an embodiment of the present invention provides a device for identifying user attributes, the device including:
a user set acquisition module, configured to obtain a first sample user set, the first sample user set including users who have registered on a platform and saved attribute information;
a play set acquisition module, configured to obtain a first play record set of the users in the first sample user set, the first play record set including information on the multimedia files played by the users;
a screening module, configured to screen the first sample user set and the first play record set to obtain a second sample user set and a second play record set;
a matrix generation module, configured to generate a feature matrix based on the second sample user set and the second play record set, the feature matrix including a feature vector for each user in the second sample user set, the feature vector of each user being generated from the information on the multimedia files played by that user;
a modeling module, configured to build a classification model based on the feature vectors in the feature matrix and the attribute information of the feature vectors;
a vector generation module, configured to generate a feature vector of a user to be identified according to the play records of the user to be identified;
an identification module, configured to input the feature vector of the user to be identified into the classification model and output the user attribute of the user to be identified.
Optionally, the play set acquisition module is configured to obtain information on the multimedia files played by each user in the first sample user set within a preset time period.
Optionally, the screening module is configured to remove from the first sample user set the users who played fewer multimedia files within the preset time period than a first preset threshold, to obtain the second sample user set; and to remove from the first play record set the multimedia files whose play count within the preset time period is less than a second preset threshold, to obtain the second play record set.
Optionally, the matrix generation module is configured to: for any user in the second sample user set, count the term frequency and inverse document frequency, in the second play record set, of each multimedia file the user has played; generate a vector element for each multimedia file according to the term frequency and inverse document frequency obtained for the user; combine the vector elements of the multimedia files to obtain the play score vector of the user; combine the play score vectors of the users in the second sample user set to obtain a play score matrix; and reduce the dimensionality of the play score matrix, sort by the eigenvalues after dimensionality reduction in descending order, and select the leading vectors, a first preset number of them, to form the feature matrix.
Optionally, the modeling module is configured to train on first feature vectors in the feature matrix and the attribute information of the first feature vectors to generate an initial classification model, the first feature vectors being the leading feature vectors, a second preset number of them; and to verify and adjust the initial classification model based on second feature vectors in the feature matrix and the attribute information of the second feature vectors to obtain the classification model, the second feature vectors being the feature vectors in the feature matrix other than the first feature vectors.
The beneficial effects of the technical solution provided by the embodiments of the present invention are as follows:
By modeling the multimedia play records of users who have left attribute information, a classification model for attribute identification is obtained. Based on the historical play records of a user to be identified, attribute information such as that user's gender and age can then be predicted, which provides a basis for user services and can improve the accuracy of user services such as multimedia recommendation.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for identifying user attributes provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a method for identifying user attributes provided by an embodiment of the present invention;
Fig. 3 is a structural diagram of a device for identifying user attributes provided by an embodiment of the present invention;
Fig. 4 is a block diagram of a device 400 for identifying user attributes according to an exemplary embodiment.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a method for identifying user attributes provided by an embodiment of the present invention. Referring to Fig. 1, the method includes:
101. Obtain a first sample user set, the first sample user set including users who have registered on a platform and saved attribute information.
102. Obtain a first play record set of the users in the first sample user set, the first play record set including information on the multimedia files played by the users.
103. Screen the first sample user set and the first play record set to obtain a second sample user set and a second play record set.
104. Generate a feature matrix based on the second sample user set and the second play record set, the feature matrix including a feature vector for each user in the second sample user set, the feature vector of each user being generated from the information on the multimedia files played by that user.
105. Build a classification model based on the feature vectors in the feature matrix and the attribute information of the feature vectors.
106. Generate a feature vector of a user to be identified according to the play records of the user to be identified.
107. Input the feature vector of the user to be identified into the classification model, and output the user attribute of the user to be identified.
Optionally, obtaining the first play record set of the users in the first sample user set includes:
obtaining information on the multimedia files played by each user in the first sample user set within a preset time period.
Optionally, screening the first sample user set and the first play record set to obtain the second sample user set and the second play record set includes:
removing from the first sample user set the users who played fewer multimedia files within the preset time period than a first preset threshold, to obtain the second sample user set;
removing from the first play record set the multimedia files whose play count within the preset time period is less than a second preset threshold, to obtain the second play record set.
Optionally, generating the feature matrix based on the second sample user set and the second play record set includes:
for any user in the second sample user set, counting the term frequency and inverse document frequency, in the second play record set, of each multimedia file the user has played;
generating a vector element for each multimedia file according to the term frequency and inverse document frequency obtained for the user;
combining the vector elements of the multimedia files to obtain the play score vector of the user;
combining the play score vectors of the users in the second sample user set to obtain a play score matrix;
reducing the dimensionality of the play score matrix, sorting by the eigenvalues after dimensionality reduction in descending order, and selecting the leading vectors, a first preset number of them, to form the feature matrix.
Optionally, building the classification model based on the feature vectors in the feature matrix and the attribute information of the feature vectors includes:
training on first feature vectors in the feature matrix and the attribute information of the first feature vectors to generate an initial classification model, the first feature vectors being the leading feature vectors, a second preset number of them;
verifying and adjusting the initial classification model based on second feature vectors in the feature matrix and the attribute information of the second feature vectors to obtain the classification model, the second feature vectors being the feature vectors in the feature matrix other than the first feature vectors.
All of the optional technical solutions above can be combined in any way to form alternative embodiments of the present disclosure, which are not described one by one here.
Fig. 2 is a flowchart of a method for identifying user attributes provided by an embodiment of the present invention. Referring to Fig. 2, the embodiment specifically includes:
201. Obtain a first sample user set, the first sample user set including users who have registered on a platform and saved attribute information.
Before user attribute identification is carried out, users with attribute information can be obtained from the user profile database of the platform. The attribute information may be the user's gender, the user's age, and so on, and which users to obtain can be determined by the current identification target. If the current identification target is gender, users who have entered their gender can be obtained from the user profile database; if the current identification target is age, users who have entered their age can be obtained from the user profile database. When the identification target is the two attributes of gender and age together, users who have entered both age and gender can be obtained from the user profile database. The same acquisition method can be used for other types of user attributes, which the embodiments of the present invention do not describe further.
It should be noted that the platform may be an instant messaging platform, a social networking application platform, a multimedia service platform, or another platform that provides information services, which the embodiments of the present invention do not limit.
To ensure that the sample is comprehensive, the first sample user set can include a certain number of users. For example, 300,000 users can be obtained as the first sample user set. This number is only an example and does not limit the number actually used in the embodiments of the present invention.
202. Obtain a first play record set of the users in the first sample user set, the first play record set including information on the multimedia files played by the users.
Optionally, step 202 can include: obtaining, from a user historical play record database, information on the multimedia files played by each user in the first sample user set within a preset time period. A preset time period needs to be set when obtaining the records, so that excessively old play records do not influence the current modeling process and user attributes are identified from more recent play records. The preset time period can be the most recent 6 months, the most recent 3 months, or another period that is representative of the user's actual play behavior.
203. Remove from the first sample user set the users who played fewer multimedia files within the preset time period than a first preset threshold, to obtain a second sample user set.
For a user, if the number of multimedia files the user played within the preset time period is less than the first preset threshold, the play records cannot accurately measure the user's play preferences, so such users need to be screened out. For example, users who played no more than 15 songs within half a year are removed from the first sample user set.
The screened second sample user set is denoted $\{u_j \mid j = 1, 2, \ldots, n, n+1, n+2, \ldots, n+m\}$, where the first n users in the set can serve as training set users and the last m users can serve as validation set users.
204. Remove from the first play record set the multimedia files whose play count within the preset time period is less than a second preset threshold, to obtain a second play record set.
Likewise, for a multimedia file, if its play count within the preset time period is less than the second preset threshold, its value for measuring users' play preferences is relatively small, so such multimedia files need to be screened out. For example, multimedia files with a play count of less than 6 within the half year are removed from the half-year user play records above. The multimedia files that remain after this processing can serve as a multimedia file dictionary, denoted $\{S_j \mid j = 1, 2, \ldots, k\}$, where k is the number of multimedia files.
Steps 203 and 204 above are the process of screening the first sample user set and the first play record set to obtain the second sample user set and the second play record set. The first preset threshold and the second preset threshold used in the screening can be adjusted according to the actual scenario, which the embodiments of the present invention do not specifically limit.
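The screening in steps 203 and 204 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the `(user_id, file_id)` event format, the function name `screen_play_records`, and its parameter names are hypothetical, the default thresholds are taken from the examples above, the "number of multimedia files played" is assumed to mean distinct files, and the play count of a file is assumed to be the number of play events by the users kept in step 203.

```python
from collections import defaultdict


def screen_play_records(plays, min_files_per_user=15, min_plays_per_file=6):
    """Screen users and files; `plays` holds (user_id, file_id) events
    that already fall inside the preset time period (assumed format)."""
    plays = list(plays)

    # Step 203: keep users who played at least `min_files_per_user` distinct files.
    files_by_user = defaultdict(set)
    for user, file_id in plays:
        files_by_user[user].add(file_id)
    kept_users = {u for u, files in files_by_user.items()
                  if len(files) >= min_files_per_user}

    # Step 204: keep files whose play count (among kept users) reaches the threshold.
    play_count = defaultdict(int)
    for user, file_id in plays:
        if user in kept_users:
            play_count[file_id] += 1
    kept_files = {f for f, c in play_count.items() if c >= min_plays_per_file}

    kept_plays = [(u, f) for u, f in plays if u in kept_users and f in kept_files]
    return kept_users, kept_plays
```

The sketch applies each filter once; a user can end up with fewer remaining files after step 204, which the text above does not address either.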
205. Generate a feature matrix based on the second sample user set and the second play record set, the feature matrix including a feature vector for each user in the second sample user set, the feature vector of each user being generated from the information on the multimedia files played by that user.
Specifically, step 205 includes the following process:
205A. For each user in the second sample user set, count the term frequency and inverse document frequency, in the second play record set, of each multimedia file the user has played.
Specifically, the number of times $f_{ij}$ that each user $u_i$ played multimedia file $S_j$ is counted, and the natural logarithm of the number of plays is taken, $tf_{ij} = \ln(f_{ij})$, to obtain the term frequency of each multimedia file. Then the inverse document frequency $idf_j$ of each multimedia file is calculated, where $n_j$ denotes the play count of multimedia file $S_j$.
205B. Generate a vector element for each multimedia file according to the term frequency and inverse document frequency obtained for the user.
Multiplying the term frequency by the inverse document frequency, $w_{ij} = tf_{ij} \cdot idf_j$, gives the vector elements of the user's multimedia files, which make up $d_i = (w_{i1}, w_{i2}, \ldots, w_{ik})$, where $i = 1, 2, \ldots, n+m$ and $j = 1, 2, \ldots, k$.
205C. Combine the vector elements of the multimedia files to obtain the play score vector of the user.
Combining the vector elements of the user's multimedia files gives the play score vector of the user.
Steps 205A-205C above are in effect the process of obtaining the play score vector of each user based on the tf-idf (Term Frequency-Inverse Document Frequency) model; the play score vector can be a tf-idf vector. In this process, each user is regarded as a document and each multimedia file the user has played is regarded as a word in the document, which yields the k-dimensional vector of the user.
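Steps 205A-205C can be sketched as follows. This is a hedged illustration only: the function name `play_score_vectors` and the `(user_id, file_id)` input format are hypothetical, and because the inverse document frequency formula does not appear in the text above, the sketch assumes the common textbook form $idf_j = \ln(N / n_j)$, with $N$ the number of users and $n_j$ the number of users who played file $S_j$. The term frequency follows the text, $tf_{ij} = \ln(f_{ij})$, so a file played exactly once receives a weight of zero.

```python
import math
from collections import Counter, defaultdict


def play_score_vectors(plays, file_dictionary):
    """Return {user_id: [w_i1, ..., w_ik]} for the files in `file_dictionary`.

    plays: list of (user_id, file_id) events (assumed format).
    file_dictionary: ordered list of file ids S_1..S_k.
    """
    counts = defaultdict(Counter)              # counts[user][file] = f_ij
    for user, file_id in plays:
        counts[user][file_id] += 1

    num_users = len(counts)                    # N in the assumed idf formula
    users_per_file = Counter(f for c in counts.values() for f in c)  # n_j

    vectors = {}
    for user, file_counts in counts.items():
        vec = []
        for file_id in file_dictionary:
            f_ij = file_counts.get(file_id, 0)
            if f_ij == 0:
                vec.append(0.0)                # file never played by this user
                continue
            tf = math.log(f_ij)                              # tf_ij = ln(f_ij)
            idf = math.log(num_users / users_per_file[file_id])  # assumed idf
            vec.append(tf * idf)               # w_ij = tf_ij * idf_j
        vectors[user] = vec
    return vectors
```

Stacking the returned vectors, one row per user, gives a play score matrix in the sense of step 205D.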
205D. Combine the play score vectors of the users in the second sample user set to obtain a play score matrix.
The play score vector of each user can be used as a column vector to form the play score matrix, or as a row vector to form the play score matrix, which the embodiments of the present invention do not limit.
205E. Reduce the dimensionality of the play score matrix, sort by the eigenvalues after dimensionality reduction in descending order, and select the leading vectors, a first preset number of them, to form the feature matrix.
It should be noted that the dimension of the play score matrix equals the number of multimedia files in the second play record set and is therefore very large. For example, the number of multimedia files in the second play record set can reach 110,000; that is, the user-by-multimedia-file play score matrix formed by combining the play score vectors of the m+n users has up to 110,000 dimensions, with a sparsity of more than 99.5%. The matrix therefore needs dimensionality reduction. Optionally, the embodiments of the present invention can use the SVD algorithm to reduce the dimensionality of the matrix.
Based on the SVD (Singular Value Decomposition) algorithm, an $n \times m$ matrix $M$ is decomposed as $M_{n \times m} = U_{n \times n} S_{n \times m} V^T_{m \times m}$, where $S$ holds the singular values (the eigenvalues referred to above), arranged in descending order. For the orthogonal basis $V_{m \times m}$, $V^T_{m \times m} V_{m \times m} = I$, and therefore $M_{n \times m} V_{m \times m} = U_{n \times n} S_{n \times m}$. Keeping the first r dimensions gives $M_{n \times m} V_{m \times r} = U_{n \times r} S_{r \times r}$, and $U_{n \times r} S_{r \times r}$ is the low-dimensional vector representation after dimensionality reduction. The play score matrix $M_{(n+m) \times k}$ formed from the play score vectors of the m+n users above is therefore decomposed, the result is sorted by singular value in descending order, and the leading vectors, a first preset number of them, are extracted to form a feature matrix; that is, the first r-dimensional features $U_{(n+m) \times r} S_{r \times r}$ are extracted. Optionally, r can be 200 to 500, for example r = 300.
The dimensionality reduction can be regarded as mapping a matrix from a k-dimensional space to an r-dimensional low-dimensional space ($k \gg r$). The embodiments of the present invention can also use other dimensionality reduction methods, such as latent semantic extraction techniques like PLSA (Probabilistic Latent Semantic Analysis) and LDA (Latent Dirichlet Allocation), which are not described further here.
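The projection $M V_r = U_r S_r$ used in step 205E can be illustrated with a small dependency-free sketch. Note the substitution: instead of a full SVD routine, the sketch approximates the leading right singular vectors by power iteration with deflation on $M^T M$, which is chosen here only to keep the example self-contained and is not stated in the embodiment. The function names, the iteration count, and the random seed are assumptions, and `r` is assumed not to exceed the rank of the matrix. For a matrix of the size described above, a sparse truncated SVD routine from a numerical library would be the practical choice.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_right_singular_vectors(M, r, iters=300, seed=0):
    """Approximate the first r right singular vectors of M (rows = users)."""
    rng = random.Random(seed)
    Mt = [list(col) for col in zip(*M)]        # transpose, k rows
    basis = []
    for _ in range(r):
        v = [rng.gauss(0.0, 1.0) for _ in range(len(Mt))]
        for _ in range(iters):
            # Deflation: remove components along vectors already found.
            for u in basis:
                c = dot(v, u)
                v = [x - c * y for x, y in zip(v, u)]
            Mv = [dot(row, v) for row in M]
            w = [dot(row, Mv) for row in Mt]   # w = (M^T M) v
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:                   # no variance left to capture
                break
            v = [x / norm for x in w]
        basis.append(v)
    return basis


def reduce_rows(M, basis):
    """Project each row of M onto the basis, i.e. compute M V_r = U_r S_r."""
    return [[dot(row, v) for v in basis] for row in M]
```

With `M = [[3.0, 0.0], [0.0, 2.0], [0.0, 0.0]]` and `r = 1`, the reduced rows are approximately `[±3]`, `[0]`, and `[0]`; the sign depends on the random start, which does not matter for a downstream classifier as long as the same basis is reused.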
206. Train on first feature vectors in the feature matrix and the attribute information of the first feature vectors to generate an initial classification model, the first feature vectors being the leading feature vectors, a second preset number of them.
The model training process can be the process of building a classifier with a regression algorithm. For the binary classification problem of gender, a logistic regression algorithm is used for classification. Logistic regression is an easily understood model, equivalent to y = f(x), which expresses the relationship between the independent variable x and the dependent variable y. A familiar analogy is a doctor's consultation: the doctor observes, listens, questions, and takes the pulse, and then judges whether the patient is ill and with what. The four diagnostic methods yield the independent variable x, that is, the feature data, and the judgment of illness corresponds to the dependent variable y, that is, the predicted class. In step 206, the first n vectors in the dimensionality-reduced matrix and their attribute labels (such as gender or age) can be used to train an initial classification model.
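A minimal sketch of training a binary logistic regression classifier, as in step 206, is shown below. It is an illustration under assumptions rather than the embodiment's training procedure: plain batch gradient descent is used, the labels are assumed to be encoded as 0 and 1 (for example the two gender values), and the function names, learning rate, and epoch count are hypothetical choices.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """X: list of reduced feature vectors; y: list of 0/1 attribute labels."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample under the current weights.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Gradient step on the average log-loss gradient.
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return the predicted class (0 or 1) for feature vector x."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0
```

For a multi-class attribute such as an age bracket, a one-vs-rest or multinomial variant would be needed; the text above names logistic regression only for the gender case.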
207. Verify and adjust the initial classification model based on second feature vectors in the feature matrix and the attribute information of the second feature vectors to obtain the classification model, the second feature vectors being the feature vectors in the feature matrix other than the first feature vectors.
To verify the prediction accuracy of the classification model, the last m vectors in the dimensionality-reduced matrix and their attribute labels (such as gender or age) can be used to verify and adjust it; a confusion matrix is output, and the classification model is finally obtained based on the confusion matrix.
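The validation output mentioned in step 207 can be sketched as follows. The representation of the confusion matrix as a counter keyed by `(true_label, predicted_label)` and the function names are assumptions made for illustration; how the model is adjusted from the matrix is not specified in the text and is not shown.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Count how often each (true, predicted) label pair occurs."""
    return Counter(zip(true_labels, predicted_labels))


def accuracy(matrix):
    """Share of validation samples on the diagonal of the confusion matrix."""
    total = sum(matrix.values())
    correct = sum(c for (t, p), c in matrix.items() if t == p)
    return correct / total if total else 0.0
```

For example, with true labels `["m", "f", "f", "m"]` and predictions `["m", "f", "m", "m"]`, one of the four validation users is misclassified, so the accuracy is 0.75.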
Steps 206-207 above are the process of building the classification model based on the feature vectors in the feature matrix and the attribute information of the feature vectors.
208th, it is recorded according to the broadcasting of user to be identified, generates the feature vector of the user to be identified.
The process of the generation feature vector is every in the second sample of users set with being generated in above-mentioned steps 205A-205E Similarly, therefore not to repeat here for the process of the feature vector of one user.Specifically, in order to enable the feature of the user to be identified to Amount can be matched with disaggregated model, therefore, it is also desirable to dimensionality reduction be carried out to the broadcasting score value vector, based in above-mentioned steps 205 Citing, if the broadcasting score value vector of the user to be identified is Ak×m, A can be passed throughk×mVm×rCarry out dimensionality reduction.
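The product $A_{k\times m}V_{m\times r}$ can be sketched as an ordinary matrix multiplication. The matrix values below are hypothetical, and V is assumed to be the m × r projection matrix obtained when the sample play score matrix was reduced in step 205.

```python
def project(play_scores, v):
    """Multiply a k x m play score matrix by an m x r projection matrix, giving k x r."""
    m = len(v)
    r = len(v[0])
    for row in play_scores:
        if len(row) != m:
            raise ValueError("play score vector length must equal the number of rows of V")
    return [[sum(row[i] * v[i][j] for i in range(m)) for j in range(r)]
            for row in play_scores]

# Hypothetical projection matrix V (m = 3 multimedia files, r = 2 retained dimensions).
V = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]

# Hypothetical play score vector of one user to be identified (k = 1).
A = [[0.5, 0.0, 0.25]]

reduced = project(A, V)
```

The reduced vector has r components, the same length as the feature vectors on which the classification model was trained.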
209. Input the feature vector of the user to be identified into the classification model and output the user attribute of the user to be identified.
The classification model classifies the dimension-reduced feature vector to obtain the user attribute of the user to be identified.
The method provided by the embodiments of the present invention models the multimedia play records of users who have left attribute information, yielding a classification model for attribute identification. Attribute information such as the gender and age of a user to be identified can then be predicted from that user's historical play records, which provides a basis for user services and can improve the accuracy of user services such as multimedia recommendation. Further, by treating multimedia files as text, the idea of text classification is carried over to the multimedia field: each user corresponds to a document, and each multimedia file the user has played corresponds to a word in that document. Demographic attributes such as gender and age are predicted from the user's historical multimedia behavior, so that in the multimedia field a user's playback behavior is used to predict attributes such as gender.
Fig. 3 is a structural diagram of a device for identifying user attributes provided by an embodiment of the present invention. Referring to Fig. 3, the device includes:
A user set acquisition module 301, configured to obtain a first sample user set, the first sample user set including users who have registered on a platform and saved attribute information;
A play set acquisition module 302, configured to obtain a first play record set of the users in the first sample user set, the first play record set including information on the multimedia files played by the users;
A screening module 303, configured to screen the first sample user set and the first play record set to obtain a second sample user set and a second play record set;
A matrix generation module 304, configured to generate a feature matrix based on the second sample user set and the second play record set, the feature matrix including a feature vector of each user in the second sample user set, the feature vector of each user being generated from the information on the multimedia files played by that user;
A modeling module 305, configured to build a classification model based on the feature vectors in the feature matrix and the attribute information of the feature vectors;
A vector generation module 306, configured to generate a feature vector of a user to be identified according to the play records of the user to be identified;
An identification module 307, configured to input the feature vector of the user to be identified into the classification model and output the user attribute of the user to be identified.
Optionally, the play set acquisition module 302 is configured to obtain information on the multimedia files played within a preset time period by each user in the first sample user set.
Optionally, the screening module 303 is configured to remove from the first sample user set the users who played fewer multimedia files than a first preset threshold within the preset time period, obtaining the second sample user set; and to remove from the first play record set the multimedia files played fewer times than a second preset threshold within the preset time period, obtaining the second play record set.
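The two screening rules can be sketched as follows. This is only an illustration under stated assumptions: the "number of multimedia files played" is taken to mean distinct files, play counts are taken over the whole first play record set, and the second play record set is restricted to the retained users. The source does not settle these points, and the function and variable names are hypothetical.

```python
from collections import Counter

def screen(first_records, first_threshold, second_threshold):
    """first_records maps each user to the list of file ids played in the preset period."""
    # Total play count of every file across the first play record set.
    play_counts = Counter(f for plays in first_records.values() for f in plays)
    # Keep users who played at least `first_threshold` distinct files (assumption: distinct).
    second_users = {u for u, plays in first_records.items()
                    if len(set(plays)) >= first_threshold}
    # Keep only files played at least `second_threshold` times in total.
    second_records = {
        u: [f for f in first_records[u] if play_counts[f] >= second_threshold]
        for u in sorted(second_users)
    }
    return second_users, second_records

# Hypothetical play records within the preset time period.
first_records = {
    "u1": ["a", "b", "a", "c"],
    "u2": ["a"],
    "u3": ["b", "c", "d"],
}
kept_users, kept_records = screen(first_records, first_threshold=2, second_threshold=2)
```

Here user `u2` is dropped for playing too few files and file `d` is dropped for being played too rarely.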
Optionally, the matrix generation module 304 is configured to: for any user in the second sample user set, count the term frequency and the inverse document frequency, in the second play record set, of each multimedia file the user has played; generate a vector element for each multimedia file according to the counted term frequency and inverse document frequency of that multimedia file for the user; combine the vector elements of the multimedia files to obtain the play score vector of the user; combine the play score vectors of the users in the second sample user set to obtain a play score matrix; and perform dimensionality reduction on the play score matrix, sort by eigenvalue after reduction in descending order, and select the leading vectors, up to a first preset number, to form the feature matrix.
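The construction of the play score matrix can be sketched with a common TF-IDF weighting, treating each user as a document and each multimedia file as a word. The exact weighting formula used by the patent is not given in this section, so tf = plays of the file / total plays by the user and idf = ln(number of users / number of users who played the file) are assumptions, as are all names and sample records. The subsequent dimensionality reduction is not shown.

```python
import math
from collections import Counter

def play_score_matrix(records):
    """records maps each user to a list of played file ids; returns (users, files, matrix)."""
    users = sorted(records)
    files = sorted({f for plays in records.values() for f in plays})
    n_users = len(users)
    # Document frequency: number of users who played each file at least once.
    df = Counter(f for plays in records.values() for f in set(plays))
    matrix = []
    for u in users:
        counts = Counter(records[u])
        total = len(records[u])
        row = []
        for f in files:
            tf = counts[f] / total if total else 0.0
            idf = math.log(n_users / df[f])
            row.append(tf * idf)  # vector element for this file
        matrix.append(row)        # play score vector of this user
    return users, files, matrix

# Hypothetical second play record set.
sample_records = {"u1": ["a", "a", "b"], "u2": ["b", "c"], "u3": ["b"]}
user_ids, file_ids, scores = play_score_matrix(sample_records)
```

In this example file `b` is played by every user, so its inverse document frequency is zero and it contributes nothing to any user's vector, which mirrors how very common words carry little weight in text classification.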
Optionally, the modeling module 305 is configured to: train on the first feature vectors in the feature matrix and the attribute information of the first feature vectors to generate an initial classification model, where the first feature vectors are the leading feature vectors, up to a second preset number; and verify and adjust the initial classification model based on the second feature vectors in the feature matrix and the attribute information of the second feature vectors to obtain the classification model, where the second feature vectors are the feature vectors in the feature matrix other than the first feature vectors.
All of the optional technical solutions above may be combined in any manner to form optional embodiments of the present disclosure, which are not described one by one here.
It should be noted that when the device of the above embodiment identifies a user attribute, the division into the functional modules described above is only an example. In practice, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above. In addition, the device for identifying user attributes and the method for identifying user attributes provided by the above embodiments belong to the same concept; for the specific implementation, refer to the method embodiment, which is not repeated here.
Fig. 4 is a block diagram of a device 400 for identifying user attributes according to an exemplary embodiment. For example, the device 400 may be provided as a server. Referring to Fig. 4, the device 400 includes a processing component 422, which further includes one or more processors, and memory resources represented by a memory 432 for storing instructions executable by the processing component 422, such as application programs. An application program stored in the memory 432 may include one or more modules, each corresponding to a set of instructions. The processing component 422 is configured to execute the instructions so as to perform the above method for identifying user attributes.
The device 400 may also include a power supply component 426 configured to perform power management of the device 400, a wired or wireless network interface 450 configured to connect the device 400 to a network, and an input/output (I/O) interface 458. The device 400 may operate on an operating system stored in the memory 432, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.
A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented in hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A method for identifying a user attribute, characterized in that the method comprises:
Obtaining a first sample user set, the first sample user set including users who have registered on a platform and saved attribute information;
Obtaining a first play record set of the users in the first sample user set, the first play record set including information on the multimedia files played by the users;
Screening the first sample user set and the first play record set to obtain a second sample user set and a second play record set;
Generating a feature matrix based on the second sample user set and the second play record set, the feature matrix including a feature vector of each user in the second sample user set, the feature vector of each user being generated from the information on the multimedia files played by that user;
Building a classification model based on the feature vectors in the feature matrix and the attribute information of the feature vectors;
Generating a feature vector of a user to be identified according to the play records of the user to be identified;
Inputting the feature vector of the user to be identified into the classification model and outputting the user attribute of the user to be identified;
Wherein generating the feature matrix based on the second sample user set and the second play record set comprises:
For any user in the second sample user set, counting the term frequency and the inverse document frequency, in the second play record set, of each multimedia file the user has played;
Generating a vector element for each multimedia file according to the counted term frequency and inverse document frequency of that multimedia file for the user;
Combining the vector elements of the multimedia files to obtain the play score vector of the user;
Combining the play score vectors of the users in the second sample user set to obtain a play score matrix;
Performing dimensionality reduction on the play score matrix, sorting by eigenvalue after reduction in descending order, and selecting the leading vectors, up to a first preset number, to form the feature matrix.
2. The method according to claim 1, characterized in that obtaining the first play record set of the users in the first sample user set comprises:
Obtaining information on the multimedia files played within a preset time period by each user in the first sample user set.
3. The method according to claim 1, characterized in that screening the first sample user set and the first play record set to obtain the second sample user set and the second play record set comprises:
Removing from the first sample user set the users who played fewer multimedia files than a first preset threshold within a preset time period, to obtain the second sample user set;
Removing from the first play record set the multimedia files played fewer times than a second preset threshold within the preset time period, to obtain the second play record set.
4. The method according to claim 1, characterized in that building the classification model based on the feature vectors in the feature matrix and the attribute information of the feature vectors comprises:
Training on the first feature vectors in the feature matrix and the attribute information of the first feature vectors to generate an initial classification model, the first feature vectors being the leading feature vectors, up to a second preset number;
Verifying and adjusting the initial classification model based on the second feature vectors in the feature matrix and the attribute information of the second feature vectors to obtain the classification model, the second feature vectors being the feature vectors in the feature matrix other than the first feature vectors.
5. A device for identifying a user attribute, characterized in that the device comprises:
A user set acquisition module, configured to obtain a first sample user set, the first sample user set including users who have registered on a platform and saved attribute information;
A play set acquisition module, configured to obtain a first play record set of the users in the first sample user set, the first play record set including information on the multimedia files played by the users;
A screening module, configured to screen the first sample user set and the first play record set to obtain a second sample user set and a second play record set;
A matrix generation module, configured to generate a feature matrix based on the second sample user set and the second play record set, the feature matrix including a feature vector of each user in the second sample user set, the feature vector of each user being generated from the information on the multimedia files played by that user;
A modeling module, configured to build a classification model based on the feature vectors in the feature matrix and the attribute information of the feature vectors;
A vector generation module, configured to generate a feature vector of a user to be identified according to the play records of the user to be identified;
An identification module, configured to input the feature vector of the user to be identified into the classification model and output the user attribute of the user to be identified;
Wherein the matrix generation module is configured to: for any user in the second sample user set, count the term frequency and the inverse document frequency, in the second play record set, of each multimedia file the user has played; generate a vector element for each multimedia file according to the counted term frequency and inverse document frequency of that multimedia file for the user; combine the vector elements of the multimedia files to obtain the play score vector of the user; combine the play score vectors of the users in the second sample user set to obtain a play score matrix; and perform dimensionality reduction on the play score matrix, sort by eigenvalue after reduction in descending order, and select the leading vectors, up to a first preset number, to form the feature matrix.
6. The device according to claim 5, characterized in that the play set acquisition module is configured to obtain information on the multimedia files played within a preset time period by each user in the first sample user set.
7. The device according to claim 5, characterized in that the screening module is configured to remove from the first sample user set the users who played fewer multimedia files than a first preset threshold within a preset time period, to obtain the second sample user set; and to remove from the first play record set the multimedia files played fewer times than a second preset threshold within the preset time period, to obtain the second play record set.
8. The device according to claim 5, characterized in that the modeling module is configured to: train on the first feature vectors in the feature matrix and the attribute information of the first feature vectors to generate an initial classification model, the first feature vectors being the leading feature vectors, up to a second preset number; and verify and adjust the initial classification model based on the second feature vectors in the feature matrix and the attribute information of the second feature vectors to obtain the classification model, the second feature vectors being the feature vectors in the feature matrix other than the first feature vectors.
CN201510296833.1A 2015-06-02 2015-06-02 The recognition methods of user property and device Active CN104991899B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510296833.1A CN104991899B (en) 2015-06-02 2015-06-02 The recognition methods of user property and device

Publications (2)

Publication Number Publication Date
CN104991899A CN104991899A (en) 2015-10-21
CN104991899B true CN104991899B (en) 2018-06-19

Family

ID=54303715

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510296833.1A Active CN104991899B (en) 2015-06-02 2015-06-02 The recognition methods of user property and device

Country Status (1)

Country Link
CN (1) CN104991899B (en)

Also Published As

Publication number Publication date
CN104991899A (en) 2015-10-21

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 510660 Guangzhou City, Guangzhou, Guangdong, Whampoa Avenue, No. 315, self - made 1-17

Applicant after: Guangzhou KuGou Networks Co., Ltd.

Address before: 510000 B1, building, No. 16, rhyme Road, Guangzhou, Guangdong, China 13F

Applicant before: Guangzhou KuGou Networks Co., Ltd.

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220407

Address after: 4119, 41st floor, building 1, No.500, middle section of Tianfu Avenue, Chengdu hi tech Zone, China (Sichuan) pilot Free Trade Zone, Chengdu, Sichuan 610000

Patentee after: Chengdu kugou business incubator management Co.,Ltd.

Address before: No. 315, Huangpu Avenue middle, Tianhe District, Guangzhou City, Guangdong Province

Patentee before: GUANGZHOU KUGOU COMPUTER TECHNOLOGY Co.,Ltd.