CN108256537A - A user gender prediction method and system - Google Patents

A user gender prediction method and system

Info

Publication number
CN108256537A
Authority
CN
China
Prior art keywords
user
app
sample
gender prediction
carrying
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611236167.3A
Other languages
Chinese (zh)
Inventor
高玉敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kuwo Technology Co Ltd
Original Assignee
Beijing Kuwo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kuwo Technology Co Ltd
Priority to CN201611236167.3A
Publication of CN108256537A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a user gender prediction method and system. The method includes: dividing labeled samples into training set samples and test set samples according to a predetermined rule; extracting the corresponding APP lists from acquired user logs, and extracting features of the APPs in each user's APP list to obtain the feature attributes used for user gender prediction; training a classification model on the feature attributes in the training set to obtain a trained classification model for user gender prediction; and predicting the gender of the users in the test set samples with the trained classification model. The trained classification model in the embodiments of the present invention is obtained by training a classification model on the feature attributes for user gender prediction in the training set. The technical solution provided by the present invention can therefore accurately predict the gender attribute of a user, greatly improving the accuracy of potential-user analysis.

Description

A user gender prediction method and system
Technical field
The present invention relates to the technical field of data analysis, and in particular to a user gender prediction method and system.
Background
At present, against the background of big data, enterprises are paying increasing attention to mining the potential value of data. To mine that value, an enterprise generally needs to analyze users' behavioral habits and consumption habits accurately and quickly. Specifically, by analyzing a user's behavioral and consumption habits, a portrait of the user can be drawn, so that similar commodities can be successfully recommended to the user according to the user's known interests.
In one existing method of recommending articles to a user, articles of the same kind and the same price grade are recommended based on the articles and prices in the user's purchase history. Take the mobile Taobao APP as an example, where APP (application) refers to a third-party application on a smartphone. Mobile Taobao (Android version) is software released by Alibaba to meet the personal consumption and online shopping needs of Android phone users. At present, the mobile Taobao APP recommends commodities based on the commodities purchased in the user's order history. For example, if the user previously bought "autumn coral fleece pajamas", then when the user later opens mobile Taobao, products similar to "autumn coral fleece pajamas" are shown on the interface of the mobile Taobao APP. Because the user has already bought a similar article, the probability of successfully recommending similar articles to the user is often greatly reduced.
In another existing method of recommending articles to a user, the articles bought by other users who purchased the same article as the current user are recommended to the current user, based on the articles in the user's purchase history. Take the Amazon APP as an example, which recommends commodities based on the books purchased in the user's order history. If the user previously bought a mathematics book on probability and statistics, then when the user later opens the Amazon APP, its interface shows the books that customers who bought this commodity also bought, for example 《Hadoop in Detail》, 《Machine Learning in Practice: A Test-Driven Development Approach》, 《Think Like a Computer Scientist: Python》, 《HBase: The Definitive Guide》, and other books not enumerated here. Because users have different knowledge structures, the other books they need to read may be completely different even if they bought the same book. The probability of successfully recommending to the user the books bought by other related users is therefore greatly reduced.
The disadvantage of existing methods of recommending articles to users is that the probability of a successful recommendation is low and the accuracy of potential-customer analysis is low, which greatly degrades the user experience.
Summary of the invention
Embodiments of the present invention provide a user gender prediction method and system, in which a classification model is trained on the feature attributes for user gender prediction in the training set to obtain a trained classification model for user gender prediction. The technical solution provided by the present invention can therefore accurately predict the gender attribute of a user, greatly improving the accuracy of judgments about potential users and improving the user experience.
In a first aspect, an embodiment of the present invention provides a user gender prediction method, the method including:
Crawling the labels of different APPs as samples for user gender prediction, and selecting from them the samples whose user registration information includes a gender identifier and labeling them accordingly, to obtain labeled samples;
Dividing the labeled samples into training set samples and test set samples according to a predetermined rule;
Extracting the corresponding APP lists from the acquired user logs, and extracting features of the APPs in each user's APP list to obtain the corresponding feature attributes for user gender prediction;
Training a classification model on the feature attributes in the training set to obtain a trained classification model for user gender prediction;
Predicting the gender of the users in the test set samples with the trained classification model for user gender prediction.
Preferably, extracting the corresponding APP lists from the acquired user logs, and extracting features of the APPs in each user's APP list to obtain the corresponding feature attributes for user gender prediction, specifically includes:
Analyzing the acquired user logs with the Hadoop map-reduce software framework to extract the corresponding APP lists;
Counting, based on a natural language processing method, the document frequency of each APP in the users' APP lists;
Filtering the APPs in the corresponding users' APP lists according to a preset rule, to select the set of APPs that can distinguish gender characteristics;
Counting, for each user, the number of launches within a predetermined time of each APP in the set of APPs that can distinguish gender characteristics, and recording the APP launch counts;
Taking the labels of the APPs that can distinguish gender characteristics, together with the corresponding APP launch counts, as the feature attributes for user gender prediction;
Recording the obtained labels of the APPs that can distinguish gender characteristics and the corresponding APP launch counts, to obtain the corresponding feature attributes for user gender prediction.
Preferably, the preset rule is specifically:
When the document probability of an APP in the users' APP lists is greater than 0.1 and less than 0.8, and the APP is installed by at least 10 users, the APP is selected and added to the set of APPs that can distinguish gender characteristics.
Preferably, the feature attributes in the training set are trained with the GBDT classification model in the machine learning library scikit-learn, to obtain the trained classification model for user gender prediction.
Preferably, after the trained classification model for user gender prediction is obtained, the method further includes:
Evaluating, using the feature attributes in the test set samples, the accuracy of the obtained trained classification model for user gender prediction.
Preferably, the ratio of the selected training set samples to the test set samples is specifically 7:3.
In a second aspect, an embodiment of the present invention provides a user gender prediction system, the system including:
A sample acquisition unit, which crawls the labels of different APPs as samples for user gender prediction, and selects from them the samples whose user registration information includes a gender identifier and labels them accordingly, to obtain labeled samples;
A sample classification unit, which divides the labeled samples into training set samples and test set samples according to a predetermined rule;
A feature attribute acquisition unit, which extracts the corresponding APP lists from the acquired user logs, and extracts features of the APPs in each user's APP list to obtain the corresponding feature attributes for user gender prediction;
A model acquisition and prediction unit, which trains a classification model on the feature attributes in the training set to obtain a trained classification model for user gender prediction; and
Predicts the gender of the users in the test set samples with the trained classification model for user gender prediction.
Preferably, the feature attribute acquisition unit is specifically configured to:
Analyze the acquired user logs with the Hadoop map-reduce software framework to extract the corresponding APP lists;
Count, based on a natural language processing method, the document frequency of each APP in the users' APP lists;
Filter the APPs in the corresponding users' APP lists according to a preset rule, to select the set of APPs that can distinguish gender characteristics;
Count, for each user, the number of launches within a predetermined time of each APP in the set of APPs that can distinguish gender characteristics, and record the APP launch counts;
Take the labels of the APPs that can distinguish gender characteristics, together with the corresponding APP launch counts, as the feature attributes for user gender prediction;
Record the obtained labels of the APPs that can distinguish gender characteristics and the corresponding APP launch counts, to obtain the corresponding feature attributes for user gender prediction.
Preferably, the preset rule of the feature attribute acquisition unit is specifically:
When the document probability of an APP in the users' APP lists is greater than 0.1 and less than 0.8, and the APP is installed by at least 10 users, the APP is selected and added to the set of APPs that can distinguish gender characteristics.
Preferably, the model acquisition and prediction unit is specifically configured to:
Train the feature attributes in the training set with the GBDT classification model in the machine learning library scikit-learn, to obtain the trained classification model for user gender prediction.
Embodiments of the present invention provide a user gender prediction method and system, wherein the method includes: crawling the labels of different APPs as samples for user gender prediction, and selecting from them the samples whose user registration information includes a gender identifier and labeling them accordingly, to obtain labeled samples; dividing the labeled samples into training set samples and test set samples according to a predetermined rule; extracting the corresponding APP lists from the acquired user logs, and extracting features of the APPs in each user's APP list to obtain the corresponding feature attributes for user gender prediction; training a classification model on the feature attributes in the training set to obtain a trained classification model for user gender prediction; and predicting the gender of the users in the test set samples with the trained classification model. The trained classification model in the embodiments of the present invention is obtained by training a classification model on the feature attributes for user gender prediction in the training set. The technical solution provided by the present invention can therefore accurately predict the gender attribute of a user, greatly improving the accuracy of potential-user analysis and improving the user experience.
Description of the drawings
Fig. 1 is a flow chart of a user gender prediction method provided by an embodiment of the present invention;
Fig. 2 is a structural diagram of a user gender prediction system provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
To facilitate understanding of the embodiments of the present invention, further explanation is given below with reference to the accompanying drawings and specific embodiments.
In the technical solution provided by the present invention, the labels of different APPs are crawled as samples for user gender prediction, and the samples whose user registration information includes a gender identifier are selected and labeled accordingly to obtain labeled samples; the labeled samples are divided into training set samples and test set samples according to a predetermined rule; the corresponding APP lists are extracted from the acquired user logs, and features of the APPs in each user's APP list are extracted to obtain the corresponding feature attributes for user gender prediction; a classification model is trained on the feature attributes in the training set to obtain a trained classification model for user gender prediction; and the gender of the users in the test set samples is predicted with the trained classification model. The trained classification model in the embodiments of the present invention is obtained by training a classification model on the feature attributes for user gender prediction in the training set. The technical solution provided by the present invention can therefore accurately predict the gender attribute of a user, greatly improving the accuracy of potential-user analysis and improving the user experience.
In existing methods, which derive a user portrait from collected data and recommend corresponding commodities according to the user's preferences, the gender attribute of nearly 90% of users cannot be identified directly. Consumption habits differ by gender. If related commodities are recommended to women, the probability that they buy similar commodities increases greatly. Conversely, if related commodities are recommended to men, the probability that they buy similar commodities drops substantially, because men mainly think rationally. With existing methods that recommend commodities according to the user's preferences, the probability of a successful recommendation is therefore not high. To raise the probability of successfully recommending similar commodities to a user and to improve the accuracy of potential-user analysis, the gender of the user needs to be predicted from the user's acquired behavioral habits and consumption habits.
The technical solution of the present invention is described in detail below with reference to the accompanying drawings.
Fig. 1 is a flow chart of a user gender prediction method provided by an embodiment of the present invention. As shown in Fig. 1, the user gender prediction method includes the following steps:
S101: Crawl the labels of different APPs as samples for user gender prediction, and select from them the samples whose user registration information includes a gender identifier and label them accordingly, to obtain labeled samples. The ratio of the selected training set samples to the test set samples is specifically 7:3.
It should be noted that the labels of different APPs can be obtained by using a crawler to crawl the APP labels of 360 Mobile Assistant.
S102: Divide the labeled samples into training set samples and test set samples according to a predetermined rule.
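The 7:3 division of step S102 can be sketched in Python as follows. The patent does not state what the predetermined rule is, so a seeded random shuffle is assumed here; `split_samples` and the `(user_id, gender)` tuples are hypothetical names used only for illustration.

```python
import random


def split_samples(labeled_samples, train_ratio=0.7, seed=0):
    """Shuffle labeled samples and split them into training and test sets.

    The random shuffle is an assumption; the patent only says the split
    follows a predetermined rule with a 7:3 ratio.
    """
    shuffled = list(labeled_samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical labeled samples: (user_id, gender) pairs.
labeled = [(f"user{i}", "F" if i % 2 else "M") for i in range(10)]
train_set, test_set = split_samples(labeled)
print(len(train_set), len(test_set))
```

Fixing the seed keeps the split reproducible, which matters when the test set is later used to compare models.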
S103: Extract the corresponding APP lists from the acquired user logs, and extract features of the APPs in each user's APP list to obtain the corresponding feature attributes for user gender prediction.
Specifically, extracting the corresponding APP lists from the acquired user logs, and extracting features of the APPs in each user's APP list to obtain the corresponding feature attributes for user gender prediction, includes the following steps:
Analyze the acquired user logs with the Hadoop map-reduce software framework to extract the corresponding APP lists. It should be noted that Hadoop map-reduce is a software framework on which applications can easily be written; these applications can run on large clusters of thousands of commodity machines and process terabyte-scale data sets in parallel in a reliable, fault-tolerant manner.
Count, based on a natural language processing method, the document frequency of each APP in the users' APP lists. It should be noted that each APP is treated as a word and each user's APP list as a document, and the document probability of each APP is computed based on the document frequency used in natural language processing.
Filter the APPs in the corresponding users' APP lists according to a preset rule, to select the set of APPs that can distinguish gender characteristics. It should be noted that the preset rule is specifically: when the document probability of an APP in the users' APP lists is greater than 0.1 and less than 0.8, and the APP is installed by at least 10 users, the APP is selected and added to the set of APPs that can distinguish gender characteristics.
Count, for each user, the number of launches within a predetermined time of each APP in the set of APPs that can distinguish gender characteristics, and record the APP launch counts.
Take the labels of the APPs that can distinguish gender characteristics, together with the corresponding APP launch counts, as the feature attributes for user gender prediction.
Record the obtained labels of the APPs that can distinguish gender characteristics and the corresponding APP launch counts, to obtain the corresponding feature attributes for user gender prediction.
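The first three steps above (extracting APP lists, computing document probability, and applying the preset rule) can be sketched in plain Python. This is an illustration only: it imitates the map and reduce phases in memory and does not use the Hadoop API, and the tab-separated log format and all function and APP names are assumptions not given in the patent.

```python
from collections import Counter, defaultdict


def extract_app_lists(log_lines):
    """Map each log line to (user, app), then reduce to one APP set per user."""
    apps_by_user = defaultdict(set)
    for line in log_lines:
        user_id, app_name = line.split("\t")[:2]  # map step (assumed log format)
        apps_by_user[user_id].add(app_name)       # reduce step: group by user
    return dict(apps_by_user)


def document_probability(app_lists):
    """Treat each APP as a word and each user's APP list as a document."""
    installs = Counter(app for apps in app_lists.values() for app in apps)
    total_users = len(app_lists)
    probability = {app: n / total_users for app, n in installs.items()}
    return probability, installs


def select_apps(app_lists, low=0.1, high=0.8, min_users=10):
    """Keep APPs with low < document probability < high and >= min_users installs."""
    probability, installs = document_probability(app_lists)
    return {
        app
        for app, p in probability.items()
        if low < p < high and installs[app] >= min_users
    }


# Hypothetical logs for 20 users: one tab-separated "user, APP" record per line.
log_lines = []
for i in range(20):
    log_lines.append(f"u{i}\tapp_everyone")       # installed by all 20 users
    if i < 12:
        log_lines.append(f"u{i}\tapp_candidate")  # installed by 12 users
    if i < 5:
        log_lines.append(f"u{i}\tapp_niche")      # installed by 5 users

app_lists = extract_app_lists(log_lines)
selected = select_apps(app_lists)
print(selected)
```

In this toy data, `app_everyone` is dropped because every user has it (document probability 1.0), and `app_niche` is dropped because only 5 users installed it, so only `app_candidate` passes both conditions of the rule.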
In practical applications, the feature extraction process is as follows:
First, the logs are analyzed with Hadoop map-reduce to extract the APP list of each user.
Second, each APP is treated as a word and each user's APP list as a document, and the document probability of each APP is computed based on the document frequency used in natural language processing.
Then, the APPs to be used as features are screened based on document probability. The preset screening rule is: the document probability is less than 0.8 and greater than 0.1, and the APP is installed by at least 10 users.
After that, the daily launch count of each user for the APPs in the selected set is counted and used as the value of the feature.
Finally, the values of the label features are accumulated based on the APP labels and the launch counts.
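The last two steps, turning per-APP launch counts into accumulated label features, might look like the following sketch. The patent does not specify the data layout, so the dictionaries, the label names, and the function `label_features` are hypothetical.

```python
from collections import defaultdict


def label_features(launch_counts, app_labels, selected_apps):
    """Sum one user's launch counts over the labels of the selected APPs."""
    features = defaultdict(int)
    for app, count in launch_counts.items():
        if app not in selected_apps:
            continue  # ignore APPs that did not pass the screening rule
        for label in app_labels.get(app, ()):
            features[label] += count
    return dict(features)


# Hypothetical crawled labels and one user's daily launch counts.
app_labels = {
    "racing_app": ["game", "racing"],
    "puzzle_app": ["game", "puzzle"],
    "shop_app": ["shopping"],
}
selected_apps = {"racing_app", "puzzle_app"}
launch_counts = {"racing_app": 3, "puzzle_app": 2, "shop_app": 5}

features = label_features(launch_counts, app_labels, selected_apps)
print(features)
```

Accumulating by label rather than by APP keeps the feature space small and lets two different APPs with the same label contribute to the same feature.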
It should further be noted that, in practical applications, a user's gender can be preliminarily inferred from the APPs the user uses. For example, users who play the Lianliankan (tile-matching) game APP are often women, while users who play the Need for Speed game APP are often men. Likewise, users who buy books with the Amazon APP are often men, while users who buy dishwashing detergent with the mobile Taobao APP are often women. From keywords of user purchasing habits detected in this way, the gender of the user can often be preliminarily judged.
S104: Train a classification model on the feature attributes in the training set to obtain a trained classification model for user gender prediction.
Specifically, the feature attributes in the training set are trained with the GBDT classification model in the machine learning library scikit-learn, to obtain the trained classification model for user gender prediction.
It should be noted that scikit-learn is an open-source machine learning module. Its most notable characteristic is that it provides interfaces to a variety of machine learning algorithms, allowing users to carry out data mining and data analysis simply and efficiently.
scikit-learn can load data sets. It bundles common machine learning data sets, such as the iris and digits data sets for classification and the classic Boston house prices data set for regression.
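As a brief illustration of the bundled data sets, the sketch below loads the two classification sets named above with `sklearn.datasets`. The Boston house prices loader is left out because, to our knowledge, it was removed from scikit-learn in version 1.2, so it is not available in recent releases.

```python
from sklearn.datasets import load_digits, load_iris

# Each loader returns a Bunch object with `data` (features) and `target` (labels).
iris = load_iris()
digits = load_digits()

print(iris.data.shape, digits.data.shape)
print(sorted(set(iris.target)))
```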
scikit-learn can learn and predict. It provides interfaces to a variety of machine learning algorithms so that users can use them easily. Each algorithm is called like a black box: the user only needs to set the corresponding parameters according to their own needs.
GBDT (Gradient Boosting Decision Tree) is an iterative decision tree algorithm and is not described further here.
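A minimal sketch of training a GBDT classifier with scikit-learn's `GradientBoostingClassifier` follows. The patent names the library and the algorithm but gives no parameters or data, so the feature matrix, the 1 = female / 0 = male encoding, and the hyperparameters are all assumptions for illustration.

```python
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical feature attributes: accumulated launch counts for two label
# features per user, e.g. [shopping-related launches, racing-game launches].
X_train = [[5, 0], [4, 1], [6, 0], [0, 5], [1, 4], [0, 6]]
y_train = [1, 1, 1, 0, 0, 0]  # assumed encoding: 1 = female, 0 = male

# Hyperparameters are illustrative; the patent does not specify any.
model = GradientBoostingClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Predict the gender of two unseen (hypothetical) users.
predictions = model.predict([[5, 1], [0, 4]])
print(list(predictions))
```

Tree ensembles such as GBDT work directly on raw counts, so the launch-count features need no scaling before training.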
S105: Predict the gender of the users in the test set samples with the trained classification model for user gender prediction.
In addition, after the trained classification model for user gender prediction is obtained, its accuracy is evaluated using the feature attributes in the test set samples.
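The accuracy evaluation can be sketched as the fraction of test set users whose predicted gender matches the labeled gender. The patent does not define the metric beyond the word "accuracy", so this standard definition and the name `accuracy` are assumptions.

```python
def accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical test set labels and model predictions.
true_gender = ["F", "M", "F", "F"]
predicted_gender = ["F", "M", "M", "F"]
score = accuracy(true_gender, predicted_gender)
print(score)
```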
Existing technology cannot predict the gender of 90% of users. The technical solution provided by the present invention extracts, from the user's mobile phone APP list, the APPs that can be used to distinguish gender, and counts the launch frequency of the corresponding APPs to predict the user's gender. According to statistics, predicting user gender with the technical solution provided by the present invention achieves an accuracy of 89%.
In summary, in the user gender prediction method provided by the embodiments of the present invention, the labels of different APPs are crawled as samples for user gender prediction, and the samples whose user registration information includes a gender identifier are selected and labeled accordingly to obtain labeled samples; the labeled samples are divided into training set samples and test set samples according to a predetermined rule; the corresponding APP lists are extracted from the acquired user logs, and features of the APPs in each user's APP list are extracted to obtain the corresponding feature attributes for user gender prediction; a classification model is trained on the feature attributes in the training set to obtain a trained classification model for user gender prediction; and the gender of the users in the test set samples is predicted with the trained classification model. The trained classification model in the embodiments of the present invention is obtained by training a classification model on the feature attributes for user gender prediction in the training set. The technical solution provided by the present invention can therefore accurately predict the gender attribute of a user, greatly improving the accuracy of potential-user analysis and improving the user experience.
As shown in Fig. 2, a user gender prediction system provided by an embodiment of the present invention includes: a sample acquisition unit 201, a sample classification unit 202, a feature attribute acquisition unit 203, and a model acquisition and prediction unit 204.
Specifically, the sample acquisition unit crawls the labels of different APPs as samples for user gender prediction, and selects from them the samples whose user registration information includes a gender identifier and labels them accordingly, to obtain labeled samples;
The sample classification unit divides the labeled samples into training set samples and test set samples according to a predetermined rule. It should be noted that the ratio of the training set samples to the test set samples selected by the sample classification unit is specifically 7:3.
The feature attribute acquisition unit extracts the corresponding APP lists from the acquired user logs, and extracts features of the APPs in each user's APP list to obtain the corresponding feature attributes for user gender prediction;
The model acquisition and prediction unit trains a classification model on the feature attributes in the training set to obtain a trained classification model for user gender prediction, and predicts the gender of the users in the test set samples with the trained classification model.
Further, the feature attribute acquisition unit is specifically configured to:
Analyze the acquired user logs with the Hadoop map-reduce software framework to extract the corresponding APP lists;
Count, based on a natural language processing method, the document frequency of each APP in the users' APP lists;
Filter the APPs in the corresponding users' APP lists according to a preset rule, to select the set of APPs that can distinguish gender characteristics;
Count, for each user, the number of launches within a predetermined time of each APP in the set of APPs that can distinguish gender characteristics, and record the APP launch counts;
Take the labels of the APPs that can distinguish gender characteristics, together with the corresponding APP launch counts, as the feature attributes for user gender prediction;
Record the obtained labels of the APPs that can distinguish gender characteristics and the corresponding APP launch counts, to obtain the corresponding feature attributes for user gender prediction.
Further, the preset rule of the feature attribute acquisition unit is specifically: when the document probability of an APP in the users' APP lists is greater than 0.1 and less than 0.8, and the APP is installed by at least 10 users, the APP is selected and added to the set of APPs that can distinguish gender characteristics.
Further, the model acquisition and prediction unit is specifically configured to: train the feature attributes in the training set with the GBDT classification model in the machine learning library scikit-learn, to obtain the trained classification model for user gender prediction.
In addition, to better evaluate the accuracy of the obtained trained classification model for user gender prediction, the user gender prediction system provided by a specific embodiment of the present invention further includes an assessment unit (not marked in Fig. 2). After the model acquisition and prediction unit obtains the trained classification model for user gender prediction, the assessment unit evaluates its accuracy using the feature attributes in the test set samples.
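The way the units of Fig. 2 hand data to one another can be sketched as a small Python class. This is only an architectural illustration: the class name, the callable signatures, and the stub units in the usage example are hypothetical, and the patent does not prescribe any particular implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GenderPredictionSystem:
    """Hypothetical wiring of the four units shown in Fig. 2."""

    acquire_samples: Callable[[], list]                # sample acquisition unit 201
    split_samples: Callable[[list], tuple]             # sample classification unit 202
    extract_features: Callable[[list], tuple]          # feature attribute acquisition unit 203
    train_and_predict: Callable[[tuple, tuple], list]  # model acquisition and prediction unit 204

    def run(self):
        labeled = self.acquire_samples()
        train, test = self.split_samples(labeled)
        train_xy = self.extract_features(train)
        test_xy = self.extract_features(test)
        return self.train_and_predict(train_xy, test_xy)


# Stub units for illustration only; each returns trivially simple data.
system = GenderPredictionSystem(
    acquire_samples=lambda: [("u1", "F"), ("u2", "M"), ("u3", "F"), ("u4", "M")],
    split_samples=lambda samples: (samples[:3], samples[3:]),
    extract_features=lambda samples: (
        [[len(uid)] for uid, _ in samples],
        [gender for _, gender in samples],
    ),
    # Stub "model": always predicts the first training label.
    train_and_predict=lambda train, test: [train[1][0]] * len(test[0]),
)
predictions = system.run()
print(predictions)
```

Passing the units in as callables keeps each one replaceable, for example swapping the stub model for a GBDT classifier without touching the other units.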
In technical scheme of the present invention, the sample for carrying out user gender prediction is used as by the label for capturing different APP, and The sample that therefrom choosing user's registration information has gender to identify is marked accordingly, to get the sample after mark;According to Sample after mark is divided into training set sample and test set sample by pre-defined rule;Phase is extracted from the daily record of the user got The APP lists answered, and the feature of the APP in user's APP lists is extracted, to obtain carrying out user gender prediction accordingly Characteristic attribute;The characteristic attribute in training set is trained by train classification models, it is pre- to obtain progress user's gender The train classification models of survey;By carry out the train classification models of user gender prediction to user's gender in test set sample into Row prediction.The train classification models of progress user gender prediction in the embodiment of the present invention, are to instruction by train classification models What the characteristic attribute for the progress user gender prediction that white silk is concentrated was trained.Therefore, technical solution provided by the present invention The gender attribute of user can be accurately predicted, the accuracy rate analyzed potential user is greatly increased, improves user Experience Degree.
The specific embodiments described above further explain the purpose, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the foregoing are merely specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

  1. A user gender prediction method, characterized by comprising:
    capturing the labels of different APPs as samples for user gender prediction, and selecting from them the samples whose user registration information carries a gender identifier and labeling them accordingly, to obtain labeled samples;
    dividing the labeled samples into training set samples and test set samples according to a predefined rule;
    extracting the corresponding APP lists from the acquired user logs, and extracting the features of the APPs in each user's APP list, to obtain the corresponding feature attributes for user gender prediction;
    training on the feature attributes in the training set with a training classification model, to obtain a trained classification model for user gender prediction; and
    predicting the gender of the users in the test set samples by means of the trained classification model for user gender prediction.
  2. The method according to claim 1, characterized in that extracting the corresponding APP lists from the acquired user logs, and extracting the features of the APPs in each user's APP list, to obtain the corresponding feature attributes for user gender prediction, specifically comprises:
    analyzing the acquired user logs with the Hadoop MapReduce software framework, to extract the corresponding APP lists;
    counting, based on a natural language processing method, the document frequency of the APPs in the users' APP lists;
    filtering the APPs in the corresponding users' APP lists according to a preset rule, to select the set of APPs capable of distinguishing gender features;
    counting, for each user, the number of launches within a predetermined time of the APPs in the set of APPs capable of distinguishing gender features, and recording the APP launch counts;
    denoting the labels of the APPs capable of distinguishing gender features, together with the corresponding APP launch counts, as the feature attributes for user gender prediction; and
    recording the labels of the obtained APPs capable of distinguishing gender features and the corresponding APP launch counts, to obtain the corresponding feature attributes for user gender prediction.
  3. The method according to claim 2, characterized in that the preset rule is specifically:
    when the document frequency of an APP in the users' APP lists is greater than 0.1 and less than 0.8, and the APP is installed by at least ten users, selecting the APP and adding it to the set of APPs capable of distinguishing gender features.
  4. The method according to claim 1, characterized in that the feature attributes in the training set are trained on using the GBDT training classification model in the machine learning library scikit-learn, to obtain the trained classification model for user gender prediction.
  5. The method according to claim 1, characterized in that, after the trained classification model for user gender prediction is obtained, the method further comprises:
    evaluating the accuracy of the trained classification model for user gender prediction using the feature attributes in the test set samples.
  6. The method according to claim 1, characterized in that the ratio of the selected training set samples to test set samples is specifically 7:3.
  7. A user gender prediction system, characterized by comprising:
    a sample acquisition unit, which captures the labels of different APPs as samples for user gender prediction, and selects from them the samples whose user registration information carries a gender identifier and labels them accordingly, to obtain labeled samples;
    a sample classification unit, which divides the labeled samples into training set samples and test set samples according to a predefined rule;
    a feature attribute acquiring unit, which extracts the corresponding APP lists from the acquired user logs, and extracts the features of the APPs in each user's APP list, to obtain the corresponding feature attributes for user gender prediction;
    a model acquisition and prediction unit, which trains on the feature attributes in the training set with a training classification model, to obtain a trained classification model for user gender prediction; and
    predicts the gender of the users in the test set samples by means of the trained classification model for user gender prediction.
  8. The system according to claim 7, characterized in that the feature attribute acquiring unit is specifically configured to:
    analyze the acquired user logs with the Hadoop MapReduce software framework, to extract the corresponding APP lists;
    count, based on a natural language processing method, the document frequency of the APPs in the users' APP lists;
    filter the APPs in the corresponding users' APP lists according to a preset rule, to select the set of APPs capable of distinguishing gender features;
    count, for each user, the number of launches within a predetermined time of the APPs in the set of APPs capable of distinguishing gender features, and record the APP launch counts;
    denote the labels of the APPs capable of distinguishing gender features, together with the corresponding APP launch counts, as the feature attributes for user gender prediction; and
    record the labels of the obtained APPs capable of distinguishing gender features and the corresponding APP launch counts, to obtain the corresponding feature attributes for user gender prediction.
  9. The system according to claim 8, characterized in that the preset rule of the feature attribute acquiring unit is specifically:
    when the document frequency of an APP in the users' APP lists is greater than 0.1 and less than 0.8, and the APP is installed by at least ten users, selecting the APP and adding it to the set of APPs capable of distinguishing gender features.
  10. The system according to claim 7, characterized in that the model acquisition and prediction unit is specifically configured to:
    train on the feature attributes in the training set using the GBDT training classification model in the machine learning library scikit-learn, to obtain the trained classification model for user gender prediction.
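The APP filtering rule and the launch-count features recited in claims 2, 3, 8, and 9 can be sketched as below. This is an illustrative reading, not the applicant's implementation: document frequency is assumed to mean the fraction of users whose APP list contains the APP, the install threshold is read as "at least ten users", and all function names and data are hypothetical.

```python
from collections import Counter
from typing import Dict, List

def select_apps(app_lists: Dict[str, List[str]], low: float = 0.1,
                high: float = 0.8, min_users: int = 10) -> List[str]:
    """Keep APPs whose document frequency lies strictly between low and high
    and that are installed by at least min_users users (assumed reading)."""
    n_users = len(app_lists)
    # Number of users whose APP list contains each APP (each user counted once).
    installs = Counter(app for apps in app_lists.values() for app in set(apps))
    return sorted(app for app, count in installs.items()
                  if low < count / n_users < high and count >= min_users)

def feature_vector(launches: Dict[str, int], selected: List[str]) -> List[int]:
    """Launch counts of the selected APPs in a fixed order; 0 if never launched."""
    return [launches.get(app, 0) for app in selected]

# Hypothetical data: 20 users; "browser" on all 20 phones, "music" on 12, "niche" on 1.
app_lists = {
    f"u{i}": ["browser"] + (["music"] if i < 12 else []) + (["niche"] if i == 0 else [])
    for i in range(20)
}
selected = select_apps(app_lists)
# One user's launch counts within the predetermined time window (hypothetical).
features = feature_vector({"music": 7, "browser": 30}, selected)
```

Here "browser" is dropped because it is installed by every user (document frequency 1.0), "niche" is dropped because it is too rare (0.05), and only "music" (0.6, 12 users) survives as a gender-discriminative feature.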
CN201611236167.3A 2016-12-28 2016-12-28 A kind of user gender prediction method and system Pending CN108256537A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611236167.3A CN108256537A (en) 2016-12-28 2016-12-28 A kind of user gender prediction method and system

Publications (1)

Publication Number Publication Date
CN108256537A true CN108256537A (en) 2018-07-06

Family

ID=62720215

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611236167.3A Pending CN108256537A (en) 2016-12-28 2016-12-28 A kind of user gender prediction method and system

Country Status (1)

Country Link
CN (1) CN108256537A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103729785A (en) * 2014-01-26 2014-04-16 合一信息技术(北京)有限公司 Video user gender classification method and device for method
US9471851B1 (en) * 2015-06-29 2016-10-18 International Business Machines Corporation Systems and methods for inferring gender by fusion of multimodal content
CN106203473A (en) * 2016-06-24 2016-12-07 有米科技股份有限公司 A kind of mobile subscriber's gender prediction's method based on installation kit list
CN106204127A (en) * 2016-07-06 2016-12-07 乐视控股(北京)有限公司 User's evaluation methodology and device for application

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109191185A (en) * 2018-08-15 2019-01-11 深圳市和讯华谷信息技术有限公司 A kind of visitor's heap sort method and system
CN109885834A (en) * 2019-02-18 2019-06-14 中国联合网络通信集团有限公司 A kind of prediction technique and device of age of user gender
CN109933698B (en) * 2019-02-27 2021-06-08 腾讯科技(深圳)有限公司 User portrait source verification method and device
CN109933698A (en) * 2019-02-27 2019-06-25 腾讯科技(深圳)有限公司 A kind of the source method of calibration and device of user's portrait
CN111143441A (en) * 2019-12-30 2020-05-12 北京每日优鲜电子商务有限公司 Gender determination method, device, equipment and storage medium
CN111222026A (en) * 2020-01-09 2020-06-02 支付宝(杭州)信息技术有限公司 Training method of user category identification model and user category identification method
CN111222026B (en) * 2020-01-09 2023-07-14 支付宝(杭州)信息技术有限公司 Training method of user category recognition model and user category recognition method
CN111291798A (en) * 2020-01-21 2020-06-16 北京工商大学 User basic attribute prediction method based on ensemble learning
CN111291798B (en) * 2020-01-21 2021-04-20 北京工商大学 User basic attribute prediction method based on ensemble learning
CN113268654A (en) * 2020-02-17 2021-08-17 北京搜狗科技发展有限公司 User gender identification method and device and electronic equipment
CN113657917A (en) * 2020-05-12 2021-11-16 上海佳投互联网技术集团有限公司 Visitor gender analysis method and system based on USER-AGENT
CN111639714A (en) * 2020-06-01 2020-09-08 贝壳技术有限公司 Method, device and equipment for determining attributes of users
CN112132209A (en) * 2020-09-19 2020-12-25 北京智能工场科技有限公司 Attribute prediction method based on bias characteristics
CN112132209B (en) * 2020-09-19 2024-05-31 北京智能工场科技有限公司 Attribute prediction method based on biasing characteristics
CN112434136A (en) * 2020-12-08 2021-03-02 深圳市欢太科技有限公司 Gender classification method, gender classification device, electronic equipment and computer storage medium
CN112434136B (en) * 2020-12-08 2024-04-23 深圳市欢太科技有限公司 Sex classification method, apparatus, electronic device and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2018-07-06