CN108256537A - User gender prediction method and system - Google Patents
- Publication number: CN108256537A
- Application number: CN201611236167.3A
- Authority: CN (China)
- Prior art keywords: user, app, sample, gender prediction, carrying
- Legal status: Pending (the status is Google's assumption, not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Abstract
The present invention relates to a user gender prediction method and system. The method includes: dividing labeled samples into training-set samples and test-set samples according to a predefined rule; extracting each user's app list from the acquired user logs and extracting features of the apps in that list, to obtain feature attributes for user gender prediction; training a classification model on the feature attributes in the training set, to obtain a classification model for user gender prediction; and predicting the gender of the users in the test-set samples with that model. Because the gender prediction model in the embodiments is obtained by training a classification model on the gender-prediction feature attributes in the training set, the technical solution can predict the gender attribute of users accurately and greatly improves the accuracy of potential-user analysis.
Description
Technical field
The present invention relates to the field of data analysis, and in particular to a user gender prediction method and system.
Background art
In the era of big data, enterprises increasingly focus on mining the potential value of their data. To do so, they generally need to analyze users' behavioral habits and consumption habits accurately and quickly. Analyzing these habits makes it possible to build a user profile, so that similar goods can be recommended successfully according to the user's known interests.
Among existing methods, one way to recommend items to a user is to recommend items of the same kind and the same price level, based on the items and prices in the user's purchase history. Take the Mobile Taobao app as an example (here, "app" means a third-party application on a smartphone). Mobile Taobao (Android edition) is software released by Alibaba to meet the personal consumption and online-shopping needs of Android phone users. In this app, goods are recommended based on the items in the user's order history: for example, if the user previously bought "autumn coral-fleece pajamas", then the next time the user opens Mobile Taobao, products similar to "autumn coral-fleece pajamas" are shown on the app's interface. However, because the user has already bought such an item, the probability of successfully recommending another similar item is greatly reduced.
Another existing method recommends to the current user the items bought by other users who bought the same items as the current user. In the Amazon app, for example, books are recommended based on the books in the user's order history: if the user previously bought "Statistical Thinking: Probability and Statistics for Programmers", then the next time the user opens the Amazon app, its interface shows the books that customers who bought this title also bought, such as "Hadoop Technology Explained in Detail", "Machine Learning in Practice: A Test-Driven Development Approach", "Think Python Like a Computer Scientist", and "HBase: The Definitive Guide"; the list is not exhaustive. However, users have different knowledge backgrounds, so even readers of the same book may need entirely different books next. This greatly reduces the probability of successfully recommending the books bought by other related users.
The drawback of existing recommendation methods is that the probability of successfully recommending items to users is low, and so is the accuracy of potential-customer analysis, which greatly degrades the user experience.
Summary of the invention
Embodiments of the present invention provide a user gender prediction method and system in which a classification model is trained on the gender-prediction feature attributes in a training set, yielding a classification model for user gender prediction. The technical solution can therefore predict the gender attribute of users accurately, greatly improving the judgment of potential users and the user experience.
In a first aspect, an embodiment of the present invention provides a user gender prediction method, the method comprising:
crawling the tags of different apps as samples for user gender prediction, selecting the samples whose user registration information includes a gender label, and labeling them accordingly, to obtain labeled samples;
dividing the labeled samples into training-set samples and test-set samples according to a predefined rule;
extracting each user's app list from the acquired user logs and extracting features of the apps in the list, to obtain feature attributes for user gender prediction;
training a classification model on the feature attributes in the training set, to obtain a classification model for user gender prediction; and
predicting the gender of the users in the test-set samples with the classification model for user gender prediction.
Preferably, extracting each user's app list from the acquired user logs and extracting features of the apps in the list to obtain feature attributes for user gender prediction specifically includes:
analyzing the acquired user logs with the Hadoop MapReduce software framework to extract each user's app list;
counting, based on a natural language processing method, the document frequency of each app in the users' app lists;
filtering the apps in each user's app list according to a preset rule, to obtain a set of apps that can distinguish gender;
counting, for each user, the number of launches within a predetermined period of each app in the gender-discriminative app set, and recording these launch counts;
recording the launch counts under the tags of the gender-discriminative apps as feature attributes for user gender prediction; and
accumulating the launch counts recorded under the tags of the obtained gender-discriminative apps, to obtain the corresponding feature attributes for user gender prediction.
Preferably, the preset rule is specifically:
when the document probability of an app in the users' app lists is greater than 0.1 and less than 0.8, and the app is installed by at least 10 users, the app is selected and added to the gender-discriminative app set.
Preferably, the classification model for user gender prediction is obtained by training the GBDT classification model of the machine learning library scikit-learn on the feature attributes in the training set.
Preferably, after the classification model for user gender prediction is obtained, the method further includes:
evaluating the accuracy of the classification model for user gender prediction with the feature attributes of the test-set samples.
Preferably, the ratio of training-set samples to test-set samples is 7:3.
In a second aspect, an embodiment of the present invention provides a user gender prediction system, the system comprising:
a sample acquisition unit, configured to crawl the tags of different apps as samples for user gender prediction, select the samples whose user registration information includes a gender label, and label them accordingly, to obtain labeled samples;
a sample classification unit, configured to divide the labeled samples into training-set samples and test-set samples according to a predefined rule;
a feature attribute acquisition unit, configured to extract each user's app list from the acquired user logs and extract features of the apps in the list, to obtain feature attributes for user gender prediction; and
a model acquisition and prediction unit, configured to train a classification model on the feature attributes in the training set to obtain a classification model for user gender prediction, and
to predict the gender of the users in the test-set samples with the classification model for user gender prediction.
Preferably, the feature attribute acquisition unit is specifically configured to:
analyze the acquired user logs with the Hadoop MapReduce software framework to extract each user's app list;
count, based on a natural language processing method, the document frequency of each app in the users' app lists;
filter the apps in each user's app list according to a preset rule, to obtain a set of apps that can distinguish gender;
count, for each user, the number of launches within a predetermined period of each app in the gender-discriminative app set, and record these launch counts;
record the launch counts under the tags of the gender-discriminative apps as feature attributes for user gender prediction; and
accumulate the launch counts recorded under the tags of the obtained gender-discriminative apps, to obtain the corresponding feature attributes for user gender prediction.
Preferably, the preset rule used by the feature attribute acquisition unit is specifically:
when the document probability of an app in the users' app lists is greater than 0.1 and less than 0.8, and the app is installed by at least 10 users, the app is selected and added to the gender-discriminative app set.
Preferably, the model acquisition and prediction unit is specifically configured to:
train the GBDT classification model of the machine learning library scikit-learn on the feature attributes in the training set, to obtain the classification model for user gender prediction.
Embodiments of the present invention provide a user gender prediction method and system. The method includes: crawling the tags of different apps as samples for user gender prediction, selecting the samples whose user registration information includes a gender label, and labeling them accordingly, to obtain labeled samples; dividing the labeled samples into training-set and test-set samples according to a predefined rule; extracting each user's app list from the acquired user logs and extracting features of the listed apps, to obtain feature attributes for user gender prediction; training a classification model on the feature attributes in the training set, to obtain a classification model for user gender prediction; and predicting the gender of the users in the test-set samples with that model. Because the gender prediction model in the embodiments is obtained by training a classification model on the gender-prediction feature attributes in the training set, the technical solution can predict the gender attribute of users accurately, greatly improving the accuracy of potential-user analysis and the user experience.
Brief description of the drawings
Fig. 1 is a flowchart of a user gender prediction method provided by an embodiment of the present invention;
Fig. 2 is a structural diagram of a user gender prediction system provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.
To facilitate understanding of the embodiments of the present invention, they are further explained below with reference to the accompanying drawings and specific embodiments.
In the technical solution provided by the present invention, the tags of different apps are crawled as samples for user gender prediction, and the samples whose user registration information includes a gender label are selected and labeled accordingly; the labeled samples are divided into training-set and test-set samples according to a predefined rule; each user's app list is extracted from the acquired user logs, and features of the listed apps are extracted to obtain feature attributes for user gender prediction; a classification model is trained on the feature attributes in the training set to obtain a classification model for user gender prediction; and that model predicts the gender of the users in the test-set samples. Because this model is obtained by training on the gender-prediction feature attributes in the training set, the technical solution can predict the gender attribute of users accurately, greatly improving the accuracy of potential-user analysis and the user experience.
With existing methods, which build a user profile from the collected data and recommend goods according to the user's preferences, the gender of nearly 90% of users cannot be identified directly. Consumption habits differ by gender. If related goods are recommended to women, the probability that they buy similar goods rises greatly; conversely, because men tend to shop more rationally, the probability that they buy similar goods after such a recommendation is much lower. Existing preference-based recommendation methods therefore have a low success rate. To raise the probability of successfully recommending similar goods and to improve the accuracy of potential-user analysis, the user's gender needs to be predicted from the acquired behavioral and consumption habits.
The technical solution of the present invention is described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a user gender prediction method provided by an embodiment of the present invention. As shown in Fig. 1, the user gender prediction method includes the following steps:
S101: Crawl the tags of different apps as samples for user gender prediction, select the samples whose user registration information includes a gender label, and label them accordingly, to obtain labeled samples. The ratio of training-set samples to test-set samples is 7:3.
Note that the tags of different apps can be obtained by using a crawler to capture the app tags in the 360 Mobile Assistant app.
S102: Divide the labeled samples into training-set samples and test-set samples according to a predefined rule.
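The 7:3 division can be sketched with scikit-learn's `train_test_split`. This is a minimal sketch under assumptions not stated in the patent: the labeled samples are held in memory as a feature list and a label list (the names `features`, `labels`, and `split_samples` are hypothetical), the split is random with a fixed seed, and it is stratified by gender so both sets keep the same gender proportions. The patent itself specifies only the 7:3 ratio.

```python
from sklearn.model_selection import train_test_split


def split_samples(features, labels, seed=0):
    """Split labeled samples into training and test sets at a 7:3 ratio.

    Stratifying on the gender label is an assumption; the patent only
    specifies the 7:3 proportion.
    """
    return train_test_split(
        features, labels, test_size=0.3, stratify=labels, random_state=seed
    )


# Hypothetical toy data: 10 users with 2 features each and "F"/"M" labels.
X = [[i, i % 3] for i in range(10)]
y = ["F", "M"] * 5
X_train, X_test, y_train, y_test = split_samples(X, y)
print(len(X_train), len(X_test))  # 7 3
```

Fixing `random_state` makes the split reproducible, which matters when comparing accuracy figures across runs.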
S103: Extract each user's app list from the acquired user logs and extract features of the apps in the list, to obtain feature attributes for user gender prediction.
Specifically, extracting each user's app list from the acquired user logs and extracting features of the apps in the list to obtain feature attributes for user gender prediction includes the following steps:
Analyze the acquired user logs with the Hadoop MapReduce software framework to extract each user's app list. Hadoop MapReduce is a software framework for easily writing applications that process terabyte-scale data sets in parallel on large clusters of thousands of commodity machines, in a reliable and fault-tolerant manner.
Based on a natural language processing method, count the document frequency of each app in the users' app lists. Each app is treated as a word and each user's app list as a document; using document frequency from natural language processing, the document probability of each app, that is, the fraction of users' app lists that contain it, is computed.
Filter the apps in each user's app list according to a preset rule, to obtain the set of apps that can distinguish gender. The preset rule is: when an app's document probability is greater than 0.1 and less than 0.8, and the app is installed by at least 10 users, the app is selected and added to the gender-discriminative app set.
For each user, count the number of launches within a predetermined period of each app in the gender-discriminative app set, and record these launch counts.
Record the launch counts under the tags of the gender-discriminative apps as feature attributes for user gender prediction.
Accumulate the launch counts recorded under the tags of the obtained gender-discriminative apps, to obtain the corresponding feature attributes for user gender prediction.
In practice, the feature extraction process is as follows:
First, analyze the logs with Hadoop MapReduce to extract each user's app list.
Second, treat each app as a word and each user's app list as a document, and compute the document probability of each app based on document frequency from natural language processing.
Then, screen the apps to be used as features by document probability. The preset screening rule is: document probability less than 0.8 and greater than 0.1, and the app installed by at least 10 users.
Next, count each user's daily launches of each app in the selected set and use them as the feature values.
Finally, based on each app's tags and launch counts, accumulate the values of the tag features.
Furthermore, in practice, the apps a user uses allow a preliminary inference of the user's gender. For example, users who play the tile-matching game Lianliankan are often women, while users who play Need for Speed are often men. Likewise, users who buy books through the Amazon app are often men, while users who buy dish detergent through the Mobile Taobao app are often women. Keywords that reflect such purchasing habits often allow a preliminary judgment of the user's gender.
S104: Train a classification model on the feature attributes in the training set, to obtain a classification model for user gender prediction.
Specifically, the GBDT classification model of the machine learning library scikit-learn is trained on the feature attributes in the training set, to obtain the classification model for user gender prediction.
scikit-learn is an open-source machine learning module for Python. Its main feature is that it provides interfaces to a wide range of machine learning algorithms, allowing users to perform data mining and data analysis simply and efficiently.
scikit-learn can load data sets and ships with common ones, such as the iris and digits data sets for classification and the classic Boston house-prices data set for regression (the latter was removed in scikit-learn 1.2).
scikit-learn supports both learning and prediction. It exposes many machine learning algorithms through a common, easy-to-use interface: each algorithm is called like a black box, and the user only needs to set the parameters appropriate to their needs.
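GBDT (Gradient Boosting Decision Tree) is an iterative decision tree algorithm that builds an ensemble of trees, each new tree being fitted to the negative gradient of the loss of the current ensemble; it is not described further here.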
S105: Predict the gender of the users in the test-set samples with the classification model for user gender prediction.
In addition, after the classification model for user gender prediction is obtained, its accuracy is evaluated with the feature attributes of the test-set samples.
With existing techniques, the gender of 90% of users cannot be predicted. The technical solution of the present invention instead extracts, from the user's mobile app list, the apps that can distinguish gender and counts the launch frequency of each, in order to predict the user's gender. According to the reported statistics, predicting user gender with this technical solution reaches an accuracy of 89%.
In conclusion a kind of user gender prediction method provided in an embodiment of the present invention, by the label for capturing different APP
As the sample for carrying out user gender prediction, and the sample that therefrom selection user's registration information has gender to identify is marked accordingly
Note, to get the sample after mark;The sample after mark is divided into training set sample and test set sample according to pre-defined rule;
Corresponding APP lists are extracted from the daily record of the user got, and the feature of the APP in user's APP lists is extracted,
To obtain carrying out the characteristic attribute of user gender prediction accordingly;By train classification models to the characteristic attribute in training set into
Row training, to obtain carrying out the train classification models of user gender prediction;By the training classification mould for carrying out user gender prediction
Type predicts user's gender in test set sample.The training classification of progress user gender prediction in the embodiment of the present invention
Model is that the characteristic attribute of the progress user gender prediction in training set is trained by train classification models.
Therefore, technical solution provided by the present invention can accurately predict the gender attribute of user, greatly increase to potential
The accuracy rate of customer analysis, improves user experience.
As shown in Fig. 2, the user gender prediction system provided by an embodiment of the present invention includes: a sample acquisition unit 201, a sample classification unit 202, a feature attribute acquisition unit 203, and a model acquisition and prediction unit 204.
Specifically, the sample acquisition unit crawls the tags of different apps as samples for user gender prediction, selects the samples whose user registration information includes a gender label, and labels them accordingly, to obtain labeled samples.
The sample classification unit divides the labeled samples into training-set samples and test-set samples according to a predefined rule. Note that the ratio of training-set samples to test-set samples selected by the sample classification unit is 7:3.
The feature attribute acquisition unit extracts each user's app list from the acquired user logs and extracts features of the apps in the list, to obtain feature attributes for user gender prediction.
The model acquisition and prediction unit trains a classification model on the feature attributes in the training set to obtain a classification model for user gender prediction, and uses that model to predict the gender of the users in the test-set samples.
Further, the feature attribute acquisition unit is specifically configured to:
analyze the acquired user logs with the Hadoop MapReduce software framework to extract each user's app list;
count, based on a natural language processing method, the document frequency of each app in the users' app lists;
filter the apps in each user's app list according to a preset rule, to obtain a set of apps that can distinguish gender;
count, for each user, the number of launches within a predetermined period of each app in the gender-discriminative app set, and record these launch counts;
record the launch counts under the tags of the gender-discriminative apps as feature attributes for user gender prediction; and
accumulate the launch counts recorded under the tags of the obtained gender-discriminative apps, to obtain the corresponding feature attributes for user gender prediction.
Further, the preset rule of the feature attribute acquisition unit is specifically: when the document probability of an app in the users' app lists is greater than 0.1 and less than 0.8, and the app is installed by at least 10 users, the app is selected and added to the gender-discriminative app set.
Further, the model acquisition and prediction unit is specifically configured to train the GBDT classification model of the machine learning library scikit-learn on the feature attributes in the training set, to obtain the classification model for user gender prediction.
In addition, to better evaluate the accuracy of the classification model for user gender prediction, the user gender prediction system provided by this specific embodiment further includes an assessment unit (not shown in Fig. 2). After the model acquisition and prediction unit obtains the classification model for user gender prediction, the assessment unit evaluates its accuracy with the feature attributes of the test-set samples.
In the technical solution of the present invention, the labels of different APPs are captured as samples for user gender prediction, and the samples whose user registration information carries a gender identifier are selected and labeled accordingly to obtain labeled samples. The labeled samples are divided into training-set samples and test-set samples according to a predetermined rule. The corresponding APP lists are extracted from the acquired user logs, and features of the APPs in the users' APP lists are extracted to obtain the corresponding feature attributes for user gender prediction. A classification model is trained on the feature attributes in the training set to obtain a trained classification model for user gender prediction, and this model is then used to predict the gender of users in the test-set samples. In the embodiments of the present invention, the trained classification model for user gender prediction is thus obtained by training a classification model on the gender-prediction feature attributes of the training set. The technical solution provided by the present invention can therefore accurately predict users' gender attributes, greatly increasing the accuracy with which potential users are analyzed and improving the user experience.
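The labeling and splitting steps can be sketched as below. Assumptions not stated in this section: each sample is a `(user_id, registered_gender)` pair, the gender identifiers are the hypothetical strings `"M"` and `"F"`, and the "predetermined rule" is taken to be a seeded random shuffle followed by the 7:3 training/test ratio given in claim 6. The name `label_and_split` is hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical sample: (user_id, gender from registration info, or None if absent).
Sample = Tuple[str, Optional[str]]
GENDER_LABELS = {"M", "F"}  # hypothetical gender identifiers


def label_and_split(
    samples: Sequence[Sample], train_ratio: float = 0.7, seed: int = 42
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Keep samples with a usable gender identifier, then split them into
    training and test sets (7:3 by default)."""
    # Labeling: only users whose registration info carries a gender are usable.
    labeled = [(uid, g) for uid, g in samples if g in GENDER_LABELS]
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(labeled)
    cut = round(len(labeled) * train_ratio)
    return labeled[:cut], labeled[cut:]
```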
The specific embodiments described above further explain the objectives, technical solutions and beneficial effects of the present invention in detail. It should be understood that the foregoing are merely specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (10)
- 1. A user gender prediction method, characterized by comprising: capturing the labels of different APPs as samples for user gender prediction, selecting from them the samples whose user registration information carries a gender identifier, and labeling those samples accordingly to obtain labeled samples; dividing the labeled samples into training-set samples and test-set samples according to a predetermined rule; extracting the corresponding APP lists from the acquired user logs, and extracting features of the APPs in the users' APP lists, so as to obtain the corresponding feature attributes for user gender prediction; training a classification model on the feature attributes in the training set to obtain a trained classification model for user gender prediction; and predicting the gender of users in the test-set samples with the trained classification model for user gender prediction.
- 2. The method according to claim 1, characterized in that extracting the corresponding APP lists from the acquired user logs and extracting features of the APPs in the users' APP lists, so as to obtain the corresponding feature attributes for user gender prediction, specifically comprises: analyzing the acquired user logs with the Hadoop MapReduce software framework to extract the corresponding APP lists; counting, based on a natural language processing method, the document frequency of the APPs in the users' APP lists; filtering the APPs in the corresponding users' APP lists according to a preset rule, so as to select the set of APPs capable of distinguishing gender characteristics; counting, for each user, the number of launches within a predetermined time period of each APP in the set of gender-distinguishing APPs, and recording these APP launch counts; recording the labels of the gender-distinguishing APPs, together with their corresponding APP launch counts, as the feature attributes for user gender prediction; and, for the labels of the acquired gender-distinguishing APPs, recording the corresponding APP launch counts, so as to obtain the corresponding feature attributes for user gender prediction.
- 3. The method according to claim 2, characterized in that the preset rule is specifically: when the document frequency value of an APP in the users' APP lists is greater than 0.1 and less than 0.8, and the APP has been installed by at least ten users, the APP is selected and added to the set of gender-distinguishing APPs.
- 4. The method according to claim 1, characterized in that a GBDT classification model from the scikit-learn machine learning library is trained on the feature attributes in the training set, so as to obtain the trained classification model for user gender prediction.
- 5. The method according to claim 1, characterized in that, after the trained classification model for user gender prediction is obtained, the method further comprises: assessing the accuracy of the trained classification model for user gender prediction using the feature attributes in the test-set samples.
- 6. The method according to claim 1, characterized in that the ratio of the selected training-set samples to test-set samples is specifically 7:3.
- 7. A user gender prediction system, characterized by comprising: a sample acquisition unit, which captures the labels of different APPs as samples for user gender prediction, selects from them the samples whose user registration information carries a gender identifier, and labels those samples accordingly to obtain labeled samples; a sample classification unit, which divides the labeled samples into training-set samples and test-set samples according to a predetermined rule; a feature attribute acquiring unit, which extracts the corresponding APP lists from the acquired user logs and extracts features of the APPs in the users' APP lists, so as to obtain the corresponding feature attributes for user gender prediction; and a model obtaining and prediction unit, which trains a classification model on the feature attributes in the training set to obtain a trained classification model for user gender prediction, and predicts the gender of users in the test-set samples with the trained classification model for user gender prediction.
- 8. The system according to claim 7, characterized in that the feature attribute acquiring unit is specifically configured to: analyze the acquired user logs with the Hadoop MapReduce software framework to extract the corresponding APP lists; count, based on a natural language processing method, the document frequency of the APPs in the users' APP lists; filter the APPs in the corresponding users' APP lists according to a preset rule, so as to select the set of APPs capable of distinguishing gender characteristics; count, for each user, the number of launches within a predetermined time period of each APP in the set of gender-distinguishing APPs, and record these APP launch counts; record the labels of the gender-distinguishing APPs, together with their corresponding APP launch counts, as the feature attributes for user gender prediction; and, for the labels of the acquired gender-distinguishing APPs, record the corresponding APP launch counts, so as to obtain the corresponding feature attributes for user gender prediction.
- 9. The system according to claim 8, characterized in that the preset rule of the feature attribute acquiring unit is specifically: when the document frequency value of an APP in the users' APP lists is greater than 0.1 and less than 0.8, and the APP has been installed by at least ten users, the APP is selected and added to the set of gender-distinguishing APPs.
- 10. The system according to claim 7, characterized in that the model obtaining and prediction unit is specifically configured to: train a GBDT classification model from the scikit-learn machine learning library on the feature attributes in the training set, so as to obtain the trained classification model for user gender prediction.
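The log-analysis step in claims 2 and 8 uses Hadoop MapReduce. The following pure-Python sketch only simulates the map, shuffle (sort by key), and reduce phases in one process to show the data flow; the tab-separated log format `user_id<TAB>app_label<TAB>event` and all function names are hypothetical assumptions, and a real job would run the mapper and reducer on a Hadoop cluster (for example via Hadoop Streaming) rather than locally.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Map phase: parse each raw log line and emit (user_id, app_label).
    Assumes the hypothetical format user_id<TAB>app_label<TAB>event."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # skip malformed lines instead of failing the job
        yield parts[0], parts[1]


def reducer(pairs: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, List[str]]]:
    """Reduce phase: pairs arrive sorted by key, as after a Hadoop shuffle;
    emit each user's de-duplicated, sorted APP list."""
    for user_id, group in groupby(pairs, key=itemgetter(0)):
        yield user_id, sorted({app for _, app in group})


def extract_app_lists(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Local stand-in for the MapReduce job: map, shuffle (sort by key), reduce."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))
```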
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611236167.3A CN108256537A (en) | 2016-12-28 | 2016-12-28 | A kind of user gender prediction method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611236167.3A CN108256537A (en) | 2016-12-28 | 2016-12-28 | A kind of user gender prediction method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108256537A (en) | 2018-07-06 |
Family ID: 62720215
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611236167.3A Pending CN108256537A (en) | 2016-12-28 | 2016-12-28 | A kind of user gender prediction method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108256537A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109191185A (en) * | 2018-08-15 | 2019-01-11 | 深圳市和讯华谷信息技术有限公司 | A kind of visitor's heap sort method and system |
CN109885834A (en) * | 2019-02-18 | 2019-06-14 | 中国联合网络通信集团有限公司 | A kind of prediction technique and device of age of user gender |
CN109933698A (en) * | 2019-02-27 | 2019-06-25 | 腾讯科技(深圳)有限公司 | A kind of the source method of calibration and device of user's portrait |
CN111143441A (en) * | 2019-12-30 | 2020-05-12 | 北京每日优鲜电子商务有限公司 | Gender determination method, device, equipment and storage medium |
CN111222026A (en) * | 2020-01-09 | 2020-06-02 | 支付宝(杭州)信息技术有限公司 | Training method of user category identification model and user category identification method |
CN111291798A (en) * | 2020-01-21 | 2020-06-16 | 北京工商大学 | User basic attribute prediction method based on ensemble learning |
CN111639714A (en) * | 2020-06-01 | 2020-09-08 | 贝壳技术有限公司 | Method, device and equipment for determining attributes of users |
CN112132209A (en) * | 2020-09-19 | 2020-12-25 | 北京智能工场科技有限公司 | Attribute prediction method based on bias characteristics |
CN112434136A (en) * | 2020-12-08 | 2021-03-02 | 深圳市欢太科技有限公司 | Gender classification method, gender classification device, electronic equipment and computer storage medium |
CN113268654A (en) * | 2020-02-17 | 2021-08-17 | 北京搜狗科技发展有限公司 | User gender identification method and device and electronic equipment |
CN113657917A (en) * | 2020-05-12 | 2021-11-16 | 上海佳投互联网技术集团有限公司 | Visitor gender analysis method and system based on USER-AGENT |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103729785A (en) * | 2014-01-26 | 2014-04-16 | 合一信息技术(北京)有限公司 | Video user gender classification method and device for method |
US9471851B1 (en) * | 2015-06-29 | 2016-10-18 | International Business Machines Corporation | Systems and methods for inferring gender by fusion of multimodal content |
CN106203473A (en) * | 2016-06-24 | 2016-12-07 | 有米科技股份有限公司 | A kind of mobile subscriber's gender prediction's method based on installation kit list |
CN106204127A (en) * | 2016-07-06 | 2016-12-07 | 乐视控股(北京)有限公司 | User's evaluation methodology and device for application |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109191185A (en) * | 2018-08-15 | 2019-01-11 | 深圳市和讯华谷信息技术有限公司 | A kind of visitor's heap sort method and system |
CN109885834A (en) * | 2019-02-18 | 2019-06-14 | 中国联合网络通信集团有限公司 | A kind of prediction technique and device of age of user gender |
CN109933698B (en) * | 2019-02-27 | 2021-06-08 | 腾讯科技(深圳)有限公司 | User portrait source verification method and device |
CN109933698A (en) * | 2019-02-27 | 2019-06-25 | 腾讯科技(深圳)有限公司 | A kind of the source method of calibration and device of user's portrait |
CN111143441A (en) * | 2019-12-30 | 2020-05-12 | 北京每日优鲜电子商务有限公司 | Gender determination method, device, equipment and storage medium |
CN111222026A (en) * | 2020-01-09 | 2020-06-02 | 支付宝(杭州)信息技术有限公司 | Training method of user category identification model and user category identification method |
CN111222026B (en) * | 2020-01-09 | 2023-07-14 | 支付宝(杭州)信息技术有限公司 | Training method of user category recognition model and user category recognition method |
CN111291798A (en) * | 2020-01-21 | 2020-06-16 | 北京工商大学 | User basic attribute prediction method based on ensemble learning |
CN111291798B (en) * | 2020-01-21 | 2021-04-20 | 北京工商大学 | User basic attribute prediction method based on ensemble learning |
CN113268654A (en) * | 2020-02-17 | 2021-08-17 | 北京搜狗科技发展有限公司 | User gender identification method and device and electronic equipment |
CN113657917A (en) * | 2020-05-12 | 2021-11-16 | 上海佳投互联网技术集团有限公司 | Visitor gender analysis method and system based on USER-AGENT |
CN111639714A (en) * | 2020-06-01 | 2020-09-08 | 贝壳技术有限公司 | Method, device and equipment for determining attributes of users |
CN112132209A (en) * | 2020-09-19 | 2020-12-25 | 北京智能工场科技有限公司 | Attribute prediction method based on bias characteristics |
CN112132209B (en) * | 2020-09-19 | 2024-05-31 | 北京智能工场科技有限公司 | Attribute prediction method based on biasing characteristics |
CN112434136A (en) * | 2020-12-08 | 2021-03-02 | 深圳市欢太科技有限公司 | Gender classification method, gender classification device, electronic equipment and computer storage medium |
CN112434136B (en) * | 2020-12-08 | 2024-04-23 | 深圳市欢太科技有限公司 | Sex classification method, apparatus, electronic device and computer storage medium |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108256537A (en) | A kind of user gender prediction method and system | |
CN109697629B (en) | Product data pushing method and device, storage medium and computer equipment | |
CN109325179B (en) | Content promotion method and device | |
JP7356206B2 (en) | Content recommendation and display | |
CN107833082B (en) | Commodity picture recommendation method and device | |
CN109492180A (en) | Resource recommendation method, device, computer equipment and computer readable storage medium | |
CN103927309B (en) | A kind of method and device to business object markup information label | |
US20180053234A1 (en) | Description information generation and presentation systems, methods, and devices | |
CN109471657A (en) | Gray scale dissemination method, device, computer equipment and computer storage medium | |
CN106164959A (en) | Behavior affair system and correlation technique | |
CN109711931A (en) | Method of Commodity Recommendation, device, equipment and storage medium based on user's portrait | |
CN109658188A (en) | Source of houses recommended method, device, equipment and storage medium based on big data analysis | |
CN108399565A (en) | Financial product recommendation apparatus, method and computer readable storage medium | |
CN108416627A (en) | A kind of brand influence force monitoring method and system based on internet data | |
CN110392155A (en) | It has been shown that, processing method, device and the equipment of notification message | |
CN107977678A (en) | Method and apparatus for output information | |
CN110490237A (en) | Data processing method, device, storage medium and electronic equipment | |
CN110688455A (en) | Method, medium and computer equipment for filtering invalid comments based on artificial intelligence | |
CN110807691B (en) | Cross-commodity-class commodity recommendation method and device | |
CN115147130A (en) | Problem prediction method, apparatus, storage medium, and program product | |
CN109146606A (en) | A kind of brand recommended method, electronic equipment, storage medium and system | |
CN111383072A (en) | User credit scoring method, storage medium and server | |
CN111429200B (en) | Content association method and device, storage medium and computer equipment | |
CN110647504A (en) | Method and device for searching judicial documents | |
CN108596646A (en) | A kind of garment coordination recommendation method of fusion face character analysis |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180706 |