CN109145932A - User gender prediction method, device and equipment - Google Patents
User gender prediction method, device and equipment
- Publication number
- CN109145932A CN201710507593.4A
- Authority
- CN
- China
- Prior art keywords
- user
- gender
- predicted
- characteristic
- prediction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention provides a user gender prediction method, device and equipment. When predicting user gender, the feature data used for gender prediction is extracted from each preset type of data of each user to be predicted. The extracted feature data is fed into a preset full user prediction model to obtain a full gender prediction result for each user to be predicted, and each item of feature data of at least one preset type of data is fed into the preset component user prediction model corresponding to that item of feature data to obtain a component gender prediction result for at least one user to be predicted. The full prediction result and the component prediction results are then fused to obtain the final gender prediction result of each user to be predicted. Because the final predicted gender combines the outputs of the full user prediction model and the component user prediction models, the accuracy of gender prediction can be improved considerably.
Description
Technical field
The present invention relates to the field of communications, and in particular to a user gender prediction method, device and equipment.
Background art
A user portrait, also called a user persona, is an effective tool for describing target users and connecting user needs with design direction, and it is widely used in many fields. In practice, a user's attributes, behaviors and expectations are usually tied together in the plainest, most everyday language. As a virtual representation of real users, the user persona formed by a user portrait is not constructed apart from the product and the market; it needs to be representative of the product's main audience and target group.
As the name suggests, user gender prediction means predicting the network gender of a user of a carrier network (for example a telecom network) from the user's daily internet content and voice habits. The network gender is assumed to be strongly correlated with the user's true gender, and the predicted network gender is taken as the true gender. There are of course cases where the network gender and the true gender do not match, but the operator cares more about the virtual network gender that the user exhibits through internet and voice behavior.
Existing research on gender prediction for users in the mobile communication field is limited to predicting gender with a single trained model based on data such as daily internet content and voice habits. No other correction mechanism is applied to the gender prediction result obtained from that model, so the accuracy of the prediction result is low.
Summary of the invention
The user gender prediction method, device and equipment provided by the embodiments of the present invention mainly solve the following technical problem: existing user gender prediction relies on a single model only, which results in low prediction accuracy.
To solve the above technical problem, an embodiment of the present invention provides a user gender prediction method, the method comprising:
Extracting, from each preset type of data of each user to be predicted, the feature data used for gender prediction;
Feeding the extracted feature data into a preset full user prediction model to obtain a full gender prediction result for each user to be predicted, and feeding each item of feature data of at least one preset type of data into the preset component user prediction model corresponding to that item of feature data to obtain a component gender prediction result for at least one user to be predicted; the full user prediction model is obtained in the training process by training on the feature data of each preset type of data of the training users, and each component user prediction model is obtained in the training process by training on one item of feature data of at least one preset type of data of the training users;
Fusing the full prediction result and the component prediction results to obtain the final gender prediction result of each user to be predicted.
To solve the above technical problem, an embodiment of the present invention also provides a user gender prediction device, comprising:
A data extraction module, configured to extract, from each preset type of data of each user to be predicted, the feature data used for gender prediction;
A model processing module, configured to feed the feature data extracted by the data extraction module into a preset full user prediction model to obtain a full gender prediction result for each user to be predicted, and to feed each item of feature data of at least one preset type of data into the preset component user prediction model corresponding to that item of feature data to obtain a component gender prediction result for at least one user to be predicted; the full user prediction model is obtained in the training process by training on the feature data of each preset type of data of the training users, and each component user prediction model is obtained in the training process by training on one item of feature data of at least one preset type of data of the training users;
A prediction processing module, configured to fuse the full prediction result and the component prediction results to obtain the final gender prediction result of each user to be predicted.
To solve the above technical problem, an embodiment of the present invention also provides user gender prediction equipment, comprising: a processor, a memory and a communication bus;
The communication bus is used to implement connection and communication between the processor and the memory;
The processor is used to execute the user gender prediction program stored in the memory, so as to implement the steps of the user gender prediction method described above.
An embodiment of the present invention also provides a computer storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the aforementioned user gender prediction method.
The beneficial effects of the present invention are:
According to the user gender prediction method, device and equipment provided by the embodiments of the present invention, when predicting user gender, the feature data used for gender prediction is extracted from each preset type of data of each user to be predicted; the extracted feature data is fed into a preset full user prediction model to obtain a full gender prediction result for each user to be predicted, and each item of feature data of at least one preset type of data is fed into the preset component user prediction model corresponding to that item of feature data to obtain a component gender prediction result for at least one user to be predicted; the full prediction result and the component prediction results are then fused to obtain the final gender prediction result of each user to be predicted. When predicting user gender, the present invention does not directly take the prediction result of a single model as the final result; instead it combines the outputs of the full user prediction model and the component user prediction models as the final predicted gender, and can therefore improve the accuracy of gender prediction considerably.
Brief description of the drawings
Fig. 1 is a schematic diagram of the first way of fusing the full prediction result and the component prediction results, provided by Embodiment One of the present invention;
Fig. 2 is a schematic diagram of the second way of fusing the full prediction result and the component prediction results, provided by Embodiment One of the present invention;
Fig. 3 is a schematic flowchart of the user gender prediction method provided by Embodiment Two of the present invention;
Fig. 4 is a schematic diagram of the first way of fusing the full prediction result and the component prediction results, provided by Embodiment Two of the present invention;
Fig. 5 is a schematic diagram of the second way of fusing the full prediction result and the component prediction results, provided by Embodiment Two of the present invention;
Fig. 6 is a schematic structural diagram of the user gender prediction device provided by Embodiment Three of the present invention;
Fig. 7 is a schematic structural diagram of the user gender prediction equipment provided by Embodiment Four of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention, without creative effort, fall within the protection scope of the present invention.
Embodiment one:
The user gender prediction scheme provided by this embodiment includes at least two processes: model training, and prediction of the gender of the users to be predicted.
The model training in this embodiment covers the training of the full user prediction model and the component user prediction models, which includes extracting features, building the models, and making predictions with the built models.
Feature extraction in this embodiment can be performed on training users (whose gender is known, so the male-to-female ratio is also known): the feature data used for gender prediction is extracted from the respective types of data, which are the preset types of data of the training stage. Which types of data are chosen, and which feature data is extracted from them, can be set flexibly, as long as gender can be distinguished and predicted effectively.
In this embodiment, a corresponding modeling method can be applied to the extracted feature data to build a full user prediction model, whose output can predict the gender of all training users. To improve the accuracy of user gender prediction, a corresponding component user prediction model can also be built for each item of feature data of at least one preset type of data. The component user prediction models are preferably built with the same modeling method as the full user prediction model. Modeling methods in this embodiment include but are not limited to logistic regression, support vector machine, random forest, GBDT (Gradient Boosting Decision Tree) and XGBoost (Scalable and Flexible Gradient Boosting). The specific modeling method can be chosen flexibly; in one example, XGBoost is used to obtain the full user prediction model and at least one component user prediction model. Which types of data have component user prediction models built on their feature data can also be chosen flexibly. For example, in one embodiment, the preset types of data in the training process may include at least one of internet access record data of preset websites, application usage record data, call data and internet usage habit data. When internet access record data or application usage record data is included, a corresponding component user prediction model can be built for each item of feature data of the internet access record data or the application usage record data.
In this embodiment, during feature extraction, at least one type of data can be divided into multiple categories by gender share in order to reduce dimensionality. When website data and app (application) data are used in existing approaches, the number of websites is huge and the number of users per website varies greatly, so when all of them are used together as features the feature matrix becomes an ultra-high-dimensional, ultra-sparse matrix, which is unfavorable for modeling. The existing dimensionality reduction approach is to merge websites and apps by their natural attributes, or to select only mainstream websites and discard minority websites, even though the discarded minority websites include ones that can distinguish gender. For app data, the top N apps (N = 15 or another value) may be selected as features and the remaining data discarded, but the discarded data may also contain app and website information that distinguishes gender. Existing dimensionality reduction, whether by merging according to natural attributes or by discarding minority website or app data, can therefore mask or discard part of the data that could distinguish gender. In contrast, this embodiment can classify by gender ratio (for example male ratio or female ratio) to reduce dimensionality, which avoids discarding gender-discriminative data and improves the accuracy of gender prediction while reliably reducing dimensionality.
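The gender-ratio bucketing described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the input layout (`visits`, `gender`) and the bucket edges are assumptions introduced here, and only the male-side features are shown.

```python
from collections import defaultdict

def site_male_share(visits, gender):
    """visits: {user: {site: visit_count}}, gender: {user: "M" or "F"}.
    Returns {site: share of that site's visitors who are male}."""
    male = defaultdict(int)
    total = defaultdict(int)
    for user, sites in visits.items():
        for site in sites:
            total[site] += 1
            male[site] += gender[user] == "M"
    return {site: male[site] / total[site] for site in total}

def male_buckets(shares, baseline, edges):
    """Keep sites whose male share is at least the training-set baseline and
    assign each to a bucket index (edges are hypothetical range boundaries)."""
    buckets = {}
    for site, share in shares.items():
        if share >= baseline:
            buckets[site] = sum(share >= edge for edge in edges)
    return buckets

def user_features(user_visits, buckets, n_buckets):
    """One feature per bucket: total visits to the sites in that bucket."""
    feats = [0] * n_buckets
    for site, count in user_visits.items():
        if site in buckets:
            feats[buckets[site]] += count
    return feats

# Toy training data (hypothetical): two male and two female users.
visits = {"u1": {"a": 3, "b": 1}, "u2": {"a": 2},
          "u3": {"b": 4, "c": 5}, "u4": {"c": 1}}
gender = {"u1": "M", "u2": "M", "u3": "F", "u4": "F"}
shares = site_male_share(visits, gender)
buckets = male_buckets(shares, baseline=0.5, edges=[0.75])
print(user_features(visits["u1"], buckets, n_buckets=2))
```

However many websites appear in the logs, each user ends up with only as many male-side features as there are ratio ranges, and no website at or above the baseline share is discarded outright.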
For example, when the preset types of data in the training process include internet access record data of preset websites, the feature data of the internet access record data includes at least one of the following: at least one item of feature data for male gender prediction, and at least one item of feature data for female gender prediction; that is, dimensionality is reduced by gender ratio;
Each preset item of feature data for the male gender comprises preset target websites and the visit count of each preset target website; each preset target website is a website for which, in the training process, the share of male users among its visitors is greater than or equal to the share of male users among the training users, and the target websites in one item of feature data are the websites whose male visitor share falls within one preset male ratio range;
Each preset item of feature data for the female gender comprises preset target websites and the visit count of each preset target website; each preset target website is a website for which, in the training process, the share of female users among its visitors is greater than or equal to the share of female users among the training users;
Then, for each item of feature data of the internet access record data, a corresponding component user prediction model is obtained by modeling with the preset modeling method.
Feeding each item of feature data of at least one preset type of data into the preset component user prediction model corresponding to that item of feature data includes:
As another example, when the preset types of data in the training process include application usage record data of preset applications, the feature data of the application usage record data likewise includes at least one item of feature data for male gender prediction and at least one item of feature data for female gender prediction;
Each preset item of feature data for the male gender comprises preset target applications and the access count of each preset target application; each preset target application is an application for which, in the training process, the share of male users among its users is greater than or equal to the share of male users among the training users, and the target applications in one item of feature data are the applications whose male user share falls within one preset male ratio range;
Each preset item of feature data for the female gender comprises preset target applications and the access count of each preset target application; each preset target application is an application for which, in the training process, the share of female users among its users is greater than or equal to the share of female users among the training users;
Then, for each item of feature data of the application usage record data, a corresponding component user prediction model is obtained by modeling with the preset modeling method.
In this embodiment, the number of items of feature data extracted for a given type of data, and the number of corresponding component user prediction models, can be set flexibly. For example, the feature data may include only feature data for male gender prediction, in any number; it may include only feature data for female gender prediction, again in any number; or it may include both feature data for male gender prediction and feature data for female gender prediction.
Then the outputs of the full user prediction model and the component user prediction models are fused and, together with a preset male probability threshold and a preset female probability threshold, used to predict the gender of the training users. The prediction result of each training user is compared with that user's actual gender, and the male probability threshold and female probability threshold are adjusted according to the comparison until the male-to-female ratio obtained from the prediction results is equal or close to the actual male-to-female ratio among the training users.
In this embodiment, when the outputs of the full user prediction model and the component user prediction models are fused for gender prediction, the resulting gender probability value is compared with the above thresholds: a value above the male probability threshold is judged male, a value below the female probability threshold is judged female, and a value greater than or equal to the female probability threshold and less than or equal to the male probability threshold is judged neutral. In other words, users whose gender is ambiguous are set aside as neutral users.
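The threshold rule and the threshold adjustment can be sketched as below. This is only an illustration under stated assumptions: the fused value is taken to be a probability of being male, and the grid search in `calibrate` is one simple, hypothetical way to bring the predicted male share close to the actual one; the text does not specify how the thresholds are adjusted.

```python
def classify(p, male_thr, female_thr):
    """Three-way decision on a fused gender probability value."""
    if p > male_thr:
        return "male"
    if p < female_thr:
        return "female"
    return "neutral"  # female_thr <= p <= male_thr

def male_share(labels):
    """Share of males among users that received a non-neutral label."""
    decided = [label for label in labels if label != "neutral"]
    if not decided:
        return None
    return sum(label == "male" for label in decided) / len(decided)

def calibrate(probs, actual_male_share, candidates):
    """Hypothetical grid search: choose the threshold pair whose predicted
    male share is closest to the known share among the training users."""
    best = None
    for female_thr in candidates:
        for male_thr in candidates:
            if male_thr < female_thr:
                continue
            labels = [classify(p, male_thr, female_thr) for p in probs]
            share = male_share(labels)
            if share is None:
                continue
            gap = abs(share - actual_male_share)
            if best is None or gap < best[0]:
                best = (gap, male_thr, female_thr)
    return best[1], best[2]

probs = [0.9, 0.8, 0.55, 0.45, 0.2, 0.1]  # toy fused values for six users
male_thr, female_thr = calibrate(probs, 0.5, [0.3, 0.5, 0.7])
print(male_thr, female_thr, [classify(p, male_thr, female_thr) for p in probs])
```

A search that matches only the gender ratio can also settle on degenerate thresholds, so a practical version would likely also check per-user agreement with the known genders, as the comparison step in the text suggests.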
For example, in one example, the full prediction result and the component prediction results are gender probability values of each user to be predicted; fusing the full prediction result and the component prediction results to obtain the final gender prediction result of each user to be predicted, as shown in Fig. 1, includes:
S101: for each training user, obtaining the gender probability value output by the full user prediction model for that training user and the gender probability value output by each component user prediction model;
S102: calculating the mean of the obtained gender probability values to obtain a gender prediction probability value;
S103: comparing the obtained gender prediction probability value with the male probability threshold and the female probability threshold set in the training process to obtain the final gender prediction result of the training user.
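Steps S101 to S103 can be sketched in a few lines. The sketch assumes each model outputs a probability of being male; the threshold values 0.6 and 0.4 are hypothetical placeholders for the thresholds set during training.

```python
from statistics import mean

def fuse_mean(full_prob, component_probs):
    """S102: average the full model's output with all component outputs."""
    return mean([full_prob, *component_probs])

def decide(p, male_thr=0.6, female_thr=0.4):
    """S103: compare the fused value with the (hypothetical) thresholds."""
    if p > male_thr:
        return "male"
    if p < female_thr:
        return "female"
    return "neutral"

fused = fuse_mean(0.9, [0.7, 0.8])  # S101: one full and two component outputs
print(fused, decide(fused))
```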
As another example, the full prediction result and the component prediction results are label values 1, 0 and -1, indicating that a training user is male, neutral or female respectively;
Fusing the full prediction result and the component prediction results to obtain the final gender prediction result of each user to be predicted, as shown in Fig. 2, includes:
S201: for each training user, obtaining the product of the label value output by the full user prediction model for that training user and the prediction accuracy of the full user prediction model, and obtaining the product of the label value output by each component user prediction model for that training user and the prediction accuracy of that component user prediction model; the prediction accuracy of the full user prediction model and of each component user prediction model is obtained in the training process by comparing each model's output with the training data;
S202: calculating the sum of the obtained products to obtain a gender prediction probability value;
S203: comparing the gender prediction probability value with the male probability threshold and the female probability threshold set in the training process to obtain the final gender prediction result of the training user.
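The accuracy-weighted variant in S201 to S203 might look like the sketch below. The accuracies and the thresholds 0.5 and -0.5 are hypothetical; note that the fused score here is a signed sum rather than a probability in [0, 1], so the thresholds would need to be set on that scale.

```python
def fuse_weighted(labels, accuracies):
    """S201-S202: multiply each model's label (1 male, 0 neutral, -1 female)
    by that model's prediction accuracy and sum the products."""
    return sum(label * acc for label, acc in zip(labels, accuracies))

def decide(score, male_thr=0.5, female_thr=-0.5):
    """S203: compare the fused score with the (hypothetical) thresholds."""
    if score > male_thr:
        return "male"
    if score < female_thr:
        return "female"
    return "neutral"

# Full model votes male, one component votes female, another votes male.
labels = [1, -1, 1]
accuracies = [0.8, 0.6, 0.7]  # hypothetical accuracies measured in training
score = fuse_weighted(labels, accuracies)
print(score, decide(score))
```

Weighting by accuracy lets a more reliable model outvote a less reliable one when their labels disagree.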
Through the above multi-model fusion, this embodiment can improve the accuracy of gender prediction without increasing model dimensionality.
In this embodiment, to further improve the accuracy of gender prediction, a sliding time window can also be set so that earlier prediction results are corrected. Modeling is performed again in each time window to obtain the gender prediction result of the user in that window, and the gender prediction results of the user across the time windows are fused to obtain the user's final gender prediction result.
When a sliding time window is used in this embodiment, the feature data for gender prediction extracted in the training process from each preset type of data of each training user is the feature data extracted within the current gender prediction time window;
In this case, after the full prediction result and the component prediction results are fused to obtain the final gender prediction result of each training user, the method further includes: for each training user, matching the final gender prediction result of the current prediction time window against the final gender prediction result of at least one earlier prediction time window, and adjusting the training user's final gender prediction result according to the matching result.
Matching the final gender prediction result of the current prediction time window against the final gender prediction result of at least one earlier prediction time window includes but is not limited to the following two modes:
Mode one: the final gender prediction result of the current prediction time window is matched against the final gender prediction result of the previous prediction time window;
Adjusting the training user's final gender prediction result according to the matching result includes:
If the matching result is that the genders are the same, strengthening the training user's final gender prediction result;
If the matching result is that the genders are opposite, weakening the training user's final gender prediction result;
Otherwise, keeping the strength of the training user's final gender prediction result unchanged;
Mode two: the final gender prediction result of the current prediction time window is matched against the final gender prediction results of all earlier prediction time windows;
Adjusting the training user's final gender prediction result according to the matching result includes: taking the final gender prediction result that occurs most often for the training user across the prediction time windows as the training user's current, latest final gender prediction result.
It can be seen that this embodiment can also use the above sliding time window to repeatedly correct the prediction result obtained by fusion, further improving the accuracy of the user gender prediction result.
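The two window-matching modes can be sketched as follows. The text does not define how the "strength" of a result is represented, so this sketch assumes a confidence score in [0, 1] and a hypothetical step size of 0.1; the function names are also introduced here for illustration only.

```python
from collections import Counter

def adjust_strength(current, previous, strength, step=0.1):
    """Mode one: compare the current window's label with the previous one.
    Same gender strengthens, opposite gender weakens, anything involving
    a neutral label leaves the strength unchanged."""
    if "neutral" in (current, previous):
        return strength
    if current == previous:
        return min(1.0, strength + step)
    return max(0.0, strength - step)

def majority_label(window_labels):
    """Mode two: take the label that occurs most often across all windows.
    Ties are resolved in favour of the label seen first."""
    return Counter(window_labels).most_common(1)[0][0]

history = ["male", "female", "male", "neutral"]  # toy per-window results
print(adjust_strength("male", "male", 0.5))
print(adjust_strength("male", "female", 0.5))
print(majority_label(history))
```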
Embodiment two:
Based on the full user prediction model and the at least one component user prediction model obtained through the training process in Embodiment One, this embodiment illustrates the process of predicting the gender of users to be predicted. Referring to Fig. 3, the user gender prediction method provided by this embodiment includes:
S301: extracting, from each preset type of data of each user to be predicted, the feature data used for gender prediction.
The preset types of data in this embodiment can be the types of data set in the training process, including but not limited to at least one of internet access record data of preset websites in the training process, application usage record data, call data and internet usage habit data. When internet access record data or application usage record data is included, a corresponding component user prediction model can be built for each item of feature data of the internet access record data or the application usage record data.
S302: feeding the extracted feature data into a preset full user prediction model to obtain a full gender prediction result for each user to be predicted, and feeding each item of feature data of at least one preset type of data into the preset component user prediction model corresponding to that item of feature data to obtain a component gender prediction result for at least one user to be predicted.
The full user prediction model in this embodiment is obtained in the training process by training on the feature data of each preset type of data of the training users, and each component user prediction model is obtained in the training process by training on one item of feature data of at least one preset type of data of the training users. See Embodiment One for details, which are not repeated here.
S303: fusing the full prediction result and the component prediction results to obtain the final gender prediction result of each user to be predicted.
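The flow of S301 to S303 for a single user can be sketched as below. The models are stubbed with plain functions returning fixed values, and the feature names and the averaging fusion are assumptions made for illustration; a real system would plug in trained models and whichever fusion mode was chosen.

```python
def predict_gender_score(features, full_model, component_models, fuse):
    """features: {feature_name: value} extracted for one user (S301)."""
    # S302, first half: the full model sees every extracted feature.
    full_result = full_model(features)
    # S302, second half: each component model sees only its own feature.
    component_results = [
        model(features[name])
        for name, model in component_models.items()
        if name in features
    ]
    # S303: fuse the full result with the component results.
    return fuse(full_result, component_results)

def mean_fuse(full_result, component_results):
    """One possible fusion: plain average of all model outputs."""
    scores = [full_result, *component_results]
    return sum(scores) / len(scores)

# Stub models standing in for trained models (hypothetical outputs).
features = {"male_sites_bucket_0": 12, "called_ratio": 0.4}
full_model = lambda all_features: 0.6
component_models = {"male_sites_bucket_0": lambda visit_count: 0.8}

score = predict_gender_score(features, full_model, component_models, mean_fuse)
print(score)
```

Passing the fusion step as a parameter keeps the two fusion modes described in this document interchangeable without changing the rest of the flow.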
In one example, when the preset types of data include internet access record data of preset websites in the training process, the extracted feature data of the internet access record data includes at least one of the following: at least one item of feature data for male gender prediction, and at least one item of feature data for female gender prediction;
Each preset item of feature data for the male gender comprises preset target websites and the visit count of each preset target website; each preset target website is a website for which, in the training process, the share of male users among its visitors is greater than or equal to the share of male users among the training users, and the target websites in one item of feature data are the websites whose male visitor share falls within one preset male ratio range;
Each preset item of feature data for the female gender comprises preset target websites and the visit count of each preset target website; each preset target website is a website for which, in the training process, the share of female users among its visitors is greater than or equal to the share of female users among the training users;
In this case, feeding each item of feature data of at least one preset type of data into the preset component user prediction model corresponding to that item of feature data includes:
Feeding each item of feature data of the internet access record data into the preset component user prediction model corresponding to that item of feature data.
In another example, the preset types of data include application usage record data of preset applications in the training process; the feature data of the application usage record data includes at least one item of feature data for male gender prediction and at least one item of feature data for female gender prediction;
Each preset item of feature data for the male gender comprises preset target applications and the access count of each preset target application; each preset target application is an application for which, in the training process, the share of male users among its users is greater than or equal to the share of male users among the training users, and the target applications in one item of feature data are the applications whose male user share falls within one preset male ratio range;
Each preset item of feature data for the female gender comprises preset target applications and the access count of each preset target application; each preset target application is an application for which, in the training process, the share of female users among its users is greater than or equal to the share of female users among the training users;
Feeding each item of feature data of at least one preset type of data into the preset component user prediction model corresponding to that item of feature data includes:
Feeding each item of feature data of the application usage record data into the preset component user prediction model corresponding to that item of feature data.
In one example, the preset types of data may also include at least one of call data and internet usage habit data;
The feature data of the call data includes at least one of: the number of contacts, the total called duration, the total calling duration, the number of times called, the number of times calling, the total number of calls, the quotient of the number of times called and the total number of calls, and the quotient of the number of times calling and the total number of calls;
The feature data of the internet usage habit data includes at least one of: the probability of being online in each preset online statistics period, and the information entropy of internet use in each internet information statistics period.
In one example of this embodiment, the full prediction result and the component prediction results are gender probability values of each user to be predicted. In this case, fusing the full prediction result and the component prediction results to obtain the final gender prediction result of each user to be predicted, as shown in Figure 4, includes:
S401: for each user to be predicted, obtain the gender probability value output by the full user prediction model for that user and the gender probability value output by each component user prediction model;
S402: calculate the mean of the obtained gender probability values to obtain the gender prediction probability value;
S403: compare the gender prediction probability value with the male probability threshold and the female probability threshold set during training to obtain the final gender prediction result of the user to be predicted.
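Steps S401 and S402 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `fuse_by_mean` is hypothetical, and the sketch assumes each probability value is the probability that the user is male and that only the component models that produced an output for this user are passed in.

```python
from statistics import mean


def fuse_by_mean(full_prob, component_probs):
    """Average the full-model probability with the component-model probabilities.

    full_prob: probability output by the full user prediction model (assumed P(male)).
    component_probs: probabilities output by the component models available for this user.
    """
    return mean([full_prob, *component_probs])


# One full-model output and two component-model outputs for the same user.
score = fuse_by_mean(0.8, [0.6, 0.7])
```

The resulting value is then compared with the male and female probability thresholds, as described in S403.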
In another example of this embodiment, the full prediction result and the component prediction results are identification values 1, 0 and -1, which indicate that the user to be predicted is male, neutral or female respectively. In this case, fusing the full prediction result and the component prediction results to obtain the final gender prediction result of each user to be predicted, as shown in Figure 5, includes:
S501: for each user to be predicted, obtain the product of the identification value output by the full user prediction model for that user and the prediction accuracy of the full user prediction model, and obtain the product of the identification value output by each component user prediction model for that user and the prediction accuracy of that component user prediction model. The prediction accuracy of the full user prediction model and of each component user prediction model is obtained during training by comparing each model's output with the training data;
S502: calculate the sum of the obtained products to obtain the gender prediction probability value;
S503: compare the gender prediction probability value with the male probability threshold and the female probability threshold set during training to obtain the final gender prediction result of the user to be predicted.
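Steps S501 and S502 amount to an accuracy-weighted sum of the identification values. The sketch below assumes a hypothetical function name and input layout (a list of `(label, accuracy)` pairs for the component models); the accuracies are whatever was measured during training.

```python
def fuse_by_weighted_labels(full_label, full_accuracy, component_outputs):
    """Sum label * accuracy over the full model and every component model.

    full_label: 1 (male), 0 (neutral) or -1 (female) from the full model.
    full_accuracy: prediction accuracy of the full model measured in training.
    component_outputs: list of (label, accuracy) pairs, one per component model.
    """
    score = full_label * full_accuracy
    for label, accuracy in component_outputs:
        score += label * accuracy
    return score


# Full model says male; two components say male and female, one is neutral.
score = fuse_by_weighted_labels(1, 0.8, [(1, 0.9), (-1, 0.6), (0, 0.7)])
```

A neutral output contributes nothing to the sum, so only models that commit to a gender move the score.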
In one implementation, the male probability threshold is set greater than the female probability threshold. In this case, comparing the gender prediction probability value with the male probability threshold and the female probability threshold set during training to obtain the final gender prediction result of the user to be predicted includes:
When the gender prediction probability value is greater than the male probability threshold, determining that the final gender prediction result of the corresponding user is male;
When the gender prediction probability value is less than the female probability threshold, determining that the final gender prediction result of the corresponding user is female;
When the gender prediction probability value is greater than or equal to the female probability threshold and less than or equal to the male probability threshold, determining that the final gender prediction result of the corresponding user is neutral.
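The three-way comparison above can be written directly as a small function. The function name and the string labels are illustrative choices; the threshold values in the usage line are made up for the example.

```python
def decide_gender(score, male_threshold, female_threshold):
    """Map a fused gender prediction value to male, female or neutral."""
    if male_threshold <= female_threshold:
        raise ValueError("male threshold must be greater than female threshold")
    if score > male_threshold:
        return "male"
    if score < female_threshold:
        return "female"
    # Values on or between the two thresholds are treated as neutral.
    return "neutral"


result = decide_gender(0.55, male_threshold=0.7, female_threshold=0.4)
```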
In this embodiment, before the feature data used for gender prediction is extracted from each type of preset-type data of each user to be predicted, the method further includes judging whether a preset gender prediction time window has arrived. The feature data extracted for gender prediction is the feature data within the current gender prediction time window;
After the full prediction result and the component prediction results are fused to obtain the final gender prediction result of each user to be predicted, the method further includes:
For each user to be predicted, matching the final gender prediction result of the current prediction time window against the final gender prediction result of at least one previous prediction time window, and adjusting the final gender prediction result of the user according to the matching result.
Matching the final gender prediction result of the current prediction time window against that of at least one previous prediction time window may be: matching the final gender prediction result of the current prediction time window against the final gender prediction result of the immediately preceding prediction time window;
In this case, adjusting the final gender prediction result of the user according to the matching result includes:
If the matching result is that the genders are the same, strengthening the final gender prediction result of the user;
If the matching result is that the genders are opposite, weakening the final gender prediction result of the user;
Otherwise, keeping the strength of the final gender prediction result of the user unchanged;
Or,
Matching the final gender prediction result of the current prediction time window against that of at least one previous prediction time window may be: matching the final gender prediction result of the current prediction time window against the final gender prediction results of all previous prediction time windows;
In this case, adjusting the final gender prediction result of the user according to the matching result may include: taking the final gender prediction result that occurs most often for the user across all prediction time windows as the user's current, latest final gender prediction result.
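The second variant, taking the most frequent result across all windows, can be sketched as below. The source does not say how ties are resolved; this sketch assumes the most recent of the tied results wins, and the function name is hypothetical.

```python
from collections import Counter


def majority_gender(history):
    """Return the result that occurs most often across prediction windows.

    history: final results per window, oldest first and current window last.
    Ties are broken in favour of the most recent result (an assumption).
    """
    if not history:
        raise ValueError("history must contain at least one result")
    counts = Counter(history)
    best = max(counts.values())
    for label in reversed(history):
        if counts[label] == best:
            return label


latest = majority_gender(["male", "female", "male"])
```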
It can be seen that the scheme provided by this embodiment has at least the following advantages:
1. Fusing the full user prediction model with the component user prediction models improves the prediction accuracy. In testing, the male-to-female ratio among telecom users was about 2:1, with women accounting for 32%. With an ordinary method the prediction accuracy for women was below 40%; with the combined model it rose to 58%, and overall accuracy rose from 72% to 80%.
2. Reducing dimensionality by gender ratio speeds up model training. The feature dimension is greatly reduced while as much information as possible is retained, which greatly shrinks the model input matrix.
3. Introducing the time window reduces the adverse effect that the rise and fall of websites and apps has on the model, which ultimately improves the accuracy of the model.
Embodiment three:
This embodiment provides a user gender prediction apparatus which, as shown in Figure 6, includes:
A data extraction module 61, configured to extract, from each type of preset-type data of each user to be predicted, the feature data used for gender prediction. The preset-type data in this embodiment may be the data types set during training, including but not limited to at least one of Internet access record data of websites preset during training, application usage record data, call data and Internet usage habit data. When Internet access record data or application usage record data is included, a corresponding component user prediction model may be established for each item of feature data of the Internet access record data or the application usage record data.
A model processing module 62, configured to substitute the feature data extracted by the data extraction module into the preset full user prediction model to obtain the full gender prediction result of each user to be predicted, and to substitute each item of feature data of at least one type of preset-type data into the preset component user prediction model corresponding to that item to obtain the component gender prediction result of at least one user to be predicted. The full user prediction model in this embodiment is trained on the feature data of every type of preset-type data of the training users, and each component user prediction model is trained on one item of feature data of at least one type of preset-type data of the training users. See embodiment one for details, which are not repeated here.
A prediction processing module 63, configured to fuse the full prediction result and the component prediction results to obtain the final gender prediction result of each user to be predicted.
In one example, the preset-type data includes Internet access record data of websites preset during training. In this case, the feature data of the Internet access record data extracted by the data extraction module 61 includes at least one of: at least one item of feature data for male gender prediction, and at least one item of feature data for female gender prediction;
Each item of feature data for male gender prediction includes preset target websites and the access count of each preset target website. Each preset target website is one for which, during training, the proportion of males among its visitors is greater than or equal to the proportion of males among the training users; the target websites in one item of feature data are the websites whose male proportion falls within one preset male-proportion range;
Each item of feature data for female gender prediction includes preset target websites and the access count of each preset target website. Each preset target website is one for which, during training, the proportion of females among its visitors is greater than or equal to the proportion of females among the training users;
In this case, the model processing module 62 substituting each item of feature data of at least one type of preset-type data into the preset component user prediction model corresponding to that item of feature data includes:
Substituting each item of feature data included in the Internet access record data into the preset component user prediction model corresponding to that item of feature data.
In another example, the preset-type data includes usage record data of applications preset during training. In this case, the feature data of the application usage record data extracted by the data extraction module 61 includes at least one of: at least one item of feature data for male gender prediction, and at least one item of feature data for female gender prediction;
Each item of feature data for male gender prediction includes preset target applications and the access count of each preset target application. Each preset target application is one for which, during training, the proportion of males among its users is greater than or equal to the proportion of males among the training users; the target applications in one item of feature data are the applications whose male proportion falls within one preset male-proportion range;
Each item of feature data for female gender prediction includes preset target applications and the access count of each preset target application. Each preset target application is one for which, during training, the proportion of females among its users is greater than or equal to the proportion of females among the training users;
In this case, the model processing module 62 substituting each item of feature data of at least one type of preset-type data into the preset component user prediction model corresponding to that item of feature data includes:
Substituting each item of feature data included in the application usage record data into the preset component user prediction model corresponding to that item of feature data.
In one example, the preset-type data may further include at least one of call data and Internet usage habit data;
The feature data of the call data extracted by the data extraction module 61 includes at least one of: the number of contacts, the total incoming-call duration, the total outgoing-call duration, the number of incoming calls, the number of outgoing calls, the total number of calls, the quotient of the number of incoming calls and the total number of calls, and the quotient of the number of outgoing calls and the total number of calls;
The feature data of the Internet usage habit data extracted by the data extraction module 61 includes at least one of: the probability of being online in each preset online statistics period, and the information entropy of Internet use within each Internet-information statistics period.
In one example of this embodiment, the full prediction result and the component prediction results are gender probability values of each user to be predicted. In this case, the prediction processing module 63 fusing the full prediction result and the component prediction results to obtain the final gender prediction result of each user to be predicted includes:
The prediction processing module 63, for each user to be predicted, obtains the gender probability value output by the full user prediction model for that user and the gender probability value output by each component user prediction model;
The prediction processing module 63 calculates the mean of the obtained gender probability values to obtain the gender prediction probability value;
The prediction processing module 63 compares the gender prediction probability value with the male probability threshold and the female probability threshold set during training to obtain the final gender prediction result of the user to be predicted.
In another example of this embodiment, the full prediction result and the component prediction results are identification values 1, 0 and -1, which indicate that the user to be predicted is male, neutral or female respectively. In this case, the prediction processing module 63 fusing the full prediction result and the component prediction results to obtain the final gender prediction result of each user to be predicted includes:
The prediction processing module 63, for each user to be predicted, obtains the product of the identification value output by the full user prediction model for that user and the prediction accuracy of the full user prediction model, and obtains the product of the identification value output by each component user prediction model for that user and the prediction accuracy of that component user prediction model. The prediction accuracy of the full user prediction model and of each component user prediction model is obtained during training by comparing each model's output with the training data;
The prediction processing module 63 calculates the sum of the obtained products to obtain the gender prediction probability value;
The prediction processing module 63 compares the gender prediction probability value with the male probability threshold and the female probability threshold set during training to obtain the final gender prediction result of the user to be predicted.
In one implementation, the male probability threshold is set greater than the female probability threshold. In this case, the prediction processing module 63 comparing the gender prediction probability value with the male probability threshold and the female probability threshold set during training to obtain the final gender prediction result of the user to be predicted includes:
When the gender prediction probability value is greater than the male probability threshold, determining that the final gender prediction result of the corresponding user is male;
When the gender prediction probability value is less than the female probability threshold, determining that the final gender prediction result of the corresponding user is female;
When the gender prediction probability value is greater than or equal to the female probability threshold and less than or equal to the male probability threshold, determining that the final gender prediction result of the corresponding user is neutral.
In this embodiment, before extracting the feature data used for gender prediction from each type of preset-type data of each user to be predicted, the data extraction module 61 further judges whether a preset gender prediction time window has arrived. The feature data extracted for gender prediction is the feature data within the current gender prediction time window;
After the prediction processing module 63 fuses the full prediction result and the component prediction results to obtain the final gender prediction result of each user to be predicted, the following is further included:
The prediction processing module 63, for each user to be predicted, matches the final gender prediction result of the current prediction time window against the final gender prediction result of at least one previous prediction time window, and adjusts the final gender prediction result of the user according to the matching result.
The prediction processing module 63 matching the final gender prediction result of the current prediction time window against that of at least one previous prediction time window may be: matching the final gender prediction result of the current prediction time window against the final gender prediction result of the immediately preceding prediction time window;
In this case, the prediction processing module 63 adjusting the final gender prediction result of the user according to the matching result includes:
If the matching result is that the genders are the same, strengthening the final gender prediction result of the user;
If the matching result is that the genders are opposite, weakening the final gender prediction result of the user;
Otherwise, keeping the strength of the final gender prediction result of the user unchanged;
Or,
The prediction processing module 63 matching the final gender prediction result of the current prediction time window against that of at least one previous prediction time window may be: matching the final gender prediction result of the current prediction time window against the final gender prediction results of all previous prediction time windows;
In this case, the prediction processing module 63 adjusting the final gender prediction result of the user according to the matching result may include: taking the final gender prediction result that occurs most often for the user across all prediction time windows as the user's current, latest final gender prediction result.
The functions of the above modules in this embodiment may be implemented by a processor of the user gender prediction apparatus. The user gender prediction apparatus provided by this embodiment fuses the full user prediction model with the component user prediction models, which improves the prediction accuracy. Reducing dimensionality by gender ratio also speeds up model training: the feature dimension is greatly reduced while as much information as possible is retained, which greatly shrinks the model input matrix. In addition, introducing the time window reduces the adverse effect that the rise and fall of websites and apps has on the model, which ultimately improves the accuracy of the model.
Embodiment four:
This embodiment provides a user gender prediction device which, as shown in Figure 7, includes a processor 71, a memory 72 and a communication bus 73;
The communication bus 73 implements connection and communication between the processor 71 and the memory 72;
The processor 71 executes the user gender prediction program stored in the memory 72 to implement the steps of the user gender prediction method of embodiments one and two. It should be understood that the user gender prediction device in this embodiment may be a server deployed by an operator, or may be another device.
For ease of understanding, this embodiment is illustrated with two specific implementations.
Example one:
The process of user gender prediction in this example is as follows:
Data set selection:
Set the gender prediction time window. Based on data analysis, this example uses two months of each user's Internet access records, app usage, call detail records and Internet usage habit data.
The gender prediction time window slides so that windows do not overlap. For example, if a window is 2017-04-01 to 2017-05-31, the previous window is 2017-02-01 to 2017-03-31 and the next window is 2017-06-01 to 2017-07-31.
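The non-overlapping windows can be derived from a window's start date alone. The following sketch assumes windows start on the first day of a month and span a whole number of months; the helper names are hypothetical.

```python
from datetime import date, timedelta


def add_months(first_day, months):
    """Shift a first-of-month date by a (possibly negative) number of months."""
    index = first_day.year * 12 + (first_day.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def window(start, months=2):
    """Return (first_day, last_day) of the window that begins on `start`."""
    end = add_months(start, months) - timedelta(days=1)
    return start, end


def neighbours(start, months=2):
    """Return the previous and next windows, which never overlap the current one."""
    previous = window(add_months(start, -months), months)
    following = window(add_months(start, months), months)
    return previous, following


current = window(date(2017, 4, 1))
previous, following = neighbours(date(2017, 4, 1))
```

With the start date 2017-04-01 this reproduces the three windows given in the text.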
Extract features:
Along the time dimension, all user data is aggregated into daily data (one record per day), then into monthly data (one record per month), and then into two-month data (one record per two months).
Call detail feature extraction. The feature data extracted mainly includes, but is not limited to: the number of contacts, the total incoming-call duration, the total outgoing-call duration, the number of incoming calls, the number of outgoing calls, the number of incoming calls divided by the total number of calls, and the number of outgoing calls divided by the total number of calls.
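The call features listed above can be computed in one pass over the call records. The record layout `(contact, direction, seconds)` and the dictionary keys are assumptions made for this sketch; the source does not define a record format.

```python
def call_features(calls):
    """Summarise call records given as (contact, direction, seconds) tuples.

    direction is "in" for a received call and "out" for a placed call
    (a hypothetical record layout, not one defined by the source).
    """
    contacts = set()
    duration = {"in": 0, "out": 0}
    count = {"in": 0, "out": 0}
    for contact, direction, seconds in calls:
        contacts.add(contact)
        duration[direction] += seconds
        count[direction] += 1
    total = count["in"] + count["out"]
    return {
        "contacts": len(contacts),
        "in_duration": duration["in"],
        "out_duration": duration["out"],
        "in_count": count["in"],
        "out_count": count["out"],
        # Ratios default to 0.0 for a user with no calls in the window.
        "in_ratio": count["in"] / total if total else 0.0,
        "out_ratio": count["out"] / total if total else 0.0,
    }


features = call_features(
    [("a", "in", 60), ("b", "out", 30), ("a", "out", 90), ("c", "in", 10)]
)
```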
Internet usage habit feature extraction. The features extracted mainly include, but are not limited to: the probability of being online in each of the 24 hours of the day (the online statistics period is one hour), and the information entropy of daily Internet use (the Internet-information statistics period is one day), which describes how dispersed the user's online time is.
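One way to compute these two features is sketched below. The source does not give formulas, so this assumes that the hourly probability is the share of the user's Internet records that fall in each hour and that the entropy is the Shannon entropy (base 2) of that distribution; both function names are hypothetical.

```python
import math
from collections import Counter


def hourly_probabilities(hours):
    """Share of a user's Internet records falling in each hour 0..23."""
    counts = Counter(hours)
    total = len(hours)
    return [counts.get(h, 0) / total if total else 0.0 for h in range(24)]


def time_entropy(probabilities):
    """Shannon entropy in bits; higher means online time is more spread out."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# A user whose records split evenly between 09:00 and 21:00.
probs = hourly_probabilities([9, 9, 21, 21])
entropy = time_entropy(probs)
```

A user who is only ever online in one hour gets entropy 0, while a user spread evenly over all 24 hours gets the maximum, log2(24).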
Internet access record feature extraction. The features are processed as follows:
For each website URL (uniform resource locator), only the host part (the IP address or host name in the URL) is kept. The access counts of all samples in the training set to each host-level website are collected to form the user-website access count matrix, denoted the user-host matrix;
The male proportion of each website is counted to obtain website and male-proportion data, and websites with the same male proportion are grouped into one website class. With the male proportion accurate to 0.001, this yields 1000 website classes, denoted host-set. The set has 1000 elements; the first value of each element is the male proportion (accurate to three decimal places), and the remaining values are the names of the websites with that male proportion;
Each column of the user-host matrix is standardized, that is, the data of each website is standardized to mean 0 and variance 1. The standardized data of websites that belong to the same class in host-set is then summed, giving the user by 1000-website-class feature matrix, denoted the user-hostset matrix.
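The standardize-then-sum step can be sketched in plain Python as follows. The function names, the list-of-rows matrix layout and the use of the population standard deviation are assumptions for illustration; constant columns are mapped to 0 to avoid dividing by zero.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_columns(matrix):
    """Scale every column (website) to mean 0 and variance 1."""
    scaled = []
    for column in zip(*matrix):
        mu, sigma = mean(column), pstdev(column)
        scaled.append([(x - mu) / sigma if sigma else 0.0 for x in column])
    return [list(row) for row in zip(*scaled)]


def group_by_male_ratio(matrix, male_ratios, digits=3):
    """Sum standardized columns whose rounded male ratio is the same.

    matrix: rows are users, columns are websites (access counts).
    male_ratios: male proportion of each website, in column order.
    Returns the sorted ratio classes and the user-by-class feature matrix.
    """
    groups = defaultdict(list)
    for j, ratio in enumerate(male_ratios):
        groups[round(ratio, digits)].append(j)
    keys = sorted(groups)
    z = standardize_columns(matrix)
    features = [[sum(row[j] for j in groups[k]) for k in keys] for row in z]
    return keys, features


# Two users, three websites; the first two websites share a ratio class.
keys, features = group_by_male_ratio([[1, 10, 0], [3, 30, 4]], [0.9001, 0.9002, 0.4])
```

However many websites there are, the output has at most 1000 columns when the ratio is rounded to three decimals, which is the dimensionality reduction the text describes.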
Assuming that males account for 68% of the data set and females for 32%, the following operations yield 6 data sets:
Select, from the user-host matrix, the websites that have more than 5 and fewer than 500 users and whose male proportion in host-set is greater than 90%; then select the users who browsed these websites and use each user's access count to each website as the value of the user-web matrix, giving dataset1 (the first item of feature data of the Internet access record data);
Select, from the user-host matrix, the websites that have at least 500 and fewer than 5000 users and whose male proportion is greater than 80%; then select the users who browsed these websites and use each user's access count to each website as the value of the user-web matrix, giving dataset2 (the second item of feature data of the Internet access record data);
Select the websites that have more than 5000 users and whose male proportion is greater than 75%; then select the users who browsed these websites and use each user's access count to each website as the value of the user-web matrix, giving dataset3 (the third item of feature data of the Internet access record data);
Select, from the user-host matrix, the websites that have fewer than 500 users and whose female proportion in host-set is greater than 75%; then select the users who browsed these websites and use each user's access count to each website as the value of the user-web matrix, giving dataset4 (the fourth item of feature data of the Internet access record data);
Select, from the user-host matrix, the websites that have at least 500 and fewer than 5000 users and whose female proportion is greater than 60%; then select the users who browsed these websites and use each user's access count to each website as the value of the user-web matrix, giving dataset5 (the fifth item of feature data of the Internet access record data);
Select the websites that have more than 5000 users and whose female proportion is greater than 50%; then select the users who browsed these websites and use each user's access count to each website as the value of the user-web matrix, giving dataset6 (the sixth item of feature data of the Internet access record data).
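Each of the six data sets follows the same pattern: filter websites by user count and gender proportion, then keep the users who visited any of them. The sketch below assumes hypothetical dictionary inputs and a half-open user-count range `lo <= n < hi`; the exact boundary conditions differ slightly between the six rules in the text, so the bounds must be chosen per rule.

```python
def build_dataset(visits, host_stats, lo, hi, min_ratio):
    """Build one user-by-website data set.

    visits: {user: {host: access_count}}.
    host_stats: {host: (number_of_users, gender_proportion)}, where the
        proportion is the male or the female share depending on the data set.
    lo, hi: keep hosts with lo <= number_of_users < hi.
    min_ratio: keep hosts whose proportion is strictly greater than this.
    Returns the selected hosts and {user: [access counts in host order]}.
    """
    hosts = sorted(
        h for h, (n, ratio) in host_stats.items() if lo <= n < hi and ratio > min_ratio
    )
    rows = {}
    for user, counts in visits.items():
        row = [counts.get(h, 0) for h in hosts]
        if any(row):  # keep only users who visited at least one selected host
            rows[user] = row
    return hosts, rows


host_stats = {"a.example": (100, 0.95), "b.example": (100, 0.5), "c.example": (800, 0.95)}
visits = {"u1": {"a.example": 3}, "u2": {"b.example": 7}}
# Rule for dataset1: more than 5 and fewer than 500 users, male share above 90%.
hosts, rows = build_dataset(visits, host_stats, 6, 500, 0.90)
```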
App usage feature extraction. The features are processed in a similar way to the website data:
The male/female usage ratio of each app is counted, and apps with the same male proportion are grouped into one class. With the male proportion accurate to 0.01, this yields 100 app classes, denoted app-set. The set has 100 elements; the first value of each element is the male proportion (accurate to two decimal places), and the remaining values are the names of the apps with that male proportion;
The user-app access count matrix holds each user's access counts to all apps and is denoted the user-app matrix. Each column of the user-app matrix is standardized, that is, the data of each app is standardized to mean 0 and variance 1. The standardized data of apps that belong to the same class in app-set is then summed, giving the user by 100-app-class feature matrix, denoted the user-appset matrix.
Based on each user's access counts to apps and the male-to-female ratio of each app, the following operations yield 6 data sets:
Select the apps that have more than 5 and fewer than 2000 users and a male proportion greater than 90%; then select the users of these apps to obtain a user-app matrix whose values are each user's access counts to the apps, denoted dataset7 (the first item of feature data of the app usage data);
Select the apps that have at least 1000 and fewer than 5000 users and a male proportion greater than 80%; then select the users of these apps to obtain a user-app matrix whose values are each user's access counts to the apps, denoted dataset8 (the second item of feature data of the app usage data);
Select the apps that have more than 5000 users and a male proportion greater than 75%; then select the users of these apps to obtain a user-app matrix whose values are each user's access counts to the apps, denoted dataset9 (the third item of feature data of the app usage data);
Select the apps that have more than 5 and fewer than 500 users and a female proportion greater than 70%; then select the users of these apps to obtain a user-app matrix whose values are each user's access counts to the apps, denoted dataset10 (the fourth item of feature data of the app usage data);
Select the apps that have at least 500 and fewer than 3000 users and a female proportion greater than 60%; then select the users of these apps to obtain a user-app matrix whose values are each user's access counts to the apps, denoted dataset11 (the fifth item of feature data of the app usage data);
Select the apps that have more than 3000 users and a female proportion greater than 50%; then select the users of these apps to obtain a user-app matrix whose values are each user's access counts to the apps, denoted dataset12 (the sixth item of feature data of the app usage data).
Build the models, with male labelled 1, neutral labelled 0 and female labelled -1:
The feature data extracted above forms the feature matrix of all users. After comparing the results of logistic regression, support vector machine, random forest, GBDT and XGBoost models, the XGBoost regression algorithm is selected, giving an XGBoost model denoted model0 (the full user prediction model). In this example the XGBoost learning rate is set to 0.2, the number of iterations to 50 and the tree depth to 7. The probability value at which the accuracy for women reaches 55% is chosen as the female threshold, and the probability value at which the accuracy for men reaches 80% is chosen as the male threshold. These thresholds are used to identify male, female and neutral users.
An XGBoost model is built separately for each of dataset1 to dataset12, giving 12 models (the component user prediction models), denoted model1 to model12. Of these, model1 to model3 and model7 to model9 are male models with high prediction accuracy for men, and model4 to model6 and model10 to model12 are female models with high prediction accuracy for women. Thresholds are set for each model, with the following result: model1, model4, model7 and model10 have very high accuracy; model2, model5, model8 and model11 come second; and model3, model6, model9 and model12 have lower accuracy than the first two groups.
Fuse model0 with model1 to model12. Each model outputs a gender result for a user, and the weighted sum of the model outputs is computed. The weights are the prediction accuracy of the full user prediction model and of each component user prediction model, obtained during training by comparing each model's output with the training data.
The gender of the user is determined from the relationship between the final result and -1, 0 and 1.
The thresholds are adjusted to obtain the best thresholds for dividing gender. Because the male-to-female ratio in the training sample is 68:32, the thresholds are set so that the male-to-female ratio in the prediction results matches that of the sample, which gives the final thresholds.
The time window slides and the processing is repeated every two months, correcting each user's gender prediction result to maintain prediction accuracy. If the two successive gender prediction results for a user are opposite, the final gender prediction result of the user is weakened: for example, the user may be reclassified as neutral, or 1 may be subtracted when the identification values above are accumulated. If the two results are the same, the result is strengthened, for example by adding 1. In other cases the result may remain unchanged.
According to testing, when the results of the model are compared with those of an existing prediction method, the accuracy of the existing method for women is below 40%. With the prediction method of this example, the accuracy for women can reach 58%, and overall accuracy can rise from 72% to 80%.
Example two:
In this example, user gender prediction is implemented as follows:
Data set selection:
A gender prediction time window is set. Based on data analysis, this example uses three months of each user's internet access records and app usage data.
The gender prediction time window slides, and the windows must not overlap. For example, if one window is 2017-04-01 to 2017-06-30, the previous window is 2017-01-01 to 2017-03-31 and the next window is 2017-07-01 to 2017-09-30.
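A non-overlapping three-month window can be computed as in the sketch below. The function names are hypothetical, and the sketch assumes every window starts on the first day of a month and ends on the last calendar day of the third month.

```python
import calendar
from datetime import date


def quarter_window(start):
    """Return (first_day, last_day) of the three-month window at `start`."""
    index = start.month - 1 + 2              # zero-based index of the third month
    end_year = start.year + index // 12
    end_month = index % 12 + 1
    last_day = calendar.monthrange(end_year, end_month)[1]
    return start, date(end_year, end_month, last_day)


def next_window_start(start):
    """First day of the window that follows the one starting at `start`."""
    index = start.month - 1 + 3
    return date(start.year + index // 12, index % 12 + 1, 1)


window = quarter_window(date(2017, 4, 1))
following = quarter_window(next_window_start(date(2017, 4, 1)))
```

Because each window starts the day after the previous one ends, consecutive windows never share a day.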
Extract features:
Along the time dimension, all user data is first aggregated into daily data (one record per day), then into monthly data, and finally into three-month data.
Internet access record features are extracted. The features are processed as follows:
Only the host part of each website URL is kept, and the number of visits by every sample in the training set to each host-level website is counted, giving a user-website visit matrix, denoted the user-host matrix;
The proportion of male visitors is computed for each website, giving website and male-proportion data. Websites with the same male proportion are grouped into the same website class, with the male proportion rounded to 0.001, which yields 1000 website classes, denoted host-set. The set has 1000 elements; the first value of each element is the male share (to three decimal places) and the remaining values are the names of the websites with that male proportion;
Each column of the user-host matrix is standardized, i.e. the data for each website is scaled to mean 0 and variance 1. The standardized data of the websites that belong to the same class in host-set is then summed, giving a feature matrix of users by 1000 website classes, denoted the user-hostset matrix.
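The grouping and standardization steps can be sketched in plain Python as follows. The data layout (one list of per-user visit counts per host) and the function names are assumptions made for illustration, and standardization here means scaling each column to mean 0 and unit variance.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize(column):
    """Scale one website's visit counts to mean 0 and variance 1."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0] * len(column)          # constant column carries no signal
    return [(x - mu) / sigma for x in column]


def build_user_hostset(visits, male_ratio, digits=3):
    """Sum standardized columns of hosts that share a rounded male ratio.

    visits:     host -> list of visit counts, one entry per user
    male_ratio: host -> share of male visitors (0.0 to 1.0)
    Returns rounded ratio -> list of summed values, one entry per user.
    """
    if not visits:
        return {}
    n_users = len(next(iter(visits.values())))
    classes = defaultdict(lambda: [0.0] * n_users)
    for host, column in visits.items():
        key = round(male_ratio[host], digits)
        scaled = standardize(column)
        classes[key] = [a + b for a, b in zip(classes[key], scaled)]
    return dict(classes)
```

Summing hosts that share a male ratio collapses a very wide user-host matrix into at most about a thousand columns, which is the dimensionality reduction the description relies on.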
Assuming that males make up 65% of the data set and females 35%, 4 data sets are obtained as follows:
Select the websites in the user-host matrix that have more than 5 and fewer than 3000 users and whose male share in host-set is greater than 85%, then select the users who visited those websites. With each user's number of visits to each website as the values of the user-web matrix, this gives dataset1 (the first item of feature data of the internet access record data);
Select the websites in the user-host matrix that have 3000 or more users and whose male share is greater than 75%, then select the users who visited those websites. With each user's number of visits to each website as the values of the user-web matrix, this gives dataset2 (the second item of feature data of the internet access record data);
Select the websites in the user-host matrix that have fewer than 2000 users and whose female share in host-set is greater than 75%, then select the users who visited those websites. With each user's number of visits to each website as the values of the user-web matrix, this gives dataset3 (the third item of feature data of the internet access record data);
Select the websites in the user-host matrix that have 2000 or more users and whose female share is greater than 50%, then select the users who visited those websites. With each user's number of visits to each website as the values of the user-web matrix, this gives dataset4 (the fourth item of feature data of the internet access record data).
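The four website selections can be expressed as simple rules over per-website statistics, as in the sketch below. The dictionary layout and function names are hypothetical, and the female share is assumed to be one minus the male share.

```python
# Hypothetical rules mirroring the four website data sets: n is the number of
# users of a website and m its male share (female share assumed to be 1 - m).
DATASET_RULES = {
    "dataset1": lambda n, m: 5 < n < 3000 and m > 0.85,
    "dataset2": lambda n, m: n >= 3000 and m > 0.75,
    "dataset3": lambda n, m: n < 2000 and (1 - m) > 0.75,
    "dataset4": lambda n, m: n >= 2000 and (1 - m) > 0.50,
}


def select_hosts(stats, rule):
    """stats: host -> (number of users, male share). Returns matching hosts."""
    return sorted(host for host, (n, m) in stats.items() if rule(n, m))


def build_dataset(visits, hosts):
    """Keep users who visited a selected host, with their visit counts.

    visits: user -> {host: visit count}
    """
    wanted = set(hosts)
    dataset = {}
    for user, row in visits.items():
        kept = {h: c for h, c in row.items() if h in wanted and c > 0}
        if kept:
            dataset[user] = kept
    return dataset
```

Each resulting data set then feeds its own component model, so a user only receives a component prediction from the models whose websites they actually visited.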
App usage features are extracted. The processing is similar to that for websites:
The male and female usage proportions of each app are computed, and apps with the same male proportion are grouped into the same class, with the male proportion rounded to 0.01. This yields 100 app classes, denoted app-set. The set has 100 elements; the first value of each element is the male share (to two decimal places) and the remaining values are the names of the apps with that male proportion;
The user-app access count matrix holds each user's number of accesses to every app and is denoted the user-app matrix. Each column of the user-app matrix is standardized, i.e. the data for each app is scaled to mean 0 and variance 1. The standardized data of the apps that belong to the same class in app-set is then summed, giving a feature matrix of users by 100 app classes, denoted the user-appset matrix.
Based on each user's number of accesses to each app and the male-to-female ratio of each app, 4 data sets are obtained as follows:
Select the apps with more than 5 and fewer than 5000 users and a male share greater than 85%, then select the users of those apps to obtain a user-app matrix whose values are each user's number of accesses to each app, denoted dataset5 (the first item of feature data of the app usage data);
Select the apps with more than 5000 users and a male share greater than 75%, then select the users of those apps to obtain a user-app matrix whose values are each user's number of accesses to each app, denoted dataset6 (the second item of feature data of the app usage data);
Select the apps with more than 5 and fewer than 2000 users and a female share greater than 70%, then select the users of those apps to obtain a user-app matrix whose values are each user's number of accesses to each app, denoted dataset7 (the third item of feature data of the app usage data);
Select the apps with more than 2000 users and a female share greater than 50%, then select the users of those apps to obtain a user-app matrix whose values are each user's number of accesses to each app, denoted dataset8 (the fourth item of feature data of the app usage data).
Build the models, where the male identifier is still 1, the neutral identifier is 0 and the female identifier is -1:
Using the feature data extracted above, the feature matrix of all users is constructed. The results of logistic regression, SVM, random forest, GBDT and XGBoost models are compared and the XGBoost regression algorithm is selected, giving an XGBoost model denoted model0 (the full user prediction model). The XGBoost learning rate is 0.3, the number of iterations is 100 and the tree depth is 8. The model outputs the probability that a user is male.
For dataset1 to dataset8, an XGBoost model is built on each, giving 8 models (the component user prediction models), denoted model1 to model8, each of which outputs the probability that a user is male.
model0 and model1 to model8 are fused. Each model outputs a gender probability for the user, and the average of the 9 models' results gives the user's gender prediction probability.
The threshold is adjusted to find the optimal threshold for dividing the genders. Because the male-to-female ratio in the training sample is 65:35, the threshold is set so that the male-to-female ratio of the predictions matches that of the sample, which gives the final threshold.
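Averaging the model outputs and choosing a ratio-matching threshold might look like the sketch below. The text does not say how the threshold is searched for, so picking it as a quantile of the fused scores is an assumption, as are the function names and the use of a single cut-off.

```python
def average_probability(probabilities):
    """Mean of the male probabilities output by model0 and the component models."""
    return sum(probabilities) / len(probabilities)


def ratio_matching_threshold(scores, male_fraction):
    """Return the cut-off at which about `male_fraction` of users score >= it.

    scores:        fused male probability, one per user
    male_fraction: male share of the training sample, e.g. 0.65 for 65:35
    """
    ranked = sorted(scores, reverse=True)
    n_male = round(len(ranked) * male_fraction)
    n_male = min(max(n_male, 1), len(ranked))   # keep the index in range
    return ranked[n_male - 1]


# Twenty users with evenly spread scores; 65% of them should be labelled male.
scores = [i / 20 for i in range(20)]
threshold = ratio_matching_threshold(scores, 0.65)
predicted_male = sum(1 for s in scores if s >= threshold)
```

Tied scores at the cut-off can push the predicted share slightly above the target, so a production version would need an explicit tie-breaking rule.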
The time window slides and processing is repeated every three months to correct each user's gender prediction and so maintain its accuracy; the gender a user has been assigned most often is taken as the final gender. When a user has been judged male and female the same number of times, the user can be classed as neutral and no judgment is made for now.
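The majority rule across time windows can be sketched as follows. The function name is hypothetical, and comparing only the male and female counts (ignoring windows in which the user was already neutral) is an assumption, since the text only defines the tie case.

```python
from collections import Counter


def final_gender(history):
    """Pick the gender assigned most often across prediction windows.

    history: identifiers per window, 1 (male), 0 (neutral), -1 (female).
    A tie between male and female counts yields 0 (neutral, undecided).
    """
    counts = Counter(history)
    males, females = counts[1], counts[-1]
    if males > females:
        return 1
    if females > males:
        return -1
    return 0
```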
The present invention fuses a full user prediction model with component user prediction models, which can improve prediction accuracy. In tests, the male-to-female ratio among telecom users is about 2:1, with women making up 32%. With ordinary methods the prediction accuracy for women is below 40%; with the combined models it can rise to 58%, and overall accuracy rises from 72% to 80%.
In addition, the present invention uses the gender ratio to reduce dimensionality. This greatly reduces the feature dimension, and hence the dimension of the model input matrix, while retaining as much information as possible, which speeds up model training.
Furthermore, by introducing the concept of a time window, the present invention reduces the adverse effect on the model of fluctuations in website and app data, which ultimately improves model accuracy.
It should be noted that, in this document, the terms "include", "comprise" and any variants of them are intended to be non-exclusive, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to that process, method, article or device. Unless further restricted, an element introduced by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes it.
The serial numbers of the above embodiments of the invention are for description only and do not indicate the relative merits of the embodiments.
From the description of the embodiments above, those skilled in the art will clearly understand that the methods of the embodiments can be implemented by software together with a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. On this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied as a software product stored on a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and containing instructions that cause a device (which may be a mobile phone, a computer, a server, an air conditioner, a network device or the like) to execute the methods of the embodiments of the present invention.
The above is a further detailed description of the embodiments of the present invention in connection with specific implementations, and the specific implementation of the invention should not be taken to be limited to these descriptions. A person of ordinary skill in the art to which the invention belongs may make a number of simple deductions or substitutions without departing from the inventive concept, and all of these shall be regarded as falling within the scope of protection of the present invention.
Claims (11)
1. A user gender prediction method, comprising:
extracting, for each user to be predicted, feature data for gender prediction from each preset type of data;
substituting the extracted feature data into a preset full user prediction model to obtain a full gender prediction result for each user to be predicted, and substituting each item of feature data of at least one preset type of data into the preset component user prediction model corresponding to that item of feature data to obtain a component gender prediction result for at least one of the users to be predicted;
wherein the full user prediction model is trained, during a training process, on the feature data of each preset type of data of training users, and the component user prediction models are trained, during the training process, on the respective items of feature data of the at least one preset type of data of the training users; and
fusing the full prediction result and the component prediction results to obtain a final gender prediction result for each user to be predicted.
2. The user gender prediction method of claim 1, wherein the preset types of data include internet access record data for preset websites during the training process;
the feature data of the internet access record data includes at least one of: at least one item of feature data for male gender prediction, and at least one item of feature data for female gender prediction;
each item of feature data for male gender prediction includes preset target websites and the number of visits to each preset target website; each preset target website is a website for which, during the training process, the proportion of males among its visitors is greater than or equal to the proportion of males among the training users, and the target websites in one item of feature data are the websites whose male visitor proportion falls within a preset male proportion range;
each item of feature data for female gender prediction includes preset target websites and the number of visits to each preset target website; each preset target website is a website for which, during the training process, the proportion of females among its visitors is greater than or equal to the proportion of females among the training users;
substituting each item of feature data of at least one preset type of data into the preset component user prediction model corresponding to that item of feature data comprises:
substituting each item of feature data included in the internet access record data into the preset component user prediction model corresponding to that item of feature data.
3. The user gender prediction method of claim 1, wherein the preset types of data include usage record data for preset applications during the training process;
the feature data of the usage record data includes at least one item of feature data for male gender prediction and at least one item of feature data for female gender prediction;
each item of feature data for male gender prediction includes preset target applications and the number of accesses to each preset target application; each preset target application is an application for which, during the training process, the proportion of males among its users is greater than or equal to the proportion of males among the training users, and the target applications in one item of feature data are the applications whose male user proportion falls within a preset male proportion range;
each item of feature data for female gender prediction includes preset target applications and the number of accesses to each preset target application; each preset target application is an application for which, during the training process, the proportion of females among its users is greater than or equal to the proportion of females among the training users;
substituting each item of feature data of at least one preset type of data into the preset component user prediction model corresponding to that item of feature data comprises:
substituting each item of feature data included in the usage record data into the preset component user prediction model corresponding to that item of feature data.
4. The user gender prediction method of claim 2 or 3, wherein the preset types of data further include at least one of call data and internet usage habit data;
the feature data of the call data includes at least one of: the number of contacts, the total called duration, the total calling duration, the number of times called, the number of times calling, the total number of calls, the quotient of the number of times called and the total number of calls, and the quotient of the number of times calling and the total number of calls;
the feature data of the internet usage habit data includes at least one of: the probability of being online in each preset online statistics period, and the information entropy of internet access within each internet information statistics period.
5. The user gender prediction method of claim 1, wherein the full prediction result and the component prediction results are gender probability values for each user to be predicted;
fusing the full prediction result and the component prediction results to obtain the final gender prediction result for each user to be predicted comprises:
for each user to be predicted, obtaining the gender probability value output by the full user prediction model for that user and the gender probability value output by each component user prediction model;
calculating the mean of the obtained gender probability values to obtain a gender prediction probability value;
comparing the gender prediction probability value with a male probability threshold and a female probability threshold set during the training process to obtain the final gender prediction result for that user.
6. The user gender prediction method of claim 1, wherein the full prediction result and the component prediction results are identifier values of 1, 0 and -1, which indicate that a user to be predicted is male, neutral and female respectively;
fusing the full prediction result and the component prediction results to obtain the final gender prediction result for each user to be predicted comprises:
for each user to be predicted, obtaining the product of the identifier value output by the full user prediction model for that user and the prediction accuracy of the full user prediction model, and obtaining the product of the identifier value output by each component user prediction model for that user and the prediction accuracy of that component user prediction model; the prediction accuracy of the full user prediction model and the prediction accuracy of each component user prediction model are obtained during the training process by comparing the output of each model with the training data;
calculating the sum of the obtained products to obtain a gender prediction probability value;
comparing the gender prediction probability value with a male probability threshold and a female probability threshold set during the training process to obtain the final gender prediction result for that user.
7. The user gender prediction method of claim 5 or 6, wherein the male probability threshold is greater than the female probability threshold;
comparing the gender prediction probability value with the male probability threshold and the female probability threshold set during the training process to obtain the final gender prediction result for the user to be predicted comprises:
when the gender prediction probability value is greater than the male probability threshold, determining that the final gender prediction result for the corresponding user is male;
when the gender prediction probability value is less than the female probability threshold, determining that the final gender prediction result for the corresponding user is female;
when the gender prediction probability value is greater than or equal to the female probability threshold and less than or equal to the male probability threshold, determining that the final gender prediction result for the corresponding user is neutral.
8. The user gender prediction method of any one of claims 1-3 and 5-6, wherein, before extracting the feature data for gender prediction from each preset type of data of each user to be predicted, the method further comprises determining whether a preset gender prediction time window has been reached; the feature data for gender prediction extracted from each preset type of data of each user to be predicted is the feature data within the current gender prediction time window;
after fusing the full prediction result and the component prediction results to obtain the final gender prediction result for each user to be predicted, the method further comprises:
for each user to be predicted, matching the final gender prediction result for the current prediction time window against the final gender prediction result for at least one earlier prediction time window, and adjusting the final gender prediction result for that user according to the matching result.
9. The user gender prediction method of claim 8, wherein matching the final gender prediction result for the current prediction time window against the final gender prediction result for at least one earlier prediction time window is: matching the final gender prediction result for the current prediction time window against the final gender prediction result for the previous prediction time window;
adjusting the final gender prediction result for the user to be predicted according to the matching result comprises:
if the matching result is that the genders are the same, strengthening the final gender prediction result for the user;
if the matching result is that the genders are opposite, weakening the final gender prediction result for the user;
otherwise, keeping the strength of the final gender prediction result for the user unchanged;
or,
matching the final gender prediction result for the current prediction time window against the final gender prediction result for at least one earlier prediction time window is: matching the final gender prediction result for the current prediction time window against the final gender prediction results for all earlier prediction time windows;
adjusting the final gender prediction result for the user to be predicted according to the matching result comprises: taking the final gender prediction result that occurs most often for the user across the prediction time windows as the user's current, latest final gender prediction result.
10. A user gender prediction device, comprising:
a data extraction module, configured to extract, for each user to be predicted, feature data for gender prediction from each preset type of data;
a model processing module, configured to substitute the feature data extracted by the data extraction module into a preset full user prediction model to obtain a full gender prediction result for each user to be predicted, and to substitute each item of feature data of at least one preset type of data into the preset component user prediction model corresponding to that item of feature data to obtain a component gender prediction result for at least one of the users to be predicted; the full user prediction model is trained, during a training process, on the feature data of each preset type of data of training users, and the component user prediction models are trained, during the training process, on the respective items of feature data of the at least one preset type of data of the training users;
a prediction processing module, configured to fuse the full prediction result and the component prediction results to obtain a final gender prediction result for each user to be predicted.
11. A user gender prediction apparatus, comprising: a processor, a memory and a communication bus;
the communication bus is configured to provide the communication connection between the processor and the memory;
the processor is configured to execute a user gender prediction program stored in the memory to implement the steps of the user gender prediction method of any one of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710507593.4A CN109145932A (en) | 2017-06-28 | 2017-06-28 | User's gender prediction's method, device and equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710507593.4A CN109145932A (en) | 2017-06-28 | 2017-06-28 | User's gender prediction's method, device and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109145932A true CN109145932A (en) | 2019-01-04 |
Family
ID=64803046
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710507593.4A Withdrawn CN109145932A (en) | 2017-06-28 | 2017-06-28 | User's gender prediction's method, device and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109145932A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20190104 |