CN109145932A - User gender prediction method, apparatus, and device - Google Patents

User gender prediction method, apparatus, and device

Info

Publication number
CN109145932A
Authority
CN
China
Prior art keywords
user
gender
predicted
characteristic
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201710507593.4A
Other languages
Chinese (zh)
Inventor
许雪敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN201710507593.4A
Publication of CN109145932A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides a user gender prediction method, apparatus, and device. When user gender is predicted, the feature data used for gender prediction is extracted from each preset type of data of each user to be predicted. The extracted feature data is input into a preset full-data user prediction model to obtain a full-data gender prediction result for each user to be predicted, and each feature data item of at least one preset type of data is input into the preset component user prediction model corresponding to that feature data item to obtain a component gender prediction result for at least one user to be predicted. The full-data prediction result and the component prediction results are then fused to obtain the final gender prediction result of each user to be predicted. Because the final result combines the outputs of the full-data user prediction model and the component user prediction models, the accuracy of gender prediction can be improved considerably.

Description

User gender prediction method, apparatus, and device
Technical field
The present invention relates to the field of communications, and in particular to a user gender prediction method, apparatus, and device.
Background
A user portrait, also called a user persona, is an effective tool for delineating target users and connecting user needs with design direction, and it is widely used in many fields. In practice, the attributes, behavior, and expectations of users are usually tied together in the plainest, most everyday language. As a virtual representation of actual users, the persona formed by a user portrait is not constructed in isolation from the product and the market; it must be representative of the product's main audience and target group.
As the name suggests, user gender prediction means predicting the network gender of a user of a carrier network (for example, a telecom network) from the user's daily Internet content and voice-call habits. The network gender is then assumed to be strongly correlated with the user's true gender, and the predicted network gender is taken as the true gender. The network gender and the true gender do not always agree, but the operator is more concerned with the virtual network gender that the user exhibits through Internet access and voice calls.
Existing research on predicting the gender of users in the mobile communication field is limited to predicting gender with a single trained model, based on data such as daily Internet content and voice-call habits. The gender prediction result produced by that model is not subject to any correction mechanism, which leads to low prediction accuracy.
Summary of the invention
The user gender prediction method, apparatus, and device provided by embodiments of the present invention mainly address the following technical problem: existing user gender prediction relies on a single model only, which results in low prediction accuracy.
To solve the above technical problem, an embodiment of the present invention provides a user gender prediction method, the method comprising:
extracting, from each preset type of data of each user to be predicted, the feature data used for gender prediction;
inputting the extracted feature data into a preset full-data user prediction model to obtain a full-data gender prediction result for each user to be predicted, and inputting each feature data item of at least one preset type of data into the preset component user prediction model corresponding to that feature data item to obtain a component gender prediction result for at least one user to be predicted, wherein the full-data user prediction model is obtained in a training process by training on the feature data of each preset type of data of the training users, and each component user prediction model is obtained in the training process by training on a feature data item of at least one preset type of data of the training users;
fusing the full-data prediction result and the component prediction result to obtain the final gender prediction result of each user to be predicted.
To solve the above technical problem, an embodiment of the present invention further provides a user gender prediction apparatus, comprising:
a data extraction module, configured to extract, from each preset type of data of each user to be predicted, the feature data used for gender prediction;
a model processing module, configured to input the feature data extracted by the data extraction module into a preset full-data user prediction model to obtain a full-data gender prediction result for each user to be predicted, and to input each feature data item of at least one preset type of data into the preset component user prediction model corresponding to that feature data item to obtain a component gender prediction result for at least one user to be predicted, wherein the full-data user prediction model is obtained in a training process by training on the feature data of each preset type of data of the training users, and each component user prediction model is obtained in the training process by training on a feature data item of at least one preset type of data of the training users;
a prediction processing module, configured to fuse the full-data prediction result and the component prediction result to obtain the final gender prediction result of each user to be predicted.
To solve the above technical problem, an embodiment of the present invention further provides a user gender prediction device, comprising a processor, a memory, and a communication bus;
the communication bus is configured to implement connection and communication between the processor and the memory;
the processor is configured to execute a user gender prediction program stored in the memory to implement the steps of the user gender prediction method described above.
An embodiment of the present invention further provides a computer storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the user gender prediction method described above.
The beneficial effects of the present invention are as follows:
According to the user gender prediction method, apparatus, and device provided by embodiments of the present invention, when user gender is predicted, the feature data used for gender prediction is extracted from each preset type of data of each user to be predicted; the extracted feature data is input into a preset full-data user prediction model to obtain a full-data gender prediction result for each user to be predicted, and each feature data item of at least one preset type of data is input into the preset component user prediction model corresponding to that feature data item to obtain a component gender prediction result for at least one user to be predicted; the full-data prediction result and the component prediction results are then fused to obtain the final gender prediction result of each user to be predicted. The present invention does not take the prediction result of a single model directly as the final prediction result. Instead, it combines the outputs of the full-data user prediction model and the component user prediction models into the final prediction of the user's gender, and can therefore improve the accuracy of gender prediction considerably.
Brief description of the drawings
Fig. 1 is a schematic diagram of fusion mode one for the full-data prediction result and the component prediction results, provided by Embodiment one of the present invention;
Fig. 2 is a schematic diagram of fusion mode two for the full-data prediction result and the component prediction results, provided by Embodiment one of the present invention;
Fig. 3 is a schematic flowchart of the user gender prediction method provided by Embodiment two of the present invention;
Fig. 4 is a schematic diagram of fusion mode one for the full-data prediction result and the component prediction results, provided by Embodiment two of the present invention;
Fig. 5 is a schematic diagram of fusion mode two for the full-data prediction result and the component prediction results, provided by Embodiment two of the present invention;
Fig. 6 is a schematic structural diagram of the user gender prediction apparatus provided by Embodiment three of the present invention;
Fig. 7 is a schematic structural diagram of the user gender prediction device provided by Embodiment four of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are evidently only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Embodiment one:
The user gender prediction scheme provided in this embodiment includes at least two processes: model training, and predicting the gender of users to be predicted.
Model training in this embodiment covers the training of the full-data user prediction model and of the component user prediction models. It includes extracting features, building models, and making predictions with the built models.
In this embodiment, feature extraction means extracting, for the training users (whose genders are known, so the male-to-female ratio is also known), the feature data used for gender prediction from the relevant types of data. These types of data are the preset types of data of the training stage. Which types of data are chosen, and which feature data is extracted from them, can be set flexibly, as long as the data can effectively discriminate gender.
In this embodiment, a full-data user prediction model is built from the extracted feature data using a suitable modeling method; the output of this model predicts the gender of all training users. To improve the accuracy of user gender prediction, a corresponding component user prediction model can also be built for each feature data item of at least one preset type of data. The component user prediction models are preferably built with the same modeling method as the full-data user prediction model. Modeling methods in this embodiment include, but are not limited to, logistic regression, support vector machines, random forests, GBDT (Gradient Boosting Decision Tree), and XGBoost (Scalable and Flexible Gradient Boosting). The modeling method can be chosen flexibly; in one example, XGBoost is used to obtain both the full-data user prediction model and at least one component user prediction model. The types of data whose feature data is used to build component user prediction models can also be chosen flexibly. For example, in one embodiment, the preset types of data in the training process may include at least one of Internet access record data for preset websites, application usage record data, call data, and Internet usage habit data. When the Internet access record data or the application usage record data is included, a corresponding component user prediction model can be built for each feature data item of the Internet access record data or of the application usage record data.
In this embodiment, when feature data is extracted, at least one type of data can be divided into multiple categories by gender proportion in order to reduce dimensionality. When existing approaches use website data and app (application) data, the number of websites is huge and the number of users per website varies greatly, so using every website as a feature produces an ultra-high-dimensional, extremely sparse feature matrix, which is unfavorable for modeling. The existing dimensionality reduction approach is to merge websites or apps by their natural attributes, or to select mainstream websites directly and discard minority websites; however, the discarded minority websites include ones that can discriminate gender. For app data, the top N apps (N = 15 or another value) may be selected as features and the remaining data discarded, but the discarded data may also contain app and website information that discriminates gender. In other words, discarding data or merging it by natural attribute reduces dimensionality at the cost of masking or discarding part of the data that can discriminate gender. In contrast, this embodiment classifies by gender proportion (for example, male proportion or female proportion) to reduce dimensionality, which avoids discarding gender-discriminating data and improves the accuracy of gender prediction while reducing dimensionality reliably.
For example, when the preset types of data in the training process include Internet access record data for preset websites, the feature data of the Internet access record data includes at least one of the following: at least one feature data item for male gender prediction, and at least one feature data item for female gender prediction. That is, dimensionality is reduced by gender proportion.
Each feature data item for the male gender includes preset target websites and the visit count of each preset target website. Each preset target website is a website for which, in the training process, the male proportion among its visitors is greater than or equal to the male proportion among the training users. The target websites in one feature data item are the websites whose visitor male proportion falls within one preset male-proportion range.
Each feature data item for the female gender includes preset target websites and the visit count of each preset target website. Each preset target website is a website for which, in the training process, the female proportion among its visitors is greater than or equal to the female proportion among the training users.
A corresponding component user prediction model is then built for each feature data item of the Internet access record data using the preset modeling method.
As another example, when the preset types of data in the training process include application usage record data for preset applications, the feature data of the application usage record data likewise includes at least one feature data item for male gender prediction and at least one feature data item for female gender prediction.
Each feature data item for the male gender includes preset target applications and the access count of each preset target application. Each preset target application is an application for which, in the training process, the male proportion among its users is greater than or equal to the male proportion among the training users. The target applications in one feature data item are the applications whose user male proportion falls within one preset male-proportion range.
Each feature data item for the female gender includes preset target applications and the access count of each preset target application. Each preset target application is an application for which, in the training process, the female proportion among its users is greater than or equal to the female proportion among the training users.
A corresponding component user prediction model is then built for each feature data item of the application usage record data using the preset modeling method.
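The grouping by gender proportion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the `visits` and `gender` dictionaries, and the `bin_edges` ranges are all hypothetical, and the male proportion of a website is assumed to be the fraction of its distinct visitors who are male. The same grouping would apply to applications.

```python
from collections import defaultdict


def male_share_by_site(visits, gender):
    """Fraction of each site's distinct visitors who are male (assumed definition)."""
    counts = defaultdict(lambda: [0, 0])  # site -> [male visitors, all visitors]
    for user, sites in visits.items():
        for site in sites:
            counts[site][1] += 1
            if gender[user] == "M":
                counts[site][0] += 1
    return {site: males / total for site, (males, total) in counts.items()}


def male_feature_groups(visits, gender, bin_edges):
    """Group male-leaning sites into one feature data item per proportion range."""
    base = sum(g == "M" for g in gender.values()) / len(gender)
    groups = defaultdict(list)
    for site, share in sorted(male_share_by_site(visits, gender).items()):
        if share < base:
            continue  # not a target site for the male-oriented features
        for low, high in zip(bin_edges, bin_edges[1:]):
            is_last = high == bin_edges[-1]
            # Ranges are half-open, except that the last one includes its upper edge.
            if low <= share < high or (is_last and share == high):
                groups[(low, high)].append(site)
                break
    return dict(groups)


def group_features(user_visits, sites):
    """Visit counts of one user on the sites of one group (0 if never visited)."""
    return [user_visits.get(site, 0) for site in sites]


# Hypothetical training data: user -> {site: visit count}, user -> known gender.
visits = {
    "u1": {"a.example": 3, "b.example": 1},
    "u2": {"a.example": 2},
    "u3": {"b.example": 5, "c.example": 1},
    "u4": {"c.example": 4},
}
gender = {"u1": "M", "u2": "M", "u3": "F", "u4": "F"}
groups = male_feature_groups(visits, gender, bin_edges=[0.5, 0.75, 1.0])
```

Each key of `groups` corresponds to one feature data item, and therefore to one component user prediction model. No website at or above the baseline male proportion is dropped; it only shares a feature data item with websites of similar male proportion.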
In this embodiment, the number of feature data items extracted from a given type of data, and the number of corresponding component user prediction models, can be set flexibly. For example, the feature data may include only feature data items for male gender prediction, in any number; or only feature data items for female gender prediction, in any number; or both feature data items for male gender prediction and feature data items for female gender prediction.
The outputs of the full-data user prediction model and the component user prediction models are then fused and, together with a preset male probability threshold and a preset female probability threshold, used to predict the gender of the training users. The prediction result of each training user is compared with that user's actual gender, and the male and female probability thresholds are adjusted according to the comparison until the male-to-female ratio obtained from the predictions is equal or close to the actual male-to-female ratio among the training users.
In this embodiment, when the fused outputs of the full-data user prediction model and the component user prediction models are used for gender prediction, the resulting gender probability value is compared with the thresholds above: a value higher than the male probability threshold is judged male, a value lower than the female probability threshold is judged female, and a value greater than or equal to the female probability threshold and less than or equal to the male probability threshold is judged neutral. That is, users whose gender is ambiguous are placed in a reserved neutral class.
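The three-way decision can be sketched as below. The function name and the example threshold values are assumptions for illustration; the sketch also assumes that the gender probability value expresses how likely the user is to be male, so a high value means male and a low value means female.

```python
def decide_gender(probability, male_threshold, female_threshold):
    """Map a fused gender probability value to male, female, or neutral."""
    if female_threshold > male_threshold:
        raise ValueError("female_threshold must not exceed male_threshold")
    if probability > male_threshold:
        return "male"
    if probability < female_threshold:
        return "female"
    # Values between the two thresholds (inclusive) stay in the neutral class.
    return "neutral"


# Hypothetical thresholds; in the text they are tuned during training.
labels = [decide_gender(p, 0.6, 0.4) for p in (0.8, 0.6, 0.5, 0.4, 0.3)]
```

Widening the gap between the two thresholds moves more users into the neutral class; narrowing it forces more users into male or female.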
For example, in one example, the full-data prediction result and the component prediction results are gender probability values. Fusing the full-data prediction result and the component prediction results to obtain the final gender prediction result is shown in Fig. 1 and includes:
S101: for each training user, obtain the gender probability value output for that training user by the full-data user prediction model and the gender probability value output by each component user prediction model;
S102: calculate the mean of the obtained gender probability values to obtain a gender prediction probability value;
S103: compare the obtained gender prediction probability value with the male probability threshold and the female probability threshold set in the training process to obtain the final gender prediction result of the training user.
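Steps S101 to S103 can be sketched as follows. The function names and the numeric values are hypothetical, and the sketch assumes that every model outputs a probability of the user being male.

```python
def fuse_by_mean(full_probability, component_probabilities):
    """S102: mean of the full-data output and all component outputs."""
    values = [full_probability, *component_probabilities]
    return sum(values) / len(values)


def predict_by_mean(full_probability, component_probabilities,
                    male_threshold, female_threshold):
    """S103: compare the fused value with the two thresholds."""
    fused = fuse_by_mean(full_probability, component_probabilities)
    if fused > male_threshold:
        return "male"
    if fused < female_threshold:
        return "female"
    return "neutral"


# S101: hypothetical outputs of one full-data model and two component models.
result = predict_by_mean(0.9, [0.7, 0.8], male_threshold=0.6, female_threshold=0.4)
```

With an unweighted mean, every model has the same influence on the result, so one strongly disagreeing component model can pull a user into the neutral class.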
As another example, the full-data prediction result and the component prediction results are the flag values 1, 0, and -1, which indicate that a training user is male, neutral, or female respectively.
Fusing the full-data prediction result and the component prediction results to obtain the final gender prediction result is shown in Fig. 2 and includes:
S201: for each training user, obtain the product of the flag value output for that training user by the full-data user prediction model and the prediction accuracy of the full-data user prediction model, and obtain the product of the flag value output by each component user prediction model and the prediction accuracy of that component user prediction model; the prediction accuracy of the full-data user prediction model and of each component user prediction model is obtained in the training process by comparing the output of each model with the training data;
S202: calculate the sum of the obtained products to obtain a gender prediction probability value;
S203: compare the gender prediction probability value with the male probability threshold and the female probability threshold set in the training process to obtain the final gender prediction result of the training user.
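Steps S201 to S203 can be sketched as follows. The function names, accuracies, and thresholds are hypothetical values chosen for illustration only.

```python
def fuse_by_accuracy(full_output, component_outputs):
    """S201/S202: sum of flag * accuracy over all models.

    Each output is a (flag, accuracy) pair, where flag is 1 (male),
    0 (neutral), or -1 (female) and accuracy was measured in training.
    """
    outputs = [full_output, *component_outputs]
    for flag, _ in outputs:
        if flag not in (1, 0, -1):
            raise ValueError("flag must be 1, 0, or -1")
    return sum(flag * accuracy for flag, accuracy in outputs)


def label_from_score(score, male_threshold, female_threshold):
    """S203: compare the fused score with the two thresholds."""
    if score > male_threshold:
        return "male"
    if score < female_threshold:
        return "female"
    return "neutral"


# Hypothetical case: the full-data model and one component model say male,
# a less accurate component model says female.
score = fuse_by_accuracy((1, 0.8), [(1, 0.7), (-1, 0.6)])
label = label_from_score(score, male_threshold=0.5, female_threshold=-0.5)
```

Although the text calls the sum a gender prediction probability value, it is a signed score that can fall outside [0, 1], so the two thresholds would have to be chosen on that scale (here, assumed values of 0.5 and -0.5).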
Through the multi-model fusion described above, this embodiment improves the accuracy of gender prediction without increasing model dimensionality.
In this embodiment, to further improve the accuracy of gender prediction, a sliding time window can also be set so that earlier prediction results are corrected. The model is rebuilt in each time window to obtain the users' gender prediction results for that window. The gender prediction results from the individual time windows are then fused to obtain each user's final gender prediction result.
When a sliding time window is used, the feature data used for gender prediction that is extracted in the training process from each preset type of data of each training user is the feature data extracted within the current gender prediction time window.
In this case, after the full-data prediction result and the component prediction results are fused to obtain the final gender prediction result of each training user, the method further includes: for each training user, matching the final gender prediction result of the current prediction time window against the final gender prediction result of at least one earlier prediction time window, and adjusting the training user's final gender prediction result according to the matching result.
Matching the final gender prediction result of the current prediction time window against the final gender prediction result of at least one earlier prediction time window includes, but is not limited to, the following two modes:
Mode one: the final gender prediction result of the current prediction time window is matched against the final gender prediction result of the previous prediction time window;
adjusting the training user's final gender prediction result according to the matching result includes:
if the matching result is that the genders are the same, strengthening the training user's final gender prediction result;
if the matching result is that the genders are opposite, weakening the training user's final gender prediction result;
otherwise, keeping the strength of the training user's final gender prediction result unchanged;
Mode two: the final gender prediction result of the current prediction time window is matched against the final gender prediction results of all earlier prediction time windows;
adjusting the training user's final gender prediction result according to the matching result includes: taking the final gender prediction result that occurs most often for the training user across the prediction time windows as the training user's current, latest final gender prediction result.
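The two matching modes can be sketched as follows. The text does not say how a result is strengthened or weakened, so the integer `strength` and its `step` are assumptions made for illustration, as are the function names. The sketch also treats a match that involves a neutral result as the "otherwise" case.

```python
from collections import Counter


def adjust_with_previous(current, previous, strength, step=1):
    """Mode one: compare the current window with the previous window only."""
    if "neutral" in (current, previous):
        return strength          # neither same nor opposite: keep as is
    if current == previous:
        return strength + step   # same gender: strengthen
    return strength - step       # opposite gender: weaken


def majority_over_windows(window_labels):
    """Mode two: the most frequent result over all windows so far."""
    return Counter(window_labels).most_common(1)[0][0]


# Hypothetical history of one user's final results, oldest window first.
history = ["male", "female", "male", "neutral", "male"]
latest = majority_over_windows(history)
```

If two results are equally frequent, `Counter.most_common` returns the one that appeared first in the history; the text does not specify a tie-breaking rule, so this behavior is only a side effect of the sketch.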
As it can be seen that the present embodiment can also repeatedly merge obtained prediction result by way of above-mentioned time slip-window Amendment, further to promote the accuracy of user's gender prediction's result.
Embodiment two:
Based on the full-data user prediction model and the at least one component user prediction model obtained through the training process of Embodiment one, this embodiment describes the process of predicting the gender of users to be predicted. As shown in Fig. 3, the user gender prediction method provided by this embodiment includes:
S301: extract, from each preset type of data of each user to be predicted, the feature data used for gender prediction.
The preset types of data in this embodiment can be the types of data set in the training process, for example including but not limited to at least one of Internet access record data for preset websites, application usage record data, call data, and Internet usage habit data. When the Internet access record data or the application usage record data is included, a corresponding component user prediction model can be built for each feature data item of the Internet access record data or of the application usage record data.
S302: input the extracted feature data into a preset full-data user prediction model to obtain a full-data gender prediction result for each user to be predicted, and input each feature data item of at least one preset type of data into the preset component user prediction model corresponding to that feature data item to obtain a component gender prediction result for at least one user to be predicted.
The full-data user prediction model in this embodiment is obtained in the training process by training on the feature data of each preset type of data of the training users, and each component user prediction model is obtained in the training process by training on a feature data item of at least one preset type of data of the training users. See Embodiment one for details, which are not repeated here.
S303: fuse the full-data prediction result and the component prediction result to obtain the final gender prediction result of each user to be predicted.
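The flow of S301 to S303 can be sketched end to end as below. The models are stand-in callables rather than trained models, and the feature names, values, and thresholds are hypothetical. Mean fusion is used here only as one of the fusion modes; the sketch assumes each model returns the probability that the user is male.

```python
def predict_gender(features, full_model, component_models,
                   male_threshold, female_threshold):
    """Run the full-data model and the component models, then fuse their outputs.

    features: feature data item name -> list of numbers (result of S301).
    component_models: feature data item name -> model for that item only.
    """
    # S302: the full-data model receives every extracted feature data item.
    all_values = [v for name in sorted(features) for v in features[name]]
    probabilities = [full_model(all_values)]
    # S302: each component model receives only its own feature data item.
    for name, model in component_models.items():
        if name in features:
            probabilities.append(model(features[name]))
    # S303: fuse by mean and compare with the two thresholds.
    fused = sum(probabilities) / len(probabilities)
    if fused > male_threshold:
        return "male"
    if fused < female_threshold:
        return "female"
    return "neutral"


# Hypothetical feature data and stand-in models for one user to be predicted.
features = {"male_sites_group_1": [3.0, 1.0], "call_data": [12.0, 0.4]}
full_model = lambda values: 0.9
component_models = {"male_sites_group_1": lambda values: 0.7}
label = predict_gender(features, full_model, component_models, 0.6, 0.4)
```

A user with no data for a given feature data item simply gets no output from that component model, which matches the wording that component results are obtained for at least one user to be predicted.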
In one example, when the preset types of data include Internet access record data for websites preset in the training process, the feature data extracted from the Internet access record data includes at least one of the following: at least one feature data item for male gender prediction, and at least one feature data item for female gender prediction.
Each feature data item for the male gender includes preset target websites and the visit count of each preset target website. Each preset target website is a website for which, in the training process, the male proportion among its visitors is greater than or equal to the male proportion among the training users. The target websites in one feature data item are the websites whose visitor male proportion falls within one preset male-proportion range.
Each feature data item for the female gender includes preset target websites and the visit count of each preset target website. Each preset target website is a website for which, in the training process, the female proportion among its visitors is greater than or equal to the female proportion among the training users.
In this case, inputting each feature data item of at least one preset type of data into the preset component user prediction model corresponding to that feature data item includes:
inputting each feature data item of the Internet access record data into the preset component user prediction model corresponding to that feature data item.
In another example, the preset types of data include application usage record data for applications preset in the training process; the feature data of the application usage record data includes at least one feature data item for male gender prediction and at least one feature data item for female gender prediction.
Each feature data item for the male gender includes preset target applications and the access count of each preset target application. Each preset target application is an application for which, in the training process, the male proportion among its users is greater than or equal to the male proportion among the training users. The target applications in one feature data item are the applications whose user male proportion falls within one preset male-proportion range.
Each feature data item for the female gender includes preset target applications and the access count of each preset target application. Each preset target application is an application for which, in the training process, the female proportion among its users is greater than or equal to the female proportion among the training users.
Inputting each feature data item of at least one preset type of data into the preset component user prediction model corresponding to that feature data item includes:
inputting each feature data item of the application usage record data into the preset component user prediction model corresponding to that feature data item.
In one example, the preset data types may also include at least one of call data and internet usage habit data;
The feature data of the call data includes at least one of: number of contacts, total called duration, total calling duration, number of times called, number of calls made, total number of calls, the quotient of the number of times called and the total number of calls, and the quotient of the number of calls made and the total number of calls;
The feature data of the internet usage habit data includes at least one of: the online probability in each preset online statistics period, and the information entropy of internet access in each internet information statistics period.
In one example of this embodiment, the full prediction result and the component prediction results are gender probability values of each user to be predicted; in this case, fusing the full prediction result and the component prediction results to obtain the final gender prediction result of each user to be predicted is shown in Figure 4 and includes:
S401: for each user to be predicted, obtain the gender probability value output by the full user prediction model for that user and the gender probability value output by each component user prediction model;
S402: calculate the mean of the obtained gender probability values to obtain a gender prediction probability value;
S403: compare the gender prediction probability value with the male probability threshold and the female probability threshold set during training to obtain the final gender prediction result of the user to be predicted.
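Step S402 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `fuse_by_mean` and the sample probability values are hypothetical, and it assumes that every model reports its probability on the same scale.

```python
from statistics import mean


def fuse_by_mean(full_prob, component_probs):
    """Average the full-model probability with the component-model probabilities."""
    # Assumption: only the component models that produced an output for this
    # user are passed in; the list may be empty.
    return mean([full_prob, *component_probs])


# Hypothetical outputs of the full model and two component models for one user.
fused = fuse_by_mean(0.9, [0.7, 0.8])
```

The fused value is then compared with the two thresholds in step S403.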
In another example of this embodiment, the full prediction result and the component prediction results are the identification values 1, 0, and -1, which respectively indicate that a user to be predicted is male, neutral, or female; in this case, fusing the full prediction result and the component prediction results to obtain the final gender prediction result of each user to be predicted is shown in Figure 5 and includes:
S501: for each user to be predicted, obtain the product of the identification value output by the full user prediction model for that user and the prediction accuracy of the full user prediction model, and obtain the product of the identification value output by each component user prediction model for that user and the prediction accuracy of that component user prediction model; the prediction accuracy of the full user prediction model and of each component user prediction model is obtained during training by comparing each model's output with the training data;
S502: calculate the sum of the obtained products to obtain a gender prediction probability value;
S503: compare the gender prediction probability value with the male probability threshold and the female probability threshold set during training to obtain the final gender prediction result of the user to be predicted.
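Steps S501 and S502 amount to an accuracy-weighted sum of the identification values. The sketch below assumes hypothetical accuracy figures and a hypothetical function name `fuse_by_accuracy`; the source does not state how accuracies are stored.

```python
def fuse_by_accuracy(full_label, full_accuracy, component_outputs):
    """Sum label * accuracy over the full model and every component model.

    Labels follow the source convention: 1 = male, 0 = neutral, -1 = female.
    component_outputs is a list of (label, accuracy) pairs.
    """
    score = full_label * full_accuracy
    for label, accuracy in component_outputs:
        score += label * accuracy
    return score


# Hypothetical example: the full model says male, one component model agrees,
# one says female, and one is neutral.
score = fuse_by_accuracy(1, 0.8, [(1, 0.9), (-1, 0.6), (0, 0.7)])
```

A neutral output contributes nothing to the sum, so a model that abstains does not pull the score in either direction.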
In one implementation, the male probability threshold is set greater than the female probability threshold; in this case, comparing the gender prediction probability value with the male probability threshold and the female probability threshold set during training to obtain the final gender prediction result of the user to be predicted includes:
when the gender prediction probability value is greater than the male probability threshold, determining that the final gender prediction result of the corresponding user to be predicted is male;
when the gender prediction probability value is less than the female probability threshold, determining that the final gender prediction result of the corresponding user to be predicted is female;
when the gender prediction probability value is greater than or equal to the female probability threshold and less than or equal to the male probability threshold, determining that the final gender prediction result of the corresponding user to be predicted is neutral.
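The three-way comparison above can be written as a small function. The name `decide_gender` and the threshold values in the usage line are hypothetical; the source only requires that the male threshold be greater than the female threshold.

```python
def decide_gender(score, male_threshold, female_threshold):
    """Map a fused score to 'male', 'female', or 'neutral'."""
    if male_threshold <= female_threshold:
        raise ValueError("male threshold must be greater than female threshold")
    if score > male_threshold:
        return "male"
    if score < female_threshold:
        return "female"
    # Scores on or between the two thresholds are treated as neutral.
    return "neutral"


result = decide_gender(0.8, male_threshold=0.7, female_threshold=0.3)
```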
In this embodiment, before the feature data used for gender prediction is extracted from each preset data type of each user to be predicted, the method further includes judging whether a preset gender prediction time window has arrived; the feature data extracted from each preset data type of each user to be predicted for gender prediction is the feature data within the current gender prediction time window;
After the full prediction result and the component prediction results are fused to obtain the final gender prediction result of each user to be predicted, the method further includes:
for each user to be predicted, matching the final gender prediction result of the current prediction time window against the final gender prediction result of at least one previous prediction time window, and adjusting the final gender prediction result of the user to be predicted according to the matching result.
Matching the final gender prediction result of the current prediction time window against the final gender prediction result of at least one previous prediction time window may be: matching the final gender prediction result of the current prediction time window against the final gender prediction result of the immediately preceding prediction time window;
In this case, adjusting the final gender prediction result of the user to be predicted according to the matching result includes:
if the matching result is that the genders are the same, strengthening the final gender prediction result of the user to be predicted;
if the matching result is that the genders are opposite, weakening the final gender prediction result of the user to be predicted;
otherwise, keeping the strength of the final gender prediction result of the user to be predicted unchanged;
Or,
Matching the final gender prediction result of the current prediction time window against the final gender prediction result of at least one previous prediction time window may be: matching the final gender prediction result of the current prediction time window against the final gender prediction results of all previous prediction time windows;
In this case, adjusting the final gender prediction result of the user to be predicted according to the matching result may include: taking the final gender prediction result that occurs most often for the user across all prediction time windows as the user's latest final gender prediction result.
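Both adjustment strategies can be sketched in a few lines. The names `adjust_strength` and `majority_result` are hypothetical, as is the numeric "strength" counter. Two further assumptions are made here that the source does not settle: two neutral results in a row count as the "otherwise" case, and ties in the majority vote go to the result seen first.

```python
from collections import Counter


def adjust_strength(current, previous, strength, step=1):
    """Strengthen on agreement, weaken on opposite results, else leave unchanged.

    current and previous use the source labels 1 (male), 0 (neutral), -1 (female).
    """
    if current != 0 and current == previous:
        return strength + step
    if current != 0 and current == -previous:
        return strength - step
    return strength


def majority_result(results):
    """Return the result that occurs most often across all prediction windows."""
    return Counter(results).most_common(1)[0][0]


latest = majority_result(["male", "female", "male"])
```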
It can be seen that the scheme provided by this embodiment has at least the following advantages:
1. Fusing the full user prediction model with the component user prediction models improves the prediction accuracy of the model. In testing, the male-to-female ratio among telecom users was about 2:1, with women accounting for 32%; with a conventional method the prediction accuracy for women was below 40%, whereas with the combined model it rose to 58%. Overall accuracy rose from 72% to 80%.
2. Reducing dimensionality by gender ratio speeds up model training. The feature dimension is greatly reduced while as much information as possible is retained, which greatly reduces the dimension of the model input matrix.
3. Introducing the time window concept reduces the adverse effect on the model of the rise and fall of websites and apps and of changes in such data, ultimately improving the accuracy of the model.
Embodiment three:
This embodiment provides a user gender prediction device which, as shown in Figure 6, includes:
Data extraction module 61, configured to extract, from each preset data type of each user to be predicted, the feature data used for gender prediction. The preset data types in this embodiment can be the data types set during training, including but not limited to at least one of internet access record data for websites preset during training, application usage record data, call data, and internet usage habit data. When internet access record data or application usage record data is included, a corresponding component user prediction model can be established for each feature data item of the internet access record data or of the application usage record data.
Model processing module 62, configured to substitute the feature data extracted by the data extraction module into the preset full user prediction model to obtain the full gender prediction result of each user to be predicted, and to substitute each feature data item of at least one preset data type into the preset component user prediction model corresponding to that feature data item to obtain the component gender prediction result of at least one user to be predicted. The full user prediction model in this embodiment is trained during training on the feature data of each preset data type of the training users, and each component user prediction model is trained during training on one feature data item of at least one preset data type of the training users. For details, see embodiment one; they are not repeated here.
Prediction processing module 63, configured to fuse the full prediction result and the component prediction results to obtain the final gender prediction result of each user to be predicted.
In one example, the preset data types include internet access record data for websites preset during training; in this case, the feature data of the internet access record data extracted by data extraction module 61 includes at least one feature data item for male gender prediction and at least one feature data item for female gender prediction;
Each feature data item preset for male gender prediction includes preset target websites and the access volume of each preset target website; each preset target website is a website for which, during training, the proportion of men among its visitors is greater than or equal to the proportion of men among the training users, and the target websites in one feature data item are the websites whose male visitor proportion falls within a preset male-proportion range;
Each feature data item preset for female gender prediction includes preset target websites and the access volume of each preset target website; each preset target website is a website for which, during training, the proportion of women among its visitors is greater than or equal to the proportion of women among the training users;
In this case, model processing module 62 substituting each feature data item of at least one preset data type into the preset component user prediction model corresponding to that feature data item includes:
substituting each feature data item included in the internet access record data into the preset component user prediction model corresponding to that feature data item.
In another example, the preset data types include usage record data of applications preset during training; in this case, the feature data of the application usage record data extracted by data extraction module 61 includes at least one feature data item for male gender prediction and at least one feature data item for female gender prediction;
Each feature data item preset for male gender prediction includes preset target applications and the access volume of each preset target application; each preset target application is an application for which, during training, the proportion of men among its users is greater than or equal to the proportion of men among the training users, and the target applications in one feature data item are the applications whose male user proportion falls within a preset male-proportion range;
Each feature data item preset for female gender prediction includes preset target applications and the access volume of each preset target application; each preset target application is an application for which, during training, the proportion of women among its users is greater than or equal to the proportion of women among the training users;
In this case, model processing module 62 substituting each feature data item of at least one preset data type into the preset component user prediction model corresponding to that feature data item includes:
substituting each feature data item included in the application usage record data into the preset component user prediction model corresponding to that feature data item.
In one example, the preset data types may also include at least one of call data and internet usage habit data;
The feature data of the call data extracted by data extraction module 61 includes at least one of: number of contacts, total called duration, total calling duration, number of times called, number of calls made, total number of calls, the quotient of the number of times called and the total number of calls, and the quotient of the number of calls made and the total number of calls;
The feature data of the internet usage habit data extracted by data extraction module 61 includes at least one of: the online probability in each preset online statistics period, and the information entropy of internet access in each internet information statistics period.
In one example of this embodiment, the full prediction result and the component prediction results are gender probability values of each user to be predicted; in this case, prediction processing module 63 fusing the full prediction result and the component prediction results to obtain the final gender prediction result of each user to be predicted includes:
prediction processing module 63, for each user to be predicted, obtains the gender probability value output by the full user prediction model for that user and the gender probability value output by each component user prediction model;
prediction processing module 63 calculates the mean of the obtained gender probability values to obtain a gender prediction probability value;
prediction processing module 63 compares the gender prediction probability value with the male probability threshold and the female probability threshold set during training to obtain the final gender prediction result of the user to be predicted.
In another example of this embodiment, the full prediction result and the component prediction results are the identification values 1, 0, and -1, which respectively indicate that a user to be predicted is male, neutral, or female; in this case, prediction processing module 63 fusing the full prediction result and the component prediction results to obtain the final gender prediction result of each user to be predicted includes:
prediction processing module 63, for each user to be predicted, obtains the product of the identification value output by the full user prediction model for that user and the prediction accuracy of the full user prediction model, and obtains the product of the identification value output by each component user prediction model for that user and the prediction accuracy of that component user prediction model; the prediction accuracy of the full user prediction model and of each component user prediction model is obtained during training by comparing each model's output with the training data;
prediction processing module 63 calculates the sum of the obtained products to obtain a gender prediction probability value;
prediction processing module 63 compares the gender prediction probability value with the male probability threshold and the female probability threshold set during training to obtain the final gender prediction result of the user to be predicted.
In one implementation, the male probability threshold is set greater than the female probability threshold; in this case, prediction processing module 63 comparing the gender prediction probability value with the male probability threshold and the female probability threshold set during training to obtain the final gender prediction result of the user to be predicted includes:
when the gender prediction probability value is greater than the male probability threshold, determining that the final gender prediction result of the corresponding user to be predicted is male;
when the gender prediction probability value is less than the female probability threshold, determining that the final gender prediction result of the corresponding user to be predicted is female;
when the gender prediction probability value is greater than or equal to the female probability threshold and less than or equal to the male probability threshold, determining that the final gender prediction result of the corresponding user to be predicted is neutral.
In this embodiment, before data extraction module 61 extracts the feature data used for gender prediction from each preset data type of each user to be predicted, it also judges whether a preset gender prediction time window has arrived; the feature data extracted from each preset data type of each user to be predicted for gender prediction is the feature data within the current gender prediction time window;
After prediction processing module 63 fuses the full prediction result and the component prediction results to obtain the final gender prediction result of each user to be predicted, the following is further included:
prediction processing module 63, for each user to be predicted, matches the final gender prediction result of the current prediction time window against the final gender prediction result of at least one previous prediction time window, and adjusts the final gender prediction result of the user to be predicted according to the matching result.
Prediction processing module 63 matching the final gender prediction result of the current prediction time window against the final gender prediction result of at least one previous prediction time window may be: matching the final gender prediction result of the current prediction time window against the final gender prediction result of the immediately preceding prediction time window;
In this case, prediction processing module 63 adjusting the final gender prediction result of the user to be predicted according to the matching result includes:
if the matching result is that the genders are the same, strengthening the final gender prediction result of the user to be predicted;
if the matching result is that the genders are opposite, weakening the final gender prediction result of the user to be predicted;
otherwise, keeping the strength of the final gender prediction result of the user to be predicted unchanged;
Or,
Prediction processing module 63 matching the final gender prediction result of the current prediction time window against the final gender prediction result of at least one previous prediction time window may be: matching the final gender prediction result of the current prediction time window against the final gender prediction results of all previous prediction time windows;
In this case, prediction processing module 63 adjusting the final gender prediction result of the user to be predicted according to the matching result may include: taking the final gender prediction result that occurs most often for the user across all prediction time windows as the user's latest final gender prediction result.
The functions of the above modules in this embodiment can be implemented by the processor of the user gender prediction device. The user gender prediction device provided by this embodiment fuses the full user prediction model with the component user prediction models, which improves the prediction accuracy of the model. Reducing dimensionality by gender ratio also speeds up model training: the feature dimension is greatly reduced while as much information as possible is retained, which greatly reduces the dimension of the model input matrix. In addition, introducing the time window concept reduces the adverse effect on the model of the rise and fall of websites and apps and of changes in such data, ultimately improving the accuracy of the model.
Embodiment four:
This embodiment provides user gender prediction equipment which, as shown in Figure 7, includes a processor 71, a memory 72, and a communication bus 73;
Communication bus 73 is used to implement the communication connection between processor 71 and memory 72;
Processor 71 is configured to execute the user gender prediction program stored in memory 72 to implement the steps of the user gender prediction method in embodiments one and two. It should be understood that the user gender prediction equipment in this embodiment can be a server set up by an operator, or another device.
For ease of understanding, this embodiment is illustrated with two concrete implementations.
Example one:
The process of implementing user gender prediction in this example is as follows:
Data set selection:
Set the gender prediction time window. Based on data analysis, this example uses two months of each user's internet access records, app usage, call details, and internet usage habit data.
The gender prediction time window slides in such a way that windows do not overlap. For example, if one window is 2017-04-01 to 2017-05-31, the previous window is 2017-02-01 to 2017-03-31 and the next window is 2017-06-01 to 2017-07-31.
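The non-overlapping windows in this example can be generated with simple month arithmetic. This is a sketch under the assumption that every window starts on the first day of a month; the helper names `add_months` and `prediction_window` are hypothetical.

```python
from datetime import date, timedelta


def add_months(first_of_month, months):
    """Shift a first-of-month date by a (possibly negative) number of months."""
    index = first_of_month.year * 12 + (first_of_month.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def prediction_window(anchor_start, months, offset):
    """Return (first day, last day) of the window `offset` steps from the anchor."""
    first = add_months(anchor_start, months * offset)
    # The last day is the day before the next window starts, so windows never overlap.
    last = add_months(first, months) - timedelta(days=1)
    return first, last


anchor = date(2017, 4, 1)
previous_window = prediction_window(anchor, 2, -1)
current_window = prediction_window(anchor, 2, 0)
next_window = prediction_window(anchor, 2, 1)
```

Deriving the end date from the start of the following window avoids hard-coding month lengths.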
Feature extraction:
Along the time dimension, all user data is first aggregated into daily data (one record per day), then into monthly data (one record per month), and then into two-month data (one record per two months).
Call detail feature extraction. The feature data extracted mainly includes, but is not limited to: number of contacts, total called duration, total calling duration, number of times called, number of calls made, number of times called divided by total number of calls, and number of calls made divided by total number of calls.
Internet usage habit feature extraction. The features extracted mainly include, but are not limited to: the online probability in each hour of the 24-hour day (the online statistics period is one hour), and the information entropy of daily internet access (the internet information statistics period is one day), which describes how dispersed the user's online time is.
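The two internet usage habit features can be sketched as follows. The source does not define how the hourly online probability is computed, so this sketch assumes it is the share of a user's sessions that fall in each hour; the function names and the sample hours are hypothetical. The entropy is the standard Shannon entropy in bits.

```python
import math
from collections import Counter


def hourly_probabilities(session_hours):
    """Share of sessions falling in each of the 24 hours of the day."""
    total = len(session_hours)
    if total == 0:
        return [0.0] * 24
    counts = Counter(session_hours)
    return [counts.get(hour, 0) / total for hour in range(24)]


def entropy(probabilities):
    """Shannon entropy in bits; higher values mean more dispersed online time."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical user who is online equally often at 08:00 and 20:00.
probs = hourly_probabilities([8, 8, 20, 20])
spread = entropy(probs)
```

A user who is always online in the same hour has entropy 0, while a user spread evenly over all 24 hours reaches the maximum of log2(24) bits.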
Internet access record feature extraction. The features are processed as follows:
From each website URL (uniform resource locator), only the host part (the IP address or host name that reflects the site address) is kept, and the number of visits by every sample in the training set to each host-level website is obtained. This user-website visit count matrix is denoted the user-host matrix;
The male proportion of each website is counted to obtain website and male-proportion data. Websites with the same male proportion are grouped into one website class, with the male proportion accurate to 0.001, which yields 1000 website classes, denoted host-set. The set has 1000 elements; the first value of each element is the male proportion (to three decimal places) and the remaining values are the names of the websites with that male proportion;
Each column of the user-host matrix is standardized, that is, the data of each website is standardized to mean 0 and variance 1. The standardized data of websites that belong to the same class in host-set is then summed to obtain the user by 1000-website-class feature matrix, denoted the user-hostset matrix.
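The three steps above (host extraction, grouping by male proportion, standardize-then-sum) can be sketched with the standard library. Everything named here is hypothetical: the function names, the `.example` hosts, and the sample users. The sketch assumes the male proportion of a website is the share of men among the users who visited it, and it skips constant columns because they cannot be standardized.

```python
from collections import defaultdict
from statistics import mean, pstdev
from urllib.parse import urlsplit


def host_of(url):
    """Keep only the host part of a URL (lower-cased, without port or path)."""
    parts = urlsplit(url if "//" in url else "//" + url)
    return parts.hostname or ""


def build_user_host(visits):
    """visits: iterable of (user, url, count) -> {user: {host: total visits}}."""
    matrix = defaultdict(lambda: defaultdict(int))
    for user, url, count in visits:
        matrix[user][host_of(url)] += count
    return {user: dict(row) for user, row in matrix.items()}


def male_share_by_host(user_host, is_male):
    """Share of men among the visitors of each host, rounded to 0.001."""
    visitors = defaultdict(list)
    for user, row in user_host.items():
        for host, count in row.items():
            if count > 0:
                visitors[host].append(is_male[user])
    return {host: round(sum(flags) / len(flags), 3) for host, flags in visitors.items()}


def build_user_hostset(user_host, share_of_host):
    """Standardize each host column, then sum the columns with the same male share."""
    users = sorted(user_host)
    result = {user: defaultdict(float) for user in users}
    for host, share in share_of_host.items():
        column = [user_host[user].get(host, 0) for user in users]
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            continue  # a constant column cannot be standardized; skipped here
        for user, value in zip(users, column):
            result[user][share] += (value - mu) / sigma
    return {user: dict(row) for user, row in result.items()}


visits = [
    ("u1", "http://a.example/news", 3),
    ("u2", "http://b.example/shop", 2),
    ("u3", "a.example/sports", 1),
]
is_male = {"u1": True, "u2": False, "u3": True}
user_host = build_user_host(visits)
shares = male_share_by_host(user_host, is_male)
user_hostset = build_user_hostset(user_host, shares)
```

Summing standardized columns per class is what shrinks the input from one column per website to at most one column per distinct male proportion.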
Assuming that men account for 68% and women for 32% of the data set, the following operations produce 6 data sets:
Select the websites in the user-host matrix that have more than 5 and fewer than 500 users and whose male proportion in host-set is greater than 90%, then select the users who browsed those websites; with each user's access volume to each website as the values of the user-web matrix, this gives dataset1 (the first feature data item of the internet access record data);
Select the websites in the user-host matrix that have at least 500 and fewer than 5000 users and whose male proportion is greater than 80%, then select the users who browsed those websites; with each user's access volume to each website as the values of the user-web matrix, this gives dataset2 (the second feature data item of the internet access record data);
Select the websites that have more than 5000 users and whose male proportion is greater than 75%, then select the users who browsed those websites; with each user's access volume to each website as the values of the user-web matrix, this gives dataset3 (the third feature data item of the internet access record data);
Select the websites in the user-host matrix that have fewer than 500 users and whose female proportion in host-set is greater than 75%, then select the users who browsed those websites; with each user's access volume to each website as the values of the user-web matrix, this gives dataset4 (the fourth feature data item of the internet access record data);
Select the websites in the user-host matrix that have at least 500 and fewer than 5000 users and whose female proportion is greater than 60%, then select the users who browsed those websites; with each user's access volume to each website as the values of the user-web matrix, this gives dataset5 (the fifth feature data item of the internet access record data);
Select the websites that have more than 5000 users and whose female proportion is greater than 50%, then select the users who browsed those websites; with each user's access volume to each website as the values of the user-web matrix, this gives dataset6 (the sixth feature data item of the internet access record data).
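Each of the six data sets follows the same pattern: filter websites by user count and gender proportion, then keep the users who visited them. The sketch below is illustrative only. `select_hosts` and `build_dataset` are hypothetical names, the bounds are treated as strict on both sides (the source mixes strict and inclusive bounds), and the female proportion is assumed to be one minus the male proportion.

```python
def select_hosts(host_stats, low, high, min_share, gender):
    """Pick hosts whose user count lies in (low, high) and whose share of the
    given gender exceeds min_share.

    host_stats maps host -> (number of users, male share).
    """
    chosen = []
    for host, (n_users, male_share) in host_stats.items():
        share = male_share if gender == "male" else 1.0 - male_share
        if low < n_users < high and share > min_share:
            chosen.append(host)
    return sorted(chosen)


def build_dataset(user_host, hosts):
    """Keep users who visited at least one selected host, with their visit counts."""
    wanted = set(hosts)
    dataset = {}
    for user, row in user_host.items():
        kept = {host: count for host, count in row.items() if host in wanted}
        if kept:
            dataset[user] = kept
    return dataset


# Hypothetical statistics and visit counts.
host_stats = {
    "a.example": (100, 0.95),
    "b.example": (100, 0.50),
    "c.example": (3, 0.99),
    "d.example": (200, 0.10),
}
user_host = {"u1": {"a.example": 4, "b.example": 1}, "u2": {"d.example": 2}}
male_hosts = select_hosts(host_stats, 5, 500, 0.90, "male")
dataset1 = build_dataset(user_host, male_hosts)
```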
App usage data feature extraction. The processing is similar to that for websites:
The male and female usage proportions of each app are counted, and apps with the same male proportion are grouped into one class, with the male proportion accurate to 0.01, which yields 100 app classes, denoted app-set. The set has 100 elements; the first value of each element is the male proportion (to two decimal places) and the remaining values are the names of the apps with that male proportion;
The user-app access count matrix holds each user's access counts for all apps and is denoted the user-app matrix. Each column of the user-app matrix is standardized, that is, the data of each app is standardized to mean 0 and variance 1. The standardized data of apps that belong to the same class in app-set is then summed to obtain the user by 100-app-class feature matrix, denoted the user-appset matrix.
According to each user's access counts for each app and the male-to-female ratio of each app, 6 data sets are obtained by the following operations:
Select the apps that have more than 5 and fewer than 2000 users and a male proportion greater than 90%, then select the users of those apps to obtain a user-app matrix whose values are the users' access counts for the apps, denoted dataset7 (the first feature data item of the app usage data);
Select the apps that have at least 1000 and fewer than 5000 users and a male proportion greater than 80%, then select the users of those apps to obtain a user-app matrix whose values are the users' access counts for the apps, denoted dataset8 (the second feature data item of the app usage data);
Select the apps that have more than 5000 users and a male proportion greater than 75%, then select the users of those apps to obtain a user-app matrix whose values are the users' access counts for the apps, denoted dataset9 (the third feature data item of the app usage data);
Select the apps that have more than 5 and fewer than 500 users and a female proportion greater than 70%, then select the users of those apps to obtain a user-app matrix whose values are the users' access counts for the apps, denoted dataset10 (the fourth feature data item of the app usage data);
Select the apps that have at least 500 and fewer than 3000 users and a female proportion greater than 60%, then select the users of those apps to obtain a user-app matrix whose values are the users' access counts for the apps, denoted dataset11 (the fifth feature data item of the app usage data);
Select the apps that have more than 3000 users and a female proportion greater than 50%, then select the users of those apps to obtain a user-app matrix whose values are the users' access counts for the apps, denoted dataset12 (the sixth feature data item of the app usage data).
Build the models, with male identified as 1, neutral as 0, and female as -1:
Using all the feature data extracted above, the feature matrix of the full set of users is formed. After comparing the results of logistic regression, support vector machine, random forest, GBDT, and XGBoost models, the XGBoost regression algorithm is selected, and the resulting XGBoost model is denoted model0 (the full user prediction model). In this example the XGBoost learning rate is 0.2, the number of iterations is 50, and the tree depth is 7. The probability value at which female accuracy reaches 55% is selected as the female threshold, and the probability value at which male accuracy reaches 80% is selected as the male threshold. Users are then judged to be male, female, or neutral.
For dataset1-dataset12, a separate XGBoost model is built for each, giving 12 models (the component user prediction models), denoted model1-model12. Here model1-model3 and model7-model9 are male models with high accuracy in predicting men, and model4-model6 and model10-model12 are female models with high accuracy in predicting women. The thresholds of each model are set, with the following result: model1, model4, model7, and model10 have very high accuracy; model2, model5, model8, and model11 come next; model3, model6, model9, and model12 have lower accuracy than the first two groups.
Fuse model0 and model1-model12. Each model outputs a gender result for a user, and a weighted sum of the model outputs is computed. The weights are the prediction accuracy of the full user prediction model and of each component user prediction model, obtained during training by comparing each model's output with the training data.
The final result is compared against -1, 0, and 1 to determine the user's gender.
Adjust the thresholds to obtain the optimal thresholds for dividing gender. Since the male-to-female ratio in the training sample is 68:32, the thresholds are set so that the male-to-female ratio in the prediction results matches that of the sample, which gives the final thresholds.
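One way to carry out this threshold adjustment is to try candidate threshold pairs and keep the pair whose predicted male share is closest to the sample share. The source does not describe the search procedure, so the candidate list, the function names, and the choice to measure the share among non-neutral users only are all assumptions of this sketch.

```python
def predicted_male_share(scores, male_threshold, female_threshold):
    """Share of men among users predicted as male or female (neutral excluded)."""
    males = sum(score > male_threshold for score in scores)
    females = sum(score < female_threshold for score in scores)
    decided = males + females
    return males / decided if decided else None


def pick_thresholds(scores, candidates, target_share):
    """Return the (male, female) threshold pair closest to the target male share."""
    best = None
    for male_threshold, female_threshold in candidates:
        if male_threshold <= female_threshold:
            continue  # the male threshold must stay above the female threshold
        share = predicted_male_share(scores, male_threshold, female_threshold)
        if share is None:
            continue
        gap = abs(share - target_share)
        if best is None or gap < best[0]:
            best = (gap, male_threshold, female_threshold)
    return None if best is None else (best[1], best[2])


# Hypothetical fused scores and candidate pairs; 0.68 is the sample male share.
scores = [0.9, 0.8, 0.7, 0.2, -0.5, -0.9]
chosen = pick_thresholds(scores, [(0.5, -0.3), (0.75, 0.5)], 0.68)
```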
Sliding time window is handled once the every two moon again, corrects the gender prediction of user as a result, to guarantee user's property The accuracy that do not predict, if user gender prediction result twice on the contrary, if weaken the final gender prediction of the user to be predicted As a result, user can be for example classified as to gender bender again, or when being added up using above-mentioned ident value, then subtract 1;If used twice Family gender prediction's result is identical, then enhances, such as plus 1;Other situations can then remain unchanged.
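The correction between two consecutive windows can be sketched as a running strength value. This is an illustrative sketch only: the name `update_strength` is hypothetical, and treating two neutral results as "other cases" is an assumption, since the text does not say how that case is handled.

```python
def update_strength(strength, previous, current):
    """Adjust a user's accumulated strength from two window results.

    Results use the identifier values 1 (male), 0 (neutral), -1 (female).
    """
    if previous != 0 and previous == current:
        return strength + 1   # same gender twice: strengthen
    if previous * current == -1:
        return strength - 1   # opposite genders: weaken
    return strength           # any other combination: unchanged
```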
Comparison of results: according to tests, when the data is processed with existing prediction methods, the prediction accuracy for females is below 40%. With the prediction method of this example, female accuracy can reach 58%, and overall accuracy can be raised from 72% to 80%.
Example two:
The process of user gender prediction in this example is as follows:
Data set selection:
A gender prediction time window is set. Based on data analysis, this example uses three months of each user's internet records and app usage data.
The gender prediction time window slides, and the windows must not overlap. For example, if one window is 2017-04-01 to 2017-06-30, the previous window is 2017-01-01 to 2017-03-31 and the next window is 2017-07-01 to 2017-09-30.
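Non-overlapping three-month windows of this kind can be generated as sketched below. The helper names `window` and `shift` are hypothetical; the sketch relies on the standard `calendar` module to find the real last day of each month, so every window ends on a valid calendar date.

```python
import calendar
from datetime import date


def window(start, months=3):
    """Return (first_day, last_day) of a window of whole months starting at `start`."""
    index = start.year * 12 + (start.month - 1) + months - 1
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return start, date(year, month0 + 1, last_day)


def shift(start, months):
    """Move a window start forward (or backward) by a number of whole months."""
    index = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(index, 12)
    return date(year, month0 + 1, 1)


current = window(date(2017, 4, 1))
previous = window(shift(date(2017, 4, 1), -3))
following = window(shift(date(2017, 4, 1), 3))
```

Because each window starts on the first day of a month and ends on the last day of a month, sliding by exactly the window length guarantees that consecutive windows neither overlap nor leave gaps.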
Extract feature:
Along the time dimension, all user data is first aggregated into daily data (one record per day), then into monthly data, and finally into three-month data.
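The day, month and three-month roll-up can be sketched with plain dictionaries. The record layout (user, day, item), the sample events and the window label are assumptions chosen for the illustration.

```python
from collections import Counter, defaultdict
from datetime import date


def roll_up(totals, coarser):
    """Sum counts keyed by (user, period, item) into a coarser period."""
    out = defaultdict(int)
    for (user, period, item), count in totals.items():
        out[(user, coarser(period), item)] += count
    return dict(out)


# Hypothetical raw events: (user, day, website or app name).
events = [
    ("u1", date(2017, 4, 1), "a.com"),
    ("u1", date(2017, 4, 1), "a.com"),
    ("u1", date(2017, 5, 3), "a.com"),
    ("u2", date(2017, 6, 9), "b.com"),
]

daily = dict(Counter(events))  # one record per user, day and item
monthly = roll_up(daily, lambda d: (d.year, d.month))
quarter = roll_up(monthly, lambda m: "2017-04..2017-06")
```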
Feature extraction from the internet record data. Feature processing method:
Only the host part of each website URL is kept, and the number of visits made by every sample in the training set to each host-level website name is obtained. This user-website visit count matrix is denoted the user-host matrix;
The proportion of male users of each website is computed, giving website and male-proportion data. Websites with the same male proportion are grouped into the same website class, with the male proportion accurate to 0.001, which gives 1000 website classes, denoted host-set. The set has 1000 elements; the first value of each element is the male proportion (accurate to three decimal places) and the remaining values are the names of the websites with that male proportion;
Each column of the user-host matrix is standardized, i.e. the data of each website is standardized to mean 0 and variance 1. The standardized data of websites that belong to the same class in host-set is then summed, giving the user by 1000-website-class feature matrix, denoted the user-hostset matrix.
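The gender-ratio dimensionality reduction above can be sketched with the standard library alone. All names and the toy data are hypothetical, and two details are assumptions: the population standard deviation is used for standardization, and a column with zero variance is mapped to 0.

```python
from collections import defaultdict
from statistics import mean, pstdev
from urllib.parse import urlsplit


def host_of(url):
    """Keep only the host part of a URL (the URL must include a scheme)."""
    return urlsplit(url).hostname


def male_ratio_classes(visits, gender, digits=3):
    """Map each host to the male share of its visitors, rounded to `digits` places."""
    users_by_host = defaultdict(set)
    for user, hosts in visits.items():
        for host in hosts:
            users_by_host[host].add(user)
    return {
        host: round(sum(gender[u] == 1 for u in users) / len(users), digits)
        for host, users in users_by_host.items()
    }


def class_features(visits, host_class):
    """Standardize each host column, then sum the columns of each class."""
    users = sorted(visits)
    features = {u: defaultdict(float) for u in users}
    for host, cls in host_class.items():
        column = [visits[u].get(host, 0) for u in users]
        mu, sigma = mean(column), pstdev(column)
        for u, x in zip(users, column):
            features[u][cls] += (x - mu) / sigma if sigma else 0.0
    return {u: dict(f) for u, f in features.items()}


# Hypothetical visit counts (user -> host -> count) and labels (1 male, -1 female).
visits = {
    "u1": {"a.com": 4, "b.com": 1, "d.com": 1},
    "u2": {"a.com": 2, "d.com": 1},
    "u3": {"b.com": 3, "c.com": 5},
}
gender = {"u1": 1, "u2": 1, "u3": -1}
classes = male_ratio_classes(visits, gender)
features = class_features(visits, classes)
```

Here a.com and d.com share the same male proportion, so their standardized columns collapse into a single feature, which is how the number of input columns shrinks from the number of websites to the number of ratio classes.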
Assuming that males account for 65% of the data set and females for 35%, the following operations yield 4 data sets:
Select the websites whose number of users in the user-host matrix is greater than 5 and less than 3000 and whose male proportion in host-set is greater than 85%, then select the users who browsed such websites. With each user's visit count to each website as the value of the user-web matrix, dataset1 is obtained (i.e. the first characteristic data of the internet record data);
Select the websites whose number of users in the user-host matrix is greater than or equal to 3000 and whose male proportion is greater than 75%, then select the users who browsed such websites. With each user's visit count to each website as the value of the user-web matrix, dataset2 is obtained (i.e. the second characteristic data of the internet record data);
Select the websites whose number of users in the user-host matrix is less than 2000 and whose female proportion in host-set is greater than 75%, then select the users who browsed such websites. With each user's visit count to each website as the value of the user-web matrix, dataset3 is obtained (i.e. the third characteristic data of the internet record data);
Select the websites whose number of users in the user-host matrix is greater than or equal to 2000 and whose female proportion is greater than 50%, then select the users who browsed such websites. With each user's visit count to each website as the value of the user-web matrix, dataset4 is obtained (i.e. the fourth characteristic data of the internet record data).
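The four selections above share one pattern: filter websites by user count and by gender proportion, then keep the users who visited them. A minimal sketch follows; the function names, the half-open count range `lo <= n < hi` and the sample figures are assumptions for illustration.

```python
import math


def select_hosts(user_count, male_share, lo, hi, min_share, female=False):
    """Hosts with lo <= user count < hi whose male (or female) share exceeds min_share."""
    chosen = []
    for host, n in user_count.items():
        share = 1 - male_share[host] if female else male_share[host]
        if lo <= n < hi and share > min_share:
            chosen.append(host)
    return sorted(chosen)


def sub_matrix(visits, hosts):
    """Rows for users who browsed a selected host; values are visit counts."""
    hosts = set(hosts)
    out = {}
    for user, row in visits.items():
        kept = {h: c for h, c in row.items() if h in hosts}
        if kept:
            out[user] = kept
    return out


# Hypothetical per-website user counts and male proportions.
user_count = {"a.com": 100, "b.com": 4000, "c.com": 3, "d.com": 500}
male_share = {"a.com": 0.9, "b.com": 0.8, "c.com": 0.95, "d.com": 0.2}

hosts1 = select_hosts(user_count, male_share, 6, 3000, 0.85)            # dataset1 rule
hosts2 = select_hosts(user_count, male_share, 3000, math.inf, 0.75)     # dataset2 rule
hosts3 = select_hosts(user_count, male_share, 0, 2000, 0.75, True)      # dataset3 rule
hosts4 = select_hosts(user_count, male_share, 2000, math.inf, 0.5, True)  # dataset4 rule

visits = {"u1": {"a.com": 2, "d.com": 1}, "u2": {"d.com": 7}}
dataset1 = sub_matrix(visits, hosts1)
```

Splitting by user count lets small, strongly skewed websites and large, mildly skewed websites each feed their own component model instead of competing inside one feature matrix.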
Feature extraction from the app usage data. The feature processing method is similar to the website processing method:
The male and female usage proportions of each app are computed, and apps with the same male proportion are grouped into the same class, with the male proportion accurate to 0.01, which gives 100 app classes, denoted app-set. The set has 100 elements; the first value of each element is the male proportion (accurate to two decimal places) and the remaining values are the names of the apps with that male proportion;
The user-app access count matrix holds each user's access count for every app and is denoted the user-app matrix. Each column of the user-app matrix is standardized, i.e. the data of each app is standardized to mean 0 and variance 1. The standardized data of apps that belong to the same class in app-set is then summed, giving the user by 100-app-class feature matrix, denoted the user-appset matrix.
Based on each user's app access counts and the male-to-female ratio of each app, 4 data sets are obtained by the following operations:
Select the apps with more than 5 and fewer than 5000 users and a male proportion greater than 85%, then select the users of these apps to obtain a user-app matrix whose values are the users' access counts for the apps, denoted dataset5 (i.e. the first characteristic data of the app usage data);
Select the apps with more than 5000 users and a male proportion greater than 75%, then select the users of these apps to obtain a user-app matrix whose values are the users' access counts for the apps, denoted dataset6 (i.e. the second characteristic data of the app usage data);
Select the apps with more than 5 and fewer than 2000 users and a female proportion greater than 70%, then select the users of these apps to obtain a user-app matrix whose values are the users' access counts for the apps, denoted dataset7 (i.e. the third characteristic data of the app usage data);
Select the apps with more than 2000 users and a female proportion greater than 50%, then select the users of these apps to obtain a user-app matrix whose values are the users' access counts for the apps, denoted dataset8 (i.e. the fourth characteristic data of the app usage data).
Models are constructed, where male is again identified as 1, neutral as 0 and female as -1:
Using the extracted characteristic data, the feature matrix of all users is constructed. The results of logistic regression, SVM, random forest, GBDT and XGBoost models are compared, the XGBoost regression algorithm is selected, and the resulting XGBoost model is denoted model0 (the full-user prediction model). The XGBoost learning rate is 0.3, the number of iterations is 100 and the tree depth is 8. The model outputs the probability that a user is male.
For each of dataset1-dataset8, a model is built with XGBoost, giving 8 models (the component user prediction models), denoted model1-model8. Each outputs the probability that a user is male.
Merge model0 and model1-model8. Each model outputs a gender probability for each user, and the results of the 9 models are averaged to give the user's gender prediction probability value.
The threshold is adjusted to obtain the optimal threshold for dividing gender. Since the male-to-female ratio in the training sample is 65:35, the threshold is set so that the male-to-female ratio in the prediction results matches the ratio in the sample, which gives the final threshold.
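The averaging fusion and the ratio-matching threshold can be sketched as below. The function names and sample probabilities are hypothetical, and picking the cut-off by rank is one simple way to match the sample ratio, assumed here for illustration; tied probabilities at the cut-off make the match only approximate.

```python
from statistics import mean


def fuse_mean(per_model_probs):
    """Average male probabilities across models; inner lists share one user order."""
    return [mean(column) for column in zip(*per_model_probs)]


def ratio_threshold(probs, male_share):
    """Cut-off such that roughly `male_share` of users score at or above it."""
    ranked = sorted(probs, reverse=True)
    k = max(1, min(len(ranked), round(male_share * len(ranked))))
    return ranked[k - 1]


# Hypothetical outputs of two models for four users.
per_model = [[0.9, 0.2, 0.6, 0.4],
             [0.7, 0.4, 0.8, 0.2]]
fused = fuse_mean(per_model)
cutoff = ratio_threshold(fused, male_share=0.5)
is_male = [p >= cutoff for p in fused]
```

With the 65:35 training ratio from this example, `male_share` would be 0.65, so that about 65% of the predicted users fall on the male side of the cut-off.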
The time window is slid and processing is repeated every three months to correct each user's gender prediction result and so maintain prediction accuracy. The gender that a user has been assigned most often is selected as the final gender. When a user has been judged male and female the same number of times, the user may be classified as neutral and no judgment is made for the time being.
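The vote across windows can be sketched as follows. The name `final_gender` is hypothetical, and ignoring neutral results when counting votes is an assumption, since the text only describes the male and female counts.

```python
from collections import Counter


def final_gender(history):
    """Majority vote over per-window results (1 male, 0 neutral, -1 female).

    A tie between male and female counts yields neutral (0).
    """
    counts = Counter(history)
    males, females = counts[1], counts[-1]
    if males > females:
        return 1
    if females > males:
        return -1
    return 0
```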
The present invention fuses a full-user prediction model with component user prediction models, which can improve the prediction accuracy of the model. In tests, the male-to-female ratio among telecom users is roughly 2:1, with females accounting for 32%. With ordinary methods the prediction accuracy for females is below 40%; with the combined model it can be raised to 58%, and overall accuracy rises from 72% to 80%.
In addition, the present invention reduces dimensionality by gender ratio. This greatly reduces the feature dimension while retaining as much information as possible, so the dimension of the model input matrix is greatly reduced and model training is faster.
Meanwhile, by introducing the concept of a time window, the present invention reduces the adverse effect on the model of the rise and fall of websites and apps, and ultimately improves the accuracy of the model.
It should be noted that, in this document, the terms "include", "comprise" or any other variant are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article or device. In the absence of further restrictions, an element qualified by "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
The serial number of the above embodiments of the invention is only for description, does not represent the advantages or disadvantages of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes instructions that cause a device (which may be a mobile phone, a computer, a server, an air conditioner, a network device or the like) to execute the methods of the embodiments of the present invention.
The above is a further detailed description of the embodiments of the present invention with reference to specific implementations, and the specific implementation of the invention is not to be regarded as limited to these descriptions. Those of ordinary skill in the art to which the invention belongs may make simple deductions or substitutions without departing from the inventive concept, and all of these shall be regarded as falling within the protection scope of the present invention.

Claims (11)

1. A user gender prediction method, comprising:
extracting, for each user to be predicted, the characteristic data used for gender prediction from each preset type of data;
substituting the extracted characteristic data into a preset full-user prediction model to obtain a full-user gender prediction result for each user to be predicted, and substituting each item of characteristic data of at least one preset type of data into the preset component user prediction model corresponding to that item of characteristic data of the preset type of data, to obtain at least one component gender prediction result for the user to be predicted; wherein the full-user prediction model is trained, during a training process, on the characteristic data of each preset type of data of training users, and each component user prediction model is trained, during the training process, on an item of characteristic data of the at least one preset type of data of the training users;
fusing the full-user prediction result and the component prediction results to obtain a final gender prediction result for each user to be predicted.
2. The user gender prediction method according to claim 1, wherein the preset types of data include internet record data of preset websites during the training process;
the characteristic data of the internet record data includes at least one of: at least one item of characteristic data for male gender prediction, and at least one item of characteristic data for female gender prediction;
each item of preset characteristic data for the male gender includes preset target websites and the visit count of each preset target website; each preset target website is a website for which, during the training process, the male proportion among its visitors is greater than or equal to the male proportion among the training users; the target websites in one item of characteristic data are the websites whose male visitor proportion falls within a preset male proportion range;
each item of preset characteristic data for the female gender includes preset target websites and the visit count of each preset target website; each preset target website is a website for which, during the training process, the female proportion among its visitors is greater than or equal to the female proportion among the training users;
the substituting of each item of characteristic data of at least one preset type of data into the preset component user prediction model corresponding to that item of characteristic data of the preset type of data comprises:
substituting each item of characteristic data included in the internet record data into the preset component user prediction model corresponding to that item of characteristic data of the internet record data.
3. The user gender prediction method according to claim 1, wherein the preset types of data include application usage record data of preset applications during the training process;
the characteristic data of the application usage record data includes at least one item of characteristic data for male gender prediction and at least one item of characteristic data for female gender prediction;
each item of preset characteristic data for the male gender includes preset target applications and the access count of each preset target application; each preset target application is an application for which, during the training process, the male proportion among its users is greater than or equal to the male proportion among the training users; the target applications in one item of characteristic data are the applications whose male user proportion falls within a preset male proportion range;
each item of preset characteristic data for the female gender includes preset target applications and the access count of each preset target application; each preset target application is an application for which, during the training process, the female proportion among its users is greater than or equal to the female proportion among the training users;
the substituting of each item of characteristic data of at least one preset type of data into the preset component user prediction model corresponding to that item of characteristic data of the preset type of data comprises:
substituting each item of characteristic data included in the application usage record data into the preset component user prediction model corresponding to that item of characteristic data of the application usage record data.
4. The user gender prediction method according to claim 2 or 3, wherein the preset types of data further include at least one of call data and internet usage habit data;
the characteristic data of the call data includes at least one of: the number of contacts, the total duration of incoming calls, the total duration of outgoing calls, the number of incoming calls, the number of outgoing calls, the total number of calls, the quotient of the number of incoming calls and the total number of calls, and the quotient of the number of outgoing calls and the total number of calls;
the characteristic data of the internet usage habit data includes at least one of: the probability of internet use in each preset internet usage statistical period, and the information entropy of internet use in each internet information statistical period.
5. The user gender prediction method according to claim 1, wherein the full-user prediction result and the component prediction results are gender probability values of each user to be predicted;
the fusing of the full-user prediction result and the component prediction results to obtain the final gender prediction result of each user to be predicted comprises:
for each user to be predicted, obtaining the gender probability value output for that user by the full-user prediction model and the gender probability value output by each component user prediction model;
calculating the mean of the obtained gender probability values to obtain a gender prediction probability value;
comparing the gender prediction probability value with the male probability threshold and the female probability threshold set during the training process to obtain the final gender prediction result of the user to be predicted.
6. The user gender prediction method according to claim 1, wherein the full-user prediction result and the component prediction results are the identifier values 1, 0 and -1, which indicate that a user to be predicted is male, neutral and female respectively;
the fusing of the full-user prediction result and the component prediction results to obtain the final gender prediction result of each user to be predicted comprises:
for each user to be predicted, obtaining the product of the identifier value output for that user by the full-user prediction model and the prediction accuracy of the full-user prediction model, and obtaining the product of the identifier value output for that user by each component user prediction model and the prediction accuracy of that component user prediction model; the prediction accuracy of the full-user prediction model and the prediction accuracy of each component user prediction model being obtained during the training process by comparing each model's output with the training data;
calculating the sum of the obtained products to obtain a gender prediction probability value;
comparing the gender prediction probability value with the male probability threshold and the female probability threshold set during the training process to obtain the final gender prediction result of the user to be predicted.
7. The user gender prediction method according to claim 5 or 6, wherein the male probability threshold is greater than the female probability threshold;
the comparing of the gender prediction probability value with the male probability threshold and the female probability threshold set during the training process to obtain the final gender prediction result of the user to be predicted comprises:
when the gender prediction probability value is greater than the male probability threshold, determining that the final gender prediction result of the corresponding user to be predicted is male;
when the gender prediction probability value is less than the female probability threshold, determining that the final gender prediction result of the corresponding user to be predicted is female;
when the gender prediction probability value is greater than or equal to the female probability threshold and less than or equal to the male probability threshold, determining that the final gender prediction result of the corresponding user to be predicted is neutral.
8. The user gender prediction method according to any one of claims 1-3 and 5-6, wherein, before the extracting of the characteristic data used for gender prediction from each preset type of data of each user to be predicted, the method further comprises judging whether a preset gender prediction time window has arrived; and the characteristic data extracted for gender prediction from each preset type of data of each user to be predicted is the characteristic data within the current gender prediction time window;
after the fusing of the full-user prediction result and the component prediction results to obtain the final gender prediction result of each user to be predicted, the method further comprises:
for each user to be predicted, matching the final gender prediction result corresponding to the current prediction time window against the final gender prediction result corresponding to at least one previous prediction time window, and adjusting the final gender prediction result of the user to be predicted according to the matching result.
9. The user gender prediction method according to claim 8, wherein the matching of the final gender prediction result corresponding to the current prediction time window against the final gender prediction result corresponding to at least one previous prediction time window is: matching the final gender prediction result corresponding to the current prediction time window against the final gender prediction result corresponding to the immediately preceding prediction time window;
the adjusting of the final gender prediction result of the user to be predicted according to the matching result comprises:
if the matching result is that the genders are the same, strengthening the final gender prediction result of the user to be predicted;
if the matching result is that the genders are opposite, weakening the final gender prediction result of the user to be predicted;
otherwise, keeping the strength of the final gender prediction result of the user to be predicted unchanged;
or,
the matching of the final gender prediction result corresponding to the current prediction time window against the final gender prediction result corresponding to at least one previous prediction time window is: matching the final gender prediction result corresponding to the current prediction time window against the final gender prediction results corresponding to all previous prediction time windows;
the adjusting of the final gender prediction result of the user to be predicted according to the matching result comprises: taking the final gender prediction result that occurs most often for the user to be predicted across the prediction time windows as the current, latest final gender prediction result of the user to be predicted.
10. A user gender prediction device, comprising:
a data extraction module, configured to extract, for each user to be predicted, the characteristic data used for gender prediction from each preset type of data;
a model processing module, configured to substitute the characteristic data extracted by the data extraction module into a preset full-user prediction model to obtain a full-user gender prediction result for each user to be predicted, and to substitute each item of characteristic data of at least one preset type of data into the preset component user prediction model corresponding to that item of characteristic data of the preset type of data, to obtain at least one component gender prediction result for the user to be predicted; wherein the full-user prediction model is trained, during a training process, on the characteristic data of each preset type of data of training users, and each component user prediction model is trained, during the training process, on an item of characteristic data of the at least one preset type of data of the training users;
a prediction processing module, configured to fuse the full-user prediction result and the component prediction results to obtain a final gender prediction result for each user to be predicted.
11. A user gender prediction equipment, comprising: a processor, a memory and a communication bus;
the communication bus is configured to implement connection and communication between the processor and the memory;
the processor is configured to execute a user gender prediction program stored in the memory to implement the steps of the user gender prediction method according to any one of claims 1-9.
CN201710507593.4A 2017-06-28 2017-06-28 User's gender prediction's method, device and equipment Withdrawn CN109145932A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710507593.4A CN109145932A (en) 2017-06-28 2017-06-28 User's gender prediction's method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710507593.4A CN109145932A (en) 2017-06-28 2017-06-28 User's gender prediction's method, device and equipment

Publications (1)

Publication Number Publication Date
CN109145932A true CN109145932A (en) 2019-01-04

Family

ID=64803046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710507593.4A Withdrawn CN109145932A (en) 2017-06-28 2017-06-28 User's gender prediction's method, device and equipment

Country Status (1)

Country Link
CN (1) CN109145932A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102262440A (en) * 2010-06-11 2011-11-30 微软公司 Multi-modal gender recognition
CN104331404A (en) * 2013-07-22 2015-02-04 中国科学院深圳先进技术研究院 A user behavior predicting method and device based on net surfing data of a user's cell phone
CN106484762A (en) * 2015-08-27 2017-03-08 优像数位媒体科技股份有限公司 Method for predicting gender by using webpage browsing behavior
CN106897727A (en) * 2015-12-21 2017-06-27 百度在线网络技术(北京)有限公司 A kind of user's gender identification method and device
CN106528745A (en) * 2016-10-27 2017-03-22 北京奇虎科技有限公司 Method and device for recommending resources on mobile terminal, and mobile terminal
CN106682686A (en) * 2016-12-09 2017-05-17 北京拓明科技有限公司 User gender prediction method based on mobile phone Internet-surfing behavior

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
李源昊等: "面向移动社会网络的用户年龄与性别特征识别", 《计算机应用》 *
马莉婷: "数据挖掘技术在客户精细营销预测模型中的应用――以移动通信业务为例", 《闽江学院学报》 *
黄关维: "一种用于说话人性别鉴定的混合算法", 《现代计算机(专业版)》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11694059B2 (en) 2019-09-12 2023-07-04 Samsung Electronics Co., Ltd. Method, apparatus, electronic device and storage medium for predicting user attribute
CN112825178A (en) * 2019-11-21 2021-05-21 北京沃东天骏信息技术有限公司 Method and device for predicting user gender portrait
CN111143441A (en) * 2019-12-30 2020-05-12 北京每日优鲜电子商务有限公司 Gender determination method, device, equipment and storage medium
CN113806656A (en) * 2020-06-17 2021-12-17 华为技术有限公司 Method, apparatus and computer readable medium for determining characteristics of a user
CN113806656B (en) * 2020-06-17 2024-04-26 华为技术有限公司 Method, apparatus and computer readable medium for determining characteristics of a user

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20190104