CN107886366A - Method for generating a gender classification model, gender filling method, terminal and storage medium - Google Patents
- Publication number
- CN107886366A, CN201711176286.9A
- Authority
- CN
- China
- Prior art keywords
- user
- business
- gender
- sex
- data collection
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
Landscapes
- Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Engineering & Computer Science (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Entrepreneurship & Innovation (AREA)
- Game Theory and Decision Science (AREA)
- Data Mining & Analysis (AREA)
- Economics (AREA)
- Marketing (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the present invention disclose a method for generating a gender classification model, a gender filling method, a terminal and a storage medium. The method for generating the gender classification model includes: obtaining the gender data sets of users in multiple businesses and their behavioral data sets in multiple applications to generate a target matrix table; screening out, according to the gender data sets in the target matrix table, the users to be trained from the multiple businesses, where the users to be trained are the set of users who have gender information in multiple preset businesses and whose gender information is identical across them; converting the gender data sets and behavioral data sets of the users to be trained in the target matrix table into a feature data set for training the gender classification model, where the feature data set includes a training data set and a test data set; training the gender classification model on the training data set using a decision tree algorithm; and cross-validating the gender classification model with algorithm tuning parameters and the test data set to obtain an optimal gender classification model.
Description
Technical field
The present invention relates to the field of electronic technology, and in particular to a method for generating a gender classification model, a gender filling method, a terminal and a storage medium.
Background
With the development of Internet technology, the spread of e-commerce and the gradual popularization of high-performance smart mobile terminals, the mobile Internet has created a brand-new communication environment that can largely meet users' differentiated needs, and mobile applications are multiplying at an astonishing rate. E-commerce generally refers to the commercial activities that two trading parties carry out online, based on a browser/server application model in an open mobile Internet environment, as part of wide-ranging commercial trade around the world. Unlike the traditional offline service model, however, a merchant in an online transaction knows little about the user's basic personal information, so the merchant's understanding of the user's needs is limited. This easily leads to ineffective placement of advertisements and promotions, or to marketing measures falling short of their intended goals. It is therefore highly desirable to study and predict users' basic attribute information and historical behavior in order to pinpoint their needs and provide better personalized services. Gender, as the most basic demographic indicator, is one of the most important components of a user-profile tag system. Gender information, combined with a user's other basic attributes and historical behavior, is commonly used to analyze and understand the user's interests and individual needs, and gender is one of the most important filter conditions in audience targeting. However, users regard basic attribute information such as gender and age as private, and when registering on platforms such as WeChat and Sina they may choose not to provide it. As a result, many Internet application companies find it difficult to obtain basic attribute information such as users' gender and age.
In the prior art, a user's gender information is obtained in essentially two ways: relying on the gender information the user has filled in, or predicting it with a model built on the data of a single business. For example, some Internet application companies require users to fill in gender information when registering a personal account, or let them fill it in optionally, but this kind of private information is sensitive for users. Mandatory registration fields therefore give a poor user experience and can easily annoy users who care about privacy. Users may also deliberately enter wrong information, and such false information has a negative effect on personalized recommendation. In practice, most users do not fill in basic attribute information such as gender at registration at all. The prior art also predicts gender by modeling on a single kind of data, such as the names of the apps a user has installed or a list of installation package names, and takes the predicted gender as the user's final gender label. However, if the model is built solely on the user's behavioral data in a single business, the predicted gender tends to have low accuracy. Even if the prediction accuracy is high, the behavioral data collected covers only the user group of that single business, so user coverage is narrow and the gender of users of other businesses remains vacant.
Therefore, to solve the problems of relying on user-entered gender or on gender predicted from a single business's data, a method is needed that builds a gender classification model from the data of multiple businesses, combined with the categories of apps that users use and their historical behavioral data. Users' gender labels can then be predicted by the gender classification model, and the gender of all users of the multiple businesses can be effectively filled in.
Summary of the invention
Embodiments of the present invention provide a method for generating a gender classification model, a gender filling method, a terminal and a storage medium. The gender classification model can be trained using the gender data sets and behavioral data sets of users with higher confidence as the training data set, and an optimal gender classification model can be obtained through cross-validation with algorithm tuning parameters and a test data set. The optimal gender classification model can then be used to predict and fill in the gender labels of users who have no gender information, or whose gender information has low confidence, in the multiple businesses, improving the overall accuracy of the gender labels finally determined for all users on the platform.
In a first aspect, an embodiment of the present invention provides a method for generating a gender classification model. The method includes: obtaining the gender data sets of users in multiple businesses and their behavioral data sets in multiple applications to generate a target matrix table; screening out, according to the gender data sets in the target matrix table, the users to be trained from the multiple businesses, where the users to be trained are the set of users who have gender information in multiple preset businesses and whose gender information is identical across them; converting the gender data sets and behavioral data sets of the users to be trained in the target matrix table into a feature data set for training the gender classification model, where the feature data set includes a training data set and a test data set; training the gender classification model on the training data set using a decision tree algorithm; and cross-validating the gender classification model with algorithm tuning parameters and the test data set to obtain an optimal gender classification model.
In a second aspect, an embodiment of the present invention provides a gender filling method. The method includes: obtaining the gender data sets of users in multiple businesses and their behavioral data sets in multiple applications to generate a target matrix table; screening out, according to the gender data sets, the users to be filled and the users to be corrected from the multiple businesses, where the users to be filled are the set of users who have no gender information in the multiple businesses, and the users to be corrected are the set of users who have gender information in the multiple businesses but whose reported genders differ, with each gender accounting for half of the reports; obtaining, according to the behavioral data sets, the number of clicks of each user to be filled and each user to be corrected in each application as a feature vector; predicting, according to the feature vector, the gender of each user to be filled using the optimal gender classification model described in the first aspect, and filling in the predicted result; and predicting, according to the feature vector, the gender of each user to be corrected using the optimal gender classification model, combining the prediction with the gender data set of that user, and taking the mode as the user's final gender, which is then filled in.
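The mode-taking step for users to be corrected can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `final_gender`, the labels "M" and "F", and the use of `None` for a business with no reported gender are hypothetical and do not come from the patent text.

```python
from collections import Counter


def final_gender(reported, predicted):
    """Combine the reported genders with the model's prediction and take the mode.

    reported:  genders reported in each business (None where nothing was reported).
    predicted: gender predicted by the classification model for this user.
    """
    votes = [g for g in reported if g is not None] + [predicted]
    return Counter(votes).most_common(1)[0][0]


# A user to be corrected: the reports are split evenly, so the prediction decides.
print(final_gender(["M", "F", "M", "F"], "F"))
# A user to be filled: nothing was reported, so the prediction is used directly.
print(final_gender([None, None, None, None], "M"))
```

Because the reports of a user to be corrected are split evenly, adding the single predicted vote always breaks the tie, so the mode is well defined in that case.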
In a third aspect, an embodiment of the present invention provides a terminal that includes units for performing the methods of the first and second aspects above.
In a fourth aspect, an embodiment of the present invention provides another terminal, including a processor, an input device, an output device and a memory that are connected to one another. The memory is used to store a computer program that supports the terminal in performing the above methods, the computer program includes program instructions, and the processor is configured to call the program instructions to perform the methods of the first and second aspects above.
In a fifth aspect, an embodiment of the present invention provides a computer-readable storage medium. The storage medium stores a computer program that includes program instructions, and the program instructions, when executed by a processor, cause the processor to perform the methods of the first and second aspects above.
Embodiments of the present invention provide a method for generating a gender classification model, a gender filling method, a terminal and a storage medium. Using user data from multiple businesses, the gender data sets of users with higher confidence are combined with the categories of apps they use and their click behavior data sets to form the feature data set, which includes a training data set and a test data set. The gender classification model is trained on the training data set using a decision tree algorithm, and the optimal gender classification model is obtained through cross-validation with the algorithm tuning parameters and the test data set. The gender of users who have no gender information, or whose gender information has low confidence, in the multiple businesses is then filled in. This can effectively improve the overall accuracy of the gender labels finally determined for all users on the platform. The final gender labels support personalized service recommendation based on the preferences and needs of users of different genders, and also play an important role in precisely targeted marketing and in the accuracy of click-through-rate estimation.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method for generating a gender classification model provided by an embodiment of the present invention;
Fig. 2 is a detailed schematic flowchart of step S11 in the method for generating the gender classification model shown in Fig. 1;
Fig. 3 is a detailed schematic flowchart of step S11b in step S11 shown in Fig. 2;
Fig. 4 is a schematic flowchart of a gender filling method provided by an embodiment of the present invention;
Fig. 5 is a schematic flowchart of a gender filling method provided by a first embodiment of the present invention;
Fig. 6 is a schematic flowchart of a gender filling method provided by a second embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a terminal corresponding to the method of Fig. 1, provided by an embodiment of the present invention;
Fig. 8 is a schematic block diagram of the first acquisition unit of the terminal shown in Fig. 7;
Fig. 9 is a schematic structural diagram of a terminal corresponding to the method of Fig. 4, provided by an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a terminal corresponding to the method of Fig. 5, provided by an embodiment of the present invention;
Fig. 11 is a schematic structural diagram of a terminal corresponding to the method of Fig. 6, provided by an embodiment of the present invention;
Fig. 12 is a schematic block diagram of another terminal provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Evidently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should be understood that, when used in this specification and the appended claims, the terms "comprising" and "including" indicate the presence of the described features, wholes, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components and/or sets thereof.
It should also be understood that the terminology used in this description of the invention is for the purpose of describing particular embodiments only and is not intended to limit the present invention. As used in this description and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in this description and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
In specific implementations, the terminal described in the embodiments of the present invention includes, but is not limited to, portable devices such as mobile phones, laptop computers or tablet computers with touch-sensitive surfaces (for example, touch-screen displays and/or touch pads). It should also be understood that, in some embodiments, the device is not a portable communication device but a desktop computer with a touch-sensitive surface (for example, a touch-screen display and/or a touch pad).
In the following discussion, a terminal including a display and a touch-sensitive surface is described. It should be understood, however, that the terminal may include one or more other physical user-interface devices, such as a physical keyboard, a mouse and/or a joystick.
The terminal supports various applications, such as one or more of the following: a drawing application, a presentation application, a word-processing application, a website creation application, a disk authoring application, a spreadsheet application, a game application, a telephone application, a video conferencing application, an e-mail application, an instant messaging application, a workout support application, a photo management application, a digital camera application, a digital video camera application, a web browsing application, a digital music player application and/or a digital video player application.
The various applications that can be executed on the terminal may use at least one common physical user-interface device, such as the touch-sensitive surface. One or more functions of the touch-sensitive surface, as well as the corresponding information displayed on the terminal, may be adjusted and/or varied from one application to the next and/or within a given application. In this way, a common physical architecture of the terminal (for example, the touch-sensitive surface) can support various applications with user interfaces that are intuitive and transparent to the user.
Refer to Fig. 1, which is a schematic flowchart of a method for generating a gender classification model provided by an embodiment of the present invention. The method may run on devices such as smartphones (for example, Android phones and iOS phones), tablet computers with mobile networking capability, personal digital assistants (PDAs) and smart wearable devices. As shown in the figure, the method may include steps S11 to S15.
S11: obtain the gender data sets of users in multiple businesses and their behavioral data sets in multiple applications to generate a target matrix table. Specifically, the multiple businesses may include device purchase, after-sales, extended warranty and reading. Each of the multiple applications may be assigned to an application category according to its main function, and the application categories may include browsers, input methods, news and information, online communities, entertainment and social networking, and so on. Preferably, there may be 478 application categories, such as browsers, input methods, news and information, and online communities. In this embodiment, techniques such as crawling external website data, querying internal databases or purchasing interfaces are used to obtain the gender information that users have or have not reported in multiple businesses such as device purchase, after-sales, extended warranty and reading. The users' behavioral data sets in multiple application categories such as browsers, input methods, news and information, and online communities are obtained at the same time, and the obtained gender data sets and behavioral data sets are combined to generate the target matrix table.
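A minimal sketch of how such a matrix table might be assembled is given here, assuming the raw data arrives as simple tuples. The business names, application categories, record layouts and the function `build_matrix` are hypothetical choices made for illustration, not details from the patent.

```python
# Hypothetical business and application-category names, chosen only for illustration.
BUSINESSES = ["purchase", "after_sales", "warranty", "reading"]
APP_CATEGORIES = ["browser", "input_method", "news", "community"]


def build_matrix(gender_records, click_records):
    """Join reported genders and click counts into one row per user ID.

    gender_records: iterable of (user_id, business, gender) tuples.
    click_records:  iterable of (user_id, app_category, clicks) tuples.
    """
    table = {}

    def row(user_id):
        # Unreported gender is None; a category that was never clicked counts 0.
        if user_id not in table:
            table[user_id] = {**{b: None for b in BUSINESSES},
                              **{c: 0 for c in APP_CATEGORIES}}
        return table[user_id]

    for user_id, business, gender in gender_records:
        row(user_id)[business] = gender
    for user_id, category, clicks in click_records:
        row(user_id)[category] += clicks
    return table


matrix = build_matrix(
    [("u1", "purchase", "M"), ("u1", "reading", "M"), ("u2", "warranty", "F")],
    [("u1", "browser", 3), ("u1", "browser", 2), ("u3", "news", 7)],
)
print(matrix["u1"])
```

Keeping one row per user ID means that users who reported no gender at all (such as `u3` here) still appear in the table, which is what later allows their gender to be filled in.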
S12: according to the gender data sets in the target matrix table, screen out the users to be trained from the multiple businesses, where the users to be trained are the set of users who have gender information in multiple preset businesses and whose gender information is identical across them. Specifically, the users to be trained are selected according to the gender data sets of the multiple preset businesses in the target matrix table. The multiple preset businesses may be customized according to user requirements, or the system may detect the businesses that contain more gender information with higher overall gender accuracy and use them as the multiple preset businesses. For example, in this embodiment the multiple businesses may include four major businesses: device purchase, after-sales, extended warranty and reading. If it is detected that users have reported more gender information in three of them (device purchase, extended warranty and reading) and that this gender information is consistent, then device purchase, extended warranty and reading are set in advance as the multiple preset businesses. In some feasible embodiments, the number of preset businesses is at least 75% of the total number of the multiple businesses.
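The screening rule in S12 can be sketched as a simple filter: keep a user only if a gender was reported in every preset business and all of those reports agree. The names below (`select_training_users`, the business keys and the sample data) are hypothetical and serve only to illustrate the rule.

```python
def select_training_users(gender_table, preset_businesses):
    """Keep users who reported a gender in every preset business, all identical."""
    selected = {}
    for user_id, genders in gender_table.items():
        reported = [genders.get(b) for b in preset_businesses]
        if all(g is not None for g in reported) and len(set(reported)) == 1:
            selected[user_id] = reported[0]  # the agreed gender becomes the label
    return selected


# Three of four businesses are preset, which satisfies the 75% lower bound.
preset = ["purchase", "warranty", "reading"]
gender_table = {
    "u1": {"purchase": "M", "warranty": "M", "reading": "M", "after_sales": None},
    "u2": {"purchase": "F", "warranty": "M", "reading": "F", "after_sales": "F"},
    "u3": {"purchase": "F", "warranty": None, "reading": "F", "after_sales": "F"},
}
print(select_training_users(gender_table, preset))
```

Only `u1` passes: `u2` has conflicting reports and `u3` is missing a report in one preset business, so neither is treated as a high-confidence training user.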
S13: convert the gender data sets and behavioral data sets of the users to be trained in the target matrix table into a feature data set for training the gender classification model, where the feature data set includes a training data set and a test data set. Specifically, the users to be trained are the set of users who have gender information in multiple preset businesses and whose gender information is identical across them, so the gender data sets they reported in the multiple preset businesses have higher confidence. In this embodiment, the gender data sets and behavioral data sets of these higher-confidence users to be trained are selected and converted into the feature data set of the gender classification model. The behavioral data set includes the historical behavior data of the users to be trained in using multiple applications. For example, the number of times a user clicks each application within a preset time is obtained, and each user's click counts for each application within the preset time are converted into a vector, giving each user's click feature vector over the applications. In addition, the feature data set is randomly divided into a training data set and a test data set at a preset ratio. In this embodiment, the feature data set is randomly divided into the training data set and the test data set at a ratio of 7:3, so that the training data set accounts for 70% of the feature data set and the test data set accounts for 30%. In some feasible embodiments, the preset ratio may be customized according to user requirements.
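The vector conversion and the 7:3 random split described in S13 can be sketched as follows. The category list, the helper names `to_vector` and `split_dataset`, the fixed random seed and the synthetic samples are all assumptions made for the sake of a runnable example.

```python
import random

# Hypothetical application categories; the order fixes the vector layout.
APP_CATEGORIES = ["browser", "input_method", "news", "community"]


def to_vector(click_counts):
    """Turn a {category: clicks} mapping into a fixed-order click feature vector."""
    return [click_counts.get(c, 0) for c in APP_CATEGORIES]


def split_dataset(samples, train_ratio=0.7, seed=42):
    """Randomly divide the samples into a training set and a test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Ten synthetic (feature vector, gender label) samples.
samples = [(to_vector({"browser": i, "news": 2 * i}), "M" if i % 2 else "F")
           for i in range(10)]
train, test = split_dataset(samples)
print(len(train), len(test))
```

Fixing the seed keeps the split reproducible across tuning runs; a different preset ratio can be passed through `train_ratio`.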
S14: train the gender classification model on the training data set using a decision tree algorithm. Specifically, in this embodiment the feature data set is randomly divided into the training data set and the test data set at a ratio of 7:3, that is, the training data set accounts for 70% of the feature data set and the test data set accounts for 30%. The training data set serves as the training set for the gender classification model: users in the training data set whose reported gender is male are the positive samples, and users whose reported gender is female are the negative samples. The decision tree algorithm may include the CART algorithm (Classification And Regression Tree), the ID3 algorithm, the C4.5 algorithm and the random forest algorithm. In this embodiment, the gender data sets and behavioral data sets of the positive and negative samples are obtained, and the random forest algorithm trains multiple trees on them to produce the gender classification model. The random forest algorithm is a classifier that uses multiple trees to train on and predict samples, and its output class is the mode of the classes output by the individual trees. It can handle very high-dimensional data without feature selection, because its feature subsets are chosen at random: at each node, a random subset of all features is drawn and used to compute the best split. The random forest algorithm can balance errors on imbalanced data sets, and it can maintain the accuracy of the trained model even when a substantial portion of the features is missing.
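To make the voting idea concrete, here is a deliberately tiny random forest written from scratch. It is a sketch, not the embodiment's implementation: each "tree" is a single split (a decision stump) chosen by Gini impurity, the feature subset has size sqrt(d), and all function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    if not labels:
        return 0.0
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority(labels, fallback):
    """Most common label, or the most common fallback label if the list is empty."""
    return Counter(labels or fallback).most_common(1)[0][0]


def fit_stump(X, y, features):
    """One-split tree: choose the (feature, threshold) with the lowest weighted Gini."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left, y), majority(right, y))
    return best[1:]


def fit_forest(X, y, num_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    k = max(1, int(n_features ** 0.5))  # random feature subset of size sqrt(d)
    forest = []
    for _ in range(num_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample with replacement
        features = rng.sample(range(n_features), k)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def predict(forest, row):
    """The output class is the mode of the classes voted by the individual trees."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Toy click vectors: "M" users click categories 0 and 1, "F" users click 2 and 3.
X = [[5, 5, 0, 0]] * 4 + [[0, 0, 5, 5]] * 4
y = ["M"] * 4 + ["F"] * 4
forest = fit_forest(X, y)
print(predict(forest, [5, 5, 0, 0]), predict(forest, [0, 0, 5, 5]))
```

A production model would grow full trees and draw a fresh feature subset at every node; the single-split trees here keep the example short while preserving the two ingredients the text describes, random feature subsets and a majority vote.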
S15: cross-validate the gender classification model with the algorithm tuning parameters and the test data set to obtain the optimal gender classification model. Specifically, the algorithm tuning parameters may include the number of trees in the random forest (numTrees), the feature subset selection strategy (featureSubsetStrategy), the attribute selection measure (impurity), the maximum depth of the tree (maxDepth) and the maximum width of the tree (maxBins). The number of trees has no default value, and its tuning range includes [20, 50, 90, 100, 150, 160, 210, 220]. The feature subset selection strategy has no default value, and its tuning range includes auto, sqrt, log2 and one third. The attribute selection measure has no default value, and its tuning options include Gini impurity (gini) and information gain (entropy). The maximum depth of the tree has no default value, and its tuning range includes [5, 10, 20, 25, 30]. The maximum width of the tree has no default value, and its tuning range includes [50, 100, 200, 300, 400, 500]. In this embodiment, different tuning parameters are set, the randomly divided training data set is used for training, and the test data set is used to cross-validate the gender classification model to obtain the optimal gender classification model. The evaluation indices of the cross-validation include precision, recall and overall accuracy. For a two-class system, there are four possible outcomes when the gender classification model predicts and fills in a gender: the user is male and is predicted as male; the user is male but is predicted as female; the user is female but is predicted as male; and the user is female and is predicted as female. Precision is defined as the ratio of the number of users correctly predicted as a class to the total number of users predicted as that class. Taking the male samples of the test data set as an example, precision for the male class = number of male users predicted as male / (number of male users predicted as male + number of female users predicted as male). Recall is defined as the ratio of the number of users correctly predicted as a class to the actual number of users in that class. Taking the male samples of the test data set as an example, recall for the male class = number of male users predicted as male / (number of male users predicted as male + number of male users predicted as female). The overall accuracy of the gender classification model is defined as the ratio of the number of correct predictions to the total number of predictions: overall accuracy = (number of male users predicted as male + number of female users predicted as female) / total number of users. In this embodiment, the male-to-female ratio of the training data set and the test data set is within 5:1, so neither oversampling nor undersampling is applied, and provided that the model does not overfit, the model is evaluated mainly by overall accuracy. In this embodiment, the gender classification model trained on the training data set is cross-validated with the algorithm tuning parameters and the test data set to obtain the optimal gender classification model. Preferably, the overall accuracy of the optimal gender classification model reaches at least 89.41%.
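The three evaluation indices and the size of the tuning grid can be checked with a short sketch. The function `evaluate`, the label strings, the dictionary layout of `PARAM_GRID` and the sample predictions are hypothetical; only the parameter names and value ranges are taken from the text above.

```python
from itertools import product


def evaluate(actual, predicted, positive="M"):
    """Precision and recall for the positive class, plus overall accuracy."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # male, predicted male
    fn = sum(a == positive and p != positive for a, p in pairs)  # male, predicted female
    fp = sum(a != positive and p == positive for a, p in pairs)  # female, predicted male
    tn = sum(a != positive and p != positive for a, p in pairs)  # female, predicted female
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


actual = ["M", "M", "M", "M", "F", "F", "F", "F"]
predicted = ["M", "M", "M", "M", "M", "M", "F", "F"]
precision, recall, accuracy = evaluate(actual, predicted)
print(precision, recall, accuracy)

# Tuning ranges as listed in the text; the key spellings are illustrative.
PARAM_GRID = {
    "numTrees": [20, 50, 90, 100, 150, 160, 210, 220],
    "featureSubsetStrategy": ["auto", "sqrt", "log2", "onethird"],
    "impurity": ["gini", "entropy"],
    "maxDepth": [5, 10, 20, 25, 30],
    "maxBins": [50, 100, 200, 300, 400, 500],
}
combinations = [dict(zip(PARAM_GRID, values)) for values in product(*PARAM_GRID.values())]
print(len(combinations))
```

In the sample data every male user is found (recall of 1.0), but two female users are also labelled male, which lowers precision to 2/3 and overall accuracy to 0.75. An exhaustive search over the listed ranges would train and score 1920 parameter combinations, keeping the one with the best overall accuracy.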
In the above embodiment, the gender data sets of users in multiple businesses and their behavioral data sets in multiple applications are integrated, and the higher-confidence users to be trained are screened out according to the gender data sets. The gender data sets and behavioral data sets of the users to be trained are converted into the feature data set for training the model, and the gender classification model is obtained by training a decision tree algorithm on this higher-confidence user feature data set, so the model has higher prediction accuracy, higher credibility and higher classification precision. The gender classification model is then cross-validated with the algorithm tuning parameters and the test data set to obtain the optimal gender classification model. Because the gender classification model uses higher-confidence users to be trained and is built on the data of multiple businesses, its user coverage is wider and its gender predictions are more accurate.
Refer to Fig. 2, which is a detailed schematic flowchart of step S11 in the method for generating the gender classification model shown in Fig. 1. As shown in the figure, step S11 includes S11a-S11b.
S11a, obtain the users' gender data sets in multiple businesses and their behavior data sets in multiple application programs to generate an original matrix table. The gender data set includes the user's gender information in each business, and the behavior data set includes the number of times the user clicks each application program within a preset time. The rows of the original matrix table are user IDs, and the columns are the corresponding user's gender information in each business and the number of times the user clicks each application program within the preset time; the gender data set and the behavior data set are associated through the user ID and serve as the feature data set. Specifically, the multiple businesses may include phone purchase, after-sales, extended warranty, reading and other businesses. In this embodiment, techniques such as crawling external website data, querying internal databases or purchasing interfaces may be used to obtain the user IDs together with the gender data set, which covers both the gender information that users have reported in the multiple businesses (phone purchase, after-sales, extended warranty, reading and so on) and the entries with no gender information. Because a historical behavior data set of browsing and accessing an application program is produced whenever a user accesses that application program, in this embodiment the behavior data set includes the number of times the user clicks each application program in a mobile internet environment. Each click on an application program produces one internet log entry, so counting a user's internet logs for all application programs within the preset time gives the number of clicks that user made on each application program; performing the same count for all users gives the behavior data set of all users accessing all application programs. The preset time may be an arbitrarily set period, or may be customized according to the data requirements of model training. The original matrix table can be generated from the gender data set and the behavior data set, where the rows are user IDs and the columns are each user's gender information in each business and the number of clicks on each application program. Specifically, the obtained user IDs, the users' gender data sets in multiple businesses and their behavior data sets in multiple application programs are collected and collated to generate the original matrix table. In this embodiment, the original matrix table may be as shown in Table 1.
Table 1
In the original matrix table shown in Table 1, the row direction corresponds to the user IDs, and the column direction may include each user's gender data set in the multiple businesses and the behavior data set produced by the user accessing each application program within the preset time. In this embodiment, the behavior data set includes the number of times the user clicks each application program in a mobile internet environment.
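As a minimal sketch of how the original matrix table of step S11a could be assembled, the Python below counts one click per log entry and joins the counts with the reported genders on the user ID. The function name `build_original_matrix`, the `(user_id, app)` log format, the business and application names, and the use of `None` for missing values are assumptions made for illustration, not details given by the embodiment.

```python
from collections import Counter

def build_original_matrix(gender_reports, click_logs, businesses, apps):
    """Return {user_id: row}. A row maps each business to the reported gender
    (or None) and each app to the click count (or None if never clicked)."""
    # One log entry is produced per click, so counting entries counts clicks.
    clicks = Counter(click_logs)  # keys are (user_id, app) pairs
    user_ids = set(gender_reports) | {uid for uid, _ in click_logs}
    matrix = {}
    for uid in sorted(user_ids):
        row = {b: gender_reports.get(uid, {}).get(b) for b in businesses}
        for app in apps:
            row[app] = clicks.get((uid, app))  # None marks a missing value
        matrix[uid] = row
    return matrix

# Hypothetical sample data
businesses = ["purchase", "after_sales", "warranty", "reading"]
apps = ["browser", "input_method"]
gender_reports = {"u1": {"purchase": "M", "reading": "M"}, "u2": {}}
click_logs = [("u1", "browser"), ("u1", "browser"), ("u2", "input_method")]
matrix = build_original_matrix(gender_reports, click_logs, businesses, apps)
```

Keeping missing values as `None` rather than zero preserves the distinction that the later cleaning step relies on when it measures each application's missing rate.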
S11b, perform data cleaning on the original matrix table to generate the target matrix table. Specifically, in this embodiment, as shown in Fig. 3, which is a detailed schematic flowchart of step S11b in step S11 shown in Fig. 2, the data cleaning of the original matrix table in step S11b includes S11b1-S11b2:
S11b1, identify the application programs in the original matrix table whose missing rate exceeds 90%. Specifically, a very large number of application programs are currently available for download on the market, and the application programs installed and used by different users are not the same. Apart from commonly used phone applications such as WeChat and Alipay, the installation and usage frequency of more niche application programs such as Meiyou and Baicizhan vary from person to person, so more than 90% of the values in the original matrix table are null. In this embodiment, the application programs in the original matrix table may cover 478 application categories such as browsers, input methods, news and information, and online communities. It is therefore necessary to identify the application programs in the original matrix table whose missing rate exceeds 90%.
S11b2, delete the identified application programs from the original matrix table to generate the target matrix table.
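Steps S11b1 and S11b2 can be sketched as a single column filter. The example assumes the matrix is a dictionary of rows in which a missing click count is stored as `None`; the function name `drop_sparse_apps` and the sample data are hypothetical.

```python
def drop_sparse_apps(matrix, apps, max_missing_rate=0.9):
    """Remove app columns whose share of missing (None) values exceeds the threshold."""
    if not matrix:
        return {}, []
    n_users = len(matrix)
    # S11b1: identify apps whose missing rate is above 90%.
    sparse = {
        app for app in apps
        if sum(row[app] is None for row in matrix.values()) / n_users > max_missing_rate
    }
    # S11b2: delete the identified apps to obtain the target matrix table.
    cleaned = {
        uid: {col: val for col, val in row.items() if col not in sparse}
        for uid, row in matrix.items()
    }
    return cleaned, sorted(sparse)

# Hypothetical table: 20 users, "niche_app" is used by only one of them.
matrix = {
    f"u{i}": {"purchase": "F", "browser": i + 1, "niche_app": 3 if i == 0 else None}
    for i in range(20)
}
target, removed = drop_sparse_apps(matrix, apps=["browser", "niche_app"])
```

Here `niche_app` is missing for 19 of 20 users (95%), so it is dropped, while the gender columns and `browser` are kept.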
In the above embodiment, the users' gender data sets in multiple businesses and their behavior data sets in multiple application programs are integrated, collated and cleaned to generate the target matrix table. The gender data set includes the gender information that the user has or has not reported in each business, and the behavior data set includes the number of times the user clicks each application program within the preset time. Because the application programs available for installation on the market are too numerous to count and different people use different application programs, the original matrix table needs data cleaning to remove the application programs whose missing rate exceeds 90%. Cleaning the original matrix table reduces the complexity of algorithm processing and improves both the training efficiency and the accuracy of the gender classification model.
Refer to Fig. 4, which is a schematic flowchart of a gender filling method provided by an embodiment of the present invention. The method may run on devices with communication and interaction functions, such as smartphones (for example Android phones and iOS phones), tablet computers, personal digital assistants (PDAs) and smart wearable devices. As shown in the figure, the method may include steps S21 to S25.
S21, obtain the users' gender data sets in multiple businesses and their behavior data sets in multiple application programs to generate a target matrix table. Specifically, the multiple businesses may include phone purchase, after-sales, extended warranty, reading and other businesses. The multiple application programs may be divided into application categories according to the main function of each application program, and the application categories may include browsers, input methods, news and information, online communities, entertainment and social networking, and so on; preferably, the application categories may include 478 categories such as browsers, input methods, news and information, and online communities. In this embodiment, techniques such as crawling external website data, querying internal databases or purchasing interfaces are used to obtain the gender information that users have and have not reported in the multiple businesses such as phone purchase, after-sales, extended warranty and reading, as well as the users' behavior data sets in the multiple application categories such as browsers, input methods, news and information, and online communities; the obtained gender data sets and behavior data sets are then combined to generate the target matrix table.
S22, screen out the users to be filled and the users to be corrected of the multiple businesses according to the gender data sets. The users to be filled are the set of users who have no gender information in any of the multiple businesses, and the users to be corrected are the set of users who have gender information in part of the multiple businesses and for whom the businesses reporting each of the two different gender values account for half each. Specifically, in this embodiment, the multiple businesses may include four major businesses: phone purchase, after-sales, extended warranty and reading. If it is detected that a user has not filled in and reported gender information in any of the multiple businesses, the user is screened out as a user to be filled of the multiple businesses. If it is detected that a user has gender information in the multiple businesses and the businesses containing the different gender values each account for half, the user is screened out as a user to be corrected of the multiple businesses. For example, among the four major businesses, a user may have reported gender information only in the phone purchase business and the after-sales business, with the two values differing: male in the phone purchase business and female in the after-sales business. Alternatively, a user may have reported gender information in all four businesses, with the businesses split evenly between the two values: female in the phone purchase business and the reading business, and male in the after-sales business and the extended warranty business.
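The screening rule of step S22 can be sketched as a small classifier over one user's reported genders. The function name `classify_user`, the labels `to_fill`, `to_correct` and `other`, and the `"M"`/`"F"`/`None` encoding are assumptions for illustration.

```python
from collections import Counter

def classify_user(reported):
    """reported maps business -> 'M', 'F' or None (not reported)."""
    values = [g for g in reported.values() if g is not None]
    if not values:
        return "to_fill"      # no gender information in any business
    counts = Counter(values)
    if len(counts) == 2 and counts["M"] == counts["F"]:
        return "to_correct"   # conflicting reports, each value on half of the businesses
    return "other"            # consistent or majority-consistent reports

# Hypothetical users matching the examples in the text
unreported = {"purchase": None, "after_sales": None, "warranty": None, "reading": None}
two_conflicting = {"purchase": "M", "after_sales": "F", "warranty": None, "reading": None}
four_split = {"purchase": "F", "after_sales": "M", "warranty": "M", "reading": "F"}
consistent = {"purchase": "F", "after_sales": "F", "warranty": None, "reading": "F"}
labels = [classify_user(u) for u in (unreported, two_conflicting, four_split, consistent)]
```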
S23, obtain each user's number of clicks on each application program from the behavior data set as the feature vector. Specifically, the behavior data set includes the counted number of times each user clicks each application program within the preset time; the counted click numbers of each user on each application program are converted into a vector, giving each user's click feature vector over the application programs.
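A minimal sketch of the vector conversion in step S23 follows. It assumes a fixed, shared ordering of application programs and fills applications the user never clicked with zero; the zero fill and the name `click_feature_vector` are assumptions, since the embodiment does not say how remaining missing values are encoded.

```python
def click_feature_vector(click_counts, apps):
    """Order one user's click counts by a fixed app list; missing counts become 0."""
    return [click_counts.get(app) or 0 for app in apps]

# Hypothetical app ordering shared by all users so that vectors are comparable
apps = ["browser", "input_method", "news"]
vector = click_feature_vector({"browser": 5, "news": None}, apps)
```

Using the same `apps` ordering for every user ensures that position i of every vector refers to the same application program.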
S24, according to the feature vectors, use the optimal gender classification model obtained by the method described in Figs. 1-3 to predict the gender of the users to be filled, and fill in the prediction results. Specifically, the optimal gender classification model is applied to the feature vector data set of each user to be filled to obtain that user's predicted gender, and the predicted gender is filled in for each user to be filled.
S25, according to the feature vectors, use the optimal gender classification model to predict the gender of the users to be corrected, combine the prediction with the gender data set of each user to be corrected, and take the mode as the final gender of the user to be corrected and fill it in. Specifically, the optimal gender classification model is applied to the feature vector data set of each user to be corrected to obtain that user's predicted gender; the predicted gender is combined with the user's gender data set in the multiple businesses, and the mode of these values is filled in as the user's final gender. For example, suppose the predicted gender of a user to be corrected is female, the gender reported in the phone purchase business and the reading business is male, and the gender reported in the extended warranty business and the after-sales business is female. Taking the mode over the prediction result and the gender data set together gives three female values against two male values, so the final gender filled in for this user to be corrected is female.
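The mode rule of step S25 amounts to a majority vote in which the model prediction is one extra ballot. The sketch below assumes genders are encoded as `"M"` and `"F"` and unreported businesses as `None`; the function name `final_gender` is hypothetical.

```python
from collections import Counter

def final_gender(predicted, reported):
    """Mode of the model prediction plus all reported genders.

    For a user to be corrected the reports are split evenly, so adding the
    prediction makes the number of votes odd and the mode is unique."""
    votes = [predicted] + [g for g in reported.values() if g is not None]
    return Counter(votes).most_common(1)[0][0]

# The example from the text: prediction female, two businesses male, two female.
reported = {"purchase": "M", "reading": "M", "warranty": "F", "after_sales": "F"}
result = final_gender("F", reported)
```

Because the reported values tie by construction, the prediction effectively acts as the tie-breaker.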
The gender filling method provided by this embodiment obtains the users' gender data sets in multiple businesses and their behavior data sets in multiple application programs, and screens out, according to the gender data sets, the users to be filled, who have not reported gender information in any of the multiple businesses, and the users to be corrected, whose reported gender has lower confidence. The click numbers of the users to be filled and the users to be corrected on each application program are converted into feature vectors, and the optimal gender classification model is used to predict their genders, which are then filled in accordingly. The model has higher credibility, high classification precision and higher prediction accuracy. The optimal gender classification model can therefore predict users to be filled and users to be corrected on the basis of multiple businesses, covers a wider range of users, and can effectively fill in the user group of unknown gender and correct the gender of the user group that has reported gender information in some businesses but with lower confidence; correcting the gender of that user group is itself a form of user gender filling. This scheme can more accurately predict and fill in the gender labels of users who have no gender information in the multiple businesses or whose gender information has lower confidence, improving the overall accuracy of the final gender labels of all users on the platform.
Refer to Fig. 5, which is a schematic flowchart of a gender filling method provided by the first embodiment of the present invention. The method may run on devices with communication and interaction functions, such as smartphones (for example Android phones and iOS phones), tablet computers, personal digital assistants (PDAs) and smart wearable devices. As shown in the figure, the method may include steps S31 to S37.
S31, obtain the users' gender data sets in multiple businesses and their behavior data sets in multiple application programs to generate a target matrix table. Specifically, the multiple businesses may include phone purchase, after-sales, extended warranty, reading and other businesses. The multiple application programs may be divided into application categories according to the main function of each application program, and the application categories may include browsers, input methods, news and information, online communities, entertainment and social networking, and so on; preferably, the application categories may include 478 categories such as browsers, input methods, news and information, and online communities. In this embodiment, techniques such as crawling external website data, querying internal databases or purchasing interfaces are used to obtain the gender information that users have and have not reported in the multiple businesses such as phone purchase, after-sales, extended warranty and reading, as well as the users' behavior data sets in the multiple application categories such as browsers, input methods, news and information, and online communities; the obtained gender data sets and behavior data sets are then combined to generate the target matrix table.
S32, screen out the users to be filled and the users to be corrected of the multiple businesses according to the gender data sets. The users to be filled are the set of users who have no gender information in any of the multiple businesses, and the users to be corrected are the set of users who have gender information in part of the multiple businesses and for whom the businesses reporting each of the two different gender values account for half each. Specifically, in this embodiment, the multiple businesses may include four major businesses: phone purchase, after-sales, extended warranty and reading. If it is detected that a user has not filled in and reported gender information in any of the multiple businesses, the user is screened out as a user to be filled of the multiple businesses. If it is detected that a user has gender information in the multiple businesses and the businesses containing the different gender values each account for half, the user is screened out as a user to be corrected of the multiple businesses. For example, among the four major businesses, a user may have reported gender information only in the phone purchase business and the after-sales business, with the two values differing: male in the phone purchase business and female in the after-sales business. Alternatively, a user may have reported gender information in all four businesses, with the businesses split evenly between the two values: female in the phone purchase business and the reading business, and male in the after-sales business and the extended warranty business.
S33, obtain each user's number of clicks on each application program from the behavior data set as the feature vector. Specifically, the behavior data set includes the counted number of times each user clicks each application program within the preset time; the counted click numbers of each user on each application program are converted into a vector, giving each user's click feature vector over the application programs.
S34, according to the feature vectors, use the optimal gender classification model obtained by the method described in Figs. 1-3 to predict the gender of the users to be filled, and fill in the prediction results. Specifically, the optimal gender classification model is applied to the feature vector data set of each user to be filled to obtain that user's predicted gender, and the predicted gender is filled in for each user to be filled.
S35, obtain the overall accuracy S1 with which the optimal gender classification model predicts the user to be female.
S36, if the prediction result is female, the score of the prediction result is S1. Specifically, when the gender predicted by the optimal gender classification model is female, the score of the prediction result is the overall accuracy S1 with which the optimal gender classification model predicts the user to be filled to be female.
S37, if the prediction result is male, the score of the prediction result is S2, where S2 = 1 - S1. Specifically, when the gender predicted by the optimal gender classification model is male, the score of the prediction result is (1 - S1).
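The scoring rule of steps S36 and S37 reduces to a two-branch function. The sketch assumes S1 is given as a fraction between 0 and 1 and that genders are encoded as `"F"` and `"M"`; the name `prediction_score` and the sample value 0.9 are hypothetical.

```python
def prediction_score(predicted, s1):
    """Score of a filled prediction: S1 for female, S2 = 1 - S1 for male."""
    if not 0.0 <= s1 <= 1.0:
        raise ValueError("S1 must be a fraction between 0 and 1")
    if predicted == "F":
        return s1
    if predicted == "M":
        return 1.0 - s1
    raise ValueError("predicted must be 'F' or 'M'")

# Hypothetical value of S1
score_female = prediction_score("F", 0.9)
score_male = prediction_score("M", 0.9)
```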
In the above embodiment, for users to be filled who have not reported a gender attribute in any of the four major businesses of phone purchase, after-sales, extended warranty and reading, the optimal gender classification model is called to predict the user's gender label, the prediction is taken as the user's final prediction result, and the prediction result is scored accordingly. When the predicted gender is male, the score is 1 minus the overall accuracy S1 with which the optimal gender classification model predicts the user to be female; when the predicted gender is female, the score is S1. The score indicates the accuracy of the prediction result.
Refer to Fig. 6, which is a schematic flowchart of a gender filling method provided by the second embodiment of the present invention. The method may run on devices with communication and interaction functions, such as smartphones (for example Android phones and iOS phones), tablet computers, personal digital assistants (PDAs) and smart wearable devices. As shown in the figure, the method may include steps S41 to S49.
S41, obtain the users' gender data sets in multiple businesses and their behavior data sets in multiple application programs to generate a target matrix table. Specifically, the multiple businesses may include phone purchase, after-sales, extended warranty, reading and other businesses. The multiple application programs may be divided into application categories according to the main function of each application program, and the application categories may include browsers, input methods, news and information, online communities, entertainment and social networking, and so on; preferably, the application categories may include 478 categories such as browsers, input methods, news and information, and online communities. In this embodiment, techniques such as crawling external website data, querying internal databases or purchasing interfaces are used to obtain the gender information that users have and have not reported in the multiple businesses such as phone purchase, after-sales, extended warranty and reading, as well as the users' behavior data sets in the multiple application categories such as browsers, input methods, news and information, and online communities; the obtained gender data sets and behavior data sets are then combined to generate the target matrix table.
S42, screen out the users to be filled and the users to be corrected of the multiple businesses according to the gender data sets. The users to be filled are the set of users who have no gender information in any of the multiple businesses, and the users to be corrected are the set of users who have gender information in part of the multiple businesses and for whom the businesses reporting each of the two different gender values account for half each. Specifically, in this embodiment, the multiple businesses may include four major businesses: phone purchase, after-sales, extended warranty and reading. If it is detected that a user has not filled in and reported gender information in any of the multiple businesses, the user is screened out as a user to be filled of the multiple businesses. If it is detected that a user has gender information in the multiple businesses and the businesses containing the different gender values each account for half, the user is screened out as a user to be corrected of the multiple businesses. For example, among the four major businesses, a user may have reported gender information only in the phone purchase business and the after-sales business, with the two values differing: male in the phone purchase business and female in the after-sales business. Alternatively, a user may have reported gender information in all four businesses, with the businesses split evenly between the two values: female in the phone purchase business and the reading business, and male in the after-sales business and the extended warranty business.
S43, obtain each user's number of clicks on each application program from the behavior data set as the feature vector. Specifically, the behavior data set includes the counted number of times each user clicks each application program within the preset time; the counted click numbers of each user on each application program are converted into a vector, giving each user's click feature vector over the application programs.
S44, according to the feature vectors, use the optimal gender classification model to predict the gender of the users to be corrected, combine the prediction with the gender data set of each user to be corrected, and take the mode as the final gender of the user to be corrected and fill it in. Specifically, the optimal gender classification model is applied to the feature vector data set of each user to be corrected to obtain that user's predicted gender; the predicted gender is combined with the user's gender data set in the multiple businesses, and the mode of these values is filled in as the user's final gender. For example, suppose the predicted gender of a user to be corrected is female, the gender reported in the phone purchase business and the reading business is male, and the gender reported in the extended warranty business and the after-sales business is female. Taking the mode over the prediction result and the gender data set together gives three female values against two male values, so the final gender filled in for this user to be corrected is female.
S45, compare, one by one, the gender results obtained from a sampling survey of users who have reported gender information in each business with the gender information those users reported in the corresponding business. Specifically, users who have reported gender information in the multiple businesses are randomly sampled, and the sampling survey results are compared with the gender information the corresponding users reported in each business, so as to obtain the overall gender accuracy of each business.
S46, calculate the overall gender accuracy z_n of each business according to the comparison results. Specifically, in this embodiment, the multiple businesses may include four major businesses: phone purchase, after-sales, extended warranty and reading. From the comparison results of the sampling survey, the overall gender accuracies of the phone purchase business, the after-sales business, the extended warranty business and the reading business are obtained as z_1, z_2, z_3 and z_4 respectively. In some feasible embodiments, if no sampling survey is carried out for the multiple businesses, the overall accuracies z_1, z_2, z_3 and z_4 default to 1.0.
S47, obtain the overall accuracy S1 with which the optimal gender classification model predicts the user to be female.
S48, if the final gender is female, the score of the final gender is S3, where S3 = (1 - S1 × (1 - z_1) × (1 - z_2) × ... × (1 - z_n)). Specifically, n is the total number of the multiple businesses, and z_n is set to zero for every business in which the reported gender information is male and for every business with no gender information. For example, the multiple businesses include four major businesses, namely the phone purchase business, the reading business, the extended warranty business and the after-sales business, so n = 4, where the overall gender accuracies of the reported gender information in the phone purchase business, the after-sales business, the extended warranty business and the reading business are z_1, z_2, z_3 and z_4 respectively. When the predicted gender of a user to be corrected is female, the gender reported in the phone purchase business is male, the gender reported in the extended warranty business is female, and no gender information is reported in the remaining two businesses, then z_1 = 0, z_2 = 0 and z_4 = 0, so the score of this user to be corrected is S3 = (1 - S1 × (1 - z_3)). As another example, when the predicted gender of a user to be corrected is female, the gender reported in the phone purchase business and the reading business is male, and the gender reported in the extended warranty business and the after-sales business is female, then n = 4, z_1 = 0 and z_4 = 0, so the score of this user to be corrected is S3 = (1 - S1 × (1 - z_2) × (1 - z_3)).
S49, if the final gender is male, the score of the final gender is S4, where S4 = (1 - (1 - S1) × (1 - z_1) × (1 - z_2) × ... × (1 - z_n)), n is the total number of the multiple businesses, and z_n is set to zero for every business in which the reported gender information is female and for every business with no gender information. Specifically, the multiple businesses include four major businesses, namely the phone purchase business, the reading business, the extended warranty business and the after-sales business, so n = 4, where the overall gender accuracies of the reported gender information in the phone purchase business, the after-sales business, the extended warranty business and the reading business are z_1, z_2, z_3 and z_4 respectively. When the predicted gender of a user to be corrected is male, the gender reported in the phone purchase business is male, the gender reported in the extended warranty business is female, and no gender information is reported in the remaining two businesses, then z_2 = 0, z_3 = 0 and z_4 = 0, so the score of this user to be corrected is S4 = (1 - (1 - S1) × (1 - z_1)). As another example, when the predicted gender of a user to be corrected is male, the gender reported in the phone purchase business and the reading business is male, and the gender reported in the extended warranty business and the after-sales business is female, then z_2 = 0 and z_3 = 0, so the score of this user to be corrected is S4 = (1 - (1 - S1) × (1 - z_1) × (1 - z_4)).
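The two scoring formulas of steps S48 and S49 can be written as one function that zeroes z for every business whose report disagrees with the final gender or is missing. This sketch follows the formulas exactly as stated in the text; the name `final_gender_score`, the business keys and the numeric values of S1 and z are hypothetical.

```python
from math import prod

def final_gender_score(final, s1, reported, accuracy):
    """S3 (final == 'F') or S4 (final == 'M') as stated in steps S48 and S49.

    accuracy maps each business to its overall gender accuracy z; reported
    maps each business to 'M', 'F' or None."""
    if final == "F":
        base = s1            # S3 = 1 - S1 * prod(1 - z_i)
    elif final == "M":
        base = 1.0 - s1      # S4 = 1 - (1 - S1) * prod(1 - z_i)
    else:
        raise ValueError("final must be 'F' or 'M'")
    # z is kept only for businesses that reported the final gender; otherwise z = 0.
    z = [accuracy[b] if reported.get(b) == final else 0.0 for b in accuracy]
    return 1.0 - base * prod(1.0 - zi for zi in z)

# Hypothetical accuracies z_1..z_4 and S1
accuracy = {"purchase": 0.8, "after_sales": 0.7, "warranty": 0.9, "reading": 0.6}
reported = {"purchase": "M", "warranty": "F"}   # the other two businesses are unreported
s3 = final_gender_score("F", 0.9, reported, accuracy)   # 1 - 0.9 * (1 - 0.9)
s4 = final_gender_score("M", 0.9, reported, accuracy)   # 1 - 0.1 * (1 - 0.8)
```

With these sample values the first worked example of S48 reduces to S3 = 1 - S1 × (1 - z_3), and the first worked example of S49 to S4 = 1 - (1 - S1) × (1 - z_1).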
In the above embodiment, the users to be corrected are users with lower confidence among the four major businesses of phone purchase, after-sales, extended warranty and reading: either they have reported a gender attribute in only two businesses and the two values are inconsistent, or they have reported a gender attribute in all four businesses and the inconsistent values each account for half. For such a user, the optimal gender classification model is called to predict the user's gender label, the prediction is combined with the user's gender data set, the mode is filled in as the user's final gender, and the final gender is scored accordingly. The score indicates the accuracy of the predicted final gender.
Refer to Fig. 7, which is a schematic structural diagram of a terminal corresponding to the method of Fig. 1, provided by an embodiment of the present invention. The terminal 100 may be a device with mobile networking capability, such as a smartphone (for example an Android phone or an iOS phone), a tablet computer, a personal digital assistant (PDA) or a smart wearable device. The terminal 100 includes a first acquisition unit 110, a first screening unit 120, a data processing unit 130, a model training unit 140 and a model tuning unit 150.
The first acquisition unit 110 is configured to obtain the users' gender data sets in multiple businesses and their behavior data sets in multiple application programs to generate a target matrix table. Specifically, the multiple businesses may include phone purchase, after-sales, extended warranty, reading and other businesses. The multiple application programs may be divided into application categories according to the main function of each application program, and the application categories may include browsers, input methods, news and information, online communities, entertainment and social networking, and so on; preferably, the application categories may include 478 categories such as browsers, input methods, news and information, and online communities. In this embodiment, the first acquisition unit 110 obtains the gender information that users have and have not reported in the multiple businesses such as phone purchase, after-sales, extended warranty and reading, and may also obtain the users' behavior data sets in the multiple application categories such as browsers, input methods, news and information, and online communities; the first acquisition unit 110 is further configured to combine the obtained gender data sets and behavior data sets to generate the target matrix table.
The first screening unit 120 is configured to screen out the users to be trained of the multiple businesses according to the gender data set in the target matrix table, where the users to be trained are the set of users who have gender information in multiple preset businesses and whose gender information is identical across those businesses. Specifically, the first screening unit 120 uses the gender data set of the multiple preset businesses in the obtained target matrix table to screen out the users to be trained, who have gender information that is identical across those businesses. The multiple preset businesses can be customized according to user needs; alternatively, the system can detect the businesses that contain more gender information and have higher overall gender accuracy and use them as the multiple preset businesses. For example, in this embodiment the multiple businesses can include four major businesses: phone purchase, after-sales, extended warranty, and reading. If it is detected that, among these, users report gender information most often and consistently in the three businesses of phone purchase, extended warranty, and reading, then phone purchase, extended warranty, and reading are set in advance as the multiple preset businesses. In some feasible embodiments, the number of preset businesses accounts for at least 75% of the total number of the multiple businesses.
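The screening rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the business keys, the "M"/"F" encoding, and the function name `select_training_users` are hypothetical, and the sketch assumes a user must have reported a gender in every preset business.

```python
# Hypothetical keys standing in for the preset businesses named in the text.
PRESET_BUSINESSES = ["purchase", "warranty", "reading"]


def select_training_users(gender_table, preset=PRESET_BUSINESSES):
    """Return {user_id: gender} for users whose reports are complete and identical.

    gender_table maps user_id -> {business: "M" | "F" | None}.
    Assumption: a user qualifies only if every preset business has a report.
    """
    selected = {}
    for user_id, reports in gender_table.items():
        values = [reports.get(business) for business in preset]
        has_all = all(value is not None for value in values)
        if has_all and len(set(values)) == 1:
            selected[user_id] = values[0]
    return selected


example = {
    "u1": {"purchase": "M", "warranty": "M", "reading": "M"},
    "u2": {"purchase": "M", "warranty": "F", "reading": "M"},  # inconsistent
    "u3": {"purchase": "F", "warranty": None, "reading": "F"},  # incomplete
}
training_users = select_training_users(example)
```

Only `u1` survives the screening in this example; the other two users would be handled by the filling and correction steps described for the later embodiments.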
The data processing unit 130 is configured to convert the gender data set and behavioral data set of the users to be trained in the target matrix table into the feature data set used to train the gender classification model, where the feature data set includes a training data set and a test data set. Specifically, the users to be trained are the set of users who have gender information in the multiple preset businesses and whose gender information is identical, so the gender data set they reported in the multiple preset businesses has relatively high confidence. In this embodiment, the data processing unit 130 selects the gender data set and behavioral data set of these higher-confidence users to be trained and converts them into the feature data set of the gender classification model. The behavioral data set includes the historical behavioral data of the users to be trained across multiple application programs, for example the number of times a user clicks each application program within a preset time. The number of clicks of each user in each application program within the preset time is converted into a vector, yielding each user's click feature vector over the application programs. In addition, the data processing unit 130 is further configured to randomly divide the feature data set into a training data set and a test data set in a preset ratio. In this embodiment, the feature data set is randomly divided into the training data set and the test data set in a 7:3 ratio, so that the training data set accounts for 70 percent of the feature data set and the test data set accounts for 30 percent. In some feasible embodiments, the preset ratio can be customized according to user needs.
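The vector conversion and the 7:3 random division described above can be sketched as follows. The function names, the fixed application order, and the seeded shuffle are assumptions made for illustration; the text does not specify how the random division is carried out.

```python
import random


def to_feature_vector(clicks, app_order):
    """Turn {app: click_count} into a fixed-length vector, using 0 for unseen apps."""
    return [clicks.get(app, 0) for app in app_order]


def split_dataset(samples, train_ratio=0.7, seed=0):
    """Randomly divide samples into (training, test) lists by the given ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


apps = ["browser", "input_method", "news"]  # hypothetical application categories
vector = to_feature_vector({"browser": 12, "news": 3}, apps)
train_set, test_set = split_dataset(range(10))
```

With ten samples the split yields seven training samples and three test samples, matching the 70/30 proportion in the text.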
The model training unit 140 is configured to train the gender classification model on the training data set using a decision tree algorithm. Specifically, in this embodiment the feature data set is randomly divided into the training data set and the test data set in a 7:3 ratio, that is, the training data set accounts for 70 percent of the feature data set and the test data set accounts for 30 percent. The training data set serves as the training set for the gender classification model: users in the training data set whose reported gender is male are used as positive samples, and users whose reported gender is female are used as negative samples. The decision tree algorithm can include the CART algorithm (Classification And Regression Tree Algorithm), the ID3 algorithm, the C4.5 algorithm, and the random forest algorithm (Random Forest Algorithm). In this embodiment, the gender data set and behavioral data set of the positive and negative samples are obtained and trained with the random forest algorithm, which uses multiple trees, to produce the gender classification model. A random forest is a classifier that trains on and predicts samples with multiple trees; its output class is the mode of the classes output by the individual trees. The random forest algorithm can handle very high-dimensional data without a separate feature selection step, because its feature subsets are chosen at random: at each node, a random subset of all features is selected and used to compute the best split. The random forest algorithm can also balance errors on unbalanced data sets, and it can still maintain the accuracy of the trained model when a substantial portion of the features is missing.
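The two random forest ideas named above, taking the mode of the per-tree outputs and drawing a random feature subset at a node, can be sketched as follows. This is only an illustration: the trees are stand-in callables rather than trained decision trees, and the square-root subset size is one common choice assumed here, not something the text prescribes.

```python
import math
import random
from collections import Counter


def forest_predict(trees, features):
    """Return the mode of the classes predicted by the individual trees."""
    votes = [tree(features) for tree in trees]
    # On a tie, Counter keeps the class that was voted for first.
    return Counter(votes).most_common(1)[0][0]


def random_feature_subset(n_features, rng):
    """Pick a random subset of feature indices for one node (sqrt-sized here)."""
    size = max(1, math.isqrt(n_features))
    return rng.sample(range(n_features), size)


# Three hypothetical one-rule "trees" over a click-count vector.
trees = [
    lambda x: "M" if x[0] > 5 else "F",
    lambda x: "M" if x[1] > 2 else "F",
    lambda x: "F" if x[2] > 0 else "M",
]
prediction = forest_predict(trees, [10, 0, 0])
subset = random_feature_subset(478, random.Random(0))
```

For the vector `[10, 0, 0]` two of the three stand-in trees vote "M", so the forest outputs "M".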
The model tuning unit 150 is configured to cross-validate the gender classification model using algorithm tuning parameters and the test data set, so as to obtain the optimal gender classification model. Specifically, the algorithm tuning parameters can include: the number of trees in the random forest (numTrees), the feature subset selection strategy (Feature Subset Strategy), the attribute selection measure (impurity), the maximum depth of the tree (max Depth), the maximum breadth of the tree (max Bins), and similar parameters. No default value is used for the number of trees, and its tuning range includes [20, 50, 90, 100, 150, 160, 210, 220]. No default value is used for the feature subset selection strategy, and its tuning range includes auto, sqrt, log2, and one third. No default value is used for the attribute selection measure, and its tuning options include Gini impurity (gini) and information gain (entropy). No default value is used for the maximum depth of the tree, and its tuning range includes [5, 10, 20, 25, 30]. No default value is used for the maximum breadth of the tree, and its tuning range includes [50, 100, 200, 300, 400, 500]. In this embodiment, different tuning parameters are set, the randomly divided training data set is trained with them, and the test data set is used to cross-validate the gender classification model in order to obtain the optimal gender classification model. The evaluation indices of the cross-validation include precision (Precision), recall (Recall), and overall accuracy (Accuracy). For a binary classification system, there are four possible outcomes when the gender classification model predicts and fills a gender: the user is male and is predicted to be male; the user is male but is predicted to be female; the user is female but is predicted to be male; and the user is female and is predicted to be female. Precision is defined as the ratio of the number of users correctly predicted as a class to the total number of users predicted as that class. Taking the male samples of the test data set as an example, precision for the male class = number of users who are male and predicted to be male / (number of users who are male and predicted to be male + number of users who are female but predicted to be male). Recall is defined as the ratio of the number of users correctly predicted as a class to the actual number of users in that class. Taking the male samples of the test data set as an example, recall for the male class = number of users who are male and predicted to be male / (number of users who are male and predicted to be male + number of users who are male but predicted to be female). The overall accuracy of the gender classification model is defined as the ratio of the number of correct predictions to the actual number of predictions: overall accuracy = (number of users who are male and predicted to be male + number of users who are female and predicted to be female) / total number of users. In this embodiment, because the male-to-female ratio of the training data set and the test data set is within 5:1, no oversampling or undersampling is performed, and provided the model does not overfit, model evaluation is based mainly on overall accuracy. In this embodiment, the gender classification model trained on the training data set is cross-validated using the algorithm tuning parameters and the test data set, and the optimal gender classification model is thereby obtained. Preferably, the overall accuracy of the optimal gender classification model reaches at least 89.41%.
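The tuning ranges and the three evaluation indices above can be written out as a short sketch. The dictionary keys, the spelling `onethird`, and the function names are assumptions for illustration; training a model for each combination is left out, and the counts passed to `evaluate` are made-up example numbers, not results from the patent.

```python
from itertools import product

# Tuning ranges as listed in the text; key spellings are assumed.
PARAM_GRID = {
    "numTrees": [20, 50, 90, 100, 150, 160, 210, 220],
    "featureSubsetStrategy": ["auto", "sqrt", "log2", "onethird"],
    "impurity": ["gini", "entropy"],
    "maxDepth": [5, 10, 20, 25, 30],
    "maxBins": [50, 100, 200, 300, 400, 500],
}


def iter_param_combinations(grid=PARAM_GRID):
    """Yield every combination of tuning parameters as a dict."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


def evaluate(male_as_male, male_as_female, female_as_male, female_as_female):
    """Compute male-class precision and recall, and overall accuracy."""
    total = male_as_male + male_as_female + female_as_male + female_as_female
    predicted_male = male_as_male + female_as_male
    actual_male = male_as_male + male_as_female
    return {
        "precision": male_as_male / predicted_male if predicted_male else 0.0,
        "recall": male_as_male / actual_male if actual_male else 0.0,
        "accuracy": (male_as_male + female_as_female) / total if total else 0.0,
    }


combinations = list(iter_param_combinations())
metrics = evaluate(male_as_male=40, male_as_female=10,
                   female_as_male=5, female_as_female=45)
```

A tuning loop would train one model per combination on the training data set, call `evaluate` on its test-set predictions, and keep the combination with the highest overall accuracy.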
In the above embodiment, the first acquisition unit 110 integrates the users' gender data set in multiple businesses and their behavioral data set in multiple application programs; the first screening unit 120 then screens out the higher-confidence users to be trained; and the data processing unit 130 converts the gender data set and behavioral data set of the users to be trained into the feature data set for model training. The model training unit 140 trains on this higher-confidence user feature data set with a decision tree algorithm to obtain the gender classification model, which therefore has higher prediction accuracy, higher credibility, and high classification precision. The model tuning unit 150 adjusts the algorithm tuning parameters and cross-validates the gender classification model with the test data set to obtain the optimal gender classification model. Because the gender classification model uses higher-confidence users to be trained and is built on data from multiple businesses, its user coverage is broader and its gender predictions are more accurate.
Referring to Fig. 8, it is a schematic block diagram of the first acquisition unit 110 of the terminal shown in Fig. 7. In this embodiment, the first acquisition unit 110 is configured to obtain the users' gender data set in multiple businesses and their behavioral data set in multiple application programs, so as to generate the target matrix table. Specifically, the first acquisition unit 110 includes a matrix information acquisition unit 111 and a data cleaning unit 112, where the data cleaning unit 112 further includes a data identification unit 112a and a data deletion unit 112b.
The matrix information acquisition unit 111 obtains the users' gender data set in multiple businesses and their behavioral data set in multiple application programs to generate an original matrix table. The gender data set includes each user's gender information in each business, and the behavioral data set includes the number of times each user clicks each application program within a preset time. The rows of the original matrix table are user IDs, and the columns are the corresponding user's gender information in each business and the number of times that user clicks each application program within the preset time. The gender data set and the behavioral data set are associated through the user's ID number to form the feature data set. Specifically, the multiple businesses can include phone purchase, after-sales, extended warranty, reading, and similar businesses. In this embodiment, techniques such as crawling external website data, querying internal databases, or purchasing interfaces can be used to obtain the user IDs and the gender data set, covering both users who have reported gender information and users without gender information, in multiple businesses such as phone purchase, after-sales, extended warranty, and reading. When a user accesses an application program, historical behavioral data for that access is produced. In this embodiment, the behavioral data set includes the number of times a user clicks each application program in a mobile internet environment. Each click on an application program produces one internet log entry, so counting a user's internet log entries for all application programs within the preset time gives the number of times that user clicked each application program; applying the same statistics to all users gives the behavioral data set of all users across all application programs. The preset time can be an arbitrarily chosen period, or it can be customized according to the data requirements of model training. The original matrix table can be generated from the gender data set and the behavioral data set, where the rows of the original matrix table are user IDs and the columns are each user's gender information in each business and the number of times that user clicks each application program. Specifically, the acquired user IDs, the users' gender data set in multiple businesses, and their behavioral data set in multiple application programs are aggregated and arranged to generate the original matrix table. In this embodiment, the original matrix table can be as shown in Table 2.
Table 2
In the original matrix table shown in Table 2, the row direction corresponds to the user IDs, and the column direction can include each user's gender data set in multiple businesses and the behavioral data set produced when that user accesses each application program within the preset time. In this embodiment, the behavioral data set includes the number of times a user clicks each application program in a mobile internet environment.
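The assembly of the original matrix table, with one row per user ID, one column per business, and one column per application program, can be sketched as follows. The record formats, the use of `None` for a missing value, and the function name are assumptions made for illustration; the text does not describe the storage format.

```python
from collections import Counter


def build_original_matrix(gender_records, click_log, businesses, apps):
    """Join gender reports and click counts on user ID.

    gender_records: {user_id: {business: gender}}
    click_log: list of (user_id, app) pairs, one per log entry
    Returns {user_id: {column: value}}; None marks a missing value.
    """
    counts = Counter(click_log)  # number of log entries per (user_id, app)
    user_ids = sorted(set(gender_records) | {uid for uid, _ in counts})
    matrix = {}
    for uid in user_ids:
        row = {b: gender_records.get(uid, {}).get(b) for b in businesses}
        for app in apps:
            clicks = counts.get((uid, app), 0)
            row[app] = clicks if clicks > 0 else None
        matrix[uid] = row
    return matrix


genders = {"u1": {"purchase": "M"}, "u2": {}}
log = [("u1", "browser"), ("u1", "browser"), ("u3", "news")]
matrix = build_original_matrix(genders, log, ["purchase", "reading"], ["browser", "news"])
```

Here `u3` appears only in the click log, so it receives a row with empty gender columns, which is the situation the later filling step is meant to address.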
The data cleaning unit 112 is configured to perform data cleaning on the matrix table. Specifically, the data cleaning unit 112 further includes a data identification unit 112a and a data deletion unit 112b.
The data identification unit 112a is configured to identify the application programs in the original matrix table whose missing rate is greater than 90%. Specifically, a very large number of application programs are currently available for download, and different users install and use different ones. Apart from commonly used mobile applications such as WeChat and Alipay, the installation and frequency of use of more niche applications such as Meiyou and Baicizhan vary from person to person, so more than 90% of the values in the original matrix table are null. In this embodiment, the application categories obtained in the original matrix table can include 478 application program categories, such as browser, input method, news and information, and online community, so it is necessary to identify the application programs in the original matrix table whose missing rate is greater than 90%.
The data deletion unit 112b is configured to delete the identified application programs from the original matrix table to generate the target matrix table.
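The identify-then-delete cleaning step can be sketched as follows, reusing the convention that `None` marks a missing value. The function name and the data layout are hypothetical; only the 90% threshold comes from the text.

```python
def drop_sparse_apps(matrix, apps, max_missing=0.9):
    """Remove application columns whose missing rate exceeds max_missing.

    matrix: {user_id: {column: value}}, where None marks a missing value.
    Returns (cleaned_matrix, kept_apps).
    """
    n_users = len(matrix)
    kept, dropped = [], set()
    for app in apps:
        missing = sum(1 for row in matrix.values() if row.get(app) is None)
        if n_users and missing / n_users > max_missing:
            dropped.add(app)  # identified as too sparse
        else:
            kept.append(app)
    cleaned = {
        uid: {col: val for col, val in row.items() if col not in dropped}
        for uid, row in matrix.items()
    }
    return cleaned, kept


# 20 users: "popular" is used by half of them, "niche" by only one (95% missing).
original = {
    f"u{i}": {"popular": 3 if i % 2 == 0 else None, "niche": 1 if i == 0 else None}
    for i in range(20)
}
target, kept_apps = drop_sparse_apps(original, ["popular", "niche"])
```

Gender columns are left untouched because only the names passed in `apps` are considered for deletion.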
In the above embodiment, the first acquisition unit 110 integrates, arranges, and cleans the users' gender data set in multiple businesses and their behavioral data set in multiple application programs to generate the target matrix table. The matrix information acquisition unit 111 obtains the gender information that users have or have not reported in the multiple businesses, their user IDs, and the number of times they click each application program within the preset time. Because the application programs available for installation are countless and different people use different ones, the data cleaning unit 112 needs to clean the original matrix table and remove the application programs whose missing rate is as high as 90%. Cleaning the original matrix table in this way reduces the complexity of the algorithm's processing and improves the training efficiency and accuracy of the gender classification model.
Referring to Fig. 9, it is a schematic structural diagram of a terminal, provided by an embodiment of the present invention, that corresponds to the method of Fig. 4. The terminal 200 can be a device with mobile networking capability, such as a smartphone (for example an Android phone or an iOS phone), a tablet computer, a personal digital assistant (PDA), or a smart wearable device. The terminal 200 includes a second acquisition unit 210, a second screening unit 220, a first feature processing unit 230, a first filling unit 240, and a second filling unit 250.
The second acquisition unit 210 is configured to obtain the users' gender data set in multiple businesses and their behavioral data set in multiple application programs, so as to generate a target matrix table. Specifically, the multiple businesses can include phone purchase, after-sales, extended warranty, reading, and similar businesses. The multiple application programs can be grouped into application categories according to the main function of each application program; the application categories can include browser, input method, news and information, online community, entertainment and social networking, and so on. Preferably, the application categories can include 478 application program categories, such as browser, input method, news and information, and online community. In this embodiment, the second acquisition unit 210 obtains the gender information that users have and have not reported in multiple businesses such as phone purchase, after-sales, extended warranty, and reading, and can also obtain the users' behavioral data set in multiple application categories such as browser, input method, news and information, and online community. The obtained gender data set and behavioral data set are combined to generate the target matrix table.
The second screening unit 220 is configured to screen out the users to be filled and the users to be corrected of the multiple businesses according to the gender data set. The users to be filled are the set of users who have no gender information in any of the multiple businesses. The users to be corrected are the set of users who have gender information in some of the multiple businesses and for whom the businesses containing each of the differing genders account for half each. Specifically, in this embodiment the multiple businesses can include four major businesses: phone purchase, after-sales, extended warranty, and reading. If it is detected that a user has not reported gender information in any of the multiple businesses, that user group is screened out as the users to be filled of the multiple businesses. If it is detected that a user has gender information in some of the multiple businesses and the businesses containing each of the differing genders account for half each, that user group is screened out as the users to be corrected of the multiple businesses. For example, among the four businesses of phone purchase, after-sales, extended warranty, and reading, a user may have reported gender information only in the phone purchase business and the after-sales business, with the two reports differing: male in the phone purchase business and female in the after-sales business. Alternatively, the user may have reported gender information in all four businesses, with the businesses containing each gender accounting for half each: female in the phone purchase business and the reading business, and male in the after-sales business and the extended warranty business.
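The two screening rules above can be sketched as a single classification function. The labels, the "M"/"F" encoding, and the function name are hypothetical, and the sketch reads "half each" as an equal number of male and female reports among the businesses where a gender was reported.

```python
def classify_user(reports):
    """Classify a user by the gender reports across businesses.

    reports: {business: "M" | "F" | None}
    Returns "to_fill", "to_correct", or "other".
    """
    values = [value for value in reports.values() if value is not None]
    if not values:
        return "to_fill"  # no gender information in any business
    if values.count("M") == values.count("F"):
        return "to_correct"  # conflicting reports split half and half
    return "other"


# The two examples from the text, plus an empty and a consistent user.
two_reports = {"purchase": "M", "after_sales": "F", "warranty": None, "reading": None}
four_reports = {"purchase": "F", "reading": "F", "after_sales": "M", "warranty": "M"}
no_reports = {"purchase": None, "after_sales": None, "warranty": None, "reading": None}
consistent = {"purchase": "M", "after_sales": "M", "warranty": None, "reading": "M"}
labels = [classify_user(r) for r in (two_reports, four_reports, no_reports, consistent)]
```

Users labelled "other" have a clear majority among their own reports and fall outside the two groups this unit screens for.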
The first feature processing unit 230 is configured to obtain, from the behavioral data set, each user's number of clicks in each application program as a feature vector. Specifically, the behavioral data set includes the counted number of times each user clicks each application program within the preset time. The counted number of clicks of each user in each application program is converted into a vector, yielding each user's click feature vector over the application programs.
The first filling unit 240 is configured to predict, according to the feature vector, the gender of each user to be filled using the optimal gender classification model obtained by the method described in Figs. 1-3, and to fill in the prediction result. Specifically, the first filling unit 240 uses the optimal gender classification model to make a prediction on the feature vector data of each user to be filled, obtains the predicted gender of each user to be filled, and fills in that predicted gender as the user's final gender label.
The second filling unit 250 is configured to predict, according to the feature vector, the gender of each user to be corrected using the optimal gender classification model, to combine the prediction with the gender data set of the user to be corrected, and to take the mode as the final gender of the user to be corrected and fill it in. Specifically, the second filling unit 250 uses the optimal gender classification model to make a prediction on the feature vector data of each user to be corrected and obtains the predicted gender of each user to be corrected. The predicted gender of each user to be corrected is combined with that user's gender data set in the multiple businesses, and the mode of the combined values is filled in as the user's final gender label. For example, suppose the predicted gender of a user to be corrected is female, the gender information the user reported in the phone purchase business and the reading business is male, and the gender information reported in the extended warranty business and the after-sales business is female. Combining the prediction result with the gender data set and taking the mode gives the final gender of that user to be corrected, which is female.
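The mode-based correction can be sketched in a few lines. The function name and the "M"/"F" encoding are hypothetical. Because a user to be corrected has an even number of reports split half and half, adding the single model prediction always produces a strict majority.

```python
from collections import Counter


def corrected_gender(predicted, reports):
    """Combine the model prediction with the reported genders and take the mode.

    predicted: "M" or "F" from the classification model
    reports: {business: "M" | "F" | None}
    """
    votes = [predicted] + [v for v in reports.values() if v is not None]
    return Counter(votes).most_common(1)[0][0]


# The example from the text: predicted female, two male and two female reports.
reports = {"purchase": "M", "reading": "M", "warranty": "F", "after_sales": "F"}
final_gender = corrected_gender("F", reports)
```

In effect the model prediction acts as the tie-breaker between the conflicting reports.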
In the gender filling method provided by this embodiment, the second acquisition unit 210 obtains the users' gender data set in multiple businesses and their behavioral data set in multiple application programs. The second screening unit 220 screens out the users to be filled, who have not reported gender information in any of the multiple businesses, and the users to be corrected, whose reported gender has relatively low confidence. The first feature processing unit 230 converts the number of clicks of the users to be filled and the users to be corrected in each application program into feature vectors. The first filling unit 240 and the second filling unit 250 then use the optimal gender classification model to predict the gender of the users to be filled and the users to be corrected and fill it in accordingly. The model has higher credibility, high classification precision, and higher prediction accuracy. The optimal gender classification model can therefore predict the users to be filled and the users to be corrected on the basis of multiple businesses. Its user coverage is broader, and it can effectively fill in the user group of unknown gender and correct the gender of the user group that has reported gender information in some businesses but with relatively low confidence; such correction is itself a form of user gender filling. This scheme can more accurately predict and fill the gender labels of users who have no gender information, or whose gender information has relatively low confidence, in multiple businesses, improving the overall accuracy of the gender labels finally determined for all users on the platform.
Referring to Fig. 10, it is a schematic structural diagram of a terminal, provided by an embodiment of the present invention, that corresponds to the method of Fig. 5. The terminal 300 can be a device with mobile networking capability, such as a smartphone (for example an Android phone or an iOS phone), a tablet computer, a personal digital assistant (PDA), or a smart wearable device. The terminal 300 includes a third acquisition unit 310, a third screening unit 320, a second feature processing unit 330, a third filling unit 340, a first accuracy acquisition unit 350, a first scoring unit 360, and a second scoring unit 370.
The third acquisition unit 310 is configured to obtain the users' gender data set in multiple businesses and their behavioral data set in multiple application programs, so as to generate a target matrix table. Specifically, the multiple businesses can include phone purchase, after-sales, extended warranty, reading, and similar businesses. The multiple application programs can be grouped into application categories according to the main function of each application program; the application categories can include browser, input method, news and information, online community, entertainment and social networking, and so on. Preferably, the application categories can include 478 application program categories, such as browser, input method, news and information, and online community. In this embodiment, the third acquisition unit 310 obtains the gender information that users have and have not reported in multiple businesses such as phone purchase, after-sales, extended warranty, and reading, and can also obtain the users' behavioral data set in multiple application categories such as browser, input method, news and information, and online community. The obtained gender data set and behavioral data set are combined to generate the target matrix table.
The third screening unit 320 is configured to screen out the users to be filled and the users to be corrected of the multiple businesses according to the gender data set. The users to be filled are the set of users who have no gender information in any of the multiple businesses. The users to be corrected are the set of users who have gender information in some of the multiple businesses and for whom the businesses containing each of the differing genders account for half each. Specifically, in this embodiment the multiple businesses can include four major businesses: phone purchase, after-sales, extended warranty, and reading. If it is detected that a user has not reported gender information in any of the multiple businesses, that user group is screened out as the users to be filled of the multiple businesses. If it is detected that a user has gender information in some of the multiple businesses and the businesses containing each of the differing genders account for half each, that user group is screened out as the users to be corrected of the multiple businesses. For example, among the four businesses of phone purchase, after-sales, extended warranty, and reading, a user may have reported gender information only in the phone purchase business and the after-sales business, with the two reports differing: male in the phone purchase business and female in the after-sales business. Alternatively, the user may have reported gender information in all four businesses, with the businesses containing each gender accounting for half each: female in the phone purchase business and the reading business, and male in the after-sales business and the extended warranty business.
The second feature processing unit 330 is configured to obtain, from the behavioral data set, each user's number of clicks in each application program as a feature vector. Specifically, the behavioral data set includes the counted number of times each user clicks each application program within the preset time. The counted number of clicks of each user in each application program is converted into a vector, yielding each user's click feature vector over the application programs.
The third filling unit 340 is configured to predict, according to the feature vector, the gender of each user to be filled using the optimal gender classification model obtained by the method described in Figs. 1-3, and to fill in the prediction result. Specifically, the third filling unit 340 uses the optimal gender classification model to make a prediction on the feature vector data of each user to be filled, obtains the predicted gender of each user to be filled, and fills in that predicted gender as the user's final gender label.
The first accuracy acquisition unit 350 is configured to obtain the overall accuracy S1 with which the optimal gender classification model predicts that the user is female.
The first scoring unit 360 is configured to set the score of the prediction result to S1 if the prediction result is female. Specifically, when the gender predicted by the trained optimal gender classification model is female, the score of the prediction result is the overall accuracy S1 with which the optimal gender classification model predicts that the user to be filled is female.
The second scoring unit 370 is configured to set the score of the prediction result to S2 if the prediction result is male, where S2 = 1 - S1. Specifically, when the gender predicted by the trained optimal gender classification model is male, the score of the prediction result is (1 - S1).
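The scoring rule handled by the two scoring units can be sketched as one function. The function name and the "M"/"F" encoding are hypothetical, and the value of S1 used in the example is a made-up number chosen only to show the arithmetic.

```python
def score_prediction(predicted, s1):
    """Score a filled gender label: S1 for female, S2 = 1 - S1 for male.

    s1 is the overall accuracy with which the model predicts a user is female.
    """
    if not 0.0 <= s1 <= 1.0:
        raise ValueError("s1 must be between 0 and 1")
    if predicted == "F":
        return s1
    if predicted == "M":
        return 1.0 - s1
    raise ValueError("predicted must be 'F' or 'M'")


female_score = score_prediction("F", 0.75)
male_score = score_prediction("M", 0.75)
```

With S1 = 0.75, a female prediction is scored 0.75 and a male prediction is scored 0.25, so the two scores always sum to 1.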
In the above embodiment, for users to be filled who have not reported a gender attribute in any of the four major businesses (device purchase, after-sales, extended warranty and reading), the third filling unit 340 calls the optimal gender classification model to predict the user's gender label, and the gender prediction result is taken as the user's final prediction result. The first scoring unit 360 and the second scoring unit 370 then score the gender prediction result accordingly: when the predicted gender is male, the second scoring unit 370 assigns the score (1 - S1), where S1 is the overall accuracy rate with which the optimal gender classification model predicts that a user is female; when the predicted gender is female, the first scoring unit 360 assigns the score S1. The score indicates the accuracy of the prediction result.
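
The scoring rule for filled-in predictions can be sketched as follows. This is a minimal illustration only; the function name and the string labels "female" and "male" are assumptions and do not appear in the original text.

```python
def score_filled_prediction(predicted_gender: str, s1: float) -> float:
    """Return the score of a filled-in prediction.

    s1 is the overall accuracy rate with which the model predicts "female".
    A female prediction scores S1; a male prediction scores S2 = 1 - S1.
    """
    if not 0.0 <= s1 <= 1.0:
        raise ValueError("s1 must lie in [0, 1]")
    if predicted_gender == "female":
        return s1
    if predicted_gender == "male":
        return 1.0 - s1
    raise ValueError(f"unexpected gender label: {predicted_gender!r}")
```

For instance, with S1 = 0.75 a female prediction would score 0.75 and a male prediction 0.25.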
Referring to Figure 11, it is a structural schematic diagram of a terminal corresponding to the method of Fig. 6, provided by an embodiment of the present invention. The terminal 400 may be a device with mobile networking capability, such as a smartphone (for example an Android phone or an iOS phone), a tablet computer, a personal digital assistant (PDA) or a smart wearable device. The terminal 400 includes a fourth acquiring unit 410, a fourth screening unit 420, a third feature processing unit 430, a fourth filling unit 440, a sampling comparison unit 450, a computing unit 460, a second accuracy acquiring unit 470, a third scoring unit 480 and a fourth scoring unit 490.
The fourth acquiring unit 410 is configured to obtain the gender data set of users in multiple businesses and their behavior data set in multiple applications, so as to generate a target matrix table. Specifically, the multiple businesses may include device purchase, after-sales, extended warranty, reading and other businesses. The multiple applications may be divided into application categories according to the main function of each application, and the application categories may include browser, input method, news and information, online community, entertainment and social networking, and other categories. Preferably, the application categories may include 478 application categories such as browser, input method, news and information, and online community. In this embodiment, the fourth acquiring unit 410 obtains the gender information that users have or have not reported in multiple businesses such as device purchase, after-sales, extended warranty and reading, and may also obtain the users' behavior data set in multiple application categories such as browser, input method, news and information, and online community. The obtained gender data set and behavior data set are combined to generate the target matrix table.
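
Combining the two data sets into one table can be sketched as a join on the user ID. The sketch below is an illustration under assumptions: the dictionary layout, the column name "user_id" and the function name are hypothetical, and business names are assumed not to collide with application names.

```python
from typing import Dict, List, Optional


def build_matrix_table(
    gender_data: Dict[str, Dict[str, Optional[str]]],
    click_data: Dict[str, Dict[str, int]],
    businesses: List[str],
    apps: List[str],
) -> List[dict]:
    """Join gender data and click data on user ID, one row per user.

    Each row holds the gender reported in every business (None if not
    reported) and the click count for every application (None if absent).
    """
    user_ids = sorted(set(gender_data) | set(click_data))
    table = []
    for uid in user_ids:
        row = {"user_id": uid}
        for business in businesses:
            row[business] = gender_data.get(uid, {}).get(business)
        for app in apps:
            row[app] = click_data.get(uid, {}).get(app)
        table.append(row)
    return table
```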
The fourth screening unit 420 is configured to screen out, according to the gender data set, the users to be filled and the users to be corrected of the multiple businesses. The users to be filled comprise the set of users who have no gender information in any of the multiple businesses; the users to be corrected comprise the set of users for whom some of the multiple businesses contain gender information and the businesses containing different gender information each account for half. Specifically, in this embodiment the multiple businesses may include four major businesses: device purchase, after-sales, extended warranty and reading. If it is detected that a user has not reported gender information in any of the multiple businesses, that user group is screened out as the users to be filled of the multiple businesses. If it is detected that a user has gender information in some of the multiple businesses and the businesses containing different gender information each account for half, that user group is screened out as the users to be corrected of the multiple businesses. For example, among the four businesses, a user may have reported gender information only in the device purchase business and the after-sales business, with the two entries differing: male in the device purchase business and female in the after-sales business. Alternatively, the user may have reported gender information in all four businesses, with the businesses containing each gender being equal in number: for example, female in the device purchase and reading businesses, and male in the after-sales and extended warranty businesses.
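
The screening rule described above can be sketched as a small classifier over one user's reported genders. The function name, the return labels and the use of None for "not reported" are assumptions made for illustration.

```python
from typing import Dict, Optional


def classify_user(reported: Dict[str, Optional[str]]) -> str:
    """Classify a user from the gender reported in each business.

    Returns "to_fill" when no business holds gender information,
    "to_correct" when the reported genders split evenly between
    "male" and "female", and "other" in every remaining case.
    """
    values = [g for g in reported.values() if g is not None]
    if not values:
        return "to_fill"
    if values.count("male") == values.count("female"):
        return "to_correct"
    return "other"
```

A user who reported male in two businesses and nothing elsewhere would fall under "other" here; such consistent users correspond to the training population described elsewhere in the document.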
The third feature processing unit 430 is configured to obtain, according to the behavior data set, the number of times each user clicked each application as a feature vector. Specifically, the behavior data set includes the counted number of times each user clicked each application within a preset time. The counted click numbers of each user under each application are converted into a vector, thereby obtaining each user's click feature vector over the applications.
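
The vector conversion can be sketched as a count over a click log. The event layout (user ID, application, timestamp) and the half-open time window are assumptions; the original text only states that clicks are counted within a preset time.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def click_feature_vector(
    events: Iterable[Tuple[str, str, int]],
    user_id: str,
    apps: Sequence[str],
    start: int,
    end: int,
) -> List[int]:
    """Count one user's clicks per application within [start, end).

    The returned vector follows the order of `apps`; applications the
    user never clicked in the window get a count of 0.
    """
    counts = Counter(
        app for uid, app, ts in events
        if uid == user_id and start <= ts < end
    )
    return [counts[app] for app in apps]
```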
The fourth filling unit 440 is configured to predict, according to the feature vector, the gender of the users to be corrected using the optimal gender classification model, combine the prediction with the gender data set of the user to be corrected, and take the mode as the final gender of the user to be corrected and fill it in. Specifically, the fourth filling unit 440 uses the optimal gender classification model to make a prediction on the feature vector data set of each user to be corrected and obtains the predicted gender of each such user. The predicted gender is combined with the user's gender data set in the multiple businesses, and the gender given by the mode is filled in as the user's final gender label. For example, when the gender predicted for a user to be corrected is female, the gender information the user reported in the device purchase and reading businesses is male, and the gender information reported in the extended warranty and after-sales businesses is female, the prediction result is combined with the gender data set and the mode is taken as the final gender; that is, the final gender of that user to be corrected is female.
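
Taking the mode of the model prediction together with the reported genders can be sketched as below. The function name and labels are assumptions. For a user to be corrected, the reported genders split evenly, so adding the single prediction always yields an odd number of votes and a unique mode.

```python
from collections import Counter
from typing import Dict, Optional


def final_gender(predicted: str, reported: Dict[str, Optional[str]]) -> str:
    """Return the most frequent gender among the model prediction and
    the genders reported in each business (unreported entries are skipped)."""
    votes = [predicted] + [g for g in reported.values() if g is not None]
    return Counter(votes).most_common(1)[0][0]
```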
The sampling comparison unit 450 is configured to compare, one by one, the gender results obtained by sample-surveying users who have reported gender information in each business against the gender information those users reported in that business. Specifically, users who have reported gender information in the multiple businesses are randomly sampled, and the survey results are compared with the gender information the users reported in each business, so as to obtain the overall gender accuracy rate of each business.
The computing unit 460 is configured to calculate the overall gender accuracy rate of each business according to the comparison result. In this embodiment, the multiple businesses may include four major businesses: device purchase, after-sales, extended warranty and reading. From the comparison with the sample survey, the overall gender accuracy rates of the device purchase business, the after-sales business, the extended warranty business and the reading business are obtained as z1, z2, z3 and z4 respectively. In some feasible embodiments, if no sample survey is carried out for the multiple businesses, the overall accuracy rates z1, z2, z3 and z4 default to 1.0.
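
The per-business accuracy can be sketched as the share of surveyed users whose surveyed gender matches what they reported in that business. The data layout and function name are assumptions; the fallback of 1.0 when no survey data is available follows the default stated above.

```python
from typing import Dict, Optional


def business_accuracy(
    surveyed: Dict[str, str],
    reported: Dict[str, Optional[str]],
) -> float:
    """Overall gender accuracy rate of one business.

    `surveyed` maps sampled user IDs to the gender found by the survey;
    `reported` maps user IDs to the gender reported in this business.
    Returns 1.0 when there is nothing to compare.
    """
    pairs = [
        (surveyed[uid], reported[uid])
        for uid in surveyed
        if reported.get(uid) is not None
    ]
    if not pairs:
        return 1.0
    matches = sum(1 for survey_gender, report_gender in pairs if survey_gender == report_gender)
    return matches / len(pairs)
```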
The second accuracy acquiring unit 470 is configured to obtain the overall accuracy rate S1 with which the optimal gender classification model predicts that a user is female.
The third scoring unit 480 is configured so that, if the final gender is female, the score of the final gender is S3, where S3 = 1 - S1 × (1 - z1) × (1 - z2) × ... × (1 - zn). Specifically, n is the total number of the multiple businesses, and the overall gender accuracy rate zn of any business in which the reported gender information is male, or in which no gender information was reported, is taken as zero. For example, the multiple businesses include four major businesses (device purchase, reading, extended warranty and after-sales), so n = 4, and the overall gender accuracy rates of the reported gender information in the device purchase business, the after-sales business, the extended warranty business and the reading business are z1, z2, z3 and z4 respectively. When the gender predicted for a user to be corrected is female, the gender information reported in the device purchase business is male, the gender information reported in the extended warranty business is female, and no gender information was reported in the remaining two businesses, then z1 = 0, z2 = 0 and z4 = 0, so the score of that user to be corrected is S3 = 1 - S1 × (1 - z3). As another example, when the gender predicted for a user to be corrected is female, the gender information reported in the device purchase and reading businesses is male, and the gender information reported in the extended warranty and after-sales businesses is female, then n = 4, z1 = 0 and z4 = 0, so the score of that user to be corrected is S3 = 1 - S1 × (1 - z2) × (1 - z3).
The fourth scoring unit 490 is configured so that, if the final gender is male, the score of the final gender is S4, where S4 = 1 - (1 - S1) × (1 - z1) × (1 - z2) × ... × (1 - zn). Here n is the total number of the multiple businesses, and the overall gender accuracy rate zn of any business in which the reported gender information is female, or in which no gender information was reported, is taken as zero. Specifically, the multiple businesses include four major businesses (device purchase, reading, extended warranty and after-sales), so n = 4, and the overall gender accuracy rates of the reported gender information in the device purchase business, the after-sales business, the extended warranty business and the reading business are z1, z2, z3 and z4 respectively. When the gender predicted for a user to be corrected is male, the gender information reported in the device purchase business is male, the gender information reported in the extended warranty business is female, and no gender information was reported in the remaining two businesses, then z2 = 0, z3 = 0 and z4 = 0, so the score of that user to be corrected is S4 = 1 - (1 - S1) × (1 - z1). As another example, when the gender predicted for a user to be corrected is male, the gender information reported in the device purchase and reading businesses is male, and the gender information reported in the extended warranty and after-sales businesses is female, then z2 = 0 and z3 = 0, so the score of that user to be corrected is S4 = 1 - (1 - S1) × (1 - z1) × (1 - z4).
In the above embodiment, the users to be corrected are users of lower confidence: those who have reported a gender attribute in only two of the four major businesses (device purchase, after-sales, extended warranty and reading) with inconsistent gender information, or who have reported a gender attribute in all four businesses with the inconsistent gender information split half and half. For such users, the fourth filling unit 440 calls the optimal gender classification model to predict the user's gender label, combines it with the gender data set of the user to be corrected, and takes the mode as the user's final gender; the third scoring unit 480 and the fourth scoring unit 490 then score the result accordingly. The score indicates the accuracy of the predicted final gender.
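
The S3 and S4 formulas can be sketched in one function that zeroes the accuracy rate of every business whose reported gender differs from the final gender or is missing. The function name, the dictionary layout and the business keys are assumptions; the formulas themselves are taken as stated in the text.

```python
from math import prod
from typing import Dict, Optional


def score_corrected(
    final_gender: str,
    s1: float,
    reported: Dict[str, Optional[str]],
    accuracy: Dict[str, float],
) -> float:
    """Score the final gender of a user to be corrected.

    female: S3 = 1 - S1 * prod(1 - z_i)
    male:   S4 = 1 - (1 - S1) * prod(1 - z_i)
    z_i is the business accuracy rate when that business reported the
    final gender, and 0 otherwise. Businesses without a surveyed
    accuracy rate default to 1.0, as in the text.
    """
    if final_gender not in ("female", "male"):
        raise ValueError(f"unexpected gender label: {final_gender!r}")
    leading = s1 if final_gender == "female" else 1.0 - s1
    z_values = [
        accuracy.get(business, 1.0) if gender == final_gender else 0.0
        for business, gender in reported.items()
    ]
    return 1.0 - leading * prod(1.0 - z for z in z_values)
```

With S1 = 0.8, a final gender of female, and only the extended warranty business (accuracy rate 0.9) reporting female, this gives S3 = 1 - 0.8 × 0.1 = 0.92.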
Referring to Figure 12, it is a schematic block diagram of a terminal provided by another embodiment of the present invention. As shown, the terminal in this embodiment may include: one or more processors 801, one or more input devices 802, one or more output devices 803 and a memory 804. The processor 801, input device 802, output device 803 and memory 804 are connected via a bus 805. The memory 804 is used to store a computer program that includes program instructions, and the processor 801 is used to execute the program instructions stored in the memory 804.
The processor 801 is configured to call the program instructions to perform:
Obtaining the gender data set of users in multiple businesses and their behavior data set in multiple applications, so as to generate a target matrix table.
Screening out, according to the gender data set in the target matrix table, the users to be trained of the multiple businesses, the users to be trained being the set of users who have gender information in multiple preset businesses and whose gender information is identical.
Converting the gender data set and behavior data set of the users to be trained in the target matrix table into a feature data set for training the gender classification model, wherein the feature data set includes a training data set and a test data set.
Training the gender classification model on the training data set using a decision tree algorithm.
Cross-validating the gender classification model according to the tuning parameters of the algorithm and the test data set, so as to obtain the optimal gender classification model.
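
The selection of users to be trained and the split into training and test sets in the steps above could be sketched as follows. This is only an illustration under assumptions: the minimum of two businesses, the 20% test ratio, the fixed seed and both function names are hypothetical, and the decision tree training and parameter tuning themselves are not shown.

```python
import random
from typing import Dict, Iterable, List, Optional, Tuple


def select_training_users(
    gender_table: Dict[str, Dict[str, Optional[str]]],
    min_businesses: int = 2,
) -> Dict[str, str]:
    """Keep users who reported a gender in at least `min_businesses`
    businesses and whose reported genders all agree; map each such
    user to that agreed gender, which serves as the training label."""
    selected = {}
    for user_id, reported in gender_table.items():
        values = [g for g in reported.values() if g is not None]
        if len(values) >= min_businesses and len(set(values)) == 1:
            selected[user_id] = values[0]
    return selected


def split_train_test(
    user_ids: Iterable[str],
    test_ratio: float = 0.2,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Shuffle user IDs reproducibly and split them into
    (training IDs, test IDs)."""
    ids = sorted(user_ids)
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_ratio)
    return ids[n_test:], ids[:n_test]
```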
The following may further be implemented:
Obtaining the gender data set of users in multiple businesses and their behavior data set in multiple applications, so as to generate an original matrix table, wherein the gender data set includes each user's gender information in each business, and the behavior data set includes the number of times each user clicked each application within a preset time. The rows of the original matrix table are user ID numbers, and the columns are the corresponding user's gender information in each business and the number of times the user clicked each application within the preset time, wherein the gender data set and the behavior data set are associated by the user's ID number and used as the feature data set.
Performing data cleaning on the original matrix table to generate the target matrix table. Specifically:
Identifying the applications whose missing rate in the original matrix table exceeds 90%.
Deleting the identified applications from the original matrix table to generate the target matrix table.
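
The cleaning step can be sketched as dropping every application column whose share of missing entries exceeds the threshold. Treating None as "missing" and representing the table as a list of row dictionaries are assumptions; the 90% threshold comes from the text.

```python
from typing import Iterable, List


def drop_sparse_apps(
    table: List[dict],
    app_columns: Iterable[str],
    max_missing_rate: float = 0.9,
) -> List[dict]:
    """Return a copy of the table without the application columns whose
    missing rate is strictly greater than `max_missing_rate`.

    Columns not listed in `app_columns` (user ID, per-business gender)
    are never removed.
    """
    n_rows = len(table)
    dropped = set()
    for app in app_columns:
        missing = sum(1 for row in table if row.get(app) is None)
        if n_rows and missing / n_rows > max_missing_rate:
            dropped.add(app)
    return [
        {key: value for key, value in row.items() if key not in dropped}
        for row in table
    ]
```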
The processor 801 may be further configured to call the program instructions to perform:
Obtaining the gender data set of users in multiple businesses and their behavior data set in multiple applications, so as to generate a target matrix table.
Screening out, according to the gender data set, the users to be filled and the users to be corrected of the multiple businesses, wherein the users to be filled comprise the set of users who have no gender information in any of the multiple businesses, and the users to be corrected comprise the set of users for whom some of the multiple businesses contain gender information and the businesses containing different gender information each account for half.
Obtaining, according to the behavior data set, the number of times each user clicked each application as a feature vector.
Predicting, according to the feature vector, the gender of the users to be filled using the optimal gender classification model obtained by the generation method of the gender classification model, and filling in the prediction result.
Predicting, according to the feature vector, the gender of the users to be corrected using the optimal gender classification model, combining the prediction with the gender data set of the user to be corrected, and taking the mode as the final gender of the user to be corrected and filling it in.
The following may further be implemented:
Obtaining the gender data set of users in multiple businesses and their behavior data set in multiple applications, so as to generate a target matrix table.
Screening out, according to the gender data set, the users to be filled and the users to be corrected of the multiple businesses, wherein the users to be filled comprise the set of users who have no gender information in any of the multiple businesses, and the users to be corrected comprise the set of users for whom some of the multiple businesses contain gender information and the businesses containing different gender information each account for half.
Obtaining, according to the behavior data set, the number of times each user clicked each application as a feature vector.
Predicting, according to the feature vector, the gender of the users to be filled using the optimal gender classification model obtained by the generation method of the gender classification model, and filling in the prediction result.
Obtaining the overall accuracy rate S1 with which the optimal gender classification model predicts that a user is female.
If the prediction result is female, the score of the prediction result is S1.
If the prediction result is male, the score of the prediction result is S2, where S2 = 1 - S1.
In the above embodiment, for users to be filled who have not reported a gender attribute in any of the four major businesses (device purchase, after-sales, extended warranty and reading), the optimal gender classification model is called to predict the user's gender label, and the gender prediction result is taken as the user's final prediction result and scored accordingly. When the predicted gender is male, the score is (1 - S1), where S1 is the overall accuracy rate with which the optimal gender classification model predicts that a user is female; when the predicted gender is female, the score is S1. The score indicates the accuracy of the prediction result.
The following may also be implemented:
Obtaining the gender data set of users in multiple businesses and their behavior data set in multiple applications, so as to generate a target matrix table.
Screening out, according to the gender data set, the users to be filled and the users to be corrected of the multiple businesses, wherein the users to be filled comprise the set of users who have no gender information in any of the multiple businesses, and the users to be corrected comprise the set of users for whom some of the multiple businesses contain gender information and the businesses containing different gender information each account for half.
Obtaining, according to the behavior data set, the number of times each user clicked each application as a feature vector.
Predicting, according to the feature vector, the gender of the users to be corrected using the optimal gender classification model, combining the prediction with the gender data set of the user to be corrected, and taking the mode as the final gender of the user to be corrected and filling it in.
Comparing, one by one, the gender results obtained by sample-surveying users who have reported gender information in each business against the gender information those users reported in that business.
Calculating, according to the comparison result, the overall gender accuracy rate zn of each business.
Obtaining the overall accuracy rate S1 with which the optimal gender classification model predicts that a user is female.
If the final gender is female, the score of the final gender is S3, where S3 = 1 - S1 × (1 - z1) × (1 - z2) × ... × (1 - zn).
If the final gender is male, the score of the final gender is S4, where S4 = 1 - (1 - S1) × (1 - z1) × (1 - z2) × ... × (1 - zn), wherein n is the total number of the multiple businesses, and the overall gender accuracy rate zn of any business in which the reported gender information is female, or in which no gender information was reported, is taken as zero.
In the above embodiment, the users to be corrected are users of lower confidence: those who have reported a gender attribute in only two of the four major businesses (device purchase, after-sales, extended warranty and reading) with inconsistent gender information, or who have reported a gender attribute in all four businesses with the inconsistent gender information split half and half. For such users, the optimal gender classification model is called to predict the user's gender label, the prediction is combined with the gender data set of the user to be corrected, the gender given by the mode is filled in as the user's final gender, and the result is scored accordingly. The score indicates the accuracy of the predicted final gender.
It should be understood that, in the embodiments of the present invention, the processor 801 may be a central processing unit (Central Processing Unit, CPU). The processor may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The input device 802 may include a touchpad, a fingerprint sensor (used to collect the user's fingerprint information and fingerprint direction information), a microphone and the like; the output device 803 may include a display (LCD or the like), a loudspeaker and the like.
The memory 804 may include read-only memory and random access memory, and provides instructions and data to the processor 801. A part of the memory 804 may also include non-volatile random access memory. For example, the memory 804 may also store information on the device type.
In a specific implementation, the processor 801, input device 802 and output device 803 described in the embodiments of the present invention can carry out the implementations described in the first and second embodiments of the generation method of the gender classification model and of the gender filling method provided by the embodiments of the present invention, and can also carry out the implementation of the terminal described in the embodiments of the present invention, which is not repeated here.
Another embodiment of the present invention provides a storage medium. The storage medium may be a computer-readable storage medium storing a computer program that includes program instructions which, when executed by a processor, implement:
Obtaining the gender data set of users in multiple businesses and their behavior data set in multiple applications, so as to generate a target matrix table.
Screening out, according to the gender data set in the target matrix table, the users to be trained of the multiple businesses, the users to be trained being the set of users who have gender information in multiple preset businesses and whose gender information is identical.
Converting the gender data set and behavior data set of the users to be trained in the target matrix table into a feature data set for training the gender classification model, wherein the feature data set includes a training data set and a test data set.
Training the gender classification model on the training data set using a decision tree algorithm.
Cross-validating the gender classification model according to the tuning parameters of the algorithm and the test data set, so as to obtain the optimal gender classification model.
The following may further be implemented:
Obtaining the gender data set of users in multiple businesses and their behavior data set in multiple applications, so as to generate an original matrix table, wherein the gender data set includes each user's gender information in each business, and the behavior data set includes the number of times each user clicked each application within a preset time. The rows of the original matrix table are user ID numbers, and the columns are the corresponding user's gender information in each business and the number of times the user clicked each application within the preset time, wherein the gender data set and the behavior data set are associated by the user's ID number and used as the feature data set.
Performing data cleaning on the original matrix table to generate the target matrix table. Specifically:
Identifying the applications whose missing rate in the original matrix table exceeds 90%.
Deleting the identified applications from the original matrix table to generate the target matrix table.
The program instructions, when executed by the processor 801, may further implement:
Obtaining the gender data set of users in multiple businesses and their behavior data set in multiple applications, so as to generate a target matrix table.
Screening out, according to the gender data set, the users to be filled and the users to be corrected of the multiple businesses, wherein the users to be filled comprise the set of users who have no gender information in any of the multiple businesses, and the users to be corrected comprise the set of users for whom some of the multiple businesses contain gender information and the businesses containing different gender information each account for half.
Obtaining, according to the behavior data set, the number of times each user clicked each application as a feature vector.
Predicting, according to the feature vector, the gender of the users to be filled using the optimal gender classification model obtained by the generation method of the gender classification model, and filling in the prediction result.
Predicting, according to the feature vector, the gender of the users to be corrected using the optimal gender classification model, combining the prediction with the gender data set of the user to be corrected, and taking the mode as the final gender of the user to be corrected and filling it in.
The following may also be implemented:
Obtaining the gender data set of users in multiple businesses and their behavior data set in multiple applications, so as to generate a target matrix table.
Screening out, according to the gender data set, the users to be filled and the users to be corrected of the multiple businesses, wherein the users to be filled comprise the set of users who have no gender information in any of the multiple businesses, and the users to be corrected comprise the set of users for whom some of the multiple businesses contain gender information and the businesses containing different gender information each account for half.
Obtaining, according to the behavior data set, the number of times each user clicked each application as a feature vector.
Predicting, according to the feature vector, the gender of the users to be filled using the optimal gender classification model obtained by the generation method of the gender classification model, and filling in the prediction result.
Obtaining the overall accuracy rate S1 with which the optimal gender classification model predicts that a user is female.
If the prediction result is female, the score of the prediction result is S1.
If the prediction result is male, the score of the prediction result is S2, where S2 = 1 - S1.
In the above embodiment, for users to be filled who have not reported a gender attribute in any of the four major businesses (device purchase, after-sales, extended warranty and reading), the optimal gender classification model is called to predict the user's gender label, and the gender prediction result is taken as the user's final prediction result and scored accordingly. When the predicted gender is male, the score is (1 - S1), where S1 is the overall accuracy rate with which the optimal gender classification model predicts that a user is female; when the predicted gender is female, the score is S1. The score indicates the accuracy of the prediction result.
Further, the following may also be implemented:
Obtaining the gender data set of users in multiple businesses and their behavior data set in multiple applications, so as to generate a target matrix table.
Screening out, according to the gender data set, the users to be filled and the users to be corrected of the multiple businesses, wherein the users to be filled comprise the set of users who have no gender information in any of the multiple businesses, and the users to be corrected comprise the set of users for whom some of the multiple businesses contain gender information and the businesses containing different gender information each account for half.
Obtaining, according to the behavior data set, the number of times each user clicked each application as a feature vector.
Predicting, according to the feature vector, the gender of the users to be corrected using the optimal gender classification model, combining the prediction with the gender data set of the user to be corrected, and taking the mode as the final gender of the user to be corrected and filling it in.
Comparing, one by one, the gender results obtained by sample-surveying users who have reported gender information in each business against the gender information those users reported in that business.
Calculating, according to the comparison result, the overall gender accuracy rate zn of each business.
Obtaining the overall accuracy rate S1 with which the optimal gender classification model predicts that a user is female.
If the final gender is female, the score of the final gender is S3, where S3 = 1 - S1 × (1 - z1) × (1 - z2) × ... × (1 - zn).
If the final gender is male, the score of the final gender is S4, where S4 = 1 - (1 - S1) × (1 - z1) × (1 - z2) × ... × (1 - zn), wherein n is the total number of the multiple businesses, and the overall gender accuracy rate zn of any business in which the reported gender information is female, or in which no gender information was reported, is taken as zero.
In the above embodiment, the users to be corrected are users of lower confidence: those who have reported a gender attribute in only two of the four major businesses (device purchase, after-sales, extended warranty and reading) with inconsistent gender information, or who have reported a gender attribute in all four businesses with the inconsistent gender information split half and half. For such users, the optimal gender classification model is called to predict the user's gender label, the prediction is combined with the gender data set of the user to be corrected, the gender given by the mode is filled in as the user's final gender, and the result is scored accordingly. The score indicates the accuracy of the predicted final gender.
The storage medium may be an internal storage unit of the terminal described in any of the foregoing embodiments, such as a hard disk or memory of the terminal. The storage medium may also be an external storage device of the terminal, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card or a flash card (Flash Card) fitted to the terminal. Further, the storage medium may include both an internal storage unit of the terminal and an external storage device. The storage medium is used to store the computer program and the other programs and data required by the terminal. The storage medium may also be used to temporarily store data that has been output or is about to be output.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms according to function. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the terminal and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed terminal and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. Multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, or may be an electrical, mechanical, or other form of connection.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present invention.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and is sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed by the invention, and these modifications or replacements shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (14)
- 1. A method for generating a gender classification model, characterized by comprising: obtaining gender data sets of users in multiple businesses and their behavior data sets in multiple applications to generate a target matrix table; screening out, according to the gender data sets in the target matrix table, the users to be trained of the multiple businesses, the users to be trained comprising the set of users who have gender information in multiple preset businesses and whose gender information is identical across those businesses; converting the gender data sets and behavior data sets of the users to be trained in the target matrix table into a feature data set for training the gender classification model, wherein the feature data set comprises a training data set and a test data set; training the gender classification model on the training data set using a decision tree algorithm; and cross-validating the gender classification model according to algorithm tuning parameters and the test data set to obtain an optimal gender classification model.
- 2. The method according to claim 1, characterized in that obtaining the gender data sets of users in multiple businesses and their behavior data sets in multiple applications to generate a target matrix table specifically comprises: obtaining the gender data sets of users in multiple businesses and their behavior data sets in multiple applications to generate an original matrix table, wherein the gender data set comprises the user's gender information in each business, the behavior data set comprises the number of times the user clicked each application within a preset time, the rows of the original matrix table are user IDs, and the columns are the corresponding user's gender information in each business and the number of times that user clicked each application within the preset time; and performing data cleaning on the original matrix table to generate the target matrix table.
- 3. The method according to claim 2, characterized in that performing data cleaning on the original matrix table to generate the target matrix table specifically comprises: identifying the applications in the original matrix table whose missing rate exceeds 90%; and deleting the identified applications from the original matrix table to generate the target matrix table.
- 4. The method according to claim 1, characterized in that the multiple businesses comprise a device purchase business, an after-sales business, an extended warranty business, and a reading business.
- 5. The method according to claim 1, characterized in that the number of the multiple preset businesses is at least 75% of the total number of the multiple businesses.
- 6. The method according to claim 1, characterized in that the decision tree algorithm comprises: the CART algorithm, the ID3 algorithm, the C4.5 algorithm, and the random forest algorithm.
- 7. The method according to claim 1, characterized in that the algorithm tuning parameters comprise: the number of decision trees, the feature subset selection strategy, the attribute selection measure, the maximum depth of the tree, and the maximum width of the tree.
- 8. The method according to claim 1, characterized in that the evaluation metrics of the cross-validation comprise: precision, recall, and overall accuracy.
- 9. A gender filling method, characterized by comprising: obtaining gender data sets of users in multiple businesses and their behavior data sets in multiple applications to generate a target matrix table; screening out, according to the gender data sets, the users to be filled and the users to be corrected of the multiple businesses, the users to be filled comprising the set of users who have no gender information in any of the multiple businesses, and the users to be corrected comprising the set of users for whom only some of the multiple businesses contain gender information and the businesses containing different gender information each account for half; obtaining, according to the behavior data sets, the number of times each user to be filled and each user to be corrected clicked each application as a feature vector; predicting, according to the feature vector, the gender of the user to be filled using the optimal gender classification model according to any one of claims 1-8, and filling in the prediction result; and predicting, according to the feature vector, the gender of the user to be corrected using the optimal gender classification model, combining the prediction with the gender data set of the user to be corrected, and taking the mode as the final gender of the user to be corrected and filling it in.
- 10. The method according to claim 9, characterized in that, after the prediction result is filled in, the method further comprises: obtaining the overall accuracy S1 with which the optimal gender classification model predicts the user as female; if the prediction result is female, the score of the prediction result is S1; and if the prediction result is male, the score of the prediction result is S2, where S2 equals 1 - S1.
- 11. The method according to claim 9, characterized in that, after the mode is taken as the final gender of the user to be corrected and filled in, the method further comprises: comparing, one by one, the gender results of a sample survey of the users who have reported gender information in each business with the gender information those users reported in each business; calculating, according to the comparison results, the overall gender accuracy zn of each business; obtaining the overall accuracy S1 with which the optimal gender classification model predicts the user as female; if the final gender is female, the score of the final gender is S3, where S3 = 1 - S1 × (1 - z1) × (1 - z2) × ... × (1 - zn), n is the total number of the multiple businesses, and zn is taken as zero for any business that reported the gender as male or that has no gender information; and if the final gender is male, the score of the final gender is S4, where S4 = 1 - (1 - S1) × (1 - z1) × (1 - z2) × ... × (1 - zn), n is the total number of the multiple businesses, and zn is taken as zero for any business that reported the gender as female or that has no gender information.
- 12. A terminal, characterized by comprising units for performing the method according to any one of claims 1-11.
- 13. A terminal, characterized by comprising a processor, an input device, an output device, and a memory, the processor, input device, output device, and memory being interconnected, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to perform the method according to any one of claims 1-11.
- 14. A storage medium, characterized in that the storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the method according to any one of claims 1-11.
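The data cleaning of claim 3 and the user screening of claims 1, 5, and 9 can be sketched in a few lines of Python. This is a hedged illustration, not the patented implementation: the names `clean_matrix` and `categorize_user`, the dictionary layout of the matrix table, and the "other" category for users who fit none of the claimed groups are all assumptions introduced here. The training-user rule is read as "at least 75% of the businesses report a gender and all reported values agree".

```python
MISSING_RATE_LIMIT = 0.90  # claim 3: drop applications missing for more than 90% of users
PRESET_SHARE = 0.75        # claim 5: preset businesses are at least 75% of all businesses


def clean_matrix(rows, app_names):
    """Return the application columns whose missing rate does not exceed the limit.

    rows: dict mapping user ID -> {"clicks": {app_name: click count or None}}.
    app_names: list of application column names in the original matrix table.
    """
    kept = []
    total_users = len(rows)
    for app in app_names:
        missing = sum(1 for r in rows.values() if r["clicks"].get(app) is None)
        if total_users and missing / total_users <= MISSING_RATE_LIMIT:
            kept.append(app)
    return kept


def categorize_user(genders):
    """Classify one user from the per-business gender values.

    genders: dict mapping business name -> "female", "male", or None.
    Returns "train", "fill", "correct", or "other" (the last is an assumption).
    """
    reported = [g for g in genders.values() if g is not None]
    if not reported:
        return "fill"  # no gender information in any business

    females = reported.count("female")
    males = len(reported) - females
    consistent = females == 0 or males == 0

    if consistent and len(reported) >= PRESET_SHARE * len(genders):
        return "train"    # enough businesses, all agreeing
    if females == males:
        return "correct"  # conflicting reports, each gender accounts for half
    return "other"
```

With four businesses, a user who reports the same gender in three of them falls into the training set, while a user with one male and one female report is routed to correction.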
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711176286.9A CN107886366A (en) | 2017-11-22 | 2017-11-22 | Generation method, sex fill method, terminal and the storage medium of Gender Classification model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107886366A true CN107886366A (en) | 2018-04-06 |
Family
ID=61778274
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711176286.9A Withdrawn CN107886366A (en) | 2017-11-22 | 2017-11-22 | Generation method, sex fill method, terminal and the storage medium of Gender Classification model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107886366A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108960922A (en) * | 2018-07-09 | 2018-12-07 | 中国联合网络通信集团有限公司 | The replacement prediction technique and device of terminal |
CN109492104A (en) * | 2018-11-09 | 2019-03-19 | 北京京东尚科信息技术有限公司 | Training method, classification method, system, equipment and the medium of intent classifier model |
CN110097170A (en) * | 2019-04-25 | 2019-08-06 | 深圳市豪斯莱科技有限公司 | Information pushes object prediction model acquisition methods, terminal and storage medium |
CN110502432A (en) * | 2019-07-23 | 2019-11-26 | 平安科技(深圳)有限公司 | Intelligent test method, device, equipment and readable storage medium storing program for executing |
CN110781374A (en) * | 2018-07-13 | 2020-02-11 | 北京字节跳动网络技术有限公司 | User data matching method and device, electronic equipment and computer readable medium |
CN110784760A (en) * | 2019-09-16 | 2020-02-11 | 清华大学 | Video playing method, video player and computer storage medium |
CN111078742A (en) * | 2019-12-09 | 2020-04-28 | 秒针信息技术有限公司 | User classification model training method, user classification method and device |
CN111178983A (en) * | 2020-01-03 | 2020-05-19 | 北京搜狐新媒体信息技术有限公司 | User gender prediction method, device, equipment and storage medium |
WO2020192460A1 (en) * | 2019-03-25 | 2020-10-01 | 华为技术有限公司 | Data processing method, terminal-side device, cloud-side device, and terminal-cloud collaboration system |
CN113657917A (en) * | 2020-05-12 | 2021-11-16 | 上海佳投互联网技术集团有限公司 | Visitor gender analysis method and system based on USER-AGENT |
CN116992267A (en) * | 2023-09-28 | 2023-11-03 | 北京融信数联科技有限公司 | Regional population gender identification method and system based on signaling data |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105654131A (en) * | 2015-12-30 | 2016-06-08 | 小米科技有限责任公司 | Classification model training method and device |
CN106203473A (en) * | 2016-06-24 | 2016-12-07 | 有米科技股份有限公司 | A kind of mobile subscriber's gender prediction's method based on installation kit list |
CN106682686A (en) * | 2016-12-09 | 2017-05-17 | 北京拓明科技有限公司 | User gender prediction method based on mobile phone Internet-surfing behavior |
CN106897727A (en) * | 2015-12-21 | 2017-06-27 | 百度在线网络技术(北京)有限公司 | A kind of user's gender identification method and device |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108960922A (en) * | 2018-07-09 | 2018-12-07 | 中国联合网络通信集团有限公司 | The replacement prediction technique and device of terminal |
CN110781374A (en) * | 2018-07-13 | 2020-02-11 | 北京字节跳动网络技术有限公司 | User data matching method and device, electronic equipment and computer readable medium |
CN109492104A (en) * | 2018-11-09 | 2019-03-19 | 北京京东尚科信息技术有限公司 | Training method, classification method, system, equipment and the medium of intent classifier model |
WO2020192460A1 (en) * | 2019-03-25 | 2020-10-01 | 华为技术有限公司 | Data processing method, terminal-side device, cloud-side device, and terminal-cloud collaboration system |
CN110097170A (en) * | 2019-04-25 | 2019-08-06 | 深圳市豪斯莱科技有限公司 | Information pushes object prediction model acquisition methods, terminal and storage medium |
CN110502432A (en) * | 2019-07-23 | 2019-11-26 | 平安科技(深圳)有限公司 | Intelligent test method, device, equipment and readable storage medium storing program for executing |
CN110502432B (en) * | 2019-07-23 | 2023-11-28 | 平安科技(深圳)有限公司 | Intelligent test method, device, equipment and readable storage medium |
CN110784760A (en) * | 2019-09-16 | 2020-02-11 | 清华大学 | Video playing method, video player and computer storage medium |
CN111078742A (en) * | 2019-12-09 | 2020-04-28 | 秒针信息技术有限公司 | User classification model training method, user classification method and device |
CN111078742B (en) * | 2019-12-09 | 2023-09-05 | 秒针信息技术有限公司 | User classification model training method, user classification method and device |
CN111178983A (en) * | 2020-01-03 | 2020-05-19 | 北京搜狐新媒体信息技术有限公司 | User gender prediction method, device, equipment and storage medium |
CN111178983B (en) * | 2020-01-03 | 2024-03-12 | 北京搜狐新媒体信息技术有限公司 | User gender prediction method, device, equipment and storage medium |
CN113657917A (en) * | 2020-05-12 | 2021-11-16 | 上海佳投互联网技术集团有限公司 | Visitor gender analysis method and system based on USER-AGENT |
CN116992267A (en) * | 2023-09-28 | 2023-11-03 | 北京融信数联科技有限公司 | Regional population gender identification method and system based on signaling data |
CN116992267B (en) * | 2023-09-28 | 2024-01-23 | 北京融信数联科技有限公司 | Regional population gender identification method and system based on signaling data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20180406 |