CN107180044A - Method and system for identifying the gender of Internet users - Google Patents

Method and system for identifying the gender of Internet users

Info

Publication number
CN107180044A
Authority
CN
China
Prior art keywords
sex
user
data
network behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610134810.5A
Other languages
Chinese (zh)
Inventor
李倚
吴贇哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jing Shuo Technology (beijing) Ltd By Share Ltd
Original Assignee
Jing Shuo Technology (beijing) Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jing Shuo Technology (beijing) Ltd By Share Ltd filed Critical Jing Shuo Technology (beijing) Ltd By Share Ltd
Priority to CN201610134810.5A priority Critical patent/CN107180044A/en
Publication of CN107180044A publication Critical patent/CN107180044A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention relates to a method for identifying the gender of Internet users, comprising: extracting the gender data and network behavior data of a plurality of survey samples; building a gender-behavior model from the gender data and network behavior data of the survey samples; receiving the network behavior data of a user to be analyzed; and substituting the network behavior data of the user to be analyzed into the gender-behavior model to calculate that user's gender probability. The invention also relates to a system for identifying the gender of Internet users, comprising a sample preparation unit, a modeling unit, a test data receiving unit, and a gender identification unit.

Description

Method and system for identifying the gender of Internet users
Technical Field
The invention belongs to the field of computing and relates to a method for identifying the gender of Internet users. The invention also relates to a system for identifying the gender of Internet users.
Background Art
Traditionally, classification problems can be handled by an expert system. Domain experts select features that correlate strongly with the classification target, such as height and weight, as well as Internet behavior, such as visiting automotive websites or women's cosmetics websites, habitual online hours, or the body content of the web pages viewed. The experts then assign each feature a characteristic value or score. For example: a height above 178 cm scores +1, a height below 160 cm scores -1, visiting an automotive website scores +1, visiting a women's website scores -1, and so on. Finally, the scores of the features of the user to be predicted are summed; a total greater than 0 is predicted as male, and a total less than 0 as female.
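The scoring scheme described above can be sketched as follows. This is a minimal illustration and not code from the application: the function names, the site category labels "automotive" and "women", and the handling of a zero total (left undecided, since the text does not specify it) are all assumptions.

```python
def expert_score(height_cm, visited_categories):
    """Sum the hand-assigned rule scores from the example in the text."""
    score = 0
    if height_cm > 178:
        score += 1
    elif height_cm < 160:
        score -= 1
    if "automotive" in visited_categories:   # hypothetical category label
        score += 1
    if "women" in visited_categories:        # hypothetical category label
        score -= 1
    return score


def expert_predict(score):
    """Positive total -> male, negative total -> female, zero -> undecided (assumption)."""
    if score > 0:
        return "M"
    if score < 0:
        return "F"
    return None


print(expert_predict(expert_score(180, {"automotive"})))  # M
```

Every threshold and score in such a system is fixed by hand, which is the maintenance burden the next paragraph describes.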
An expert system can take many forms, but in essence it is an experience-based system in which the domain knowledge provided by experts plays the decisive role. In practice, however, a given feature is not necessarily reasonable or effective for every sample. For women's volleyball players, for instance, the strong correlation between height and gender is not evident and may even point the wrong way. Likewise, it is common for men to visit women's cosmetics websites to choose gifts before an anniversary. As a result, the expert-defined rules grow ever more complex and their conditions ever harder to evaluate, so cost becomes uncontrollable. More importantly, rule-making always lags behind current behavior. The expert system therefore performs poorly, is time-consuming and laborious, and depends on domain expertise.
Summary of the Invention
To remove the dependence on expert knowledge, the inventors developed a method for identifying the gender of Internet users that automatically selects the feature dimensions associated with the prediction target, automatically trains a model to obtain the optimal model over the sample space, and lends itself to automated prediction.
According to a first aspect of the invention, the invention provides a method for identifying the gender of Internet users, comprising:
extracting the gender data and network behavior data of a plurality of survey samples;
building a gender-behavior model from the gender data and network behavior data of the survey samples;
receiving the network behavior data of a user to be analyzed; and
substituting the network behavior data of the user to be analyzed into the gender-behavior model to calculate the gender probability of the user to be analyzed.
The network behavior data include at least one of: the types of advertisements the user was exposed to, media, the URLs of web pages visited, and text information.
Preferably, after the gender data and network behavior data of the survey samples are extracted, the data obtained are cleaned to remove erroneous information.
Optionally, after the erroneous information is removed, features are extracted from the network behavior data and quantized.
Optionally, after the erroneous information is removed, features are extracted from the network behavior data, and undesirable features are excluded according to at least one of feature coverage, the chi-square statistic, and information gain (information entropy).
Preferably, the gender-behavior model is a generalized linear model.
More preferably, the gender-behavior model is
$$\mu = \frac{\exp(X\beta)}{1+\exp(X\beta)}, \qquad X\beta = \ln\left(\frac{\mu}{1-\mu}\right)$$
where
X is the user feature vector, which records the user's quantized Internet access behavior;
μ is the gender probability;
β is the optimal parameter estimate.
According to a second aspect of the invention, the invention provides a system for identifying the gender of Internet users, comprising:
a sample preparation unit, which extracts the gender data and network behavior data of a plurality of survey samples;
a modeling unit, which builds a gender-behavior model from the gender data and network behavior data of the survey samples;
a test data receiving unit, which receives the network behavior data of a user to be identified; and
a gender identification unit, which substitutes the network behavior data of the user to be identified into the gender-behavior model to calculate the gender probability of the user to be identified.
Preferably, the gender-behavior model is
$$\mu = \frac{\exp(X\beta)}{1+\exp(X\beta)}, \qquad X\beta = \ln\left(\frac{\mu}{1-\mu}\right)$$
In some embodiments of the invention, the modeling unit comprises:
a feature extraction module, which converts the extracted network behavior data into a feature set and quantizes it;
a feature cleaning module, which excludes features that have low correlation with the prediction target, small coverage, and/or effects similar to other features; and
a parameter estimation module, which obtains the optimal parameter estimate β using a statistical method.
Preferably, the optimal parameter estimate β can be estimated by maximum likelihood estimation, quasi-maximum likelihood estimation, or Bayes' theorem.
More preferably, the optimal parameter estimate β is obtained by maximum likelihood estimation.
According to a third aspect of the invention, the invention provides a method for identifying the gender of Internet users, comprising:
receiving the network behavior data of a user to be identified;
extracting features from the network behavior data of the user to be identified;
substituting the extracted features into a model in a model library for calculation; and
outputting the calculation result.
Preferably, the extracted features are quantized and converted into a user feature vector X.
Preferably, the model is
$$\mu = \frac{\exp(X\beta)}{1+\exp(X\beta)}, \qquad X\beta = \ln\left(\frac{\mu}{1-\mu}\right)$$
where
X is the user feature vector, which records the user's quantized Internet access behavior;
μ is the gender probability;
β is the optimal parameter estimate.
In some embodiments of the invention, the value of β can change according to the selection of the user feature vector.
According to a fourth aspect of the invention, the invention provides a system for identifying the gender of Internet users, comprising:
a test data receiving unit, which receives the network behavior data of a user to be identified;
a feature extraction module, which extracts features from the network behavior data of the user to be identified and quantizes them; and
a gender identification unit, which substitutes the network behavior data of the user to be identified into the gender-behavior model to calculate the gender probability of the user to be identified.
Preferably, the gender identification unit comprises:
a model library module, which stores gender-behavior models; and
a gender probability calculation module, which calls the model in the model library that corresponds to the extracted features to perform the calculation.
Preferably, the gender identification unit further comprises a feature set module, which stores the features corresponding to the gender-behavior models.
Brief Description of the Drawings
Preferred embodiments of the invention are described in detail below with reference to the accompanying drawings, so that the above and other features and advantages of the invention become clearer to those of ordinary skill in the art. In the drawings:
Fig. 1 is a flowchart of the method of some embodiments of the invention.
Fig. 2 is a flowchart of the method of other embodiments of the invention.
Fig. 3 is a schematic diagram of the system of some embodiments of the invention.
Fig. 4 is a schematic diagram of the system of other embodiments of the invention.
Detailed Description of the Embodiments
In the following description, numerous specific details are set forth to provide a more thorough understanding of the invention. It will be apparent to those skilled in the art, however, that the invention may be practiced without one or more of these details. In other instances, some technical features well known in the art are not described, to avoid obscuring the invention.
The term "behavioral data" in the present invention refers to data obtained from the web content that a user accesses and views, including but not limited to games, films, online shopping, news browsing, and stock market information.
The term "behavioral data" may further be extended with a second dimension of data, including but not limited to time information, such as the point in time or the time interval in which a behavior occurs. Specifically, the carrier of all such behavioral data can be network access logs. Each log entry contains a large amount of information, such as source, Cookie ID, access time, dwell time, device information, operating system version, and browser version.
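As an illustration of the log entry described above, the following sketch models one access log record as a small data structure. The field names, the ISO 8601 time format, and the example values are assumptions made for illustration; the application does not specify a log schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AccessLogRecord:
    """One network access log entry; all field names are hypothetical."""
    source: str
    cookie_id: str
    access_time: str       # assumed ISO 8601, e.g. "2016-03-09T23:15:00"
    dwell_seconds: float
    device: str
    os_version: str
    browser_version: str

    def hour_of_day(self) -> int:
        """Time information (the 'second dimension') derived from the access time."""
        return datetime.fromisoformat(self.access_time).hour


record = AccessLogRecord(
    source="example.com",
    cookie_id="1403061550041722265",
    access_time="2016-03-09T23:15:00",
    dwell_seconds=42.0,
    device="phone",
    os_version="unknown",
    browser_version="unknown",
)
print(record.hour_of_day())  # 23
```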
The term "survey sample" in the present invention refers to prior sample data for which both the gender data and the behavioral data are known. Such historical sample data can be obtained from research companies, provided directly by advertisers, or drawn from project data accumulated by advertising monitoring companies.
The term "quantizing" in the present invention refers to turning complex and variable information into measurable numbers and data, building an appropriate digital model from those numbers and data, converting them into a series of binary codes, and bringing them into the computer for unified processing. This is the basic process of quantizing.
The term "maximum likelihood estimation" in the present invention refers to a statistical method for finding the parameters of the probability density function associated with a sample set. A random sample is known to follow a certain probability distribution, but its specific parameters are unknown; parameter estimation uses the results observed over several trials to infer the approximate values of the parameters. Maximum likelihood estimation rests on the following idea: if a certain parameter value makes the observed sample most probable, there is no reason to choose other parameter values under which the sample is less probable, so that parameter value is taken as the estimate of the true value.
The term "generalized linear model" in the present invention refers to a widely used class of regression models in statistics. The model assumes that the distribution function of the random variable measured by the experimenter and the systematic (i.e., non-random) effects in the experiment can be related through a link function.
According to some embodiments of the invention, the method for identifying the gender of Internet users comprises the following steps:
Data preparation
Step S100: extract the gender data and behavioral data of a plurality of survey samples. Specifically, the behavioral data are obtained from the web content the user accesses and views (for example, the types of advertisements exposed, media, the URLs of web pages visited, and text information). The data may be collected in questionnaire form through surveys, or may come from other data carriers.
After the data are obtained, they are preferably cleaned. The purpose of cleaning is to filter out users who obviously filled in answers carelessly on purpose, as well as erroneous information caused by the acquisition system. Of course, if the sample data are purchased and their accuracy can be assured, they may be used directly without cleaning. In other words, for data preparation, the sample preparation unit 100 generally comprises a sample collection module 101 and a sample cleaning module 102. If purchased samples are imported directly, or data usability is ensured by other means, the sample cleaning module may be omitted.
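A cleaning pass of the kind described above might look like the sketch below. The application does not state concrete cleaning rules, so the two rules used here (a valid gender label and at least one browsing event) and the record layout are assumptions chosen only to illustrate the step.

```python
def clean_samples(samples):
    """Keep records with a valid gender label and a non-empty browsing record.

    Both rules are illustrative assumptions, not rules from the application.
    """
    cleaned = []
    for record in samples:
        gender = record.get("gender")
        events = record.get("events") or []
        if gender in ("M", "F") and len(events) > 0:
            cleaned.append(record)
    return cleaned


raw = [
    {"cookie_id": "a", "gender": "F", "events": ["video", "shop"]},
    {"cookie_id": "b", "gender": "?", "events": ["forum"]},   # invalid label
    {"cookie_id": "c", "gender": "M", "events": []},          # nothing recorded
]
print([r["cookie_id"] for r in clean_samples(raw)])  # ['a']
```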
Data preparation yields sample data of the form shown in Table 1.
Table 1
Cookie ID | Browsing record | Gender
1403061550041722265 | …… | F
1403061553071964915 | …… | M
…… | …… | ……
The Cookie ID is the identifier that records the user; the browsing record refers to the behavior the user produces through a browser; and gender is denoted F (female) or M (male). In the present invention, the browsing record is very broad in scope, including but not limited to the list of videos watched on video websites, purchase records on shopping websites, forums logged into, and advertisements clicked.
Modeling
Step S110: build a gender-behavior model from the gender data and network behavior data of the survey samples.
As shown in Fig. 3, the modeling unit 200 comprises a feature extraction module 201, a feature cleaning module 202, and a parameter estimation module 203.
Feature extraction module 201: the feature extraction module converts the browser behavior data in the raw sample data into a set of features. As shown in Table 2, a feature may be categorical (for example, "whether website A was visited", whose value is either 0 or 1) or numerical (for example, "time spent on website B", such as 1.7 hours or 37 minutes, or "amount spent on website C", such as 3.2 yuan). For example, for the Autohome automotive website, a visit is recorded as 1 and no visit as 0, and so on.
Table 2
Cookie ID | Browsing record | Feature n | Feature m | Gender
1403061550041722265 | …… | 1 | 0 | F
1403061553071964915 | …… | 1 | 1 | M
…… | …… | …… | …… | ……
Table 2 is an example of an extracted feature set.
A user's raw network behavior data are numerous, varied, and heterogeneous. The purpose of the feature extraction module of the invention is to turn unstructured user network behavior data into a vector such as [12, -3, 1, 0, 2.4, ...].
In some embodiments of the invention, there are, for example, three feature dimensions: whether the user visited the Autohome website, whether there are any online records after 23:00, and the total weekly online time. A user feature vector then has the form [1, 0, 15], meaning that the user visited the Autohome website, has no online records after 23:00, and was online for a total of 15 hours in the week.
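The three-dimensional example above can be sketched in code. The event layout (site, hour of day, duration in hours) and the site label "autohome" are assumptions introduced for illustration; the application does not prescribe an input format.

```python
def extract_features(events):
    """Turn one week of (site, hour_of_day, duration_hours) events into a feature vector.

    The returned vector is [visited automotive site, online after 23:00, total hours].
    """
    visited_auto = int(any(site == "autohome" for site, _, _ in events))
    online_after_23 = int(any(hour >= 23 for _, hour, _ in events))
    total_hours = sum(duration for _, _, duration in events)
    return [visited_auto, online_after_23, total_hours]


week = [("autohome", 20, 6.0), ("news", 21, 9.0)]
print(extract_features(week))  # [1, 0, 15.0]
```

The first two entries are categorical (0 or 1) and the third is numerical, matching the two feature types described for Table 2.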
Feature cleaning module 202: through the feature extraction module 201, a very large number of features can be obtained from the samples' network behavior data. It is not appropriate, however, to use all of them for modeling. Features that have low correlation with the prediction target, small coverage, or effects similar to other features need to be excluded.
The volume of data produced by Internet behavior is massive, and so is the number of feature dimensions; for some features, even a human cannot tell whether they are useful for predicting a user's gender probability. In that case, building a model on these feature dimensions and checking whether it predicts user gender well is the more scientific and rational approach.
In some embodiments of the invention, the survey sample of users of known gender numbers in the millions. A feature set is extracted from their Internet behavior data; the feature set contains roughly 100,000 features, i.e., each user has 100,000 features. From these 100,000 features, those with the strongest power to distinguish male from female, for example 1,000 features, are then selected.
The invention performs coarse screening of features based mainly on at least one of the following three indicators: feature coverage, the chi-square statistic, and information gain (information entropy). When the data volume is large, all three indicators are preferably used to clean the features, removing those for which the indicators are unsatisfactory. Feature coverage, the chi-square statistic, and information gain are mature statistical concepts with standard definitions and usage, and those skilled in the art will know how to use them to clean the features of the invention. These three indicators are very useful for checking the correlation between a feature and the prediction target; since they are general-purpose tools in statistics and information theory, they are not described further here.
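For readers unfamiliar with the three indicators, the sketch below computes them for a single binary feature against a binary gender label (1 for male, 0 for female). It uses the textbook definitions (Pearson chi-square on the 2x2 contingency table, Shannon entropy in bits); the function names are assumptions, and the application does not specify thresholds or an exact formulation.

```python
import math


def coverage(feature):
    """Fraction of samples in which the binary feature is present."""
    return sum(1 for f in feature if f) / len(feature)


def chi_square(feature, labels):
    """Pearson chi-square statistic of the 2x2 table (feature x label)."""
    n = len(labels)
    a = sum(1 for f, y in zip(feature, labels) if f and y)          # feature, male
    b = sum(1 for f, y in zip(feature, labels) if f and not y)      # feature, female
    c = sum(1 for f, y in zip(feature, labels) if not f and y)      # no feature, male
    d = n - a - b - c                                               # no feature, female
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def entropy(labels):
    """Shannon entropy (bits) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def information_gain(feature, labels):
    """Reduction in label entropy obtained by splitting on the feature."""
    n = len(labels)
    present = [y for f, y in zip(feature, labels) if f]
    absent = [y for f, y in zip(feature, labels) if not f]
    conditional = len(present) / n * entropy(present) + len(absent) / n * entropy(absent)
    return entropy(labels) - conditional
```

A feature that perfectly separates the two classes maximizes both the chi-square statistic and the information gain, while a feature distributed identically across both classes scores zero on each.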
Parameter estimation module 203: in the present invention, the inventors choose the generalized linear model as the basis of the model. The generalized linear model is not the only way to solve classification problems; support vector machines (SVM), artificial neural networks (ANN), multilayer perceptrons (MLP), decision trees (CART, C4.5), and others can also solve them. With the method of the invention, however, model training is simple, the error function is convex and therefore has a global minimum (i.e., an optimal solution), and computation in the prediction (identification) stage is extremely simple and fast. This is a considerable advantage over other algorithms and is especially suitable for scenarios with very large volumes of data to be predicted. It is a choice of prediction model that balances all aspects well.
The model is built on the survey users from the Internet behavior features of a large number of survey users of known gender, for example the 1,000 strongest features selected above, to obtain the optimal parameter estimate β, which completes the modeling. Note that in the embodiments of this application μ denotes the probability of being male, and 1 - μ is the probability of being female. For example, if the model predicts a male probability μ of 0.7 for a user, this is equivalent to saying that the user's female probability is 0.3, which is reasonable from a statistical standpoint. Consider the case where a user's feature vector is the zero vector, which generally means that none of the gender-informative features were hit, or that there is no record of this user in the data set. For such a user μ = 0.5, i.e., the male probability is 0.5 and the female probability is also 0.5, which is quite reasonable.
Generalized linear model: the generalized linear model (GLM) is an extension of ordinary least squares (OLS) regression. In a generalized linear model, each observation y is assumed to come from a distribution in the exponential family. The mean μ of that distribution is explained by the independent variables X of that observation:
$$E(y) = \mu = g^{-1}(X\beta)$$
where E(y) is the expected value of y; Xβ is the linear predictor, composed of the unknown parameters β to be estimated and the known variables X; and g is the link function. In this case g is chosen to be the logit, i.e.,
$$X\beta = g(\mu) = \ln\left(\frac{\mu}{1-\mu}\right), \qquad \mu = \frac{\exp(X\beta)}{1+\exp(X\beta)}$$
In this case:
X is the user feature vector, which records the user's quantized Internet access behavior;
μ is the gender probability;
β is the optimal parameter estimate.
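As a short check that the logit link and the logistic form of μ describe the same model, the logit equation can be solved for μ. The steps below are a standard algebraic derivation added for clarity, not text from the application; the last line confirms the value μ = 0.5 quoted earlier for a zero feature vector.

```latex
\begin{aligned}
\ln\left(\frac{\mu}{1-\mu}\right) &= X\beta \\
\frac{\mu}{1-\mu} &= \exp(X\beta) \\
\mu &= (1-\mu)\,\exp(X\beta) \\
\mu\,\bigl(1+\exp(X\beta)\bigr) &= \exp(X\beta) \\
\mu &= \frac{\exp(X\beta)}{1+\exp(X\beta)} \\
X = 0 \;\Rightarrow\; \mu &= \frac{\exp(0)}{1+\exp(0)} = \frac{1}{2}
\end{aligned}
```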
In the survey sample, each survey user's gender is known: y = 1 for male and y = 0 for female (male y = 0 and female y = 1 could of course be used instead). The predicted μ can be regarded as the probability that the user is male; the formula shows that μ lies between 0 and 1.
The inner product Xβ of the feature vector X and the parameter estimate β can be any real number between negative infinity and positive infinity. The limit of μ is 0 as Xβ → -∞ and 1 as Xβ → +∞.
β is the optimal parameter sought in the parameter estimation step, such that the error between the estimated μ and the true y is as small as possible.
For example, survey user A is male; the inner product X_A β of his feature vector X_A with β should ideally be a large positive real number, such as +10, in which case μ_A ≈ 0.99995. Survey user B is female; the inner product of her feature vector X_B with β should ideally be a large negative number, such as -10, giving μ_B ≈ 0, so that user B's male probability is close to 0 and a reasonable prediction is obtained. Note that β is fixed for all users while each user's feature vector X differs, so a single global β that minimizes the prediction error over all survey users is needed. This is the purpose of parameter estimation.
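The numerical behavior described above can be verified with a few lines of code. This is a minimal sketch: the function name and the coefficient values are assumptions for illustration, and the two-branch form is only a numerically stable way of evaluating exp(Xβ) / (1 + exp(Xβ)).

```python
import math


def male_probability(x, beta):
    """Evaluate mu = exp(X.beta) / (1 + exp(X.beta)) in a numerically stable form."""
    z = sum(xi * bi for xi, bi in zip(x, beta))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


beta = [1.5, -2.0, 0.3]                      # illustrative coefficients only
print(male_probability([0, 0, 0], beta))     # 0.5
print(male_probability([1.0], [10.0]))       # close to 1 (inner product +10)
print(male_probability([1.0], [-10.0]))      # close to 0 (inner product -10)
```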
The goal thus becomes obtaining a set of optimal parameter estimates β, and this case uses maximum likelihood estimation (MLE) to obtain them. Since maximum likelihood estimation is a conventional statistical technique, those skilled in the art will know how to use it to obtain the parameter estimate β, so its mathematical derivation is not repeated here.
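Although the derivation is omitted in the application, one common way to carry out the estimation is gradient ascent on the Bernoulli log-likelihood. The sketch below is an assumption about how the step could be implemented, not the applicant's procedure; the function names, learning rate, iteration count, and toy data are all illustrative.

```python
import math


def sigmoid(z):
    """Logistic function, evaluated in a numerically stable form."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_beta(X, y, learning_rate=0.1, iterations=2000):
    """Maximize sum(y*log(mu) + (1-y)*log(1-mu)) over beta by gradient ascent."""
    beta = [0.0] * len(X[0])
    for _ in range(iterations):
        gradient = [0.0] * len(beta)
        for row, label in zip(X, y):
            mu = sigmoid(sum(xi * bi for xi, bi in zip(row, beta)))
            for j, xj in enumerate(row):
                gradient[j] += (label - mu) * xj   # d(log-likelihood)/d(beta_j)
        beta = [b + learning_rate * g / len(X) for b, g in zip(beta, gradient)]
    return beta


# Toy survey sample: feature 0 is hit by 2 males and 1 female,
# feature 1 by 1 male and 2 females (y = 1 means male).
X = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
y = [1, 1, 0, 0, 0, 1]
beta = fit_beta(X, y)
```

On this toy sample the fitted coefficients approach ln 2 and -ln 2, so a user who hits only feature 0 gets a male probability of about 2/3, matching the observed frequency.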
Identification
Step S120: receive the network behavior data of the user to be analyzed.
The test data receiving units 300 and 310 can obtain the network behavior data of the user to be analyzed through data acquisition or data import.
The feature extraction module 412 extracts features from the imported network data of the user to be analyzed and quantizes them.
Step S130: substitute the network behavior data of the user to be analyzed into the gender-behavior model to calculate the gender probability of the user to be analyzed.
The gender identification unit 410 comprises a model library module 411 and a gender probability calculation module 414.
As described above, the 1,000 strongest features selected from the massive set of user Internet behavior features were used to model the survey users and obtain the optimal parameter estimate β, completing the modeling. At identification time, there may be hundreds of millions of users of unknown gender whose gender must be predicted; in some embodiments of the invention, the prediction can be understood as a male probability. For each such user, the same features (for example, 1,000) are extracted to obtain the user's feature vector, which is combined with the β already determined by modeling to calculate the male probability of the user of unknown gender.
In other words, the model prediction process is as follows:
Once the optimal parameter estimate β has been obtained, only the feature vector X of the user to be analyzed is needed to calculate the value of Xβ, which is finally substituted into the link function to obtain the predicted value. In one embodiment of the invention, the predicted value can be regarded as a male probability.
The feature set module 413 stores the features corresponding to each model, i.e., the features used in modeling. Different models correspond to different features.
It should be clear that the feature dimensions of the Internet behavior of the user to be analyzed need not be fully identical to those of the survey sample; a broad overlap is sufficient.
The model library module 411 stores the models that have been built. There need not be only one model; different models, i.e., models with different β, can be trained on different selected feature sets. More than one calculation result may be output: the results calculated by the different models, or a combined result, may be displayed.
In theory, when the data volume is sufficient and modeling succeeds, the predictions of different models for the same user's Internet behavior data converge.
Different models are needed because, for a particular user, the features extracted from the collected Internet behavior data may not satisfy the feature vector required by a given model. Providing multiple models therefore improves the general applicability of the whole system.
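One possible reading of the model library is sketched below: each stored model lists the features it needs, and the calculation module picks a model whose features are all available for the user. The selection rule (the fully covered model with the most features), the feature names, and the coefficient values are assumptions for illustration; the application only states that the model corresponding to the extracted features is called.

```python
import math

# Hypothetical model library: each model maps feature names to fitted coefficients.
MODEL_LIBRARY = [
    {"visited_auto_site": 1.2, "online_after_23": 0.4, "weekly_hours": 0.01},
    {"visited_auto_site": 1.1, "visited_cosmetics_site": -1.3},
]


def select_model(user_features):
    """Return the largest model whose features are all present for this user."""
    usable = [m for m in MODEL_LIBRARY if set(m) <= set(user_features)]
    return max(usable, key=len) if usable else None


def predict_male_probability(user_features):
    """Compute mu with the selected model, or None if no model is applicable."""
    model = select_model(user_features)
    if model is None:
        return None
    z = sum(coef * user_features[name] for name, coef in model.items())
    return 1.0 / (1.0 + math.exp(-z))


user = {"visited_auto_site": 1, "visited_cosmetics_site": 0}
print(select_model(user) is MODEL_LIBRARY[1])  # True
```

A user for whom only `weekly_hours` is known matches neither model here, which is the situation that motivates keeping several models in the library.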
The technology of the invention integrates mathematical modeling with the workflow of a computing framework. Because, for every user, the features with the strongest gender-discriminating power are selected from the user's network behavior features for computation, feature extraction is decoupled from the mathematical modeling itself. A single feature extraction framework can thus support multiple models, which reduces repeated computation and improves computational efficiency. Beyond the generalized linear model itself, a coarse feature screening process is added, so that the model reduces computational complexity without losing precision.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method can be completed by a program instructing the relevant hardware, and that the program can be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Alternatively, all or part of the steps of the above embodiments can be implemented using one or more integrated circuits. Accordingly, each module or unit in the above embodiments can be implemented in the form of hardware or in the form of a software function module. This application is not limited to any particular combination of hardware and software.
The invention has been illustrated by the above embodiments, but it should be understood that the embodiments are for purposes of example and explanation only and are not intended to limit the invention to the scope of the embodiments described. Those skilled in the art will further understand that the invention is not limited to the above embodiments, and that more variations and modifications can be made according to the teachings of the invention, all of which fall within the scope claimed by the invention. The scope of protection of the invention is defined by the appended claims and their equivalents.

Claims (10)

1. A method for identifying the gender of Internet users, comprising:
extracting the gender data and network behavior data of a plurality of survey samples;
building a gender-behavior model from the gender data and network behavior data of the survey samples;
receiving the network behavior data of a user to be analyzed; and
substituting the network behavior data of the user to be analyzed into the gender-behavior model to calculate the gender probability of the user to be analyzed.
2. The method according to claim 1, characterized in that, after the gender data and network behavior data of the survey samples are extracted, the data obtained are cleaned to remove erroneous information.
3. The method according to claim 1, characterized in that the network behavior data include at least one of: the types of advertisements exposed, media, the URLs of web pages visited, and text information.
4. The method according to claim 2, characterized in that, after the erroneous information is removed, features are extracted from the network behavior data and quantized.
5. The method according to claim 2, characterized in that, after the erroneous information is removed, features are extracted from the network behavior data, and undesirable features are excluded according to at least one of feature coverage, the chi-square statistic, and information gain (information entropy).
6. The method according to claim 1, characterized in that the gender-behavior model is a generalized linear model.
7. The method according to claim 1, characterized in that the gender-behavior model is
$$\mu = \frac{\exp(X\beta)}{1+\exp(X\beta)}, \qquad X\beta = \ln\left(\frac{\mu}{1-\mu}\right)$$
where
X is the user feature vector;
μ is the gender probability;
β is the optimal parameter estimate.
8. A system for recognizing the sex of an Internet user, comprising:
A sample preparation unit, which extracts the gender data and network behavior data of multiple investigation samples;
A modeling unit, which builds a sex-behavior model according to the gender data and the network behavior data of the investigation samples;
A to-be-tested data receiving unit, which receives the network behavior data of a user to be identified; and
A sex recognition unit, which substitutes the network behavior data of the user to be identified into the sex-behavior model to calculate the sex probability of the user to be identified.
9. The system according to claim 8, characterized in that the sex-behavior model is
$$\mu = \frac{\exp(X\beta)}{1 + \exp(X\beta)}, \qquad X\beta = \ln\left(\frac{\mu}{1 - \mu}\right)$$
wherein
X is the user feature vector;
μ is the sex probability;
β is the optimal parameter estimate.
10. The system according to claim 8, characterized in that the modeling unit comprises:
A feature extraction module, which converts the extracted network behavior data into a feature set and converts the features to numerical values;
A feature cleaning module, which excludes features that have low correlation with the prediction target, small coverage, and/or an effect similar to that of other features; and
A parameter estimation module, which estimates the optimal parameter estimate β.
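The claims do not state how the parameter estimation module obtains β. One common choice for a logistic model is maximum likelihood, sketched below in Python using batch gradient ascent with a small L2 penalty. The estimation procedure, the function names, and the hyperparameters (`lr`, `epochs`, `l2`) are all assumptions made for this example, not the patent's stated method.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_beta(X, y, lr=0.5, epochs=2000, l2=0.01):
    """Estimate beta by gradient ascent on the L2-penalized Bernoulli log-likelihood."""
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            # Residual between the observed label and the predicted probability.
            err = yi - sigmoid(sum(a * b for a, b in zip(xi, beta)))
            for j in range(d):
                grad[j] += err * xi[j]
        # Average gradient step; the penalty keeps beta finite on separable data.
        beta = [b + lr * (g / n - l2 * b) for b, g in zip(beta, grad)]
    return beta

def predict(x, beta):
    """Sex probability for one feature vector under the fitted model."""
    return sigmoid(sum(a * b for a, b in zip(x, beta)))
```

In this sketch each row of `X` is a numeric feature vector whose first entry is a constant 1 (an intercept term, also an assumption), and `y` holds the 0/1 gender labels of the investigation samples. The resulting `beta` can then be used by `predict` on the feature vector of a user to be identified.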
CN201610134810.5A 2016-03-09 2016-03-09 Recognize Internet user's sex method and system Pending CN107180044A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610134810.5A CN107180044A (en) 2016-03-09 2016-03-09 Recognize Internet user's sex method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610134810.5A CN107180044A (en) 2016-03-09 2016-03-09 Recognize Internet user's sex method and system

Publications (1)

Publication Number Publication Date
CN107180044A true CN107180044A (en) 2017-09-19

Family

ID=59829677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610134810.5A Pending CN107180044A (en) 2016-03-09 2016-03-09 Recognize Internet user's sex method and system

Country Status (1)

Country Link
CN (1) CN107180044A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107704547A (en) * 2017-09-26 2018-02-16 硕诺科技(深圳)有限公司 Method for identifying gender through mobile phone usage behavior
WO2019120007A1 (en) * 2017-12-22 2019-06-27 Oppo广东移动通信有限公司 Method and apparatus for predicting user gender, and electronic device
CN109948633A (en) * 2017-12-20 2019-06-28 广东欧珀移动通信有限公司 User gender prediction method, apparatus, storage medium and electronic equipment
CN111209173A (en) * 2020-01-02 2020-05-29 腾讯科技(深圳)有限公司 Performance prediction method, device, storage medium and electronic equipment
CN111311321A (en) * 2020-02-14 2020-06-19 北京百度网讯科技有限公司 User consumption behavior prediction model training method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070061314A1 (en) * 2005-02-01 2007-03-15 Outland Research, Llc Verbal web search with improved organization of documents based upon vocal gender analysis
CN103164470A (en) * 2011-12-15 2013-06-19 盛大计算机(上海)有限公司 Directional application method based on user gender distinguished results and system thereof
CN104636504A (en) * 2015-03-10 2015-05-20 飞狐信息技术(天津)有限公司 Method and system for identifying sexuality of user
CN105260414A (en) * 2015-09-24 2016-01-20 精硕世纪科技(北京)有限公司 User behavior similarity computing method and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070061314A1 (en) * 2005-02-01 2007-03-15 Outland Research, Llc Verbal web search with improved organization of documents based upon vocal gender analysis
CN103164470A (en) * 2011-12-15 2013-06-19 盛大计算机(上海)有限公司 Directional application method based on user gender distinguished results and system thereof
CN104636504A (en) * 2015-03-10 2015-05-20 飞狐信息技术(天津)有限公司 Method and system for identifying sexuality of user
CN105260414A (en) * 2015-09-24 2016-01-20 精硕世纪科技(北京)有限公司 User behavior similarity computing method and device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107704547A (en) * 2017-09-26 2018-02-16 硕诺科技(深圳)有限公司 Method for identifying gender through mobile phone usage behavior
CN107704547B (en) * 2017-09-26 2022-01-14 英望科技(山东)有限公司 Method for identifying gender through mobile phone using behaviors
CN109948633A (en) * 2017-12-20 2019-06-28 广东欧珀移动通信有限公司 User gender prediction method, apparatus, storage medium and electronic equipment
WO2019120007A1 (en) * 2017-12-22 2019-06-27 Oppo广东移动通信有限公司 Method and apparatus for predicting user gender, and electronic device
CN111209173A (en) * 2020-01-02 2020-05-29 腾讯科技(深圳)有限公司 Performance prediction method, device, storage medium and electronic equipment
CN111209173B (en) * 2020-01-02 2023-10-31 腾讯科技(深圳)有限公司 Gender prediction method and device, storage medium and electronic equipment
CN111311321A (en) * 2020-02-14 2020-06-19 北京百度网讯科技有限公司 User consumption behavior prediction model training method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109685631B (en) Personalized recommendation method based on big data user behavior analysis
CN106485562B (en) Commodity information recommendation method and system based on user historical behaviors
CN111523976B (en) Commodity recommendation method and device, electronic equipment and storage medium
CN107180044A (en) Recognize Internet user's sex method and system
CN107562818A (en) Information recommendation system and method
CN105989074B (en) Method and apparatus for cold-start recommendation using mobile device information
CN107689008A (en) Method and device for predicting user insurance-purchase behavior
CN111724238B (en) Method, device and equipment for evaluating product recommendation accuracy and storage medium
CN107146089A (en) The single recognition methods of one kind brush and device, electronic equipment
CN108665311B (en) Electric commercial user time-varying feature similarity calculation recommendation method based on deep neural network
CN106022800A (en) User feature data processing method and device
CN103164804A (en) Personalized method and personalized device of information push
CN106600372A (en) Commodity recommending method and system based on user behaviors
CN104239338A (en) Information recommendation method and information recommendation device
CN102254028A (en) Personalized commodity recommending method and system which integrate attributes and structural similarity
CN103853948A (en) User identity recognizing and information filtering and searching method and server
CN106447463A (en) Commodity recommendation method based on Markov decision-making process model
CN104820879A (en) User behavior information analysis method and device thereof
CN111949887A (en) Item recommendation method and device and computer-readable storage medium
CN113034238B (en) Commodity brand feature extraction and intelligent recommendation management method based on electronic commerce platform transaction
CN112417294A (en) Intelligent business recommendation method based on neural network mining model
CN111738805A (en) Behavior log-based search recommendation model generation method, device and storage medium
CN116127184A (en) Product recommendation method and device, nonvolatile storage medium and electronic equipment
CN111461827A (en) Product evaluation information pushing method and device
CN103309885A (en) Method and device for identifying feature user in electronic trading platform, search method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170919