CN107180044A - Method and system for identifying the gender of Internet users - Google Patents
- Publication number
- CN107180044A (application CN201610134810.5A)
- Authority
- CN
- China
- Prior art keywords
- sex
- user
- data
- network behavior
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention relates to a method for identifying the gender of Internet users, comprising: extracting gender data and network behavior data from a plurality of survey samples; building a gender-behavior model from the gender data and the network behavior data of the survey samples; receiving the network behavior data of a user to be analyzed; and substituting the network behavior data of the user to be analyzed into the gender-behavior model to calculate that user's gender probability. The invention further relates to a system for identifying the gender of Internet users, comprising a sample preparation unit, a modeling unit, a test data receiving unit, and a gender recognition unit.
Description
Technical field
The invention belongs to the computer field and relates to a method for identifying the gender of Internet users. The invention further relates to a system for identifying the gender of Internet users.
Background
Traditionally, classification problems can be solved with an expert system. A domain expert selects features that are strongly correlated with the classification target, such as height and weight, or Internet behavior such as visiting automotive websites or women's cosmetics websites, habitual online hours, or the content of the web pages viewed. The domain expert then assigns a characteristic value or score to each feature. For example: a height above 178 cm scores +1 point, a height below 160 cm scores -1 point, visiting an automotive website scores +1 point, visiting a women's website scores -1 point, and so on. Finally, the scores of the features of the user to be predicted are summed: a total above 0 is predicted as male, and a total below 0 as female.
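The scoring scheme described above can be sketched in a few lines of Python. This is only an illustration of the example rules quoted in the text; the function and parameter names are hypothetical, and the handling of a total of exactly 0 is an assumption, since the text does not define that case.

```python
def expert_score(height_cm, visited_auto_site, visited_womens_site):
    """Sum the hand-assigned scores from the example rules in the text."""
    score = 0
    if height_cm > 178:
        score += 1          # taller than 178 cm: +1
    elif height_cm < 160:
        score -= 1          # shorter than 160 cm: -1
    if visited_auto_site:
        score += 1          # visited an automotive website: +1
    if visited_womens_site:
        score -= 1          # visited a women's website: -1
    return score


def predict_from_score(score):
    """Above 0 is predicted male, below 0 female; 0 is left undecided (assumption)."""
    if score > 0:
        return "M"
    if score < 0:
        return "F"
    return None


example = predict_from_score(expert_score(180, True, False))
```

Every new rule has to be written and tuned by hand in this style, which is the maintenance burden the background section goes on to criticize.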
Expert systems can take many forms, but they are all essentially experience-based: the domain knowledge supplied by the expert plays the decisive role. In practice, however, a given feature is not necessarily reasonable or effective for every sample. For female volleyball players, for instance, the strong correlation between height and gender is not evident and may even be reversed. Likewise, it is now common for men to visit women's cosmetics websites to choose gifts before an anniversary. As a result, the expert-defined rules grow ever more complex and the conditions ever harder to evaluate, so the cost becomes uncontrollable. More importantly, rule-making always lags behind current behavior. The expert system therefore performs poorly, is time-consuming and laborious to maintain, and depends on domain expertise.
Summary of the invention
To remove the dependence on expert experience, the inventors developed a method for identifying the gender of Internet users that automatically selects the feature dimensions associated with the prediction target, automatically trains a model to obtain the optimal model over the sample space, and lends itself to automated prediction.
According to a first aspect of the invention, there is provided a method for identifying the gender of Internet users, comprising:
extracting gender data and network behavior data from a plurality of survey samples;
building a gender-behavior model from the gender data and the network behavior data of the survey samples;
receiving the network behavior data of a user to be analyzed; and
substituting the network behavior data of the user to be analyzed into the gender-behavior model to calculate the gender probability of that user.
The network behavior data include at least one of: the types of advertisements the user was exposed to, media, the URLs of web pages visited, and text information.
Preferably, after the gender data and network behavior data of the survey samples are extracted, the data obtained are cleaned to remove erroneous information.
Optionally, after the erroneous information is removed, features are extracted from the network behavior data and converted to numerical values.
Optionally, after the erroneous information is removed, features are extracted from the network behavior data, and unsatisfactory features are excluded according to at least one of feature coverage, the chi-square statistic, and information gain (information entropy).
Preferably, the gender-behavior model is a generalized linear model.
More preferably, the gender-behavior model is
$$\mu = \frac{\exp(X\beta)}{1+\exp(X\beta)}, \qquad X\beta = \ln\left(\frac{\mu}{1-\mu}\right)$$
where
X is the user feature vector, which records the user's Internet access behavior after numerical conversion;
μ is the gender probability;
β is the optimal parameter estimate.
According to a second aspect of the invention, there is provided a system for identifying the gender of Internet users, comprising:
a sample preparation unit, which extracts gender data and network behavior data from a plurality of survey samples;
a modeling unit, which builds a gender-behavior model from the gender data and the network behavior data of the survey samples;
a test data receiving unit, which receives the network behavior data of a user to be identified; and
a gender recognition unit, which substitutes the network behavior data of the user to be identified into the gender-behavior model to calculate the gender probability of that user.
Preferably, the gender-behavior model is
$$\mu = \frac{\exp(X\beta)}{1+\exp(X\beta)}, \qquad X\beta = \ln\left(\frac{\mu}{1-\mu}\right)$$
In certain embodiments of the invention, the modeling unit includes:
a feature extraction module, which converts the extracted network behavior data into a feature set and converts the features to numerical values;
a feature cleaning module, which excludes features that have low correlation with the prediction target, low coverage, and/or a similar effect to other features; and
a parameter estimation module, which obtains the optimal parameter estimate β using statistical methods.
Preferably, the optimal parameter estimate β can be obtained by maximum likelihood estimation, quasi-maximum likelihood estimation, or Bayesian estimation.
More preferably, the optimal parameter estimate β is obtained by maximum likelihood estimation.
According to a third aspect of the invention, there is provided a method for identifying the gender of Internet users, comprising:
receiving the network behavior data of a user to be identified;
extracting features from the network behavior data of the user to be identified;
substituting the extracted features into a model in a model library for calculation; and
outputting the calculation result.
Preferably, the extracted features are converted into a user feature vector X by numerical conversion.
Preferably, the model is
$$\mu = \frac{\exp(X\beta)}{1+\exp(X\beta)}, \qquad X\beta = \ln\left(\frac{\mu}{1-\mu}\right)$$
where
X is the user feature vector, which records the user's Internet access behavior after numerical conversion;
μ is the gender probability;
β is the optimal parameter estimate.
In certain embodiments of the invention, the value of β can vary with the choice of user feature vector.
According to a fourth aspect of the invention, there is provided a system for identifying the gender of Internet users, comprising:
a test data receiving unit, which receives the network behavior data of a user to be identified;
a feature extraction module, which extracts features from the network behavior data of the user to be identified and converts them to numerical values; and
a gender recognition unit, which substitutes the network behavior data of the user to be identified into the gender-behavior model to calculate the gender probability of that user.
Preferably, the gender recognition unit includes:
a model library module, which stores gender-behavior models; and
a gender probability calculation module, which calls the model in the model library that corresponds to the extracted features and performs the calculation.
Preferably, the gender recognition unit further includes a feature set module, which stores the features corresponding to each gender-behavior model.
Brief description of the drawings
The preferred embodiments of the invention are described in detail below with reference to the accompanying drawings, so that the above and other features and advantages of the invention become clearer to those of ordinary skill in the art. In the drawings:
Fig. 1 is a flowchart of the method of some embodiments of the invention.
Fig. 2 is a flowchart of the method of other embodiments of the invention.
Fig. 3 is a schematic diagram of the system of some embodiments of the invention.
Fig. 4 is a schematic diagram of the system of other embodiments of the invention.
Detailed description
In the following description, numerous specific details are given to provide a more thorough understanding of the invention. However, it will be apparent to those skilled in the art that the invention can be practiced without one or more of these details. In other instances, some technical features well known in the art are not described, to avoid obscuring the invention.
The term "behavioral data" in the present invention refers to data obtained from the web content that a user accesses and views, including but not limited to games, films, online shopping, news and current events, stock market information, and so on.
"Behavioral data" may further be extended with a second dimension of data, including but not limited to time information, such as the specific time or time interval in which a behavior occurs. Specifically, the carrier of all these behavioral data may be network access logs. Each log entry contains a large amount of information, such as source, Cookie ID, access time, dwell time, device information, operating system version, and browser version.
The term "survey sample" in the present invention refers to prior sample data for which both the gender data and the behavioral data are known. These historical samples may be obtained from a research company, provided directly by an advertiser, or taken from the project data accumulated by an advertisement monitoring company.
The term "numerical conversion" in the present invention refers to turning complex and varied information into measurable numbers and data, building an appropriate numerical model from those numbers and data, and converting them into a series of binary codes that are brought into the computer and processed uniformly. This is the basic process of numerical conversion.
The term "maximum likelihood estimation" in the present invention refers to a statistical method for estimating the parameters of the probability density function associated with a sample set. A random sample is known to follow some probability distribution whose specific parameters are unknown; parameter estimation uses the results observed over several trials to infer the most probable values of the parameters. Maximum likelihood estimation rests on the following idea: if a certain parameter value maximizes the probability of the observed sample, there is no reason to choose other parameter values under which the sample is less probable, so that value is taken as the estimate of the true value.
The term "generalized linear model" in the present invention refers to a widely used generalization of the linear regression model in statistics. It assumes that the distribution function of the random variable measured by the experimenter and the systematic (non-random) effects in the experiment can be related through a link function.
According to certain embodiments of the invention, the method for identifying the gender of Internet users includes the following steps:
Data preparation
Step S100: extract the gender data and behavioral data of a plurality of survey samples. Specifically, the behavioral data are obtained from the web content that the user accesses and views (such as the types of advertisements the user was exposed to, media, the URLs of web pages visited, and text information). The data may be collected through questionnaires or taken from other data carriers.
After these data are obtained, they are preferably cleaned. The purpose of cleaning is to filter out users who deliberately filled in nonsense, as well as erroneous information introduced by the collection system. If the sample data are purchased and their accuracy is assured, they may also be used directly without cleaning. In other words, for data preparation the sample preparation unit 100 generally comprises a sample collection module 101 and a sample cleaning module 102. If purchased samples are imported directly, or the usability of the data is ensured by other means, the sample cleaning module may be omitted.
Data preparation yields sample data of the form shown in Table 1.
Table 1
Cookie ID | Browsing record | Gender |
---|---|---|
1403061550041722265 | …… | F |
1403061553071964915 | …… | M |
…… | …… | …… |
The Cookie ID is the number that identifies a user. The browsing record refers to the behavior the user generates through a browser. Gender is denoted F (Female) or M (Male). In the present invention, the scope of the browsing record is very broad and includes, but is not limited to, the programs watched on video websites, purchase records on shopping websites, forums logged into, and advertisements clicked.
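The cleaning step can be illustrated with a minimal Python sketch. The text does not specify the cleaning rules, so the three checks below (a valid F/M label, a non-empty browsing record, no duplicate Cookie ID) are assumptions chosen for illustration, as are the dictionary keys and the sample record contents.

```python
def clean_samples(samples):
    """Keep only survey samples that pass three assumed sanity checks."""
    cleaned = []
    seen_ids = set()
    for sample in samples:
        cookie_id = sample.get("cookie_id")
        gender = sample.get("gender")
        records = sample.get("records") or []
        if not cookie_id or cookie_id in seen_ids:
            continue  # missing or duplicated Cookie ID
        if gender not in ("F", "M"):
            continue  # gender label is not one of the two codes used in Table 1
        if not records:
            continue  # no browsing behavior to learn from
        seen_ids.add(cookie_id)
        cleaned.append({"cookie_id": cookie_id, "records": list(records), "gender": gender})
    return cleaned


# Hypothetical raw samples; only the two Cookie IDs from Table 1 come from the text.
raw_samples = [
    {"cookie_id": "1403061550041722265", "records": ["video site visit"], "gender": "F"},
    {"cookie_id": "1403061553071964915", "records": ["shopping site purchase"], "gender": "M"},
    {"cookie_id": "1403061553071964915", "records": ["shopping site purchase"], "gender": "M"},
    {"cookie_id": "hypothetical-3", "records": ["forum login"], "gender": "?"},
    {"cookie_id": "hypothetical-4", "records": [], "gender": "M"},
]
cleaned_samples = clean_samples(raw_samples)
```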
Modeling
Step S110: build a gender-behavior model from the gender data and the network behavior data of the survey samples.
As shown in Fig. 3, the modeling unit 200 includes a feature extraction module 201, a feature cleaning module 202, and a parameter estimation module 203.
Feature extraction module 201: the feature extraction module converts the browsing behavior data in the raw sample data into a set of features. As shown in Table 2, a feature may be categorical (for example, "whether website A was visited", whose value is either 0 or 1) or numeric (for example, "time spent on website B", such as 1.7 hours or 37 minutes, or "amount spent on website C", such as 3.2 yuan). For the feature of whether an automotive portal website was visited, for instance, a visit is recorded as 1 and no visit as 0.
Table 2
Cookie ID | Browsing record | Feature n | Feature m | Gender |
---|---|---|---|---|
1403061550041722265 | …… | 1 | 0 | F |
1403061553071964915 | …… | 1 | 1 | M |
…… | …… | …… | …… | …… |
Table 2 is an example of the extracted feature set.
A user's raw network behavior data are numerous, varied, and heterogeneous. The purpose of the feature extraction module is to turn this unstructured data into a vector such as [12, -3, 1, 0, 2.4, ...].
In certain embodiments of the invention, for example, three feature dimensions are used: whether an automotive portal website was visited, whether there are any Internet records after 23:00, and the total online time per week. A user feature vector then has the form [1, 0, 15], which means that the user visited the automotive portal website, has no Internet records after 23:00, and was online for a total of 15 hours in the week.
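As a rough sketch of how the three-dimensional example could be computed, the Python function below turns a week of log records into the vector [1, 0, 15]. The record layout (`site`, `start`, `hours`), the host name `auto-portal.example`, and the sample visits are all hypothetical; the text does not define a log schema.

```python
from datetime import datetime

AUTO_SITE = "auto-portal.example"  # hypothetical host name for the automotive portal


def extract_features(records):
    """Map one user's weekly log records to [visited_auto, active_after_23, total_hours]."""
    visited_auto = int(any(r["site"] == AUTO_SITE for r in records))
    active_after_23 = int(any(r["start"].hour >= 23 for r in records))
    total_hours = sum(r["hours"] for r in records)
    return [visited_auto, active_after_23, total_hours]


# Hypothetical week of browsing: three visits, none starting at or after 23:00.
week_records = [
    {"site": AUTO_SITE, "start": datetime(2016, 3, 1, 20, 0), "hours": 6.0},
    {"site": "news.example", "start": datetime(2016, 3, 3, 9, 30), "hours": 5.0},
    {"site": "video.example", "start": datetime(2016, 3, 5, 14, 0), "hours": 4.0},
]
feature_vector = extract_features(week_records)
```

Categorical features become 0/1 flags and numeric features keep their measured value, which matches the two feature types described for Table 2.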
Feature cleaning module 202: the feature extraction module 201 can obtain a very large number of features from the network behavior data of the samples, but it is inappropriate to use all of them for modeling. Features that have low correlation with the prediction target, low coverage, or a similar effect to other features need to be excluded.
The volume of data produced by Internet behavior is massive, and so is the number of feature dimensions. For some features, even a human cannot tell whether they are useful for predicting a user's gender probability. In this situation, the more scientific and rational approach is to build models on these feature dimensions and check whether they predict the user's gender well.
In certain embodiments of the invention, the survey sample of users of known gender is on the order of millions of users. A feature set is extracted from their Internet behavior data and contains roughly 100,000 features, that is, each user has 100,000 features. From these 100,000 features, the ones that best distinguish male from female are then selected, for example 1000 features.
The invention performs a coarse screening of features based on at least one of three indicators: feature coverage, the chi-square statistic, and information gain (information entropy). When the data volume is large, all three indicators are preferably used, and features that perform poorly on all three are removed. Feature coverage, the chi-square statistic, and information gain are mature statistical concepts with standard definitions and usage, and those skilled in the art will know how to apply them to feature cleaning in the present invention. The three indicators are very useful for checking the correlation between a feature and the prediction target. Since they are general-purpose tools of statistics and information theory, they are not described further here.
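For readers who want a concrete picture of the three indicators, the following Python sketch computes them for a single binary feature against a binary gender label (1 for male, 0 for female). It uses the textbook definitions for a 2x2 contingency table; the function names are hypothetical, and the patent does not say which thresholds are applied to these values.

```python
import math


def coverage(column):
    """Fraction of users for whom the feature is non-zero."""
    return sum(1 for v in column if v != 0) / len(column)


def chi_square(column, labels):
    """Pearson chi-square statistic of a 2x2 table (feature on/off vs. label 1/0)."""
    n = len(column)
    a = sum(1 for x, y in zip(column, labels) if x and y)          # on, male
    b = sum(1 for x, y in zip(column, labels) if x and not y)      # on, female
    c = sum(1 for x, y in zip(column, labels) if not x and y)      # off, male
    d = n - a - b - c                                              # off, female
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def entropy(labels):
    """Shannon entropy (in bits) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)


def information_gain(column, labels):
    """Reduction in label entropy after splitting users by the feature."""
    n = len(labels)
    on = [y for x, y in zip(column, labels) if x]
    off = [y for x, y in zip(column, labels) if not x]
    return entropy(labels) - len(on) / n * entropy(on) - len(off) / n * entropy(off)


labels = [1, 1, 0, 0]
informative = [1, 1, 0, 0]     # perfectly aligned with the label
uninformative = [1, 0, 1, 0]   # independent of the label
```

On this toy data the informative feature scores high on chi-square and information gain while the uninformative one scores zero on both, even though both have the same coverage; this is why coverage alone is not enough to rank features.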
Parameter estimation module 203: in the present invention, the inventors chose the generalized linear model as the model basis. The generalized linear model is not the only way to solve classification problems; support vector machines (SVM), artificial neural networks (ANN), multilayer perceptrons (MLP), and decision trees (CART, C4.5), among others, can also solve them. However, the model used in the present method is simple to train, its error function is convex and therefore has a global minimum (i.e. an optimal solution), and the calculation in the prediction (identification) stage is extremely simple and fast. These are considerable advantages over other algorithms, and they make the model especially suitable for scenarios with a very large number of users to be predicted. It is a model choice that balances all of these aspects well.
Based on the Internet behavior features of a large number of survey users of known gender, for example the 1000 most discriminative features selected above, the survey users are modeled and the optimal parameter estimate β is obtained, which completes the modeling. Note that in the embodiments of this application μ denotes the probability of being male, and 1 - μ is the probability of being female. For example, if the model predicts a male probability μ of 0.7 for a user, this is equivalent to saying that the user's female probability is 0.3, which is statistically reasonable. It can also be seen that when a user's feature vector is the zero vector, which generally means that none of the gender-discriminative features was hit or that the data set holds no record of this user, then μ = 0.5 for this user. The male probability and the female probability are both 0.5, which is also reasonable.
Generalized linear model: the generalized linear model (GLM) is an extension of ordinary least squares (OLS) regression. In a generalized linear model, each observation y is assumed to come from a distribution in the exponential family. The mean μ of that distribution is explained by the independent variables X of the observation:
$$E(y) = \mu = g^{-1}(X\beta)$$
where E(y) is the expected value of y, Xβ is the linear predictor formed from the unknown parameters β to be estimated and the known variables X, and g is the link function. In this case g is the logit function, i.e.
$$g(\mu) = \ln\left(\frac{\mu}{1-\mu}\right) = X\beta, \qquad \mu = g^{-1}(X\beta) = \frac{\exp(X\beta)}{1+\exp(X\beta)}$$
In the present case
X is the user feature vector, which records the user's Internet access behavior after numerical conversion;
μ is the gender probability;
β is the optimal parameter estimate.
In the survey sample, the gender of every survey user is known: y = 1 for male and y = 0 for female (the opposite convention, y = 0 for male and y = 1 for female, could equally be used). The predicted μ can be regarded as the probability that the user is male, and the formula shows that μ lies between 0 and 1.
The inner product Xβ of the feature vector X and the parameter estimate β can be any real number between negative and positive infinity. μ tends to 0 as Xβ tends to -∞ and tends to 1 as Xβ tends to +∞.
β is the optimal parameter that the parameter estimation step aims to obtain, such that the error between the estimated μ and the true y is as small as possible.
For example, survey user A is male, so the inner product X_A β of his feature vector X_A with β should ideally be a large positive real number, such as +10. In that case μ_A ≈ 0.99995, which is close to 1. Survey user B is female, so the inner product of her feature vector X_B with β should ideally be a large negative number, such as -10, which gives μ_B ≈ 0. The male probability of survey user B is then close to 0, which is a reasonable prediction. Note that β is fixed for all users while each user's feature vector X is different. A single global β is therefore needed that minimizes the prediction error over all survey users, and finding it is the purpose of parameter estimation.
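The relationship between the inner product Xβ and the male probability μ can be checked numerically with a short Python sketch of the inverse logit link. The function name and the two-dimensional β used in the example are hypothetical; the branch on the sign of z is only there to avoid floating-point overflow for large negative inputs.

```python
import math


def male_probability(x, beta):
    """Inverse logit link: mu = exp(X.beta) / (1 + exp(X.beta))."""
    z = sum(xi * bi for xi, bi in zip(x, beta))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))   # algebraically identical, stable for z >= 0
    e = math.exp(z)
    return e / (1.0 + e)


beta = [2.0, -1.0]                            # hypothetical parameter estimate
mu_a = male_probability([5.0, 0.0], beta)     # inner product = +10
mu_b = male_probability([0.0, 10.0], beta)    # inner product = -10
mu_zero = male_probability([0.0, 0.0], beta)  # zero feature vector
```

With these inputs `mu_a` is very close to 1, `mu_b` is very close to 0, and `mu_zero` is exactly 0.5, matching the three cases discussed in the text.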
The goal is thus transformed into obtaining an optimal parameter estimate β. This case uses maximum likelihood estimation (MLE) to obtain it. Since maximum likelihood estimation is a conventional statistical technique, those skilled in the art will know how to obtain the parameter estimate β with it, so the mathematical derivation is not repeated here.
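Although the derivation is omitted in the text, a minimal sketch may help readers unfamiliar with it. For the logit model the log-likelihood is the sum over survey users of y ln μ + (1 - y) ln(1 - μ), and its gradient with respect to β is the sum of (y - μ) X. The Python code below maximizes it by plain gradient ascent. The optimizer, learning rate, iteration count, and toy data are assumptions made for illustration; the patent does not state which numerical procedure is used.

```python
import math


def sigmoid(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_mle(X, y, learning_rate=0.5, iterations=5000):
    """Estimate beta by gradient ascent on the average log-likelihood."""
    beta = [0.0] * len(X[0])
    for _ in range(iterations):
        gradient = [0.0] * len(beta)
        for row, label in zip(X, y):
            mu = sigmoid(sum(v * b for v, b in zip(row, beta)))
            for j, v in enumerate(row):
                gradient[j] += (label - mu) * v   # d(log-likelihood)/d(beta_j)
        beta = [b + learning_rate * g / len(X) for b, g in zip(beta, gradient)]
    return beta


# Toy survey sample: feature 0 is hit by two males and one female,
# feature 1 by one male and two females (y = 1 means male).
X_train = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
y_train = [1, 1, 0, 0, 0, 1]
beta_hat = fit_mle(X_train, y_train)
```

On this toy data the estimate converges to about ln 2 for the first feature and -ln 2 for the second, so a user who hits only the first feature gets a male probability of about 2/3, the same proportion observed in the sample. In practice a library routine for logistic regression would normally replace this hand-written loop.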
Identification
Step S120: receive the network behavior data of the user to be analyzed.
The test data receiving unit 300, 310 can obtain the network behavior data of the user to be analyzed by data collection or data import.
The feature extraction module 412 extracts features from the imported network data of the user to be analyzed and converts them to numerical values.
Step S130: substitute the network behavior data of the user to be analyzed into the gender-behavior model to calculate the gender probability of that user.
The gender recognition unit 410 includes a model library module 411 and a gender probability calculation module 414.
As described above, the 1000 most discriminative features selected from the massive set of user Internet behavior features are used to model the survey users and obtain the optimal parameter estimate β, which completes the modeling. At identification time there may be hundreds of millions of users whose gender is unknown and must be predicted; in certain embodiments of the invention, the prediction can be understood as a male probability. The same features, for example 1000 of them, are extracted for each user to form the user's feature vector, which is combined with the β already determined during modeling to calculate the male probability of the user of unknown gender.
In other words, the model prediction process is as follows:
once the optimal parameter estimate β has been obtained, only the feature vector X of the user to be analyzed is needed. The value of Xβ is calculated and substituted into the inverse of the link function to obtain the predicted value. In one embodiment of the invention, the predicted value can be regarded as a male probability.
The feature set module 413 stores the features corresponding to each model, namely the features used during modeling. Different models correspond to different features.
It should be clear that the Internet behavior feature dimensions of the user to be analyzed need not be fully identical to those of the survey sample; it is enough that they overlap substantially.
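One way to handle partially overlapping features is sketched below in Python: the stored feature names of a model define the order of the vector, and any feature the user does not have is filled with 0. Treating a missing feature as 0 is an assumption, chosen to be consistent with the earlier remark that an all-zero vector yields a probability of 0.5. All feature names and β values are hypothetical.

```python
import math


def to_model_vector(user_features, model_feature_names):
    """Order the user's features as the model expects; missing features become 0."""
    return [float(user_features.get(name, 0.0)) for name in model_feature_names]


def predict_male_probability(user_features, model_feature_names, beta):
    """Compute X.beta on the aligned vector and apply the inverse logit link."""
    x = to_model_vector(user_features, model_feature_names)
    z = sum(xi * bi for xi, bi in zip(x, beta))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical stored model: feature names (feature set module) and beta (model library).
model_features = ["visited_auto_site", "active_after_23", "weekly_hours"]
model_beta = [0.8, -0.3, 0.01]

# The user has one feature the model knows and one it does not use.
user = {"visited_auto_site": 1, "plays_games": 1}
aligned = to_model_vector(user, model_features)
probability = predict_male_probability(user, model_features, model_beta)
```

Features the model was not trained on (`plays_games` here) are simply ignored, so only the overlapping part of the user's behavior contributes to the prediction.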
The model library module 411 stores the models that have been built. There need not be only one model: different models, i.e. models with different β, can be trained on different feature vector sets. The output may also contain more than one calculation result; the results calculated by different models, or a combined result, can be shown.
In theory, when the data volume is sufficient and the modeling is successful, the predictions of different models for the same user's Internet behavior data converge.
The reason for needing different models is that the features extracted from the Internet behavior data collected for a particular user may not match the feature vector required by a given model. Providing multiple models therefore improves the general applicability of the whole system.
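The text does not say how a model is chosen from the library for a given user. The Python sketch below shows one plausible rule, labeled here as an assumption: pick the model whose stored feature set overlaps most with the features actually available for the user. The library layout, model names, feature names, and β values are all hypothetical.

```python
def select_model(model_library, available_features):
    """Return the name of the model sharing the most features with the user, or None."""
    available = set(available_features)
    best_name, best_overlap = None, 0
    for name, model in model_library.items():
        overlap = len(available & set(model["features"]))
        if overlap > best_overlap:           # ties keep the first model encountered
            best_name, best_overlap = name, overlap
    return best_name


# Hypothetical model library: each entry stores its feature names and its own beta.
model_library = {
    "auto_model": {"features": ["visited_auto_site", "weekly_hours"], "beta": [0.8, 0.01]},
    "shopping_model": {"features": ["cosmetics_purchases", "active_after_23"], "beta": [-0.9, 0.1]},
}

chosen = select_model(model_library, ["cosmetics_purchases", "active_after_23", "weekly_hours"])
```

A system that reports several results, as the text allows, could instead run every model with a non-zero overlap and show each probability or a combination of them.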
The technique of the present invention integrates mathematical modeling with the workflow of a computing framework. For each user, the features with the strongest gender-discriminating power are chosen from the user's network behavior features for calculation, which decouples feature extraction from the mathematical modeling itself. A single feature extraction framework can therefore support multiple models, which reduces repeated computation and improves computational efficiency. In addition to the generalized linear model itself, a coarse feature screening process is added, so that the model reduces computational complexity without losing precision.
Those of ordinary skill in the art will appreciate that all or some of the steps in the above method can be completed by a program instructing the relevant hardware, and that the program can be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Alternatively, all or some of the steps of the above embodiments can be implemented with one or more integrated circuits. Accordingly, each module or unit in the above embodiments can be implemented in the form of hardware or in the form of a software function module. The application is not restricted to any particular combination of hardware and software.
The invention has been illustrated by the above embodiments, but it should be understood that the embodiments are given only for the purposes of example and explanation and are not intended to limit the invention to the scope of the embodiments described. Those skilled in the art will further understand that the invention is not limited to the above embodiments and that further variants and modifications can be made in accordance with the teachings of the invention, all of which fall within the scope of protection claimed. The scope of protection of the invention is defined by the appended claims and their equivalents.
Claims (10)
1. A method for identifying the gender of Internet users, comprising:
extracting gender data and network behavior data from a plurality of survey samples;
building a gender-behavior model from the gender data and the network behavior data of the survey samples;
receiving the network behavior data of a user to be analyzed; and
substituting the network behavior data of the user to be analyzed into the gender-behavior model to calculate the gender probability of the user to be analyzed.
2. The method according to claim 1, characterized in that, after the gender data and network behavior data of the survey samples are extracted, the data obtained are cleaned to remove erroneous information.
3. The method according to claim 1, characterized in that the network behavior data include at least one of: the types of advertisements the user was exposed to, media, the URLs of web pages visited, and text information.
4. The method according to claim 2, characterized in that, after the erroneous information is removed, features are extracted from the network behavior data and converted to numerical values.
5. The method according to claim 2, characterized in that, after the erroneous information is removed, features are extracted from the network behavior data, and unsatisfactory features are excluded according to at least one of feature coverage, the chi-square statistic, and information gain (information entropy).
6. The method according to claim 1, characterized in that the gender-behavior model is a generalized linear model.
7. The method according to claim 1, characterized in that the gender-behavior model is
$$\mu = \frac{\exp(X\beta)}{1+\exp(X\beta)}, \qquad X\beta = \ln\left(\frac{\mu}{1-\mu}\right)$$
where
X is the user feature vector;
μ is the gender probability;
β is the optimal parameter estimate.
8. A system for identifying the gender of Internet users, comprising:
a sample preparation unit, which extracts gender data and network behavior data from a plurality of survey samples;
a modeling unit, which builds a gender-behavior model from the gender data and the network behavior data of the survey samples;
a test data receiving unit, which receives the network behavior data of a user to be identified; and
a gender recognition unit, which substitutes the network behavior data of the user to be identified into the gender-behavior model to calculate the gender probability of the user to be identified.
9. The system according to claim 8, characterized in that the gender-behavior model is
$$\mu = \frac{\exp(X\beta)}{1+\exp(X\beta)}, \qquad X\beta = \ln\left(\frac{\mu}{1-\mu}\right)$$
wherein
X is the user feature vector;
μ is the sex probability; and
β is the optimal parameter estimate.
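The two relations in claim 9 are equivalent forms of the same model. A short derivation, using only the symbols X, β and μ defined in the claim, makes the step explicit:

$$
\begin{aligned}
\mu = \frac{\exp(X\beta)}{1+\exp(X\beta)}
&\;\Longrightarrow\; 1-\mu = \frac{1}{1+\exp(X\beta)} \\
&\;\Longrightarrow\; \frac{\mu}{1-\mu} = \exp(X\beta) \\
&\;\Longrightarrow\; \ln\left(\frac{\mu}{1-\mu}\right) = X\beta .
\end{aligned}
$$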
10. The system according to claim 8, characterized in that the modeling unit comprises:
a feature extraction module, which converts the extracted network behavior data into a feature set and converts the features to numerical values;
a feature cleaning module, which excludes features that have low correlation with the target to be predicted, have small coverage, and/or have similar effects; and
a parameter estimation module, which estimates the optimal parameter estimate β.
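The three modules of claim 10 can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the count-based features, the thresholds `min_cov`, `min_corr` and `max_dup`, the use of Pearson correlation, the choice of gradient ascent on the logistic log-likelihood, and all function names and sample data are hypothetical.

```python
import math

def extract_features(behavior_logs, vocabulary):
    """Feature extraction: count how often each vocabulary item occurs per user."""
    return [[float(log.count(item)) for item in vocabulary] for log in behavior_logs]

def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / math.sqrt(va * vb)

def clean_features(X, y, min_cov=0.1, min_corr=0.05, max_dup=0.95):
    """Feature cleaning: return the indices of the features that are kept."""
    kept = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        if sum(1 for v in col if v != 0) / len(col) < min_cov:
            continue  # small coverage
        if abs(pearson(col, y)) < min_corr:
            continue  # low correlation with the target
        if any(abs(pearson(col, [r[k] for r in X])) > max_dup for k in kept):
            continue  # similar effect to a feature already kept
        kept.append(j)
    return kept

def predict(row, beta):
    """Logistic model; beta[0] is the intercept."""
    z = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def estimate_beta(X, y, lr=0.1, steps=2000):
    """Parameter estimation: gradient ascent on the logistic log-likelihood."""
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for row, label in zip(X, y):
            err = label - predict(row, beta)
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        beta = [b + lr * g / len(X) for b, g in zip(beta, grad)]
    return beta

# Hypothetical survey samples: browsing categories per user and a 0/1 sex label.
vocabulary = ["sports", "cosmetics", "news", "rare"]
logs = [["sports", "sports", "news"], ["sports", "news"], ["cosmetics", "news"],
        ["cosmetics", "cosmetics", "news"], ["sports", "cosmetics", "news"],
        ["cosmetics", "news"]]
labels = [1, 1, 0, 0, 1, 0]

X = extract_features(logs, vocabulary)
kept = clean_features(X, labels)
X_kept = [[row[j] for j in kept] for row in X]
beta = estimate_beta(X_kept, labels)
```

In this toy data the "rare" feature is dropped for small coverage and the "news" feature is dropped because it is constant and so carries no correlation with the label. A production system would more likely use a tested library routine for the estimation step.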
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610134810.5A CN107180044A (en) | 2016-03-09 | 2016-03-09 | Recognize Internet user's sex method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107180044A (en) | 2017-09-19 |
Family
ID=59829677
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610134810.5A (Pending; CN107180044A) | Recognize Internet user's sex method and system | 2016-03-09 | 2016-03-09 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107180044A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107704547A (en) * | 2017-09-26 | 2018-02-16 | 硕诺科技(深圳)有限公司 | One kind passes through mobile phone usage behavior identity method for distinguishing |
WO2019120007A1 (en) * | 2017-12-22 | 2019-06-27 | Oppo广东移动通信有限公司 | Method and apparatus for predicting user gender, and electronic device |
CN109948633A (en) * | 2017-12-20 | 2019-06-28 | 广东欧珀移动通信有限公司 | User gender prediction method, apparatus, storage medium and electronic equipment |
CN111209173A (en) * | 2020-01-02 | 2020-05-29 | 腾讯科技(深圳)有限公司 | Performance prediction method, device, storage medium and electronic equipment |
CN111311321A (en) * | 2020-02-14 | 2020-06-19 | 北京百度网讯科技有限公司 | User consumption behavior prediction model training method, device, equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070061314A1 (en) * | 2005-02-01 | 2007-03-15 | Outland Research, Llc | Verbal web search with improved organization of documents based upon vocal gender analysis |
CN103164470A (en) * | 2011-12-15 | 2013-06-19 | 盛大计算机(上海)有限公司 | Directional application method based on user gender distinguished results and system thereof |
CN104636504A (en) * | 2015-03-10 | 2015-05-20 | 飞狐信息技术(天津)有限公司 | Method and system for identifying sexuality of user |
CN105260414A (en) * | 2015-09-24 | 2016-01-20 | 精硕世纪科技(北京)有限公司 | User behavior similarity computing method and device |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107704547A (en) * | 2017-09-26 | 2018-02-16 | 硕诺科技(深圳)有限公司 | One kind passes through mobile phone usage behavior identity method for distinguishing |
CN107704547B (en) * | 2017-09-26 | 2022-01-14 | 英望科技(山东)有限公司 | Method for identifying gender through mobile phone using behaviors |
CN109948633A (en) * | 2017-12-20 | 2019-06-28 | 广东欧珀移动通信有限公司 | User gender prediction method, apparatus, storage medium and electronic equipment |
WO2019120007A1 (en) * | 2017-12-22 | 2019-06-27 | Oppo广东移动通信有限公司 | Method and apparatus for predicting user gender, and electronic device |
CN111209173A (en) * | 2020-01-02 | 2020-05-29 | 腾讯科技(深圳)有限公司 | Performance prediction method, device, storage medium and electronic equipment |
CN111209173B (en) * | 2020-01-02 | 2023-10-31 | 腾讯科技(深圳)有限公司 | Gender prediction method and device, storage medium and electronic equipment |
CN111311321A (en) * | 2020-02-14 | 2020-06-19 | 北京百度网讯科技有限公司 | User consumption behavior prediction model training method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109685631B (en) | Personalized recommendation method based on big data user behavior analysis | |
CN106485562B (en) | Commodity information recommendation method and system based on user historical behaviors | |
CN111523976B (en) | Commodity recommendation method and device, electronic equipment and storage medium | |
CN107180044A (en) | Recognize Internet user's sex method and system | |
CN107562818A (en) | Information recommendation system and method | |
CN105989074B (en) | A kind of method and apparatus recommend by mobile device information cold start-up | |
CN107689008A (en) | A kind of user insures the method and device of behavior prediction | |
CN111724238B (en) | Method, device and equipment for evaluating product recommendation accuracy and storage medium | |
CN107146089A (en) | The single recognition methods of one kind brush and device, electronic equipment | |
CN108665311B (en) | Electric commercial user time-varying feature similarity calculation recommendation method based on deep neural network | |
CN106022800A (en) | User feature data processing method and device | |
CN103164804A (en) | Personalized method and personalized device of information push | |
CN106600372A (en) | Commodity recommending method and system based on user behaviors | |
CN104239338A (en) | Information recommendation method and information recommendation device | |
CN102254028A (en) | Personalized commodity recommending method and system which integrate attributes and structural similarity | |
CN103853948A (en) | User identity recognizing and information filtering and searching method and server | |
CN106447463A (en) | Commodity recommendation method based on Markov decision-making process model | |
CN104820879A (en) | User behavior information analysis method and device thereof | |
CN111949887A (en) | Item recommendation method and device and computer-readable storage medium | |
CN113034238B (en) | Commodity brand feature extraction and intelligent recommendation management method based on electronic commerce platform transaction | |
CN112417294A (en) | Intelligent business recommendation method based on neural network mining model | |
CN111738805A (en) | Behavior log-based search recommendation model generation method, device and storage medium | |
CN116127184A (en) | Product recommendation method and device, nonvolatile storage medium and electronic equipment | |
CN111461827A (en) | Product evaluation information pushing method and device | |
CN103309885A (en) | Method and device for identifying feature user in electronic trading platform, search method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20170919 |