CN105389714A - Method for identifying user characteristic from behavior data - Google Patents

Info

Publication number
CN105389714A
Authority
CN
China
Prior art keywords
user
distribution
behavioural characteristic
personality
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510701305.XA
Other languages
Chinese (zh)
Other versions
CN105389714B (en
Inventor
马亮
周鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hcr Beijing Co Ltd
Original Assignee
Hcr Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hcr Beijing Co Ltd filed Critical Hcr Beijing Co Ltd
Priority to CN201510701305.XA priority Critical patent/CN105389714B/en
Publication of CN105389714A publication Critical patent/CN105389714A/en
Application granted granted Critical
Publication of CN105389714B publication Critical patent/CN105389714B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0251Targeted advertisements
    • G06Q30/0255Targeted advertisements based on user history

Landscapes

  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a method for identifying user characteristics from behavior data. The method comprises: establishing a behavioral feature database; calculating the distribution information of a behavioral feature that appears in a user's behavior data, obtaining the individual distribution, category distribution, and global distribution corresponding to that feature, and combining them into the final distribution result of the feature; evaluating the probability of the associated user characteristic; completing the calculation of the shallow user characteristics; calculating the final evaluation result of the deep labels that the user has; and taking all resulting labels as the finally analyzed user characteristics. The method has a simple model structure and simple parameters and low computational complexity, achieves good performance and spam page recognition on experimental test data, generalizes and adapts well, and gives recognition results that are objective, reliable, and comprehensive.

Description

A method for identifying user characteristics from behavior data
Technical field
The present invention relates to the Internet field, and specifically to a method for identifying user characteristics from behavior data.
Background art
1. User behavior data
User behavior data is the digitized record of everything a person, as the acting individual, does in daily life. With the rapid development of the Internet and the mobile Internet, online behavior has become an important part of everyday human behavior, and the corresponding online behavior data accounts for more than 90% of all recordable daily user behavior data. From this perspective, online behavior data can be taken to represent user behavior data.
Online behavior data can be divided by behavior scenario into several broad categories: mobile App behavior, location change behavior, search behavior, web browsing behavior, purchase transaction behavior, social behavior, and so on. Each category differs in source scenario, attributes, and generation pattern. As Internet and mobile Internet services have spread, the online user base has become large (covering more than 70% of the population), and the volume of behavior data produced is larger still. A single user can generate thousands of behavior records per day and more than 100,000 per year. The user search records logged by Baidu alone approach 10 billion per day.
Behavior data this rich and this large can reveal many personal traits of a user and has great commercial value. For example, search and purchase transaction data reveal a user's shopping characteristics (products bought and brand preferences), on which e-commerce companies can base precise, personalized product recommendations. Social behavior data reveal a user's social characteristics (such as interests and values), and many companies can use these preferences to offer better-matched services (such as intelligent friend matching).
2. User characteristics
In user research, a user characteristic is a trait that a user exhibits as a result of personal background and behavior. Such a trait defines or describes one aspect or tendency of the user. User characteristics cover many aspects, such as natural attributes (for example male, born in the 1990s, elderly, overweight, lives in Beijing), life circumstances (position, occupation, owns a private car, ...), interests (likes basketball, likes watching films, ...), shopping preferences (favorite brands, types of cosmetics used), and values and lifestyle (for example likes big brands, pursues quality, petty-bourgeois tastes, strong spending power).
A user characteristic is a qualitative (non-quantitative), multi-dimensional description obtained from long-term observation of a user. It derives from the user's raw attribute information and long-term behavior but hides the raw attribute details. This both protects user privacy (for example, from a user's ID card information one can derive the characteristics female and born in the 1980s, but not the exact birthday) and gives the result broader applicability.
At present, user characteristics borrow an idea from the Internet and are defined concretely as labels. Each user characteristic can be regarded as one label of the user, so all characteristics of a user can be defined by a set of labels. Analyzing a user's characteristics thus becomes analyzing the user's labels. In the following, the term user label is mostly used in place of user characteristic.
3. Analysis and identification of user characteristics (labels)
Because user labels (user characteristics) carry a great deal of information about the user (such as interests and preferences) and can bring great commercial value (for example recommending matching goods and services based on interest labels), methods for analyzing and accurately identifying user labels have received wide attention in user research and commercial applications since 2014.
User characteristics are analyzed mainly through two mechanisms. (1) Analysis based on large amounts of basic user attribute information (such as ID card number, location, and home address). The data coverage of this approach is narrow, the characteristics that can be analyzed are limited, and it risks leaking user privacy, so it is rarely used. (2) Analysis based on user behavior data. Labels are extracted by mining user behavior. This approach does not touch private user information, and the massive user behavior data of the Internet and mobile Internet provides ample data support, so it has become the main approach.
Analysis based on user behavior needs neither direct private data (such as a home address) nor real-world social identifiers (such as an ID card number); characteristics are abstracted from the user's continuing behavior history. Each user is identified only by a meaningless numeric id (for example u001) that cannot be traced to a real person, and the user's actual characteristic labels are derived from the long-term behavior data of that id (such as mobile App usage, web browsing, and purchase transactions). As an intuitive example, suppose we start out knowing nothing about user u001 but find in half a year of behavior data that the most used mobile Apps are the Meitu Xiuxiu selfie app and a certain yoga application, the most visited websites are Bazaar fashion and Sina Travel, and imported milk is often bought online. We can then easily conclude that this user's characteristic labels (with high likelihood) include: female (a stylish young mother), likes fashion, enjoys yoga, and has a baby at home. In practice, because behavior data come from many scenarios and in huge volume, and the number of users to analyze usually exceeds one million, the work must be done by automated analysis methods.
The current mainstream method for automated label analysis is based on keywords (behavioral feature keywords) and is mostly used by Internet and e-commerce companies. The basic method is as follows:
Define the keywords in the behavior data, and set the category of each keyword and the user labels (user characteristics) associated with it.
Compute statistics (such as frequency) on the occurrence of keywords in the behavior data, and map them to frequencies of the associated user labels.
User labels with a high frequency are considered the user's final characteristics and are retained.
This method is used to analyze a subset of user labels (shopping and brand preference) in a specific behavior scenario (purchase transactions), and it suits label identification and subsequent targeted marketing for e-commerce and Internet companies. However, it is hard to apply to other, more valuable behavior scenarios (such as App usage and browsing behavior), so it cannot find more comprehensive user labels. Its relatively simple evaluation mechanism is not only less accurate but can only analyze the user's surface characteristics (usually called shallow user labels); it struggles to uncover deep characteristics (deep labels). For example, if a user often buys Diet Coke and xylitol, existing methods can only find the isolated labels likes cola, prefers the Coca-Cola brand, and eats xylitol. They cannot combine these to reveal a hidden trait: the many sugar-free products suggest that the user may be diabetic. Such a trait is called a deep user label (a label that cannot be derived directly from user behavior data). Clearly, deep labels are more meaningful and more valuable in application (product recommendations for a diabetic user are more precise, and user acceptance will also be higher).
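The keyword-based baseline described above can be illustrated with a minimal Python sketch. The keyword table, the label names, and the `min_count` cutoff are hypothetical and chosen only for illustration; the source does not specify them.

```python
from collections import Counter

# Hypothetical keyword table: keyword -> associated user label.
KEYWORD_TO_LABEL = {
    "diet coke": "likes cola",
    "coca-cola": "prefers Coca-Cola brand",
    "xylitol": "eats xylitol",
}


def baseline_labels(records, min_count=2):
    """Count keyword hits per label and keep labels seen at least min_count times."""
    counts = Counter()
    for text in records:
        lowered = text.lower()
        for keyword, label in KEYWORD_TO_LABEL.items():
            if keyword in lowered:
                counts[label] += 1
    return {label: n for label, n in counts.items() if n >= min_count}


purchases = ["Diet Coke 330ml", "Xylitol gum", "Diet Coke 6-pack", "xylitol mints"]
print(baseline_labels(purchases))
```

Each retained label stands alone: nothing in this scheme connects the two labels to a shared underlying trait, which is the limitation the text points out.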
Summary of the invention
The object of the present invention is to address the shortcomings of existing techniques that automatically analyze user characteristics from behavior data by providing a method for identifying user characteristics from behavior data. The method is based on a more comprehensive behavioral feature library and brings in multiple distributions of each behavioral feature (its own, that of its category, and the global one), so that features are associated with user characteristics more accurately, expressed as probabilities. It also uses multi-level derivation to find deep user labels from shallow characteristics. Compared with existing analysis algorithms, the results of the present invention are more accurate and deeper, and the method is general and applicable to all behavior scenarios, enabling a more comprehensive study of user characteristics.
To achieve the above object, the present invention provides the following technical solution:
A method for identifying user characteristics from behavior data, comprising the following steps:
1) establish a behavioral feature database, comprising a behavioral feature definition library, a behavioral feature to user characteristic mapping rule library, behavioral feature distribution data, and a user characteristic deduction library;
The behavioral feature definition library defines the basic attributes of all behavioral features and user characteristics involved;
The behavioral feature to user characteristic mapping rule library defines how each behavioral feature maps to user characteristics;
The behavioral feature distribution data is the distribution data of behavioral features calculated from the full set of behavior data;
The user characteristic deduction library defines the rules for deducing deep labels from shallow labels;
2) for a user, calculate the distribution information of a behavioral feature that occurs in the user's behavior data, then obtain the individual distribution, category distribution, and global distribution corresponding to that behavioral feature; taking the category distribution and global distribution as baselines, combine the individual, category, and global distributions with a weighting algorithm to calculate the final distribution result of the behavioral feature;
3) based on the final distribution result of the user's behavioral feature, evaluate the likelihood of the associated user characteristic, expressed as a probability;
4) once the labels associated with all of the user's behavioral features have been calculated, the basic shallow user characteristics are complete;
5) then, based on the user characteristic deduction library, find the deep labels that can be deduced from the shallow user characteristics already identified for the current user, and, according to the deduction mode, calculate the final evaluation result of each deep label the user has, expressed as a probability;
6) all labels of a user calculated by the above method, namely the shallow labels and deep labels with their evaluation values, are the finally analyzed user characteristics.
As a further scheme of the present invention, the behavioral feature distribution data comprises: category distribution data Fc, calculated by taking the category to which each behavioral feature belongs and counting that category's distribution frequency or user share in the full behavior data;
and global distribution data Fg, calculated by finding all users whose behavior data contain the behavioral feature and, taking this group of matched users as the reference, computing the global average distribution.
As a further scheme of the present invention, the deduction mode by which shallow labels deduce a deep label is based on either a likelihood probability or a distribution threshold. When deduction is based on a likelihood probability, the confidence with which the shallow label derives the deep label is a probability between 0 and 1. When deduction is based on a distribution threshold, a minimum distribution threshold for the derivation is defined, and a user whose value exceeds it is considered likely to have the deep label.
As a further scheme of the present invention, if in step 3) multiple behavioral features of the user map to the same label, the final evaluation result of that label is calculated by combining the likelihood values of the multiple behavioral features, based on the independent and identically distributed principle of probability and statistics.
Compared with the prior art, the beneficial effects of the present invention are:
The present invention can find a user's characteristic labels (including deep characteristic labels) from massive user behavior data. The model structure and parameters are simple, algorithmic complexity is low, and experimental test data show good performance and spam page recognition. The invention generalizes and adapts well, its recognition results are objective, reliable, and comprehensive, and it has good application prospects.
Description of the drawings
Fig. 1 is a diagram of the actual user characteristic analysis process;
Fig. 2 is a diagram of the correspondence between user characteristics, behavioral features, and feature keywords.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
The present invention is carried out on a computer and comprises the following steps in order:
Step 1: establish the behavioral feature database
The behavioral feature database is the key resource for automatically calculating user characteristics in this method. It comes from a small amount of manual labeling (by user research experts) and from automatic statistical calculation. The work involved comprises:
Step 1.1: create the definition library for behavioral features and user characteristics, defining the basic attributes of all behavioral features and user characteristics involved. The attributes of a behavioral feature are shown in Table 1.
Table 1
The attributes of a user characteristic are shown in Table 2.
Table 2
Step 1.2: create the behavioral feature to user characteristic mapping rule library, defining how each behavioral feature maps to user characteristics. One user characteristic may correspond to multiple behavioral features. The mapping rule relation between behavioral features and user characteristics is defined in Table 3.
Table 3
Step 1.3: calculate the distribution data of behavioral features from the full behavior data. This comprises:
Calculate the category distribution data Fc: based on the category to which each behavioral feature belongs (Table 1), count the distribution of that category in the full behavior data (frequency, user share, and so on).
Calculate the global distribution data Fg: find all users whose behavior data contain the behavioral feature and, taking this group of matched users as the reference, compute the global average distribution (the average frequency of the behavioral feature, its share of all users, and so on).
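As a rough illustration of step 1.3, the following Python sketch computes one possible Fc (user share per category) and one possible Fg (mean record count among matched users). The record layout `(user_id, feature, category)` and both function names are assumptions made for this example; the source allows other statistics such as raw frequency.

```python
from collections import defaultdict


def category_distribution(records):
    """Fc per category: share of all users with at least one record in that category."""
    all_users = {user for user, _, _ in records}
    users_by_category = defaultdict(set)
    for user, _, category in records:
        users_by_category[category].add(user)
    return {c: len(u) / len(all_users) for c, u in users_by_category.items()}


def global_distribution(records):
    """Fg per feature: mean record count among the users who show that feature."""
    counts = defaultdict(lambda: defaultdict(int))
    for user, feature, _ in records:
        counts[feature][user] += 1
    return {f: sum(per_user.values()) / len(per_user) for f, per_user in counts.items()}


# Hypothetical full behavior data: (user_id, feature, category).
records = [
    ("u001", "yoga_app", "fitness"),
    ("u001", "yoga_app", "fitness"),
    ("u002", "yoga_app", "fitness"),
    ("u002", "news_site", "news"),
    ("u003", "news_site", "news"),
    ("u003", "news_site", "news"),
    ("u003", "news_site", "news"),
]
fc = category_distribution(records)
fg = global_distribution(records)
print(fc, fg)
```

Note that the two statistics here live on different scales (a share and a count); a real implementation would presumably normalize them before they are combined in step 2.2.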
Step 1.4: create the user characteristic deduction library, which defines how deep user characteristics are found from shallow labels. Usually several shallow labels are needed jointly to derive a deep label. The deduction rule relation between shallow and deep user characteristics is defined in Table 4.
Table 4
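The bodies of Tables 1 to 4 are not reproduced in this text. As a hedged sketch only, the parts of the behavioral feature database could be modeled with the Python records below. Every field name is an assumption inferred from how the surrounding steps use the tables (keywords and category in Table 1, the derivation probability Rate in Table 3, the derivation mode in Table 4); the actual table columns may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BehavioralFeature:
    """One row of Table 1 (fields assumed)."""
    name: str
    category: str
    keywords: List[str]


@dataclass
class MappingRule:
    """One row of Table 3 (fields assumed)."""
    feature: str
    label: str
    rate: float  # derivation probability Rate used in step 2.3


@dataclass
class DeductionRule:
    """One row of Table 4 (fields assumed)."""
    shallow_labels: List[str]
    deep_label: str
    mode: str     # "probability" or "threshold"
    value: float  # confidence in 0-1, or the minimum distribution threshold


# Hypothetical example rows.
feature = BehavioralFeature("yoga_app", "fitness", ["yoga"])
mapping = MappingRule("yoga_app", "enjoys yoga", 0.8)
deduction = DeductionRule(["likes diet cola", "eats xylitol"], "possibly diabetic", "probability", 0.5)
print(feature, mapping, deduction)
```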
Step 2: calculate the user characteristic corresponding to a single behavioral feature
Step 2.1: compute the user's basic distribution for the behavioral feature
For a behavioral feature P defined in Table 1, obtain all associated keywords and query the user behavior data by keyword. If the behavior data involve Chinese text (such as the titles of browsed content), word segmentation must be performed first (any segmenter can be chosen, such as the ICTCLAS 3.0 Chinese word segmenter). The matching behavior records (denoted as the set DSet) are used to analyze the user's behavioral feature P.
For user U, compute from the matched record set DSet the distribution PFu of behavioral feature P for this user, such as the total number of occurrences or the average frequency per unit of time (day, month, and so on), and apply smoothing (such as taking the square root) to avoid the influence of abnormal extreme values.
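A minimal Python sketch of step 2.1 follows, assuming that PFu is the average number of matched records per day, smoothed with a square root. The function name, its parameters, and the choice of the day as the unit of time are illustrative assumptions; the source names these only as options.

```python
import math
from datetime import date


def individual_distribution(match_dates, period_start, period_end):
    """PFu sketch: square root of the average number of matched records per day."""
    days = (period_end - period_start).days + 1
    if days <= 0:
        raise ValueError("period_end must not precede period_start")
    return math.sqrt(len(match_dates) / days)


# Hypothetical DSet for one user and one feature: 40 matched records over 10 days,
# most of them in a single burst on the second day.
dset_dates = [date(2015, 6, 1)] * 8 + [date(2015, 6, 2)] * 32
pfu = individual_distribution(dset_dates, date(2015, 6, 1), date(2015, 6, 10))
print(pfu)
```

The square root compresses large counts more than small ones, so a single burst of activity moves PFu less than it would move the raw average.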
Step 2.2: based on the three distributions, calculate the final credible distribution Pf of behavioral feature P
For behavioral feature P of user U, retrieve from the behavioral feature database the category distribution data Fc of the category to which P belongs and the global distribution data Fg over all users that P involves. Based on the three distributions PFu, Fc, and Fg, calculate the final credible distribution Pf of behavioral feature P: Pf = K1*PFu + K2*Fc + K3*Fg, where K1 + K2 + K3 = 1.0. K1 usually lies in the range 0.6 to 0.8, and its exact value is determined by the ratios PFu/Fc and PFu/Fg.
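The weighted combination can be sketched in Python as below. The source fixes only the form Pf = K1*PFu + K2*Fc + K3*Fg, the constraint that the weights sum to 1, and the usual 0.6 to 0.8 range for K1. How K1 moves within that range and how the remainder is split between K2 and K3 are assumptions made for this example, as is the premise that the three inputs have been normalized to a comparable scale.

```python
def credible_distribution(pfu, fc, fg, k1_min=0.6, k1_max=0.8):
    """Pf = K1*PFu + K2*Fc + K3*Fg with K1 + K2 + K3 = 1."""
    # Assumed rule: the further PFu sits above both baselines, the more weight it gets.
    ratio = min(pfu / fc, pfu / fg) if fc > 0 and fg > 0 else 1.0
    strength = max(0.0, min(1.0, ratio - 1.0))  # 0 at parity, 1 at twice the baseline
    k1 = k1_min + (k1_max - k1_min) * strength
    k2 = k3 = (1.0 - k1) / 2  # assumed equal split of the remaining weight
    return k1 * pfu + k2 * fc + k3 * fg


print(credible_distribution(0.5, 0.5, 0.5))   # user at parity with both baselines
print(credible_distribution(0.9, 0.3, 0.45))  # user well above both baselines
```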
Step 2.3: calculate the likelihood value TPu of the user characteristic T corresponding to behavioral feature P
Based on the final credible distribution Pf of behavioral feature P, calculate the likelihood value TPu of the corresponding user characteristic T.
TPu = f(Pf, Rate), where f is a binomial function, Pf is the final credible distribution of behavioral feature P, and Rate is the derivation probability from behavioral feature P to the corresponding label Tag (defined in step 1.2).
At this point, the final likelihood (probability) that the user has user characteristic T has been derived from one behavioral feature P.
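The source does not give the concrete form of f. Purely as a placeholder, the Python sketch below clamps Pf into the range 0 to 1 and multiplies it by Rate, which keeps TPu a valid probability. This product form is an assumption, not the function used in the patent.

```python
def label_likelihood(pf, rate):
    """Placeholder for TPu = f(Pf, Rate): Pf clamped to [0, 1], scaled by Rate."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be a probability between 0 and 1")
    return min(max(pf, 0.0), 1.0) * rate


print(label_likelihood(0.5, 0.8))
print(label_likelihood(1.7, 0.9))  # Pf above 1 is clamped before scaling
```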
Step 3: calculate the evaluation result Tu of a user characteristic derived from multiple behavioral features
Multiple behavioral features can indicate that a user has the same characteristic (for example, visits to several news websites all indicate that the user reads news). The final evaluation of user characteristic T therefore requires a final analysis over the TPu values of all associated behavioral features.
Step 2 calculates the likelihood value TPu of the user characteristic corresponding to each behavioral feature. Suppose the set of behavioral features that derive user characteristic T is PSet (P1, P2, P3, ..., Pn), each of which maps to T according to Table 3. The evaluation result Tu of the user characteristic can then be calculated as Tu = f(TPu1, TPu2, ..., TPun), where TPu1, TPu2, ..., TPun are the evaluation results of all behavioral features in PSet. n is generally between 10 and 20.
The evaluation result Tu defines the final likelihood (between 0 and 1) that the user has the (shallow) user characteristic T.
Let the label evaluation result set of user U be UT; add user characteristic T (T together with its evaluation result Tu) to UT.
Repeat steps 2 to 3 for all shallow user characteristics (not deep ones) to complete the calculation of all evaluation results Tu involving user U. The label evaluation result set UT then holds the results for all shallow user characteristics of user U.
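One common way to combine several likelihoods under an independence assumption is the noisy-OR rule, Tu = 1 - (1 - TPu1)(1 - TPu2)...(1 - TPun). The Python sketch below uses it as a stand-in for the combining function f, whose exact form the source does not state; the label names and values are hypothetical.

```python
import math


def combine_likelihoods(tpu_values):
    """Noisy-OR: probability that at least one independent piece of evidence holds."""
    return 1.0 - math.prod(1.0 - p for p in tpu_values)


# Hypothetical TPu values per shallow label, one value per contributing feature.
tpu_by_label = {
    "reads news": [0.5, 0.5],
    "enjoys yoga": [0.2, 0.5],
}
ut = {label: combine_likelihoods(values) for label, values in tpu_by_label.items()}
print(ut)
```

With this rule every additional supporting feature can only raise Tu, and the result always stays between 0 and 1.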
Step 4: calculate the evaluation result TDu of the deep user characteristics of user U
Let the set of all shallow user characteristics (note: not behavioral features) of user U obtained in the preceding steps be TLSet (TL1, TL2, TL3, ...), where TLx is a shallow user characteristic of user U. For each deep label TagD (defined in Table 2), look up in Table 4 the rules that derive TagD from the shallow labels TLx, and perform the calculation according to the rule's derivation mode (likelihood probability or distribution threshold), finally producing the evaluation results TDu of all derived deep user characteristics.
Add TagD (TagD together with its evaluation result TDu) to the label evaluation result set UT of user U.
Repeating this generates all possible deep user characteristics of user U.
After the above steps, the label evaluation result set UT is the set of evaluation results for all labels (shallow and deep) of user U, and the labels with their evaluation values quantitatively represent the final characteristics of user U.
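Step 4 can be sketched in Python as follows. The rule tuple layout, the use of the weakest shallow label as the combined evidence, and the "trend follower" label are assumptions for illustration; the source states only that a rule works either by a confidence probability between 0 and 1 or by a minimum distribution threshold.

```python
def deduce_deep_labels(ut, rules):
    """Return {deep_label: TDu} for rules whose shallow labels are all present in ut."""
    deep = {}
    for shallow, deep_label, mode, value in rules:
        if not all(label in ut for label in shallow):
            continue
        weakest = min(ut[label] for label in shallow)
        if mode == "probability":
            deep[deep_label] = weakest * value  # value: rule confidence in 0-1
        elif mode == "threshold" and weakest >= value:
            deep[deep_label] = weakest          # value: minimum distribution threshold
    return deep


# Hypothetical shallow results UT and deduction rules (Table 4 stand-ins).
ut = {"likes diet cola": 0.8, "eats xylitol": 0.9, "likes fashion": 0.5}
rules = [
    (("likes diet cola", "eats xylitol"), "possibly diabetic", "probability", 0.5),
    (("likes fashion",), "trend follower", "threshold", 0.6),
]
deep = deduce_deep_labels(ut, rules)
ut.update(deep)  # UT now holds shallow and deep labels together
print(ut)
```

In this run only the probability rule fires; the threshold rule is skipped because the shallow label's value of 0.5 is below the 0.6 threshold.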
The algorithm is implemented in the software "HCR Big Data User Research and Analysis Platform". The software is developed in Java, implements the algorithm of the present method, and carries out the whole process of analyzing behavior big data with the new method to obtain user characteristic labels. Its main functional modules and processing comprise:
Label management module: sets up the user characteristic system and configures it for different businesses and scenarios (the label definitions of Table 2 in step 1.1, the deduction relations defined in step 1.4, and so on).
Basic evaluation/labeling module: supports fast manual labeling and management of the basic resources needed for analysis (the behavioral features and the association and deduction settings involved in steps 1.1 to 1.3).
Data preprocessing module: automatically preprocesses the massive user behavior data, including importing raw data, cleaning abnormal data, and calculating the distribution data required in step 1.3.
Label analysis module: the core analysis module that implements the algorithm. It automatically performs the actual label analysis on the preprocessed behavior data (all calculations of steps 2 to 4) and records the results in a result repository. Because the data volume and computational load are large, the program supports a distributed computing architecture and can run concurrently on a server cluster.
Result display module: presents the computed user characteristic analysis results as Web-based visual statistics and charts so that researchers can carry out further analysis.
The actual processing flow is shown in Fig. 1.
(1) Define the user characteristic system and the related label deduction relations. These settings are made manually by researchers.
(2) Define the behavioral features and their deduction relations to labels. Part of this work is set manually, and most of the rest is obtained statistically. The structure relating user characteristics and behavioral features is shown in Fig. 2.
(3) Preprocess the user behavior data to be analyzed, including basic data cleaning (with ETL tools) and calculation of the related distribution information.
(4) Select a user and perform the following operations.
(5) Obtain a user characteristic (shallow label) evaluation value from the analysis of a single behavioral feature of the user.
(6) For the multiple behavioral features associated with a user characteristic, combine them to obtain the final evaluation value of the user characteristic.
(7) Return to step (5) and continue until all shallow labels have been analyzed, then go to the next step.
(8) Based on all shallow labels obtained, analyze and generate all deep user characteristics of this user.
(9) Output the set of all label analysis results as the final analysis result.
(10) Go to step (4).
To verify the effectiveness and generality of the present method, experiments were carried out.
Two important behavior scenarios were selected for testing: mobile Internet App usage (detailed records of users' App use) and web browsing (each visit to an Internet web page). For 2 million selected users, real behavior data sets were extracted: mobile App behavior data (6 months of continuous behavior, 5.8 billion records) and web browsing behavior data (3 months of continuous browsing history, 120 million records).
After the initial labeling and the basic label system (about 150 shallow labels and 20 deep labels) were set up, the software was used to analyze the data. The results were then compared with the label analysis results obtained for the same users by the traditional method. The results are as follows:
Ability to discover user characteristics: for shallow labels, the method performs similarly to the traditional method in specific categories (interests and shopping preferences), but in more categories (such as natural attributes and life circumstances) it can analyze over 50% more labels than the traditional method. For deep labels, the new method could analyze 15, while the traditional method identified none.
Precision of label analysis: the label results produced by both methods (mainly interest and shopping preference labels) were judged manually. Analysis results for 1,000 randomly sampled users were assessed by user researchers, and the precision of the new method's results was 23% higher than that of the traditional method.
Adaptability of the algorithm to behavior scenarios: the traditional method is good at online shopping behavior, while the present method applies to that scenario and can also be applied effectively to mobile App behavior and browsing behavior scenarios.
To those skilled in the art, it is evident that the present invention is not limited to the details of the above exemplary embodiments and can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-restrictive. The scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalency of the claims are intended to be embraced by the present invention.
In addition, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. The specification is written this way only for clarity. Those skilled in the art should treat the specification as a whole, and the technical solutions in the embodiments may be suitably combined to form other embodiments that those skilled in the art can understand.

Claims (4)

1. A method for identifying user characteristics from behavior data, characterized by comprising the following steps:
1) establishing a behavior feature database comprising a behavior feature definition library, a behavior feature-to-user characteristic mapping rule library, behavior feature distribution data, and a user characteristic deduction library;
the behavior feature definition library defines the basic attributes of all behavior features and user characteristics involved;
the behavior feature-to-user characteristic mapping rule library defines how each behavior feature is mapped to user characteristics;
the behavior feature distribution data is the distribution data of the behavior features calculated from the full set of behavior data;
the user characteristic deduction library defines the inference rules between shallow labels and deep labels;
2) for a user, calculating the distribution information of a behavior feature that occurs in that user's behavior data, and then obtaining the individual distribution, category distribution, and global distribution corresponding to that behavior feature; taking the category distribution and the global distribution as baselines and applying a weighting algorithm, comprehensively calculating the final distribution result of the behavior feature from the individual distribution, category distribution, and global distribution;
3) based on the final distribution result of the user's behavior feature, evaluating the likelihood value of the associated user characteristic, expressed as a probability;
4) once all labels involved in the user's behavior features have been calculated, the calculation of the basic shallow user characteristics is complete;
5) then, based on the user characteristic deduction library, finding the deep label characteristics that can be deduced from the shallow user characteristics already identified for the current user, and, based on the deduction mode, further calculating the final evaluation result of the deep labels the user has, expressed as a probability;
6) all labels calculated for a user by the above method, namely the shallow labels and deep labels together with their evaluation values, are the user characteristics finally obtained by the analysis.
2. The method for identifying user characteristics from behavior data according to claim 1, characterized in that the behavior feature distribution data comprises: calculating category distribution data Fc: based on the category to which each behavior feature belongs, counting the distribution frequency or user share of that category in the full set of behavior data;
calculating global distribution data Fg: counting all users whose behavior data contains the behavior feature and, taking this group of matching users as the reference, computing the corresponding global average distribution.
3. The method for identifying user characteristics from behavior data according to claim 1, characterized in that the deduction mode by which shallow labels deduce deep labels is based on either a likelihood probability or a distribution threshold; when deduction is based on a likelihood probability, the credibility probability with which a shallow label derives the deep label lies between 0 and 1; when deduction is based on a distribution threshold, a minimum distribution threshold for the derivation is set, and a user exceeding the distribution threshold is considered likely to have the deep label.
4. The method for identifying user characteristics from behavior data according to claim 1, characterized in that, in step 3), if multiple behavior features of the user map to the same label, the final evaluation result of that label is calculated comprehensively from the likelihood values of the multiple behavior features, based on the independent and identically distributed principle of probability and statistics theory.
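The claimed steps can be illustrated with a small sketch. It is only one possible reading, not the patented implementation: the claims do not give the weighting formula, the mapping from distribution to probability, or the exact combination rule, so the baseline formula `fi / (fi + baseline)`, the weights `w_c` and `w_g`, the noisy-OR combination used for claim 4, the treatment of the threshold mode, and all feature names, labels, rules, and data below are assumptions made for illustration.

```python
from collections import defaultdict
from math import prod

# Toy behavior log: (user, behavior feature, event count). All values are invented.
EVENTS = [
    ("u1", "buys_diapers", 6), ("u1", "buys_formula", 3), ("u1", "buys_phone_case", 1),
    ("u2", "buys_diapers", 1), ("u2", "buys_phone_case", 9),
    ("u3", "buys_phone_case", 10),
]
# Hypothetical stand-ins for the definition, mapping-rule, and deduction libraries.
FEATURE_CATEGORY = {"buys_diapers": "baby", "buys_formula": "baby",
                    "buys_phone_case": "electronics"}
FEATURE_TO_LABEL = {"buys_diapers": "interest:baby_products",
                    "buys_formula": "interest:baby_products",
                    "buys_phone_case": "interest:phone_accessories"}
DEDUCTION_RULES = [  # (shallow label, deep label, mode, parameter)
    ("interest:baby_products", "life_stage:new_parent", "probability", 0.8),
    ("interest:phone_accessories", "persona:gadget_enthusiast", "threshold", 0.5),
]


def individual_shares(events):
    """Individual distribution: share of each user's activity per behavior feature."""
    totals = defaultdict(int)
    counts = defaultdict(lambda: defaultdict(int))
    for user, feature, n in events:
        totals[user] += n
        counts[user][feature] += n
    return {u: {f: n / totals[u] for f, n in fs.items()} for u, fs in counts.items()}


def baselines(events, shares):
    """Fc: user share of the feature's category; Fg: mean share among matching users."""
    cat_users = defaultdict(set)
    for user, feature, _ in events:
        cat_users[FEATURE_CATEGORY[feature]].add(user)
    fc = {f: len(cat_users[c]) / len(shares) for f, c in FEATURE_CATEGORY.items()}
    fg = {}
    for f in FEATURE_CATEGORY:
        matched = [s[f] for s in shares.values() if f in s]
        fg[f] = sum(matched) / len(matched) if matched else 0.0
    return fc, fg


def feature_probability(fi, fc, fg, w_c=0.5, w_g=0.5):
    """Assumed weighting: compare the individual share with a weighted baseline.

    Returns 0.5 when the user matches the baseline and approaches 1 above it.
    """
    baseline = w_c * fc + w_g * fg
    return fi / (fi + baseline) if fi + baseline > 0 else 0.0


def shallow_labels(user, shares, fc, fg):
    """Steps 2-4: score each feature, then merge features that share a label."""
    per_label = defaultdict(list)
    for f, fi in shares[user].items():
        per_label[FEATURE_TO_LABEL[f]].append(feature_probability(fi, fc[f], fg[f]))
    # Noisy-OR: probability that at least one independent signal holds (assumption).
    return {label: 1 - prod(1 - p for p in ps) for label, ps in per_label.items()}


def deep_labels(shallow):
    """Step 5: apply the two deduction modes described in claim 3."""
    deep = {}
    for src, dst, mode, param in DEDUCTION_RULES:
        p = shallow.get(src)
        if p is None:
            continue
        if mode == "probability":
            deep[dst] = p * param      # param is the rule's credibility in [0, 1]
        elif p >= param:               # threshold mode: param is the minimum value
            deep[dst] = p
    return deep


if __name__ == "__main__":
    shares = individual_shares(EVENTS)
    fc, fg = baselines(EVENTS, shares)
    for user in sorted(shares):
        shallow = shallow_labels(user, shares, fc, fg)
        print(user, shallow, deep_labels(shallow))
```

In this sketch a user whose share of a behavior merely equals the baseline receives a neutral 0.5, so a label only becomes likely when the user's behavior stands out against the category and global distributions, which is the role the claims assign to those two baselines.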
CN201510701305.XA 2015-10-23 2015-10-23 Method for identifying user characteristics from behavior data Active CN105389714B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510701305.XA CN105389714B (en) 2015-10-23 2015-10-23 Method for identifying user characteristics from behavior data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510701305.XA CN105389714B (en) 2015-10-23 2015-10-23 Method for identifying user characteristics from behavior data

Publications (2)

Publication Number Publication Date
CN105389714A CN105389714A (en) 2016-03-09
CN105389714B CN105389714B (en) 2022-07-05

Family

ID=55421972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510701305.XA Active CN105389714B (en) 2015-10-23 2015-10-23 Method for identifying user characteristics from behavior data

Country Status (1)

Country Link
CN (1) CN105389714B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106056444A (en) * 2016-05-25 2016-10-26 腾讯科技(深圳)有限公司 Data processing method and device
CN106127515A (en) * 2016-06-22 2016-11-16 北京网智天元科技股份有限公司 Passenger profiling and data analysis method and device
CN107016026A (en) * 2016-11-11 2017-08-04 阿里巴巴集团控股有限公司 User tag determination and information push method and device
CN108491490A (en) * 2018-03-14 2018-09-04 南京易好信息技术有限公司 E-commerce platform product label classification and identification system and method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103778555A (en) * 2014-01-21 2014-05-07 北京集奥聚合科技有限公司 User attribute mining method and system based on user tags
US20150154508A1 (en) * 2013-11-29 2015-06-04 Alibaba Group Holding Limited Individualized data search

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150154508A1 (en) * 2013-11-29 2015-06-04 Alibaba Group Holding Limited Individualized data search
CN103778555A (en) * 2014-01-21 2014-05-07 北京集奥聚合科技有限公司 User attribute mining method and system based on user tags

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
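李政泽 (Li Zhengze): "微博用户行为分析技术的研究与实现" (Research and Implementation of Microblog User Behavior Analysis Technology), 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology) *
黄碗明 (Huang Wanming): "基于数据挖掘的社区网站用户行为分析系统" (A Data-Mining-Based User Behavior Analysis System for Community Websites), 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology) *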

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106056444A (en) * 2016-05-25 2016-10-26 腾讯科技(深圳)有限公司 Data processing method and device
CN106127515A (en) * 2016-06-22 2016-11-16 北京网智天元科技股份有限公司 Passenger profiling and data analysis method and device
CN107016026A (en) * 2016-11-11 2017-08-04 阿里巴巴集团控股有限公司 User tag determination and information push method and device
CN107016026B (en) * 2016-11-11 2020-07-24 阿里巴巴集团控股有限公司 User tag determination method, information push method, user tag determination device, information push device
CN108491490A (en) * 2018-03-14 2018-09-04 南京易好信息技术有限公司 E-commerce platform product label classification and identification system and method

Also Published As

Publication number Publication date
CN105389714B (en) 2022-07-05

Similar Documents

Publication Publication Date Title
CN107424043B (en) Product recommendation method and device and electronic equipment
CN107229708B (en) Personalized travel service big data application system and method
CN106156127B (en) Method and device for selecting data content to push to terminal
Cheng et al. Personalized click prediction in sponsored search
US10685065B2 (en) Method and system for recommending content to a user
Dey et al. Acquiring competitive intelligence from social media
US20170140038A1 (en) Method and system for hybrid information query
CN108805598B (en) Similarity information determination method, server and computer-readable storage medium
Shin et al. Context-aware recommendation by aggregating user context
JP5615857B2 (en) Analysis apparatus, analysis method, and analysis program
CN104866474A (en) Personalized data searching method and device
US20130263181A1 (en) Systems and methods for defining video advertising channels
CN103310003A (en) Method and system for predicting click rate of new advertisement based on click log
CN105447186A (en) Big data platform based user behavior analysis system
CN103886067A (en) Method for recommending books through label implied topic
CN112632405B (en) Recommendation method, recommendation device, recommendation equipment and storage medium
CN104462336A (en) Information pushing method and device
CN111159341A (en) Information recommendation method and device based on user investment and financing preference
CN105389714A (en) Method for identifying user characteristic from behavior data
CN114693409A (en) Product matching method, device, computer equipment, storage medium and program product
CN114201680A (en) Method for recommending marketing product content to user
CN116823410B (en) Data processing method, object processing method, recommending method and computing device
CN105138552A (en) Fashion tendency analysis system mining online sale data
CN112330426A (en) Product recommendation method, device and storage medium
Sun Music Individualization Recommendation System Based on Big Data Analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant