CN105447147A - Data processing method and apparatus - Google Patents

Data processing method and apparatus

Info

Publication number
CN105447147A
Authority
CN
China
Prior art keywords
user
characteristic information
data
portrait
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510843568.4A
Other languages
Chinese (zh)
Other versions
CN105447147B (en)
Inventor
汤奇峰
万昆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZAMPLUS ADVERTISING (SHANGHAI) CO Ltd
Original Assignee
ZAMPLUS ADVERTISING (SHANGHAI) CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZAMPLUS ADVERTISING (SHANGHAI) CO Ltd
Priority to CN201510843568.4A
Publication of CN105447147A
Application granted
Publication of CN105447147B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a data processing method and apparatus. The method comprises the steps of: acquiring online behavior data, wherein the online behavior data comprises at least one of Cookie and application software data; analyzing online behavior data corresponding to each user so as to obtain a feature information set of each user, wherein the feature information set comprises multiple feature information; performing comprehensive analysis on the feature information set of each user, so as to supplement the feature information set lacking certain feature information; and portraying the user according to the feature information sets of a plurality of users. According to the data processing method and apparatus, accuracy of a user portrait technology can be improved.

Description

Data processing method and apparatus
Technical field
The present invention relates to the field of data processing, and in particular to a data processing method and apparatus.
Background
With the development of information technology, more and more everyday needs can be met over the network. The network provides users with massive amounts of information, and users in turn leave their own footprints on it. As information collection techniques have advanced, more data can be obtained from the network, and the collected information can be used to extract and summarize the features of a user or a user group. This extraction and summarization is also called a user portrait.
The accuracy of existing user portrait techniques needs to be improved.
Summary of the invention
The technical problem solved by the present invention is how to improve the accuracy of user portraits.
To solve the above technical problem, an embodiment of the present invention provides a data processing method, comprising:
acquiring online behavior data, the online behavior data comprising at least one of the following: Cookies and application software data;
analyzing the online behavior data corresponding to each user to obtain a feature information set for each user, the feature information set comprising multiple kinds of feature information;
comprehensively analyzing the feature information sets of the users, so as to supplement any feature information set that lacks certain feature information; and
generating user portraits with reference to the feature information sets of the multiple users.
Optionally, comprehensively analyzing the feature information sets of the users so as to supplement any feature information set that lacks certain feature information comprises: comprehensively analyzing the feature information sets of the users with a naive Bayes classification algorithm, and supplementing the feature information set that lacks certain feature information.
Optionally, the data processing method further comprises: configuring a crawler to obtain basic information, the basic information being suitable for use as reference information for extracting the feature information; unifying the information format of the basic information; generating a basic knowledge base; and supplementing the feature information set with reference to the basic knowledge base.
Optionally, the data processing method further comprises: verifying the feature information set with reference to the basic knowledge base.
Optionally, the data processing method further comprises: establishing a correspondence between the online behavior data and users.
Optionally, establishing the correspondence between the online behavior data and users comprises:
generating an identifier for the Cookie, the identifier comprising a machine address, the current process, and a system timestamp; and
with reference to the machine address in the identifier and the network behavior information contained in the Cookie, establishing a correspondence among the identifiers of Cookies that belong to the same user.
Optionally, establishing the correspondence between the online behavior data and users comprises:
generating an identifier for the application software data, the identifier of the application software data comprising a mobile phone IMEI; and
with reference to the mobile phone IMEI and the network behavior information in the application software data, establishing a correspondence among the identifiers of application software data that belong to the same user.
Optionally, establishing the correspondence between the online behavior data and users comprises: with reference to the network behavior information contained in the Cookie and the network behavior information contained in the application software data, establishing a correspondence between the identifiers of the Cookie and of the application software data that belong to the same user.
Optionally, generating user portraits with reference to the feature information sets of the multiple users further comprises: receiving portrait condition information; and generating the user portraits with reference to the portrait condition information.
An embodiment of the present invention also provides a data processing apparatus, comprising: an online behavior data acquisition unit, a first analysis unit, a second analysis unit, and a portrait unit, wherein:
the online behavior data acquisition unit is adapted to acquire online behavior data, the online behavior data comprising at least one of the following: Cookies and application software data;
the first analysis unit is adapted to analyze the online behavior data corresponding to each user to obtain a feature information set for each user, the feature information set comprising multiple kinds of feature information;
the second analysis unit is adapted to comprehensively analyze the feature information sets of the users, so as to supplement any feature information set that lacks certain feature information; and
the portrait unit is adapted to generate user portraits with reference to the feature information sets of the multiple users.
Optionally, the second analysis unit is adapted to comprehensively analyze the feature information sets of the users with a naive Bayes classification algorithm and to supplement the feature information set that lacks certain feature information.
Optionally, the data processing apparatus further comprises a basic knowledge base generation unit and a third analysis unit, wherein:
the basic knowledge base generation unit is adapted to: configure a crawler to obtain basic information, the basic information being suitable for use as reference information for extracting the feature information; unify the information format of the basic information; and generate a basic knowledge base; and
the third analysis unit is adapted to supplement the feature information set with reference to the basic knowledge base.
Optionally, the data processing apparatus further comprises a fourth analysis unit, adapted to verify the feature information set with reference to the basic knowledge base.
Optionally, the data processing apparatus further comprises a correspondence establishing unit, adapted to establish a correspondence between the online behavior data and users.
Optionally, the correspondence establishing unit is adapted to: generate an identifier for the Cookie, the identifier comprising a machine address, the current process, and a system timestamp; and, with reference to the machine address in the identifier and the network behavior information contained in the Cookie, establish a correspondence among the identifiers of Cookies that belong to the same user.
Optionally, the correspondence establishing unit is adapted to: generate an identifier for the application software data, the identifier comprising a mobile phone IMEI; and, with reference to the mobile phone IMEI and the network behavior information in the application software data, establish a correspondence among the identifiers of application software data that belong to the same user.
Optionally, the correspondence establishing unit is adapted to: with reference to the network behavior information contained in the Cookie and the network behavior information contained in the application software data, establish a correspondence between the identifiers of the Cookie and of the application software data that belong to the same user.
Optionally, the portrait unit is further adapted to: receive portrait condition information; and generate the user portraits with reference to the portrait condition information.
Compared with the prior art, the technical solutions of the embodiments of the present invention have the following beneficial effects:
Acquiring one or both of Cookies and application software data widens the channels from which data is sourced and expands the kinds of evidence on which a user portrait is based, which can improve the accuracy of user portraits. Analyzing the online behavior data yields a feature information set for each user, and comprehensively analyzing these sets makes it possible to supplement any set that lacks certain feature information. This completes the feature information sets, extends their dimensions, and improves the data on which user portraits are built, which can further improve their accuracy.
Further, establishing a correspondence between online behavior data and users associates the online behavior data that belongs to the same user, which avoids counting the same user more than once during portrait generation and thus can improve the accuracy of user portraits.
In addition, receiving portrait condition information and generating user portraits with reference to it provides more targeted user portraits and makes the method applicable to a wider range of scenarios.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of a data processing system in an embodiment of the present invention;
Fig. 2 is a flowchart of a data processing method in an embodiment of the present invention;
Fig. 3 is a partial flowchart of a data processing method in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a data processing apparatus in an embodiment of the present invention.
Detailed description
The inventors found through research that existing user portrait techniques can often build portraits only from the data of a single website and do not cover other data on the internet. The analysis is therefore inherently limited: its result covers only a part of overall website data and cannot be compared against the data of the many other websites on the internet.
Existing user portrait techniques do not identify users, that is, website audience members, accurately enough. Interaction between website data and an audience member mainly takes place through a single display in a browser, and a conventional website Cookie can only use one browsing device as the identifier when analyzing the behavior of the audience member. In real life, however, the same audience member may visit a website from several terminal devices, such as a notebook computer, a mobile phone, or a tablet computer, at different times and places. The same audience member may also be identified by a mobile phone number, an e-mail address, a QQ number, a WeChat ID, a Taobao account, and so on. Because traditional data is not linked across these identifiers, the same physical audience member is treated as several different audience members, and the portrait result is not accurate enough.
In existing user portrait techniques, the directly obtained data does not cover all dimensions, so audience portrait analysis can be done on only some dimensions. Building a portrait of a website's audience requires comprehensive audience information in every dimension, such as age, gender, education, occupation, and regional distribution. A traditional website ERP system holds data on only one or a few of these aspects, so it can support audience portrait analysis on only some dimensions.
Existing user portrait techniques cannot analyze custom audience portraits. A conventional website holds only its own internal data, so it can only analyze its own audience and offers little support for portraits of other audiences. For example, because of the limits of website data, there is little support for custom analyses such as a portrait analysis of the customers of a website's direct competitors, or a portrait analysis of the audience newly attracted to an advertiser's website.
The data processing method in the embodiments of the present invention acquires one or both of Cookies and application software data, which widens the channels from which data is sourced and expands the kinds of evidence on which a user portrait is based, and can thus improve the accuracy of user portraits. Analyzing the online behavior data yields a feature information set for each user, and comprehensively analyzing these sets makes it possible to supplement any set that lacks certain feature information. This completes the feature information sets, extends their dimensions, and improves the data on which user portraits are built, which can further improve their accuracy.
Establishing a correspondence between online behavior data and users associates the online behavior data that belongs to the same user, which avoids counting the same user more than once during portrait generation and thus can improve the accuracy of user portraits.
Receiving portrait condition information and generating user portraits with reference to it provides more targeted user portraits and makes the method applicable to a wider range of scenarios.
To make the above objects, features, and beneficial effects of the present invention more apparent, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic structural diagram of a data processing system in an embodiment of the present invention.
The data processing system comprises a data processing server 11 and user terminals 12.
The data processing server 11 can acquire online behavior data from the user terminals 12. The user terminals 12 may comprise multiple user terminals, for example user terminal 1, user terminal 2, ..., user terminal N shown in Fig. 1. The data processing server 11 may be a single server, a distributed server, or a server cluster. A user terminal 12 may be a networked smart device, for example a computer, a tablet computer, or a mobile phone.
The online behavior data may be Cookies and application software data. Application software data is data produced by application software installed on a user terminal 12, for example data generated by QQ, Taobao, WeChat, and the mobile client software of major websites.
The data processing server 11 can analyze the online behavior data of the user terminals 12, obtain the corresponding feature information, and generate user portraits.
Fig. 2 is a flowchart of a data processing method in an embodiment of the present invention, described below with reference to Fig. 1.
S21: acquire online behavior data; the online behavior data comprises at least one of the following: Cookies and application software data.
A Cookie is data that some websites store on the user's local terminal in order to distinguish user identities and track sessions, and it may contain the user's network behavior information. Network behavior information is information about the network activity of the corresponding user, such as the IP address from which the user goes online, the URLs of visited websites, the user-agent, and the user ID on third-party websites to which the user is logged in.
Application software data may be data produced by application software installed on a user terminal 12, and it may likewise contain the user's network behavior information.
In a specific implementation, after the online behavior data is acquired, a correspondence between the online behavior data and users can also be established. As noted above, if the users corresponding to the online behavior data are not distinguished, the same user may be counted more than once during portrait generation, making the user portrait inaccurate.
In one embodiment of the present invention, establishing the correspondence between the online behavior data and users comprises: generating an identifier for the Cookie, the identifier comprising a machine address, the current process, and a system timestamp; and, with reference to the machine address in the identifier and the network behavior information contained in the Cookie, establishing a correspondence among the identifiers of Cookies that belong to the same user.
Cookies whose identifiers contain the same machine address can be considered to correspond to the same user. The similarity between two Cookies can also be analyzed with reference to their network behavior information; if the similarity meets a set value, the two Cookies can be considered to correspond to the same user.
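The identifier described above can be sketched as follows. This is only an illustration under stated assumptions: the text names the three fields (machine address, current process, system timestamp) but not their encoding, so the hyphen-joined string layout, the use of a MAC-derived value as the machine address, and the function names make_cookie_id and group_by_machine are all hypothetical.

```python
import os
import time
import uuid
from collections import defaultdict


def make_cookie_id(machine_address=None, pid=None, timestamp_ms=None):
    """Build an identifier from machine address, current process, and timestamp.

    The "address-pid-timestamp" layout is an assumption for illustration.
    """
    if machine_address is None:
        # uuid.getnode() returns a MAC-derived integer; used here as a stand-in.
        machine_address = f"{uuid.getnode():012x}"
    if pid is None:
        pid = os.getpid()
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    return f"{machine_address}-{pid}-{timestamp_ms}"


def group_by_machine(cookie_ids):
    """Group identifiers whose machine address field is identical."""
    groups = defaultdict(list)
    for cookie_id in cookie_ids:
        machine_address = cookie_id.split("-", 1)[0]
        groups[machine_address].append(cookie_id)
    return dict(groups)


ids = [
    make_cookie_id("aabbccddeeff", 101, 1000),
    make_cookie_id("aabbccddeeff", 202, 2000),
    make_cookie_id("112233445566", 303, 3000),
]
groups = group_by_machine(ids)
```

Here the first two identifiers share a machine address and fall into one group, which corresponds to treating them as the same user; the third forms its own group.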
In another embodiment of the present invention, establishing the correspondence between the online behavior data and users comprises: generating an identifier for the application software data, the identifier comprising a mobile phone IMEI; and, with reference to the mobile phone IMEI and the network behavior information in the application software data, establishing a correspondence among the identifiers of application software data that belong to the same user.
Application software data whose identifiers contain the same mobile phone IMEI can be considered to correspond to the same user. The similarity between application software data acquired on two occasions can also be analyzed with reference to the network behavior information; if the similarity meets a set value, the application software data acquired on the two occasions can be considered to correspond to the same user.
In another embodiment of the present invention, establishing the correspondence between the online behavior data and users comprises: with reference to the network behavior information contained in the Cookie and the network behavior information contained in the application software data, establishing a correspondence between the identifiers of the Cookie and of the application software data that belong to the same user.
The similarity between online behavior data acquired on two occasions can be analyzed with reference to the network behavior information; if the similarity meets a set value, the online behavior data acquired on the two occasions can be considered to correspond to the same user. Here, both pieces of online behavior data may be application software data, or one may be application software data and the other a Cookie.
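The similarity test described above might be sketched as follows. The text does not say which similarity measure or set value is used, so Jaccard similarity over sets of network behavior items, the 0.5 threshold, the record keys, and the function names are assumptions made purely for illustration. Records that pass the threshold are merged with a small union-find structure so that each resulting cluster stands for one user.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of network behavior items."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def link_records(records, threshold=0.5):
    """Cluster record identifiers whose behavior similarity meets the threshold."""
    parent = {rid: rid for rid in records}

    def find(rid):
        # Follow parent links to the cluster root, compressing the path.
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    ids = list(records)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if jaccard(records[a], records[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for rid in ids:
        clusters.setdefault(find(rid), set()).add(rid)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hypothetical records: one Cookie and one app record share an IP and a user ID.
records = {
    "cookie:1": {"ip:10.0.0.1", "url:news.example", "uid:alice"},
    "app:imei-1": {"ip:10.0.0.1", "uid:alice", "url:shop.example"},
    "cookie:2": {"ip:10.0.0.9", "url:cars.example"},
}
linked = link_records(records)
```

With these sample records, the Cookie and the application software record share two of four distinct items, so they meet the threshold and are linked to one user, while the second Cookie remains separate.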
S22: analyze the online behavior data corresponding to each user to obtain a feature information set for each user; the feature information set comprises multiple kinds of feature information.
User feature information is information that characterizes a user, for example gender, age, income, education level, and industry.
By analyzing the online behavior data of each user, a feature information set made up of multiple pieces of user feature information can be obtained.
S23: comprehensively analyze the feature information sets of the users, so as to supplement any feature information set that lacks certain feature information.
Because the specific contents of the different kinds of feature information do not occur completely independently of one another, comprehensively analyzing the users' feature information sets can reveal associations between kinds of feature information. A feature information set that lacks certain feature information can then be supplemented from the feature information it already contains.
In a specific implementation, the feature information sets of the users can be comprehensively analyzed with a naive Bayes classification algorithm, and the feature information set that lacks certain feature information can be supplemented. That is, big data mining techniques are used to complete the information in each dimension of the audience portrait. For example, based on the source of a website audience member's requests, such as the request URL or the originating software app, a naive Bayes classification algorithm can be used to supplement the audience member's gender, age, income, education level, industry, and similar information. Based on the content of the requests, analyzing the request keywords, the requested site information, and so on can supplement the audience member's keywords, behavioral preferences, and similar information. Based on the request headers, information such as the audience member's region can be determined.
Naive Bayes classification is a classification algorithm, formally defined as follows:
Let $x = \{a_1, a_2, \ldots, a_m\}$ be an item to be classified, where each $a_i$ is a feature attribute of $x$;
Let there be a set of categories $C = \{y_1, y_2, \ldots, y_n\}$;
Compute $P(y_1 \mid x), P(y_2 \mid x), \ldots, P(y_n \mid x)$;
If $P(y_k \mid x) = \max\{P(y_1 \mid x), P(y_2 \mid x), \ldots, P(y_n \mid x)\}$, then $x \in y_k$.
In the embodiments of the present invention, $x$ may be a feature information set that lacks certain user feature information, and each $a_i$ is a piece of user feature information that the set does contain. $y_1, y_2, \ldots, y_n$ are the categories of user feature information sets, obtained by comprehensively analyzing and classifying the feature information sets of the users. By assigning $x$ to a category, the user feature information that $x$ lacks can be supplemented from the corresponding attribute of that category.
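As an illustration of how the classification above could fill in a missing feature, the following is a minimal sketch of a categorical naive Bayes classifier. It relies on the usual naive Bayes independence assumption, under which $P(y \mid x)$ is proportional to $P(y)\prod_i P(a_i \mid y)$, and it adds Laplace smoothing, which the text does not mention. The feature names, the sample users, and the functions train, predict, and fill_missing are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(rows, target):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter()
    value_counts = defaultdict(Counter)  # (feature, class) -> value counts
    values_seen = defaultdict(set)       # feature -> distinct values
    for row in rows:
        if target not in row:
            continue  # only complete sets are used for training
        label = row[target]
        class_counts[label] += 1
        for feature, value in row.items():
            if feature != target:
                value_counts[(feature, label)][value] += 1
                values_seen[feature].add(value)
    return class_counts, value_counts, values_seen


def predict(model, row, target):
    """Return the class y_k that maximizes log P(y) + sum of log P(a_i | y)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)
        for feature, value in row.items():
            if feature == target or feature not in values_seen:
                continue
            counts = value_counts[(feature, label)]
            k = len(values_seen[feature] | {value})
            # Laplace smoothing keeps unseen values from zeroing the product.
            score += math.log((counts[value] + 1) / (sum(counts.values()) + k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def fill_missing(model, row, target):
    """Return a copy of the row with the missing target feature supplemented."""
    if target in row:
        return dict(row)
    return {**row, target: predict(model, row, target)}


users = [
    {"site": "cosmetics", "app": "shopping", "gender": "female"},
    {"site": "cosmetics", "app": "video", "gender": "female"},
    {"site": "fashion", "app": "shopping", "gender": "female"},
    {"site": "cars", "app": "games", "gender": "male"},
    {"site": "cars", "app": "video", "gender": "male"},
    {"site": "sports", "app": "games", "gender": "male"},
]
model = train(users, "gender")
filled = fill_missing(model, {"site": "cosmetics", "app": "shopping"}, "gender")
```

In this toy data the incomplete set is assigned to the "female" category because both of its known features are more frequent in that class, and the missing gender is supplemented accordingly.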
S24: generate user portraits with reference to the feature information sets of the multiple users.
Generating a user portrait may mean portraying a specific group of users along different dimensions of feature information, or portraying the users who have some specific feature information.
The specific group can be set directly as needed, or it can be the group of users who have some specific feature information.
When the specific feature information is an interest in automobiles, portraits of other feature information can be generated for the group that has this feature, for example a distribution chart of the group, a spending-power portrait, an income portrait, an education portrait, a gender portrait, an occupation portrait, an age portrait, and so on. A portrait may take the form of a histogram, a comparison chart, etc.
In a specific implementation, portrait condition information can be received before the user portraits are generated, namely the selection of the specific user group described above or the setting of some specific feature information. The user portraits are then generated with reference to the portrait condition information.
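Generating a portrait from condition information can be sketched as a filter followed by a distribution count. In this illustration the portrait condition is represented as a dictionary of required feature values and the result as the share of each value along one dimension; both representations, the feature names, and the function portrait are assumptions, since the text does not fix a data format.

```python
from collections import Counter


def portrait(users, condition, dimension):
    """Share of each value of `dimension` among users matching `condition`."""
    group = [
        user for user in users
        if all(user.get(key) == value for key, value in condition.items())
    ]
    counts = Counter(user[dimension] for user in group if dimension in user)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: n / total for value, n in counts.items()}


# Hypothetical feature information sets.
users = [
    {"interest": "cars", "gender": "male", "age": "25-34"},
    {"interest": "cars", "gender": "male", "age": "35-44"},
    {"interest": "cars", "gender": "female", "age": "25-34"},
    {"interest": "cooking", "gender": "female", "age": "25-34"},
]
gender_portrait = portrait(users, {"interest": "cars"}, "gender")
```

The resulting shares could then be drawn as a histogram or comparison chart for the selected group.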
Referring to Fig. 3, in a specific implementation the data processing method may further comprise:
S31: configure a crawler to obtain basic information, the basic information being suitable for use as reference information for extracting the feature information.
The crawler can be configured as needed: in the crawler system configuration, the crawling strategy for a data source is set to incremental or full, and the crawler system is then run to collect internet data. For example, the crawler can be set to obtain a website's type, its topic, its main audience, its location information, and so on.
S32: unify the information format of the basic information.
Because the crawler collects information from a wide range of sources, the format of the basic information can be unified. The information collected by the crawler can be screened, that is, cleaned by the crawler system: pattern matching is performed, the data that is needed is matched, and the rest is discarded. The goal of cleaning is to standardize the data and unify its format for later association queries.
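The cleaning step can be illustrated with a short sketch: each raw crawled line is matched against a pattern, matching lines are normalized into one format, and everything else is discarded. The raw line layout, the regular expression, and the field names "site" and "type" are hypothetical, chosen only to show pattern matching followed by format unification.

```python
import re

# Assumed raw layout: "site<sep>VALUE <delim> type<sep>VALUE" with loose spacing.
RECORD_PATTERN = re.compile(
    r"^\s*site\s*[:=]\s*(?P<site>[\w.-]+)\s*[;,|]\s*"
    r"type\s*[:=]\s*(?P<type>[\w -]+?)\s*$",
    re.IGNORECASE,
)


def clean(raw_lines):
    """Keep lines that match the pattern and normalize them to one format."""
    cleaned = []
    for line in raw_lines:
        match = RECORD_PATTERN.match(line)
        if match is None:
            continue  # data that does not match is discarded
        cleaned.append({
            "site": match.group("site").lower(),
            "type": match.group("type").strip().lower(),
        })
    return cleaned


raw = [
    "site: News.Example.com ; type: News",
    "SITE=shop.example.com|TYPE=E-commerce",
    "<div>advert banner</div>",
]
records = clean(raw)
```

After cleaning, both valid lines share the same keys and lowercase values, which makes later association queries straightforward, and the stray markup line is dropped.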
S33: generate a basic knowledge base.
The basic knowledge base contains the aforementioned website type, website topic, main audience of the website, location information of the website, and so on.
S34: supplement the feature information set with reference to the basic knowledge base.
For example, suppose the basic knowledge base contains gender ratio information for visitors to certain websites. When the online behavior data of a user is analyzed, it may turn out that the user's gender cannot be obtained from that data alone, but that the user visits certain websites frequently and that these websites share the trait of being predominantly oriented toward women. The user can then be estimated to be female.
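The gender example above can be sketched as a visit-weighted average over knowledge base entries. The representation of the knowledge base as a mapping from site to the share of female visitors, the 0.6 decision threshold, the site names, and the function estimate_gender are assumptions for illustration; the text only states that frequent visits to predominantly female-oriented sites suggest a female user.

```python
def estimate_gender(visits, knowledge_base, threshold=0.6):
    """Estimate gender from visit counts and per-site female visitor shares.

    Returns "female", "male", or None when the evidence is too weak.
    """
    weighted_share = 0.0
    total_visits = 0
    for site, count in visits.items():
        female_share = knowledge_base.get(site)
        if female_share is None:
            continue  # sites absent from the knowledge base carry no evidence
        weighted_share += female_share * count
        total_visits += count
    if total_visits == 0:
        return None
    share = weighted_share / total_visits
    if share >= threshold:
        return "female"
    if share <= 1 - threshold:
        return "male"
    return None


# Hypothetical knowledge base: site -> share of visitors who are female.
knowledge_base = {"beauty.example": 0.8, "cars.example": 0.2, "news.example": 0.5}
visits = {"beauty.example": 8, "news.example": 2, "unknown.example": 5}
estimate = estimate_gender(visits, knowledge_base)
```

Returning None when the weighted share is close to one half leaves the feature unfilled rather than guessing, so another step can supplement or verify it later.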
In a specific implementation, the feature information set can also be verified with reference to the basic knowledge base. For example, after feature information in a user's feature information set has been supplemented by comprehensively analyzing the feature information sets of the users, it can be verified against the basic knowledge base.
In a specific implementation, steps S31 to S34 may be performed before step S24 (see Fig. 2).
The embodiment of the present invention also provides a kind of data processing equipment, and its structural representation is see Fig. 4.
Data processing equipment comprises: internet behavior data capture unit 41, first analytic unit 42, second analytic unit 43, portrait unit 44; Wherein:
Described internet behavior data capture unit 41, is suitable for obtaining internet behavior data; Described internet behavior data comprise following at least one: Cookie and application of software data;
Described first analytic unit 42, is suitable for the internet behavior data analyzing corresponding each user respectively, and to obtain the characteristic information set of each user, described characteristic information set comprises various features information;
Described second analytic unit 43, is suitable for the characteristic information set comprehensively analyzing described each user, to supplement the described characteristic information set lacking certain characteristic information;
Described portrait unit 44, is suitable for, with reference to the characteristic information set of described multiple user, carrying out user's portrait.
In concrete enforcement, the second analytic unit 43, is suitable for utilizing the characteristic information set of Naive Bayes Classification Algorithm to described each user comprehensively to analyze, supplements the described characteristic information set lacking certain characteristic information.
In a specific implementation, the data processing apparatus may further comprise a basic knowledge base generation unit 45 and a third analysis unit 46, wherein:
The basic knowledge base generation unit 45 is adapted to: set up a crawler to obtain basic information, the basic information being suitable as reference information for extracting the characteristic information; unify the information format of the basic information; and generate the basic knowledge base;
The third analysis unit 46 is adapted to supplement the characteristic information set with reference to the basic knowledge base.
In a specific implementation, the data processing apparatus may further comprise a fourth analysis unit, adapted to verify the characteristic information set with reference to the basic knowledge base.
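The three uses of the basic knowledge base (format unification, supplementing, verification) can be sketched as follows. The crawl itself is omitted. The assumption that the basic information maps website domains to interest categories, the two input formats, and all function names are hypothetical choices made for illustration.

```python
def normalize(raw_entries):
    """Unify crawled entries into one format: {domain: category}.
    Entries arrive either as 'domain|category' strings or as dicts."""
    kb = {}
    for entry in raw_entries:
        if isinstance(entry, str):
            domain, category = entry.split("|", 1)
        else:
            domain, category = entry["domain"], entry["category"]
        kb[domain.strip().lower()] = category.strip().lower()
    return kb


def supplement(feature_set, visited_domains, kb):
    """Add the interests implied by visited domains found in the knowledge base."""
    out = dict(feature_set)
    interests = set(out.get("interests", ()))
    interests.update(kb[d] for d in visited_domains if d in kb)
    out["interests"] = sorted(interests)
    return out


def verify(feature_set, kb):
    """Return the interests that no knowledge base category supports."""
    known = set(kb.values())
    return [i for i in feature_set.get("interests", ()) if i not in known]


kb = normalize([
    "Example-Travel.test | Travel",
    {"domain": "example-cars.test", "category": "Autos"},
])
enriched = supplement({"gender": "female"}, ["example-travel.test", "unknown.test"], kb)
suspect = verify({"interests": ["travel", "yachts"]}, kb)
```

Keeping the knowledge base in a single normalized format is what lets the same lookup serve both the supplementing step and the verification step.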
In a specific implementation, the data processing apparatus may further comprise a correspondence establishing unit 47, adapted to establish a correspondence between the internet behavior data and users.
In a specific implementation, the correspondence establishing unit 47 is adapted to: generate an identifier for each Cookie, the identifier comprising a machine address, a current process, and a system timestamp; and, with reference to the machine address in the identifier and the network behavior information contained in the Cookie, establish a correspondence between the identifiers of Cookies belonging to the same user.
In a specific implementation, the correspondence establishing unit 47 is adapted to: generate an identifier for the application software data, the identifier of the application software data comprising a mobile phone IMEI identifier; and, with reference to the mobile phone IMEI identifier and the network behavior information in the application software data, establish a correspondence between the identifiers of application software data belonging to the same user.
In a specific implementation, the correspondence establishing unit 47 is adapted to: with reference to the network behavior information contained in the Cookie and the network behavior information contained in the application software data, establish a correspondence between the identifiers of the Cookie and of the application software data belonging to the same user.
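A minimal sketch of the Cookie case follows. The identifier layout (hexadecimal fields joined by hyphens), the use of visited URLs as the network behavior information, and the Jaccard similarity threshold are assumptions made for illustration; the patent only states which fields the identifier contains and which information is consulted.

```python
import os
import time
import uuid


def make_cookie_id(machine=None, pid=None, ts=None):
    """Build an identifier from machine address, current process, and timestamp.
    Defaults read the local MAC-derived node id, process id, and clock."""
    machine = uuid.getnode() if machine is None else machine
    pid = os.getpid() if pid is None else pid
    ts = time.time_ns() if ts is None else ts
    return f"{machine:012x}-{pid:x}-{ts:x}"


def machine_of(cookie_id):
    """Extract the machine address field from an identifier."""
    return cookie_id.split("-", 1)[0]


def link_cookies(cookies, min_overlap=0.5):
    """Group cookie ids that share a machine address and have similar behavior.
    `cookies` maps cookie id -> set of visited URLs (hypothetical behavior data).
    Similarity is the Jaccard index; groups are merged with union-find."""
    parent = {c: c for c in cookies}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    ids = sorted(cookies)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if machine_of(a) != machine_of(b):
                continue
            union = cookies[a] | cookies[b]
            if union and len(cookies[a] & cookies[b]) / len(union) >= min_overlap:
                parent[find(b)] = find(a)

    groups = {}
    for c in ids:
        groups.setdefault(find(c), []).append(c)
    return sorted(groups.values())


a = make_cookie_id(0xAABBCCDDEEFF, 10, 1000)
b = make_cookie_id(0xAABBCCDDEEFF, 11, 2000)
c = make_cookie_id(0x112233445566, 10, 1000)
groups = link_cookies({a: {"x", "y"}, b: {"x", "y", "z"}, c: {"x", "y"}})
```

Cookies a and b come from the same machine and overlap in two of three URLs, so they are grouped; cookie c has identical behavior to a but a different machine address, so it stays separate.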
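For application software data, the identifier can be sketched as the IMEI combined with an application name. The "imei:app" layout and the grouping by IMEI alone are simplifying assumptions (the patent also consults network behavior information). The Luhn check is a general property of 15-digit IMEIs, whose last digit is a Luhn check digit.

```python
def luhn_valid(number):
    """Return True if the digit string passes the Luhn checksum."""
    if not number.isdigit():
        return False
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def app_data_id(imei, app):
    """Build a hypothetical 'imei:app' identifier after validating the IMEI."""
    if len(imei) != 15 or not luhn_valid(imei):
        raise ValueError(f"invalid IMEI: {imei!r}")
    return f"{imei}:{app}"


def group_by_device(identifiers):
    """Group application data identifiers that share the same IMEI."""
    groups = {}
    for identifier in identifiers:
        imei = identifier.split(":", 1)[0]
        groups.setdefault(imei, []).append(identifier)
    return groups


ids = [
    app_data_id("490154203237518", "news"),
    app_data_id("490154203237518", "shop"),
]
by_device = group_by_device(ids)
```

Validating the IMEI before using it as a key avoids grouping records under malformed or placeholder device identifiers.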
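Linking a Cookie to application software data can be sketched as matching on shared network behavior signals. The token format ("ip:...", "login:..."), the requirement of at least two shared tokens, and the function name are hypothetical; the patent does not state which signals or thresholds are used.

```python
def link_cookie_to_app(cookie_behavior, app_behavior, min_shared=2):
    """Map each cookie id to the app data id sharing the most behavior tokens,
    provided at least `min_shared` tokens are shared."""
    links = {}
    for cookie_id, cookie_tokens in cookie_behavior.items():
        best, best_shared = None, 0
        # Sorted iteration makes tie-breaking deterministic.
        for app_id, app_tokens in sorted(app_behavior.items()):
            shared = len(cookie_tokens & app_tokens)
            if shared > best_shared:
                best, best_shared = app_id, shared
        if best_shared >= min_shared:
            links[cookie_id] = best
    return links


cookie_behavior = {
    "cookie-1": {"ip:203.0.113.7", "login:h1", "url:/a"},
    "cookie-2": {"ip:198.51.100.2"},
}
app_behavior = {
    "app-1": {"ip:203.0.113.7", "login:h1"},
    "app-2": {"ip:198.51.100.2", "login:h2"},
}
links = link_cookie_to_app(cookie_behavior, app_behavior)
```

Requiring more than one shared signal is a conservative choice in this sketch: a shared IP address alone (as with cookie-2 and app-2) is not treated as sufficient evidence that two identifiers belong to the same user.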
In a specific implementation, the portrait unit 44 is further adapted to: receive portrait condition information; and generate the user portrait with reference to the portrait condition information.
The data processing apparatus may be located in the data processing server 11 (see Fig. 1).
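Portrait condition information can be read as a filter applied before aggregation. In the sketch below, the conditions are a hypothetical dict of required feature values and the portrait is the distribution of one chosen feature among matching users; neither representation is prescribed by the patent.

```python
from collections import Counter


def user_portrait(sets, conditions, feature):
    """Summarize `feature` over users whose sets satisfy every condition."""
    selected = [
        s for s in sets.values()
        if all(s.get(key) == value for key, value in conditions.items())
    ]
    return Counter(s[feature] for s in selected if feature in s)


sets = {
    "u1": {"gender": "female", "city": "Shanghai", "interest": "travel"},
    "u2": {"gender": "male", "city": "Shanghai", "interest": "autos"},
    "u3": {"gender": "female", "city": "Beijing", "interest": "travel"},
    "u4": {"gender": "female", "city": "Shanghai"},
}
targeted = user_portrait(sets, {"gender": "female", "city": "Shanghai"}, "interest")
overall = user_portrait(sets, {}, "interest")
```

An empty condition dict selects every user, so the same function covers both the targeted and the unconditioned portrait.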
By acquiring either or both of Cookies and application software data, the embodiments of the present invention widen the channels through which data is sourced and broaden the kinds of evidence on which a user portrait is based, which can improve the accuracy of the user portrait. The internet behavior data is analyzed to obtain a characteristic information set for each user, and the characteristic information sets are then comprehensively analyzed to supplement any set that lacks a certain kind of characteristic information. This completes the characteristic information sets, extends their dimensions, and improves the data on which the user portrait is based, which can likewise improve its accuracy. By establishing a correspondence between internet behavior data and users, the internet behavior data belonging to the same user is associated, so that the same user is not counted more than once when the user portrait is generated, which can also improve accuracy. By receiving portrait condition information and generating the user portrait with reference to it, a more targeted user portrait is provided and the range of applicable scenarios is widened.
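The effect of the correspondence on counting can be illustrated with a short sketch. The identifier strings and the representation of the correspondence as lists of identifiers per user are hypothetical.

```python
def count_users(identifiers, groups):
    """Count distinct users, treating identifiers in the same group as one user.
    Identifiers that appear in no group are counted individually."""
    owner = {ident: index for index, group in enumerate(groups) for ident in group}
    seen = set()
    for ident in identifiers:
        # Grouped identifiers collapse to the group index; others stay as-is.
        seen.add(owner.get(ident, ident))
    return len(seen)


identifiers = ["cookie-1", "cookie-2", "app-1", "cookie-3"]
naive_count = len(set(identifiers))
merged_count = count_users(identifiers, [["cookie-1", "cookie-2", "app-1"]])
```

Without the correspondence, the four identifiers would be counted as four users; with it, the three identifiers known to belong to one user are counted once.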
A person of ordinary skill in the art will appreciate that all or some of the steps in the methods of the above embodiments may be carried out by hardware under the instruction of a program. The program may be stored in a computer-readable storage medium, and the storage medium may include ROM, RAM, a magnetic disk, an optical disc, and the like.
Although the present invention is disclosed above, it is not limited to this disclosure. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention, and the scope of protection of the present invention shall therefore be that defined by the claims.

Claims (18)

1. A data processing method, characterized by comprising:
Acquiring internet behavior data, the internet behavior data comprising at least one of the following: Cookies and application software data;
Separately analyzing the internet behavior data corresponding to each user to obtain a characteristic information set for each user, each characteristic information set comprising multiple kinds of characteristic information;
Comprehensively analyzing the characteristic information sets of the users so as to supplement any characteristic information set that lacks a certain kind of characteristic information;
Generating a user portrait with reference to the characteristic information sets of the multiple users.
2. The data processing method according to claim 1, characterized in that comprehensively analyzing the characteristic information sets of the users so as to supplement any characteristic information set that lacks a certain kind of characteristic information comprises: comprehensively analyzing the characteristic information sets of the users using a naive Bayes classification algorithm, and supplementing any characteristic information set that lacks a certain kind of characteristic information.
3. The data processing method according to claim 1, characterized by further comprising: setting up a crawler to obtain basic information, the basic information being suitable as reference information for extracting the characteristic information; unifying the information format of the basic information; generating a basic knowledge base; and supplementing the characteristic information set with reference to the basic knowledge base.
4. The data processing method according to claim 3, characterized by further comprising: verifying the characteristic information set with reference to the basic knowledge base.
5. The data processing method according to claim 1, characterized by further comprising: establishing a correspondence between the internet behavior data and users.
6. The data processing method according to claim 5, characterized in that establishing the correspondence between the internet behavior data and users comprises:
Generating an identifier for each Cookie, the identifier comprising: a machine address, a current process, and a system timestamp;
With reference to the machine address in the identifier and the network behavior information contained in the Cookie, establishing a correspondence between the identifiers of Cookies belonging to the same user.
7. The data processing method according to claim 5, characterized in that establishing the correspondence between the internet behavior data and users comprises:
Generating an identifier for the application software data, the identifier of the application software data comprising a mobile phone IMEI identifier;
With reference to the mobile phone IMEI identifier and the network behavior information in the application software data, establishing a correspondence between the identifiers of application software data belonging to the same user.
8. The data processing method according to claim 5, characterized in that establishing the correspondence between the internet behavior data and users comprises: with reference to the network behavior information contained in the Cookie and the network behavior information contained in the application software data, establishing a correspondence between the identifiers of the Cookie and of the application software data belonging to the same user.
9. The data processing method according to claim 1, characterized in that generating a user portrait with reference to the characteristic information sets of the multiple users further comprises: receiving portrait condition information; and generating the user portrait with reference to the portrait condition information.
10. A data processing apparatus, characterized by comprising: an internet behavior data acquisition unit, a first analysis unit, a second analysis unit, and a portrait unit, wherein:
The internet behavior data acquisition unit is adapted to acquire internet behavior data, the internet behavior data comprising at least one of the following: Cookies and application software data;
The first analysis unit is adapted to separately analyze the internet behavior data corresponding to each user to obtain a characteristic information set for each user, each characteristic information set comprising multiple kinds of characteristic information;
The second analysis unit is adapted to comprehensively analyze the characteristic information sets of the users so as to supplement any characteristic information set that lacks a certain kind of characteristic information;
The portrait unit is adapted to generate a user portrait with reference to the characteristic information sets of the multiple users.
11. The data processing apparatus according to claim 10, characterized in that the second analysis unit is adapted to comprehensively analyze the characteristic information sets of the users using a naive Bayes classification algorithm, and to supplement any characteristic information set that lacks a certain kind of characteristic information.
12. The data processing apparatus according to claim 10, characterized by further comprising: a basic knowledge base generation unit and a third analysis unit, wherein:
The basic knowledge base generation unit is adapted to: set up a crawler to obtain basic information, the basic information being suitable as reference information for extracting the characteristic information; unify the information format of the basic information; and generate a basic knowledge base;
The third analysis unit is adapted to supplement the characteristic information set with reference to the basic knowledge base.
13. The data processing apparatus according to claim 12, characterized by further comprising: a fourth analysis unit, adapted to verify the characteristic information set with reference to the basic knowledge base.
14. The data processing apparatus according to claim 10, characterized by further comprising: a correspondence establishing unit, adapted to establish a correspondence between the internet behavior data and users.
15. The data processing apparatus according to claim 14, characterized in that the correspondence establishing unit is adapted to: generate an identifier for each Cookie, the identifier comprising: a machine address, a current process, and a system timestamp; and, with reference to the machine address in the identifier and the network behavior information contained in the Cookie, establish a correspondence between the identifiers of Cookies belonging to the same user.
16. The data processing apparatus according to claim 14, characterized in that the correspondence establishing unit is adapted to: generate an identifier for the application software data, the identifier of the application software data comprising a mobile phone IMEI identifier; and, with reference to the mobile phone IMEI identifier and the network behavior information in the application software data, establish a correspondence between the identifiers of application software data belonging to the same user.
17. The data processing apparatus according to claim 14, characterized in that the correspondence establishing unit is adapted to: with reference to the network behavior information contained in the Cookie and the network behavior information contained in the application software data, establish a correspondence between the identifiers of the Cookie and of the application software data belonging to the same user.
18. The data processing apparatus according to claim 10, characterized in that the portrait unit is further adapted to: receive portrait condition information; and generate the user portrait with reference to the portrait condition information.
CN201510843568.4A 2015-11-26 2015-11-26 Data processing method and apparatus Active CN105447147B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510843568.4A CN105447147B (en) 2015-11-26 2015-11-26 Data processing method and apparatus

Publications (2)

Publication Number Publication Date
CN105447147A true CN105447147A (en) 2016-03-30
CN105447147B CN105447147B (en) 2019-02-01

Family

ID=55557323

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510843568.4A Active CN105447147B (en) 2015-11-26 2015-11-26 Data processing method and apparatus

Country Status (1)

Country Link
CN (1) CN105447147B (en)

Also Published As

Publication number Publication date
CN105447147B (en) 2019-02-01

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant