CN105447147B - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN105447147B
Authority
CN
China
Prior art keywords
user
data
information
characteristic information
portrait
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510843568.4A
Other languages
Chinese (zh)
Other versions
CN105447147A (en)
Inventor
汤奇峰
万昆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZAMPLUS ADVERTISING (SHANGHAI) CO Ltd
Original Assignee
ZAMPLUS ADVERTISING (SHANGHAI) CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZAMPLUS ADVERTISING (SHANGHAI) CO Ltd
Priority to CN201510843568.4A
Publication of CN105447147A
Application granted
Publication of CN105447147B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Abstract

A data processing method and device. The method comprises: obtaining Internet behavior data, the Internet behavior data comprising at least one of the following: Cookies and application software data; separately analyzing the Internet behavior data corresponding to each user to obtain a characteristic information set for each user, the characteristic information set including multiple kinds of characteristic information; comprehensively analyzing the characteristic information sets of the users to supplement any characteristic information set that lacks a certain kind of characteristic information; and carrying out a user portrait with reference to the characteristic information sets of the plurality of users. The method and device can improve the accuracy of user portrait technology.

Description

Data processing method and device
Technical field
The present invention relates to the field of data processing, and in particular to a data processing method and device.
Background
With the development of information technology, more and more everyday needs can be expressed through the network. The network provides users with massive amounts of information, and users in turn leave their own footprints on the network. As information collection techniques advance, more data can be obtained from the network, and the features of a user or a user group can be extracted and summarized from the collected information. This extraction and summarization is also called user portrait.
The accuracy of existing user portrait technology needs to be improved.
Summary of the invention
The technical problem solved by the present invention is how to improve the accuracy of user portrait.
To solve the above technical problem, an embodiment of the present invention provides a data processing method, comprising:
obtaining Internet behavior data, the Internet behavior data comprising at least one of the following: Cookies and application software data;
separately analyzing the Internet behavior data corresponding to each user to obtain a characteristic information set for each user, the characteristic information set including multiple kinds of characteristic information;
comprehensively analyzing the characteristic information sets of the users to supplement any characteristic information set that lacks a certain kind of characteristic information;
carrying out a user portrait with reference to the characteristic information sets of the plurality of users.
Optionally, comprehensively analyzing the characteristic information sets of the users to supplement a characteristic information set that lacks a certain kind of characteristic information comprises: comprehensively analyzing the characteristic information sets of the users with a Naive Bayes classification algorithm, and supplementing the characteristic information set that lacks a certain kind of characteristic information.
Optionally, the data processing method further comprises: setting up a crawler to obtain basic information, the basic information being adapted to serve as reference information for extracting the characteristic information; unifying the information format of the basic information; generating a basic knowledge base; and supplementing the characteristic information set with reference to the basic knowledge base.
Optionally, the data processing method further comprises: verifying the characteristic information set with reference to the basic knowledge base.
Optionally, the data processing method further comprises: establishing a correspondence between the Internet behavior data and users.
Optionally, establishing the correspondence between the Internet behavior data and users comprises:
generating an identification mark for the Cookie, the identification mark comprising: a machine address, a current process, and a system timestamp;
establishing, with reference to the machine address in the identification mark and the network behavior information included in the Cookie, a correspondence between the identification marks of Cookies that belong to the same user.
Optionally, establishing the correspondence between the Internet behavior data and users comprises:
generating an identification mark for the application software data, the identification mark of the application software data comprising a mobile phone IMEI identifier;
establishing, with reference to the mobile phone IMEI identifier and the network behavior information in the application software data, a correspondence between the identification marks of application software data that belong to the same user.
Optionally, establishing the correspondence between the Internet behavior data and users comprises: establishing, with reference to the network behavior information included in the Cookie and the network behavior information included in the application software data, a correspondence between the identification marks of the Cookie and the application software data that belong to the same user.
Optionally, carrying out the user portrait with reference to the characteristic information sets of the plurality of users further comprises: receiving portrait condition information; and carrying out the user portrait with reference to the portrait condition information.
An embodiment of the present invention also provides a data processing device, comprising: an Internet behavior data acquisition unit, a first analysis unit, a second analysis unit, and a portrait unit, wherein:
the Internet behavior data acquisition unit is adapted to obtain Internet behavior data, the Internet behavior data comprising at least one of the following: Cookies and application software data;
the first analysis unit is adapted to separately analyze the Internet behavior data corresponding to each user to obtain a characteristic information set for each user, the characteristic information set including multiple kinds of characteristic information;
the second analysis unit is adapted to comprehensively analyze the characteristic information sets of the users to supplement any characteristic information set that lacks a certain kind of characteristic information;
the portrait unit is adapted to carry out a user portrait with reference to the characteristic information sets of the plurality of users.
Optionally, the second analysis unit is adapted to comprehensively analyze the characteristic information sets of the users with a Naive Bayes classification algorithm, and to supplement the characteristic information set that lacks a certain kind of characteristic information.
Optionally, the data processing device further comprises a basic knowledge base generation unit and a third analysis unit, wherein:
the basic knowledge base generation unit is adapted to: set up a crawler to obtain basic information, the basic information being adapted to serve as reference information for extracting the characteristic information; unify the information format of the basic information; and generate a basic knowledge base;
the third analysis unit is adapted to supplement the characteristic information set with reference to the basic knowledge base.
Optionally, the data processing device further comprises a fourth analysis unit, adapted to verify the characteristic information set with reference to the basic knowledge base.
Optionally, the data processing device further comprises a correspondence establishing unit, adapted to establish a correspondence between the Internet behavior data and users.
Optionally, the correspondence establishing unit is adapted to: generate an identification mark for the Cookie, the identification mark comprising a machine address, a current process, and a system timestamp; and establish, with reference to the machine address in the identification mark and the network behavior information included in the Cookie, a correspondence between the identification marks of Cookies that belong to the same user.
Optionally, the correspondence establishing unit is adapted to: generate an identification mark for the application software data, the identification mark of the application software data comprising a mobile phone IMEI identifier; and establish, with reference to the mobile phone IMEI identifier and the network behavior information in the application software data, a correspondence between the identification marks of application software data that belong to the same user.
Optionally, the correspondence establishing unit is adapted to: establish, with reference to the network behavior information included in the Cookie and the network behavior information included in the application software data, a correspondence between the identification marks of the Cookie and the application software data that belong to the same user.
Optionally, the portrait unit is further adapted to: receive portrait condition information; and carry out the user portrait with reference to the portrait condition information.
Compared with the prior art, the technical solution of the embodiments of the present invention has the following advantages:
Obtaining any one or more of Cookies and application software data widens the data sources and expands the kinds of evidence on which a user portrait is based, which can improve the accuracy of the user portrait. Analyzing the Internet behavior data yields a characteristic information set for each user, and comprehensively analyzing the characteristic information sets allows a set that lacks a certain kind of characteristic information to be supplemented. This completes the characteristic information sets, expands their dimensions, and improves the data basis of the user portrait, which can further improve its accuracy.
Further, establishing the correspondence between Internet behavior data and users associates the Internet behavior data that correspond to the same user. This avoids counting the same user more than once during the user portrait, which can improve the accuracy of the user portrait.
In addition, receiving portrait condition information and carrying out the user portrait with reference to it provides a more targeted user portrait that is applicable to a wider range of scenarios.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of a data processing system in an embodiment of the present invention;
Fig. 2 is a flowchart of a data processing method in an embodiment of the present invention;
Fig. 3 is a partial flowchart of a data processing method in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a data processing device in an embodiment of the present invention.
Detailed description of the embodiments
The inventors have found through research that existing user portrait technology can often draw portraits only from the data of a single website and does not cover other data on the Internet. The data analysis is therefore inherently limited: the result reflects only part of that website's own data and cannot be compared with data from other websites on the Internet.
Existing user portrait technology does not identify users, that is, website audiences, accurately enough. Interaction between website data and the website audience takes place mainly through display in a browser, and a conventional website Cookie can only use a single browsing device as the identifier when analyzing audience behavior. In real life, however, the same audience member may access a website through multiple terminal devices such as notebooks, mobile phones, and tablet computers, at different times and places. The same audience member may also be identified in several ways, such as by a mobile phone number, e-mail address, QQ number, WeChat ID, or Taobao account. Because traditional data are not sufficiently associated with each other, the same physical audience member is treated as several different audience members, which makes the portrait result inaccurate.
In existing user portrait technology, the directly acquired data do not cover all dimensions, so audience portrait analysis can only be done for some dimensions. A portrait of a website audience requires comprehensive audience information in every dimension, such as the age, gender, education, occupation, and regional distribution of the audience. A traditional website ERP system holds data on only one or a few of these aspects, so only a partial audience portrait analysis is possible.
Existing user portrait technology cannot perform portrait analysis for a customized audience. A conventional website has only its own internal data, so only the audience of that website can be analyzed, and portraits of other audiences are poorly supported. Examples include a portrait analysis of the customers of similar competing websites, or of the audience newly attracted to an advertiser's website. Because of the limitation of the website data, portrait analysis of such customized audiences is not well supported.
The data processing method in the embodiments of the present invention obtains any one or more of Cookies and application software data, which widens the data sources and expands the kinds of evidence on which a user portrait is based, and can therefore improve the accuracy of the user portrait. Analyzing the Internet behavior data yields a characteristic information set for each user, and comprehensively analyzing the characteristic information sets allows a set that lacks a certain kind of characteristic information to be supplemented. This completes the characteristic information sets, expands their dimensions, and improves the data basis of the user portrait, which can further improve its accuracy.
Establishing the correspondence between Internet behavior data and users associates the Internet behavior data that correspond to the same user. This avoids counting the same user more than once during the user portrait, which can improve the accuracy of the user portrait.
Receiving portrait condition information and carrying out the user portrait with reference to it provides a more targeted user portrait that is applicable to a wider range of scenarios.
To make the above objects, features, and beneficial effects of the present invention easier to understand, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic structural diagram of a data processing system in an embodiment of the present invention.
The data processing system includes a data processing server 11 and a user terminal 12.
The data processing server 11 can obtain Internet behavior data from the user terminal 12. The user terminal 12 may include multiple user terminals, for example user terminal 1, user terminal 2, ..., user terminal N shown in Fig. 1. The data processing server 11 may be a single server, a distributed server, or a server cluster. The user terminal 12 may be a network-connected smart device, such as a stand-alone computer, a tablet computer, or a mobile phone.
The Internet behavior data may be Cookies and application software data. The application software data may be data generated by application software installed on the user terminal 12, for example data generated by application software such as QQ, Taobao, WeChat, and the mobile client software of major websites.
The data processing server 11 can analyze the Internet behavior data of the user terminal 12, obtain the corresponding characteristic information, and carry out a user portrait.
Fig. 2 is a flowchart of a data processing method in an embodiment of the present invention, which is described below with reference to Fig. 1.
S21: obtain Internet behavior data, the Internet behavior data comprising at least one of the following: Cookies and application software data.
A Cookie is data that certain websites store on the user's local terminal in order to distinguish user identity and track sessions, and it may include the user's network behavior information. Network behavior information is information about the network activity of the corresponding user, such as the IP address from which the user goes online, the URLs of the websites accessed, the user-agent, and the user ID of a third-party website that the user has logged in to.
Application software data may be data generated by application software installed on the user terminal 12, and may likewise include the user's network behavior information.
In a specific implementation, the correspondence between the Internet behavior data and users can also be established after the Internet behavior data are obtained. As mentioned above, if the users to whom the Internet behavior data correspond are not distinguished, the same user may be counted more than once in the portrait, making the user portrait inaccurate.
In one embodiment of the present invention, establishing the correspondence between the Internet behavior data and users comprises: generating an identification mark for the Cookie, the identification mark comprising a machine address, a current process, and a system timestamp; and establishing, with reference to the machine address in the identification mark and the network behavior information included in the Cookie, a correspondence between the identification marks of Cookies that belong to the same user.
Cookies whose identification marks contain the same machine address can be considered to correspond to the same user. The similarity between two Cookies can also be analyzed with reference to their network behavior information; if the similarity meets a set value, the two Cookies can be considered to correspond to the same user.
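The two rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the identification mark layout (`machine-process-timestamp`, with a machine address that contains no hyphen), the Jaccard similarity measure, the 0.5 threshold, and all function names are hypothetical choices that the source does not specify.

```python
from collections import defaultdict


def make_cookie_id(machine_address, process_id, timestamp_ms):
    """Build an identification mark from machine address, process and timestamp.

    Assumption: the three fields are joined with '-' and the machine
    address itself contains no '-'.
    """
    return f"{machine_address}-{process_id}-{timestamp_ms}"


def group_by_machine(cookie_ids):
    """Rule 1: Cookies whose marks share a machine address map to one user."""
    groups = defaultdict(list)
    for cookie_id in cookie_ids:
        machine_address = cookie_id.split("-", 1)[0]
        groups[machine_address].append(cookie_id)
    return dict(groups)


def jaccard(a, b):
    """Similarity of two collections of behavior tokens (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def same_user(behavior_a, behavior_b, threshold=0.5):
    """Rule 2: treat two Cookies as one user if similarity meets the set value."""
    return jaccard(behavior_a, behavior_b) >= threshold


ids = [
    make_cookie_id("a1b2c3", 4021, 1448500000000),
    make_cookie_id("a1b2c3", 4377, 1448586400000),
    make_cookie_id("ffee00", 1200, 1448500000500),
]
groups = group_by_machine(ids)
```

Here `groups` has two keys, one per machine address, and `same_user` can then be applied across groups to merge Cookies whose behavior tokens (for example visited URLs) overlap enough.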
In another embodiment of the present invention, establishing the correspondence between the Internet behavior data and users comprises: generating an identification mark for the application software data, the identification mark of the application software data comprising a mobile phone IMEI identifier; and establishing, with reference to the mobile phone IMEI identifier and the network behavior information in the application software data, a correspondence between the identification marks of application software data that belong to the same user.
Application software data whose identification marks contain the same mobile phone IMEI identifier can be considered to correspond to the same user. The similarity between application software data obtained on two occasions can also be analyzed with reference to the network behavior information; if the similarity meets a set value, the application software data obtained on the two occasions can be considered to correspond to the same user.
In another embodiment of the present invention, establishing the correspondence between the Internet behavior data and users comprises: establishing, with reference to the network behavior information included in the Cookie and the network behavior information included in the application software data, a correspondence between the identification marks of the Cookie and the application software data that belong to the same user.
The similarity between Internet behavior data obtained on two occasions can be analyzed with reference to the network behavior information; if the similarity meets a set value, the Internet behavior data obtained on the two occasions can be considered to correspond to the same user. Here, the two pieces of Internet behavior data may both be application software data, or one may be application software data and the other a Cookie.
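Linking records across sources can be sketched as a clustering problem. The example below is a hedged sketch, assuming that each record (Cookie or application software data) has been reduced to a set of behavior tokens, that similarity is measured with the Jaccard index, and that matches are merged transitively with a union-find structure. The record keys, token format, and threshold are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_records(records, threshold=0.5):
    """Cluster record identifiers whose behavior tokens are similar enough.

    records: dict mapping an identification mark to a set of behavior tokens.
    Returns a list of sets; each set holds the marks attributed to one user.
    """
    parent = {rid: rid for rid in records}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if jaccard(records[a], records[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for rid in records:
        clusters.setdefault(find(rid), set()).add(rid)
    return list(clusters.values())


records = {
    "cookie:a1b2c3-4021": {"ip:10.0.0.7", "uid:shop-881", "url:news.example/auto"},
    "app:imei-000000000000000": {"ip:10.0.0.7", "uid:shop-881", "url:shop.example/cart"},
    "cookie:ffee00-1200": {"ip:10.9.9.9", "url:video.example"},
}
clusters = link_records(records)
```

With these sample records the first Cookie and the application record share an IP address and a third-party user ID, so they end up in one cluster, while the second Cookie stays on its own. Pairwise comparison is quadratic in the number of records, so a real system would need some form of blocking or indexing first.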
S22: separately analyze the Internet behavior data corresponding to each user to obtain a characteristic information set for each user, the characteristic information set including multiple kinds of characteristic information.
User characteristic information is information that characterizes a user, such as gender, age, income, education level, and industry.
By analyzing the Internet behavior data of each user, a characteristic information set made up of multiple kinds of user characteristic information can be obtained.
S23: comprehensively analyze the characteristic information sets of the users to supplement any characteristic information set that lacks a certain kind of characteristic information.
The values of the different kinds of characteristic information do not occur completely independently of one another. Comprehensively analyzing the users' characteristic information sets can therefore reveal associations between kinds of characteristic information, and a characteristic information set that lacks a certain kind of characteristic information can then be supplemented on the basis of the characteristic information it already contains.
In a specific implementation, a Naive Bayes classification algorithm can be used to comprehensively analyze the characteristic information sets of the users and supplement the characteristic information set that lacks a certain kind of characteristic information. In other words, big-data mining techniques are used to complete the information in each dimension of the audience portrait. For example, the gender, age, income, education level, and industry of an audience member can be supplemented with the Naive Bayes classification algorithm according to the source of the member's requests, such as the request URL or the originating software app. The keywords and behavior preferences of the audience member can be supplemented by analyzing the request content, such as the request keywords and the requested website information. The region of the audience member can be determined from the request header information.
Naive Bayes classification is a classification algorithm, formally defined as follows:
Let $x = \{a_1, a_2, \ldots, a_m\}$ be an item to be classified, where each $a_i$ is a characteristic attribute of $x$;
Let there be a category set $C = \{y_1, y_2, \ldots, y_n\}$;
Compute $P(y_1 \mid x), P(y_2 \mid x), \ldots, P(y_n \mid x)$;
If $P(y_k \mid x) = \max\{P(y_1 \mid x), P(y_2 \mid x), \ldots, P(y_n \mid x)\}$, then $x \in y_k$.
In the embodiments of the present invention, $x$ can be a characteristic information set that lacks a certain kind of user characteristic information, and each $a_i$ is a kind of user characteristic information that it already contains. $y_1, y_2, \ldots, y_n$ are the categories of user characteristic information sets, obtained by comprehensively analyzing and classifying the characteristic information sets of the users. Once $x$ has been assigned to a category, the attribute associated with that category can be used to supplement the user characteristic information that $x$ lacks.
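The classification rule above can be illustrated with a small categorical Naive Bayes classifier that fills in a missing characteristic. This is a sketch under assumptions the source does not state: the categories are taken to be the possible values of the missing characteristic (here `gender`), probabilities are estimated from users whose sets are complete, Laplace (add-one) smoothing is used, and the feature names and sample rows are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train(rows, target):
    """Count class and per-class feature-value frequencies from complete rows."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)  # (feature, class) -> value frequencies
    vocab = defaultdict(set)             # feature -> set of observed values
    for row in rows:
        for feature, value in row.items():
            if feature == target:
                continue
            value_counts[(feature, row[target])][value] += 1
            vocab[feature].add(value)
    return class_counts, value_counts, vocab


def predict(model, known):
    """Return the class y_k that maximizes P(y_k | x) for the known attributes."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, n_y in class_counts.items():
        # log P(y) + sum of log P(a_i | y), with add-one smoothing.
        score = math.log(n_y / total)
        for feature, value in known.items():
            if feature not in vocab:
                continue
            count = value_counts[(feature, y)][value]
            score += math.log((count + 1) / (n_y + len(vocab[feature])))
        if score > best_score:
            best, best_score = y, score
    return best


complete_rows = [
    {"source": "auto_forum", "app": "racing_game", "gender": "male"},
    {"source": "auto_forum", "app": "news", "gender": "male"},
    {"source": "cosmetics_shop", "app": "news", "gender": "female"},
    {"source": "cosmetics_shop", "app": "photo_editor", "gender": "female"},
]
model = train(complete_rows, target="gender")
incomplete = {"source": "cosmetics_shop", "app": "news"}  # gender is missing
incomplete["gender"] = predict(model, incomplete)
```

Working in log space avoids numeric underflow when many attributes are multiplied together. The "naive" part is the assumption that attributes are conditionally independent given the class, which is a simplification of the associations the description relies on.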
S24: carry out a user portrait with reference to the characteristic information sets of the plurality of users.
Carrying out a user portrait may mean drawing portraits of different dimensions of characteristic information for the users of a specific group, or drawing a portrait of the users who have certain specific characteristic information.
The specific group may be a group of users who have certain specific characteristic information, or a user group that is set directly as needed.
When the specific characteristic information is an interest in automobiles, portraits of other characteristic information can be drawn for the group that has this characteristic information, for example a distribution map of the group, a consumption capacity portrait, an income portrait, an education portrait, a gender portrait, an occupation portrait, an age portrait, and so on. A portrait may take the form of a histogram, a comparison chart, etc.
In a specific implementation, portrait condition information can be received before the user portrait is carried out, namely the aforementioned selection of the users of a specific group or the setting of certain specific characteristic information. The user portrait is then carried out with reference to the portrait condition information.
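A conditioned portrait of this kind can be sketched as a filter followed by a frequency count. The example assumes that each characteristic information set is a dictionary and that the portrait condition information is a dictionary of required values; the field names and sample users are hypothetical.

```python
from collections import Counter


def portrait(users, condition, dimension):
    """Distribution of one characteristic among users matching a condition.

    users: list of characteristic information sets (dicts).
    condition: portrait condition information, e.g. {"interest": "automobile"}.
    dimension: the characteristic to profile, e.g. "gender".
    Returns a dict mapping each observed value to its share of the group.
    """
    group = [
        user for user in users
        if all(user.get(key) == value for key, value in condition.items())
    ]
    counts = Counter(user[dimension] for user in group if dimension in user)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: count / total for value, count in counts.items()}


users = [
    {"interest": "automobile", "gender": "male", "age_band": "25-34"},
    {"interest": "automobile", "gender": "male", "age_band": "35-44"},
    {"interest": "automobile", "gender": "female", "age_band": "25-34"},
    {"interest": "automobile", "gender": "male", "age_band": "25-34"},
    {"interest": "cosmetics", "gender": "female", "age_band": "18-24"},
]
gender_portrait = portrait(users, {"interest": "automobile"}, "gender")
age_portrait = portrait(users, {"interest": "automobile"}, "age_band")
```

The resulting dictionaries are the data behind a histogram or comparison chart; plotting is left out to keep the sketch free of third-party dependencies.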
Referring to Fig. 3, in a specific implementation the data processing method may further include:
S31: set up a crawler to obtain basic information, the basic information being adapted to serve as reference information for extracting the characteristic information.
The crawler can be configured as needed. For each data source in the crawler system, the crawl strategy is configured as either an incremental strategy or a full strategy; the crawler system is then run to collect Internet data. For example, the crawler can be set up to obtain the type of a website, the subject of the website, the main audience of the website, the location information of the website, and so on.
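The choice between the two crawl strategies can be sketched as below. The source does not define the strategies in detail, so this assumes that a full strategy re-fetches every URL of a data source while an incremental strategy fetches only URLs not crawled before. The function name and strategy labels are hypothetical.

```python
def plan_crawl(source_urls, crawled_urls, strategy):
    """Return the URLs to fetch for one data source under the given strategy.

    strategy: "full" re-fetches everything; "incremental" skips URLs that
    were already crawled (an assumed interpretation of the two strategies).
    """
    if strategy == "full":
        return list(source_urls)
    if strategy == "incremental":
        seen = set(crawled_urls)
        return [url for url in source_urls if url not in seen]
    raise ValueError(f"unknown strategy: {strategy!r}")


source = ["site.example/a", "site.example/b", "site.example/c"]
already = ["site.example/a"]
full_plan = plan_crawl(source, already, "full")
incremental_plan = plan_crawl(source, already, "incremental")
```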
S32: unify the information format of the basic information.
Because the information collected by the crawler comes from a wide range of sources, the format of the basic information can be unified. The information collected by the crawler can be screened, that is, cleaned by the crawler system: pattern matching is performed, the data that are needed are matched, and the rest are discarded. The goal of cleaning is data normalization, so that a uniform data format is available for later associated queries.
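The cleaning step can be sketched with a regular expression that both screens and normalizes crawled records. The raw line layout (domain, site type, and region separated by `|`, `,`, `;`, or a tab) is an assumption made for this example only; the source does not describe the crawler's output format.

```python
import re

# Assumed raw layout: "<domain> <sep> <site type> <sep> <region>".
SEP = r"\s*[|,;\t]\s*"
RECORD_PATTERN = re.compile(
    r"^\s*(?P<domain>[A-Za-z0-9.-]+\.[A-Za-z]{2,})"
    + SEP
    + r"(?P<site_type>[^|,;\t]+?)"
    + SEP
    + r"(?P<region>[^|,;\t]+?)\s*$"
)


def clean(raw_lines):
    """Keep lines that match the pattern and emit them in one uniform format."""
    cleaned = []
    for line in raw_lines:
        match = RECORD_PATTERN.match(line)
        if match is None:
            continue  # data that do not match are discarded
        cleaned.append({
            "domain": match["domain"].lower(),
            "site_type": match["site_type"].strip().lower(),
            "region": match["region"].strip().title(),
        })
    return cleaned


raw = [
    "Auto.Example.COM | Automotive | shanghai",
    "beauty.example.net;Cosmetics;BEIJING",
    "<div>404 not found</div>",
]
records = clean(raw)
```

Normalizing case and separators at this stage means later joins against user behavior data can compare domains and regions with simple equality.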
S33: generate a basic knowledge base.
The basic knowledge base contains the aforementioned website type, website subject, main audience of the website, location information of the website, and so on.
S34: supplement the characteristic information set with reference to the basic knowledge base.
For example, suppose the basic knowledge base contains the gender ratio of the visitors to certain websites. When the Internet behavior data of a user are analyzed, it may turn out that the user's gender cannot be obtained from those data alone. If the data nevertheless show that the user frequently visits certain websites, and what those websites have in common is that they are mostly oriented toward women, the user can be estimated to be female.
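The gender estimate in this example can be sketched as a visit-weighted average of the per-site ratios held in the knowledge base. The knowledge base layout, the `female_ratio` field, the decision margin, and the sample figures are all hypothetical; the source only states that the estimate relies on the gender ratio of the websites a user visits.

```python
def estimate_gender(visit_counts, knowledge_base, margin=0.1):
    """Estimate gender from the gender ratio of the sites a user visits.

    visit_counts: dict mapping a site to the user's number of visits.
    knowledge_base: dict mapping a site to {"female_ratio": float}.
    Returns "female", "male", or None when the evidence is inconclusive.
    """
    weighted, total = 0.0, 0
    for site, visits in visit_counts.items():
        entry = knowledge_base.get(site)
        if entry is None:
            continue  # site not covered by the knowledge base
        weighted += entry["female_ratio"] * visits
        total += visits
    if total == 0:
        return None
    female_share = weighted / total
    if female_share >= 0.5 + margin:
        return "female"
    if female_share <= 0.5 - margin:
        return "male"
    return None


knowledge_base = {
    "beauty.example": {"female_ratio": 0.8},
    "auto.example": {"female_ratio": 0.2},
    "news.example": {"female_ratio": 0.5},
}
estimate = estimate_gender({"beauty.example": 8, "news.example": 2}, knowledge_base)
```

Returning `None` inside the margin keeps the supplement conservative: a set is only filled in when the visited sites lean clearly one way.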
In a specific implementation, the characteristic information set can also be verified with reference to the basic knowledge base. For example, after characteristic information in a user's characteristic information set has been supplemented through comprehensive analysis of the characteristic information sets of the users, the supplemented information can be verified with reference to the basic knowledge base.
In a specific implementation, steps S31 to S34 may be performed before step S24 (see Fig. 2).
An embodiment of the present invention also provides a data processing device, whose schematic structural diagram is shown in Fig. 4.
The data processing device includes: an Internet behavior data acquisition unit 41, a first analysis unit 42, a second analysis unit 43, and a portrait unit 44, wherein:
the Internet behavior data acquisition unit 41 is adapted to obtain Internet behavior data, the Internet behavior data comprising at least one of the following: Cookies and application software data;
the first analysis unit 42 is adapted to separately analyze the Internet behavior data corresponding to each user to obtain a characteristic information set for each user, the characteristic information set including multiple kinds of characteristic information;
the second analysis unit 43 is adapted to comprehensively analyze the characteristic information sets of the users to supplement any characteristic information set that lacks a certain kind of characteristic information;
the portrait unit 44 is adapted to carry out a user portrait with reference to the characteristic information sets of the plurality of users.
In specific implementation, the second analytical unit 43 is suitable for using Naive Bayes Classification Algorithm to each user Characteristic information set carry out comprehensive analysis, the characteristic information set for lacking certain characteristic information is supplemented.
In specific implementation, the data processing equipment can also include: primary knowledge base generation unit 45 and third point Analyse unit 46;Wherein:
The primary knowledge base generation unit 45, is suitable for: setting crawler is to obtain basic information;The basic information is suitable for As the reference information for extracting the characteristic information;The information format of the unified basic information;Generate primary knowledge base;
The third analytical unit 46 is suitable for supplementing the characteristic information set referring to the primary knowledge base.
In specific implementation, the data processing equipment can also include: the 4th analytical unit, be suitable for referring to the basis Characteristic information set described in knowledge base verification.
In a specific implementation, the data processing device may further include a correspondence establishing unit 47, adapted to establish the correspondence between the internet behavior data and users.
In a specific implementation, the correspondence establishing unit 47 is adapted to: generate an identification mark for the Cookie, the identification mark including a machine address, a current process, and a system timestamp; and, with reference to the machine address in the identification mark and the network behavior information contained in the Cookie, establish a correspondence among the identification marks of Cookies belonging to the same user.
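A minimal sketch of this step follows. It assumes that the identification mark is a string of the form `machine-process-timestamp`, that the machine address is a 48-bit hardware address, and that "network behavior information" is reduced to the set of hosts recorded in the Cookie. The grouping rule (same machine address and a Jaccard similarity above a threshold) is one plausible interpretation, not the method fixed by the embodiment; all names are hypothetical.

```python
import os
import time
import uuid


def make_cookie_id(machine_address=None, pid=None, timestamp_ms=None):
    """Build an identification mark from machine address, process, timestamp."""
    if machine_address is None:
        machine_address = uuid.getnode()  # 48-bit hardware address as an int
    if pid is None:
        pid = os.getpid()
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    return f"{machine_address:012x}-{pid}-{timestamp_ms}"


def jaccard(a, b):
    """Jaccard similarity of two collections of hosts."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def link_cookies(cookies, threshold=0.5):
    """Group cookie ids that share a machine address and similar behavior.

    `cookies` maps an identification mark to the set of hosts it visited.
    Returns a list of sets of identification marks, one set per inferred user.
    """
    groups = []
    for cookie_id, hosts in cookies.items():
        machine = cookie_id.split("-")[0]
        for group in groups:
            if group["machine"] == machine and jaccard(hosts, group["hosts"]) >= threshold:
                group["ids"].add(cookie_id)
                group["hosts"] |= set(hosts)
                break
        else:
            groups.append({"machine": machine, "ids": {cookie_id}, "hosts": set(hosts)})
    return [group["ids"] for group in groups]
```

The greedy single pass keeps the sketch short; its result can depend on the order in which the Cookies are processed.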
In a specific implementation, the correspondence establishing unit 47 is adapted to: generate an identification mark for the application software data, the identification mark of the application software data including a mobile phone IMEI; and, with reference to the mobile phone IMEI and the network behavior information in the application software data, establish a correspondence among the identification marks of application software data belonging to the same user.
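By way of example, application software records can be grouped by IMEI after a basic validity check. The check below relies on the fact that the fifteenth digit of an IMEI is a Luhn check digit; the record layout and the function names are assumptions of this sketch, and the comparison of network behavior information within a group is omitted for brevity.

```python
from collections import defaultdict


def is_valid_imei(imei):
    """Check that `imei` is 15 ASCII digits with a correct Luhn check digit."""
    if len(imei) != 15 or not (imei.isascii() and imei.isdigit()):
        return False
    total = 0
    for i, ch in enumerate(reversed(imei)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counted from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def group_by_imei(records):
    """Map each valid IMEI to the ids of the app-data records carrying it.

    Each record is a dict with hypothetical keys "record_id" and "imei".
    """
    groups = defaultdict(list)
    for record in records:
        if is_valid_imei(record["imei"]):
            groups[record["imei"]].append(record["record_id"])
    return dict(groups)
```

Records with a malformed IMEI are dropped here rather than grouped, which avoids merging unrelated devices under a placeholder value.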
In a specific implementation, the correspondence establishing unit 47 is adapted to: with reference to the network behavior information contained in the Cookie and the network behavior information contained in the application software data, establish a correspondence between the identification marks of the Cookie and of the application software data belonging to the same user.
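One way to picture this cross-source matching, offered only as a sketch, is to represent the network behavior information of each Cookie and of each application software identifier as a vector of visit counts per content category, and to link a Cookie to the most similar application identifier when the cosine similarity reaches a threshold. The similarity measure, the threshold value, and all names are assumptions; the embodiment does not prescribe them.

```python
import math


def cosine(a, b):
    """Cosine similarity of two {category: visit count} mappings."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_cookie_to_app(cookie_profiles, app_profiles, threshold=0.8):
    """Link each cookie id to the most similar app id, if similar enough."""
    links = {}
    for cookie_id, cookie_vec in cookie_profiles.items():
        best_id, best_score = None, threshold
        for app_id, app_vec in app_profiles.items():
            score = cosine(cookie_vec, app_vec)
            if score >= best_score:
                best_id, best_score = app_id, score
        if best_id is not None:
            links[cookie_id] = best_id
    return links
```

Cookies whose best match falls below the threshold stay unlinked, so a user seen only in a browser is not forced onto an unrelated mobile device.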
In a specific implementation, the portrait unit 44 is further adapted to: receive portrait condition information; and generate user portraits with reference to the portrait condition information.
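As an illustration, portrait condition information can be modeled as a set of feature values that a user must match, with the portrait reported as the value distribution of selected dimensions among the matching users. The function name `build_portrait`, the shape of the result, and the example features are assumptions of this sketch.

```python
from collections import Counter


def build_portrait(feature_sets, conditions, dimensions):
    """Summarize the users that satisfy `conditions` along `dimensions`.

    feature_sets: list of per-user dicts (the characteristic information sets)
    conditions:   dict of feature -> required value (portrait condition info)
    dimensions:   features whose value distribution should be reported
    """
    selected = [
        u for u in feature_sets
        if all(u.get(key) == value for key, value in conditions.items())
    ]
    distribution = {
        dim: dict(Counter(u[dim] for u in selected if u.get(dim) is not None))
        for dim in dimensions
    }
    return {"size": len(selected), "distribution": distribution}
```

An empty `conditions` dictionary selects every user, which corresponds to generating a portrait without any portrait condition information.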
The data processing device may be located in the data processing server 11 (see Fig. 1).
By acquiring any one or more of Cookies and application software data, the embodiments of the present invention widen the data source channels and expand the kinds of evidence on which user portraits are based, which can improve the accuracy of the user portraits. By analyzing the internet behavior data to obtain the characteristic information set of each user, and comprehensively analyzing the characteristic information sets so as to supplement any set that lacks a certain kind of characteristic information, the characteristic information sets are made more complete, their dimensions are expanded, and the data basis of the user portraits is optimized, which can improve the accuracy of the user portraits. By establishing the correspondence between internet behavior data and users, the internet behavior data corresponding to the same user is associated, which avoids counting the same user more than once during portrait generation and can thereby improve the accuracy of the user portraits. By receiving portrait condition information and generating user portraits with reference to it, more targeted user portraits are provided and the range of applicable scenarios is broadened.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments can be completed by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include ROM, RAM, a magnetic disk, an optical disc, or the like.
Although the present invention is disclosed as above, the present invention is not limited thereto. Any person skilled in the art may make various changes or modifications without departing from the spirit and scope of the present invention; therefore, the protection scope of the present invention shall be subject to the scope defined by the claims.

Claims (16)

1. A data processing method, characterized by comprising:
acquiring internet behavior data, the internet behavior data including at least one of the following: Cookies and application software data;
separately analyzing the internet behavior data corresponding to each user, to obtain a characteristic information set for each user, the characteristic information set including multiple kinds of characteristic information;
comprehensively analyzing the characteristic information sets of the users, so as to supplement any characteristic information set that lacks a certain kind of characteristic information, including: using a Naive Bayes classification algorithm to comprehensively analyze the characteristic information sets of the users and supplement the characteristic information set that lacks a certain kind of characteristic information;
generating user portraits with reference to the characteristic information sets of the users.
2. The data processing method according to claim 1, characterized by further comprising: setting up a crawler to obtain basic information, the basic information being suitable for use as reference information for extracting the characteristic information; unifying the information format of the basic information; generating a primary knowledge base; and supplementing the characteristic information set with reference to the primary knowledge base; wherein the basic information includes: website type, website subject, main audience of the website, and location information of the website.
3. The data processing method according to claim 2, characterized by further comprising: verifying the characteristic information set with reference to the primary knowledge base.
4. The data processing method according to claim 1, characterized by further comprising: establishing the correspondence between the internet behavior data and users.
5. The data processing method according to claim 4, characterized in that establishing the correspondence between the internet behavior data and users comprises:
generating an identification mark for the Cookie, the identification mark including: a machine address, a current process, and a system timestamp;
with reference to the machine address in the identification mark and the network behavior information contained in the Cookie, establishing a correspondence among the identification marks of Cookies belonging to the same user.
6. The data processing method according to claim 4, characterized in that establishing the correspondence between the internet behavior data and users comprises:
generating an identification mark for the application software data, the identification mark of the application software data including a mobile phone IMEI;
with reference to the mobile phone IMEI and the network behavior information in the application software data, establishing a correspondence among the identification marks of application software data belonging to the same user.
7. The data processing method according to claim 4, characterized in that establishing the correspondence between the internet behavior data and users comprises: with reference to the network behavior information contained in the Cookie and the network behavior information contained in the application software data, establishing a correspondence between the identification marks of the Cookie and of the application software data belonging to the same user.
8. The data processing method according to claim 1, characterized in that generating user portraits with reference to the characteristic information sets of the users further comprises: receiving portrait condition information; and generating the user portraits with reference to the portrait condition information.
9. A data processing device, characterized by comprising: an internet behavior data acquisition unit, a first analysis unit, a second analysis unit, and a portrait unit, wherein:
the internet behavior data acquisition unit is adapted to acquire internet behavior data, the internet behavior data including at least one of the following: Cookies and application software data;
the first analysis unit is adapted to separately analyze the internet behavior data corresponding to each user, to obtain a characteristic information set for each user, the characteristic information set including multiple kinds of characteristic information;
the second analysis unit is adapted to use a Naive Bayes classification algorithm to comprehensively analyze the characteristic information sets of the users, so as to supplement any characteristic information set that lacks a certain kind of characteristic information;
the portrait unit is adapted to generate user portraits with reference to the characteristic information sets of the users.
10. The data processing device according to claim 9, characterized by further comprising: a primary knowledge base generation unit and a third analysis unit, wherein:
the primary knowledge base generation unit is adapted to: set up a crawler to obtain basic information, the basic information being suitable for use as reference information for extracting the characteristic information; unify the information format of the basic information; and generate a primary knowledge base;
the basic information includes: website type, website subject, main audience of the website, and location information of the website;
the third analysis unit is adapted to supplement the characteristic information set with reference to the primary knowledge base.
11. The data processing device according to claim 10, characterized by further comprising: a fourth analysis unit, adapted to verify the characteristic information set with reference to the primary knowledge base.
12. The data processing device according to claim 9, characterized by further comprising: a correspondence establishing unit, adapted to establish the correspondence between the internet behavior data and users.
13. The data processing device according to claim 12, characterized in that the correspondence establishing unit is adapted to: generate an identification mark for the Cookie, the identification mark including: a machine address, a current process, and a system timestamp; and, with reference to the machine address in the identification mark and the network behavior information contained in the Cookie, establish a correspondence among the identification marks of Cookies belonging to the same user.
14. The data processing device according to claim 12, characterized in that the correspondence establishing unit is adapted to: generate an identification mark for the application software data, the identification mark of the application software data including a mobile phone IMEI; and, with reference to the mobile phone IMEI and the network behavior information in the application software data, establish a correspondence among the identification marks of application software data belonging to the same user.
15. The data processing device according to claim 12, characterized in that the correspondence establishing unit is adapted to: with reference to the network behavior information contained in the Cookie and the network behavior information contained in the application software data, establish a correspondence between the identification marks of the Cookie and of the application software data belonging to the same user.
16. The data processing device according to claim 9, characterized in that the portrait unit is further adapted to: receive portrait condition information; and generate user portraits with reference to the portrait condition information.
CN201510843568.4A 2015-11-26 2015-11-26 A kind of data processing method and device Active CN105447147B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510843568.4A CN105447147B (en) 2015-11-26 2015-11-26 A kind of data processing method and device

Publications (2)

Publication Number Publication Date
CN105447147A CN105447147A (en) 2016-03-30
CN105447147B true CN105447147B (en) 2019-02-01

Family

ID=55557323

Country Status (1)

Country Link
CN (1) CN105447147B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107463762A (en) * 2016-06-03 2017-12-12 阿里巴巴集团控股有限公司 A kind of man-machine interaction method, device and electronic equipment
CN106534164B (en) * 2016-12-05 2019-09-03 公安部第三研究所 Effective virtual identity depicting method based on cyberspace user identifier
CN106933946A (en) * 2017-01-20 2017-07-07 深圳市三体科技有限公司 A kind of big data management method and system based on mobile terminal
CN107578272A (en) * 2017-08-10 2018-01-12 上海斐讯数据通信技术有限公司 A kind of method and device for kinsfolk's portrait
CN108549685A (en) * 2018-04-08 2018-09-18 武志学 Behavior analysis method, device, system and readable storage medium storing program for executing
CN108628980A (en) * 2018-04-27 2018-10-09 四川斐讯信息技术有限公司 A kind of user's portrait method and system based on user network behavior
CN109033149B (en) * 2018-06-12 2020-11-13 北京奇艺世纪科技有限公司 Information recommendation method and device, server and storage medium
CN109658129A (en) * 2018-11-22 2019-04-19 北京奇虎科技有限公司 A kind of generation method and device of user's portrait
CN109977308B (en) * 2019-03-20 2021-07-13 北京字节跳动网络技术有限公司 User group portrait construction method and device, storage medium and electronic equipment
CN111724187A (en) * 2019-03-21 2020-09-29 上海晶赞融宣科技有限公司 DMP audience data real-time processing method and device and computer readable storage medium
WO2020257993A1 (en) * 2019-06-24 2020-12-30 深圳市欢太科技有限公司 Content pushing method and apparatus, server, and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7137009B1 (en) * 2000-01-06 2006-11-14 International Business Machines Corporation Method and apparatus for securing a cookie cache in a data processing system
CN1878096A (en) * 2006-07-04 2006-12-13 陈玲玲 Method for detecting number of computer users in inner compute network
CN101222348A (en) * 2007-01-10 2008-07-16 阿里巴巴公司 Method and system for calculating number of website real user

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030005046A1 (en) * 2001-06-06 2003-01-02 Lagniappe Marketing System and method for managing marketing applications for a website

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
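"Research on Personalized Archive Retrieval Methods" (《基于个性化的档案检索方式研究》); Wu Yaping (吴亚平); Lantai World (《兰台世界》); 2013-07-12; page 59
"Research on Personalized Archive Retrieval Methods" (《基于个性化的档案检索方式研究》); Wu Yaping (吴亚平); Lantai World (《兰台世界》); 2013-07-12; section 3.2 of the paper
"Research on User Browsing Behavior Based on Path and Page Mining" (《基于路径与页面挖掘的用户浏览行为研究》); Lei Liangpeng (雷良鹏); China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》); 2015-04-15; page 59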

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant