CN105447147B - Data processing method and device - Google Patents
Info
- Publication number
- CN105447147B
- Application number
- CN201510843568.4A
- Authority
- CN
- China
- Prior art keywords
- user
- data
- information
- characteristic information
- portrait
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Abstract
A data processing method and device. The method comprises: obtaining internet behavior data, the internet behavior data comprising at least one of the following: Cookie and application software data; analyzing the internet behavior data corresponding to each user separately to obtain a characteristic information set for each user, the characteristic information set comprising multiple kinds of characteristic information; comprehensively analyzing the characteristic information sets of the users to supplement any characteristic information set that lacks a certain kind of characteristic information; and performing user portraits with reference to the characteristic information sets of the multiple users. The method and device can improve the accuracy of user portrait techniques.
Description
Technical field
The present invention relates to the field of data processing, and in particular to a data processing method and device.
Background
With the development of information technology, more and more everyday needs can be met over the network. The network provides users with massive amounts of information, and users in turn leave their own footprints on it. As information collection techniques advance, more data can be obtained from the network, and features of a user or user group can be extracted and summarized from the collected information. This extraction and summarization is also called a user portrait.
The accuracy of existing user portrait techniques needs to be improved.
Summary of the invention
The technical problem solved by the present invention is to improve the accuracy of user portraits.
To solve the above technical problem, an embodiment of the present invention provides a data processing method, comprising:
Obtaining internet behavior data, the internet behavior data comprising at least one of the following: Cookie and application software data;
Analyzing the internet behavior data corresponding to each user separately to obtain a characteristic information set for each user, the characteristic information set comprising multiple kinds of characteristic information;
Comprehensively analyzing the characteristic information sets of the users to supplement any characteristic information set that lacks a certain kind of characteristic information;
Performing user portraits with reference to the characteristic information sets of the multiple users.
Optionally, comprehensively analyzing the characteristic information sets of the users to supplement the characteristic information set that lacks a certain kind of characteristic information comprises: using a Naive Bayes classification algorithm to comprehensively analyze the characteristic information sets of the users, and supplementing the characteristic information set that lacks a certain kind of characteristic information.
Optionally, the data processing method further comprises: setting up a crawler to obtain basic information, the basic information being suitable for use as reference information for extracting the characteristic information; unifying the information format of the basic information; generating a primary knowledge base; and supplementing the characteristic information set with reference to the primary knowledge base.
Optionally, the data processing method further comprises: verifying the characteristic information set with reference to the primary knowledge base.
Optionally, the data processing method further comprises: establishing a correspondence between the internet behavior data and users.
Optionally, establishing the correspondence between the internet behavior data and users comprises:
Generating an identifier for the Cookie, the identifier comprising: machine address, current process, and system timestamp;
With reference to the machine address in the identifier and the network behavior information contained in the Cookie, establishing a correspondence among the identifiers of Cookies that belong to the same user.
Optionally, establishing the correspondence between the internet behavior data and users comprises:
Generating an identifier for the application software data, the identifier of the application software data comprising a mobile phone IMEI;
With reference to the mobile phone IMEI and the network behavior information in the application software data, establishing a correspondence among the identifiers of application software data that belong to the same user.
Optionally, establishing the correspondence between the internet behavior data and users comprises: with reference to the network behavior information contained in the Cookie and the network behavior information contained in the application software data, establishing a correspondence between the identifiers of the Cookie and the application software data that belong to the same user.
Optionally, performing user portraits with reference to the characteristic information sets of the multiple users further comprises: receiving portrait condition information; and performing the user portrait with reference to the portrait condition information.
An embodiment of the present invention also provides a data processing device, comprising: an internet behavior data obtaining unit, a first analysis unit, a second analysis unit, and a portrait unit; wherein:
The internet behavior data obtaining unit is adapted to obtain internet behavior data, the internet behavior data comprising at least one of the following: Cookie and application software data;
The first analysis unit is adapted to analyze the internet behavior data corresponding to each user separately to obtain a characteristic information set for each user, the characteristic information set comprising multiple kinds of characteristic information;
The second analysis unit is adapted to comprehensively analyze the characteristic information sets of the users to supplement any characteristic information set that lacks a certain kind of characteristic information;
The portrait unit is adapted to perform user portraits with reference to the characteristic information sets of the multiple users.
Optionally, the second analysis unit is adapted to use a Naive Bayes classification algorithm to comprehensively analyze the characteristic information sets of the users, and to supplement the characteristic information set that lacks a certain kind of characteristic information.
Optionally, the data processing device further comprises: a primary knowledge base generation unit and a third analysis unit; wherein:
The primary knowledge base generation unit is adapted to: set up a crawler to obtain basic information, the basic information being suitable for use as reference information for extracting the characteristic information; unify the information format of the basic information; and generate a primary knowledge base;
The third analysis unit is adapted to supplement the characteristic information set with reference to the primary knowledge base.
Optionally, the data processing device further comprises: a fourth analysis unit, adapted to verify the characteristic information set with reference to the primary knowledge base.
Optionally, the data processing device further comprises: a correspondence establishing unit, adapted to establish a correspondence between the internet behavior data and users.
Optionally, the correspondence establishing unit is adapted to: generate an identifier for the Cookie, the identifier comprising machine address, current process, and system timestamp; and, with reference to the machine address in the identifier and the network behavior information contained in the Cookie, establish a correspondence among the identifiers of Cookies that belong to the same user.
Optionally, the correspondence establishing unit is adapted to: generate an identifier for the application software data, the identifier of the application software data comprising a mobile phone IMEI; and, with reference to the mobile phone IMEI and the network behavior information in the application software data, establish a correspondence among the identifiers of application software data that belong to the same user.
Optionally, the correspondence establishing unit is adapted to: with reference to the network behavior information contained in the Cookie and the network behavior information contained in the application software data, establish a correspondence between the identifiers of the Cookie and the application software data that belong to the same user.
Optionally, the portrait unit is further adapted to: receive portrait condition information; and perform the user portrait with reference to the portrait condition information.
Compared with the prior art, the technical solution of the embodiments of the present invention has the following advantages:
By obtaining either or both of Cookies and application software data, the data sources are widened and the kinds of evidence on which a user portrait is based are expanded, so the accuracy of user portraits can be improved. By analyzing the internet behavior data to obtain a characteristic information set for each user, and comprehensively analyzing the characteristic information sets to supplement any set that lacks a certain kind of characteristic information, the characteristic information sets are completed, their dimensions are expanded, and the data basis of the user portrait is optimized, so the accuracy of user portraits can be improved.
Further, by establishing a correspondence between internet behavior data and users, the internet behavior data corresponding to the same user is associated, which avoids counting the same user repeatedly during portraiting and thus improves the accuracy of user portraits.
In addition, by receiving portrait condition information and performing the user portrait with reference to it, more targeted user portraits can be provided, and the range of applicable scenarios is broader.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of a data processing system in an embodiment of the present invention;
Fig. 2 is a flow chart of a data processing method in an embodiment of the present invention;
Fig. 3 is a partial flow chart of a data processing method in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a data processing device in an embodiment of the present invention.
Detailed description of the embodiments
The inventors have found through research that existing user portrait techniques can often draw portraits only from the data of a single website and do not include other data on the internet. The analysis is therefore inherently limited: its result covers only part of that website's own data and cannot be compared with data from other websites on the internet.
Existing user portrait techniques do not identify users, that is, website audiences, accurately enough. Interaction between website data and the website audience takes place mainly through the browser, and a conventional website Cookie can only use a single browsing device as the identifier when analyzing the behavior of the website audience. In real life, however, the same audience member may access a website from multiple terminal devices such as a notebook, a mobile phone and a tablet computer, at different times and places. The same audience member may also be identified by a mobile phone number, an e-mail address, a QQ number, a WeChat ID, a Taobao account and other identifiers. Because traditional data sets are not sufficiently associated with one another, the same physical audience member is treated as several different ones, which makes the portrait result inaccurate.
In existing user portrait techniques, the dimensions of the directly acquired data are incomplete, so audience portrait analysis can be performed only on some dimensions. Drawing a portrait of a website audience requires comprehensive audience information in every dimension, such as age, gender, education, occupation and regional distribution, whereas a traditional website ERP system holds data on only one or a few of these aspects and can therefore support audience portrait analysis only on those dimensions.
Existing user portrait techniques cannot perform customized audience portrait analysis. A conventional website has only its own internal data, so only the audience of that website can be analyzed, and portraits of other audiences are poorly supported. Examples include portrait analysis of the audience of similar competitors, and of the audience newly attracted to an advertiser's website; because of the limits of the website's data, such customized audience portrait analysis is not well supported.
The data processing method in the embodiments of the present invention obtains either or both of Cookies and application software data, which widens the data sources and expands the kinds of evidence on which a user portrait is based, so the accuracy of user portraits can be improved. By analyzing the internet behavior data to obtain a characteristic information set for each user, and comprehensively analyzing the characteristic information sets to supplement any set that lacks a certain kind of characteristic information, the characteristic information sets are completed, their dimensions are expanded, and the data basis of the user portrait is optimized, so the accuracy of user portraits can be improved.
By establishing a correspondence between internet behavior data and users, the internet behavior data corresponding to the same user is associated, which avoids counting the same user repeatedly during portraiting and thus improves the accuracy of user portraits.
By receiving portrait condition information and performing the user portrait with reference to it, more targeted user portraits can be provided, and the range of applicable scenarios is broader.
To make the above objects, features and beneficial effects of the present invention clearer and easier to understand, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic structural diagram of a data processing system in an embodiment of the present invention.
The data processing system comprises a data processing server 11 and a user terminal 12.
The data processing server 11 can obtain internet behavior data from the user terminal 12. The user terminal 12 may comprise multiple user terminals, for example user terminal 1, user terminal 2, ..., user terminal N shown in Fig. 1. The data processing server 11 may be a single server, a distributed server or a server cluster. The user terminal 12 may be an intelligent networked device, such as a single computer, a tablet computer or a mobile phone.
The internet behavior data may be Cookies and application software data. The application software data may be data generated by application software installed on the user terminal 12, for example QQ, Taobao, WeChat and the mobile client software of major websites.
The data processing server 11 can analyze the internet behavior data of the user terminal 12, obtain the corresponding characteristic information, and perform user portraits.
Fig. 2 is a flow chart of a data processing method in an embodiment of the present invention, described below with reference to Fig. 1.
S21: obtain internet behavior data; the internet behavior data comprises at least one of the following: Cookie and application software data.
A Cookie is data that certain websites store on the user's local terminal in order to identify the user and track sessions; it may contain the user's network behavior information. Network behavior information is information about the network activity of the corresponding user, such as the IP address from which the user goes online, the URLs of visited websites, the user-agent, and the user ID with which the user logged in to a third-party website.
The application software data may be data generated by application software installed on the user terminal 12, and may likewise contain the user's network behavior information.
In a specific implementation, a correspondence between the internet behavior data and users may also be established after the internet behavior data is obtained. As noted above, if the users corresponding to the internet behavior data are not distinguished, the same user may be counted repeatedly during portraiting, making the user portrait inaccurate.
In one embodiment of the present invention, establishing the correspondence between the internet behavior data and users comprises: generating an identifier for the Cookie, the identifier comprising machine address, current process, and system timestamp; and, with reference to the machine address in the identifier and the network behavior information contained in the Cookie, establishing a correspondence among the identifiers of Cookies that belong to the same user.
Cookies whose identifiers contain the same machine address may be considered to correspond to the same user. The similarity between two Cookies may also be analyzed with reference to their network behavior information; if the similarity meets a set value, the two Cookies may be considered to correspond to the same user.
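The identifier scheme above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the `machine-process-timestamp` string layout, and the use of the hardware (MAC) address as the "machine address" are all assumptions made for the example.

```python
import os
import time
import uuid
from collections import defaultdict


def make_cookie_identifier(machine_address=None, pid=None, timestamp_ms=None):
    """Build an identifier from machine address, current process and system timestamp.

    The string layout is a hypothetical choice; the source only names the three parts.
    """
    if machine_address is None:
        # uuid.getnode() returns the hardware address as a 48-bit integer
        # (or a random 48-bit number if no hardware address can be found).
        machine_address = format(uuid.getnode(), "012x")
    if pid is None:
        pid = os.getpid()
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    return f"{machine_address}-{pid}-{timestamp_ms}"


def machine_address_of(identifier):
    """Return the machine-address part of an identifier built above."""
    return identifier.split("-", 1)[0]


def group_by_machine_address(identifiers):
    """Group Cookie identifiers that share a machine address (treated as one user)."""
    groups = defaultdict(list)
    for identifier in identifiers:
        groups[machine_address_of(identifier)].append(identifier)
    return dict(groups)
```

Grouping by machine address covers only the first rule in the text; the similarity rule based on network behavior information would need a separate comparison step.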
In another embodiment of the present invention, establishing the correspondence between the internet behavior data and users comprises: generating an identifier for the application software data, the identifier of the application software data comprising a mobile phone IMEI; and, with reference to the mobile phone IMEI and the network behavior information in the application software data, establishing a correspondence among the identifiers of application software data that belong to the same user.
Application software data whose identifiers contain the same mobile phone IMEI may be considered to correspond to the same user. The similarity between application software data obtained on two occasions may also be analyzed with reference to the network behavior information; if the similarity meets a set value, the application software data obtained on the two occasions may be considered to correspond to the same user.
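As a rough sketch of the IMEI rule above, application software records can be grouped by IMEI. The record layout (a dict with an `"imei"` key) is hypothetical, and the validity check is an addition not mentioned in the source: it relies on the fact that a 15-digit IMEI ends in a Luhn check digit, and simply skips malformed values.

```python
from collections import defaultdict


def is_valid_imei(imei):
    """Check that `imei` is 15 ASCII digits whose last digit satisfies the Luhn check."""
    if len(imei) != 15 or not (imei.isascii() and imei.isdigit()):
        return False
    total = 0
    for index, char in enumerate(imei):
        digit = int(char)
        # Counting from the right, every second digit is doubled; for a
        # 15-digit number these are the digits at odd 0-based indices.
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def group_by_imei(app_records):
    """Group application software records by IMEI (same IMEI, same user)."""
    groups = defaultdict(list)
    for record in app_records:
        imei = record.get("imei", "")
        if is_valid_imei(imei):
            groups[imei].append(record)
    return dict(groups)
```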
In another embodiment of the present invention, establishing the correspondence between the internet behavior data and users comprises: with reference to the network behavior information contained in the Cookie and the network behavior information contained in the application software data, establishing a correspondence between the identifiers of the Cookie and the application software data that belong to the same user.
The similarity between internet behavior data obtained on two occasions may be analyzed with reference to the network behavior information; if the similarity meets a set value, the internet behavior data obtained on the two occasions may be considered to correspond to the same user. Here, the two sets of internet behavior data may both be application software data, or one may be application software data and the other a Cookie.
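The source does not say how similarity is measured or what the set value is. The sketch below assumes Jaccard similarity over sets of network behavior items (IP address, visited URL, third-party user ID and so on) and a hypothetical threshold of 0.5, and merges linked records with a small union-find.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of network behavior items."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def link_records(records, threshold=0.5):
    """Group record identifiers whose behavior items are similar enough.

    `records` maps an identifier (Cookie or application software data) to its
    network behavior items. Returns a list of sets of identifiers, one per user.
    """
    parent = {rid: rid for rid in records}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    ids = list(records)
    for i, left in enumerate(ids):
        for right in ids[i + 1:]:
            if jaccard(records[left], records[right]) >= threshold:
                parent[find(left)] = find(right)

    groups = {}
    for rid in ids:
        groups.setdefault(find(rid), set()).add(rid)
    return list(groups.values())
```

The pairwise comparison is quadratic in the number of records, which is acceptable for an illustration but would need blocking or indexing at scale.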
S22: analyze the internet behavior data corresponding to each user separately to obtain a characteristic information set for each user; the characteristic information set comprises multiple kinds of characteristic information.
User characteristic information is information that characterizes a user, such as gender, age, income, level of education and industry.
By analyzing the internet behavior data of each user, a characteristic information set composed of multiple kinds of user characteristic information can be obtained.
S23: comprehensively analyze the characteristic information sets of the users to supplement any characteristic information set that lacks a certain kind of characteristic information.
Because the specific contents of the different kinds of characteristic information do not occur completely independently of one another, comprehensively analyzing the users' characteristic information sets can reveal the associations between kinds of characteristic information, so that a characteristic information set lacking a certain kind of characteristic information can be supplemented on the basis of the characteristic information it already has.
In a specific implementation, a Naive Bayes classification algorithm may be used to comprehensively analyze the characteristic information sets of the users and supplement the characteristic information set that lacks a certain kind of characteristic information. In other words, big data mining techniques are used to complete the information in each dimension of the audience portrait. For example, from the request source of a website audience member, such as the request URL or the originating software app, the Naive Bayes classification algorithm can supplement the member's gender, age, income, level of education, industry and similar information. From the request content, by analyzing the request keywords, the requested website information and the like, the member's keywords, behavior preferences and similar information can be supplemented. From the request header information, the member's region and similar information can be determined.
Naive Bayes classification is a classification algorithm, formally defined as follows:
Let $x = \{a_1, a_2, \ldots, a_m\}$ be an item to be classified, where each $a$ is a characteristic attribute of $x$;
Let $C = \{y_1, y_2, \ldots, y_n\}$ be the set of categories;
Calculate $P(y_1 \mid x), P(y_2 \mid x), \ldots, P(y_n \mid x)$;
If $P(y_k \mid x) = \max\{P(y_1 \mid x), P(y_2 \mid x), \ldots, P(y_n \mid x)\}$, then $x \in y_k$.
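The definition above does not spell out how each posterior is obtained. As a sketch, assuming the standard naive Bayes treatment (Bayes' theorem plus conditional independence of the attributes given the category, neither of which the source states explicitly), the posteriors and the resulting decision rule can be written as:

```latex
P(y_i \mid x) = \frac{P(y_i)\, P(x \mid y_i)}{P(x)}, \qquad
P(x \mid y_i) \approx \prod_{j=1}^{m} P(a_j \mid y_i), \qquad
k = \arg\max_{1 \le i \le n} \; P(y_i) \prod_{j=1}^{m} P(a_j \mid y_i)
```

Because $P(x)$ is the same for every category, it can be dropped when comparing the posteriors, which is why the decision rule needs only the priors $P(y_i)$ and the per-attribute likelihoods $P(a_j \mid y_i)$.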
In the embodiments of the present invention, $x$ may be a characteristic information set that lacks a certain kind of user characteristic information, and each $a$ is a user characteristic information that the set already contains; $y_1, y_2, \ldots, y_n$ are the categories of user characteristic information sets, obtained by comprehensively analyzing and classifying the characteristic information sets of the users. Once $x$ has been assigned to a category, the attributes associated with that category can be used to supplement the user characteristic information that $x$ lacks.
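A minimal sketch of this supplementation step follows, using a small categorical naive Bayes classifier written from scratch. The feature names, the dict-based record layout, and the add-one (Laplace) smoothing are assumptions for the example; the source names only the algorithm. Here the categories are simply the observed values of the missing characteristic.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(records, target):
    """Count class and per-feature value frequencies from records that have `target`."""
    class_counts = Counter()
    value_counts = defaultdict(Counter)  # (feature, label) -> Counter of values
    feature_values = defaultdict(set)    # feature -> set of observed values
    for record in records:
        label = record.get(target)
        if label is None:
            continue
        class_counts[label] += 1
        for feature, value in record.items():
            if feature == target or value is None:
                continue
            value_counts[(feature, label)][value] += 1
            feature_values[feature].add(value)
    return class_counts, value_counts, feature_values


def predict(model, record, target):
    """Return the label with the highest (log) posterior for `record`."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for feature, value in record.items():
            if feature == target or value is None or feature not in feature_values:
                continue
            counts = value_counts[(feature, label)]
            # Add-one smoothing so unseen values do not zero out the product.
            numerator = counts[value] + 1
            denominator = sum(counts.values()) + len(feature_values[feature] | {value})
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def fill_missing(records, target):
    """Return copies of `records` with a missing `target` filled in by prediction."""
    model = train_naive_bayes(records, target)
    filled = []
    for record in records:
        if record.get(target) is None:
            record = {**record, target: predict(model, record, target)}
        filled.append(record)
    return filled
```

The input records are left unmodified; a record whose target is missing is replaced by a copy carrying the predicted value, so supplemented values can still be distinguished from observed ones by comparing against the input.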
S24: perform user portraits with reference to the characteristic information sets of the multiple users.
Performing user portraits may mean drawing portraits of characteristic information in different dimensions for the users of a specific group, or drawing portraits for users who have a certain specific characteristic information.
The specific group may be a group of users who have a certain specific characteristic information, or a user group set directly as needed.
When the specific characteristic information is an interest in automobiles, portraits of other characteristic information can be drawn for the group that has this characteristic information: for example, a distribution map of the group, a consuming capacity portrait, an income portrait, an education portrait, a gender portrait, an occupation portrait, an age portrait, and so on. A portrait may be presented as a histogram, a comparison chart, and so on.
In a specific implementation, portrait condition information may be received before the user portrait is performed, that is, the selection of the users of the specific group described above, or the setting of a certain specific characteristic information; the user portrait is then performed with reference to the portrait condition information.
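One way to read the portrait step is as a filter followed by per-dimension distributions. The sketch below assumes characteristic information sets are dicts and that portrait condition information is a dict of required values; both representations, and all field names, are hypothetical.

```python
from collections import Counter


def portrait(feature_sets, condition, dimensions):
    """Distribution of each requested dimension among users matching `condition`."""
    group = [
        features for features in feature_sets
        if all(features.get(key) == value for key, value in condition.items())
    ]
    result = {}
    for dimension in dimensions:
        counts = Counter(
            features[dimension] for features in group
            if features.get(dimension) is not None
        )
        total = sum(counts.values())
        # Shares rather than raw counts, ready to be drawn as a histogram.
        result[dimension] = (
            {value: count / total for value, count in counts.items()} if total else {}
        )
    return result
```

For the automobile example in the text, `portrait(users, {"interest": "cars"}, ["gender", "age_band"])` would return the gender and age shares of the users interested in cars.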
Referring to Fig. 3, in a specific implementation the data processing method may further comprise:
S31: set up a crawler to obtain basic information; the basic information is suitable for use as reference information for extracting the characteristic information.
The crawler can be configured as needed: the crawling strategy for each data source in the crawler system is configured as either an incremental strategy or a full strategy, and the crawler system is then run to collect internet data. For example, the crawler can be set to obtain the type of a website, its theme, its main audience, its location information, and so on.
S32: unify the information format of the basic information.
Because the information collected by the crawler comes from a wide range of sources, the format of the basic information can be unified. The information collected by the crawler can be screened, that is, cleaned by the crawler system: pattern matching is performed, the data that is needed is matched, and the rest is discarded. The goal of cleaning is data normalization, so that a uniform data format is available for later association queries.
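The cleaning step can be illustrated with a small normalizer. The field names, the alias table, the required fields and the domain pattern are all hypothetical; the source says only that pattern matching keeps the needed data and discards the rest.

```python
import re

# Hypothetical mapping from field names seen in crawled pages to unified names.
FIELD_ALIASES = {
    "type": "site_type",
    "category": "site_type",
    "topic": "theme",
    "audience": "main_audience",
    "region": "location",
}
REQUIRED = ("domain", "site_type")
DOMAIN_PATTERN = re.compile(
    r"^(?:https?://)?(?:www\.)?([a-z0-9.-]+\.[a-z]{2,})", re.IGNORECASE
)


def normalize_record(raw):
    """Return a record in the unified format, or None if required data is missing."""
    record = {}
    for key, value in raw.items():
        if value is None:
            continue
        name = key.strip().lower()
        name = FIELD_ALIASES.get(name, name)
        text = str(value).strip()
        if text:
            record[name] = text.lower()
    # Pattern matching: keep only the domain part of the URL.
    match = DOMAIN_PATTERN.match(record.get("url", ""))
    if match:
        record["domain"] = match.group(1).lower()
    record.pop("url", None)
    if any(field not in record for field in REQUIRED):
        return None
    return record


def clean(raw_records):
    """Normalize all records and discard the ones that fail to match."""
    return [r for r in map(normalize_record, raw_records) if r is not None]
```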
S33: generate a primary knowledge base.
The primary knowledge base contains the website type, website theme, main audience of the website, location information of the website, and so on, as described above.
S34: supplement the characteristic information set with reference to the primary knowledge base.
For example, suppose the primary knowledge base contains the gender ratio of visitors to certain websites. When the internet behavior data of a user is analyzed, it may turn out that the user's gender cannot be obtained from the internet behavior data alone. If the user's internet behavior data shows frequent visits to certain websites whose common feature is a predominantly female audience, the user can be estimated to be female.
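The gender example can be sketched as a visit-weighted average over the knowledge base. The `female_ratio` field, the knowledge base layout and the `min_margin` cutoff are assumptions for illustration; the source does not specify how the estimate is computed.

```python
def estimate_gender(visits, knowledge_base, min_margin=0.1):
    """Estimate gender from visited sites, or return None if the evidence is weak.

    `visits` maps a domain to a visit count; `knowledge_base` maps a domain to a
    dict holding the share of female visitors under the key "female_ratio".
    """
    weighted, total = 0.0, 0
    for domain, count in visits.items():
        entry = knowledge_base.get(domain)
        if entry is None:
            continue  # sites missing from the knowledge base carry no evidence
        weighted += entry["female_ratio"] * count
        total += count
    if total == 0:
        return None
    female_share = weighted / total
    if female_share >= 0.5 + min_margin:
        return "female"
    if female_share <= 0.5 - min_margin:
        return "male"
    return None
```

Returning `None` when the weighted share is close to one half keeps the step from writing a guess into the characteristic information set when the visited websites are not clearly skewed.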
In a specific implementation, the characteristic information set may also be verified with reference to the primary knowledge base. For example, after characteristic information in a user's characteristic information set has been supplemented through comprehensive analysis of the characteristic information sets of the users, it can be verified against the primary knowledge base.
In a specific implementation, steps S31 to S34 may be performed before step S24 (see Fig. 2).
An embodiment of the present invention also provides a data processing device; see Fig. 4 for its schematic structural diagram.
The data processing device comprises: an internet behavior data obtaining unit 41, a first analysis unit 42, a second analysis unit 43, and a portrait unit 44; wherein:
The internet behavior data obtaining unit 41 is adapted to obtain internet behavior data, the internet behavior data comprising at least one of the following: Cookie and application software data;
The first analysis unit 42 is adapted to analyze the internet behavior data corresponding to each user separately to obtain a characteristic information set for each user, the characteristic information set comprising multiple kinds of characteristic information;
The second analysis unit 43 is adapted to comprehensively analyze the characteristic information sets of the users to supplement any characteristic information set that lacks a certain kind of characteristic information;
The portrait unit 44 is adapted to perform user portraits with reference to the characteristic information sets of the multiple users.
In specific implementation, the second analytical unit 43 is suitable for using Naive Bayes Classification Algorithm to each user
Characteristic information set carry out comprehensive analysis, the characteristic information set for lacking certain characteristic information is supplemented.
In specific implementation, the data processing equipment can also include: primary knowledge base generation unit 45 and third point
Analyse unit 46;Wherein:
The primary knowledge base generation unit 45, is suitable for: setting crawler is to obtain basic information;The basic information is suitable for
As the reference information for extracting the characteristic information;The information format of the unified basic information;Generate primary knowledge base;
The third analytical unit 46 is suitable for supplementing the characteristic information set referring to the primary knowledge base.
In specific implementation, the data processing equipment can also include: the 4th analytical unit, be suitable for referring to the basis
Characteristic information set described in knowledge base verification.
In specific implementation, the data processing equipment can also include: correspondence relationship establishing unit 47, be adapted to set up institute
State internet behavior data and the corresponding relationship of user.
In specific implementation, the correspondence relationship establishing unit 47 is suitable for: the identification marking of the Cookie is generated, it is described
Identification marking includes: machine address, current process, system timestamp;Referring to machine address in the identification marking and described
The network behavior information for including in Cookie establishes the corresponding relationship for belonging to the identification marking of Cookie of same user.
In a specific implementation, the correspondence establishing unit 47 is adapted to: generate an identifier for the application software data, the identifier of the application software data including a mobile phone IMEI; and, with reference to the mobile phone IMEI and the network behavior information in the application software data, establish a correspondence among the identifiers of application software data belonging to the same user.
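For application software data, a sketch of IMEI-based grouping is shown below. The identifier layout `app:<IMEI>:<package>`, the record tuple shape, and the decision to skip records with an invalid IMEI are assumptions; the Luhn check reflects the standard IMEI check digit and is included only as an optional sanity filter.

```python
from collections import defaultdict


def is_valid_imei(imei):
    """Check a 15-digit IMEI with the Luhn checksum."""
    if len(imei) != 15 or not (imei.isascii() and imei.isdigit()):
        return False
    total = 0
    for i, ch in enumerate(reversed(imei)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def make_app_id(imei, package):
    """Hypothetical identifier layout: 'app:<IMEI>:<package name>'."""
    return f"app:{imei}:{package}"


def link_app_records(records):
    """Group app identifiers by IMEI and merge their visited hosts.

    records is an iterable of (imei, package, hosts) tuples.
    Records with an invalid IMEI are skipped.
    Returns {imei: {"ids": sorted identifiers, "hosts": merged host set}}.
    """
    users = defaultdict(lambda: {"ids": set(), "hosts": set()})
    for imei, package, hosts in records:
        if not is_valid_imei(imei):
            continue
        users[imei]["ids"].add(make_app_id(imei, package))
        users[imei]["hosts"].update(hosts)
    return {
        imei: {"ids": sorted(u["ids"]), "hosts": u["hosts"]}
        for imei, u in users.items()
    }
```

Here the IMEI alone decides the grouping and the network behavior is merged per device; a fuller treatment could also compare behavior across devices.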
In a specific implementation, the correspondence establishing unit 47 is adapted to: with reference to the network behavior information contained in the Cookie and the network behavior information contained in the application software data, establish a correspondence between the identifiers of the Cookie and the application software data belonging to the same user.
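Linking a Cookie to application software data has no shared hardware key, so a sketch can rely only on behavioral overlap. The function below is a hypothetical best-match rule: the Jaccard measure, the 0.6 threshold, and the one-cookie-per-app simplification are assumptions, not details from the patent.

```python
def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_cookies_to_apps(cookie_sites, app_sites, threshold=0.6):
    """For each app identifier, pick the most similar cookie identifier.

    Both arguments map identifier -> set of visited sites.
    Only matches with similarity >= threshold are kept.
    Returns {app_id: cookie_id}.
    """
    matches = {}
    for app_id, a_sites in app_sites.items():
        best_id, best_score = None, threshold
        for cookie_id in sorted(cookie_sites):  # sorted for deterministic ties
            score = jaccard(a_sites, cookie_sites[cookie_id])
            if score >= best_score and (best_id is None or score > best_score):
                best_id, best_score = cookie_id, score
        if best_id is not None:
            matches[app_id] = best_id
    return matches
```

A high threshold trades recall for precision; wrongly merging two users would distort the portrait more than leaving one user split across two identifiers.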
In a specific implementation, the portrait unit 44 is further adapted to: receive portrait condition information; and perform user portrait with reference to the portrait condition information.
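Conditional portraiting can be sketched as filtering the characteristic information sets by the received conditions and then summarizing the matching users. The exact-match condition format and the per-feature share summary are assumptions made for illustration.

```python
from collections import Counter


def portrait(feature_sets, conditions=None):
    """Summarise the users matching `conditions` as per-feature value shares.

    conditions maps feature name -> required value; None or {} selects all users.
    """
    conditions = conditions or {}
    selected = [
        fs for fs in feature_sets
        if all(fs.get(name) == value for name, value in conditions.items())
    ]
    summary = {}
    for fs in selected:
        for name, value in fs.items():
            summary.setdefault(name, Counter())[value] += 1
    return {
        "users": len(selected),
        "features": {
            name: {value: count / len(selected) for value, count in counts.items()}
            for name, counts in summary.items()
        },
    }
```

For example, passing `{"gender": "female"}` restricts the summary to users whose characteristic information set records that value.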
The data processing device may be located in the data processing server 11 (see Fig. 1).
By obtaining any one or more of Cookies and application software data, the embodiments of the present invention broaden the channels from which data is sourced and expand the types of evidence on which a user portrait is based, which can improve the accuracy of the user portrait. By analyzing the internet behavior data to obtain the characteristic information set of each user, and comprehensively analyzing the characteristic information sets so as to supplement any set that lacks certain characteristic information, the embodiments complete the characteristic information sets, expand their dimensions, and improve the data basis of the user portrait, which can likewise improve its accuracy. By establishing the correspondence between internet behavior data and users, the internet behavior data corresponding to the same user is associated, so that the same user is not counted repeatedly during user portrait, which can improve the accuracy of the user portrait. By receiving portrait condition information and performing user portrait with reference to it, more targeted user portraits can be provided and the range of applicable scenarios is widened.
Those of ordinary skill in the art will appreciate that all or some of the steps in the methods of the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include ROM, RAM, a magnetic disk, an optical disc, or the like.
Although the present invention is disclosed as above, the present invention is not limited thereto. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention; the scope of protection of the present invention shall therefore be subject to the scope defined by the claims.
Claims (16)
1. A data processing method, characterized by comprising:
obtaining internet behavior data, the internet behavior data including at least one of the following: Cookies and application software data;
analyzing the internet behavior data corresponding to each user respectively, to obtain a characteristic information set of each user, the characteristic information set including multiple kinds of characteristic information;
comprehensively analyzing the characteristic information sets of the users, so as to supplement any characteristic information set that lacks certain characteristic information, including: using a Naive Bayes classification algorithm to comprehensively analyze the characteristic information sets of the users, and supplementing the characteristic information set that lacks certain characteristic information; and
performing user portrait with reference to the characteristic information sets of the users.
2. The data processing method according to claim 1, characterized by further comprising: setting up a crawler to obtain basic information, the basic information being suitable for use as reference information for extracting the characteristic information; unifying the information format of the basic information; generating a primary knowledge base; and supplementing the characteristic information set with reference to the primary knowledge base; wherein the basic information includes: website type, website theme, the main audience of the website, and location information of the website.
3. The data processing method according to claim 2, characterized by further comprising: verifying the characteristic information set with reference to the primary knowledge base.
4. The data processing method according to claim 1, characterized by further comprising: establishing a correspondence between the internet behavior data and users.
5. The data processing method according to claim 4, characterized in that establishing the correspondence between the internet behavior data and users comprises:
generating an identifier for the Cookie, the identifier including: a machine address, a current process, and a system timestamp; and
with reference to the machine address in the identifier and the network behavior information contained in the Cookie, establishing a correspondence among the identifiers of Cookies belonging to the same user.
6. The data processing method according to claim 4, characterized in that establishing the correspondence between the internet behavior data and users comprises:
generating an identifier for the application software data, the identifier of the application software data including a mobile phone IMEI; and
with reference to the mobile phone IMEI and the network behavior information in the application software data, establishing a correspondence among the identifiers of application software data belonging to the same user.
7. The data processing method according to claim 4, characterized in that establishing the correspondence between the internet behavior data and users comprises: with reference to the network behavior information contained in the Cookie and the network behavior information contained in the application software data, establishing a correspondence between the identifiers of the Cookie and the application software data belonging to the same user.
8. The data processing method according to claim 1, characterized in that performing user portrait with reference to the characteristic information sets of the users further comprises: receiving portrait condition information; and performing user portrait with reference to the portrait condition information.
9. A data processing device, characterized by comprising: an internet behavior data obtaining unit, a first analysis unit, a second analysis unit, and a portrait unit; wherein:
the internet behavior data obtaining unit is adapted to obtain internet behavior data, the internet behavior data including at least one of the following: Cookies and application software data;
the first analysis unit is adapted to analyze the internet behavior data corresponding to each user respectively, to obtain a characteristic information set of each user, the characteristic information set including multiple kinds of characteristic information;
the second analysis unit is adapted to use a Naive Bayes classification algorithm to comprehensively analyze the characteristic information sets of the users, and to supplement any characteristic information set that lacks certain characteristic information; and
the portrait unit is adapted to perform user portrait with reference to the characteristic information sets of the users.
10. The data processing device according to claim 9, characterized by further comprising: a primary knowledge base generation unit and a third analysis unit; wherein:
the primary knowledge base generation unit is adapted to: set up a crawler to obtain basic information, the basic information being suitable for use as reference information for extracting the characteristic information; unify the information format of the basic information; and generate a primary knowledge base; the basic information includes: website type, website theme, the main audience of the website, and location information of the website; and
the third analysis unit is adapted to supplement the characteristic information set with reference to the primary knowledge base.
11. The data processing device according to claim 10, characterized by further comprising: a fourth analysis unit adapted to verify the characteristic information set with reference to the primary knowledge base.
12. The data processing device according to claim 9, characterized by further comprising: a correspondence establishing unit adapted to establish a correspondence between the internet behavior data and users.
13. The data processing device according to claim 12, characterized in that the correspondence establishing unit is adapted to: generate an identifier for the Cookie, the identifier including: a machine address, a current process, and a system timestamp; and, with reference to the machine address in the identifier and the network behavior information contained in the Cookie, establish a correspondence among the identifiers of Cookies belonging to the same user.
14. The data processing device according to claim 12, characterized in that the correspondence establishing unit is adapted to: generate an identifier for the application software data, the identifier of the application software data including a mobile phone IMEI; and, with reference to the mobile phone IMEI and the network behavior information in the application software data, establish a correspondence among the identifiers of application software data belonging to the same user.
15. The data processing device according to claim 12, characterized in that the correspondence establishing unit is adapted to: with reference to the network behavior information contained in the Cookie and the network behavior information contained in the application software data, establish a correspondence between the identifiers of the Cookie and the application software data belonging to the same user.
16. The data processing device according to claim 9, characterized in that the portrait unit is further adapted to: receive portrait condition information; and perform user portrait with reference to the portrait condition information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510843568.4A CN105447147B (en) | 2015-11-26 | 2015-11-26 | A kind of data processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105447147A (en) | 2016-03-30 |
CN105447147B (en) | 2019-02-01 |
Family
ID=55557323
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510843568.4A Active CN105447147B (en) | 2015-11-26 | 2015-11-26 | A kind of data processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105447147B (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107463762A (en) * | 2016-06-03 | 2017-12-12 | 阿里巴巴集团控股有限公司 | A kind of man-machine interaction method, device and electronic equipment |
CN106534164B (en) * | 2016-12-05 | 2019-09-03 | 公安部第三研究所 | Effective virtual identity depicting method based on cyberspace user identifier |
CN106933946A (en) * | 2017-01-20 | 2017-07-07 | 深圳市三体科技有限公司 | A kind of big data management method and system based on mobile terminal |
CN107578272A (en) * | 2017-08-10 | 2018-01-12 | 上海斐讯数据通信技术有限公司 | A kind of method and device for kinsfolk's portrait |
CN108549685A (en) * | 2018-04-08 | 2018-09-18 | 武志学 | Behavior analysis method, device, system and readable storage medium storing program for executing |
CN108628980A (en) * | 2018-04-27 | 2018-10-09 | 四川斐讯信息技术有限公司 | A kind of user's portrait method and system based on user network behavior |
CN109033149B (en) * | 2018-06-12 | 2020-11-13 | 北京奇艺世纪科技有限公司 | Information recommendation method and device, server and storage medium |
CN109658129A (en) * | 2018-11-22 | 2019-04-19 | 北京奇虎科技有限公司 | A kind of generation method and device of user's portrait |
CN109977308B (en) * | 2019-03-20 | 2021-07-13 | 北京字节跳动网络技术有限公司 | User group portrait construction method and device, storage medium and electronic equipment |
CN111724187A (en) * | 2019-03-21 | 2020-09-29 | 上海晶赞融宣科技有限公司 | DMP audience data real-time processing method and device and computer readable storage medium |
WO2020257993A1 (en) * | 2019-06-24 | 2020-12-30 | 深圳市欢太科技有限公司 | Content pushing method and apparatus, server, and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7137009B1 (en) * | 2000-01-06 | 2006-11-14 | International Business Machines Corporation | Method and apparatus for securing a cookie cache in a data processing system |
CN1878096A (en) * | 2006-07-04 | 2006-12-13 | 陈玲玲 | Method for detecting number of computer users in inner compute network |
CN101222348A (en) * | 2007-01-10 | 2008-07-16 | 阿里巴巴公司 | Method and system for calculating number of website real user |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030005046A1 (en) * | 2001-06-06 | 2003-01-02 | Lagniappe Marketing | System and method for managing marketing applications for a website |
Non-Patent Citations (3)
Title |
---|
《基于个性化的档案检索方式研究》; 吴亚平; 《兰台世界》; 2013-07-12; p. 59 |
《基于个性化的档案检索方式研究》; 吴亚平; 《兰台世界》; 2013-07-12; Section 3.2 of the paper |
《基于路径与页面挖掘的用户浏览行为研究》; 雷良鹏; 《中国优秀硕士学位论文全文数据库 信息科技辑》; 2015-04-15; p. 59 |
Also Published As
Publication number | Publication date |
---|---|
CN105447147A (en) | 2016-03-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||