CN107818145A - Method for extracting user behavior classification labels based on a dynamic crawler - Google Patents
Method for extracting user behavior classification labels based on a dynamic crawler
- Publication number
- CN107818145A (application CN201710969018.6A)
- Authority
- CN
- China
- Prior art keywords
- classification
- crawler
- label
- user behavior
- resource identifier
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9562—Bookmark management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for extracting user behavior classification labels based on a dynamic crawler, comprising the following steps: perceive users' Internet traffic using deep packet inspection, and identify the resource identifiers of the resources users access to obtain a resource identifier database; insert the resource identifiers into new request URLs, crawl those URLs with a crawler, obtain the classification labels on the pages, and match the labels to the corresponding users; extract the resource identifiers and user behavior classification labels according to a set occurrence frequency, store them in a static crawler library, and delete outdated information from that library; when a user behavior classification label cannot be matched, extract it again from the resource identifier of the accessed resource. By matching and extracting labels through dynamic crawling, the invention provides a data basis for user behavior analysis and can improve the accuracy of user behavior classification labeling.
Description
Technical field
The present invention relates to a method for extracting user behavior classification labels based on a dynamic crawler, and belongs to the technical field of Internet user data mining.
Background
In recent years, operators have analyzed user behavior on the basis of basic data about users' website visits and combined the analysis with online marketing strategies, in order to find problems that may exist in current online marketing activities and to provide a basis for correcting or reformulating those strategies. Classification labels are the basic data of user behavior analysis, so how to extract user labels accurately has become a problem to be solved.
Current mainstream implementations of user behavior classification labeling all rely on a static crawler database and proceed in three steps. First, every resource on resource websites such as e-commerce and video sites is crawled to build a huge static database of resource names, resource websites, and resource classifications. Second, the resource data that a user accesses is identified and matched against the static database built in the first step; the classification label of the matched resource becomes the user's behavior label. Third, the static database is updated at regular intervals by crawling the resource websites again.
This mainstream approach has to crawl every resource on a website. There are now many mainstream resource websites such as e-commerce and video sites, and each of them hosts hundreds of millions of resources. Each website therefore requires hundreds of millions of crawl requests and ultimately produces hundreds of millions of records, which consumes a great deal of disk space; crawling also takes a long time and easily triggers the website's anti-crawler measures. The static database contains many useless resources: many records are never matched by any resource a user accesses, so the matching utilization is low, which in turn makes matching slow and inefficient. Whenever a resource website adds new resources or changes the resource identifiers of old ones, the resource library has to be updated, and if it is not updated in time, the accuracy of user label extraction drops.
Summary of the invention
The technical problem to be solved by the invention is to overcome the deficiencies of the prior art by providing a method for extracting user behavior classification labels based on a dynamic crawler. The method addresses the large storage overhead of static crawler databases and the need for high-frequency crawling to keep some site resources up to date, and, by crawling dynamically, improves the accuracy of user behavior classification labeling.
The invention solves the above technical problem with the following technical solution:
A method for extracting user behavior classification labels based on a dynamic crawler, characterized in that it comprises the following steps:
Step 1: perceive users' Internet traffic using deep packet inspection, and identify the resource identifiers of the resources users access to obtain a resource identifier database;
Step 2: insert the resource identifiers from the database obtained in step 1 into new request URLs, crawl the new request URLs with a crawler, obtain the classification labels on the resource web pages, and match these labels to the corresponding users as user behavior classification labels;
Step 3: according to a set occurrence frequency, extract the resource identifiers obtained in step 1 and the user behavior classification labels obtained in step 2, and store them in a static crawler library; delete outdated information from the static crawler library to obtain an updated resource classification library for the next round of extraction and matching; when a user behavior classification label cannot be matched, use step 2 to extract it again from the resource identifier of the accessed resource.
Further, as a preferred technical solution of the invention, step 1 also includes deduplicating the resource identifier database.
Further, as a preferred technical solution of the invention, step 1 also includes sorting the resource identifiers in the resource identifier database by occurrence frequency (popularity).
With the above technical solution, the invention can produce the following technical effects:
The invention studies and solves the problem of extracting user behavior labels efficiently. Through dynamic crawling, labels are extracted and matched to users, providing a data basis for user behavior analysis. The method addresses the large storage overhead of static crawler databases and the need for high-frequency crawling to keep some site resources up to date, and improves the accuracy of user behavior classification labeling.
The invention therefore has the following advantages:
(1) The method only needs to build a resource classification library of high-frequency, popular resources, which solves the storage overhead of a static crawler database and can improve label matching performance;
(2) The method gives the system a better ability to resist signal collisions, and both system throughput and packet propagation delay are improved.
Brief description of the drawings
Fig. 1 is a schematic diagram of the principle of the method for extracting user behavior classification labels based on a dynamic crawler.
Fig. 2 is a schematic diagram of the user access records in the invention.
Fig. 3 is a schematic diagram of the popularity ranking of resource identifiers in the invention.
Fig. 4 is a schematic diagram of the static crawler database of popular, high-frequency resources in the invention.
Fig. 5 is a schematic diagram of the matching results in the invention.
Detailed description
Embodiments of the invention are described below with reference to the accompanying drawings.
As shown in Fig. 1, the invention provides a method for extracting user behavior classification labels based on a dynamic crawler. The method comprises the following steps:
Step 1: perceive users' Internet traffic using deep packet inspection, and identify the resource identifiers of the e-commerce, video, and other resources users access to obtain a resource identifier database.
Preferably, this step also includes deduplicating the resource identifier database to reduce the data volume, and sorting the identifiers by occurrence frequency (popularity).
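The deduplication and popularity ranking in step 1 can be illustrated with a minimal Python sketch. It assumes the deep packet inspection stage has already produced a list of `(user_id, resource_id)` pairs; the function name `rank_resource_ids` and the sample identifiers are hypothetical and do not come from the patent.

```python
from collections import Counter


def rank_resource_ids(access_log):
    """Deduplicate resource identifiers and rank them by occurrence count.

    access_log: iterable of (user_id, resource_id) pairs (assumed input shape).
    Returns a list of (resource_id, count), most frequent first.
    """
    counts = Counter(resource_id for _user_id, resource_id in access_log)
    # Counter keeps one entry per identifier (deduplication);
    # most_common() sorts those entries by descending count.
    return counts.most_common()


# Hypothetical access records, standing in for the DPI output.
access_log = [
    ("u1", "item-1001"),
    ("u2", "item-1001"),
    ("u1", "video-77"),
    ("u3", "item-1001"),
    ("u2", "video-77"),
    ("u3", "item-2002"),
]
ranking = rank_resource_ids(access_log)
```

Counting while deduplicating keeps a single pass over the records, and the resulting counts are what a frequency threshold in step 3 would be compared against.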
Step 2: insert the resource identifiers from the database obtained in step 1 into new request URLs and crawl the new request URLs with a crawler, for example crawling the resource identifiers of the e-commerce and video resources users access. Obtain the classification labels on the resource web pages and match them to the corresponding users as user behavior classification labels, providing a data basis for user behavior analysis.
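Step 2 can be sketched as two small pieces: building the request URL from a resource identifier, and reading a classification label out of the returned page. The patent does not specify the URL pattern or where the label sits in the page, so the template `URL_TEMPLATE` and the `<meta name="category">` tag below are assumptions chosen only for illustration; the sketch parses a sample string instead of making a network request.

```python
from html.parser import HTMLParser
from urllib.parse import quote

# Hypothetical URL pattern; a real site defines its own.
URL_TEMPLATE = "https://shop.example.com/item/{rid}"


def build_request_url(resource_id):
    """Insert a percent-encoded resource identifier into the request URL."""
    return URL_TEMPLATE.format(rid=quote(resource_id, safe=""))


class CategoryParser(HTMLParser):
    """Collects the first <meta name="category" content="..."> value (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.label = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta" and self.label is None:
            attributes = dict(attrs)
            if attributes.get("name") == "category":
                self.label = attributes.get("content")


def extract_label(html_text):
    """Return the classification label found in the page, or None if absent."""
    parser = CategoryParser()
    parser.feed(html_text)
    return parser.label


# Sample page standing in for a crawled response body.
sample_page = (
    '<html><head><meta name="category" content="Electronics/Phones">'
    "</head><body></body></html>"
)
label = extract_label(sample_page)
```

In a deployment the page body would come from an HTTP client, and the extraction rule would have to be written per site, since each resource website marks up its categories differently.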
Step 3: according to a set occurrence frequency, extract the resource identifiers obtained in step 1 and the user behavior classification labels obtained in step 2, and store them in a static crawler library. Periodically, or every day, delete outdated information from the static crawler library to obtain an updated resource classification library of recently popular, high-frequency e-commerce, video, and other resources for the next round of extraction and matching.
Finally, when a user behavior classification label cannot be matched, use step 2 to dynamically extract it again from the resource identifier of the accessed resource.
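The interplay in step 3 between the static library of popular resources, the removal of outdated entries, and the fallback to dynamic crawling can be sketched as a small cache. The class name `HotLabelCache`, the parameters `min_count` and `max_age_seconds`, and the `crawl` callback are all hypothetical; the patent only states that entries are kept according to a set occurrence frequency and that outdated information is deleted periodically.

```python
import time


class HotLabelCache:
    """Static library of labels for frequently accessed resources (illustrative)."""

    def __init__(self, min_count, max_age_seconds):
        self.min_count = min_count          # assumed frequency threshold
        self.max_age = max_age_seconds      # assumed age limit for entries
        self._entries = {}                  # resource_id -> (label, stored_at)

    def store_if_hot(self, resource_id, label, count, now=None):
        """Keep the label only if the resource reached the frequency threshold."""
        if count >= self.min_count:
            stored_at = time.time() if now is None else now
            self._entries[resource_id] = (label, stored_at)

    def purge_stale(self, now=None):
        """Delete entries older than max_age; return how many were removed."""
        now = time.time() if now is None else now
        stale = [rid for rid, (_label, stored_at) in self._entries.items()
                 if now - stored_at > self.max_age]
        for rid in stale:
            del self._entries[rid]
        return len(stale)

    def lookup(self, resource_id, crawl):
        """Return the stored label, or fall back to crawling the resource."""
        hit = self._entries.get(resource_id)
        if hit is not None:
            return hit[0]
        return crawl(resource_id)           # dynamic crawl, as in step 2
```

A possible usage: call `store_if_hot` with the counts from step 1, run `purge_stale` once a day, and pass the step 2 crawler as `crawl` so that unmatched identifiers are resolved on demand rather than stored ahead of time.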
To verify that the method of the invention can extract user behavior labels efficiently, a verification example is given below.
Step 1: first, using deep packet inspection as described in the method, identify the URLs of the resource websites users access to obtain the user access records shown in Fig. 2.
Then deduplicate the identified resource identifiers of the e-commerce, video, and other resources users access to reduce the data volume, and sort them by occurrence frequency (popularity) to obtain the ranking shown in Fig. 3.
Step 2: use the crawler to crawl the resource identifiers of the e-commerce, video, and other resources users access, obtain the classification labels on the pages, and match them to the corresponding users as user behavior classification labels.
Step 3: extract the resource identifiers with relatively high occurrence frequency together with their classification labels and place them in the static crawler library, forming the static crawler database shown in Fig. 4. Delete outdated information from the static crawler library every day to obtain a resource classification library of recently popular, high-frequency e-commerce, video, and other resources for the next round of user behavior classification label extraction and matching. Finally, for user behavior classification labels that cannot be matched, use step 2 again to extract them.
The resulting user behavior classification labels are shown in Fig. 5, where each user identity is associated with its extracted user behavior classification labels.
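The final association of user identities with labels can be sketched as a simple join between the access records and the labels obtained for each resource identifier. The function name `labels_by_user` and the sample data are hypothetical; the actual layout of the result in Fig. 5 is not reproduced here.

```python
from collections import defaultdict


def labels_by_user(access_log, label_of):
    """Map each user to the sorted set of labels of the resources they accessed.

    access_log: iterable of (user_id, resource_id) pairs (assumed input shape).
    label_of: dict from resource_id to classification label.
    Resources without a known label are skipped.
    """
    result = defaultdict(set)
    for user_id, resource_id in access_log:
        label = label_of.get(resource_id)
        if label is not None:
            result[user_id].add(label)
    return {user: sorted(labels) for user, labels in result.items()}


# Hypothetical inputs for illustration.
sample_log = [
    ("u1", "item-1001"),
    ("u1", "video-77"),
    ("u2", "item-1001"),
    ("u3", "item-9"),
]
sample_labels = {"item-1001": "Phones", "video-77": "Comedy"}
user_labels = labels_by_user(sample_log, sample_labels)
```

Users whose resources have no label yet simply do not appear in the result, which corresponds to the unmatched case that step 3 resolves by crawling again.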
In summary, the invention extracts labels and matches them to users through dynamic crawling, providing a data basis for user behavior analysis. It addresses the large storage overhead of static crawler databases and the need for high-frequency crawling to keep some site resources up to date, and improves the accuracy of user behavior classification labeling.
The embodiments of the invention have been explained in detail above with reference to the accompanying drawings, but the invention is not limited to these embodiments. Within the knowledge of a person of ordinary skill in the art, various changes can be made without departing from the concept of the invention.
Claims (4)
1. A method for extracting user behavior classification labels based on a dynamic crawler, characterized in that it comprises the following steps:
Step 1: perceive users' Internet traffic, and identify the resource identifiers of the resources users access to obtain a resource identifier database;
Step 2: insert the resource identifiers from the database obtained in step 1 into new request URLs, crawl the new request URLs with a crawler, obtain the classification labels on the resource web pages, and match these labels to the corresponding users as user behavior classification labels;
Step 3: according to a set occurrence frequency, extract the resource identifiers obtained in step 1 and the user behavior classification labels obtained in step 2, and store them in a static crawler library; at the same time, delete outdated information from the static crawler library to obtain an updated resource classification library for the next round of extraction and matching; when a user behavior classification label cannot be matched, use step 2 to extract it again from the resource identifier of the accessed resource.
2. The method for extracting user behavior classification labels based on a dynamic crawler according to claim 1, characterized in that step 1 also includes deduplicating the resource identifier database.
3. The method for extracting user behavior classification labels based on a dynamic crawler according to claim 1, characterized in that step 1 also includes sorting the resource identifiers in the resource identifier database by occurrence frequency (popularity).
4. The method for extracting user behavior classification labels based on a dynamic crawler according to claim 1, characterized in that in step 1 users' Internet traffic is perceived using deep packet inspection.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710969018.6A CN107818145A (en) | 2017-10-18 | 2017-10-18 | Method for extracting user behavior classification labels based on a dynamic crawler |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710969018.6A CN107818145A (en) | 2017-10-18 | 2017-10-18 | Method for extracting user behavior classification labels based on a dynamic crawler |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107818145A (en) | 2018-03-20 |
Family
ID=61608110
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710969018.6A Pending CN107818145A (en) | Method for extracting user behavior classification labels based on a dynamic crawler | 2017-10-18 | 2017-10-18 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107818145A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108875781A (en) * | 2018-05-07 | 2018-11-23 | 腾讯科技(深圳)有限公司 | A kind of labeling method, apparatus, electronic equipment and storage medium |
CN111400627A (en) * | 2020-03-09 | 2020-07-10 | 政采云有限公司 | Information acquisition method and device, electronic equipment and readable storage medium |
CN112000748A (en) * | 2020-07-14 | 2020-11-27 | 北京神州泰岳智能数据技术有限公司 | Data processing method and device, electronic equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102143224A (en) * | 2011-01-25 | 2011-08-03 | 张金海 | Mobile phone Internet accessing-based user behavior analysis method and device |
CN106446115A (en) * | 2016-09-18 | 2017-02-22 | 成都九鼎瑞信科技股份有限公司 | Mobile Internet user classification method and device |
CN106484889A (en) * | 2016-10-18 | 2017-03-08 | 合信息技术(北京)有限公司 | The flooding method and apparatus of Internet resources |
CN106815297A (en) * | 2016-12-09 | 2017-06-09 | 宁波大学 | A kind of academic resources recommendation service system and method |
CN107124653A (en) * | 2017-05-16 | 2017-09-01 | 四川长虹电器股份有限公司 | The construction method of TV user portrait |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108875781A (en) * | 2018-05-07 | 2018-11-23 | 腾讯科技(深圳)有限公司 | A kind of labeling method, apparatus, electronic equipment and storage medium |
CN108875781B (en) * | 2018-05-07 | 2022-08-19 | 腾讯科技(深圳)有限公司 | Label classification method and device, electronic equipment and storage medium |
CN111400627A (en) * | 2020-03-09 | 2020-07-10 | 政采云有限公司 | Information acquisition method and device, electronic equipment and readable storage medium |
CN111400627B (en) * | 2020-03-09 | 2023-07-07 | 政采云有限公司 | Information acquisition method and device, electronic equipment and readable storage medium |
CN112000748A (en) * | 2020-07-14 | 2020-11-27 | 北京神州泰岳智能数据技术有限公司 | Data processing method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102929928B (en) | Multidimensional-similarity-based personalized news recommendation method | |
US9405746B2 (en) | User behavior models based on source domain | |
CN103546326A (en) | Website traffic statistic method | |
CN104869009B (en) | The system and method for website data statistics | |
CN103218431B (en) | A kind ofly can identify the system that info web gathers automatically | |
CN102164186B (en) | Method and system for realizing cloud search service | |
CN102831114B (en) | Realize method and the device of internet user access Statistic Analysis | |
CN104394118A (en) | User identity identification method and system | |
CN102831193A (en) | Topic detecting device and topic detecting method based on distributed multistage cluster | |
CN104899273A (en) | Personalized webpage recommendation method based on topic and relative entropy | |
WO2017084205A1 (en) | Network user identity authentication method and system | |
CN101409690A (en) | Method and system for obtaining internet user behaviors | |
CN103530429B (en) | Webpage content extracting method | |
CN104216889B (en) | Data dissemination analyzing and predicting method and system based on cloud service | |
CN107818145A (en) | Method for extracting user behavior classification labels based on a dynamic crawler | |
CN105183873A (en) | Malicious clicking behavior detection method and device | |
CN102710795A (en) | Hotspot collecting method and device | |
CN102968510B (en) | The searching method of internet personage information and system | |
CN102769818A (en) | Method and device for pushing information in mobile internet | |
CN109947935A (en) | The generation method and device of media event | |
CN104699851A (en) | Service tag extension method in big data environment | |
CN104765823A (en) | Method and device for collecting website data | |
CN103745383A (en) | Method and system of realizing redirection service based on operator data | |
CN107086925B (en) | Deep learning-based internet traffic big data analysis method | |
CN105653550B (en) | Webpage filtering method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180320 |