CN103905266A - Distributed internet behavior analysis method, device and system - Google Patents
- Publication number: CN103905266A
- Application number: CN201210581807.XA
- Authority: CN (China)
- Prior art keywords: url, analysis device, webpage, web page, log
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention discloses a distributed internet behavior analysis method, device and system. Highly personalized log analysis devices are deployed in a distributed manner in each province, while the common, general-purpose web page analysis device is deployed centrally. Each log analysis device can therefore choose its log analysis mode flexibly, massive log data need not be transmitted over the network, the efficiency of network analysis is improved, and the time of network analysis is reduced. At the same time, duplicate construction of web page analysis devices is avoided, which reduces the cost of network construction.
Description
Technical field
The present invention relates to the field of data service technology, and in particular to a distributed internet behavior analysis method, device and system.
Background art
By analyzing users' internet behavior, an operator can mine users' life trajectories and daily needs, understand its customers in depth, and thereby achieve precision marketing based on the mobile internet. Such analysis also helps the operator keep a finger on the pulse of the market, and lays a research foundation for exploring new products and new models based on customers' lives.
A prior-art customer behavior analysis and service system based on web page content comprises a message scheduling, analysis and display system located in the operator's data center, an information pusher connected to a core router, and a Radius processor connected to the operator's billing system. The message scheduling, analysis and display system comprises a policy manager, an information display, and a user behavior analyzer. Users' personalized features are derived by analyzing the users' internet logs.
The prior art mainly has the following defects:
1. The volume of collected data is large. At the current number of users, each province generates 600 GB to 1 TB of log data per day from WAP logs alone; if Gn logs are added, the volume is 3-4 times that.
2. Data analysis means are lacking, while the analysis methods available for the internet are numerous.
3. If log data is centralized before analysis, data transmission becomes the bottleneck: transferring 1 TB per day creates a serious network bottleneck. If each province instead builds its own system, the internet analysis part is duplicated construction, which need not be built separately in every province.
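As a rough check on the transmission bottleneck, the sustained bandwidth needed to move one day's logs within that day can be estimated. This is a back-of-the-envelope sketch: the decimal unit (1 TB = 10^12 bytes), the assumption that transfer is spread evenly over 24 hours, and the function name are illustrative and do not come from the patent.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TB = 1e12  # assumption: decimal terabyte (10^12 bytes)


def sustained_mbit_per_s(bytes_per_day: float) -> float:
    """Average bit rate in Mbit/s needed to ship one day's logs within that day."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e6


# WAP logs only (upper figure of 1 TB per day), then 3x and 4x with Gn logs added.
wap_only = sustained_mbit_per_s(1 * TB)
with_gn_low = sustained_mbit_per_s(3 * TB)
with_gn_high = sustained_mbit_per_s(4 * TB)

print(f"WAP only: {wap_only:.1f} Mbit/s")
print(f"With Gn:  {with_gn_low:.1f} to {with_gn_high:.1f} Mbit/s")
```

Under these assumptions a single province needs roughly 93 Mbit/s of continuous upstream capacity for WAP logs alone, and every additional province adds the same load on the central site.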
Summary of the invention
To solve the technical problems of excessive log data volume and difficult data transmission in the prior art, the present invention proposes a distributed internet behavior analysis method, device and system.
One aspect of the present invention provides a distributed internet behavior analysis method, comprising: multiple log analysis devices deployed in a distributed manner obtain the internet logs of local users, extract web page URLs from the logs, and report the web page URLs to a centrally deployed web page analysis device; the web page analysis device obtains the corresponding web pages according to the web page URLs, analyzes them to obtain URL-related information for the web pages, and sends the URL-related information to each log analysis device; and the log analysis devices analyze users' internet behavior according to the URL-related information of the web pages.
Another aspect of the present invention provides a log analysis device, comprising: an acquisition module for obtaining the internet logs of local users; an extraction module for extracting web page URLs from the users' internet logs; a URL information bank for storing URL-related information; a reporting module for reporting the web page URLs to the centrally deployed web page analysis device; and a behavior analysis module for analyzing users' internet behavior according to the URL-related information obtained from the web page analysis device.
Another aspect of the present invention provides a web page analysis device, comprising: a web page acquisition module for obtaining the corresponding web page according to a web page URL; a web page analysis module for analyzing the web page to obtain its URL-related information; a URL information bank for storing the URL-related information of the web page; and a synchronization module for synchronizing the URL-related information of the web page to the information bank of each log analysis device.
Another aspect of the present invention provides a distributed internet behavior analysis system, comprising multiple log analysis devices deployed in a distributed manner and a centrally deployed web page analysis device. Each log analysis device obtains the internet logs of local users, extracts web page URLs from the logs, reports the web page URLs to the centrally deployed web page analysis device, and analyzes users' internet behavior according to the URL-related information obtained from the web page analysis device. The web page analysis device obtains the corresponding web pages according to the web page URLs, analyzes them to obtain their URL-related information, stores that information in its own URL information bank, and sends it to each log analysis device.
In the distributed internet behavior analysis method, device and system of the present invention, the highly personalized log analysis devices are deployed in a distributed manner in each province, while the common, general-purpose web page analysis device is built centrally. Each log analysis device can therefore choose its log analysis mode flexibly, massive log data need not be transmitted over the network, the efficiency of network analysis is improved, and the time of network analysis is reduced. Duplicate construction of web page analysis devices is also avoided, which reduces network construction cost.
Brief description of the drawings
Fig. 1 is a structural diagram of an embodiment of the distributed internet behavior analysis system of the present invention;
Fig. 2 is a structural diagram of an embodiment of the log analysis device of the present invention;
Fig. 3 is a structural diagram of an embodiment of the web page analysis device of the present invention;
Fig. 4 is a flow chart of an embodiment of the distributed internet behavior analysis method of the present invention;
Fig. 5 is a flow chart of the log analysis device of the present invention reporting web page URLs in real time;
Fig. 6 is a flow chart of the log analysis device of the present invention reporting web page URLs in non-real-time mode.
Detailed description of the embodiments
In the present invention, the internet behavior analysis system is divided into log analysis devices and a web page analysis device. A log analysis device implements functions such as loading, statistical analysis, and application of users' internet logs. The web page analysis device implements functions such as crawling, parsing, and classifying web pages. The highly personalized log analysis devices are deployed in a distributed manner in each province, while the common, general-purpose web page analysis device is built centrally. The invention is described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, an embodiment of the distributed internet behavior analysis system of the present invention comprises multiple log analysis devices 11a, 11b, 11c, ... deployed in a distributed manner in the provinces, and a centrally deployed web page analysis device 12.
A log analysis device obtains the internet logs of local users, extracts web page URLs from the logs, and reports the web page URLs to the centrally deployed web page analysis device.
Specifically, if the URL information bank of the log analysis device already stores some web page URLs, the device checks whether a given web page URL is already present in its URL information bank, and reports the URL to the centrally deployed web page analysis device only when it is not present.
The web page analysis device obtains the corresponding web page according to the web page URL, analyzes it to obtain the URL-related information of the web page, and synchronizes that information to the URL information bank of each log analysis device.
After obtaining the URL-related information of the web page from the web page analysis device, the log analysis device analyzes users' internet behavior.
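The interaction described above (check the local URL information bank, report unknown URLs, analyze centrally, synchronize to every log analysis device) can be sketched with in-memory objects. This is a minimal illustration, not the patent's implementation: the class and method names are hypothetical, the page analysis is a placeholder that returns a fixed category, and the network transport between devices is omitted.

```python
from typing import Dict, List


class WebPageAnalysisDevice:
    """Centrally deployed device holding the master URL information bank."""

    def __init__(self) -> None:
        self.url_bank: Dict[str, dict] = {}
        self.subscribers: List["LogAnalysisDevice"] = []
        self.received: List[str] = []  # URLs reported by log analysis devices

    def analyze(self, url: str) -> dict:
        # Placeholder for crawling and classifying the page (hypothetical result).
        return {"url": url, "category": "unclassified"}

    def report(self, url: str) -> None:
        """Handle a reported URL, then synchronize the result to every device."""
        self.received.append(url)
        if url not in self.url_bank:
            self.url_bank[url] = self.analyze(url)
        for device in self.subscribers:
            device.url_bank[url] = self.url_bank[url]


class LogAnalysisDevice:
    """Provincial device with its own local copy of the URL information bank."""

    def __init__(self, center: WebPageAnalysisDevice) -> None:
        self.center = center
        self.url_bank: Dict[str, dict] = {}
        center.subscribers.append(self)

    def lookup(self, url: str) -> dict:
        # Report to the central device only when the URL is unknown locally.
        if url not in self.url_bank:
            self.center.report(url)
        return self.url_bank[url]
```

In this sketch a URL first seen in one province becomes available to every other province after a single central analysis, which is the property the architecture relies on to avoid shipping raw logs.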
The URL-related information includes the URL, title, body text, keywords, tags, category, and META information of the web page, among other items.
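One way to picture a record in the URL information bank is as a small data class with one attribute per listed item. The field names and types below are assumptions chosen for illustration; the patent lists the items but does not define a schema.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class UrlInfo:
    """One URL information bank record (hypothetical schema)."""

    url: str
    title: str = ""
    text: str = ""                                        # body text of the page
    keywords: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    category: str = ""                                    # classification result
    meta: Dict[str, str] = field(default_factory=dict)    # META information


record = UrlInfo(url="http://example.com/news", title="Example", category="news")
print(asdict(record))
```

Converting a record with `asdict` gives a plain dictionary that can be serialized for transfer between the web page analysis device and the log analysis devices.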
The interfaces between the log analysis device and the web page analysis device comprise:
IF1 interface: the log analysis subsystem periodically synchronizes with the web page analysis subsystem, treating the latter's URL information bank and rule/configuration management library as authoritative. It also submits its statistical summary results over this interface for use in network-wide analysis.
IF2 interface: the log analysis subsystem periodically provides the web page analysis subsystem with the full set of URLs and the list of unclassified URLs. Both the IF1 and IF2 interfaces support data transmission over FTP, HTTP, or a proprietary socket-based protocol.
As shown in Fig. 2, the log analysis device of the present invention comprises: an acquisition module 21, an extraction module 22, a URL information bank 23, a judging module 24, a reporting module 25, and a behavior analysis module 26. The acquisition module obtains the internet logs of local users; the extraction module extracts web page URLs from the users' internet logs; the URL information bank stores URL-related information; the judging module determines whether a web page URL is already present in the URL information bank; the reporting module reports the web page URL to the centrally deployed web page analysis device when the URL is not present in the URL information bank; and the behavior analysis module analyzes users' internet behavior according to the users' internet logs.
The log analysis device further comprises a timing module 27 for determining whether the specified reporting time has arrived. When the specified reporting time arrives, the reporting module sends all web page URLs not present in the URL information bank to the web page analysis device as a file.
The log analysis device further comprises a download module 28. The timing module determines whether the specified download time has arrived; when it has, the download module downloads from the URL information bank of the web page analysis device a file containing the URL-related information of the web pages.
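The work of the acquisition and extraction modules can be sketched as a single cleaning pass over raw log lines that drops broken lines, fills empty fields, and reads the URL field. The log layout here (four tab-separated fields with the URL in the third position) and the fill value are assumptions for illustration; real WAP or Gn log formats differ, and at the data volumes discussed earlier this logic would run as a distributed job rather than a single-process loop.

```python
from typing import Iterable, List

EXPECTED_FIELDS = 4   # assumption: user id, timestamp, URL, user agent
URL_FIELD = 2         # assumption: zero-based position of the URL field
FILL_VALUE = "-"      # assumption: placeholder written into empty fields


def clean_and_extract(lines: Iterable[str]) -> List[str]:
    """Drop broken lines, fill empty fields, and return the URL of each record."""
    urls: List[str] = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != EXPECTED_FIELDS:
            continue  # broken (truncated or merged) line
        fields = [value if value else FILL_VALUE for value in fields]
        url = fields[URL_FIELD]
        if url != FILL_VALUE:
            urls.append(url)
    return urls


sample = [
    "u1\t2012-12-27 10:00:00\thttp://example.com/a\t\n",
    "truncated line",
    "u2\t\thttp://example.com/b\tUA\n",
]
print(clean_and_extract(sample))
```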
As shown in Fig. 3, the web page analysis device of the present invention comprises: a web page acquisition module 31, a web page analysis module 32, a URL information bank 33, and a synchronization module 34. The web page acquisition module obtains the corresponding web page according to a web page URL; the web page analysis module analyzes the web page to obtain its URL-related information; the URL information bank stores the URL-related information of the web page; and the synchronization module synchronizes the URL-related information of the web page to the information bank of each log analysis device.
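What the web page analysis module extracts from a fetched page can be illustrated with Python's standard `html.parser`. This is a simplified sketch under stated assumptions: the HTML is already downloaded, only the title, META tags, and visible text are collected, and the names `PageInfoParser` and `extract_url_info` are hypothetical. A production crawler would also handle character encodings, malformed markup, and scripted content.

```python
from html.parser import HTMLParser
from typing import Dict, List


class PageInfoParser(HTMLParser):
    """Collects the title, META name/content pairs, and visible text of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta: Dict[str, str] = {}
        self.text_parts: List[str] = []
        self._in_title = False
        self._skip_depth = 0  # greater than zero inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta" and attributes.get("name") and attributes.get("content") is not None:
            self.meta[attributes["name"].lower()] = attributes["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def extract_url_info(url: str, html: str) -> dict:
    """Build a URL-related information record from already-fetched HTML."""
    parser = PageInfoParser()
    parser.feed(html)
    parser.close()
    raw_keywords = parser.meta.get("keywords", "")
    keywords = [k.strip() for k in raw_keywords.split(",") if k.strip()]
    return {
        "url": url,
        "title": parser.title.strip(),
        "text": " ".join(parser.text_parts),
        "keywords": keywords,
        "meta": parser.meta,
    }
```

The category field is left out here because it comes from a separate classification step applied to the extracted text.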
Both the log analysis device and the web page analysis device contain a URL information bank. The URL-related information in the URL information bank of the web page analysis device is authoritative, and it is synchronized to the log analysis devices.
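Treating the central URL information bank as authoritative means that, on synchronization, a central record overwrites any differing local record. A minimal sketch follows, assuming both banks are dictionaries keyed by URL; the function name is hypothetical, and how local-only entries are handled is not specified in the text, so this sketch simply leaves them in place.

```python
from typing import Dict


def synchronize(local: Dict[str, dict], central: Dict[str, dict]) -> int:
    """Copy new or changed central records into the local bank.

    Returns the number of local entries that were added or overwritten.
    Entries that exist only locally are left untouched (an assumption).
    """
    changed = 0
    for url, info in central.items():
        if local.get(url) != info:
            local[url] = dict(info)  # copy so later local edits do not alias central data
            changed += 1
    return changed
```

Calling `synchronize` again immediately afterwards changes nothing, so the operation is safe to repeat on a schedule.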
As shown in Fig. 4, an embodiment of the distributed internet behavior analysis method of the present invention comprises the following flow:
Step 402: the log analysis devices deployed in a distributed manner in the provinces obtain the internet logs of local users;
Step 404: the log analysis device extracts web page URLs from the users' internet logs. The log analysis device cleans the log data, including removing broken lines and filling empty fields, and reads the URL field for further analysis; this can be implemented with Hadoop MapReduce;
Step 406: the log analysis device determines whether the web page URL is already present in its information bank; if so, step 408 is performed, and if not, step 410 is performed;
Step 408: the log analysis device obtains the category of the web page URL directly and analyzes the user's internet behavior;
Step 410: the log analysis device reports the web page URL to the centrally deployed web page analysis device;
Step 412: the web page analysis device obtains the corresponding web page according to the web page URL, analyzes it to obtain the URL-related information of the web page, and stores that information in the URL information bank of the web page analysis device;
The web page analysis device uses a crawler to fetch the web page corresponding to the URL, parses it to obtain the body text, and puts the web page into a web page library. The body text can be classified with methods such as Bayesian classification or SVM, and the URL-related information of the web page is put into the URL information bank of the web page analysis device;
Step 414: the URL information bank of the web page analysis device synchronizes the URL-related information of the web page to the URL information bank of each log analysis device, after which step 408 is performed.
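The text classification mentioned for the web page analysis device can be illustrated with a small multinomial naive Bayes classifier. This is one possible Bayesian method written from scratch for illustration, not the classifier used in the patent: the class name, the whitespace tokenization, the add-one smoothing, and the training samples are all assumptions. Chinese page text would need a word segmenter in place of `split()`.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.doc_counts: Counter = Counter()
        self.vocabulary: Set[str] = set()

    def train(self, samples: Iterable[Tuple[str, str]]) -> None:
        """Each sample is a (text, category) pair."""
        for text, category in samples:
            words = text.lower().split()
            self.doc_counts[category] += 1
            self.word_counts[category].update(words)
            self.vocabulary.update(words)

    def classify(self, text: str) -> str:
        """Return the category with the highest log posterior for the text."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_category, best_score = "", -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            denominator = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_docs / total_docs)  # log prior
            for word in words:
                score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_category, best_score = category, score
        return best_category


classifier = NaiveBayesClassifier()
classifier.train([
    ("football match league goal", "sports"),
    ("team wins football cup", "sports"),
    ("stock market shares fall", "finance"),
    ("bank raises interest rates", "finance"),
])
print(classifier.classify("football league results"))
```

The predicted category is what would be stored in the category field of the URL-related information and later read directly by the log analysis device.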
Synchronization between a log analysis device and the URL information bank of the web page analysis device can be performed in either of two modes, real-time or offline:
1. Real-time mode
As shown in Fig. 5, the specific flow is as follows:
2. Non-real-time mode
At a fixed interval, for example every hour or every day, the URLs not present locally are transmitted to the web page analysis device as a file. As shown in Fig. 6, the specific flow is as follows:
Step 610: the updated content is added to the local URL information bank.
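The file-based exchange of the non-real-time mode can be sketched as two helpers: one writes the locally unknown URLs to a report file, and the other merges a downloaded update file into the local URL information bank. The file formats (one URL per line for the report, one JSON object per line for the update) and the function names are assumptions; the patent states only that files are exchanged. Scheduling and the FTP, HTTP, or socket transfer are omitted.

```python
import json
from typing import Dict, Iterable


def write_report_file(urls: Iterable[str], local_bank: Dict[str, dict], path: str) -> int:
    """Write URLs missing from the local bank, one per line; return how many."""
    unknown = sorted({url for url in urls if url not in local_bank})
    with open(path, "w", encoding="utf-8") as handle:
        for url in unknown:
            handle.write(url + "\n")
    return len(unknown)


def merge_update_file(path: str, local_bank: Dict[str, dict]) -> int:
    """Add downloaded records (one JSON object per line) to the local bank."""
    merged = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                record = json.loads(line)
                local_bank[record["url"]] = record
                merged += 1
    return merged
```

Collecting URLs into a set before writing means a page visited many times during the interval is reported only once, which keeps the report file far smaller than the logs it was derived from.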
In the distributed internet behavior analysis method, device and system of the present invention, the highly personalized log analysis devices are deployed in a distributed manner in each province, while the common, general-purpose web page analysis device is built centrally. Each log analysis device can therefore choose its log analysis mode flexibly, massive log data need not be transmitted over the network, the efficiency of network analysis is improved, and the time of network analysis is reduced. Duplicate construction of web page analysis devices is also avoided, which reduces network construction cost.
It should be noted that the above embodiments are intended only to illustrate the present invention, not to limit it, and the present invention is not restricted to the examples given above. All technical solutions and improvements that do not depart from the spirit and scope of the present invention shall fall within the scope of the claims of the present invention.
Claims (12)
1. A distributed internet behavior analysis method, characterized by comprising:
multiple log analysis devices deployed in a distributed manner obtaining the internet logs of local users, extracting web page URLs from the users' internet logs, and reporting the web page URLs to a centrally deployed web page analysis device;
the web page analysis device obtaining the corresponding web page according to the web page URL, analyzing the web page to obtain the URL-related information of the web page, and sending the URL-related information of the web page to each log analysis device; and
the log analysis device analyzing users' internet behavior according to the URL-related information of the web page.
2. The method according to claim 1, characterized in that the log analysis device reporting the web page URL to the centrally deployed web page analysis device comprises:
the log analysis device determining whether the web page URL is already present in its URL information bank, and reporting the web page URL to the web page analysis device when the web page URL is not present in its URL information bank.
3. The method according to claim 1, characterized in that the web page analysis device obtaining the corresponding web page according to the web page URL and analyzing the web page comprises:
the web page analysis device fetching the web page corresponding to the web page URL by means of a crawler; and
analyzing the web page to obtain the URL-related information of the web page, which comprises: URL, title, body text, keywords, tags, category and META information.
4. The method according to claim 1, characterized in that the log analysis device reporting the web page URL to the centrally deployed web page analysis device comprises:
the log analysis device sending the web page URL to the web page analysis device by means of an API query.
5. The method according to claim 1, characterized by further comprising: the web page analysis device storing the URL-related information of the web page in the URL information bank of the web page analysis device;
wherein the log analysis device reporting the web page URL to the centrally deployed web page analysis device comprises:
the log analysis device determining whether the specified reporting time has arrived and, when it has, sending all web page URLs not present locally to the web page analysis device as a file;
and wherein the web page analysis device sending the URL-related information of the web page to each log analysis device comprises:
the log analysis device determining whether the specified download time has arrived and, when it has, downloading from the URL information bank of the web page analysis device a file containing the URL-related information of the web page.
6. The method according to claim 1, characterized in that data is transmitted between the log analysis device and the web page analysis device over FTP, HTTP, or a proprietary socket-based protocol.
7. A log analysis device, characterized by comprising:
an acquisition module for obtaining the internet logs of local users;
an extraction module for extracting web page URLs from the users' internet logs;
a URL information bank for storing URL-related information;
a reporting module for reporting the web page URLs to a centrally deployed web page analysis device; and
a behavior analysis module for analyzing users' internet behavior according to the URL-related information of the web pages obtained from the web page analysis device.
8. The log analysis device according to claim 7, characterized by further comprising:
a judging module for determining whether the web page URL is already present in the URL information bank;
wherein the reporting module reports the web page URL to the centrally deployed web page analysis device when the web page URL is not present in the URL information bank.
9. The log analysis device according to claim 7, characterized by further comprising:
a timing module for determining whether the specified reporting time has arrived;
wherein the reporting module, when the specified reporting time arrives, sends all web page URLs not present locally to the web page analysis device as a file.
10. The log analysis device according to claim 7, characterized by further comprising a download module, wherein:
the timing module determines whether the specified download time has arrived; and
the download module, when the specified download time arrives, downloads from the URL information bank of the web page analysis device a file containing the URL-related information of the web pages.
11. A web page analysis device, characterized by comprising:
a web page acquisition module for obtaining the corresponding web page according to a web page URL;
a web page analysis module for analyzing the web page to obtain the URL-related information of the web page;
a URL information bank for storing the URL-related information of the web page; and
a synchronization module for synchronizing the URL-related information of the web page to the information bank of each log analysis device.
12. A distributed internet behavior analysis system, characterized by comprising: multiple log analysis devices deployed in a distributed manner and a centrally deployed web page analysis device;
the log analysis device being configured to obtain the internet logs of local users, extract web page URLs from the users' internet logs, report the web page URLs to the centrally deployed web page analysis device, and analyze users' internet behavior according to the URL-related information of the web pages obtained from the web page analysis device; and
the web page analysis device being configured to obtain the corresponding web page according to the web page URL, analyze the web page to obtain the URL-related information of the web page, store the URL-related information of the web page in the URL information bank of the web page analysis device, and send the URL-related information of the web page to each log analysis device.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210581807.XA CN103905266A (en) | 2012-12-27 | 2012-12-27 | Distributed internet behavior analysis method, device and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103905266A true CN103905266A (en) | 2014-07-02 |
Family
ID=50996423
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210581807.XA Pending CN103905266A (en) | 2012-12-27 | 2012-12-27 | Distributed internet behavior analysis method, device and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103905266A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105897466A (en) * | 2016-03-30 | 2016-08-24 | 中国联合网络通信集团有限公司 | Method and device for evaluating webpage resource distribution |
WO2017071179A1 (en) * | 2015-10-28 | 2017-05-04 | 华为技术有限公司 | Method and apparatus for recognizing user behaviour object based on flow analysis |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120221997A1 (en) * | 2008-01-18 | 2012-08-30 | International Business Machines Corporation | Navigation-independent access to elements of an integrated development environment (ide) using uniform resource locators (urls) |
CN102685224A (en) * | 2012-04-28 | 2012-09-19 | 华为技术有限公司 | User behavior analysis method, related equipment and system |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017071179A1 (en) * | 2015-10-28 | 2017-05-04 | 华为技术有限公司 | Method and apparatus for recognizing user behaviour object based on flow analysis |
US10769254B2 (en) | 2015-10-28 | 2020-09-08 | Huawei Technologies Co., Ltd. | Method and apparatus for identifying user behavior object based on traffic analysis |
CN105897466A (en) * | 2016-03-30 | 2016-08-24 | 中国联合网络通信集团有限公司 | Method and device for evaluating webpage resource distribution |
CN105897466B (en) * | 2016-03-30 | 2018-10-12 | 中国联合网络通信集团有限公司 | A kind of evaluation method and device of web page resources distribution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20140702 |