CN103905266A - Distributed internet behavior analysis method, device and system - Google Patents

Distributed internet behavior analysis method, device and system

Info

Publication number
CN103905266A
Authority
CN
China
Prior art keywords
url
analysis device
webpage
web page
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201210581807.XA
Other languages
Chinese (zh)
Inventor
徐萌
何鸿凌
钱岭
杜宇健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd
Priority to CN201210581807.XA
Publication of CN103905266A

Abstract

The invention discloses a distributed internet behavior analysis method, device and system. Log analysis devices, which are highly customized, are deployed in a distributed manner in each province, while the common, general-purpose web page analysis device is deployed centrally. Each log analysis device can therefore choose its log analysis approach flexibly, massive log data need not be transmitted over the network, analysis efficiency is improved and analysis time is reduced. At the same time, duplicate construction of web page analysis devices is avoided, which lowers network construction cost.

Description

Distributed internet behavior analysis method, device and system
Technical field
The present invention relates to the technical field of data services, and in particular to a distributed internet behavior analysis method, device and system.
Background art
Analyzing users' internet behavior makes it possible to mine their life trajectories and daily needs, to understand customers in depth, and thereby to carry out precision marketing based on the mobile internet. Such analysis also helps an operator follow market trends and lays a research foundation for exploring new products and new business models based on customers' lives.
In the prior art, a user behavior analysis and service system based on web page content comprises a message scheduling, analysis and display system located in the operator's data center, an information pusher connected to the core router, and a Radius processor connected to the operator's billing system. The message scheduling, analysis and display system comprises a policy manager, an information indicator and a user behavior analyzer. Users' personalized characteristics are analyzed from their internet logs.
The prior art has the following main defects:
1. The volume of collected data is large. At current user numbers, each province generates 600 GB to 1 TB of log data per day for WAP logs alone; if Gn logs are added, the volume is 3 to 4 times larger.
2. Data analysis tools are lacking, even though many analysis methods exist for the internet.
3. If log data is centralized before analysis, data transmission becomes the bottleneck: transferring 1 TB per day puts a serious load on the network. If instead a complete system is built in each province, the web page analysis part is duplicated construction, since it does not need to be built separately by each province.
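The transfer bottleneck can be made concrete with a short calculation. The following is only a rough sketch: it assumes decimal units (1 TB = 10**12 bytes) and a transfer spread evenly over 24 hours, neither of which is stated in the source, and the function name is hypothetical.

```python
# Rough estimate of the sustained link rate needed to centralize one
# province's daily logs. Assumptions (not from the source): decimal units
# and a perfectly even transfer over 24 hours.

SECONDS_PER_DAY = 24 * 60 * 60


def sustained_mbit_per_s(bytes_per_day: float) -> float:
    """Average link rate in Mbit/s needed to move bytes_per_day in one day."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e6


wap_low, wap_high = 600e9, 1e12   # 600 GB to 1 TB of WAP logs per day
gn_factor_high = 4                # with Gn logs the volume is 3 to 4 times larger

for label, volume in [
    ("WAP only, low end", wap_low),
    ("WAP only, high end", wap_high),
    ("WAP plus Gn, high end", wap_high * gn_factor_high),
]:
    print(f"{label}: {sustained_mbit_per_s(volume):.1f} Mbit/s per province")
```

The figures are per province, so a central site receiving logs from many provinces would need a multiple of these rates.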
Summary of the invention
To solve the technical problem in the prior art that the volume of log data is too large and data transmission is difficult, the present invention proposes a distributed internet behavior analysis method, device and system.
In one aspect, the present invention provides a distributed internet behavior analysis method, comprising: a plurality of log analysis devices deployed in a distributed manner obtain local user internet logs, extract web page URLs from the user internet logs, and report the web page URLs to a centrally deployed web page analysis device; the web page analysis device obtains the corresponding web pages according to the web page URLs, analyzes the web pages to obtain URL related information of the web pages, and sends the URL related information to each log analysis device; and the log analysis devices analyze users' internet behavior according to the URL related information of the web pages.
In another aspect, the present invention provides a log analysis device, comprising: an acquisition module for obtaining local user internet logs; an extraction module for extracting web page URLs from the user internet logs; a URL information bank for storing URL related information; a reporting module for reporting the web page URLs to the centrally deployed web page analysis device; and a behavior analysis module for analyzing users' internet behavior according to the URL related information of web pages obtained from the web page analysis device.
In another aspect, the present invention provides a web page analysis device, comprising: a web page acquisition module for obtaining the corresponding web page according to a web page URL; a web page analysis module for analyzing the web page to obtain the URL related information of the web page; a URL information bank for storing the URL related information of the web page; and a synchronization module for synchronizing the URL related information of the web page to the information bank of each log analysis device.
In another aspect, the present invention provides a distributed internet behavior analysis system, comprising a plurality of log analysis devices deployed in a distributed manner and a centrally deployed web page analysis device. Each log analysis device is configured to obtain local user internet logs, extract web page URLs from the user internet logs, report the web page URLs to the centrally deployed web page analysis device, and analyze users' internet behavior according to the URL related information of web pages obtained from the web page analysis device. The web page analysis device is configured to obtain the corresponding web pages according to the web page URLs, analyze the web pages to obtain the URL related information of the web pages, store the URL related information in the URL information bank of the web page analysis device, and send the URL related information to each log analysis device.
In the distributed internet behavior analysis method, device and system of the present invention, the highly customized log analysis devices are deployed in a distributed manner in each province, while the common, general-purpose web page analysis device is built centrally. Each log analysis device can thus choose its log analysis approach flexibly, transmission of massive log data over the network is avoided, the efficiency of analysis is improved and the time it takes is reduced. Duplicate construction of web page analysis devices is also avoided, which lowers network construction cost.
Brief description of the drawings
Fig. 1 is a structural diagram of an embodiment of the distributed internet behavior analysis system of the present invention;
Fig. 2 is a structural diagram of an embodiment of the log analysis device of the present invention;
Fig. 3 is a structural diagram of an embodiment of the web page analysis device of the present invention;
Fig. 4 is a flow chart of an embodiment of the distributed internet behavior analysis method of the present invention;
Fig. 5 is a flow chart of the log analysis device of the present invention reporting web page URLs in real time;
Fig. 6 is a flow chart of the log analysis device of the present invention reporting web page URLs in non-real-time mode.
Detailed description of the embodiments
In the present invention, the internet behavior analysis system is divided into log analysis devices and a web page analysis device. A log analysis device implements functions such as loading, statistical analysis and application of user internet logs. The web page analysis device implements functions such as crawling, parsing and classifying web pages. The highly customized log analysis devices are deployed in a distributed manner in each province, and the common, general-purpose web page analysis device is built centrally. The present invention is described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, an embodiment of the distributed internet behavior analysis system of the present invention comprises a plurality of log analysis devices 11a, 11b, 11c, ... deployed in a distributed manner in each province, and a centrally deployed web page analysis device 12.
A log analysis device obtains local user internet logs, extracts web page URLs from the user internet logs, and reports the web page URLs to the centrally deployed web page analysis device.
Specifically, if the URL information bank of the log analysis device already stores some web page URLs, the log analysis device judges whether a given web page URL already exists in its URL information bank, and reports the web page URL to the centrally deployed web page analysis device only when it does not.
The web page analysis device obtains the corresponding web page according to the web page URL, analyzes the web page to obtain the URL related information of the web page, and synchronizes the URL related information to the URL information bank of each log analysis device.
After obtaining the URL related information of the web page from the web page analysis device, the log analysis device analyzes the user's internet behavior.
The URL related information comprises the URL, title, body text, keywords, tags, category, META information and the like of the web page.
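One way to picture a URL related information record is as a small data structure. The sketch below is illustrative only: the class name `UrlInfo`, the field names, the field types and the JSON serialization are all assumptions, since the source lists only which kinds of information are included.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class UrlInfo:
    """Hypothetical record for one entry in a URL information bank."""
    url: str
    title: str = ""
    text: str = ""                                        # body text of the page
    keywords: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    category: str = ""
    meta: Dict[str, str] = field(default_factory=dict)    # META name -> content


# Serialize a record to one JSON line and read it back, as might be done
# when exchanging records between devices (the format is an assumption).
record = UrlInfo(
    url="http://example.com/news/1",
    title="Example headline",
    keywords=["news"],
    category="news",
)
line = json.dumps(asdict(record), ensure_ascii=False)
restored = UrlInfo(**json.loads(line))
```

Keeping the record flat and serializable means the same structure can be stored in a local URL information bank and carried over any of the transports the document mentions.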
The interfaces between the log analysis device and the web page analysis device comprise:
IF1 interface: the log analysis subsystem periodically synchronizes with the web page analysis subsystem, taking the URL information bank and the rule/configuration library of the web page analysis subsystem as authoritative. It also submits its statistical summary results for use in network-wide analysis.
IF2 interface: the log analysis subsystem periodically provides the web page analysis subsystem with the full URL list and the list of unclassified URLs. Both the IF1 interface and the IF2 interface support data transmission over the FTP protocol, the HTTP protocol or a proprietary socket-based protocol.
As shown in Fig. 2, the log analysis device of the present invention comprises: an acquisition module 21, an extraction module 22, a URL information bank 23, a judging module 24, a reporting module 25 and a behavior analysis module 26. The acquisition module obtains local user internet logs; the extraction module extracts web page URLs from the user internet logs; the URL information bank stores URL related information; the judging module judges whether a web page URL already exists in the URL information bank; the reporting module reports the web page URL to the centrally deployed web page analysis device when the web page URL does not exist in the URL information bank; and the behavior analysis module analyzes the user's internet behavior according to the user internet logs.
The log analysis device further comprises a timing module 27 for judging whether a specified reporting time has arrived. When the specified reporting time arrives, the reporting module sends all web page URLs that do not exist in the URL information bank to the web page analysis device as a file.
The log analysis device further comprises a download module 28. The timing module judges whether a specified download time has arrived; when it has, the download module downloads a file containing the URL related information of web pages from the URL information bank of the web page analysis device.
As shown in Fig. 3, the web page analysis device of the present invention comprises: a web page acquisition module 31, a web page analysis module 32, a URL information bank 33 and a synchronization module 34. The web page acquisition module obtains the corresponding web page according to a web page URL; the web page analysis module analyzes the web page to obtain the URL related information of the web page; the URL information bank stores the URL related information of the web page; and the synchronization module synchronizes the URL related information of the web page to the information bank of each log analysis device.
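The work of the web page analysis module can be sketched with Python's standard `html.parser`. This is only an illustration under stated assumptions: the class and function names are hypothetical, crawling and classification are left out, and the source does not say how titles, keywords or META information are actually extracted.

```python
from html.parser import HTMLParser


class PageInfoParser(HTMLParser):
    """Collects the title, META name/content pairs and visible text of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()
        elif not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def analyze_page(url, html):
    """Build a partial URL related information record from already fetched HTML."""
    parser = PageInfoParser()
    parser.feed(html)
    parser.close()
    keywords = [k.strip() for k in parser.meta.get("keywords", "").split(",") if k.strip()]
    return {
        "url": url,
        "title": parser.title,
        "text": " ".join(parser.text_parts),
        "keywords": keywords,
        "meta": parser.meta,
    }


sample_html = (
    "<html><head><title>Sports news</title>"
    "<meta name='keywords' content='football, league'>"
    "<style>p{}</style></head>"
    "<body><p>Match report</p><script>var x=1;</script></body></html>"
)
info = analyze_page("http://example.com/a", sample_html)
```

The category field is deliberately absent here, because assigning it is the job of a separate classification step.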
Both the log analysis device and the web page analysis device have a URL information bank. The URL related information in the URL information bank of the web page analysis device is authoritative, and it is synchronized to the log analysis devices.
As shown in Fig. 4, an embodiment of the distributed internet behavior analysis method of the present invention comprises the following flow:
Step 402, a log analysis device deployed in a province obtains local user internet logs;
Step 404, the log analysis device extracts web page URLs from the user internet logs; the log analysis device cleans the log data, which includes removing broken rows and filling empty fields, and reads the URL field for further analysis; this can be implemented with Hadoop MapReduce;
Step 406, the log analysis device judges whether the web page URL already exists in its information bank; if it exists, step 408 is executed; if not, step 410 is executed;
Step 408, the log analysis device directly obtains the category of the web page URL and analyzes the user's internet behavior;
Step 410, the log analysis device reports the web page URL to the centrally deployed web page analysis device;
Step 412, the web page analysis device obtains the corresponding web page according to the web page URL, analyzes the web page to obtain the URL related information of the web page, and stores the URL related information in the URL information bank of the web page analysis device;
The web page analysis device uses a crawler to fetch the web page corresponding to the URL, parses it to obtain the body text, and stores the web page in a web page library; the body text can be classified with methods such as Bayes or SVM, and the URL related information of the web page is stored in the URL information bank of the web page analysis device;
Step 414, the URL information bank of the web page analysis device synchronizes the URL related information of the web page to the URL information bank of each log analysis device, after which step 408 is executed.
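Steps 402 to 414 can be sketched for a single log analysis device as follows. Several things here are assumptions made only for illustration: the tab-separated log layout in `FIELDS`, the `-` placeholder for empty fields, the class `CentralAnalyzer` standing in for the web page analysis device, and its keyword rule, which replaces the crawling and Bayes or SVM classification described above.

```python
FIELDS = ("user_id", "timestamp", "url")   # hypothetical tab-separated log layout


def clean_records(lines):
    """Step 404: drop broken rows, fill empty fields, keep rows that have a URL."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != len(FIELDS):
            continue                                  # broken row
        record = {name: (value or "-") for name, value in zip(FIELDS, parts)}
        if record["url"] != "-":
            yield record


class CentralAnalyzer:
    """Stand-in for the centrally deployed web page analysis device."""

    def __init__(self):
        self.url_bank = {}

    def analyze(self, url):
        # Placeholder rule; a real device would crawl and classify the page.
        category = "news" if "news" in url else "other"
        self.url_bank[url] = {"url": url, "category": category}   # step 412
        return self.url_bank[url]                                 # step 414


def analyze_behavior(lines, local_bank, central):
    """Steps 402 to 414; returns per-user counts of visited categories."""
    counts = {}
    for record in clean_records(lines):
        url = record["url"]
        if url not in local_bank:                       # step 406
            local_bank[url] = central.analyze(url)      # steps 410 to 414
        category = local_bank[url]["category"]          # step 408
        per_user = counts.setdefault(record["user_id"], {})
        per_user[category] = per_user.get(category, 0) + 1
    return counts


sample_lines = [
    "u1\t2012-12-27T10:00\thttp://example.com/news/1\n",
    "broken row\n",
    "u1\t\thttp://example.com/shop\n",
    "u2\t2012-12-27T10:05\thttp://example.com/news/1\n",
    "u2\t2012-12-27T10:06\t\n",
]
local_bank = {}
central = CentralAnalyzer()
counts = analyze_behavior(sample_lines, local_bank, central)
```

Only URLs travel to the central device in this sketch; the log rows themselves never leave the province, which is the point of the distributed deployment.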
Synchronization between a log analysis device and the URL information bank of the web page analysis device can be performed in either of two modes, real-time or offline:
1. Real-time mode
As shown in Fig. 5, the specific flow is as follows:
Step 502, the log analysis device obtains a web page URL that does not exist in its current URL information bank;
Step 504, the log analysis device submits the URL to the web page analysis device through API access, that is, it calls the query API of the web page analysis device with the URL as the parameter;
Step 506, the web page analysis device analyzes the web page, returns the resulting URL related information, and updates its own information bank;
Step 508, the log analysis device inserts the received URL related information into its URL information bank.
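The real-time mode can be sketched as a per-URL query. In this sketch the query API is modeled as an in-process method rather than a network call, and the names `WebPageAnalysisApi`, `query` and `report_in_real_time` are hypothetical. Reusing an earlier result when a second province asks for the same URL is also an assumption; the source says only that the device analyzes, returns the information and updates its own bank.

```python
class WebPageAnalysisApi:
    """Stand-in for the query API of the web page analysis device."""

    def __init__(self, analyze):
        self._analyze = analyze      # callable: url -> URL related information
        self.url_bank = {}
        self.analysis_runs = 0

    def query(self, url):
        """Step 506: analyze the page, update the central bank, return the record."""
        if url not in self.url_bank:             # assumption: reuse earlier results
            self.url_bank[url] = self._analyze(url)
            self.analysis_runs += 1
        return self.url_bank[url]


def report_in_real_time(seen_urls, local_bank, api):
    """Steps 502 to 508 for one log analysis device; returns the URLs queried."""
    # Step 502: URLs not yet in the local bank, de-duplicated in first-seen order.
    missing = [u for u in dict.fromkeys(seen_urls) if u not in local_bank]
    for url in missing:
        local_bank[url] = api.query(url)         # steps 504, 506 and 508
    return missing


api = WebPageAnalysisApi(lambda url: {"url": url, "category": "unclassified"})
province_a, province_b = {}, {}
queried_a = report_in_real_time(
    ["http://a.example/1", "http://a.example/1", "http://a.example/2"],
    province_a,
    api,
)
queried_b = report_in_real_time(["http://a.example/2"], province_b, api)
```

In a deployment the `query` call would go over one of the transports named for the interfaces, such as HTTP.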
2. Non-real-time mode
At a set interval, for example 1 hour or 1 day, the URLs that do not exist in the local URL information bank are transmitted to the web page analysis device as a file. As shown in Fig. 6, the specific flow is as follows:
Step 602, the log analysis device judges whether the specified reporting time has arrived;
Step 604, when the specified reporting time arrives, all web page URLs that do not exist locally are sent to the web page analysis device as a file, which can be sent by FTP or by other means;
Step 606, the log analysis device judges whether the specified download time has arrived;
Step 608, when the specified download time arrives, a file containing the URL related information of web pages is downloaded from the URL information bank of the web page analysis device;
Step 610, the updated content is added to the local URL information bank.
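The non-real-time mode can be sketched with plain files. The file formats here are assumptions: one URL per line for the report file and one JSON record per line for the downloaded file. The function names are hypothetical, and the actual transfer (FTP or otherwise) is replaced by writing and reading local files. On conflict the downloaded record replaces the local one, since the central URL information bank is treated as authoritative.

```python
import json
import tempfile
from datetime import datetime, timedelta
from pathlib import Path


def is_due(last_run: datetime, interval: timedelta, now: datetime) -> bool:
    """Steps 602 and 606: has the specified report or download time arrived?"""
    return now - last_run >= interval


def write_report_file(missing_urls, path: Path) -> int:
    """Step 604: write every URL missing from the local bank, one per line."""
    urls = sorted(set(missing_urls))
    path.write_text("".join(u + "\n" for u in urls), encoding="utf-8")
    return len(urls)


def merge_download(path: Path, local_bank: dict) -> int:
    """Steps 608 and 610: merge downloaded records; returns how many entries changed."""
    updated = 0
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            record = json.loads(line)
            if local_bank.get(record["url"]) != record:
                local_bank[record["url"]] = record    # central record wins
                updated += 1
    return updated


with tempfile.TemporaryDirectory() as tmp:
    report_path = Path(tmp) / "missing_urls.txt"
    download_path = Path(tmp) / "url_info.jsonl"

    due = is_due(datetime(2012, 12, 27, 0, 0), timedelta(hours=1),
                 datetime(2012, 12, 27, 1, 0))
    reported = write_report_file(
        ["http://b.example/x", "http://b.example/x", "http://a.example/y"],
        report_path,
    )
    report_lines = report_path.read_text(encoding="utf-8").splitlines()

    # Simulate the file that would be downloaded from the central bank.
    download_path.write_text(
        json.dumps({"url": "http://a.example/y", "category": "news"}) + "\n",
        encoding="utf-8",
    )
    bank = {"http://a.example/y": {"url": "http://a.example/y", "category": "old"}}
    first_merge = merge_download(download_path, bank)
    second_merge = merge_download(download_path, bank)
```

Batching trades freshness for fewer round trips: a URL first seen just after a report run stays unclassified locally until the next report and download cycle completes.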
In the distributed internet behavior analysis method, device and system of the present invention, the highly customized log analysis devices are deployed in a distributed manner in each province, while the common, general-purpose web page analysis device is built centrally. Each log analysis device can thus choose its log analysis approach flexibly, transmission of massive log data over the network is avoided, the efficiency of analysis is improved and the time it takes is reduced. Duplicate construction of web page analysis devices is also avoided, which lowers network construction cost.
It should be noted that the above embodiments are intended only to illustrate the present invention, not to limit it, and the present invention is not limited to the above examples. All technical solutions and improvements that do not depart from the spirit and scope of the present invention shall fall within the scope of the claims of the present invention.

Claims (12)

1. A distributed internet behavior analysis method, characterized by comprising:
a plurality of log analysis devices deployed in a distributed manner obtaining local user internet logs, extracting web page URLs from the user internet logs, and reporting the web page URLs to a centrally deployed web page analysis device;
the web page analysis device obtaining the corresponding web pages according to the web page URLs, analyzing the web pages to obtain URL related information of the web pages, and sending the URL related information of the web pages to each log analysis device; and
the log analysis devices analyzing users' internet behavior according to the URL related information of the web pages.
2. The method according to claim 1, characterized in that the log analysis device reporting the web page URL to the centrally deployed web page analysis device comprises:
the log analysis device judging whether the web page URL already exists in its URL information bank, and reporting the web page URL to the web page analysis device when the web page URL does not exist in its URL information bank.
3. The method according to claim 1, characterized in that the web page analysis device obtaining the corresponding web page according to the web page URL and analyzing the web page comprises:
the web page analysis device crawling the web page corresponding to the web page URL by means of a crawler; and
analyzing the web page, wherein the URL related information obtained for the web page comprises: the URL, title, body text, keywords, tags, category and META information.
4. The method according to claim 1, characterized in that the log analysis device reporting the web page URL to the centrally deployed web page analysis device comprises:
the log analysis device sending the web page URL to the web page analysis device by means of an API query.
5. The method according to claim 1, characterized by further comprising: the web page analysis device storing the URL related information of the web page in the URL information bank of the web page analysis device;
wherein the log analysis device reporting the web page URL to the centrally deployed web page analysis device comprises:
the log analysis device judging whether a specified reporting time has arrived, and, when the specified reporting time arrives, sending all web page URLs that do not exist in its URL information bank to the web page analysis device as a file;
and wherein the web page analysis device sending the URL related information of the web page to each log analysis device comprises:
the log analysis device judging whether a specified download time has arrived, and, when the specified download time arrives, downloading a file containing the URL related information of the web page from the URL information bank of the web page analysis device.
6. The method according to claim 1, characterized in that data is transmitted between the log analysis device and the web page analysis device over the FTP protocol, the HTTP protocol or a proprietary socket-based protocol.
7. A log analysis device, characterized by comprising:
an acquisition module for obtaining local user internet logs;
an extraction module for extracting web page URLs from the user internet logs;
a URL information bank for storing URL related information;
a reporting module for reporting the web page URLs to a centrally deployed web page analysis device; and
a behavior analysis module for analyzing users' internet behavior according to the URL related information of web pages obtained from the web page analysis device.
8. The log analysis device according to claim 7, characterized by further comprising:
a judging module for judging whether the web page URL already exists in the URL information bank;
wherein the reporting module is configured to report the web page URL to the centrally deployed web page analysis device when the web page URL does not exist in the URL information bank.
9. The log analysis device according to claim 7, characterized by further comprising:
a timing module for judging whether a specified reporting time has arrived;
wherein the reporting module is configured to send all web page URLs that do not exist in the URL information bank to the web page analysis device as a file when the specified reporting time arrives.
10. The log analysis device according to claim 7, characterized by further comprising a download module, wherein:
the timing module is configured to judge whether a specified download time has arrived; and
the download module is configured to download, when the specified download time arrives, a file containing the URL related information of the web page from the URL information bank of the web page analysis device.
11. A web page analysis device, characterized by comprising:
a web page acquisition module for obtaining the corresponding web page according to a web page URL;
a web page analysis module for analyzing the web page to obtain the URL related information of the web page;
a URL information bank for storing the URL related information of the web page; and
a synchronization module for synchronizing the URL related information of the web page to the information bank of each log analysis device.
12. A distributed internet behavior analysis system, characterized by comprising: a plurality of log analysis devices deployed in a distributed manner and a centrally deployed web page analysis device;
wherein each log analysis device is configured to obtain local user internet logs, extract web page URLs from the user internet logs, report the web page URLs to the centrally deployed web page analysis device, and analyze users' internet behavior according to the URL related information of web pages obtained from the web page analysis device; and
the web page analysis device is configured to obtain the corresponding web pages according to the web page URLs, analyze the web pages to obtain the URL related information of the web pages, store the URL related information of the web pages in the URL information bank of the web page analysis device, and send the URL related information of the web pages to each log analysis device.
CN201210581807.XA 2012-12-27 2012-12-27 Distributed internet behavior analysis method, device and system Pending CN103905266A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210581807.XA CN103905266A (en) 2012-12-27 2012-12-27 Distributed internet behavior analysis method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210581807.XA CN103905266A (en) 2012-12-27 2012-12-27 Distributed internet behavior analysis method, device and system

Publications (1)

Publication Number Publication Date
CN103905266A true CN103905266A (en) 2014-07-02

Family

ID=50996423

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210581807.XA Pending CN103905266A (en) 2012-12-27 2012-12-27 Distributed internet behavior analysis method, device and system

Country Status (1)

Country Link
CN (1) CN103905266A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120221997A1 (en) * 2008-01-18 2012-08-30 International Business Machines Corporation Navigation-independent access to elements of an integrated development environment (ide) using uniform resource locators (urls)
CN102685224A (en) * 2012-04-28 2012-09-19 华为技术有限公司 User behavior analysis method, related equipment and system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017071179A1 (en) * 2015-10-28 2017-05-04 华为技术有限公司 Method and apparatus for recognizing user behaviour object based on flow analysis
US10769254B2 (en) 2015-10-28 2020-09-08 Huawei Technologies Co., Ltd. Method and apparatus for identifying user behavior object based on traffic analysis
CN105897466A (en) * 2016-03-30 2016-08-24 中国联合网络通信集团有限公司 Method and device for evaluating webpage resource distribution
CN105897466B (en) * 2016-03-30 2018-10-12 中国联合网络通信集团有限公司 A kind of evaluation method and device of web page resources distribution

Similar Documents

Publication Publication Date Title
CN103442024B (en) A kind of system and method for intelligent mobile terminal and cloud virtual mobile terminal synchronization
CN107040863B (en) Real-time service recommendation method and system
CN103761309A (en) Operation data processing method and system
CN101039281B (en) Method for sharing load of stream media server
CN102208991A (en) Blog processing method, device and system
CN101902497B (en) Cloud computing based internet information monitoring system and method
CN104394211A (en) Design and implementation method for user behavior analysis system based on Hadoop
CN107315810A (en) A kind of internet of things equipment behavior portrait method
CN107888666A (en) A kind of cross-region data-storage system and method for data synchronization and device
CN108513094A (en) Video frequency monitoring method and device
CN110071847A (en) Message processing method, device, terminal device and storage medium
CN109982293A (en) Flow product method for pushing, system, electronic equipment and storage medium
CN102347930A (en) Method and system for obtaining webpage content
Ahammad et al. Software-defined dew, roof, fog and cloud (SD-DRFC) framework for IoT ecosystem: the journey, novel framework architecture, simulation, and use cases
CN102572806A (en) Mobile terminal adapting system and method based on Msky platform
CN104881788A (en) Data processing method of electricity customer, system and customer service management platform
CN103905266A (en) Distributed internet behavior analysis method, device and system
CN103944779B (en) A kind of WAP service features monitoring method and system
CN102685155B (en) The method that content transmits, content delivering server and content transmit proxy server
CN104202389A (en) Monitoring method for storage space and running state in cloud environment and cloud storage system
CN109218142A (en) One kind being based on OneM2M agreement platform of internet of things terminal access method and device
CN110032612A (en) Information-pushing method and device
CN202548557U (en) Clock network system
CN107612744A (en) A kind of intelligent transportation equipment operational system based on mobile platform
CN109062758A (en) A kind of server system delay machine processing method, system, medium and equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20140702