CN102724059B - Website operation state monitoring and abnormal detection based on MapReduce - Google Patents

Website operation state monitoring and abnormal detection based on MapReduce

Info

Publication number
CN102724059B
CN102724059B CN201210095037.8A CN201210095037A CN102724059B CN 102724059 B CN102724059 B CN 102724059B CN 201210095037 A CN201210095037 A CN 201210095037A CN 102724059 B CN102724059 B CN 102724059B
Authority
CN
China
Prior art keywords
access
website
flow
rule
peak
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201210095037.8A
Other languages
Chinese (zh)
Other versions
CN102724059A (en)
Inventor
邹权
唐振坤
蒋文瑞
林琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CHANGSHU ZHITANG TOWN XINSHENG TECHNICAL CONSULTATION SERVICE CO LTD
Original Assignee
CHANGSHU ZHITANG TOWN XINSHENG TECHNICAL CONSULTATION SERVICE CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CHANGSHU ZHITANG TOWN XINSHENG TECHNICAL CONSULTATION SERVICE CO LTD
Priority to CN201210095037.8A
Publication of CN102724059A
Application granted
Publication of CN102724059B
Expired - Fee Related
Anticipated expiration

Abstract

The invention discloses a method for website operation state monitoring and anomaly detection based on MapReduce. Using the MapReduce parallel programming model, state monitoring and anomaly detection are carried out: the most useful information points are extracted from massive log files, and effective strategies are adopted to capture abnormal behaviour during access efficiently and accurately. The website logs are analyzed with four parallel strategies, namely state monitoring, feature anomaly detection, traffic peak detection, and decision tree learning of access rules, thereby achieving monitoring of the website operation state and detection of abnormal access.

Description

Website operation state monitoring and anomaly detection based on MapReduce
Technical field
Website operation state monitoring and anomaly detection based on MapReduce belongs to the technical field of improving website service performance by mining web logs.
Background art
A website is a fixed location on the Internet for publishing information to the whole world. It consists of a domain name (that is, the site address) and web space, and usually includes a home page and other pages containing hyperlinks. Its emergence plays an irreplaceable role in modern society, for example in promoting one's image, providing rich information conveniently, and opening up business channels. For a website to run stably on the Internet over the long term, and for users to seize more online business opportunities in a rapidly changing information society, the maintenance and improvement of the website become a vital link.
Website maintenance is mostly based on the access logs of the website. The logs record the access behaviour of all users, so the patterns of interaction between users and the website can be found effectively. This plays an important role in raising the attention the website receives and in improving the website service (including improving the validity of links and using caching to speed up access).
Traditional web log mining mostly adopts machine learning (Li Ming et al., 2004) and visualization methods combined with manual processing. In particular, recent approaches from machine learning research have been applied to log data mining, such as feedback-based active learning (Georges et al., 2010). Because feedback information is difficult to obtain, this method is usually applied to personalized recommendation in search engines. There has also been good work in China: Professors Liu Dayou and Yang Bo of Jilin University, among others, used machine learning to identify repeated tasks in logs and thereby save computation time (Li Jiafei et al., 2007); Professor Chen Guolong of Fuzhou University used machine learning and optimization methods to detect intrusions in log files. However, these traditional methods do not use any parallel mechanism and therefore cannot be applied to portal websites with huge visit volumes. Moreover, information on the Internet changes rapidly, and the delay introduced by traditional analysis greatly reduces the value of that information. Under these circumstances, it is particularly important to propose an analysis strategy that is real-time, efficient, and accurate.
MapReduce is a parallel programming model proposed by Google for parallel computation over large-scale data sets (larger than 1 TB). MapReduce distributes large-scale operations on a data set across the nodes of a network to achieve reliability and distribution, and assigns tasks to process each portion of the distributed data, thereby achieving parallel processing. Even if the data to be processed grows rapidly, it is only necessary to dynamically add nodes to the cluster, and the nodes need not be high-performance machines with special capabilities; ordinary commodity PCs suffice. How to use this low-cost parallel processing mechanism to analyze websites efficiently, and thereby monitor the operation state of a website in real time and detect abnormal behaviour, is the key problem to be solved by the invention.
Summary of the invention
Traditional log analysis methods cannot cope with portal websites that have huge visit volumes; even when they are able to process large volumes of logs, the processing delay greatly reduces the potential value of the log analysis. Furthermore, when facing unstructured log files, the choice of analysis strategy and of which aspects of the logs to analyze also has a vital impact on building and improving the website. To address these problems, the invention adopts the MapReduce parallel programming model, extracts the most useful information points from massive log files, and adopts effective strategies to capture abnormal behaviour during access efficiently and accurately.
The invention is characterized in that it performs state monitoring and anomaly detection, and comprises the following steps in sequence:
1. State monitoring
Step (1.1): analyze abnormal status codes, and use a linear regression strategy to report abnormal conditions automatically;
The status codes in the log records effectively reflect the operation state of the website. Common abnormal status codes are:
3xx - redirection:
The client browser must take further action to complete the request. The browser may have to request a different page on the server, or repeat the request through a proxy server. Common codes: 301: permanent redirect; 302: temporary redirect;
4xx - client error:
An error has occurred and the client appears to be at fault, for example the client requested a page that does not exist, or the client did not provide valid authentication information. Common code: 404: not found, the resource does not exist;
5xx - server error:
These three classes of abnormal status codes are extracted from the logs every day in order to observe the operation state of the website. Common uses include:
Finding dead links;
Finding temporary redirects. For a 404, it is necessary to check whether the file exists. If the file exists and a 404 is still returned, the possible causes include server instability, a fault in the server itself, or an attack on the server. If the file does not exist and spiders still try to crawl that non-existent page, this is because other pages still link to it;
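As a non-limiting illustration of step (1.1), the following Python sketch counts the three abnormal status code classes in map/reduce style and fits a least-squares line to the daily counts. The log line layout (an Apache/Nginx "combined"-style line), the function names, and the `tolerance` parameter are assumptions made for this example; the description does not specify them, nor how the linear regression result is turned into a report.

```python
import re
from collections import Counter

# Assumed layout: ... "GET /path HTTP/1.1" 404 512 ...
# The three-digit status follows the closing quote of the request.
STATUS_RE = re.compile(r'"\s(\d{3})\s')

def map_status(line):
    """Map step: emit (status_class, 1) for 3xx, 4xx and 5xx responses."""
    match = STATUS_RE.search(line)
    if match and match.group(1)[0] in "345":
        yield match.group(1)[0] + "xx", 1

def reduce_counts(pairs):
    """Reduce step: sum the emitted values for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def linear_trend(daily_counts):
    """Least-squares slope and intercept for counts on days 0..n-1 (n >= 2)."""
    n = len(daily_counts)
    mean_x = (n - 1) / 2
    mean_y = sum(daily_counts) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_counts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def is_abnormal(history, today, tolerance=2.0):
    """Hypothetical reporting rule: flag today's count when it exceeds
    `tolerance` times the value extrapolated from the fitted line."""
    slope, intercept = linear_trend(history)
    predicted = max(slope * len(history) + intercept, 1.0)
    return today > tolerance * predicted
```

In a real MapReduce job the map and reduce functions would run on separate nodes; here they are ordinary functions so that the data flow is easy to follow.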
Step (1.2): compile statistics on accesses per second and traffic, calculating the daily average accesses per second and traffic, as well as the top 10 values of accesses per second and traffic;
The access traffic report is used to tell whether scraping programs are harvesting website data in bulk, a behaviour that can seriously degrade the service performance of the website. By reporting the daily top 10 values of accesses per second and traffic together with the averages, the presence of scraping programs can be seen intuitively;
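As a non-limiting illustration of step (1.2), the sketch below aggregates requests and bytes per second and reports the averages and the top 10 seconds. The record layout `(second, bytes_sent)` and the function name are assumptions for this example; the averages are taken over seconds that received at least one request, which is one possible reading of the daily average.

```python
from collections import Counter

def per_second_stats(records, top_n=10):
    """records: iterable of (second, bytes_sent) pairs, one per request.

    Returns average hits and bytes per active second and the top_n
    seconds ranked by hits and by bytes, or None if there are no records.
    """
    hits = Counter()
    traffic = Counter()
    for second, size in records:
        hits[second] += 1
        traffic[second] += size
    if not hits:
        return None
    active_seconds = len(hits)
    return {
        "avg_hits": sum(hits.values()) / active_seconds,
        "avg_bytes": sum(traffic.values()) / active_seconds,
        "top_hits": hits.most_common(top_n),
        "top_bytes": traffic.most_common(top_n),
    }
```

A top-10 entry far above the average is the kind of signal the report is meant to make visible.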
Step (1.3): statistical analysis of spiders (crawlers)
The number and frequency of spider visits are among the indicators of the health and the weight of a website. The main purposes of counting how often search engine spiders crawl the site are:
1. Predicting keyword rankings:
If a spider used to visit hundreds or thousands of times every day, the website is attractive to it and keyword rankings tend to be stable. When the number of spider visits drops sharply, something has gone wrong with the website, possibly because of a redesign or a penalty. A drop in visits should therefore be taken as an early warning that keyword rankings are about to change;
2. Finding out which unnecessary content the search engine has accessed. Such content can then be blocked with a robots.txt file, so that the spider spends more of its visits on useful content and more of the website gets indexed;
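As a non-limiting illustration of step (1.3), the sketch below counts daily spider visits by matching the User-Agent string and lists the dates on which visits dropped sharply. The token list, the record layout `(date, user_agent)`, and the `ratio` parameter are assumptions for this example and are not taken from the description.

```python
from collections import Counter

# Illustrative User-Agent tokens; a real deployment would maintain its own list.
SPIDER_TOKENS = ("Googlebot", "Baiduspider", "bingbot")

def spider_visits(records):
    """records: iterable of (date, user_agent). Returns Counter keyed by (date, spider)."""
    counts = Counter()
    for date, agent in records:
        lowered = agent.lower()
        for token in SPIDER_TOKENS:
            if token.lower() in lowered:
                counts[(date, token)] += 1
                break
    return counts

def sharp_drops(counts, spider, dates, ratio=0.5):
    """Dates whose visit count fell below `ratio` times the previous date's count."""
    drops = []
    for previous, current in zip(dates, dates[1:]):
        if counts[(current, spider)] < ratio * counts[(previous, spider)]:
            drops.append(current)
    return drops
```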
Step (1.4): page access ranking by section:
Counting the page accesses of each section is the most direct way to analyze whether the content and layout of the website are attractive;
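As a non-limiting illustration of step (1.4), the sketch below ranks sections by page accesses. It assumes that a section corresponds to the first segment of the request path, which is a convention chosen for this example; the description does not say how pages are assigned to sections.

```python
from collections import Counter

def section_of(path):
    """First path segment, e.g. '/news/2012/a.html' -> 'news'.

    Pages directly under the site root are grouped as '(root)'.
    """
    path = path.split("?", 1)[0]  # drop any query string
    segments = [s for s in path.split("/") if s]
    if not segments or (len(segments) == 1 and not path.endswith("/")):
        return "(root)"
    return segments[0]

def rank_sections(paths):
    """Return [(section, accesses), ...] sorted by accesses, highest first."""
    return Counter(section_of(p) for p in paths).most_common()
```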
Note: every step of the state monitoring above uses the MapReduce parallel model;
2. Feature anomaly detection
Step (2.1): use the MapReduce model to count, in parallel, the IPs whose daily visits exceed ten thousand, and save them for further analysis;
Step (2.2): for each IP satisfying step 2.1, check the probability that its user agent information is empty; if the probability is greater than threshold T1, proceed to the next step of the analysis;
Step (2.3): for each IP satisfying steps 2.1 and 2.2, check the distribution of the resource types it requests; if the probability of requesting HTML is greater than threshold T2, proceed to the next step of the analysis. The distribution covers HTML, XML, CSS, and JS;
Step (2.4): for each IP satisfying steps 2.1, 2.2, and 2.3, use the MapReduce parallel model to compute the access frequency of the current IP; if the access frequency is greater than threshold T3, the current IP is judged to be an abnormal scraping program;
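As a non-limiting illustration of steps (2.1) to (2.4), the sketch below applies the four checks as a cascade on a single machine. The record layout, the default values of `t1`, `t2`, and `t3`, and the definition of access frequency as requests per second over the IP's active time span are assumptions for this example; the description names the thresholds T1, T2, and T3 but does not give their values.

```python
from collections import defaultdict

def detect_scrapers(records, min_visits=10000, t1=0.5, t2=0.9, t3=1.0):
    """records: iterable of (ip, epoch_second, user_agent, resource_type).

    Returns the IPs that pass all four checks of the cascade.
    """
    by_ip = defaultdict(list)
    for ip, second, agent, resource_type in records:
        by_ip[ip].append((second, agent, resource_type))

    flagged = []
    for ip, rows in by_ip.items():
        total = len(rows)
        # Step 2.1: keep only IPs with more than min_visits visits in the day.
        if total <= min_visits:
            continue
        # Step 2.2: share of requests with an empty user agent must exceed T1.
        empty_agents = sum(1 for _, agent, _ in rows if not agent or agent == "-")
        if empty_agents / total <= t1:
            continue
        # Step 2.3: share of HTML requests must exceed T2.
        html_requests = sum(1 for _, _, rtype in rows if rtype == "HTML")
        if html_requests / total <= t2:
            continue
        # Step 2.4: access frequency (assumed: requests per active second) must exceed T3.
        seconds = [second for second, _, _ in rows]
        span = max(seconds) - min(seconds) + 1
        if total / span > t3:
            flagged.append(ip)
    return flagged
```

Ordering the checks from cheapest to most specific means most IPs are discarded by the first count, which is the step that benefits most from parallel counting.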
3. Peak traffic detection
Step (3.1): read the website access rules
The access traffic of a normal website is generally fairly steady. Although frequent peaks may also occur, they all show a certain regularity. Setting such empirical rules when analyzing the logs helps the accuracy and reliability of traffic detection.
The program specifies the access rules that have already been discovered by means of day, week, and time information, as shown in the following table:
Day   Week  Start time  End time  Comment
ALL   ALL   07:00:00    09:00:00  # User access peak every morning
ALL   6     21:00:00    23:00:00  # Saturday: a rather popular program runs on the two weekend days, so traffic is higher; there are usually some live ball game broadcasts
The rule information above can be specified in a rule file that has the same name as, and is associated with, the log file. The second row of the table means that some access traffic peaks occur between 7:00 and 9:00 every morning; when these peaks are detected, they are treated as normal traffic by default. The third row of the table indicates that a higher traffic peak occurs every Saturday. Setting such reasonable rules therefore improves the reliability of the program when it searches for abnormal traffic peaks.
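As a non-limiting illustration of step (3.1), the sketch below parses rule lines of the kind shown in the table and checks whether a given moment falls under a rule. The whitespace-separated file layout, the reading of the day column as the day of the month, and the weekday numbering (1 = Monday through 7 = Sunday, so that 6 is Saturday) are assumptions for this example.

```python
from datetime import datetime, time

def parse_rule(line):
    """Parse 'DAY WEEK START END # comment' into a dictionary."""
    body, _, comment = line.partition("#")
    day, week, start, end = body.split()
    return {
        "day": day,
        "week": week,
        "start": time.fromisoformat(start),
        "end": time.fromisoformat(end),
        "comment": comment.strip(),
    }

def matches(rule, moment):
    """True if `moment` (a datetime) falls inside the rule's day, weekday and time range."""
    if rule["day"] != "ALL" and int(rule["day"]) != moment.day:
        return False
    if rule["week"] != "ALL" and int(rule["week"]) != moment.isoweekday():
        return False
    return rule["start"] <= moment.time() <= rule["end"]

rules = [
    parse_rule("ALL ALL 07:00:00 09:00:00 # User access peak every morning"),
    parse_rule("ALL 6 21:00:00 23:00:00 # Saturday evening peak"),
]
```

A peak at 22:00 on Saturday 31 March 2012 would match the second rule, while the same time on the preceding Friday would match neither.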
Step (3.2): obtain the overall deviation of the traffic
Observation shows that the overall deviation of the traffic directly reflects the distribution of the access traffic: evenly distributed, normal traffic shows a lower deviation value, whereas log traffic with frequent anomalies shows a higher deviation value. The overall deviation thus reflects the overall traffic situation of the website.
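As a non-limiting illustration of step (3.2), the sketch below computes the overall deviation as the population standard deviation of the per-interval traffic values. Treating "deviation" as a standard deviation is an assumption for this example; the description does not give a formula.

```python
from statistics import pstdev

def overall_deviation(traffic):
    """Population standard deviation of a sequence of traffic values."""
    return pstdev(traffic)

steady = overall_deviation([100, 100, 100, 100])  # evenly distributed traffic
bursty = overall_deviation([100, 100, 100, 500])  # one pronounced burst
```

Under this assumption the steady series has zero deviation and the bursty series a clearly higher one, matching the qualitative behaviour described above.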
Step (3.3): search for abnormal peak access points
By setting initial values for the window w and the deviation coefficient k, the program detects peak accesses within a time interval in units of a fixed window. First, the program calculates the traffic deviation s' of the time interval and compares it with the overall access traffic deviation s. If s' > k * s, an abnormal peak exists inside the window w. The program then continues to examine the interval until the maximum peak point is found, and judges whether this peak point falls under the reasonable website rules defined earlier. If it does, the peak is reported as normal and the matching rule is indicated; otherwise an abnormal traffic peak is reported and highlighted with an eye-catching red status.
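As a non-limiting illustration of step (3.3), the sketch below scans the traffic series in fixed windows of size w and flags a window when its deviation s' exceeds k times the overall deviation s. It assumes that deviation means population standard deviation, uses non-overlapping windows, and simply takes the maximum value inside a flagged window as the peak point; these simplifications, the default parameter values, and the `is_expected` callback are assumptions for this example.

```python
from statistics import pstdev

def find_abnormal_peaks(traffic, w=4, k=1.5):
    """Return [(index, value), ...] for the peak of each window with s' > k * s."""
    s = pstdev(traffic)  # overall deviation
    peaks = []
    for start in range(0, len(traffic) - w + 1, w):
        window = traffic[start:start + w]
        s_window = pstdev(window)  # deviation s' of this window
        if s_window > k * s:
            offset = max(range(w), key=window.__getitem__)
            peaks.append((start + offset, window[offset]))
    return peaks

def classify_peaks(peaks, is_expected):
    """Label each peak 'normal' if a known access rule covers it, else 'abnormal'.

    `is_expected` is a hypothetical callback taking the peak index.
    """
    return [
        (index, value, "normal" if is_expected(index) else "abnormal")
        for index, value in peaks
    ]
```

Because the overall deviation s includes the peaks themselves, a coefficient k that is too large can hide isolated spikes, so the initial value of k has to be chosen with the window size in mind.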
4. Decision tree learning of access rules
Step (4.1): display the abnormal traffic list
In the peak traffic detection stage, every access traffic value of the website is marked by the abnormal peak search algorithm. These abnormal peaks not only trigger an alarm that reminds the administrator to pay attention to the current traffic state, but are also presented in an operation interface in which the administrator can further analyze the actual situation.
Step (4.2): manually correct abnormal traffic
If a peak currently flagged as abnormal is in fact normal in practice, but the current access rules contain no such rule, the case is a false positive. The administrator can then provide further learning input through interactive operations in the operation interface, so as to improve the accuracy of monitoring and detection.
Step (4.3): decision tree learning of access rules
Decision tree learning classifies an instance by sorting it from the root node down to a leaf node, and the leaf node gives the class to which the instance belongs. Each node of the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one possible value of that attribute.
Here, the following feature attributes are first extracted from each web log access record: day, week, time, traffic value, and whether the record is abnormal. The C4.5 decision tree learning algorithm, which can handle continuous attributes, is then applied. On this basis, incremental learning is added, so that the learning process can not only extract the website access rules but also suit the streaming nature of web log access records: new data are added and learned from further without losing the rules already learned.
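As a non-limiting illustration of how C4.5 handles a continuous attribute such as the traffic value, the sketch below evaluates candidate split thresholds by gain ratio. It covers only the threshold selection for a single attribute and omits tree construction, pruning, and the incremental learning described above; the function names and the sample data are assumptions for this example.

```python
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(
        (labels.count(c) / n) * log2(labels.count(c) / n) for c in set(labels)
    )

def best_threshold(values, labels):
    """Return (gain_ratio, threshold) of the best split 'value <= threshold'.

    Candidate thresholds are midpoints between adjacent distinct sorted values.
    Returns (0.0, None) if no split improves on the unsplit data.
    """
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best = (0.0, None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        p = len(left) / len(pairs)
        gain = base - p * entropy(left) - (1 - p) * entropy(right)
        split_info = -(p * log2(p) + (1 - p) * log2(1 - p))
        ratio = gain / split_info
        if ratio > best[0]:
            best = (ratio, threshold)
    return best

# Hypothetical traffic values with the administrator's normal/abnormal labels.
traffic_values = [100, 120, 110, 900, 950, 130]
traffic_labels = ["normal", "normal", "normal", "abnormal", "abnormal", "normal"]
ratio, threshold = best_threshold(traffic_values, traffic_labels)
```

For this sample the best split lies midway between 130 and 900, which separates the two classes completely.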
Step (4.4): update the website access rules
After the decision tree learning stage is complete, the system dynamically feeds the newly learned rules back into the traffic anomaly detection stage, thereby improving the accuracy of anomaly detection.
Brief description of the drawings
Fig. 1. Description of the web log file format;
Fig. 2. Schematic diagram of the MapReduce parallel model;
Fig. 3. Schematic diagram of abnormal status code analysis;
Fig. 4. Schematic diagram of spider (crawler) visit statistics;
Fig. 5. Schematic diagram of per-section page access statistics;
Fig. 6. Flow chart of feature anomaly detection;
Fig. 7. Main interface of peak traffic detection;
Fig. 8. Peak traffic detection program: selecting the data file contents;
Fig. 9. Operation results of peak traffic detection.
Embodiments
Website maintenance is mostly based on the access logs of the website. The logs record the access behaviour of all users, so the patterns of interaction between users and the website can be found effectively. This plays an important role in raising the attention the website receives and in improving the website service (including improving the validity of links and using caching to speed up access).
Traditional log analysis methods do not use any parallel mechanism and therefore cannot be applied to portal websites with huge visit volumes. Moreover, information on the Internet changes rapidly, and the delay introduced by traditional analysis greatly reduces the value of that information. Under these circumstances, it is particularly important to propose an analysis strategy that is inexpensive, real-time, efficient, and accurate.
The invention adopts four parallel strategies to analyze and process website logs, thereby achieving monitoring of the website operation state and detection of abnormal access. The four strategies are state monitoring, feature anomaly detection, peak traffic detection, and decision tree learning of access rules.
State monitoring uses the MapReduce parallel model to count and analyze several aspects of the logs: status codes, accesses per second, access traffic per second, spider (crawler) visits, and per-section page accesses. The operation state of the whole website is thereby obtained efficiently, which is of great significance for the operation of the website.
Application examples of this part of the invention are shown in Fig. 3, Fig. 4, and Fig. 5.
Feature anomaly detection starts from two basic features, the accessing IP and the requested resource type, and uses a MapReduce parallel computation strategy to quickly mine abnormal access points from massive log files.
Peak traffic detection focuses on the basic feature of website access traffic and analyzes the peak traffic and the overall deviation of the traffic. The overall deviation of the traffic directly reflects the distribution of the access traffic: evenly distributed, normal traffic shows a lower deviation value, whereas log traffic with frequent anomalies shows a higher deviation value. To judge peaks, initial values are set for the window w and the deviation coefficient k, and the program detects peak accesses within a time interval in units of a fixed window. First, the program calculates the traffic deviation s' of the time interval and compares it with the overall access traffic deviation s. If s' > k * s, an abnormal peak exists inside the window w. The program then continues to examine the interval until the maximum peak point is found, and judges whether this peak point falls under the reasonable website rules defined earlier. If it does, the peak is reported as normal and the matching rule is indicated; otherwise an abnormal traffic peak is reported and highlighted with an eye-catching red status.
An application example of this part of the invention is shown in Fig. 9.
Decision tree learning of access rules adopts the idea of decision tree classification from machine learning to build an automatic classifier based on the MapReduce parallel model. The classifier takes normal traffic features and abnormal traffic features as input, and a MapReduce-based parallel classifier is obtained through learning.

Claims (1)

1. A method for website operation state monitoring and anomaly detection based on MapReduce, characterized in that the method comprises the following steps in sequence:
1. State monitoring
Step (1.1): analyze abnormal status codes, and use a linear regression strategy to report abnormal conditions automatically;
The status codes in the log records effectively reflect the operation state of the website, and common abnormal status codes are:
3xx - redirection:
The client browser must take further action to complete the request; the browser has to request a different page on the server, or repeat the request through a proxy server; common codes: 301: permanent redirect; 302: temporary redirect;
4xx - client error:
An error has occurred and the client appears to be at fault, for example the client requested a page that does not exist, or the client did not provide valid authentication information; common code: 404: not found, the resource does not exist;
5xx - server error:
These three classes of abnormal status codes are extracted from the logs every day in order to observe the operation state of the website, and common uses include:
Finding dead links;
Finding temporary redirects; for a 404, it is necessary to check whether the file exists; if the file exists and a 404 is still returned, the possible causes include server instability, a fault in the server itself, or an attack on the server; if the file does not exist and spiders still try to crawl that non-existent page, this is because other pages still link to it;
Step (1.2): compile statistics on accesses per second and traffic, calculating the daily average accesses per second and traffic, as well as the top 10 values of accesses per second and traffic;
The access traffic report is used to tell whether scraping programs are harvesting website data in bulk, a behaviour that can seriously degrade the service performance of the website; by reporting the daily top 10 values of accesses per second and traffic together with the averages, the presence of scraping programs can be seen intuitively;
Step (1.3): statistical analysis of spiders (crawlers)
The number and frequency of spider visits are among the indicators of the health and the weight of a website, and the main purposes of counting how often search engine spiders crawl the site are:
1. Predicting keyword rankings:
If a spider used to visit hundreds or thousands of times every day, the website is attractive to it and keyword rankings tend to be stable; when the number of spider visits drops sharply, something has gone wrong with the website;
2. Finding out which unnecessary content the search engine has accessed, and blocking such content with a robots.txt file, so that the spider spends more of its visits on useful content and more of the website gets indexed;
Step (1.4): page access ranking by section:
Counting the page accesses of each section is the most direct way to analyze whether the content and layout of the website are attractive;
Note: every step of the state monitoring above uses the MapReduce parallel model;
2. Feature anomaly detection
Step (2.1): use the MapReduce model to count, in parallel, the IPs whose daily visits exceed ten thousand, and save them for further analysis;
Step (2.2): for each IP satisfying step 2.1, check the probability that its user agent information is empty; if the probability is greater than threshold T1, proceed to the next step of the analysis;
Step (2.3): for each IP satisfying steps 2.1 and 2.2, check the distribution of the resource types it requests; if the probability of requesting HTML is greater than threshold T2, proceed to the next step of the analysis; the distribution covers HTML, XML, CSS, and JS;
Step (2.4): for each IP satisfying steps 2.1, 2.2, and 2.3, use the MapReduce parallel model to compute the access frequency of the current IP; if the access frequency is greater than threshold T3, the current IP is judged to be an abnormal scraping program;
3. Peak traffic detection
Step (3.1): read the website access rules
The access traffic of a normal website is generally fairly steady; although frequent peaks may also occur, they all show a certain regularity, and setting such empirical rules when analyzing the logs helps the accuracy and reliability of traffic detection;
The program specifies the access rules that have already been discovered by means of day, week, and time information, as shown in the following table:
Day   Week  Start time  End time  Comment
ALL   ALL   07:00:00    09:00:00  # User access peak every morning
ALL   6     21:00:00    23:00:00  # Saturday: a rather popular program runs on the two weekend days, so traffic is higher; there are usually some live ball game broadcasts
The rule information above can be specified in a rule file that has the same name as, and is associated with, the log file; the second row of the table means that some access traffic peaks occur between 7:00 and 9:00 every morning, and when these peaks are detected they are treated as normal traffic by default; the third row of the table indicates that a higher traffic peak occurs every Saturday; setting such reasonable rules therefore improves the reliability of the program when it searches for abnormal traffic peaks;
Step (3.2): obtain the overall deviation of the traffic
Observation shows that the overall deviation of the traffic directly reflects the distribution of the access traffic: evenly distributed, normal traffic shows a lower deviation value, whereas log traffic with frequent anomalies shows a higher deviation value, and the overall deviation thus reflects the overall traffic situation of the website;
Step (3.3): search for abnormal peak access points
By setting initial values for the window w and the deviation coefficient k, the program detects peak accesses within a time interval in units of a fixed window; first, the program calculates the traffic deviation s' of the time interval and compares it with the overall access traffic deviation s; if s' > k * s, an abnormal peak exists inside the window w; the program then continues to examine the interval until the maximum peak point is found, and judges whether this peak point falls under the reasonable website rules defined earlier; if it does, the peak is reported as normal and the matching rule is indicated; otherwise an abnormal traffic peak is reported and highlighted with an eye-catching red status;
4. Decision tree learning of access rules
Step (4.1): display the abnormal traffic list
In the peak traffic detection stage, every access traffic value of the website is marked by the abnormal peak search algorithm; these abnormal peaks not only trigger an alarm that reminds the administrator to pay attention to the current traffic state, but are also presented in an operation interface in which the administrator can further analyze the actual situation;
Step (4.2): manually correct abnormal traffic
If a peak currently flagged as abnormal is in fact normal in practice, but the current access rules contain no such rule, the case is a false positive; the administrator can then provide further learning input through interactive operations in the operation interface, so as to improve the accuracy of monitoring and detection;
Step (4.3): decision tree learning of access rules
Decision tree learning classifies an instance by sorting it from the root node down to a leaf node, and the leaf node gives the class to which the instance belongs; each node of the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one possible value of that attribute;
Here, the following feature attributes are first extracted from each web log access record: day, week, time, traffic value, and whether the record is abnormal; the C4.5 decision tree learning algorithm, which can handle continuous attributes, is then applied; on this basis, incremental learning is added, so that the learning process can not only extract the website access rules but also suit the streaming nature of web log access records: new data are added and learned from further without losing the rules already learned;
Step (4.4): update the website access rules
After the decision tree learning stage is complete, the system dynamically feeds the newly learned rules back into the traffic anomaly detection stage, thereby improving the accuracy of anomaly detection.
CN201210095037.8A 2012-03-31 2012-03-31 Website operation state monitoring and abnormal detection based on MapReduce Expired - Fee Related CN102724059B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210095037.8A CN102724059B (en) 2012-03-31 2012-03-31 Website operation state monitoring and abnormal detection based on MapReduce

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210095037.8A CN102724059B (en) 2012-03-31 2012-03-31 Website operation state monitoring and abnormal detection based on MapReduce

Publications (2)

Publication Number Publication Date
CN102724059A CN102724059A (en) 2012-10-10
CN102724059B (en) 2015-03-11

Family

ID=46949728

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210095037.8A Expired - Fee Related CN102724059B (en) 2012-03-31 2012-03-31 Website operation state monitoring and abnormal detection based on MapReduce

Country Status (1)

Country Link
CN (1) CN102724059B (en)

Families Citing this family (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103812715A (en) * 2012-11-07 2014-05-21 江苏仕德伟网络科技股份有限公司 Method for judging running state of website
CN103019855B (en) * 2012-11-21 2015-06-03 北京航空航天大学 Method for forecasting executive time of Map Reduce operation
CN103077107B (en) * 2012-12-31 2016-12-28 Tcl集团股份有限公司 A kind of data maintaining method and system
CN104077328B (en) * 2013-03-29 2019-05-24 百度在线网络技术(北京)有限公司 The operation diagnostic method and equipment of MapReduce distributed system
CN103248625B (en) * 2013-04-27 2016-09-14 北京京东尚科信息技术有限公司 A kind of web crawlers operation exception monitoring method and system
CN103605714B (en) * 2013-11-14 2017-10-03 北京国双科技有限公司 The recognition methods of website abnormal data and device
CN103605735B (en) * 2013-11-19 2017-11-21 北京国双科技有限公司 website data analysis method and device
CN104657392B (en) * 2013-11-25 2020-02-11 腾讯科技(深圳)有限公司 Method and device for realizing retrieval abnormity restoration
CN104239197A (en) * 2014-10-10 2014-12-24 浪潮电子信息产业股份有限公司 Administrative user abnormal behavior detection method based on big data log analysis
CN105930255B (en) * 2015-10-16 2019-01-29 中国银联股份有限公司 A kind of system health degree prediction technique and device
CN106611023B (en) * 2015-10-27 2020-11-24 北京国双科技有限公司 Method and device for detecting website access abnormality
CN105610616B (en) * 2015-12-29 2019-04-26 赛尔网络有限公司 The single IP average flow rate statistical method of access net and system based on ICP liveness
CN107819727B (en) * 2016-09-13 2020-11-17 腾讯科技(深圳)有限公司 Network security protection method and system based on IP address security credit
CN108255868B (en) * 2016-12-29 2020-11-24 北京国双科技有限公司 Method and device for checking links in website
CN108270727A (en) * 2016-12-30 2018-07-10 北京国双科技有限公司 Abnormal data analysis method and device
CN108459936B (en) * 2017-02-20 2021-05-14 北京畅游时空软件技术有限公司 Accurate statistical method and device based on content modularization
CN109257196A (en) * 2017-07-12 2019-01-22 阿里巴巴集团控股有限公司 A kind of abnormality eliminating method and equipment
CN107196968B (en) * 2017-07-12 2020-10-20 深圳市活力天汇科技股份有限公司 Crawler identification method
CN107454083A (en) * 2017-08-08 2017-12-08 四川长虹电器股份有限公司 The method of anti-reptile
CN107438079B (en) * 2017-08-18 2020-05-01 杭州安恒信息技术股份有限公司 Method for detecting unknown abnormal behaviors of website
CN109560977A (en) * 2017-09-25 2019-04-02 北京国双科技有限公司 Web site traffic monitoring method, device, storage medium, processor and electronic equipment
CN107707427B (en) * 2017-09-28 2021-12-17 南华大学 Website availability monitoring system
CN109586942A (en) * 2017-09-29 2019-04-05 北京国双科技有限公司 Web site performance assessment method and device
CN107809331B (en) * 2017-10-25 2020-11-24 北京京东尚科信息技术有限公司 Method and device for identifying abnormal flow
CN107707574A (en) * 2017-11-23 2018-02-16 四川长虹电器股份有限公司 Anti-crawler method based on access behavior
CN107743135A (en) * 2017-12-01 2018-02-27 江彩莲 Flow monitoring method
CN109120592A (en) * 2018-07-09 2019-01-01 四川大学 Web anomaly detection system based on user behavior
CN110019987B (en) * 2018-11-28 2023-05-09 创新先进技术有限公司 Log matching method and device based on decision tree
CN110008100B (en) * 2019-03-08 2023-03-14 创新先进技术有限公司 Method and device for detecting abnormal access volume of web page
CN110852387B (en) * 2019-11-13 2022-04-22 江苏能来能源互联网研究院有限公司 Energy internet super real-time state studying and judging algorithm
CN110969358A (en) * 2019-12-04 2020-04-07 国网浙江省电力有限公司 Risk control method for power electronic channel operation
CN112989157A (en) * 2019-12-13 2021-06-18 网宿科技股份有限公司 Method and device for detecting crawler request
CN111106959B (en) * 2019-12-20 2022-10-14 贵州黔岸科技有限公司 Abnormity monitoring and alarming system and method for transportation management system
CN112019508A (en) * 2020-07-28 2020-12-01 杭州安恒信息技术股份有限公司 Method, system and electronic device for detecting DDos attack based on Web log analysis
CN112039854A (en) * 2020-08-13 2020-12-04 深圳市信锐网科技术有限公司 Data transmission method, device and storage medium
CN114285612B (en) * 2021-12-14 2023-09-26 北京天融信网络安全技术有限公司 Method, system, device, equipment and medium for detecting abnormal data
CN114253811A (en) * 2021-12-24 2022-03-29 深圳市盘古数据有限公司 Intelligent monitoring method for data center system
CN114546881B (en) * 2022-03-22 2022-10-28 通号智慧城市研究设计院有限公司 Application software testing method, electronic device and computer readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102169505A (en) * 2011-05-16 2011-08-31 苏州两江科技有限公司 Recommendation system building method based on cloud computing
CN102393849A (en) * 2011-07-18 2012-03-28 电子科技大学 Web log data preprocessing method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080209030A1 (en) * 2007-02-28 2008-08-28 Microsoft Corporation Mining Web Logs to Debug Wide-Area Connectivity Problems

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
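"Research and Implementation of a Distributed Multi-Topic Web Crawler System"; Bai He et al.; Computer Engineering; 20091030; Vol. 35 (No. 19); full text *
"Design and Implementation of Hadoop-Based Web Log Preprocessing"; Song Ying et al.; Telecom Engineering Technics and Standardization; 20111130; full text *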

Also Published As

Publication number Publication date
CN102724059A (en) 2012-10-10

Similar Documents

Publication Publication Date Title
CN102724059B (en) Website operation state monitoring and abnormal detection based on MapReduce
US9590880B2 (en) Dynamic collection analysis and reporting of telemetry data
CN103297435B (en) Abnormal access behavior detection method and system based on web logs
CN106778253A (en) Threat context aware information security Initiative Defense model based on big data
CN105243159A (en) Visual script editor-based distributed web crawler system
CN105677842A (en) Log analysis system based on Hadoop big data processing technique
CN107800591A (en) Analysis method for unified log data
CN103546326A (en) Website traffic statistic method
CN105718587A (en) Network content resource evaluation method and evaluation system
CN106407429A (en) File tracking method, device and system
CN111259073A (en) Intelligent business system running state studying and judging system based on logs, flow and business access
CN111459698A (en) Database cluster fault self-healing method and device
CN103559203A (en) Method, device and system for web page sorting
US20240095170A1 (en) Multi-cache based digital output generation
Sujatha Improved user navigation pattern prediction technique from web log data
CN109446441A (en) General-purpose trusted distributed collection and storage system for web communities
Liu et al. Big Data architecture for IT incident management
Maske et al. A real time processing and streaming of wireless network data using storm
Ganapathi et al. Web analytics and the art of data summarization
Zhao et al. Collecting, managing and analyzing social networking data effectively
Mary et al. Performance enhancement in session identification
Tsai et al. Object architected design and efficient dynamic adjustment mechanism of distributed web crawlers
Huang et al. An improved referrer-based session identification algorithm using MapReduce
Kirci et al. "Is my internet down?" Sifting through user-affecting outages with Google Trends
CN107145542A (en) Method and system for efficiently extracting user client IDs from URLs

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150311

Termination date: 20160331