CN102724059B - Website operation state monitoring and abnormal detection based on MapReduce - Google Patents
- Publication number
- CN102724059B
- Authority
- CN
- China
- Prior art keywords
- access
- website
- flow
- rule
- peak
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
The invention discloses a MapReduce-based method for monitoring website operating status and detecting anomalies. Using the MapReduce parallel programming model, it performs status monitoring and anomaly detection, extracts the most informative data points from massive log files, and applies effective strategies to capture abnormal behavior during access efficiently and accurately. The website log is analyzed with four parallel strategies, namely status monitoring, feature anomaly detection, traffic peak detection, and decision tree learning of access rules, thereby achieving monitoring of the website's operating status and detection of abnormal access.
Description
Technical field
MapReduce-based website operating status monitoring and anomaly detection belongs to the technical field of improving website service performance by mining web logs.
Background art
A website is a fixed place on the Internet for publishing information to the whole world. It consists of a domain name (that is, the site address) and web space, and usually includes a home page and other pages with hyperlinked files. Websites play an irreplaceable role in modern society: they promote an organization's image, provide rich information conveniently, and open up business channels. For a website to run stably on the Internet over the long term, and for its users to seize more online business opportunities in a fast-changing information society, maintaining and improving the website becomes a vital task.
Website maintenance is mostly based on the site's access log. The log records the access behavior of all users and can effectively reveal how users interact with the site, so it plays an important role in raising the site's visibility and improving its service (including improving the validity of links and using caching to speed up access).
Traditional web log mining mostly combines machine learning (Li Ming et al., 2004) and visualization with manual processing. Recent methods from machine learning research have also been applied to log data mining, such as feedback-based active learning (Georges et al., 2010). Because feedback information is hard to obtain, this method is usually applied to the personalized recommendation systems of search engines. There is also good domestic work in this area: Professors Liu Dayou and Yang Bo of Jilin University and colleagues used machine learning to identify iterative tasks in logs and thereby save computation time (Li Jiafei et al., 2007), and Professor Chen Guolong of Fuzhou University used machine learning and optimization methods to detect intrusions in log files. However, these traditional methods do not use any parallel mechanism, so they cannot be applied to portal websites with huge visit volumes. Moreover, Internet information changes rapidly, and the delay introduced by traditional analysis greatly reduces the value of the information obtained. Under these circumstances, it is particularly important to propose a real-time, efficient, and accurate analysis strategy.
MapReduce is a parallel programming model proposed by Google for parallel computation over large data sets (larger than 1 TB). MapReduce distributes a large-scale operation on a data set across the nodes of a network to achieve reliability and distribution, assigning a processing task to each portion of the distributed data so that the work is parallelized. Even if the data to be processed grows rapidly, it is only necessary to increase the number of nodes in the cluster dynamically, and the nodes need not be high-performance machines with special capabilities; ordinary commodity PCs suffice. How to use this low-cost parallel processing mechanism to process and analyze website logs efficiently, and thereby monitor the website's operating status in real time and detect abnormal behavior, is the key problem to be solved by this invention.
Summary of the invention
Traditional log analysis methods cannot cope with portal websites that have huge visit volumes. Even when they can process a large volume of logs, the delay in processing time greatly reduces the potential value of the analysis. Furthermore, when facing unstructured log files, the choice of analysis strategy and of which aspects of the log to analyze also has a vital impact on building and improving the website. To address these problems, the invention adopts the MapReduce parallel programming model, extracts the most informative data points from massive log files, and applies effective strategies to capture abnormal behavior during access efficiently and accurately.
The invention is characterized in that it performs status monitoring and anomaly detection through the following steps in sequence:
1. Status monitoring
Step (1.1): Analyze abnormal status codes, and use a linear regression strategy to report abnormal conditions automatically;
The status codes in the log records effectively reflect the operating status of the website. Common abnormal status codes are:
3xx - redirection:
The client browser must take further action to complete the request; the browser may have to request a different page on the server, or repeat the request through a proxy server. Common codes: 301 (permanent redirect), 302 (temporary redirect);
4xx - client error:
An error occurred and the problem lies with the client, for example the client requested a page that does not exist or did not provide valid authentication information. Common codes: 404 (not found; the resource does not exist);
5xx - server error:
These three classes of abnormal status codes are extracted from the log every day to observe the operating status of the website. Common uses include:
Finding dead links;
Finding temporary redirects. For a 404, check whether the file exists. If the file exists but 404 is returned, possible causes include server instability, a problem with the server itself, or an attack on the server. If the file does not exist and spiders still crawl the missing page, then other pages still link to it;
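The daily status code extraction of step (1.1) can be sketched as a map step and a reduce step. This is only an illustration in plain Python, not the patent's implementation: the record layout `(day, status)` and the function names `map_status` and `reduce_counts` are assumptions, and the linear regression reporting mentioned in the step is not shown because the source gives no details for it.

```python
from collections import Counter
from itertools import chain

def map_status(record):
    """Map step: emit ((day, status class), 1) for 3xx/4xx/5xx responses."""
    day, status = record  # hypothetical pre-parsed (date string, HTTP status) pair
    status_class = f"{status // 100}xx"
    if status_class in ("3xx", "4xx", "5xx"):
        yield (day, status_class), 1

def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

# Tiny in-memory stand-in for a day's worth of parsed log lines.
records = [
    ("2012-03-30", 200), ("2012-03-30", 404), ("2012-03-30", 404),
    ("2012-03-30", 302), ("2012-03-31", 500), ("2012-03-31", 200),
]
status_counts = reduce_counts(
    chain.from_iterable(map_status(r) for r in records)
)
```

On a real cluster the two functions would run as the mapper and reducer of a MapReduce job; here they are chained in-process only to show the data flow.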
Step (1.2): Collect statistics on requests per second and traffic per second, and compute the daily averages of requests per second and traffic per second as well as the top 10 values of each;
The traffic report indicates whether a scraping program is crawling the website's data in bulk, a behavior that can seriously degrade the website's service performance. Reporting the daily top 10 requests per second and traffic per second alongside the averages shows intuitively whether a scraping program has appeared;
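A minimal single-machine sketch of the statistics in step (1.2) follows. The record layout `(second, bytes_sent)` and the name `per_second_stats` are assumptions, and the averages are taken over seconds that saw at least one request, which is one possible reading of "daily average".

```python
from collections import defaultdict
import heapq

def per_second_stats(records, top_n=10):
    """Return (avg hits/s, avg bytes/s, top-N hits, top-N bytes).

    records: iterable of (second, bytes_sent) pairs, one per request.
    Averages are computed over seconds that saw at least one request.
    """
    hits = defaultdict(int)
    traffic = defaultdict(int)
    for second, nbytes in records:
        hits[second] += 1
        traffic[second] += nbytes
    if not hits:
        return 0.0, 0.0, [], []
    avg_hits = sum(hits.values()) / len(hits)
    avg_bytes = sum(traffic.values()) / len(traffic)
    top_hits = heapq.nlargest(top_n, hits.items(), key=lambda kv: kv[1])
    top_bytes = heapq.nlargest(top_n, traffic.items(), key=lambda kv: kv[1])
    return avg_hits, avg_bytes, top_hits, top_bytes
```

A top-10 entry that sits far above the average is the kind of signal the report is meant to surface.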
Step (1.3): Statistical analysis of spider crawlers
The number and frequency of spider visits are indicators of a website's health and weight. The main uses of counting how often search engine spiders crawl the site are:
1. Predicting keyword ranking:
If spiders used to visit hundreds or thousands of times a day, the website is attractive and its keyword performance tends to be stable. If the number of spider visits drops sharply, something has gone wrong with the website, possibly because of a redesign or a penalty. Such a drop should be taken as a warning that keyword rankings are about to change;
2. Finding which unnecessary content, such as images, search engines have accessed; that content can then be disallowed in the robots.txt file so that spiders spend more of their visits on useful content and more of the website gets indexed;
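The spider statistics of step (1.3) can be sketched as follows. The user-agent tokens, the record layout `(day, user_agent)`, the helper names, and the 50% drop ratio are all illustrative assumptions; the source does not say how spiders are recognized or how large a drop counts as significant.

```python
from collections import Counter

# Illustrative crawler tokens only; a real deployment would maintain its own list.
SPIDER_TOKENS = ("Googlebot", "Baiduspider", "bingbot")

def spider_name(user_agent):
    """Return the matching crawler token, or None for ordinary clients."""
    ua = user_agent.lower()
    for token in SPIDER_TOKENS:
        if token.lower() in ua:
            return token
    return None

def count_spider_visits(records):
    """Count visits per (day, spider) from (day, user_agent) pairs."""
    counts = Counter()
    for day, user_agent in records:
        name = spider_name(user_agent)
        if name:
            counts[(day, name)] += 1
    return counts

def sharp_drops(counts, day_before, day_after, ratio=0.5):
    """Spiders whose visits on day_after fell below ratio * visits on day_before."""
    names = {name for (day, name) in counts if day == day_before}
    return sorted(
        n for n in names
        if counts[(day_after, n)] < ratio * counts[(day_before, n)]
    )
```

A spider returned by `sharp_drops` corresponds to the "visits dropped sharply" warning described above.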
Step (1.4): Access ranking by page section:
Counting the page views of each section is the most direct way to analyze whether the website's content and layout are attractive;
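One way to approximate the per-section ranking of step (1.4) is to treat the first directory of the request path as the section. That mapping is an assumption (the source does not define how pages map to sections), as are the names `section_of` and `rank_sections`.

```python
from collections import Counter
from urllib.parse import urlsplit

def section_of(url):
    """Return the first directory of the URL path, or "(root)" if there is none."""
    path = urlsplit(url).path
    segments = path.strip("/").split("/")
    # The last segment is a file name unless the path ends with a slash.
    directories = segments if path.endswith("/") else segments[:-1]
    return directories[0] if directories and directories[0] else "(root)"

def rank_sections(urls):
    """Return (section, page views) pairs, most visited first."""
    return Counter(section_of(u) for u in urls).most_common()
```

In a MapReduce setting, `section_of` would be the mapper key and the counting would happen in the reducer.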
Note: every step of the status monitoring above uses the MapReduce parallel model;
2. Feature anomaly detection
Step (2.1): Use the MapReduce model to count in parallel the IP addresses whose daily visits exceed 10,000, and save them for further analysis;
Step (2.2): For each IP that satisfies step 2.1, check the probability that its user agent information is empty; if the probability is greater than threshold T1, proceed to the next step;
Step (2.3): For each IP that satisfies steps 2.1 and 2.2, check the distribution of the resource types it requests, where the resource types include HTML, XML, CSS, and JS; if the probability of requesting HTML is greater than threshold T2, proceed to the next step;
Step (2.4): For each IP that satisfies steps 2.1, 2.2, and 2.3, use the MapReduce parallel model to compute the IP's access frequency; if the access frequency is greater than threshold T3, the IP is judged to be an abnormal scraping program;
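The four-stage filter of steps (2.1) to (2.4) can be sketched on a single machine as below. The source gives no values for T1, T2, and T3 and does not define "access frequency"; the defaults here are placeholders, and frequency is assumed to mean the peak number of requests in any one second. The record layout and the name `suspicious_ips` are also hypothetical.

```python
from collections import Counter, defaultdict

def suspicious_ips(records, min_visits=10_000, t1=0.5, t2=0.9, t3=20):
    """Apply the cascade of steps 2.1-2.4 to one day of records.

    records: iterable of (ip, user_agent, resource_type, second) tuples.
    t1, t2, t3 stand in for the thresholds T1, T2, T3 (values are placeholders).
    """
    by_ip = defaultdict(list)
    for ip, user_agent, resource_type, second in records:
        by_ip[ip].append((user_agent, resource_type, second))

    flagged = []
    for ip, rows in by_ip.items():
        n = len(rows)
        if n <= min_visits:                      # step 2.1: daily visit count
            continue
        empty_ua = sum(1 for ua, _, _ in rows if not ua) / n
        if empty_ua <= t1:                       # step 2.2: empty user agent share
            continue
        html = sum(1 for _, rtype, _ in rows if rtype == "HTML") / n
        if html <= t2:                           # step 2.3: share of HTML requests
            continue
        peak = max(Counter(sec for _, _, sec in rows).values())
        if peak > t3:                            # step 2.4: access frequency
            flagged.append(ip)
    return flagged
```

Each stage only looks at IPs that passed the previous one, which mirrors the cascade in the steps and keeps the later, more detailed checks cheap.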
3. Traffic peak detection
Step (3.1): Read the website access rules
The access traffic of a normal website is generally fairly steady. Frequent peaks do occur, but they follow certain patterns. Setting such empirical rules before analyzing the log improves the accuracy and reliability of traffic detection.
The program specifies the access patterns already discovered by day, day of week, and time, as shown in the following table:
Day | Week | Start time | End time | Comment |
---|---|---|---|---|
ALL | ALL | 07:00:00 | 09:00:00 | # user access peak every morning |
ALL | 6 | 21:00:00 | 23:00:00 | # Saturday: a popular program airs on weekends, so traffic is higher; there are usually some live ball games |
The rule information above can be specified in a rule file that has the same name as, and is associated with, the log file. The first rule in the table means that some access traffic peaks appear between 7:00 and 9:00 every morning; when these peaks are detected they are treated as normal traffic by default. The second rule means that a higher traffic peak appears every Saturday. Setting such reasonable rules therefore improves the reliability of the program when it searches for abnormal traffic peaks.
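The rule table can be read and matched roughly as follows. The pipe-separated line format is inferred from the table, the Day column is assumed to be a day of the month, and the Week column is assumed to use ISO numbering (Monday = 1, so Saturday = 6, which agrees with the table). The names `parse_rule` and `matching_rule` are hypothetical.

```python
from datetime import datetime, time

def parse_rule(line):
    """Parse one 'Day | Week | Start | End | # comment' rule line."""
    cells = line.strip().strip("|").split("|", 4)
    day, week, start, end, note = [cell.strip() for cell in cells]
    return {
        "day": None if day == "ALL" else int(day),     # assumed: day of month
        "week": None if week == "ALL" else int(week),  # assumed: ISO weekday
        "start": time.fromisoformat(start),
        "end": time.fromisoformat(end),
        "note": note.lstrip("#").strip(),
    }

def matching_rule(rules, when):
    """Return the first rule that covers the datetime `when`, or None."""
    for rule in rules:
        if rule["day"] not in (None, when.day):
            continue
        if rule["week"] not in (None, when.isoweekday()):
            continue
        if rule["start"] <= when.time() <= rule["end"]:
            return rule
    return None

rules = [
    parse_rule("ALL | ALL | 07:00:00 | 09:00:00 | # morning access peak |"),
    parse_rule("ALL | 6 | 21:00:00 | 23:00:00 | # Saturday evening peak |"),
]
```

For example, `matching_rule(rules, datetime(2012, 3, 31, 22, 0))` falls on a Saturday evening and returns the second rule, while a Friday at 22:00 matches nothing and would be a candidate abnormal peak.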
Step (3.2): Obtain the overall traffic deviation
Observation shows that the overall traffic deviation directly reflects how access traffic is distributed: evenly distributed, normal traffic shows a low deviation value, whereas log traffic with frequent anomalies shows a high deviation value. The overall deviation thus reflects the website's overall traffic access pattern.
Step (3.3): Search for abnormal peak access points
Given initial values for the window w and the deviation factor k, the program detects peak accesses over the time range in units of the fixed window. The program first computes the traffic deviation s' of the current time interval and compares it with the overall access traffic deviation s. If s' > k*s, an abnormal peak exists within the window w. The program then keeps checking the following intervals until it finds the maximum peak point, and determines whether this peak point falls under a previously defined website access rule. If it does, the peak is reported as normal and the matching rule is indicated; otherwise an abnormal traffic peak is reported and highlighted in a conspicuous red status.
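The window test s' > k*s can be sketched as below. The source does not say how the deviation is computed, so the population standard deviation is assumed; non-overlapping windows and the name `find_anomalous_peaks` are also assumptions. Matching the resulting peaks against the access rules is left out.

```python
from statistics import pstdev

def find_anomalous_peaks(traffic, w, k):
    """Return (index, value) of the maximum point of each anomalous run.

    traffic: per-time-unit traffic values; w: window size; k: deviation factor.
    A window is anomalous when its deviation s' exceeds k times the overall
    deviation s (population standard deviation is an assumption here).
    """
    if not traffic:
        return []
    s = pstdev(traffic)
    peaks = []
    i = 0
    while i + w <= len(traffic):
        if pstdev(traffic[i:i + w]) > k * s:
            # Keep scanning forward while the following windows are also anomalous.
            j = i + w
            while j + w <= len(traffic) and pstdev(traffic[j:j + w]) > k * s:
                j += w
            peak_index = max(range(i, j), key=traffic.__getitem__)
            peaks.append((peak_index, traffic[peak_index]))
            i = j
        else:
            i += w
    return peaks
```

Because a large spike also inflates the overall deviation s, the choice of k controls how pronounced a peak must be relative to the rest of the day before it is reported.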
4. Decision tree learning of access rules
Step (4.1): Display the abnormal traffic list
In the traffic peak detection stage, each access traffic record of the website is labeled by the abnormal peak search algorithm. The abnormal peaks not only trigger alarms that remind the administrator to check the current traffic state; an operation interface is also provided so that the administrator can further analyze the true situation.
Step (4.2): Manually correct abnormal traffic
If a peak currently flagged as abnormal is in fact normal but no corresponding rule exists in the current access rules, it is a false alarm. The administrator can then correct it interactively in the operation interface, providing input for further learning and improving the accuracy of monitoring and detection.
Step (4.3): Learn access rules with a decision tree
Decision tree learning classifies an instance by sorting it down the tree from the root node to some leaf node, and the leaf node gives the class of the instance. Each node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one possible value of that attribute.
Here, the following feature attributes are first extracted from each web log access record: day, day of week, time, traffic value, and whether the record is abnormal. The C4.5 decision tree learning algorithm, which can handle continuous attributes, is then applied. On top of it, incremental learning is added, so that the learning process not only extracts website access rules but also suits the streaming nature of web log access records: new data can be added and learned from without losing the rules already learned.
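C4.5 handles a continuous attribute such as the traffic value by choosing a threshold and scoring the resulting binary split with the gain ratio. The sketch below shows only that scoring step in simplified form (midpoint thresholds, no tree construction, no incremental learning), so it illustrates the idea rather than the patent's learner; the names `entropy` and `best_threshold` are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, gain ratio) of the best split `value <= threshold`.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (None, 0.0) when no split improves on the unsplit data.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        split_info = entropy(["L"] * i + ["R"] * (n - i))
        ratio = gain / split_info
        if ratio > best[1]:
            best = ((lo + hi) / 2, ratio)
    return best
```

For traffic values `[100, 120, 110, 900, 950, 130]` labeled normal except for 900 and 950, the best split falls between 130 and 900, which is the kind of "traffic above X is abnormal" test a learned tree node would carry.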
Step (4.4): Update the website access rules
After the decision tree learning stage is complete, the system dynamically feeds the newly learned rules back into the traffic anomaly detection stage, thereby improving the accuracy of anomaly detection.
Brief description of the drawings
Fig. 1. Web log format description;
Fig. 2. Schematic diagram of the MapReduce parallel model;
Fig. 3. Schematic diagram of abnormal status code analysis;
Fig. 4. Schematic diagram of spider crawler visit statistics;
Fig. 5. Schematic diagram of access statistics by page section;
Fig. 6. Flow chart of feature anomaly detection;
Fig. 7. Main interface of traffic peak detection;
Fig. 8. Traffic peak detection program: selecting the data file contents;
Fig. 9. Traffic peak detection results.
Embodiment
Website maintenance is mostly based on the site's access log. The log records the access behavior of all users and can effectively reveal how users interact with the site, so it plays an important role in raising the site's visibility and improving its service (including improving the validity of links and using caching to speed up access).
Traditional log analysis methods do not use any parallel mechanism, so they cannot be applied to portal websites with huge visit volumes. Moreover, Internet information changes rapidly, and the delay introduced by traditional analysis greatly reduces the value of the information obtained. Under these circumstances, it is particularly important to propose a low-cost, real-time, efficient, and accurate analysis strategy.
The invention applies four parallel strategies to analyze and process website logs, thereby monitoring the website's operating status and detecting abnormal access. The four strategies are status monitoring, feature anomaly detection, traffic peak detection, and decision tree learning of access rules.
Status monitoring uses the MapReduce parallel model to compute and analyze the log's status codes, requests per second, traffic per second, spider crawler visits, and access by page section, thereby efficiently deriving the operating status of the whole website, which is of great significance for the website's recycle mechanism.
Application examples of this part of the invention are shown in Fig. 3, Fig. 4, and Fig. 5.
Feature anomaly detection starts from two basic features, the accessing IP and the requested resource type, and uses a MapReduce parallel computing strategy to quickly mine abnormal access points from massive log files.
Traffic peak detection focuses on website access traffic as a basic feature and analyzes traffic peaks and the overall traffic deviation. The overall traffic deviation directly reflects how access traffic is distributed: evenly distributed, normal traffic shows a low deviation value, whereas log traffic with frequent anomalies shows a high deviation value. To judge peaks, initial values are set for the window w and the deviation factor k, and the program detects peak accesses over the time range in units of the fixed window. The program first computes the traffic deviation s' of the current time interval and compares it with the overall access traffic deviation s. If s' > k*s, an abnormal peak exists within the window w. The program then keeps checking the following intervals until it finds the maximum peak point, and determines whether this peak point falls under a previously defined website access rule. If it does, the peak is reported as normal and the matching rule is indicated; otherwise an abnormal traffic peak is reported and highlighted in a conspicuous red status.
An application example of this part of the invention is shown in Fig. 9.
Decision tree learning of access rules adopts the decision tree classification approach from machine learning to build an automatic classifier on the MapReduce parallel model. The classifier takes normal traffic features and abnormal traffic features as input, and learning yields a MapReduce-based parallel classifier.
Claims (1)
1. A MapReduce-based method for website operating status monitoring and anomaly detection, characterized in that the method comprises the following steps in sequence:
1. Status monitoring
Step (1.1): Analyze abnormal status codes, and use a linear regression strategy to report abnormal conditions automatically;
The status codes in the log records effectively reflect the operating status of the website. Common abnormal status codes are:
3xx - redirection:
The client browser must take further action to complete the request; the browser has to request a different page on the server, or repeat the request through a proxy server. Common codes: 301 (permanent redirect), 302 (temporary redirect);
4xx - client error:
An error occurred and the problem lies with the client, for example the client requested a page that does not exist or did not provide valid authentication information. Common codes: 404 (not found; the resource does not exist);
5xx - server error:
These three classes of abnormal status codes are extracted from the log every day to observe the operating status of the website. Common uses include:
Finding dead links;
Finding temporary redirects. For a 404, check whether the file exists. If the file exists but 404 is returned, possible causes include server instability, a problem with the server itself, or an attack on the server. If the file does not exist and spiders still crawl the missing page, then other pages still link to it;
Step (1.2): Collect statistics on requests per second and traffic per second, and compute the daily averages of requests per second and traffic per second as well as the top 10 values of each;
The traffic report indicates whether a scraping program is crawling the website's data in bulk, a behavior that can seriously degrade the website's service performance. Reporting the daily top 10 requests per second and traffic per second alongside the averages shows intuitively whether a scraping program has appeared;
Step (1.3): Statistical analysis of spider crawlers
The number and frequency of spider visits are indicators of a website's health and weight. The main uses of counting how often search engine spiders crawl the site are:
1. Predicting keyword ranking:
If spiders used to visit hundreds or thousands of times a day, the website is attractive and its keyword performance tends to be stable. If the number of spider visits drops sharply, something has gone wrong with the website;
2. Finding which unnecessary content, such as images, search engines have accessed, and disallowing it in the robots.txt file so that spiders spend more of their visits on useful content and more of the website gets indexed;
Step (1.4): Access ranking by page section:
Counting the page views of each section is the most direct way to analyze whether the website's content and layout are attractive;
Note: every step of the status monitoring above uses the MapReduce parallel model;
2. Feature anomaly detection
Step (2.1): Use the MapReduce model to count in parallel the IP addresses whose daily visits exceed 10,000, and save them for further analysis;
Step (2.2): For each IP that satisfies step 2.1, check the probability that its user agent information is empty; if the probability is greater than threshold T1, proceed to the next step;
Step (2.3): For each IP that satisfies steps 2.1 and 2.2, check the distribution of the resource types it requests, where the resource types include HTML, XML, CSS, and JS; if the probability of requesting HTML is greater than threshold T2, proceed to the next step;
Step (2.4): For each IP that satisfies steps 2.1, 2.2, and 2.3, use the MapReduce parallel model to compute the IP's access frequency; if the access frequency is greater than threshold T3, the IP is judged to be an abnormal scraping program;
3. Traffic peak detection
Step (3.1): Read the website access rules
The access traffic of a normal website is generally fairly steady; frequent peaks do occur, but they follow certain patterns, and setting such empirical rules before analyzing the log improves the accuracy and reliability of traffic detection;
The program specifies the access patterns already discovered by day, day of week, and time, as shown in the following table:
The rule information above can be specified in a rule file that has the same name as, and is associated with, the log file. The content of the second row of the table means that some access traffic peaks appear between 7:00 and 9:00 every morning, and when these peaks are detected they are treated as normal traffic by default. The content of the third row of the table means that a higher traffic peak appears every Saturday. Setting such reasonable rules therefore improves the reliability of the program when it searches for abnormal traffic peaks;
Step (3.2): Obtain the overall traffic deviation
Observation shows that the overall traffic deviation directly reflects how access traffic is distributed: evenly distributed, normal traffic shows a low deviation value, whereas log traffic with frequent anomalies shows a high deviation value, and the overall deviation thus reflects the website's overall traffic access pattern;
Step (3.3): Search for abnormal peak access points
Given initial values for the window w and the deviation factor k, the program detects peak accesses over the time range in units of the fixed window. The program first computes the traffic deviation s' of the current time interval and compares it with the overall access traffic deviation s. If s' > k*s, an abnormal peak exists within the window w. The program then keeps checking the following intervals until it finds the maximum peak point, and determines whether this peak point falls under a previously defined website access rule. If it does, the peak is reported as normal and the matching rule is indicated; otherwise an abnormal traffic peak is reported and highlighted in a conspicuous red status;
4. Decision tree learning of access rules
Step (4.1): Display the abnormal traffic list
In the traffic peak detection stage, each access traffic record of the website is labeled by the abnormal peak search algorithm. The abnormal peaks not only trigger alarms that remind the administrator to check the current traffic state; an operation interface is also provided so that the administrator can further analyze the true situation;
Step (4.2): Manually correct abnormal traffic
If a peak currently flagged as abnormal is in fact normal but no corresponding rule exists in the current access rules, it is a false alarm. The administrator can then correct it interactively in the operation interface, providing input for further learning and improving the accuracy of monitoring and detection;
Step (4.3): Learn access rules with a decision tree
Decision tree learning classifies an instance by sorting it down the tree from the root node to some leaf node, and the leaf node gives the class of the instance. Each node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one possible value of that attribute;
Here, the following feature attributes are first extracted from each web log access record: day, day of week, time, traffic value, and whether the record is abnormal. The C4.5 decision tree learning algorithm, which can handle continuous attributes, is then applied. On top of it, incremental learning is added, so that the learning process not only extracts website access rules but also suits the streaming nature of web log access records: new data can be added and learned from without losing the rules already learned;
Step (4.4): Update the website access rules
After the decision tree learning stage is complete, the system dynamically feeds the newly learned rules back into the traffic anomaly detection stage, thereby improving the accuracy of anomaly detection.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210095037.8A CN102724059B (en) | 2012-03-31 | 2012-03-31 | Website operation state monitoring and abnormal detection based on MapReduce |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102724059A (en) | 2012-10-10 |
CN102724059B (en) | 2015-03-11 |
Family
ID=46949728
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210095037.8A Expired - Fee Related CN102724059B (en) | 2012-03-31 | 2012-03-31 | Website operation state monitoring and abnormal detection based on MapReduce |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102724059B (en) |
Families Citing this family (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103812715A (en) * | 2012-11-07 | 2014-05-21 | 江苏仕德伟网络科技股份有限公司 | Method for judging running state of website |
CN103019855B (en) * | 2012-11-21 | 2015-06-03 | 北京航空航天大学 | Method for forecasting executive time of Map Reduce operation |
CN103077107B (en) * | 2012-12-31 | 2016-12-28 | Tcl集团股份有限公司 | A kind of data maintaining method and system |
CN104077328B (en) * | 2013-03-29 | 2019-05-24 | 百度在线网络技术(北京)有限公司 | The operation diagnostic method and equipment of MapReduce distributed system |
CN103248625B (en) * | 2013-04-27 | 2016-09-14 | 北京京东尚科信息技术有限公司 | A kind of web crawlers operation exception monitoring method and system |
CN103605714B (en) * | 2013-11-14 | 2017-10-03 | 北京国双科技有限公司 | The recognition methods of website abnormal data and device |
CN103605735B (en) * | 2013-11-19 | 2017-11-21 | 北京国双科技有限公司 | website data analysis method and device |
CN104657392B (en) * | 2013-11-25 | 2020-02-11 | 腾讯科技(深圳)有限公司 | Method and device for realizing retrieval abnormity restoration |
CN104239197A (en) * | 2014-10-10 | 2014-12-24 | 浪潮电子信息产业股份有限公司 | Administrative user abnormal behavior detection method based on big data log analysis |
CN105930255B (en) * | 2015-10-16 | 2019-01-29 | 中国银联股份有限公司 | A kind of system health degree prediction technique and device |
CN106611023B (en) * | 2015-10-27 | 2020-11-24 | 北京国双科技有限公司 | Method and device for detecting website access abnormality |
CN105610616B (en) * | 2015-12-29 | 2019-04-26 | 赛尔网络有限公司 | The single IP average flow rate statistical method of access net and system based on ICP liveness |
CN107819727B (en) * | 2016-09-13 | 2020-11-17 | 腾讯科技(深圳)有限公司 | Network security protection method and system based on IP address security credit |
CN108255868B (en) * | 2016-12-29 | 2020-11-24 | 北京国双科技有限公司 | Method and device for checking links in website |
CN108270727A (en) * | 2016-12-30 | 2018-07-10 | 北京国双科技有限公司 | Abnormal data analysis method and device |
CN108459936B (en) * | 2017-02-20 | 2021-05-14 | 北京畅游时空软件技术有限公司 | Accurate statistical method and device based on content modularization |
CN109257196A (en) * | 2017-07-12 | 2019-01-22 | 阿里巴巴集团控股有限公司 | A kind of abnormality eliminating method and equipment |
CN107196968B (en) * | 2017-07-12 | 2020-10-20 | 深圳市活力天汇科技股份有限公司 | Crawler identification method |
CN107454083A (en) * | 2017-08-08 | 2017-12-08 | 四川长虹电器股份有限公司 | The method of anti-reptile |
CN107438079B (en) * | 2017-08-18 | 2020-05-01 | 杭州安恒信息技术股份有限公司 | Method for detecting unknown abnormal behaviors of website |
CN109560977A (en) * | 2017-09-25 | 2019-04-02 | 北京国双科技有限公司 | Web site traffic monitoring method, device, storage medium, processor and electronic equipment |
CN107707427B (en) * | 2017-09-28 | 2021-12-17 | 南华大学 | Website availability monitoring system |
CN109586942A (en) * | 2017-09-29 | 2019-04-05 | 北京国双科技有限公司 | Web site performance assessment method and device |
CN107809331B (en) * | 2017-10-25 | 2020-11-24 | 北京京东尚科信息技术有限公司 | Method and device for identifying abnormal flow |
CN107707574A (en) * | 2017-11-23 | 2018-02-16 | 四川长虹电器股份有限公司 | A kind of anti-reptile method based on the behavior of access |
CN107743135A (en) * | 2017-12-01 | 2018-02-27 | 江彩莲 | Flow monitoring method |
CN109120592A (en) * | 2018-07-09 | 2019-01-01 | 四川大学 | A kind of Web abnormality detection system based on user behavior |
CN110019987B (en) * | 2018-11-28 | 2023-05-09 | 创新先进技术有限公司 | Log matching method and device based on decision tree |
CN110008100B (en) * | 2019-03-08 | 2023-03-14 | 创新先进技术有限公司 | Method and device for detecting abnormal access volume of web page |
CN110852387B (en) * | 2019-11-13 | 2022-04-22 | 江苏能来能源互联网研究院有限公司 | Energy internet super real-time state studying and judging algorithm |
CN110969358A (en) * | 2019-12-04 | 2020-04-07 | 国网浙江省电力有限公司 | Risk control method for power electronic channel operation |
CN112989157A (en) * | 2019-12-13 | 2021-06-18 | 网宿科技股份有限公司 | Method and device for detecting crawler request |
CN111106959B (en) * | 2019-12-20 | 2022-10-14 | 贵州黔岸科技有限公司 | Abnormity monitoring and alarming system and method for transportation management system |
CN112019508A (en) * | 2020-07-28 | 2020-12-01 | 杭州安恒信息技术股份有限公司 | Method, system and electronic device for detecting DDoS attack based on Web log analysis |
CN112039854A (en) * | 2020-08-13 | 2020-12-04 | 深圳市信锐网科技术有限公司 | Data transmission method, device and storage medium |
CN114285612B (en) * | 2021-12-14 | 2023-09-26 | 北京天融信网络安全技术有限公司 | Method, system, device, equipment and medium for detecting abnormal data |
CN114253811A (en) * | 2021-12-24 | 2022-03-29 | 深圳市盘古数据有限公司 | Intelligent monitoring method for data center system |
CN114546881B (en) * | 2022-03-22 | 2022-10-28 | 通号智慧城市研究设计院有限公司 | Application software testing method, electronic device and computer readable storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102169505A (en) * | 2011-05-16 | 2011-08-31 | 苏州两江科技有限公司 | Recommendation system building method based on cloud computing |
CN102393849A (en) * | 2011-07-18 | 2012-03-28 | 电子科技大学 | Web log data preprocessing method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080209030A1 (en) * | 2007-02-28 | 2008-08-28 | Microsoft Corporation | Mining Web Logs to Debug Wide-Area Connectivity Problems |
2012-03-31: CN application CN201210095037.8A, patent CN102724059B, not active (Expired - Fee Related)
Non-Patent Citations (2)
Title |
---|
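"Research and Implementation of a Distributed Multi-Topic Web Crawler System"; Bai He et al.; Computer Engineering (《计算机工程》); 2009-10-30; Vol. 35, No. 19; full text * |
"Design and Implementation of Hadoop-Based Web Log Preprocessing"; Song Ying et al.; Telecom Engineering Technics and Standardization (《电信工程技术与标准化》); 2011-11-30; full text * |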
Also Published As
Publication number | Publication date |
---|---|
CN102724059A (en) | 2012-10-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102724059B (en) | Website operation state monitoring and abnormal detection based on MapReduce | |
US9590880B2 (en) | Dynamic collection analysis and reporting of telemetry data | |
CN103297435B (en) | Abnormal access behavior detection method and system based on Web logs | |
CN106778253A (en) | Threat context aware information security Initiative Defense model based on big data | |
CN105243159A (en) | Visual script editor-based distributed web crawler system | |
CN105677842A (en) | Log analysis system based on Hadoop big data processing technique | |
CN107800591A (en) | Unified log data analysis method | |
CN103546326A (en) | Website traffic statistic method | |
CN105718587A (en) | Network content resource evaluation method and evaluation system | |
CN106407429A (en) | File tracking method, device and system | |
CN111259073A (en) | Intelligent business system running state studying and judging system based on logs, flow and business access | |
CN111459698A (en) | Database cluster fault self-healing method and device | |
CN103559203A (en) | Method, device and system for web page sorting | |
US20240095170A1 (en) | Multi-cache based digital output generation | |
Sujatha | Improved user navigation pattern prediction technique from web log data | |
CN109446441A (en) | Trusted distributed capture and storage system for general Web communities | |
Liu et al. | Big Data architecture for IT incident management | |
Maske et al. | A real time processing and streaming of wireless network data using storm | |
Ganapathi et al. | Web analytics and the art of data summarization | |
Zhao et al. | Collecting, managing and analyzing social networking data effectively | |
Mary et al. | Performance enhancement in session identification | |
Tsai et al. | Object architected design and efficient dynamic adjustment mechanism of distributed web crawlers | |
Huang et al. | An improved referrer-based session identification algorithm using MapReduce | |
Kirci et al. | "Is my internet down?" Sifting through user-affecting outages with Google Trends | |
CN107145542A (en) | Method and system for efficiently extracting subscription client IDs from URLs | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20150311; Termination date: 20160331 |