CN106612216B - Method and device for detecting website access abnormality - Google Patents

Info

Publication number
CN106612216B
Authority
CN
China
Prior art keywords
information
characteristic information
client
preset
time period
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510708785.2A
Other languages
Chinese (zh)
Other versions
CN106612216A (en)
Inventor
祁国晟
裴松年
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd filed Critical Beijing Gridsum Technology Co Ltd
Priority to CN201510708785.2A priority Critical patent/CN106612216B/en
Publication of CN106612216A publication Critical patent/CN106612216A/en
Application granted granted Critical
Publication of CN106612216B publication Critical patent/CN106612216B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/50 Testing arrangements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications

Abstract

The invention discloses a method and a device for detecting website access abnormality. The method comprises: acquiring one or more pieces of recorded characteristic information of clients accessing a website, wherein the characteristic information describes attributes of the client; acquiring the information gain rate of each piece of characteristic information of each client in a preset time period; comparing the information gain rate of each piece of characteristic information of each client in the preset time period with a corresponding preset information gain rate threshold to obtain a comparison result for each piece of characteristic information of each client; and determining whether access abnormality occurs at the website according to the comparison result. The invention solves the technical problem in the prior art that website access abnormality is detected inaccurately, and achieves the technical effect of detecting accurately and simply whether website access abnormality has occurred.

Description

Method and device for detecting website access abnormality
Technical Field
The invention relates to the field of the Internet, and in particular to a method and a device for detecting website access abnormality.
Background
Generally, when a website is visited, the website records client information about the visitor. The client information includes browser type, screen resolution, geographic information and the like. When operating or maintaining the website, all or part of the valuable client information can be used to judge whether the website is being accessed abnormally.
At present, the methods for judging from client information whether access abnormality occurs at a website are mainly statistical analysis and machine learning. The statistical analysis method is strongly affected by abnormal information from a single client, so the accuracy of its detection results is low. The machine learning method requires solving a convex quadratic programming problem; because of its high time complexity, the large data volume and the long running time, detection results are difficult to obtain.
No effective solution to the above problems has yet been proposed.
Disclosure of Invention
The embodiments of the invention provide a method and a device for detecting website access abnormality, which at least solve the technical problem in the prior art that website access abnormality is detected inaccurately.
According to an aspect of the embodiments of the present invention, there is provided a method for detecting website access abnormality, the method including: acquiring one or more pieces of recorded characteristic information of clients accessing a website, wherein the one or more pieces of characteristic information describe attributes of the client; acquiring the information gain rate of each piece of characteristic information of each client in a preset time period; comparing the information gain rate of each piece of characteristic information of each client in the preset time period with a corresponding preset information gain rate threshold to obtain a comparison result for each piece of characteristic information of each client; and determining whether access abnormality occurs at the website according to the comparison result.
Further, comparing the information gain rate of each piece of characteristic information of each client in the preset time period with the corresponding preset information gain rate threshold to obtain the comparison result for each piece of characteristic information of each client includes: if the information gain rate of the characteristic information in the preset time period is greater than the corresponding preset information gain rate threshold, obtaining a first comparison parameter, wherein the first comparison parameter indicates that the characteristic information is abnormal; and if the information gain rate of the characteristic information in the preset time period is not greater than the corresponding preset information gain rate threshold, obtaining a second comparison parameter, wherein the second comparison parameter indicates that the characteristic information is normal.
Further, determining whether access abnormality occurs at the website according to the comparison result includes: summarizing the comparison results of the information gain rates of all the characteristic information of all the clients accessing the website; counting, among the summarized characteristic information, the comparison results of the information gain rates of one or more preset pieces of characteristic information to obtain a statistical result; judging whether the statistical result is greater than a preset value; if the statistical result is greater than the preset value, determining that access abnormality occurs at the website; and if the statistical result is not greater than the preset value, determining that access abnormality does not occur at the website.
Further, counting, among the summarized characteristic information, the comparison results of the information gain rates of the one or more preset pieces of characteristic information to obtain the statistical result includes: summing the parameter values corresponding to the comparison results of the information gain rates of the one or more preset pieces of characteristic information to obtain a calculation result; acquiring the feature number of the one or more preset pieces of characteristic information; and calculating the ratio of the calculation result to the feature number to obtain the statistical result.
Further, acquiring the information gain rate of each piece of characteristic information of each client in the preset time period includes: acquiring the entropy value of each piece of characteristic information of each client in the preset time period and the entropy value of each piece of characteristic information of each client in a historical time period; and obtaining the information gain rate of each piece of characteristic information of each client in the preset time period by the formula
[Formula image: Figure BDA0000831712380000021]
where G is the information gain rate of each piece of characteristic information of each client in the preset time period, S1 is the entropy value of each piece of characteristic information of each client in the preset time period, and S2 is the entropy value of each piece of characteristic information of each client in the historical time period.
According to another aspect of the embodiments of the present invention, there is also provided a device for detecting website access abnormality, including: a first acquisition module, configured to acquire one or more pieces of recorded characteristic information of clients accessing a website, wherein the one or more pieces of characteristic information describe attributes of the client; a second acquisition module, configured to acquire the information gain rate of each piece of characteristic information of each client in a preset time period; a comparison module, configured to compare the information gain rate of each piece of characteristic information of each client in the preset time period with the corresponding preset information gain rate threshold to obtain a comparison result for each piece of characteristic information of each client; and a determining module, configured to determine whether access abnormality occurs at the website according to the comparison result.
Further, the comparing module comprises: a first obtaining sub-module, configured to obtain a first comparison parameter if an information gain rate of the feature information in a preset time period is greater than a corresponding preset information gain rate threshold, where the first comparison parameter is used to indicate that the feature information is abnormal; and a second obtaining submodule, configured to obtain a second comparison parameter if an information gain rate of the feature information in a preset time period is not greater than a corresponding preset information gain rate threshold, where the second comparison parameter is used to indicate that the feature information is normal.
Further, the determining module includes: a summarization submodule, configured to summarize the comparison results of the information gain rates of all the characteristic information of all the clients accessing the website; a statistics submodule, configured to count, among the summarized characteristic information, the comparison results of the information gain rates of one or more preset pieces of characteristic information to obtain a statistical result; a judgment submodule, configured to judge whether the statistical result is greater than a preset value; and a determining submodule, configured to determine that access abnormality occurs at the website if the statistical result is greater than the preset value, and to determine that access abnormality does not occur at the website if the statistical result is not greater than the preset value.
Further, the statistics submodule includes: a first calculation submodule, configured to sum the parameter values corresponding to the comparison results of the information gain rates of the one or more preset pieces of characteristic information to obtain a calculation result; a third acquisition submodule, configured to acquire the feature number of the one or more preset pieces of characteristic information; and a second calculation submodule, configured to calculate the ratio of the calculation result to the feature number to obtain the statistical result.
Further, the second acquisition module includes: a fourth acquisition submodule, configured to acquire the entropy value of each piece of characteristic information of each client in the preset time period and the entropy value of each piece of characteristic information of each client in a historical time period; and a fifth acquisition submodule, configured to obtain the information gain rate of each piece of characteristic information of each client in the preset time period by the formula
[Formula image: Figure BDA0000831712380000031]
where G is the information gain rate of each piece of characteristic information of each client in the preset time period, S1 is the entropy value of each piece of characteristic information of each client in the preset time period, and S2 is the entropy value of each piece of characteristic information of each client in the historical time period.
In the embodiments of the invention, one or more pieces of recorded characteristic information of the clients accessing the website are acquired, the information gain rate of each piece of characteristic information of each client in a preset time period is acquired, and each information gain rate is compared with the corresponding preset information gain rate threshold. Whether access abnormality occurs at the website is then determined from the comparison result of each piece of characteristic information. This achieves the technical effect of detecting accurately and simply whether website access abnormality has occurred, and solves the technical problem in the prior art that website access abnormality is detected inaccurately.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of an alternative method for detecting website access anomalies according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an alternative device for detecting website access abnormality according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an embodiment of the present invention, an embodiment of a method for detecting website access abnormality is provided. It should be noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is illustrated in the flowchart, in some cases the steps may be performed in an order different from the one shown or described here.
Example 1
Fig. 1 is a flowchart of a method for detecting website access abnormality according to an embodiment of the present invention, and as shown in fig. 1, the method may include the following steps:
Step S102: acquire one or more pieces of recorded characteristic information of the clients accessing the website, wherein the one or more pieces of characteristic information describe attributes of the client;
Step S104: acquire the information gain rate of each piece of characteristic information of each client in a preset time period;
Step S106: compare the information gain rate of each piece of characteristic information of each client in the preset time period with a corresponding preset information gain rate threshold to obtain a comparison result for each piece of characteristic information of each client;
Step S108: determine whether access abnormality occurs at the website according to the comparison result.
With the above steps, whether access abnormality occurs at the website is determined by comparing the information gain rate of each piece of characteristic information of each client in the preset time period with the corresponding preset information gain rate threshold. This achieves the technical effect of detecting accurately and simply whether website access abnormality has occurred, and solves the technical problem in the prior art that website access abnormality is detected inaccurately.
The preset time period may be a time range selected manually or a time range set in advance. The information gain rate is a concept from probability theory that reflects the degree of difference between two probability distributions. In this embodiment, the information gain rate reflects how far the probability distribution of an event indicated by a piece of characteristic information in the current time period differs from the probability distribution of the event indicated by the same characteristic information in a historical time period. Unlike the information gain, which expresses this difference only as a relative value, the information gain rate expresses it as an absolute value, so the difference it reports is more objective; it also serves as a compensating measure for the overfitting to which the information gain is prone.
Optionally, as shown in Table 1, the characteristic information of the clients of a website may be network operator information, geographic location information, device type information, browser type information and screen resolution information. Typically, when a visitor uses a client to visit the website, the website server records the characteristic information of each client in real time and stores it by category, so that website operation and maintenance managers or technical support staff can retrieve it when needed.
Table 1
Client   | Network operator | Geographic location | Device type       | Browser type    | Screen resolution
Client A | Shanxi Mobile    | Taiyuan, Shanxi     | Mobile phone      | Cheetah browser | 640*1136
Client B | Sichuan Telecom  | Chengdu, Sichuan    | Tablet computer   | Firefox browser | 2048*1536
Client C | Hubei Unicom     | Wuhan, Hubei        | Notebook computer | 360 browser     | 1366*768
Client D | China Netcom     | Xi'an, Shaanxi      | Desktop computer  | IE browser      | 1600*900
Table 1 lists 5 pieces of characteristic information recorded by the website for 4 clients accessing it, namely client A, client B, client C and client D. The 5 pieces of characteristic information are network operator information, geographic location information, device type information, browser type information and screen resolution information. Taking the device type information as an example, suppose the website is checked on a certain day and the following records are found:
The device type information recorded for the above 4 clients on the current day is: client A, client B, client C and client D each used a mobile phone to access the website on the current day.
The device type information recorded for the above 4 clients in the historical time period before the current day is: the device type commonly used to access the website in the historical time period was a mobile phone for client A, a tablet computer for client B, a notebook computer for client C and a desktop computer for client D, and in the historical time period there was no event in which all 4 clients accessed the website with the same type of device on the same day.
In the above example, the event occurring on the current day, "the 4 clients access the website with the same type of device on the same day", is the opposite of the historical event "the 4 clients never accessed the website with the same type of device on the same day". The event occurring on the current day may therefore be an abnormal event in accessing the website, that is, access abnormality may have occurred at the website.
It should be noted that the characteristic information of each client in table 1 is only an exemplary description, and does not constitute a specific limitation to the solution described in the present embodiment. In addition, the rest 4 pieces of feature information except for the "device type information" in table 1 are similar to the "device type information", and thus are not described in detail.
Optionally, the comparing the information gain rate of each piece of feature information of each client in a preset time period with the corresponding preset information gain rate threshold, and obtaining the comparison result of each piece of feature information of each client includes: if the information gain rate of the characteristic information in a preset time period is greater than a corresponding preset information gain rate threshold value, obtaining a first comparison parameter, wherein the first comparison parameter is used for indicating that the characteristic information is abnormal; and if the information gain rate of the characteristic information in the preset time period is not greater than the corresponding preset information gain rate threshold value, obtaining a second comparison parameter, wherein the second comparison parameter is used for indicating that the characteristic information is normal.
For example, suppose the preset information gain rate threshold of a website is a single value α. Comparing the information gain rate of each piece of characteristic information of each client in the preset time period with α yields the comparison results, which can be expressed as parameters. If the information gain rate of the characteristic information in the preset time period is greater than α, a first comparison parameter "1" is obtained, indicating that the characteristic information is abnormal; if it is not greater than α, a second comparison parameter "0" is obtained, indicating that the characteristic information is normal.
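The comparison rule above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `compare_gain_rates`, the feature keys and the numeric gain rates are all hypothetical, and the threshold α is passed in as `alpha`.

```python
from typing import Dict

ABNORMAL = 1  # first comparison parameter: the feature is abnormal
NORMAL = 0    # second comparison parameter: the feature is normal


def compare_gain_rates(gain_rates: Dict[str, float], alpha: float) -> Dict[str, int]:
    """Map each feature's information gain rate to a comparison parameter.

    A gain rate strictly greater than alpha yields 1; otherwise 0.
    """
    return {
        feature: ABNORMAL if rate > alpha else NORMAL
        for feature, rate in gain_rates.items()
    }


# Hypothetical gain rates for one client; the numbers are illustrative only.
client_gain_rates = {
    "network_operator": 0.10,
    "geographic_location": 0.45,
    "device_type": 0.30,   # equal to alpha, so "not greater than" applies
    "browser_type": 0.62,
    "screen_resolution": 0.51,
}
comparison = compare_gain_rates(client_gain_rates, alpha=0.30)
print(list(comparison.values()))
```

Note that a gain rate exactly equal to the threshold is treated as normal, which follows the "not greater than" wording of the text.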
Optionally, determining whether access abnormality occurs at the website according to the comparison result of the characteristic information includes: summarizing the comparison results of the information gain rates of all the characteristic information of all the clients accessing the website; counting, among the summarized characteristic information, the comparison results of the information gain rates of one or more preset pieces of characteristic information to obtain a statistical result; judging whether the statistical result is greater than a preset value; if the statistical result is greater than the preset value, determining that access abnormality occurs at the website; and if the statistical result is not greater than the preset value, determining that access abnormality does not occur at the website.
The comparison results of the information gain rates of all the characteristic information of all the clients are summarized in the form of a set. Each comparison result corresponds one-to-one to an element of the set, and the elements are parameters whose value is "0" or "1". For example, suppose 2 clients visit the website at a certain moment and the website's current policy is to record 5 pieces of characteristic information of each client in real time. With the available parameter values "0" and "1", the comparison results recorded for one client are {0, 1, 0, 1, 1} and those recorded for the other client are {1, 0, 0, 1, 1}, so the summarized parameter set A is {0, 1, 0, 1, 1, 1, 0, 0, 1, 1}. Set A shows that the website has obtained comparison results for 10 pieces of characteristic information. The parameters in set A are counted to obtain a statistical result β, which is compared with a preset value γ: if β is greater than γ, it is determined that access abnormality occurs at the website; if β is not greater than γ, it is determined that access abnormality does not occur.
Optionally, counting, among the summarized characteristic information, the comparison results of the information gain rates of one or more preset pieces of characteristic information to obtain the statistical result includes: summing the parameter values corresponding to the comparison results of the information gain rates of the one or more preset pieces of characteristic information to obtain a calculation result; acquiring the feature number of the one or more preset pieces of characteristic information; and calculating the ratio of the calculation result to the feature number to obtain the statistical result.
For example, with the available parameter values "0" and "1", suppose the set A reflecting whether the pieces of characteristic information of a website are abnormal is {0, 1, 0, 1, 1, 1, 0, 0, 1, 1}. Summing all the parameter values in set A gives "6", and set A shows that the feature number is "10", so the statistical result is β = 6/10 = 0.6.
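The summarizing, counting and judging steps can be sketched in a few lines. The function names are hypothetical, the preset value `gamma=0.5` is an assumed number chosen only for illustration, and the sketch counts every summarized parameter rather than a preset subset of features.

```python
from typing import Iterable, List


def summarize(per_client_results: Iterable[List[int]]) -> List[int]:
    """Concatenate the per-client comparison parameters into one collection A."""
    return [param for client in per_client_results for param in client]


def statistical_result(parameters: List[int]) -> float:
    """Ratio of the sum of the parameter values to the feature number."""
    if not parameters:
        raise ValueError("no comparison results to count")
    return sum(parameters) / len(parameters)


def access_is_abnormal(beta: float, gamma: float) -> bool:
    """Abnormal only when the statistical result exceeds the preset value."""
    return beta > gamma


# Comparison parameters of two clients, five features each.
set_a = summarize([[0, 1, 0, 1, 1], [1, 0, 0, 1, 1]])
beta = statistical_result(set_a)
print(set_a, beta, access_is_abnormal(beta, gamma=0.5))
```

Because β is a ratio between 0 and 1, the preset value γ can be read as the tolerated fraction of abnormal features.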
Optionally, acquiring the information gain rate of each piece of characteristic information of each client in the preset time period includes: acquiring the entropy value of each piece of characteristic information of each client in the preset time period and the entropy value of each piece of characteristic information of each client in a historical time period; and obtaining the information gain rate of each piece of characteristic information of each client in the preset time period by the formula
[Formula image: Figure BDA0000831712380000071]
where G is the information gain rate of each piece of characteristic information of each client in the preset time period, S1 is the entropy value of each piece of characteristic information of each client in the preset time period, and S2 is the entropy value of each piece of characteristic information of each client in the historical time period.
Here S1 is the information entropy and S2 is the conditional entropy. The difference S1 - S2 is the information gain, which represents the degree to which the random uncertainty of the characteristic information is removed. The information gain alone, however, is only a relative value and depends on the magnitude of the conditional entropy. This embodiment therefore uses the information gain rate, which is a more objective measure than the information gain.
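To make the entropy terms concrete, the sketch below computes an entropy for the device type example. It rests on assumptions the text does not spell out: both S1 and S2 are taken to be the Shannon entropy (in bits) of the distribution of observed values across clients, and the helper name `shannon_entropy` is hypothetical. The sketch stops at the difference S1 - S2; how that difference is normalised into G is defined by the formula figure, which the code does not attempt to reproduce.

```python
import math
from collections import Counter
from typing import Sequence


def shannon_entropy(values: Sequence[str]) -> float:
    """Shannon entropy, in bits, of the empirical distribution of values."""
    total = len(values)
    if total == 0:
        raise ValueError("at least one observation is required")
    counts = Counter(values)
    # Sum of p * log2(1 / p), with p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Device types of clients A to D, taken from the example in the text.
current_day = ["mobile phone"] * 4
historical = ["mobile phone", "tablet computer", "notebook computer", "desktop computer"]

s1 = shannon_entropy(current_day)  # assumed reading of S1
s2 = shannon_entropy(historical)   # assumed reading of S2
information_gain = s1 - s2
print(s1, s2, information_gain)
```

Under these assumptions the current day has zero entropy because every client reports the same device type, while the historical period has the maximum entropy for four equally likely values.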
Example 2
According to an embodiment of the present application, there is further provided a device for detecting website access abnormality, as shown in fig. 2, the device may include: a first acquisition module 22, a second acquisition module 24, a comparison module 26, and a determination module 28.
The first acquisition module 22 is configured to acquire one or more pieces of recorded characteristic information of the clients accessing a website, wherein the one or more pieces of characteristic information describe attributes of the client;
The second acquisition module 24 is configured to acquire the information gain rate of each piece of characteristic information of each client in a preset time period;
The comparison module 26 is configured to compare the information gain rate of each piece of characteristic information of each client in the preset time period with a corresponding preset information gain rate threshold to obtain a comparison result for each piece of characteristic information of each client;
The determining module 28 is configured to determine whether access abnormality occurs at the website according to the comparison result.
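One way to picture how the four modules fit together is the sketch below. It is an assumed arrangement for illustration only: the class name `AnomalyDetector`, its field names, and the injected callables standing in for modules 22 and 24 are hypothetical, and the gain rates are supplied by a stub rather than computed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

ClientFeatures = Dict[str, str]  # feature name -> recorded value
GainRates = Dict[str, float]     # feature name -> information gain rate


@dataclass
class AnomalyDetector:
    acquire_features: Callable[[], List[ClientFeatures]]       # stands in for module 22
    acquire_gain_rates: Callable[[ClientFeatures], GainRates]  # stands in for module 24
    thresholds: Dict[str, float]  # preset gain rate threshold per feature
    gamma: float                  # preset value for the statistical result

    def compare(self, gain_rates: GainRates) -> List[int]:
        """Role of comparison module 26: 1 if above the threshold, else 0."""
        return [1 if rate > self.thresholds[name] else 0
                for name, rate in gain_rates.items()]

    def determine(self, results: List[List[int]]) -> bool:
        """Role of determining module 28: ratio of abnormal features versus gamma."""
        flat = [param for client in results for param in client]
        return bool(flat) and sum(flat) / len(flat) > self.gamma

    def run(self) -> bool:
        clients = self.acquire_features()
        results = [self.compare(self.acquire_gain_rates(c)) for c in clients]
        return self.determine(results)


# Stubbed inputs; the values are illustrative only.
detector = AnomalyDetector(
    acquire_features=lambda: [{"device_type": "mobile phone"}] * 2,
    acquire_gain_rates=lambda client: {"device_type": 0.8},
    thresholds={"device_type": 0.3},
    gamma=0.5,
)
print(detector.run())
```

Injecting the two acquisition steps as callables keeps the comparison and determination logic testable without a real access log.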
The preset time period may be a certain time range selected manually or a preset time range. The information gain ratio is a probabilistic term that reflects the degree of difference between two probability distributions. In the present embodiment, the information gain ratio reflects a degree of difference between a probability distribution of a certain event indicated by certain characteristic information at a current time period with respect to a probability distribution of the event indicated by the characteristic information at a history time period.
Optionally, as shown in Table 1, the characteristic information of the clients of a website acquired by the first acquisition module 22 may be network operator information, geographic location information, device type information, browser type information, and screen resolution information. Generally, when a visitor uses a client to visit the website, the website server records the characteristic information of each client in real time and stores it by category, so that website operation and maintenance staff or technical support staff can retrieve it when needed.
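The recorded characteristic information can be pictured as one record per client visit. The following is a minimal sketch under stated assumptions, not the implementation of this embodiment: the class name `ClientFeatures`, its field names, the helper `value_counts`, and the sample values are all hypothetical and only mirror the five attributes listed above.

```python
from collections import Counter
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ClientFeatures:
    """One recorded client visit; the field names are illustrative assumptions."""
    network_operator: str
    geographic_location: str
    device_type: str
    browser_type: str
    screen_resolution: str


def value_counts(records, feature_name):
    """Count how often each value of one feature occurs in a set of records."""
    return Counter(getattr(record, feature_name) for record in records)


# Names of the recorded features, derived from the record type itself.
FEATURE_NAMES = [f.name for f in fields(ClientFeatures)]

# Hypothetical records for three visits in one time period.
records = [
    ClientFeatures("operator-a", "Beijing", "phone", "browser-x", "1080x1920"),
    ClientFeatures("operator-b", "Beijing", "desktop", "browser-y", "1920x1080"),
    ClientFeatures("operator-a", "Shanghai", "phone", "browser-x", "1080x1920"),
]
browser_counts = value_counts(records, "browser_type")
```

Per-feature value counts of this kind are one possible input for the entropy values used later in this embodiment.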
Optionally, the comparison module 26 comprises: the first obtaining submodule is used for obtaining a first comparison parameter if the information gain rate of the characteristic information in a preset time period is greater than a corresponding preset information gain rate threshold value, wherein the first comparison parameter is used for indicating that the characteristic information is abnormal; and the second obtaining submodule is used for obtaining a second comparison parameter if the information gain rate of the characteristic information in the preset time period is not greater than the corresponding preset information gain rate threshold, wherein the second comparison parameter is used for indicating that the characteristic information is normal.
For example, the preset information gain rate threshold of a website is a single value α. Comparing the information gain rate of each piece of characteristic information of each client in the preset time period with α yields the comparison results, which can be expressed as parameters. If the information gain rate of the characteristic information in the preset time period is greater than α, a first comparison parameter "1" is obtained, indicating that the characteristic information is abnormal; if the information gain rate of the characteristic information in the preset time period is not greater than α, a second comparison parameter "0" is obtained, indicating that the characteristic information is normal.
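The threshold comparison described above can be sketched as follows. This is only an illustration: the function name `compare_gain_rate`, the threshold value 0.3 for α, and the sample gain rates are assumptions and do not come from this embodiment.

```python
def compare_gain_rate(gain_rate, threshold):
    """Return 1 (abnormal) if gain_rate is strictly greater than threshold, else 0 (normal)."""
    return 1 if gain_rate > threshold else 0


# Hypothetical threshold alpha and hypothetical gain rates for one client.
alpha = 0.3
gain_rates = {
    "network_operator": 0.12,
    "browser_type": 0.45,
    "device_type": 0.30,  # equal to alpha, so "not greater than" and therefore normal
}

comparison = {name: compare_gain_rate(g, alpha) for name, g in gain_rates.items()}
```

A gain rate exactly equal to α yields "0", because only values strictly greater than the threshold are treated as abnormal.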
Optionally, the determining module 28 comprises: a summarizing submodule, configured to summarize the comparison results of the information gain rates of all characteristic information of all clients accessing the website; a statistics submodule, configured to count the comparison results of the information gain rates of preset one or more pieces of characteristic information among the one or more pieces of characteristic information obtained by the summarizing, to obtain a statistical result; a judgment submodule, configured to judge whether the statistical result is greater than a preset value; and a determining submodule, configured to determine that an access abnormality has occurred on the website if the statistical result is greater than the preset value, and to determine that no access abnormality has occurred on the website if the statistical result is not greater than the preset value.
The comparison results of the information gain rates of all characteristic information of all clients are summarized in the form of a set, in which each comparison result corresponds one-to-one to an element, and each element is a parameter whose value is "0" or "1". For example, suppose 2 clients visit a website at a certain moment, and the current strategy of the website is to record 5 pieces of characteristic information of each client in real time. With available parameter values "0" and "1", the comparison results of the information gain rates of the characteristic information of one client recorded by the website are {0, 1, 0, 1, 1}, and those of the other client are {1, 0, 0, 1, 1}. The summarized parameter set A is then {0, 1, 0, 1, 1, 1, 0, 0, 1, 1}, which records whether each of the 10 pieces of characteristic information detected by the website is abnormal. A statistical result β is obtained by counting the parameter values in set A and is compared with a preset value γ: if β is greater than γ, it is determined that an access abnormality has occurred on the website; if β is not greater than γ, it is determined that no access abnormality has occurred.
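The summarizing and determining steps can be sketched as below. This is a hedged illustration rather than the embodiment's implementation: the function names `summarize` and `is_access_abnormal` are hypothetical, and the second client's five results are taken as {1, 0, 0, 1, 1}, the reading consistent with the 10-element set A used in the statistics example of this embodiment.

```python
from itertools import chain


def summarize(per_client_results):
    """Flatten the per-client comparison results into one parameter list A."""
    return list(chain.from_iterable(per_client_results))


def is_access_abnormal(beta, gamma):
    """Abnormal only when the statistical result beta is strictly greater than gamma."""
    return beta > gamma


# Comparison results for two clients, five pieces of characteristic information each.
client_1 = [0, 1, 0, 1, 1]
client_2 = [1, 0, 0, 1, 1]

A = summarize([client_1, client_2])
```

A list is used instead of a mathematical set because repeated values of 0 and 1 must be preserved, one per piece of characteristic information.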
Optionally, the statistics submodule includes: a first calculation submodule, configured to sum the parameter values corresponding to the comparison results of the information gain rates of the preset one or more pieces of characteristic information, to obtain a calculation result; a third obtaining submodule, configured to obtain the number of features of the preset one or more pieces of characteristic information; and a second calculation submodule, configured to calculate the ratio of the calculation result to the number of features, to obtain the statistical result.
For example, when the available parameter values are "0" and "1", suppose the set A reflecting whether the pieces of characteristic information of a website are abnormal is {0, 1, 0, 1, 1, 1, 0, 0, 1, 1}. Summing all parameter values in set A gives the value 6, and the number of features in the set is 10, so the statistical result is β = 6/10 = 0.6.
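The arithmetic of this example can be checked with a short sketch. The function name `statistical_result` is hypothetical, and raising an error for an empty parameter list is an assumption, since the embodiment does not describe that case.

```python
def statistical_result(parameters):
    """Ratio of the sum of the 0/1 parameter values to the number of features."""
    if not parameters:
        # Assumption: an empty set has no meaningful ratio, so reject it.
        raise ValueError("at least one comparison parameter is required")
    return sum(parameters) / len(parameters)


# Set A from the example above: ten comparison parameters.
A = [0, 1, 0, 1, 1, 1, 0, 0, 1, 1]

calculation_result = sum(A)    # sum of parameter values
feature_number = len(A)        # number of features
beta = statistical_result(A)   # ratio of the two
```

Because the parameters are 0 or 1, β is simply the fraction of characteristic information flagged as abnormal.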
Optionally, the second acquisition module 24 includes: a fourth obtaining submodule, configured to obtain the entropy value of each piece of characteristic information of each client in the preset time period and the entropy value of each piece of characteristic information of each client in a historical time period; and a fifth obtaining submodule, configured to obtain, by the formula given in Figure BDA0000831712380000101 (formula image not reproduced here), the information gain rate of each piece of characteristic information of each client in the preset time period, where G is the information gain rate of each piece of characteristic information of each client in the preset time period, S1 is the entropy value of each piece of characteristic information of each client in the preset time period, and S2 is the entropy value of each piece of characteristic information of each client in the historical time period.
Here, S1 is the information entropy, S2 is the conditional entropy, and S1 - S2 is the information gain, which represents the degree to which the random uncertainty of the characteristic information is removed. However, the information gain alone is only a relative value and depends on the magnitude of the conditional entropy. Therefore, this embodiment uses the information gain rate, which is a more objective metric than the information gain.
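The entropy values and the information gain S1 - S2 can be illustrated with the sketch below. Several points are assumptions: the entropy is taken to be Shannon entropy with a base-2 logarithm over observed value frequencies, S2 is computed as the entropy over the historical time period, and the function name and sample data are hypothetical. The formula for the information gain rate G itself is given only in a figure that is not available in this text, so the final normalization step is deliberately left out rather than guessed.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (base 2) of the empirical distribution of the values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical browser-type observations for one feature in two time periods.
current_period = ["browser-x", "browser-x", "browser-y", "browser-z"]
historical_period = ["browser-x", "browser-x", "browser-x", "browser-y"]

s1 = shannon_entropy(current_period)     # entropy in the preset time period
s2 = shannon_entropy(historical_period)  # entropy in the historical time period
information_gain = s1 - s2               # the gain rate G would normalize this value
```

In this sample the current period is spread over more browser types than the historical period, so S1 is larger than S2 and the information gain is positive.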
In this embodiment of the invention, one or more pieces of recorded characteristic information of the clients accessing the website are acquired, the information gain rate of each piece of characteristic information of each client in a preset time period is acquired, and each information gain rate is compared with the corresponding preset information gain rate threshold. Whether the website is abnormally accessed is then determined from the comparison result of each piece of characteristic information. This achieves the technical effect of detecting website access abnormality accurately and simply, and solves the technical problem of inaccurate detection of website access abnormality in the prior art.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program code.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (6)

1. A method for detecting website access abnormality is characterized by comprising the following steps:
acquiring one or more pieces of recorded characteristic information of a client accessing a website, wherein the one or more pieces of characteristic information are used for describing the attribute of the client;
acquiring the information gain rate of each characteristic information of each client in a preset time period;
comparing the information gain rate of each piece of characteristic information of each client in a preset time period with a corresponding preset information gain rate threshold value to obtain a comparison result of each piece of characteristic information of each client;
determining whether the website is abnormally accessed according to the comparison result;
wherein, determining whether the website is abnormally accessed according to the comparison result comprises: summarizing comparison results of information gain rates of all characteristic information of all the clients accessing the website; counting the comparison results of the information gain rates of preset one or more pieces of characteristic information among the one or more pieces of characteristic information obtained by the summarizing, to obtain a statistical result; judging whether the statistical result is greater than a preset value; if the statistical result is greater than the preset value, determining that the access abnormality occurs to the website, and if the statistical result is not greater than the preset value, determining that the access abnormality does not occur to the website;
comparing the information gain rate of each piece of characteristic information of each client in a preset time period with a corresponding preset information gain rate threshold value, and obtaining a comparison result of each piece of characteristic information of each client comprises: if the information gain rate of the characteristic information in a preset time period is greater than the corresponding preset information gain rate threshold, obtaining a first comparison parameter, wherein the first comparison parameter is used for indicating that the characteristic information is abnormal; and if the information gain rate of the characteristic information in a preset time period is not greater than the corresponding preset information gain rate threshold, obtaining a second comparison parameter, wherein the second comparison parameter is used for indicating that the characteristic information is normal.
2. The detection method according to claim 1, wherein counting a comparison result of information gain rates of one or more preset pieces of feature information among the one or more pieces of feature information obtained by the aggregation, and obtaining the statistical result comprises:
summing the parameter values corresponding to the comparison result of the information gain ratios of the preset one or more pieces of characteristic information to obtain a calculation result;
acquiring the feature number of the preset one or more feature information;
and calculating the ratio of the calculation result to the number of the features to obtain the statistical result.
3. The detection method according to claim 1, wherein obtaining an information gain rate of each characteristic information of each client in a preset time period comprises:
acquiring an entropy value of each piece of characteristic information of each client in the preset time period and an entropy value of each piece of characteristic information of each client in a historical time period;
obtaining, by the formula [formula image not reproduced], an information gain rate of each piece of characteristic information of each client in the preset time period, wherein G is the information gain rate of each piece of characteristic information of each client in the preset time period, S1 is the entropy value of each piece of characteristic information of each client in the preset time period, and S2 is the entropy value of each piece of characteristic information of each client in the historical time period.
4. An apparatus for detecting abnormality in website access, comprising:
a first acquisition module, used for acquiring one or more pieces of recorded characteristic information of a client accessing a website, wherein the one or more pieces of characteristic information are used for describing the attribute of the client;
the second acquisition module is used for acquiring the information gain rate of each piece of characteristic information of each client in a preset time period;
the comparison module is used for comparing the information gain rate of each piece of characteristic information of each client in a preset time period with a corresponding preset information gain rate threshold value to obtain a comparison result of each piece of characteristic information of each client;
the determining module is used for determining whether the website is abnormally accessed according to the comparison result;
wherein the determining module comprises: the summarizing submodule is used for summarizing comparison results of information gain rates of all the characteristic information of all the clients accessing the website; the statistics submodule is used for counting the comparison results of the information gain rates of preset one or more pieces of characteristic information among the one or more pieces of characteristic information obtained by the summarizing, to obtain a statistical result; the judgment submodule is used for judging whether the statistical result is greater than a preset value; the determining submodule is used for determining that the access abnormality occurs to the website if the statistical result is greater than the preset value, and determining that the access abnormality does not occur to the website if the statistical result is not greater than the preset value;
the comparison module comprises: the first obtaining submodule is used for obtaining a first comparison parameter if the information gain rate of the characteristic information in a preset time period is greater than the corresponding preset information gain rate threshold, wherein the first comparison parameter is used for indicating that the characteristic information is abnormal; and the second obtaining submodule is used for obtaining a second comparison parameter if the information gain rate of the characteristic information in a preset time period is not greater than the corresponding preset information gain rate threshold, wherein the second comparison parameter is used for indicating that the characteristic information is normal.
5. The detection device of claim 4, wherein the statistics submodule comprises:
the first calculation submodule is used for summing and calculating parameter values corresponding to the comparison result of the information gain rates of the preset one or more pieces of characteristic information to obtain a calculation result;
the third obtaining submodule is used for obtaining the feature number of the preset one or more pieces of characteristic information;
and the second calculation submodule is used for calculating the ratio of the calculation result to the number of the features to obtain the statistical result.
6. The detection apparatus according to claim 4, wherein the second acquisition module comprises:
a fourth obtaining submodule, configured to obtain an entropy value of each piece of feature information of each client in the preset time period and an entropy value of each piece of feature information of each client in a historical time period;
a fifth obtaining submodule, configured to obtain, by the formula given in Figure FDA0002262571820000031 (formula image not reproduced here), an information gain rate of each piece of characteristic information of each client in the preset time period, wherein G is the information gain rate of each piece of characteristic information of each client in the preset time period, S1 is the entropy value of each piece of characteristic information of each client in the preset time period, and S2 is the entropy value of each piece of characteristic information of each client in the historical time period.
CN201510708785.2A 2015-10-27 2015-10-27 Method and device for detecting website access abnormality Active CN106612216B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510708785.2A CN106612216B (en) 2015-10-27 2015-10-27 Method and device for detecting website access abnormality

Publications (2)

Publication Number Publication Date
CN106612216A CN106612216A (en) 2017-05-03
CN106612216B true CN106612216B (en) 2020-02-07

Family

ID=58614489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510708785.2A Active CN106612216B (en) 2015-10-27 2015-10-27 Method and device for detecting website access abnormality

Country Status (1)

Country Link
CN (1) CN106612216B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing
Applicant after: Beijing Guoshuang Technology Co.,Ltd.
Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing
Applicant before: Beijing Guoshuang Technology Co.,Ltd.
GR01 Patent grant