CN110708339B - Correlation analysis method based on WEB log - Google Patents
- Publication number
- CN110708339B (application CN201911076385.9A)
- Authority
- CN
- China
- Prior art keywords
- interface
- access
- logs
- group
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/10—Network architectures or network communication protocols for network security for controlling access to devices or network resources
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
- H04L63/1466—Active attacks involving interception, injection, modification, spoofing of data unit addresses, e.g. hijacking, packet injection or TCP sequence number attacks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
Abstract
The invention discloses a correlation analysis method based on WEB logs, comprising the following steps: standardizing the log data; obtaining the event behavior chains in the logs; computing the interface call succession probabilities to obtain the access characteristic attributes of the group; and matching the similarity between a user's event behavior chain and that of the group to obtain a total abnormal behavior score. By correlating key logs and matching the user's actual interface access behavior chain against the behavior chain of the group, the method identifies abnormal behavior accurately and in a targeted manner, so that the system administrator can be notified in time. The analyzed data are web access logs, which are highly concurrent and interleaved; the method therefore abandons a direct timeline ordering, classifies the logs by identifier fields in the log data, and distinguishes abnormal relationships by comparing the individual with the group, which gives it wider applicability.
Description
Technical Field
The invention relates to the technical field of log security analysis, in particular to a correlation analysis method based on WEB logs.
Background
With the development of Web technology and the arrival of Web 2.0, the convenience of deploying and maintaining WEB applications has become increasingly apparent. Internet applications based on the Web environment are ever more widespread, and enterprises build all kinds of information applications on Web platforms. The rapid growth of Web services has also drawn the attention of hackers, and Web security threats have followed. Hackers exploit system vulnerabilities, SQL injection vulnerabilities and other weaknesses of Web service programs to gain control of the Web server; at the mild end they tamper with web page content, more seriously they steal important internal data, and worse still they implant malicious code in web pages so that other visitors to the website are harmed. The Web access log records raw information such as the requests received and processed by the Web server and runtime errors. Security analysis of WEB logs helps not only to locate attackers but also to reconstruct attack paths and to find and repair security vulnerabilities in the website. Existing log analysis systems extract Web access log information to show clearly from which IP, at what time, with which operating system and which browser a user accessed which page of the website, and whether the access succeeded. Their shortcoming is that each log is analyzed in isolation for abnormal access or attack behavior; the correlations among logs are not analyzed, so attacks mounted by a combination of several requests cannot be identified.
Disclosure of Invention
The invention aims to provide a correlation analysis method based on WEB logs that solves the prior-art problem that each log is analyzed in isolation for anomalies while the correlations between logs are not analyzed, so that attacks in which several requests jointly target a system cannot be identified.
The invention solves this problem through the following technical scheme:
A correlation analysis method based on WEB logs comprises the following steps:
step S100: standardized processing of log data
Every session between the browser and the server has a sessionId, a unique identifier for the session and the user agent. The log data are collected using the WEB server and a filter script, which can intercept all fields transmitted during an interface access, including the inherent data fields that must be collected: the accessed url, the access time, the request body and the response body. These contents are converted into json-formatted data. Following the access flow, one session is taken as the basic unit: the logs are grouped by session, so that all log data within a single session form one group;
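As a rough illustration of the grouping in step S100, the following Python sketch groups already-collected records by session. The record layout, the field name `sessionId` as a dictionary key, and all sample values are assumptions made for illustration; the method does not prescribe a language or a concrete record layout.

```python
import json
from collections import defaultdict

# Hypothetical records as a filter script might emit them (values are made up).
raw_records = [
    {"sessionId": "s1", "urlPath": "/login", "method": "POST", "timestamp": "2019-11-06 10:00:00"},
    {"sessionId": "s2", "urlPath": "/login", "method": "POST", "timestamp": "2019-11-06 10:00:01"},
    {"sessionId": "s1", "urlPath": "/userInfo", "method": "GET", "timestamp": "2019-11-06 10:00:05"},
]


def group_by_session(records):
    """Group log records so that each session's logs form one group."""
    groups = defaultdict(list)
    for record in records:
        groups[record["sessionId"]].append(record)
    return dict(groups)


groups = group_by_session(raw_records)
# Serialize the grouped logs as json-formatted data.
print(json.dumps(groups, indent=2))
```

Keeping the session as the grouping key means that interleaved requests from concurrent users never end up in the same group.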
step S200: obtaining event behavior chains in logs
An event behavior chain is the ordered list of all interfaces that the current user accesses within one session. The logs to be analyzed are grouped by session; from each log in a group, the access interface path urlPath, the access interface method and the access time timestamp are extracted, and the logs are sorted by timestamp to form a complete event behavior chain;
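Step S200 can be sketched in Python as follows. This is only an illustrative sketch: the function name `build_behavior_chain`, the dictionary keys and the sample logs are assumptions, and the timestamp is assumed to already be in the form `2019-11-06 10:00:00`.

```python
from datetime import datetime


def build_behavior_chain(session_logs):
    """Return the (method, urlPath) pairs of one session, ordered by access time."""
    ordered = sorted(
        session_logs,
        key=lambda log: datetime.strptime(log["timestamp"], "%Y-%m-%d %H:%M:%S"),
    )
    return [(log["method"], log["urlPath"]) for log in ordered]


# Hypothetical logs of a single session, deliberately out of order.
session_logs = [
    {"urlPath": "/userInfo", "method": "GET", "timestamp": "2019-11-06 10:00:05"},
    {"urlPath": "/login", "method": "POST", "timestamp": "2019-11-06 10:00:00"},
    {"urlPath": "/updateUser", "method": "POST", "timestamp": "2019-11-06 10:00:09"},
]

chain = build_behavior_chain(session_logs)
print(chain)
```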
step S300: computing interface call succession probabilities
The grouped and sorted log data are analyzed to count, for each interface, which interfaces are most likely to be accessed next, and the top N of them are kept. The specific method is as follows:
for each interface, the next interface called after it is obtained; if that next interface appears for the first time, it is added to the list with an occurrence count of 1; if it has appeared before, its occurrence count is incremented by 1.
The N next interfaces with the highest counts are then taken. This operation yields the access characteristic attributes of the group, stored in the following format:
{
CurrentInterface: "Current interface information",
nextInterfaceList:[nextInterface1,nextInterface2,…nextInterfaceN]
}
The access characteristic attributes of the group are thereby obtained;
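The counting in step S300 can be sketched in Python as below. The function name `build_group_profile`, the placeholder interface names and the tie-breaking behavior are assumptions: the text does not say how ties between equally frequent next interfaces are resolved, and this sketch simply relies on `Counter.most_common`, which keeps first-encountered order among equal counts.

```python
from collections import Counter, defaultdict


def build_group_profile(chains, top_n=3):
    """For every interface, keep the top_n interfaces most often accessed next."""
    successors = defaultdict(Counter)
    for chain in chains:
        # Pair each interface with the one that follows it in the chain.
        for current, nxt in zip(chain, chain[1:]):
            successors[current][nxt] += 1  # a new entry starts at 1, a repeat adds 1
    return [
        {
            "CurrentInterface": current,
            "nextInterfaceList": [name for name, _ in counts.most_common(top_n)],
        }
        for current, counts in successors.items()
    ]


# Hypothetical behavior chains with placeholder interface names.
chains = [["a", "b", "c"], ["a", "b"], ["a", "c"]]
profile = build_group_profile(chains, top_n=2)
print(profile)
```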
step S400: similarity matching of event behavior chains
The event behavior chain data of a single session of a user are processed by extracting, in order, each interface and the interface that follows it, and storing each pair in a relation object relationship with the following basic format:
{
CurrentInterface: "current interface information",
NextInterface: "next interface information"
};
Each relationship instance is matched in turn against the access characteristic attributes of the group:
if NextInterface is in nextInterfaceList, the threat score is 0;
if NextInterface is not in nextInterfaceList, the threat score is 1;
all threat scores obtained by matching are accumulated to give the total abnormal behavior score.
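A minimal Python sketch of the matching in step S400 follows. The group profile is assumed here to be held as a dictionary from CurrentInterface to its nextInterfaceList, and the interface name `exportAll` is hypothetical. How to score an interface that has no entry in the group profile is not stated in the text; this sketch assumes it counts as a mismatch.

```python
def total_anomaly_score(chain, group_profile):
    """Accumulate 0/1 threat scores over every relationship in one session's chain."""
    score = 0
    for current, nxt in zip(chain, chain[1:]):
        relationship = {"CurrentInterface": current, "NextInterface": nxt}
        # Assumption: an interface absent from the profile has no allowed successors.
        allowed = group_profile.get(relationship["CurrentInterface"], [])
        if relationship["NextInterface"] not in allowed:
            score += 1
    return score


# Hypothetical group profile and user chain.
group_profile = {
    "login": ["userInfo", "articleList"],
    "userInfo": ["updateUser"],
}
user_chain = ["login", "userInfo", "exportAll"]

score = total_anomaly_score(user_chain, group_profile)
print(score)
```

In this example the transition from login to userInfo matches the group and scores 0, while the transition from userInfo to exportAll does not and scores 1.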
Through correlation analysis among the web access logs, the invention can judge more accurately whether a user's access behavior within one session includes behavior that threatens the web application system, and can thus better protect data assets.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) By correlating key logs and matching the user's actual interface access behavior chain against the behavior chain of the group, the method identifies abnormal behavior accurately and in a targeted manner, so that the system administrator can be notified in time.
(2) The data analyzed by the method are web access logs, which are highly concurrent and interleaved. The method abandons a direct timeline ordering, classifies the logs by identifier fields in the log data, and distinguishes abnormal relationships by comparing the individual with the group, which gives it wider applicability.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a system framework diagram of log collection in the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples, but the embodiments of the present invention are not limited thereto.
Example 1:
Referring to FIG. 1, a correlation analysis method based on WEB logs comprises the following steps:
1) log data collection
As shown in FIG. 2, log data in a unified format are collected using nginx + lua. OpenResty, a high-performance Web platform based on Nginx and Lua, may also be used directly. The data to be collected include the following fields:
Session identifier: sessionId
Interface path: urlPath
Interface method: method
User IP: clientip
Access time: timestamp
2) Log data pre-processing
The collected log information is converted into a standard json format, with the time format unified as yyyy-MM-dd HH:mm:ss
Such as:
The logs are divided into groups using the sessionId as the grouping condition, so that each group contains the data of one web access session, associated by that session.
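The time-format unification in the preprocessing step can be sketched in Python as follows. The raw record layout is an assumption: the keys `session_id`, `uri`, `remote_addr` and `time_local`, and the input time format `06/Nov/2019:10:15:30 +0800`, are hypothetical and depend on how the collection script is written. The sketch keeps the local wall-clock time and drops the UTC offset, which is also an assumption.

```python
import json
from datetime import datetime


def normalize_record(raw, time_format="%d/%b/%Y:%H:%M:%S %z"):
    """Map a raw record to the unified field names and time format."""
    parsed = datetime.strptime(raw["time_local"], time_format)
    return {
        "sessionId": raw["session_id"],
        "urlPath": raw["uri"],
        "method": raw["method"],
        "clientip": raw["remote_addr"],
        # Python's equivalent of yyyy-MM-dd HH:mm:ss.
        "timestamp": parsed.strftime("%Y-%m-%d %H:%M:%S"),
    }


# Hypothetical raw record (192.0.2.10 is a documentation-only address).
raw = {
    "session_id": "s1",
    "uri": "/login",
    "method": "POST",
    "remote_addr": "192.0.2.10",
    "time_local": "06/Nov/2019:10:15:30 +0800",
}

record = normalize_record(raw)
line = json.dumps(record)
print(line)
```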
2) Obtaining event behavior chains in logs
The log data in each group are sorted by time to obtain the event behavior chain.
3) Computing interface call succession probabilities
Count the succession relationships among the interfaces that users access, and for each interface take the top 3 next interfaces by access count.
3.1) Recording counts
User A accesses the interfaces in the following order: login, userInfo, updateUser, articleList
User B accesses the interfaces in the following order: login, userInfo, updateUser
User C accesses the interfaces in the following order: login, articleList
User D accesses the interfaces in the following order: login, friendList
……
The number of times each next interface follows the login interface is recorded in the following format:
3.2) Obtaining the top 3 interface data
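Steps 3.1 and 3.2 can be reproduced for the four access sequences listed for users A to D with a short Python sketch. Only those four users are counted; the elided further users are not guessed at, and the variable names are illustrative.

```python
from collections import Counter

# The four access sequences listed in the example.
sessions = {
    "A": ["login", "userInfo", "updateUser", "articleList"],
    "B": ["login", "userInfo", "updateUser"],
    "C": ["login", "articleList"],
    "D": ["login", "friendList"],
}

# 3.1) Count which interface follows the login interface.
next_counts = Counter()
for chain in sessions.values():
    for current, nxt in zip(chain, chain[1:]):
        if current == "login":
            next_counts[nxt] += 1

# 3.2) Keep the three most frequent next interfaces.
top3 = [name for name, _ in next_counts.most_common(3)]
print(dict(next_counts))
print(top3)
```

With these four sequences, userInfo follows login twice and articleList and friendList once each.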
4) Similarity matching of behavioral chains
4.1) Obtain the event behavior chain data of user A's single session and extract, in order, each interface and its next interface. The basic format is:
4.2) Matching whether the order of the user's interface accesses is abnormal
Traverse the InterFaceList and take the data of the first login interface.
Look up the top 3 data of the login interface in the common interface list:
List:[userInfo,articleList,friendList]
The next interface accessed by user A, userInfo, is in this List, so the recorded threat score is: 0
If the login interface access data of user E is:
then after matching, the recorded threat score is: 1
4.3) In the same way, the threat scores over the user's whole session are obtained and accumulated.
4.4) If an interface shows abnormal behavior and therefore has a threat score, that part of the user data is written into a threat data table for subsequent review.
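Steps 4.2 to 4.4 can be illustrated with the following Python sketch. The top 3 list for the login interface is taken from the example above. The second session, labeled X, and its interface `resetPassword` are hypothetical and are not the data of user E, which is not reproduced in the text. The threat data table is modeled as a plain list; the actual storage is not specified.

```python
# Top 3 next interfaces of the login interface, as given in the example.
group_top3 = {"login": ["userInfo", "articleList", "friendList"]}


def score_transition(current, nxt, profile):
    """Return 0 if the next interface is common for the group, otherwise 1."""
    return 0 if nxt in profile.get(current, []) else 1


# First transition of each session; session X is hypothetical.
first_transitions = {
    "A": ("login", "userInfo"),
    "X": ("login", "resetPassword"),
}

threat_table = []  # stand-in for the threat data table
scores = {}
for user, (current, nxt) in first_transitions.items():
    scores[user] = score_transition(current, nxt, group_top3)
    if scores[user] > 0:
        threat_table.append(
            {"user": user, "CurrentInterface": current,
             "NextInterface": nxt, "score": scores[user]}
        )

print(scores)
print(threat_table)
```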
Although the present invention has been described herein with reference to the illustrated embodiments thereof, which are intended to be preferred embodiments of the present invention, it is to be understood that the invention is not limited thereto, and that numerous other modifications and embodiments can be devised by those skilled in the art that will fall within the spirit and scope of the principles of this disclosure.
Claims (1)
1. A correlation analysis method based on WEB logs is characterized by comprising the following steps:
step S100: standardized processing of log data
Log data are collected using a WEB server and a filter script, the log data comprising: the path urlPath of the accessed interface, the access time timestamp, the request body and the response body; the log data are uniformly converted into json-formatted data; following the access flow, the log data are grouped by sessionId, the sessionId being the identifier of the session;
step S200: obtaining event behavior chains in logs
In each group of logs, extracting a path urlPath of an access interface, a method of the access interface and access time timestamp of each log, and sequencing according to the access time timestamp to serve as an event behavior chain;
step S300: computing interface call succession probabilities
The grouped and sorted log data are analyzed to count, for each current interface, the N interfaces most frequently accessed next, stored in the following format:
{
CurrentInterface: "Current interface information",
nextInterfaceList:[nextInterface1,nextInterface2,…nextInterfaceN]
}
The access characteristic attributes of the group are thereby obtained;
step S400: similarity matching of event behavior chains
The event behavior chain data of a single session of a user are processed by extracting, in order, each interface and the interface that follows it, and storing each pair in a relation object relationship with the following basic format:
{
CurrentInterface: "current interface information",
NextInterface: "next interface information"
};
Each relationship instance is matched in turn against the access characteristic attributes of the group:
if NextInterface is in nextInterfaceList, the threat score is 0;
if NextInterface is not in nextInterfaceList, the threat score is 1;
all threat scores are accumulated to give the total abnormal behavior score.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911076385.9A CN110708339B (en) | 2019-11-06 | 2019-11-06 | Correlation analysis method based on WEB log |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110708339A (en) | 2020-01-17 |
CN110708339B (en) | 2021-06-22 |
Family
ID=69205376
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911076385.9A Active CN110708339B (en) | 2019-11-06 | 2019-11-06 | Correlation analysis method based on WEB log |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110708339B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111552536A (en) * | 2020-04-29 | 2020-08-18 | 广东天亿马信息产业股份有限公司 | Management system and management method for electronic government affair self-service terminal |
CN111708681B (en) * | 2020-06-15 | 2021-05-07 | 北京优特捷信息技术有限公司 | Log processing method, device, equipment and storage medium |
CN111752727B (en) * | 2020-06-30 | 2023-06-20 | 上海观安信息技术股份有限公司 | Log analysis-based three-layer association recognition method for database |
CN113342744B (en) * | 2021-06-02 | 2022-02-15 | 北京优特捷信息技术有限公司 | Parallel construction method, device and equipment of call chain and storage medium |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103297435B (en) * | 2013-06-06 | 2016-12-28 | 中国科学院信息工程研究所 | A kind of abnormal access behavioral value method and system based on WEB daily record |
CN104217030B (en) * | 2014-09-28 | 2018-12-11 | 北京奇虎科技有限公司 | A kind of method and apparatus that user's classification is carried out according to server search daily record data |
US20170046510A1 (en) * | 2015-08-14 | 2017-02-16 | Qualcomm Incorporated | Methods and Systems of Building Classifier Models in Computing Devices |
CN105553740B (en) * | 2015-12-25 | 2018-07-31 | 北京奇虎科技有限公司 | Data-interface monitoring method and device |
CN106209781B (en) * | 2016-06-27 | 2019-09-06 | 航天云网科技发展有限责任公司 | One kind accessing recognition methods based on statistical exceptional interface |
CN108665297B (en) * | 2017-03-31 | 2021-01-26 | 北京京东尚科信息技术有限公司 | Method and device for detecting abnormal access behavior, electronic equipment and storage medium |
CN107438079B (en) * | 2017-08-18 | 2020-05-01 | 杭州安恒信息技术股份有限公司 | Method for detecting unknown abnormal behaviors of website |
CN109428857B (en) * | 2017-08-23 | 2021-01-05 | 腾讯科技(深圳)有限公司 | Detection method and device for malicious detection behaviors |
CN110224870B (en) * | 2019-06-19 | 2023-03-24 | 腾讯云计算(北京)有限责任公司 | Interface monitoring method and device, computing equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110708339A (en) | 2020-01-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||