CN110708339B - Correlation analysis method based on WEB log - Google Patents

Info

Publication number
CN110708339B
CN110708339B CN201911076385.9A CN201911076385A CN110708339B CN 110708339 B CN110708339 B CN 110708339B CN 201911076385 A CN201911076385 A CN 201911076385A CN 110708339 B CN110708339 B CN 110708339B
Authority
CN
China
Prior art keywords
interface
access
logs
group
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911076385.9A
Other languages
Chinese (zh)
Other versions
CN110708339A (en)
Inventor
代波
李成东
常清雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Changhong Electric Co Ltd
Original Assignee
Sichuan Changhong Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Changhong Electric Co Ltd
Priority to CN201911076385.9A
Publication of CN110708339A
Application granted
Publication of CN110708339B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/10 Network architectures or network communication protocols for network security for controlling access to devices or network resources
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1466 Active attacks involving interception, injection, modification, spoofing of data unit addresses, e.g. hijacking, packet injection or TCP sequence number attacks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Abstract

The invention discloses a correlation analysis method based on WEB logs, comprising the following steps: standardizing the log data; obtaining the event behavior chains in the logs; computing statistics on the succession probability of interface calls (which interface is called next after a given interface) to obtain the access characteristic attributes of the group; and matching the similarity between a user's event behavior chain and that of the group to obtain a total abnormal behavior score. By correlating key logs and matching the user's actual interface access behavior chain against the behavior chain of the group, the method can identify abnormal behavior accurately and in a targeted way, and the system administrator can be notified in time. The analyzed data are web access logs, which are highly concurrent and interleaved; the method therefore abandons the direct timeline ordering of the logs, classifies them by identifying fields in the log data, and distinguishes abnormal relations by comparing the individual with the group, which gives it wider applicability.

Description

Correlation analysis method based on WEB log
Technical Field
The invention relates to the technical field of log security analysis, in particular to a correlation analysis method based on WEB logs.
Background
With the development of Web technology and the arrival of Web 2.0, the convenience of deploying and maintaining WEB applications has become increasingly evident. Internet applications based on the Web environment are ever more widespread, and enterprises host all kinds of information applications on Web platforms. The rapid growth of Web services has also drawn strong attention from hackers, and Web security threats have followed. Hackers gain control of a Web server by exploiting system vulnerabilities, SQL injection vulnerabilities and other weaknesses in Web service programs; at the mild end they tamper with page content, more seriously they steal important internal data, and worse still they implant malicious code in web pages so that other visitors to the website are harmed. The Web access log records raw information such as the requests received and processed by the Web server and runtime errors. Security analysis of WEB logs helps not only to locate attackers but also to reconstruct attack paths, find security vulnerabilities in a website and repair them. Existing log analysis systems extract Web access log information to show clearly from which IP, at what time, and with which operating system and browser a user accessed which page of a website, and whether the access succeeded. Their drawback is that each log entry is analyzed in isolation for abnormal access or attack behavior; the correlation among logs is not analyzed, so an attack on the system carried out through a combination of several requests cannot be identified.
Disclosure of Invention
The invention aims to provide a correlation analysis method based on WEB logs, to solve the problem that the prior art analyzes each log entry in isolation for anomalies and does not analyze the correlation between logs, so that an attack on the system carried out through several combined requests cannot be identified.
The invention solves this problem through the following technical scheme:
A correlation analysis method based on WEB logs comprises the following steps:
Step S100: standardized processing of log data
Each session between the browser and the server has a session ID, a unique identifier that distinguishes the session and the user agent. Log data are collected using the WEB server and a filter script, which can intercept all fields transmitted during an interface access, including the inherent data fields that need to be collected: the accessed URL, the access time, the request body and the response body. This content is converted into JSON-formatted data. Following the access flow, one session is taken as the basic unit: the logs are grouped by session, so that the log data within a single session form one group;
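The grouping in step S100 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the sample records and their values are invented, and the key names (`sessionid`, `urlPath`, `method`, `timestamp`) are assumed from the field names that appear elsewhere in this document.

```python
import json
from collections import defaultdict

# Hypothetical records, shaped as a filter script might emit them.
raw_records = [
    {"sessionid": "s1", "urlPath": "/login", "method": "POST",
     "timestamp": "2019-11-06 10:00:00"},
    {"sessionid": "s2", "urlPath": "/login", "method": "POST",
     "timestamp": "2019-11-06 10:00:01"},
    {"sessionid": "s1", "urlPath": "/userInfo", "method": "GET",
     "timestamp": "2019-11-06 10:00:02"},
]


def group_by_session(records):
    """Put every log record into the group of its session identifier."""
    groups = defaultdict(list)
    for record in records:
        groups[record["sessionid"]].append(record)
    return dict(groups)


groups = group_by_session(raw_records)
print(json.dumps(groups, indent=2))
```

Grouping by session first means that interleaved requests from concurrent users never end up in the same behavior chain.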
Step S200: obtaining event behavior chains in logs
An event behavior chain is the ordered list of all interfaces that the current user accesses within one session. The logs to be analyzed are grouped by session; from each log in each group, the interface path urlPath, the interface method and the access time timestamp are extracted, and the logs are sorted by timestamp to form a complete event behavior chain;
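A minimal sketch of building one event behavior chain from the logs of a single session is shown below. It assumes the timestamps are already strings in the form `yyyy-MM-dd HH:mm:ss` and that each log is a dict with `urlPath`, `method` and `timestamp` keys; the sample data and the function name are illustrative only.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # Python form of yyyy-MM-dd HH:mm:ss


def build_behavior_chain(session_logs):
    """Sort one session's logs by access time and keep (method, urlPath)."""
    ordered = sorted(
        session_logs,
        key=lambda log: datetime.strptime(log["timestamp"], TIME_FORMAT),
    )
    return [(log["method"], log["urlPath"]) for log in ordered]


# Hypothetical logs of one session, deliberately out of order.
session_logs = [
    {"urlPath": "/userInfo", "method": "GET",
     "timestamp": "2019-11-06 10:00:02"},
    {"urlPath": "/login", "method": "POST",
     "timestamp": "2019-11-06 10:00:00"},
]
print(build_behavior_chain(session_logs))
```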
Step S300: statistics on the succession probability of interface calls
The grouped and sorted log data are analyzed to count, for each interface, which interfaces are most likely to be accessed next; the top N are kept. Specifically:
for each interface, obtain the next interface called after it; if that next interface appears for the first time, add it to the list with an occurrence count of 1; if it has appeared before, increase its occurrence count by 1.
The N next interfaces with the highest counts are then taken. This yields the access characteristic attributes of the group, stored in the following format:
{
CurrentInterface: "current interface information",
nextInterfaceList:[nextInterface1,nextInterface2,…nextInterfaceN]
}
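The counting in step S300 can be sketched with `collections.Counter`. This is an assumed implementation: the function name `build_group_profile`, the default `top_n=3` and the tie-breaking rule (first encountered wins, which is how `Counter.most_common` orders equal counts) are illustrative choices, not details stated in the source.

```python
from collections import Counter, defaultdict


def build_group_profile(chains, top_n=3):
    """For every interface, keep the top_n most frequent next interfaces.

    `chains` is an iterable of event behavior chains, each a list of
    interface names already sorted by access time.
    """
    successors = defaultdict(Counter)
    for chain in chains:
        # Pair each interface with the one accessed right after it.
        for current, nxt in zip(chain, chain[1:]):
            successors[current][nxt] += 1
    return [
        {
            "CurrentInterface": current,
            "nextInterfaceList": [n for n, _ in counts.most_common(top_n)],
        }
        for current, counts in successors.items()
    ]


# Hypothetical chains from three sessions.
chains = [["a", "b", "c"], ["a", "b"], ["a", "c"]]
for entry in build_group_profile(chains, top_n=2):
    print(entry)
```

The last interface of a chain has no successor in that session, so it contributes no count of its own.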
Step S400: similarity matching of event behavior chains
Take the event behavior chain data of a single session of a user, extract each interface together with its next interface in order, and store each pair in a relation object relationship, whose basic format is:
{
CurrentInterface: "current interface information",
NextInterface: "next interface information"
};
Match each relationship instance in turn against the access characteristic attributes of the group:
if NextInterface is in nextInterfaceList, the threat score is 0;
if NextInterface is not in nextInterfaceList, the threat score is 1;
all threat scores obtained by matching are accumulated to obtain the total abnormal behavior score.
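The scoring rule of step S400 can be sketched as below. Two points are assumptions rather than statements from the source: the group profile is held as a plain dict from interface name to its list of common next interfaces, and an interface that does not appear in the profile at all is scored as a mismatch (1). The sample profile entry for `userInfo` is invented for the illustration.

```python
def score_chain(chain, profile):
    """Return the total abnormal behavior score of one session's chain.

    `profile` maps an interface name to the list of next interfaces that
    the group most often accesses after it.
    """
    total = 0
    for current, nxt in zip(chain, chain[1:]):
        relationship = {"CurrentInterface": current, "NextInterface": nxt}
        # Assumption: an interface unknown to the profile has no common
        # successors, so any step after it scores 1.
        common_next = profile.get(relationship["CurrentInterface"], [])
        total += 0 if relationship["NextInterface"] in common_next else 1
    return total


# Hypothetical group profile.
profile = {
    "login": ["userInfo", "articleList", "friendList"],
    "userInfo": ["updateUser"],
}
print(score_chain(["login", "userInfo", "updateUser"], profile))
print(score_chain(["login", "deleteUser"], profile))
```

Because each mismatch adds exactly 1, the total is the number of steps in the session that deviate from the group's common successors.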
Through correlation analysis among web access logs, the invention can judge more accurately whether a user's access behavior within one session includes behavior that threatens the web application system, and can thus better protect data assets.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) By correlating key logs and matching the user's actual interface access behavior chain against the behavior chain of the group, the method can identify abnormal behavior accurately and in a targeted way, and the system administrator can be notified in time.
(2) The data analyzed are web access logs, which are highly concurrent and interleaved; the method abandons the direct timeline ordering of the logs, classifies them by identifying fields in the log data, and distinguishes abnormal relations by comparing the individual with the group, which gives it wider applicability.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a system framework diagram of log collection in the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples, but the embodiments of the present invention are not limited thereto.
Example 1:
Referring to FIG. 1, a correlation analysis method based on WEB logs includes the following steps:
1) Log data collection
As shown in FIG. 2, log data in a unified format are collected using nginx + lua. OpenResty, a high-performance Web platform based on Nginx and Lua, may also be used directly. The data to be collected include the following fields:
Session identifier: sessionid
Interface path: urlPath
Interface method: method
User IP: clientip
Access time: timestamp
2) Log data pre-processing
The collected log information is converted into a standard JSON format, in which the time format is unified as yyyy-MM-dd HH:mm:ss
For example:
[Figure not reproduced: sample log records in JSON format]
The logs are then divided into groups with the sessionid as the grouping condition, so that each group holds the data of one web access session, associated through the session.
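The time unification in the pre-processing step can be sketched as follows. The accepted input forms (epoch seconds and ISO 8601 strings), the use of UTC for epoch values and the helper names are assumptions made for the illustration; the source only states the target format. The sample record and its IP address are invented.

```python
import json
from datetime import datetime, timezone

TARGET_FORMAT = "%Y-%m-%d %H:%M:%S"  # Python form of yyyy-MM-dd HH:mm:ss


def normalize_timestamp(value):
    """Convert epoch seconds or an ISO 8601 string to the unified format."""
    if isinstance(value, (int, float)):
        # Assumption: epoch values are interpreted as UTC.
        moment = datetime.fromtimestamp(value, tz=timezone.utc)
    else:
        moment = datetime.fromisoformat(value)
    return moment.strftime(TARGET_FORMAT)


def to_standard_json(record):
    """Return the record as JSON text with a unified timestamp."""
    standardized = dict(record)
    standardized["timestamp"] = normalize_timestamp(record["timestamp"])
    return json.dumps(standardized, sort_keys=True)


# Hypothetical raw record.
print(to_standard_json({
    "sessionid": "s1", "urlPath": "/login", "method": "POST",
    "clientip": "192.0.2.10", "timestamp": "2019-11-06T10:00:00",
}))
```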
2) Obtaining event behavior chains in logs
Sort the log data of each group by time to obtain the event behavior chain.
[Figure not reproduced: time-ordered log records forming an event behavior chain]
3) Statistics on the succession probability of interface calls
Count the succession relations among the interfaces that users access, and for each interface the top 3 next interfaces by access count.
3.1) Record the counts
User A accesses the interfaces in the order: login, userInfo, updateUser, articleList
User B accesses the interfaces in the order: login, userInfo, updateUser
User C accesses the interfaces in the order: login, articleList
User D accesses the interfaces in the order: login, friendList
……
Record how many times each interface follows the login interface; the format is:
[Figure not reproduced: next-interface counts for the login interface]
3.2) Obtain the top-3 interface data
[Figure not reproduced: top-3 next interfaces]
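The counts for this example can be reproduced with a short script. It uses only the four access sequences listed for users A to D (the source indicates that more users exist), and it uses `login` as the name of the first interface, as in step 4.2.

```python
from collections import Counter

# Access sequences of users A to D as listed in step 3.1.
sessions = {
    "A": ["login", "userInfo", "updateUser", "articleList"],
    "B": ["login", "userInfo", "updateUser"],
    "C": ["login", "articleList"],
    "D": ["login", "friendList"],
}

# Count which interface follows login in each session.
next_after_login = Counter()
for chain in sessions.values():
    for current, nxt in zip(chain, chain[1:]):
        if current == "login":
            next_after_login[nxt] += 1

top3 = [name for name, _ in next_after_login.most_common(3)]
print(dict(next_after_login))  # {'userInfo': 2, 'articleList': 1, 'friendList': 1}
print(top3)                    # ['userInfo', 'articleList', 'friendList']
```

The resulting top 3 matches the List used for the login interface in step 4.2.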
4) Similarity matching of behavioral chains
4.1) Obtain the event behavior chain data of user A's single session and extract each interface together with its next interface, in order. The basic format is:
[Figure not reproduced: interface and next-interface pairs for user A]
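As an illustration of step 4.1, the pairs for user A can be generated from the access sequence given in step 3.1. The helper name `extract_relationships` is hypothetical; the keys follow the relationship format of step S400.

```python
def extract_relationships(chain):
    """Pair every interface in the chain with the one accessed after it."""
    return [
        {"CurrentInterface": current, "NextInterface": nxt}
        for current, nxt in zip(chain, chain[1:])
    ]


user_a_chain = ["login", "userInfo", "updateUser", "articleList"]
for relationship in extract_relationships(user_a_chain):
    print(relationship)
```

A chain of four interfaces yields three relationship objects, one per transition.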
4.2) Check whether the order of the user's interface accesses is abnormal
Traverse InterFaceList and take the data of the first interface, login:
[Figure not reproduced: data of the first login interface]
Look up the top-3 data of the login interface in the common interface list:
List:[userInfo,articleList,friendList]
User A's next accessed interface, userInfo, is in this List, so the recorded threat score is: 0
If the login interface access data of user E is:
[Figure not reproduced: login interface access data of user E]
then after matching, the recorded threat score is: 1
4.3) In the same way, obtain the threat score of every step in the user's whole session and accumulate the scores.
4.4) If any interface access shows abnormal behavior, i.e. the threat score is non-zero, write that part of the user's data into a threat data table for later review.
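Step 4.4 can be sketched with Python's built-in `sqlite3` module. The source does not say which storage is used, so the in-memory SQLite database, the table name `threat_data`, its columns and the session identifiers below are all assumptions made for the illustration.

```python
import sqlite3


def record_threats(connection, session_scores):
    """Store every session whose accumulated threat score is above zero."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS threat_data ("
        "session_id TEXT PRIMARY KEY, threat_score INTEGER NOT NULL)"
    )
    rows = [(sid, score) for sid, score in session_scores.items() if score > 0]
    connection.executemany(
        "INSERT OR REPLACE INTO threat_data (session_id, threat_score) "
        "VALUES (?, ?)",
        rows,
    )
    connection.commit()
    return len(rows)


connection = sqlite3.connect(":memory:")
written = record_threats(connection, {"session-A": 0, "session-E": 1})
print(written)
print(connection.execute(
    "SELECT session_id, threat_score FROM threat_data").fetchall())
```

Sessions with a score of 0 are left out, so the table only contains data that needs review.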
Although the present invention has been described herein with reference to the illustrated embodiments thereof, which are intended to be preferred embodiments of the present invention, it is to be understood that the invention is not limited thereto, and that numerous other modifications and embodiments can be devised by those skilled in the art that will fall within the spirit and scope of the principles of this disclosure.

Claims (1)

1. A correlation analysis method based on WEB logs, characterized by comprising the following steps:
Step S100: standardized processing of log data
Log data are collected using a WEB server and a filter script, the log data comprising: the path urlPath of the accessed interface, the access time timestamp, the request body and the response body; the log data are uniformly converted into JSON-format data; following the access flow, the log data are grouped by sessionid, the sessionid being the identifier of the session;
Step S200: obtaining event behavior chains in logs
In each group of logs, the path urlPath of the accessed interface, the method of the accessed interface and the access time timestamp of each log are extracted, and the logs are sorted by the access time timestamp to form an event behavior chain;
Step S300: statistics on the succession probability of interface calls
The grouped and sorted log data are analyzed to count, for the current interface, the N next interfaces accessed most often after it, stored in the following format:
{
CurrentInterface: "current interface information",
nextInterfaceList:[nextInterface1,nextInterface2,…nextInterfaceN]
}
thereby obtaining the access characteristic attributes of the group;
Step S400: similarity matching of event behavior chains
The event behavior chain data of a single session of a user are taken, each interface and its next interface are extracted in order, and each pair is stored in a relation object relationship, whose basic format is:
{
CurrentInterface: "current interface information",
NextInterface: "next interface information"
};
each relationship instance is matched in turn against the access characteristic attributes of the group:
if NextInterface is in nextInterfaceList, the threat score is 0;
if NextInterface is not in nextInterfaceList, the threat score is 1;
all threat scores are accumulated to obtain a total abnormal behavior score.
CN201911076385.9A 2019-11-06 2019-11-06 Correlation analysis method based on WEB log Active CN110708339B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911076385.9A CN110708339B (en) 2019-11-06 2019-11-06 Correlation analysis method based on WEB log

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911076385.9A CN110708339B (en) 2019-11-06 2019-11-06 Correlation analysis method based on WEB log

Publications (2)

Publication Number Publication Date
CN110708339A CN110708339A (en) 2020-01-17
CN110708339B CN110708339B (en) 2021-06-22

Family

ID=69205376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911076385.9A Active CN110708339B (en) 2019-11-06 2019-11-06 Correlation analysis method based on WEB log

Country Status (1)

Country Link
CN (1) CN110708339B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111552536A (en) * 2020-04-29 2020-08-18 广东天亿马信息产业股份有限公司 Management system and management method for electronic government affair self-service terminal
CN111708681B (en) * 2020-06-15 2021-05-07 北京优特捷信息技术有限公司 Log processing method, device, equipment and storage medium
CN111752727B (en) * 2020-06-30 2023-06-20 上海观安信息技术股份有限公司 Log analysis-based three-layer association recognition method for database
CN113342744B (en) * 2021-06-02 2022-02-15 北京优特捷信息技术有限公司 Parallel construction method, device and equipment of call chain and storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103297435B (en) * 2013-06-06 2016-12-28 中国科学院信息工程研究所 A kind of abnormal access behavioral value method and system based on WEB daily record
CN104217030B (en) * 2014-09-28 2018-12-11 北京奇虎科技有限公司 A kind of method and apparatus that user's classification is carried out according to server search daily record data
US20170046510A1 (en) * 2015-08-14 2017-02-16 Qualcomm Incorporated Methods and Systems of Building Classifier Models in Computing Devices
CN105553740B (en) * 2015-12-25 2018-07-31 北京奇虎科技有限公司 Data-interface monitoring method and device
CN106209781B (en) * 2016-06-27 2019-09-06 航天云网科技发展有限责任公司 One kind accessing recognition methods based on statistical exceptional interface
CN108665297B (en) * 2017-03-31 2021-01-26 北京京东尚科信息技术有限公司 Method and device for detecting abnormal access behavior, electronic equipment and storage medium
CN107438079B (en) * 2017-08-18 2020-05-01 杭州安恒信息技术股份有限公司 Method for detecting unknown abnormal behaviors of website
CN109428857B (en) * 2017-08-23 2021-01-05 腾讯科技(深圳)有限公司 Detection method and device for malicious detection behaviors
CN110224870B (en) * 2019-06-19 2023-03-24 腾讯云计算(北京)有限责任公司 Interface monitoring method and device, computing equipment and storage medium

Also Published As

Publication number Publication date
CN110708339A (en) 2020-01-17

Similar Documents

Publication Publication Date Title
CN110708339B (en) Correlation analysis method based on WEB log
CN103297435B (en) A kind of abnormal access behavioral value method and system based on WEB daily record
CN105930727B (en) Reptile recognition methods based on Web
CN109816397B (en) Fraud discrimination method, device and storage medium
US8244752B2 (en) Classifying search query traffic
CN108156131B (en) Webshell detection method, electronic device and computer storage medium
US8126874B2 (en) Systems and methods for generating statistics from search engine query logs
CN101971591B (en) System and method of analyzing web addresses
CN108154029A (en) Intrusion detection method, electronic equipment and computer storage media
CN102436564A (en) Method and device for identifying falsified webpage
US9871826B1 (en) Sensor based rules for responding to malicious activity
CN107547490B (en) Scanner identification method, device and system
US7630987B1 (en) System and method for detecting phishers by analyzing website referrals
CN114244564B (en) Attack defense method, device, equipment and readable storage medium
CN114915479B (en) Web attack stage analysis method and system based on Web log
CN113949577A (en) Data attack analysis method applied to cloud service and server
CN108337269A (en) A kind of WebShell detection methods
CN107592305A (en) A kind of anti-brush method and system based on elk and redis
CN111859234A (en) Illegal content identification method and device, electronic equipment and storage medium
CN110572402B (en) Internet hosting website detection method and system based on network access behavior analysis and readable storage medium
CN108270754B (en) Detection method and device for phishing website
CN110619075A (en) Webpage identification method and equipment
Lagopoulos et al. Web robot detection in academic publishing
US8909795B2 (en) Method for determining validity of command and system thereof
CN116150541B (en) Background system identification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant