CN110602038B - Abnormal UA detection and analysis method and system based on rules - Google Patents

Publication number
CN110602038B
Authority
CN
China
Legal status
Active
Application number
CN201910706278.3A
Other languages
Chinese (zh)
Other versions
CN110602038A (en)
Inventor
苟高鹏
熊刚
陈洁
李镇
徐安林
Current Assignee
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date
2019-08-01
Filing date
2019-08-01
Publication date
2020-12-04
2019-08-01 Application filed by Institute of Information Engineering of CAS
2019-08-01 Priority to CN201910706278.3A
2019-12-20 Publication of CN110602038A
Application granted
2020-12-04 Publication of CN110602038B
Status: Active

Classifications

    • G06F16/90344 Query processing by using string matching techniques
    • G06F16/906 Clustering; Classification
    • H04L63/0236 Filtering by address, protocol, port number or service, e.g. IP-address or URL
    • H04L63/0263 Rule management
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Abstract

The invention provides a rule-based method and system for detecting and analyzing abnormal User-Agent (UA) strings. Network traffic is captured with a Spark-based network traffic capture platform. HTTP traffic is then filtered from all captured traffic according to the HTTP format, and the UA fields of the HTTP traffic are extracted. This allows abnormal UAs in network traffic to be detected and analyzed effectively, which supports network management and malware detection.

Description

Abnormal UA detection and analysis method and system based on rules
Technical Field
The invention belongs to the field of network information technology. It relates specifically to a rule-based method and system for detecting and analyzing abnormal UAs (user agents).
Background
Key fields play a crucial role in network traffic analysis. Key fields in the Domain Name System (DNS), for example, can be used to study residual trust in domains, to observe how DNS resolution evolves, and to detect malware behavior in the network. Similarly, key fields in the HyperText Transfer Protocol (HTTP) and the Transport Layer Security/Secure Sockets Layer (TLS/SSL) protocols also play a crucial role in network behavior analysis and malicious behavior detection. Examples include the UA, the cookie and the Server Name Indication (SNI).
HTTP accounts for nearly half of all protocol traffic generated every day, so it is used frequently and by a very large number of users. The User-Agent field in HTTP carries information about the client. This includes its operating system and version, CPU type, browser and version, browser rendering engine, browser language and browser plug-ins. Studying the User-Agent field is therefore one way to study the broader phenomenon of abnormal characters in key fields of network traffic. It also allows the causes of those abnormal characters to be analyzed from the client's perspective, because a client that produces abnormal characters may be behaving maliciously. To study abnormal characters in the key fields of various protocols in network traffic, the invention uses the User-Agent field of HTTP as the data to be detected and analyzed. Because a UA contains client information, it can also be used to identify malware. In addition, the client's preferences can be revealed by accounting for its operating system, browser and device.
Deep protocol analysis in a high-speed network environment is the primary prerequisite for mapping networks and labeling traffic attributes, and extracting the contents of key fields is central to that analysis. Network protocols are complex, however. As a result, existing analysis tools often produce key fields that contain abnormal characters when they parse protocols in high-speed environments. These abnormal characters introduce polluted, erroneous information that hinders effective mapping and labeling of network traffic.
Previous research on UAs has generally ignored abnormal characters in key fields rather than processing them directly. These UAs nevertheless reflect, to some extent, the behavior of the clients that send them and are closely tied to those clients. They should therefore not be ignored, and they also represent the ecosystem of UAs in network traffic.
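The following Python sketch shows where these client attributes sit in a UA string. It is not part of the patented method. It separates a UA into `product/version` tokens and parenthesised comments, which is the general layout of the User-Agent header. The operating-system and CPU hints usually appear in the first comment. The helper name `split_user_agent` and the sample UA are illustrative assumptions.

```python
import re

# Product tokens look like "Name/Version"; comments are enclosed in parentheses.
# These patterns are a simplification and do not handle nested parentheses.
COMMENT_RE = re.compile(r"\(([^)]*)\)")
PRODUCT_RE = re.compile(r"([^\s/()]+)(?:/([^\s()]+))?")


def split_user_agent(ua: str):
    """Return (products, comments) for a UA string (hypothetical helper)."""
    comments = COMMENT_RE.findall(ua)
    # Blank out the comments so that only product tokens remain.
    remainder = COMMENT_RE.sub(" ", ua)
    products = PRODUCT_RE.findall(remainder)
    return products, comments


# Illustrative UA, not taken from the patent's data set.
ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
products, comments = split_user_agent(ua)
# The first comment carries the platform: OS name/version and CPU hints.
platform = [part.strip() for part in comments[0].split(";")]
```

A production parser would also need to handle nested comments and malformed strings.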
Disclosure of Invention
The invention aims to provide a rule-based method and system for detecting and analyzing abnormal UAs. By extracting the UA field from HTTP traffic in a network, it allows abnormal UAs in network traffic to be detected and analyzed effectively, which supports network management and malware detection.
To achieve this purpose, the invention adopts the following technical scheme:
A rule-based abnormal UA detection and analysis method, comprising the following steps:
capturing network traffic with a Spark-based network traffic capture platform;
performing protocol analysis on the captured traffic, filtering HTTP traffic from all network traffic according to the HTTP format, and extracting the UA field and the client's IP information and storing them as a log;
performing anomaly detection on the extracted UAs with a regular expression to determine whether each UA contains abnormal characters, a UA that contains abnormal characters being judged abnormal;
for each detected abnormal UA, calculating its similarity to the normal UAs in the log, and storing the normal UAs whose similarity to the abnormal UA is greater than 0;
analyzing the top several clients with the largest numbers of abnormal UAs to determine the causes of the abnormal characters; and
classifying the stored normal UAs into custom categories, re-judging as abnormal any normal UA that does not fit the custom classification, analyzing the preferences of the clients containing abnormal UAs with respect to device type and browser type, and detecting malicious clients.
Further, the UA field and the client's IP information form log entries in the format <client ID, UA>.
Further, the similarity between the abnormal UAs and the normal UAs in the log is calculated using the Levenshtein distance.
Further, the top several clients are those whose abnormal UAs together account for 80% of the total number of abnormal UAs.
Further, the abnormal UAs are stored separately and their number is counted.
Further, the causes of the abnormal characters include the following: malware generates abnormal UAs itself, and abnormal UAs are also produced by mismatched encoding and decoding of the UA.
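The Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The patent does not say how the distance is converted into a similarity score. The Python sketch below assumes the common normalisation similarity = 1 - d / max(len(a), len(b)). Under that assumption, a similarity greater than 0 simply means the distance is shorter than the longer string, so the filter is deliberately loose. The function names are hypothetical.

```python
from typing import Iterable, Set


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def related_normal_uas(abnormal_uas: Iterable[str], normal_uas: Iterable[str]) -> Set[str]:
    """Keep normal UAs whose similarity to at least one abnormal UA is > 0."""
    abnormal = list(abnormal_uas)
    return {n for n in normal_uas if any(similarity(a, n) > 0 for a in abnormal)}
```

Comparing every abnormal UA with every normal UA is quadratic. At the scale described in the patent, the comparison would have to be distributed, for example on the same Spark cluster.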
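The following Python sketch illustrates the client selection rule described above. It assumes that the abnormal log entries are `(client_id, ua)` pairs. The 80% share comes from the text, and the cap of 20 clients comes from the worked example. The function name `top_clients` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_clients(abnormal_log: Iterable[Tuple[str, str]],
                share: float = 0.8, limit: int = 20):
    """Take clients in descending order of abnormal-UA count until they cover
    `share` of all abnormal UAs or `limit` clients have been taken."""
    per_client = Counter(client_id for client_id, _ua in abnormal_log)
    total = sum(per_client.values())
    selected: List[str] = []
    covered = 0
    for client_id, count in per_client.most_common(limit):
        selected.append(client_id)
        covered += count
        if covered / total >= share:
            break
    # Return the chosen clients and the fraction of abnormal UAs they cover.
    return selected, (covered / total if total else 0.0)
```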
A rule-based abnormal UA detection and analysis system, comprising:
a Spark-based network traffic capture platform for capturing network traffic;
a filter for performing protocol analysis on the captured traffic, extracting the UA field and the client's IP information and storing them as a log, performing anomaly detection on the extracted UAs with a regular expression, and detecting abnormal UAs that contain abnormal characters together with the normal UAs whose similarity to them is greater than 0; and
an analyzer for analyzing the top several clients with the largest numbers of abnormal UAs to determine the causes of the abnormal characters, classifying the stored normal UAs into custom categories, re-judging as abnormal any normal UA that does not fit the custom classification, analyzing the preferences of the clients containing abnormal UAs with respect to device type and browser type, and detecting malicious clients.
Further, the filter comprises an HTTP extractor, a UA extractor and an IP extractor. The HTTP extractor filters HTTP traffic from all network traffic according to the HTTP format, the UA extractor extracts the UA field from the HTTP traffic, and the IP extractor extracts the client's IP information from the HTTP traffic.
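The three filter components can be sketched in Python as shown below. The sketch makes four assumptions: each packet is an already reassembled TCP payload paired with its source IP; "HTTP format" means an HTTP/1.x request line; the client ID is the source IP; and the UA bytes are decoded as Latin-1 so that no byte is lost before anomaly detection. `Packet`, `build_log` and the extractor functions are hypothetical names that mirror the components described above. They are not an interface defined by the patent.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Packet(NamedTuple):
    """Hypothetical reassembled TCP payload plus its source address."""
    src_ip: str
    payload: bytes


# Assumed "HTTP format" rule: the payload starts with an HTTP/1.x request line.
REQUEST_LINE_RE = re.compile(
    rb"(?:GET|POST|HEAD|PUT|DELETE|OPTIONS|PATCH|CONNECT|TRACE) \S+ HTTP/1\.[01]\r\n"
)


def http_extractor(packet: Packet) -> bool:
    """Keep only payloads that look like HTTP requests."""
    return REQUEST_LINE_RE.match(packet.payload) is not None


def ua_extractor(packet: Packet) -> Optional[str]:
    """Return the User-Agent header value, or None if the request has none."""
    header_block = packet.payload.split(b"\r\n\r\n", 1)[0]
    for line in header_block.split(b"\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"user-agent":
            # Latin-1 maps every byte to one character, so nothing is dropped.
            return value.strip().decode("latin-1")
    return None


def ip_extractor(packet: Packet) -> str:
    """Use the source address as the client ID (an assumption)."""
    return packet.src_ip


def build_log(packets: Iterable[Packet]) -> List[Tuple[str, str]]:
    """Produce <client ID, UA> log entries from HTTP requests carrying a UA."""
    log = []
    for packet in packets:
        if not http_extractor(packet):
            continue
        ua = ua_extractor(packet)
        if ua is not None:
            log.append((ip_extractor(packet), ua))
    return log
```

HTTP/2 and TLS-encrypted traffic fall outside this sketch, because neither starts with a plaintext request line.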
The method focuses on UAs that contain abnormal characters, which are usually ignored. Using a rule-based method (namely, regular expressions), it filters these UAs from all UAs in network traffic, counts them and analyzes malicious clients from them. The method performs passive measurement of high-speed network traffic. Traffic is captured with a Spark-based high-speed network traffic capture platform, HTTP is identified and parsed in depth, and the UA field is extracted from it. To study how to detect these usually ignored abnormal UAs and why they occur, regular expressions are applied to the UA field, and they successfully distinguish UAs containing abnormal characters from normal UAs. The similarity between each abnormal UA and the normal UAs is calculated with the Levenshtein distance. Normal UAs whose similarity to an abnormal UA is greater than 0 are saved for analysis. Finally, the causes of abnormal characters in network traffic are revealed from two perspectives: encoding and malicious users.
The method of the invention has the following advantages:
(1) It focuses on details: UAs containing abnormal characters in network traffic are measured and analyzed.
(2) UAs containing abnormal characters are detected among all UA fields using rule-based regular expressions, which can take less time than a statistics-based approach. Because normal UAs have a fixed format and character set, misjudgment does not occur when the rules are correct.
(3) The top several clients (for example, the top 20) with the largest numbers of UAs containing abnormal characters are analyzed. This prevents UAs whose abnormal characters arise from accidental factors from distorting the results, and it allows the causes of abnormal UAs to be analyzed from the client's perspective.
(4) The method analyzes the abnormal UAs and also the normal UAs whose Levenshtein-based similarity to them is greater than 0. A UA that is normal in form may be abnormal in meaning. The method therefore detects both formally abnormal and semantically abnormal UAs, and it reveals the "ecosystem" of abnormal UAs in network traffic.
Drawings
Fig. 1 is a flow diagram of the rule-based abnormal UA detection and analysis method.
Fig. 2 is a framework diagram of the rule-based abnormal UA detection and analysis system.
Detailed Description
To make the above features and advantages of the invention easier to understand, the rule-based abnormal UA detection and analysis method disclosed by the invention is described in detail below with reference to the accompanying drawings. As shown in the flowchart of Fig. 1, the method comprises the following steps:
I. Detection stage:
(1) Capturing network traffic: high-speed traffic is captured with a Spark-based high-speed network traffic capture platform and queued for processing.
(2) Network traffic filtering and key field extraction: protocol analysis is performed on the captured traffic, and HTTP traffic is filtered from all network traffic according to the HTTP format. The UA field and the client's IP information are then extracted according to the HTTP format and stored as log entries in the format <client ID, UA>.
(3) Abnormal UA detection: the extracted UAs are checked with a regular expression to determine whether they contain abnormal characters. A UA that does not conform to the established rule is judged to contain abnormal characters. Such UAs are stored separately and counted.
(4) Normal UA extraction: for each detected abnormal UA, its similarity to the normal UAs in the collected logs is calculated using the Levenshtein distance. Normal UAs whose similarity to an abnormal UA is greater than 0 are saved.
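The patent does not publish the regular expression used in step (3). The Python sketch below assumes a simple rule: a normal UA is a non-empty string of printable ASCII characters (0x20 to 0x7E). Under that rule, detection, separate storage and counting could look like this. `NORMAL_UA_RE` and `detect_abnormal` are hypothetical names.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical rule: a normal UA consists only of printable ASCII characters.
NORMAL_UA_RE = re.compile(r"[\x20-\x7e]+")


def detect_abnormal(log: Iterable[Tuple[str, str]]):
    """Split <client ID, UA> entries into normal entries and a counter of
    abnormal (client ID, UA) pairs."""
    normal: List[Tuple[str, str]] = []
    abnormal: Counter = Counter()
    for client_id, ua in log:
        if NORMAL_UA_RE.fullmatch(ua):
            normal.append((client_id, ua))
        else:
            abnormal[(client_id, ua)] += 1  # stored separately and counted
    return normal, abnormal
```

A stricter rule, for example one that also requires a leading `product/version` token, would flag more formally malformed UAs. It would also flag more unusual but legitimate clients.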
II. Analysis stage:
(1) Abnormal UA analysis: some UAs contain abnormal characters only because of accidental factors in the network. To keep these from distorting the results, the 20 clients with the largest numbers of abnormal UAs are selected for analyzing the causes of the abnormal characters. Together, these 20 clients account for about 80% of all abnormal UAs. Analysis of the filtered abnormal UAs reveals two causes. Abnormal UAs are mainly generated by malware itself, as shown by single clients producing large numbers of identical abnormal UAs. The other cause is mismatched encoding and decoding of UAs. Abnormal UAs can therefore be used to detect and track the malicious behavior of malware, which helps maintain network security, and they also reveal the ecosystem of abnormal UAs.
(2) Normal UA analysis: because these clients produce many abnormal UAs, their formally normal UAs are not necessarily normal in terms of UA usage either. These UAs are therefore classified into custom categories, and the clients' use of device types and browsers is analyzed.
The method is implemented by a rule-based abnormal UA detection and analysis system, shown in Fig. 1 and Fig. 2, which comprises the following parts:
a Spark-based network traffic capture platform for capturing network traffic;
a filter for performing protocol analysis on the captured traffic, extracting the UA field and the client's IP information and storing them as a log, performing anomaly detection on the extracted UAs with a regular expression, and detecting abnormal UAs that contain abnormal characters together with the normal UAs whose similarity to them is greater than 0. Specifically, the filter comprises an HTTP extractor, a UA extractor and an IP extractor. The HTTP extractor filters HTTP traffic from all network traffic according to the HTTP format, the UA extractor extracts the UA field from the HTTP traffic, and the IP extractor extracts the client's IP information from the HTTP traffic;
an analyzer for analyzing the top several clients with the largest numbers of abnormal UAs to determine the causes of the abnormal characters, classifying the stored normal UAs into custom categories, re-judging as abnormal any normal UA that does not fit the custom classification, analyzing the preferences of the clients containing abnormal UAs with respect to device type and browser type, and detecting malicious clients.
The method of the invention is further illustrated by the following specific example:
As shown in Fig. 2, traffic was captured for 2 months with the traffic capture platform, and more than 150 billion UAs were collected in total. Nearly 22 million of them contain abnormal characters, a ratio of abnormal to normal UAs of about 0.1485‰. About 91,000 clients sent UAs containing abnormal characters, and these clients are distributed around the world.
The 20 clients with the largest numbers of abnormal UAs were selected for investigating the causes, which avoids interference from accidental factors in the analysis. Two causes of abnormal UAs were found. First, the methods used to encode and decode the UA do not match. Second, users or applications themselves produce the abnormal UAs. Malicious users are more likely to generate abnormal UAs for malicious activities, and the formats of these UAs differ from those of normal UAs.
Finally, the filtered normal UAs were classified with regular expressions according to the custom classification. This revealed 3 UA types that are abnormal in meaning (that is, they do not fit the custom classification) and showed the ecosystem of abnormal UAs.
The above embodiments are intended only to illustrate, not to limit, the technical solution of the invention. A person skilled in the art may modify the technical solution or substitute equivalents without departing from its spirit and scope. The scope of protection of the invention is determined by the claims.
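The first cause, an encoding/decoding mismatch, is easy to reproduce. In the Python sketch below, a hypothetical UA whose device model contains Chinese characters is encoded as UTF-8 by the sender but decoded as Latin-1 by the receiver. Each of the two Chinese characters becomes three unreadable characters such as `ç` and `º`, with no malicious intent involved. The sample UA and the variable names are illustrative assumptions.

```python
# Hypothetical UA whose device model contains Chinese characters.
original_ua = "Mozilla/5.0 (Linux; Android 10; 红米 Note 8)"

# The sender encodes the UA as UTF-8; the receiver wrongly decodes it as Latin-1.
wire_bytes = original_ua.encode("utf-8")
garbled_ua = wire_bytes.decode("latin-1")

# Characters outside printable ASCII (0x20-0x7E) in the garbled string.
abnormal_chars = [c for c in garbled_ua if not 0x20 <= ord(c) <= 0x7E]

# Latin-1 maps bytes one-to-one, so this particular mismatch is reversible.
repaired_ua = garbled_ua.encode("latin-1").decode("utf-8")
```

Because the damage is reversible in this case, a repair attempt of this kind could help separate encoding artifacts from UAs that were malformed at the source. That use is an assumption and is not a step described in the patent.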
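The patent does not list its custom categories or the three semantically abnormal UA types it found. The Python sketch below uses made-up device and browser categories purely to show the mechanics. Each formally normal UA is matched against ordered regular-expression rules. UAs that fit no device category or no browser category are re-flagged as abnormal, and the remaining matches are tallied per client for preference analysis. `DEVICE_RULES`, `BROWSER_RULES`, `first_match` and `classify_normal_uas` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Hypothetical custom categories, checked in order (first match wins).
DEVICE_RULES = [
    ("mobile/android", re.compile(r"Android")),  # before Linux: Android UAs contain "Linux"
    ("mobile/ios", re.compile(r"iPhone|iPad")),
    ("desktop/windows", re.compile(r"Windows NT")),
    ("desktop/macos", re.compile(r"Macintosh")),
    ("desktop/linux", re.compile(r"Linux")),
]
BROWSER_RULES = [
    ("edge", re.compile(r"Edg/")),  # before Chrome: Edge UAs also contain "Chrome/"
    ("chrome", re.compile(r"Chrome/")),
    ("firefox", re.compile(r"Firefox/")),
    ("safari", re.compile(r"Version/\S+ .*Safari/")),
]


def first_match(rules, ua: str) -> Optional[str]:
    """Return the name of the first rule whose pattern occurs in the UA."""
    return next((name for name, pattern in rules if pattern.search(ua)), None)


def classify_normal_uas(normal_log: Iterable[Tuple[str, str]]):
    """Tally (client, device, browser) preferences and re-flag UAs that fit
    no custom category as semantically abnormal."""
    preferences: Counter = Counter()
    reflagged: List[Tuple[str, str]] = []
    for client_id, ua in normal_log:
        device = first_match(DEVICE_RULES, ua)
        browser = first_match(BROWSER_RULES, ua)
        if device is None or browser is None:
            reflagged.append((client_id, ua))
        else:
            preferences[(client_id, device, browser)] += 1
    return preferences, reflagged
```

Rule order matters here: more specific patterns must come before the general ones that also match them.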

Claims (10)

1. A method of rule-based abnormal UA detection and analysis, comprising the steps of:
capturing network traffic with a Spark-based network traffic capture platform;
performing protocol analysis on the captured traffic, filtering HTTP traffic from all network traffic according to the HTTP format, and extracting the UA field and the client's IP information and storing them as a log;
performing anomaly detection on the extracted UAs with a regular expression to determine whether each UA contains abnormal characters, a UA that contains abnormal characters being judged abnormal;
for each detected abnormal UA, calculating its similarity to the normal UAs in the log, and storing the normal UAs whose similarity to the abnormal UA is greater than 0;
analyzing a plurality of top-ranked clients with the largest numbers of abnormal UAs to determine the causes of the abnormal characters; and
classifying the stored normal UAs into custom categories, re-judging as abnormal any normal UA that does not fit the custom classification, analyzing the preferences of the clients containing abnormal UAs with respect to device type and browser type, and detecting malicious clients.
2. The method of claim 1, wherein the UA field and the client's IP information are logged in the format <client ID, UA>.
3. The method of claim 1, wherein the similarity between the abnormal UA and the normal UAs in the log is calculated using the Levenshtein distance.
4. The method of claim 1, wherein the plurality of clients refers to the clients whose abnormal UAs together account for 80% of the total number of abnormal UAs.
5. The method of claim 1, wherein the abnormal UAs are stored separately and counted.
6. The method of claim 1, wherein the causes of the abnormal characters comprise: malware generating abnormal UAs itself, and abnormal UAs produced by mismatched encoding and decoding of the UA.
7. A system for rule-based abnormal UA detection and analysis, comprising:
a Spark-based network traffic capture platform for capturing network traffic;
a filter for performing protocol analysis on the captured traffic, extracting the UA field and the client's IP information and storing them as a log, performing anomaly detection on the extracted UAs with a regular expression to detect abnormal UAs containing abnormal characters, calculating the similarity between the abnormal UAs and normal UAs, and storing the normal UAs whose similarity to an abnormal UA is greater than 0; and
an analyzer for analyzing a plurality of top-ranked clients with the largest numbers of abnormal UAs to determine the causes of the abnormal characters, classifying the stored normal UAs into custom categories, re-judging as abnormal any normal UA that does not fit the custom classification, analyzing the preferences of the clients containing abnormal UAs with respect to device type and browser type, and detecting malicious clients.
8. The system of claim 7, wherein the filter comprises an HTTP extractor for filtering HTTP traffic from all network traffic according to the HTTP format, a UA extractor for extracting the UA field from the HTTP traffic, and an IP extractor for extracting the client's IP information from the HTTP traffic.
9. The system of claim 8, wherein the UA field and the client's IP information are logged in the format <client ID, UA>.
10. The system of claim 7, wherein the plurality of clients refers to the clients whose abnormal UAs together account for 80% of the total number of abnormal UAs.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910706278.3A CN110602038B (en) 2019-08-01 2019-08-01 Abnormal UA detection and analysis method and system based on rules

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910706278.3A CN110602038B (en) 2019-08-01 2019-08-01 Abnormal UA detection and analysis method and system based on rules

Publications (2)

Publication Number Publication Date
CN110602038A CN110602038A (en) 2019-12-20
CN110602038B CN110602038B (en) 2020-12-04

Family

ID=68853368

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910706278.3A Active CN110602038B (en) 2019-08-01 2019-08-01 Abnormal UA detection and analysis method and system based on rules

Country Status (1)

Country Link
CN (1) CN110602038B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113382000A (en) * 2021-06-09 2021-09-10 北京天融信网络安全技术有限公司 UA character string anomaly detection method, device, equipment and medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109691023A (en) * 2017-05-25 2019-04-26 微软技术许可有限责任公司 Resolver

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100493094C (en) * 2006-08-25 2009-05-27 清华大学 P2P data message detection method based on character code
CN103856524A (en) * 2012-12-04 2014-06-11 中山大学深圳研究院 Method and system for identifying legal content on basis of white list of user agent
US9729509B2 (en) * 2013-03-23 2017-08-08 Fortinet, Inc. System and method for integrated header, state, rate and content anomaly prevention for session initiation protocol
US9215240B2 (en) * 2013-07-25 2015-12-15 Splunk Inc. Investigative and dynamic detection of potential security-threat indicators from events in big data
CN107483488B (en) * 2017-09-18 2021-04-30 济南互信软件有限公司 Malicious Http detection method and system
CN109583472A (en) * 2018-10-30 2019-04-05 中国科学院计算技术研究所 A kind of web log user identification method and system

Also Published As

Publication number Publication date
CN110602038A (en) 2019-12-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant