CN114143058B - Full-flow vulnerability acquisition method for data - Google Patents

Publication number
CN114143058B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111409419.9A
Other languages
Chinese (zh)
Other versions
CN114143058A (en)
Inventor
赵慧奇
范芳
钟广源
杨明
武莹莹
张蓓
刘高源
张华杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University of Science and Technology
Original Assignee
Shandong University of Science and Technology

Abstract

The invention relates to a full-flow data vulnerability collection method, in particular to a method that locates vulnerabilities through backtracking analysis of full network traffic on a distributed architecture. The method comprises the following steps: (1) capturing network data packets; (2) identifying and analyzing common protocols; (3) collecting full network traffic and performing backtracking analysis of historical data; and (4) performing data backtracking and data mining. The invention supports comprehensive, in-depth analysis and backtracking of full network traffic. Analysis data is presented down to the lowest level, the individual data packet, with timestamps accurate to the microsecond, so that network information security problems such as APT Trojan infiltration, hacker intrusion, and botnet remote control can be analyzed accurately and located quickly, and the vulnerability can be pinned down precisely.

Description

Full-flow vulnerability acquisition method for data
Technical Field
The invention relates to the technical field of cyberspace security, in particular to a full-flow data vulnerability acquisition method based on a distributed architecture.
Background
The method is proposed to make up for the shortcomings of domestic traffic analysis technology and to solve the problem of finding vulnerabilities in full-flow analysis. It provides comprehensive, in-depth analysis and backtracking of full network traffic, presents analysis data down to the lowest level, the individual data packet, and achieves time accuracy at the microsecond level. Network information security problems such as APT Trojan penetration, hacker intrusion, and botnet remote control can therefore be analyzed accurately and located quickly, turning a passive posture into an active one. The method is further intended to break the dominance of overseas mainstream vendors, remove the dependence on overseas products, create a new full-flow analysis core technology and products with domestic intellectual property rights, and provide a management model for cyberspace security work.
Disclosure of Invention
To overcome the shortcomings described above, the invention provides a new full-flow acquisition method and a storage technology for full network traffic data, which meet the requirements of effectively storing network attack traffic and tracing attacks back to vulnerabilities.
The invention relates to a full-flow vulnerability acquisition method, characterized by comprising the following steps: (1) capturing network data packets; (2) identifying and analyzing common protocols; (3) collecting full network traffic and performing backtracking analysis of historical data; and (4) performing data backtracking and data mining.
In step (1), the network data packets are captured.
In step (2), a protocol recognition engine identifies and analyzes the original network traffic in real time during full-flow analysis. Protocol analysis covers the mainstream L2-L7 communication protocols found in most networks, such as HTTP, IGMP, OSPF, and Kerberos, as well as industrial control protocols and nonstandard protocols, so that the protocols in the network are accurately decoded and counted. A TRE protocol recognition and filtering module can filter out low-value data such as P2P, video, audio, and online live streaming in real time as required, so that only high-value traffic is stored. The system provides eight metadata logs (HTTP, DNS, mail, database, FTP, remote access, encrypted session, and ICMP logs) and comprehensively identifies and audits common Internet and intranet behaviors in the network, such as web access, mail, IM chat, file transfer, OA, and database access.
In step (3), a backtracking analysis server collects and analyzes the original network traffic in real time and stores it in full. An analysis console provides human-machine data interaction and connects to the analysis server for data display and backtracking analysis. Once the console is connected to the backtracking analysis server, data analysis operations such as data mining, alarm analysis, abnormal communication analysis, backtracking analysis, and data packet analysis can be performed. A single console can connect to several analysis servers at the same time for remote centralized analysis.
In step (4), network traffic statistics are used to count various network indicators over fixed time periods. Usable features are extracted from the statistics, and the original data packets are looked up in reverse from those features to complete the network data analysis. Alarm analysis and behavior analysis generate logs, and information in the logs, such as IP addresses, serves as the entry point for filtering and analyzing the original data packets. Any point in time can be backtracked completely through the original data packets to determine the time of a security event, the source IP, the destination IP, the cause of the event, the course of the event, and its impact.
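The reverse lookup from statistics to original packets described in step (4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Packet` fields, the choice of bytes per source IP as the extracted feature, and the threshold are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    ts_us: int    # capture timestamp in microseconds (assumed field)
    src_ip: str
    dst_ip: str
    length: int   # packet length in bytes


def bucket_stats(packets, bucket_us):
    """Sum bytes per (time bucket, source IP): the fixed-period statistics."""
    stats = Counter()
    for p in packets:
        stats[(p.ts_us // bucket_us, p.src_ip)] += p.length
    return stats


def backtrack(packets, bucket_us, threshold_bytes):
    """Return the original packets behind every statistic above the threshold."""
    stats = bucket_stats(packets, bucket_us)
    suspicious = {key for key, total in stats.items() if total > threshold_bytes}
    return [p for p in packets if (p.ts_us // bucket_us, p.src_ip) in suspicious]
```

The statistics table stays small, while the packet store is consulted only for the time buckets and addresses that the statistics flag.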
The beneficial effects of the invention are as follows:
In full-flow analysis, the invention adopts an industry-leading protocol recognition engine with independent intellectual property rights, which can recognize some 1,500 network protocols and decode fine-grained fields for more than 650 protocols.
Recognition and decoding of common protocols such as HTTP, FTP, and SMTP/POP3/IMAP4 is already well established. For network applications such as QQ, Xunlei (Thunder), and Sogou, however, a conventional protocol recognition engine may misidentify or miss traffic. The invention provides a new application recognition engine that identifies network applications accurately. It combines the label features of the traffic (address, port, flag bits, data packet length) with its payload features (content fingerprint, repetition relationship, and response relationship), and can quickly and accurately identify the application to which the traffic belongs.
Based on statistical analysis of network data, the invention provides summary statistics of basic network data as well as statistics by IP address, protocol, port, country, and so on. These statistics are indicators distilled from long experience of analyzing network problems. They characterize network traffic to a large extent, and network analysis based on them allows vulnerabilities to be located accurately.
The invention provides a general data access interface. The development interface uses the HTTP and HTTPS communication protocols, supports the export of statistical data, original data packets, log data, and so on, and makes the data available to third-party applications.
Description of the drawings:
FIG. 1 is a functional architecture diagram of a controller and a server according to the present invention
FIG. 2 is a system architecture diagram of a controller and a server according to the present invention
FIG. 3 is the main flow chart of the statistical analysis of network data according to the present invention
FIG. 4 is a core flow chart of statistical analysis of network data according to the present invention
FIG. 5 is a flow chart of statistics recording and log querying according to the present invention
FIG. 6 is a flow chart of the RESTful API interface implementation of the present invention
Detailed description of the preferred embodiments
The present invention will be further described in detail with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent.
The invention provides a new application recognition engine that identifies network applications accurately. It combines the label features of the traffic (address, port, flag bits, data packet length) with its payload features (content fingerprint, repetition relationship, and response relationship), and can quickly and accurately identify the application to which the traffic belongs.
The application recognition engine includes three recognition modules: single packet identification, single stream identification, and multi-stream identification.
Single packet identification: the application of the current data packet is determined from one of four attributes (port, TCP flags, data packet length, and fingerprint) or from a combination of them.
Single stream identification: the application of the current stream is determined by the repetitive behavior of a series of feature packets within the same stream.
Multi-stream identification: the engine records the IP address and port pairs of other streams of a known application, as obtained by an external analysis module (for example, by analyzing an FTP stream or a certificate stream), so that they can be looked up quickly.
The network application identification engine can quickly and accurately identify common applications in the network and collect and analyze their communication parameters, including communication time, duration, traffic volume, and communication content, and can extract and filter specific fields.
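The three identification modules can be sketched as follows. This is a toy illustration under stated assumptions, not the engine described in the patent: the signature table, the `marker` bytes, the application names, and the class and method names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: tuple        # (ip, port)
    dst: tuple        # (ip, port)
    tcp_flags: int    # carried for completeness; unused by the toy rules below
    payload: bytes


# Hypothetical single-packet signatures: (port, payload fingerprint, application).
SINGLE_PACKET_RULES = [
    (80, b"GET ", "http"),
    (21, b"220 ", "ftp-control"),
]


class RecognitionEngine:
    def __init__(self):
        # (ip, port) -> application, learned from other streams
        self.known_endpoints = {}

    def register_endpoint(self, ip, port, app):
        """Called by an external analysis module, e.g. after parsing an FTP stream."""
        self.known_endpoints[(ip, port)] = app

    def identify_packet(self, pkt):
        # Multi-stream identification: endpoint already known from another stream.
        for endpoint in (pkt.src, pkt.dst):
            if endpoint in self.known_endpoints:
                return self.known_endpoints[endpoint]
        # Single packet identification: port combined with a payload fingerprint.
        for port, prefix, app in SINGLE_PACKET_RULES:
            if port in (pkt.src[1], pkt.dst[1]) and pkt.payload.startswith(prefix):
                return app
        return None

    def identify_stream(self, packets, marker=b"\x01\x02", min_repeats=3):
        """Single stream identification: a feature packet repeated within one stream."""
        repeats = sum(1 for p in packets if p.payload.startswith(marker))
        return "heartbeat-app" if repeats >= min_repeats else None
```

Checking the learned endpoint table first keeps the common case to a dictionary lookup, which matches the stated goal of quick searching.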
In addition, the method provides fast decoding for specific protocols, with a single-thread decoding speed of more than 100,000 packets per second (>10w pps). It is simple to use, supports textual protocol specification and protocol definition, filtering, and output, and offers high performance, extensibility, and flexibility.
The protocol recognition engine provides the following capabilities:
1) Identification of more than 1,100 common protocols, including the following:
Network base protocols: IP, TCP, IPV6, NTP, SCTP, STUN, TPTK;
Mail protocols: SMTP, POP3, IMAP4;
File transfer protocols: FTP, SFTP, NFS, RSYNC, SMB, SMBMailSloPro, SMBPipPro;
Streaming media protocols: H.323, SIP, H.248, MGCP;
Tunnel encapsulation protocols: GRE, L2TP, PPTP, GTP, GTPV2;
Network user identity authentication protocols: RADIUS, Diameter, Kerberos, PAP, NTLM, TACACS, WPA/WPA2, WEP;
Network cryptographic protocols: ISAKMP, ESP, AH, SSL/TLS, SSH, OCSP, X509Cer, etc.
2) Stream identification: protocols are identified in units of streams, one stream at a time.
3) Single packet identification: identification is performed in units of data packets, so that every packet is identified.
As shown in FIG. 1, the invention uses a backtracking analysis server to collect and analyze the original network traffic in real time and to store it in full. The analysis console mainly provides human-machine data interaction and connects to the analysis server for data display and backtracking analysis. Once connected to the backtracking analysis server, the console supports data analysis operations such as data mining, alarm analysis, abnormal communication analysis, backtracking analysis, and data packet analysis, and it can connect to several analysis servers at the same time for remote centralized analysis.
As shown in FIG. 2, the analysis console supports one-to-many concurrent connections to analysis servers; that is, one analysis console can manage and analyze the data of several analysis servers at the same time, which simplifies unified management and analysis.
The data processing flow of the analysis server is as follows:
1) Data acquisition: collecting the original network traffic in real time;
2) Data packet preprocessing: sorting, copying, identifying, and splitting;
3) Multithreaded network analysis: data analysis, statistics, and advanced analysis;
4) Data storage: data packets, statistics, behavior logs, and alarm logs.
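The four stages above can be sketched as a small pipeline. This is only an illustration under stated assumptions: it runs in a single thread (the source describes multithreaded analysis), the packet dictionaries, the toy protocol labelling, and the alarm rule are hypothetical, and an in-memory dictionary stands in for the real stores.

```python
from collections import Counter


def preprocess(raw_packets):
    """Stage 2: sort by capture time and attach a toy protocol label."""
    ordered = sorted(raw_packets, key=lambda p: p["ts_us"])
    return [dict(p, proto="dns" if p["dst_port"] == 53 else "other") for p in ordered]


def analyze(packets, dns_alarm_bytes=512):
    """Stage 3: per-protocol byte statistics plus one illustrative alarm rule."""
    stats = Counter()
    alarms = []
    for p in packets:
        stats[p["proto"]] += p["length"]
        if p["proto"] == "dns" and p["length"] > dns_alarm_bytes:
            alarms.append({"ts_us": p["ts_us"], "name": "oversized DNS packet"})
    return stats, alarms


def run_pipeline(raw_packets, store):
    """Stages 1 to 4 in order; raw_packets stands in for live capture."""
    packets = preprocess(raw_packets)
    stats, alarms = analyze(packets)
    store["packets"].extend(packets)       # stage 4: original packets
    store["statistics"].update(stats)      # stage 4: statistics
    store["alarm_log"].extend(alarms)      # stage 4: alarm log
    return store
```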
Overview of the working principle of the analysis console:
1) The analysis console establishes a connection to the server;
2) The console initiates a query;
3) The server checks the validity of the query, such as login authority, parameter conditions, and time range;
4) The server executes the query and returns data;
5) The console receives and processes the returned data.
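The server-side validity check in step 3 and the query execution in step 4 can be sketched as follows. This is a minimal sketch under stated assumptions: the `Query` fields, the user and table names, and the error strings are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Query:
    user: str
    table: str
    start_us: int   # start of the requested time range, microseconds
    end_us: int     # end of the requested time range (exclusive)


AUTHORIZED_USERS = {"analyst"}                 # hypothetical login authority
KNOWN_TABLES = {"ip_stats", "tcp_sessions"}    # hypothetical parameter condition


def validate(query, oldest_us, newest_us):
    """Return a list of validation errors; an empty list means the query is valid."""
    errors = []
    if query.user not in AUTHORIZED_USERS:
        errors.append("permission denied")
    if query.table not in KNOWN_TABLES:
        errors.append("unknown table")
    if query.start_us >= query.end_us:
        errors.append("empty time range")
    elif query.end_us < oldest_us or query.start_us > newest_us:
        errors.append("time range outside stored data")
    return errors


def execute(query, rows, oldest_us, newest_us):
    """Validate first, then return the rows that fall inside the time range."""
    errors = validate(query, oldest_us, newest_us)
    if errors:
        return {"ok": False, "errors": errors}
    data = [r for r in rows.get(query.table, [])
            if query.start_us <= r["ts_us"] < query.end_us]
    return {"ok": True, "data": data}
```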
As shown in FIG. 3, the invention performs statistical analysis of network data, providing summary statistics of basic network data as well as statistics by IP address, protocol, port, country, and so on. These statistics are indicators distilled from long experience of analyzing network problems; they characterize network traffic to a large extent and form the basis of the analysis.
Long-term storage of statistical results
Statistics take up far less space than data packets, and in most cases they are worth much more than the packets themselves. Because they can play a decisive role in forensic analysis, they need to be retained for periods measured in months.
Configurable key IP addresses
The key IP address list can be customized, and the IP addresses in it are stored separately in the key IP list.
Configurable custom applications
In a custom application definition, the user can enter an IP address, a TCP/UDP port, an IP address group or address range, a port group or port range, or a combination of these conditions. The system matches traffic against the combined conditions, and the identified results are counted in the application statistics table.
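Matching traffic against a user-defined combination of conditions can be sketched with Python's standard `ipaddress` module. The `AppRule` class, its fields, and the example application name are assumptions for illustration; the patent does not specify the rule format.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class AppRule:
    name: str
    networks: list = field(default_factory=list)  # ip_network objects; empty means any address
    ports: list = field(default_factory=list)     # (low, high) port ranges; empty means any port
    transport: str = ""                           # "tcp", "udp", or "" for either

    def matches(self, ip, port, transport):
        addr = ipaddress.ip_address(ip)
        ip_ok = not self.networks or any(addr in net for net in self.networks)
        port_ok = not self.ports or any(lo <= port <= hi for lo, hi in self.ports)
        transport_ok = not self.transport or self.transport == transport
        # All configured conditions must hold together.
        return ip_ok and port_ok and transport_ok


def classify(rules, ip, port, transport):
    """Return the name of the first matching custom application, else 'unknown'."""
    for rule in rules:
        if rule.matches(ip, port, transport):
            return rule.name
    return "unknown"
```

A single port is expressed as a range with equal bounds, for example `(443, 443)`, so groups, ranges, and single values share one code path.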
Summary statistics
Provides total traffic, incoming traffic, outgoing traffic, total data packets, incoming data packets, outgoing data packets, peak total data packets, blocking summary statistics, alarm summary statistics, etc.
Protocol statistics
Provides statistics such as communication traffic, data packets, payload traffic, and TCP SYN packets for each protocol.
Port statistics
Provides statistics such as communication traffic, data packets, payload traffic, and TCP SYN packets for each port.
Country statistics
Provides per-country statistics such as communication traffic, total data packets, sent traffic, received traffic, sent data packets, and received data packets.
IP address statistics
Provides statistics for each IP address, such as communication traffic, address location, country, total traffic, total data packets, sent traffic, received traffic, sent data packets, received data packets, and the ratio of received to sent traffic.
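Per-address counters of this kind can be accumulated in one pass over the packets. The sketch below is an assumption-laden illustration: packets are reduced to hypothetical `(src_ip, dst_ip, length)` tuples, the column names are invented, and geographic fields are omitted because they would need an external lookup.

```python
from collections import defaultdict


def ip_statistics(packets):
    """Aggregate sent/received bytes and packets per IP address."""
    stats = defaultdict(lambda: {"sent_bytes": 0, "recv_bytes": 0,
                                 "sent_pkts": 0, "recv_pkts": 0})
    for src_ip, dst_ip, length in packets:
        stats[src_ip]["sent_bytes"] += length
        stats[src_ip]["sent_pkts"] += 1
        stats[dst_ip]["recv_bytes"] += length
        stats[dst_ip]["recv_pkts"] += 1

    result = {}
    for ip, s in stats.items():
        row = dict(s)
        row["total_bytes"] = s["sent_bytes"] + s["recv_bytes"]
        # The ratio is undefined for an address that never sent anything.
        row["recv_send_ratio"] = (s["recv_bytes"] / s["sent_bytes"]
                                  if s["sent_bytes"] else None)
        result[ip] = row
    return result
```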
Key IP address analysis
Supports adding custom key IP addresses and provides communication statistics for those IPs.
Application statistical analysis
Applications are identified and displayed either through custom applications defined by conditions such as port and IP, or through the application list provided by the system.
IP session statistics
Provides statistics for IP sessions, such as communication endpoints, countries, total traffic, total data packets, endpoint 1 traffic, and endpoint 2 traffic.
TCP session statistics
Provides statistics such as client IP address, client geographic location, client country, server IP address, server geographic location, server country, total traffic, traffic per second, client traffic, and server traffic.
UDP session statistics
Provides statistics such as node 1 IP address, node 1 geographic location, node 1 country, node 2 IP address, node 2 geographic location, node 2 country, total traffic, traffic per second, and total data packets.
Alarm log statistics
Provides the alarm name, trigger time, alarm level, alarm type, triggering IP address, port, target IP address, etc.
Behavior log statistics
Provides log output statistics for the behavior models that were triggered.
Other metadata log statistics
DNS, HTTP, SSL, etc.
Export
When the volume of statistics is too large, or when several kinds of statistics need to be analyzed together, the statistics need to be managed locally. The data in a list can be exported in CSV format, written directly as delimited text.
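A CSV export of a statistics list can be sketched with Python's standard `csv` module. The function name and the column names in the usage line are hypothetical; only the CSV output format comes from the text.

```python
import csv
import io


def export_csv(rows, columns):
    """Serialize a list of row dictionaries to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns,
                            extrasaction="ignore",   # drop keys that are not exported
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

For example, `export_csv([{"ip": "10.0.0.1", "total_bytes": 500}], ["ip", "total_bytes"])` yields a header line followed by one data line.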
As shown in FIG. 4, the invention provides analysis of a rich set of data types, such as statistics, storage, and analysis of alarm logs, abnormal communication behavior, IP sessions, and TCP sessions.
Data mining
As shown in FIG. 5, conventional packet traffic identification and analysis techniques analyze only the "5-tuple" in the packet header, that is, the source address, destination address, source port, destination port, and protocol type. The statistics tables support filtering on their fields; for example, a table that contains the five-tuple fields can be filtered with standard five-tuple conditions, which achieves the goal of data mining.
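Filtering a statistics table by five-tuple conditions can be sketched as follows. The `FiveTuple` type, the row layout with a `flow` key, and the `filter_rows` helper are assumptions made for illustration.

```python
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


def filter_rows(rows, **conditions):
    """Keep rows whose five-tuple fields equal every given condition."""
    unknown = set(conditions) - set(FiveTuple._fields)
    if unknown:
        raise ValueError(f"not a five-tuple field: {sorted(unknown)}")
    return [row for row in rows
            if all(getattr(row["flow"], name) == value
                   for name, value in conditions.items())]
```

Conditions combine with logical AND, so adding a field narrows the result, which is the usual way to drill down from a table to a single session.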
As shown in FIG. 6, the interface is implemented as a RESTful API over HTTPS. A third party does not need to log in or authenticate; requests are encrypted with HTTPS, and each request corresponds to a unique resource identifier. The interface uses short connections, so every request is a new session. At present, GET and POST access are supported. URL definitions are standardized and uniform, which makes the interface practical and easy to use. A URL takes the following form: http://tsaserver:port/colasoft/tsa/{client_name}/apiname/{param}, where client_name can be configured according to the actual situation, apiname specifies the type of data to be retrieved, and param carries the necessary request parameters, which facilitates unified management.
The third party initiates a request using the specified URL, the apiname, and the parameters of that apiname. After receiving the request, the system processes it and returns the requested data or an error message, and the request ends.
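Building a request URL of the form given above can be sketched with the standard `urllib.parse` module. Only the `/colasoft/tsa/` path prefix and the segment order come from the text; the `build_url` helper, the default `https` scheme, and the example host, port, and API names are assumptions.

```python
from urllib.parse import quote


def build_url(host, port, client_name, apiname, param, scheme="https"):
    """Compose {scheme}://host:port/colasoft/tsa/{client_name}/{apiname}/{param}."""
    # Percent-encode each segment so that a value cannot introduce extra path levels.
    path = "/".join(quote(str(part), safe="") for part in (client_name, apiname, param))
    return f"{scheme}://{host}:{port}/colasoft/tsa/{path}"
```

For example, `build_url("tsaserver", 8443, "client_a", "ip_stats", "10.0.0.1")` produces one resource identifier per request, matching the short-connection model described above.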

Claims (4)

1. A full-flow data vulnerability acquisition method, characterized by comprising the following steps in sequence:
(1) Capturing network data packets;
(2) Identifying, analyzing, and restoring the content of common protocols;
for the common communication protocols in a network, including HTTP, FTP, Telnet, and SMTP/POP3, step (2) uses the fast protocol identification engine CSTRE to quickly identify and count the common protocols and to restore and analyze the transmitted content; the file types transmitted over HTTP are extracted, including picture files, document files, and compressed files, together with URL, file size, and duration information; fast decoding of protocols is also supported, and the decoded fields can be flexibly extended;
The fast protocol identification engine CSTRE combines the label features of the traffic with the payload features of the traffic, and comprises three identification modules: single packet identification, single stream identification, and multi-stream identification:
single packet identification: determining the application of the current data packet from one of port, TCP flags, data packet length, and fingerprint, or from a combination of them;
single stream identification: determining the application of the current stream from the repetitive behavior of a series of feature packets within the same stream;
Multi-stream identification: recording in the engine the IP address and port pairs of other streams of a known application, as obtained by an external analysis module, for quick lookup;
(3) Collecting full network traffic and performing backtracking analysis of historical data;
(4) Performing data backtracking and data mining.
2. The full-flow data vulnerability acquisition method according to claim 1, characterized in that: step (3) uses a backtracking analysis server to collect and analyze the original network traffic in real time and to store it in full; a communication interface and a control platform are also provided for data transmission; an analysis console provides human-machine data interaction and connects to the analysis server for data display and backtracking analysis; the backtracking analysis server and the analysis console use a C/S architecture, in which the analysis server responds to commands from the analysis console in real time and returns response data promptly; when a designated target network needs to be examined, the analysis console can connect to the server to view data remotely and analyze security events.
3. The full-flow data vulnerability acquisition method according to claim 2, characterized in that: in step (4), network traffic statistics are used to count various network indicators over fixed time periods, backtracking analysis of any network object is supported, and complete backtracking of any point in time through the original data packets is supported, so as to determine the time of a security event, the source IP, the destination IP, the cause of the event, the course of the event, and its impact.
4. The full-flow data vulnerability acquisition method according to claim 2, characterized in that: the time precision of the backtracking analysis is at the nanosecond level, and traffic data is retrieved, mined, and extracted over multiple time windows of minutes, hours, and days.
CN202111409419.9A 2021-11-25 Full-flow vulnerability acquisition method for data Active CN114143058B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111409419.9A CN114143058B (en) 2021-11-25 Full-flow vulnerability acquisition method for data

Publications (2)

Publication Number Publication Date
CN114143058A (en) 2022-03-04
CN114143058B (en) 2024-06-04
