CN105376110A - Network data packet analysis method and system in big data stream technology - Google Patents

Network data packet analysis method and system in big data stream technology

Publication number
CN105376110A
Authority
CN
China
Prior art keywords: message, network, time, key, index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510703275.6A
Other languages
Chinese (zh)
Other versions
CN105376110B (en)
Inventor
陈红
朱梦源
谢朝阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Eccom Network System Co Ltd
Original Assignee
Eccom Network System Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Eccom Network System Co Ltd
Priority to CN201510703275.6A
Publication of CN105376110A
Application granted
Publication of CN105376110B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0823 Errors, e.g. transmission errors
    • H04L43/0829 Packet loss
    • H04L43/0852 Delays
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]

Abstract

The invention provides a network data packet analysis method and system in a big data stream technology. The method comprises a retransmission index analysis step and/or a time-delay index analysis step. The invention further provides a device for calculating the network indexes of network data packets in the big data stream technology. Because the system is combined with the big data framework Spark, its scalability and maintainability are greatly enhanced. As network structures grow more complex and network traffic keeps increasing over time, system capacity has to be expanded; the scalability of Spark serves this need well, and the advantages of Spark clusters are used effectively.

Description

Network data packet analysis method and system in big data stream technology
Technical field
The present invention relates to the field of big data stream technology, and specifically to a method and system for analyzing network data packets with big data stream technology. In particular, it uses the streaming technology of the big data computing framework Spark for real-time computation to build a cluster-based, distributed, low-latency network data packet analysis system.
Background
With the development of big data, the demands placed on big data processing keep rising. The original batch processing framework MapReduce is suitable for offline computation but cannot serve businesses with stricter real-time requirements, such as real-time recommendation and user behavior analysis. Spark Streaming is a real-time computing framework built on Spark; with the rich API it provides and its in-memory high-speed execution engine, users can build applications that combine streaming, batch processing and interactive queries. Spark itself is a general-purpose parallel framework similar to Hadoop MapReduce, open-sourced by the UC Berkeley AMPLab.
Most network traffic analysis systems currently on the market monitor, in real time, the traffic distribution at each layer of the seven-layer network model in the user's network and perform a comprehensive analysis of protocols and traffic. As network applications become more widespread, network scale keeps growing and the services carried on networks become richer, so the processing load of network traffic analysis becomes enormous. Traditional techniques cannot cope with data processing at this scale, and the system architecture, which determines the ability to expand capacity, becomes a bottleneck as well.
A search of the prior art found the following related results.
Related search result 1:
Application number: 200810171806.1
Title: Network traffic analysis method based on application-layer service analysis
This patent document discloses a method applied to network traffic analysis: Internet services are tracked and analyzed per session, and the traffic, quality of service (QoS), session state information and so on of each session are extracted to form a service session statistics database. Based on statistical theory, the document addresses the situation in which network application services develop rapidly and are updated frequently. By fully analyzing application-layer service data, it avoids the drawback that traditional Cisco NetFlow-based techniques can only analyze information at and below layer 4 of the TCP/IP protocol stack, and it avoids the information distortion caused by NetFlow's sampling-based statistics. It is of significance for network traffic analysis, QoS measurement and abnormal traffic identification. That invention is based on application-layer service detection and traffic statistics measurement; its statistics are accurate, and it facilitates network maintenance, security localization and service quality control in complex network environments.
Comparison of technical points:
That patent document detects indexes such as the traffic, abnormal traffic and QoS of network applications, but hardly addresses monitoring at the level of specific services. It therefore monitors only the indexes common to general network applications; its monitoring capability for specific service ports that need focused monitoring is insufficient, and the abnormal indexes of specific services cannot be presented comprehensively. Its anomaly alarms are also fairly simple, without highlighted or complete presentation.
The present invention provides network monitoring at the service-port level. In addition to monitoring network traffic, it monitors indexes on specific service ports such as zero window, TCP retransmission, application delay, client delay and network delay. It also provides multi-dimensional alarm views, so that network and service anomalies can be fully understood in a timely manner, giving a timely, reliable and accurate reference for locating and resolving anomalies.
Related search result 2:
Application number: 201310749557.0
Title: Distributed network traffic analysis system and method
This patent document discloses a distributed network traffic analysis system and method. It uses distributed computing technology to build a network traffic analysis system that can analyze large-scale network traffic data. The system comprises a Web server, a traffic analysis system cluster and a file server. The system first collects traffic information in the network through a traffic collection module, then extracts the network-layer, transport-layer and application-layer information from the raw traffic information, and then analyzes that information, mainly the overall traffic, IP-to-IP traffic data, IP-layer network data packets and application-layer protocol information, to provide enterprise and institutional users with a convenient, fast and secure network service.
Comparison of technical points:
Although that patent document adopts a distributed architecture, it suffers from a single point of failure and from limited system scalability. Its statistics and presentation of network traffic data are also clearly deficient: they are too limited and lack analysis of specific network indexes such as retransmission and delay.
The present invention adopts a cluster architecture, which has clear advantages for system scalability and robustness. Networks develop very quickly and traffic grows geometrically, so a cluster architecture combined with big data can address the need to keep expanding system capacity. The analyzed indexes are also more targeted, for example zero window, TCP retransmission and delay.
In summary, the prior art clearly has shortcomings. With the emergence and continued maturing of big data technology, its advantages can be applied to such systems to improve them substantially.
Summary of the invention
In view of the defects in the prior art, the object of the present invention is to provide a network data packet analysis method and system in big data stream technology.
The network data packet analysis method in big data stream technology provided by the present invention comprises a retransmission index analysis step and/or a delay index analysis step.
The retransmission index analysis step comprises the following steps:
Step A: obtain the feature string of each TCP packet, specifically:
Combine the sequence number seq and acknowledgment number ack in the TCP packet header with the source IP and destination IP, in string form, into a feature string, where the source IP is the sender's IP in the TCP/IP protocol and the destination IP is the receiver's IP in the TCP/IP protocol;
Step B: count the identical feature strings, and take the number of identical feature strings as the number of TCP retransmitted packets.
The delay index analysis step comprises the following steps:
Step 1: slice the data stream at time intervals of T;
Step 2: through the sliding window operation of the Spark Streaming real-time computing framework, convert the continuous data obtained by slicing into a DStream data set of time T, which serves as the data for one computation;
Step 3: apply a map transformation to the DStream data set of time T to obtain a packet set map, then apply a groupByKey operation to the packet set map to produce a packet set map with unique keys; here key denotes a key in the packet set map;
Step 4: from the packet set map with unique keys, extract the entries whose packet count is greater than 2 to form a new packet set map;
Step 5: traverse the new packet set map and compute the time intervals between the values in it, namely:
$T_a = T_2 - T_1$
$T_b = T_3 - T_2$
$T_c = T_3 - T_1$
where $T_a$ is the application delay, $T_b$ is the client delay, $T_c$ is the network delay, $T_1$ is the timestamp of the first handshake packet, $T_2$ is the timestamp of the second handshake packet, and $T_3$ is the timestamp of the third handshake packet;
Step 6: for all $T_a$, $T_b$ and $T_c$ within time T, compute the average and the maximum of each as the network delay indexes.
Preferably, in the packet set map:
The key of the handshake packet with SYN=1, ACK=0 sent by the client is computed as:
key = source IP + source port + destination IP + destination port + sequence number seq;
The key of the acknowledgment packet with SYN=1, ACK=1 returned by the server is computed as:
key = destination IP + destination port + source IP + source port + (acknowledgment number ack - 1);
The key of all other packets is computed as:
key = source IP + source port + destination IP + destination port + (sequence number seq - 1).
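The three key formulas above are chosen so that all three packets of one TCP handshake land on the same key. The following is a minimal Python sketch of that idea, not the patented implementation: the dictionary field names (`src_ip`, `ack_flag`, and so on), the `flow_key` function name and the `|` separator are assumptions made for illustration, and the sample addresses and numbers are invented.

```python
def flow_key(pkt):
    """Return the grouping key for one TCP packet (hypothetical dict layout)."""
    if pkt["syn"] == 1 and pkt["ack_flag"] == 0:
        # First handshake packet, client -> server: use seq unchanged.
        parts = (pkt["src_ip"], pkt["src_port"],
                 pkt["dst_ip"], pkt["dst_port"], pkt["seq"])
    elif pkt["syn"] == 1 and pkt["ack_flag"] == 1:
        # Second handshake packet, server -> client: swap the endpoints
        # and use ack - 1, which equals the client's initial seq.
        parts = (pkt["dst_ip"], pkt["dst_port"],
                 pkt["src_ip"], pkt["src_port"], pkt["ack"] - 1)
    else:
        # Any other packet: use seq - 1.
        parts = (pkt["src_ip"], pkt["src_port"],
                 pkt["dst_ip"], pkt["dst_port"], pkt["seq"] - 1)
    return "|".join(str(p) for p in parts)


# Invented sample handshake: client 10.0.0.1:50000, server 10.0.0.2:80,
# client initial seq x = 1000, server initial seq y = 5000.
client = {"src_ip": "10.0.0.1", "src_port": 50000,
          "dst_ip": "10.0.0.2", "dst_port": 80}
server = {"src_ip": "10.0.0.2", "src_port": 80,
          "dst_ip": "10.0.0.1", "dst_port": 50000}
handshake = [
    {**client, "syn": 1, "ack_flag": 0, "seq": 1000, "ack": 0},
    {**server, "syn": 1, "ack_flag": 1, "seq": 5000, "ack": 1001},
    {**client, "syn": 0, "ack_flag": 1, "seq": 1001, "ack": 5001},
]
keys = [flow_key(p) for p in handshake]
```

In this sample all three entries of `keys` are identical, which is what lets a later groupByKey collect the handshake packets of one connection under a single key.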
The network index calculation device for network data packets in big data stream technology provided by the present invention comprises a retransmission index analysis device and/or a delay index analysis device.
The retransmission index analysis device comprises the following devices:
Acquisition device: for obtaining the feature string of each TCP packet, specifically:
Combine the sequence number seq and acknowledgment number ack in the TCP packet header with the source IP and destination IP, in string form, into a feature string, where the source IP is the sender's IP in the TCP/IP protocol and the destination IP is the receiver's IP in the TCP/IP protocol;
Statistics device: for counting the identical feature strings and taking the number of identical feature strings as the number of TCP retransmitted packets.
The delay index analysis device comprises the following devices:
Slicing device: for slicing the data stream at time intervals of T;
First conversion device: for converting, through the sliding window operation of the Spark Streaming real-time computing framework, the continuous data obtained by slicing into a DStream data set of time T;
Second conversion device: for applying a map transformation to the DStream data set of time T to obtain a packet set map, and then applying a groupByKey operation to the packet set map to produce a packet set map with unique keys; here key denotes a key in the packet set map;
Extraction device: for extracting, from the packet set map with unique keys, the entries whose packet count is greater than 2 to form a new packet set map;
Calculation device: for traversing the new packet set map and computing the time intervals between the values in it, namely:
$T_a = T_2 - T_1$
$T_b = T_3 - T_2$
$T_c = T_3 - T_1$
where $T_a$ is the application delay, $T_b$ is the client delay, $T_c$ is the network delay, $T_1$ is the timestamp of the first handshake packet, $T_2$ is the timestamp of the second handshake packet, and $T_3$ is the timestamp of the third handshake packet;
Processing device: for computing, over all $T_a$, $T_b$ and $T_c$ within time T, the average and the maximum of each as the network delay indexes.
Preferably, in the packet set map:
The key of the handshake packet with SYN=1, ACK=0 sent by the client is computed as:
key = source IP + source port + destination IP + destination port + sequence number seq;
The key of the acknowledgment packet with SYN=1, ACK=1 returned by the server is computed as:
key = destination IP + destination port + source IP + source port + (acknowledgment number ack - 1);
The key of all other packets is computed as:
key = source IP + source port + destination IP + destination port + (sequence number seq - 1).
The network data packet analysis system in big data stream technology provided by the present invention comprises the following devices:
A custom network analysis index module, for custom configuration of the network indexes to be analyzed;
A network index calculation module, for sending received network data packets into the real-time computing framework Spark Streaming so as to analyze and calculate network indexes according to the feature identifiers of the network data packets;
The network index calculation module comprises the network index calculation device for network data packets in big data stream technology described above.
Preferably, the system further comprises any one or more of the following devices:
An alarm module, for raising alarms on and storing the data in the network indexes that exceed the baseline threshold, and for producing alarm records;
A service path module, for automatically discovering the network topology from the IP addresses that have access relationships across the whole network;
A storage module, for storing the calculated network indexes in the database of a distributed storage system;
An aggregate query module, for querying the network indexes.
Compared with the prior art, the present invention has the following beneficial effects:
1. Because the present invention is combined with the big data framework Spark, the scalability and maintainability of the system are greatly enhanced. As network structures grow more complex and network traffic keeps increasing over time, system capacity has to be expanded; the scalability of Spark serves this need well, and the advantages of Spark clusters are used effectively.
2. The present invention monitors at the granularity of the server port, so it can monitor the condition of specific applications and not only the network level.
3. The present invention generates the alarm baseline threshold automatically, so the baseline can be adjusted dynamically over time and the monitored indexes are more intelligent.
4. The present invention uses big data storage such as HBase, whose storage space can be extended without limit and increased dynamically.
Brief description of the drawings
Other features, objects and advantages of the present invention will become more apparent from the detailed description of non-limiting embodiments given with reference to the following drawings:
Fig. 1 is a schematic diagram of the system structure.
Fig. 2 is a flow chart of the delay index analysis step.
Fig. 3 is a flow chart of the retransmission index analysis step.
Detailed description of embodiments
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to understand the invention further but do not limit it in any form. It should be noted that those skilled in the art can make various changes and improvements without departing from the inventive concept, all of which fall within the scope of protection of the present invention.
The network data packet analysis system in big data stream technology provided by the present invention comprises the following devices:
A custom network analysis index module, for custom configuration of the network indexes to be analyzed. Specifically, the custom network analysis index module allows network analysis indexes to be configured flexibly and precisely: through a graphical interface, graphical elements can be placed anywhere in the configuration view, with support for drag-and-drop, move and stretch operations, which makes it possible for a view to present the network indexes of each service node in a targeted way.
A network index calculation module, for sending received network data packets into the real-time computing framework Spark Streaming so as to analyze and calculate network indexes according to the feature identifiers of the network data packets. Specifically, the network index calculation module sends the received network data packets into Spark Streaming, and Spark analyzes and calculates the network indexes according to the feature identifiers of the packets. The indexes include traffic (total, inbound, outbound), packet count (total, inbound, outbound), zero window, TCP retransmission, application delay, client delay and network delay.
Here, traffic (total, inbound, outbound) is obtained by directly counting the TCP packets per second; packet count (total, inbound, outbound) is obtained by directly counting TCP packets; and zero window is obtained by directly checking the zero-window flag in the TCP packet header.
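As a rough illustration of the directly counted indexes, the sketch below tallies packet counts and zero-window packets in plain Python. It rests on several assumptions that the source does not state: packets are dictionaries with hypothetical `src_ip`, `dst_ip` and `window` fields, "inbound" and "outbound" are taken relative to one monitored server address, and a zero window is detected as an advertised TCP window size of 0.

```python
def basic_metrics(packets, server_ip):
    """Count packets by direction and count zero-window packets.

    Direction is defined relative to server_ip (an assumption of this sketch).
    """
    return {
        "total": len(packets),
        "inbound": sum(1 for p in packets if p["dst_ip"] == server_ip),
        "outbound": sum(1 for p in packets if p["src_ip"] == server_ip),
        # Assumed reading of "zero window": advertised window size equals 0.
        "zero_window": sum(1 for p in packets if p["window"] == 0),
    }


# Invented sample capture for one monitored server.
sample = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "window": 1024},
    {"src_ip": "10.0.0.2", "dst_ip": "10.0.0.1", "window": 0},
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "window": 0},
    {"src_ip": "10.0.0.2", "dst_ip": "10.0.0.1", "window": 512},
]
result = basic_metrics(sample, server_ip="10.0.0.2")
```

Running the same function over the packets of each one-second slice would give per-second counts.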
The network index calculation module comprises the network index calculation device for network data packets in big data stream technology described above.
Preferably, the system further comprises any one or more of the following devices:
An alarm module, for raising alarms on and storing the data in the network indexes that exceed the baseline threshold, and for producing alarm records. Specifically, for data in the network indexes that exceed the baseline threshold, the system raises an alarm and produces an alarm record; the raw data behind the alarm is stored in association with it so that the raw data can be queried. Alarm information is presented from different dimensions in two ways: one is an all-index alarm view for the most recent 30 minutes, and the other is an alarm view covering all 24 hours of a day. With these two views, users can quickly grasp the network condition, see the condition of a whole day at a glance, and locate problems quickly.
A service path module, for automatically discovering the network topology from the IP addresses that have access relationships across the whole network. Specifically, users can learn the current and recent access path relationships from the service path, and can manually save node position information so as to highlight the access paths of interest more clearly according to their needs.
A storage module, for storing the calculated network indexes in the database of a distributed storage system. Specifically, the storage module stores the calculated network indexes in the database of the big data distributed storage system HBase, using the high reliability, high performance and scalability of HBase to build a large-scale structured storage cluster on inexpensive PC servers.
An aggregate query module, for querying the network indexes. Specifically, the aggregate query module performs aggregate queries on the traffic distribution of each client accessing a server, including a TOP ranking of client traffic distribution. It also summarizes alarm information, giving users an overall view of the alarms within a given time period.
The network index calculation device for network data packets in big data stream technology comprises a retransmission index analysis device and/or a delay index analysis device.
The retransmission index analysis device comprises the following devices:
Acquisition device: for obtaining the feature string of each TCP packet, specifically:
Combine the sequence number seq and acknowledgment number ack in the TCP packet header with the source IP and destination IP, in string form, into a feature string, where the source IP is the sender's IP in the TCP/IP protocol and the destination IP is the receiver's IP in the TCP/IP protocol;
Statistics device: for counting the identical feature strings and taking the number of identical feature strings as the number of TCP retransmitted packets.
The delay index analysis device comprises the following devices:
Slicing device: for slicing the data stream at time intervals of T;
First conversion device: for converting, through the sliding window operation of the Spark Streaming real-time computing framework, the continuous data obtained by slicing into a DStream data set of time T;
Second conversion device: for applying a map transformation to the DStream data set of time T to obtain a packet set map, and then applying a groupByKey operation to the packet set map to produce a packet set map with unique keys; here key denotes a key in the packet set map;
Extraction device: for extracting, from the packet set map with unique keys, the entries whose packet count is greater than 2 to form a new packet set map;
Calculation device: for traversing the new packet set map and computing the time intervals between the values in it, namely:
$T_a = T_2 - T_1$
$T_b = T_3 - T_2$
$T_c = T_3 - T_1$
where $T_a$ is the application delay, $T_b$ is the client delay, $T_c$ is the network delay, $T_1$ is the timestamp of the first handshake packet, $T_2$ is the timestamp of the second handshake packet, and $T_3$ is the timestamp of the third handshake packet;
Processing device: for computing, over all $T_a$, $T_b$ and $T_c$ within time T, the average and the maximum of each as the network delay indexes.
Preferably, in the packet set map:
The key of the handshake packet with SYN=1, ACK=0 sent by the client is computed as:
key = source IP + source port + destination IP + destination port + sequence number seq;
The key of the acknowledgment packet with SYN=1, ACK=1 returned by the server is computed as:
key = destination IP + destination port + source IP + source port + (acknowledgment number ack - 1);
The key of all other packets is computed as:
key = source IP + source port + destination IP + destination port + (sequence number seq - 1).
The analysis system and the network index calculation device described above can be implemented through the step flow of the network data packet analysis method in big data stream technology provided by the present invention. Those skilled in the art may understand the analysis method as one embodiment of the analysis system and of the network index calculation device.
Specifically, the network data packet analysis method in big data stream technology provided by the present invention comprises a retransmission index analysis step and/or a delay index analysis step.
The retransmission index analysis step comprises the following steps:
Step A: obtain the feature string of each TCP packet, specifically:
Combine the sequence number seq and acknowledgment number ack in the TCP packet header with the source IP and destination IP, in string form, into a feature string, where the source IP is the sender's IP in the TCP/IP protocol and the destination IP is the receiver's IP in the TCP/IP protocol;
Step B: count the identical feature strings, and take the number of identical feature strings as the number of TCP retransmitted packets.
The delay index analysis step comprises the following steps:
Step 1: slice the data stream at time intervals of T;
Step 2: through the sliding window operation of the Spark Streaming real-time computing framework, convert the continuous data obtained by slicing into a DStream data set of time T, which serves as the data for one computation;
Step 3: apply a map transformation to the DStream data set of time T to obtain a packet set map, then apply a groupByKey operation to the packet set map to produce a packet set map with unique keys; here key denotes a key in the packet set map;
Step 4: from the packet set map with unique keys, extract the entries whose packet count is greater than 2 to form a new packet set map;
Step 5: traverse the new packet set map and compute the time intervals between the values in it, namely:
$T_a = T_2 - T_1$
$T_b = T_3 - T_2$
$T_c = T_3 - T_1$
where $T_a$ is the application delay, $T_b$ is the client delay, $T_c$ is the network delay, $T_1$ is the timestamp of the first handshake packet, $T_2$ is the timestamp of the second handshake packet, and $T_3$ is the timestamp of the third handshake packet;
Step 6: for all $T_a$, $T_b$ and $T_c$ within time T, compute the average and the maximum of each as the network delay indexes.
Preferably, in the packet set map:
The key of the handshake packet with SYN=1, ACK=0 sent by the client is computed as:
key = source IP + source port + destination IP + destination port + sequence number seq;
The key of the acknowledgment packet with SYN=1, ACK=1 returned by the server is computed as:
key = destination IP + destination port + source IP + source port + (acknowledgment number ack - 1);
The key of all other packets is computed as:
key = source IP + source port + destination IP + destination port + (sequence number seq - 1).
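Steps 3 to 6 of the delay index analysis can be sketched without a Spark cluster. The plain-Python version below stands in for groupByKey with a dictionary; it is an illustrative sketch, not the Spark job itself. The function name `delay_metrics`, the input shape (pairs of key and timestamp for one window of time T) and the choice to treat the three earliest timestamps in a group as the handshake packets are all assumptions; timestamps in the sample are invented and given in milliseconds.

```python
from collections import defaultdict


def delay_metrics(keyed_timestamps):
    """Compute average and maximum delays for one window of (key, timestamp) pairs."""
    # Step 3: group timestamps by key (a stand-in for Spark's groupByKey).
    groups = defaultdict(list)
    for key, ts in keyed_timestamps:
        groups[key].append(ts)

    samples = {"application": [], "client": [], "network": []}
    for stamps in groups.values():
        # Step 4: keep only groups with more than 2 packets.
        if len(stamps) <= 2:
            continue
        # Assumption: the three earliest packets are the handshake packets.
        t1, t2, t3 = sorted(stamps)[:3]
        # Step 5: T_a = T_2 - T_1, T_b = T_3 - T_2, T_c = T_3 - T_1.
        samples["application"].append(t2 - t1)
        samples["client"].append(t3 - t2)
        samples["network"].append(t3 - t1)

    # Step 6: average and maximum of each delay within the window.
    return {name: {"avg": sum(v) / len(v), "max": max(v)}
            for name, v in samples.items() if v}


# Invented sample window: two complete handshakes ("a", "b"), one lone packet ("c").
records = [("a", 0), ("a", 30), ("a", 50),
           ("b", 100), ("b", 140), ("b", 150),
           ("c", 5)]
metrics = delay_metrics(records)
```

For this sample the application delays are 30 and 40, the client delays 20 and 10, and the network delays 50 and 50, so `metrics` holds their averages and maxima; key "c" is dropped by the step 4 filter.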
The network data packet analysis method in big data stream technology is described in more detail below.
Retransmission index analysis is computed mainly from the sequence number seq and acknowledgment number ack in the TCP packet header together with the source IP and destination IP. The retransmission index analysis step comprises the following steps:
Step A: obtain the feature string of each TCP packet, specifically: combine the sequence number seq and acknowledgment number ack in the TCP packet header with the source IP and destination IP, in string form, into a feature string, i.e. a string of the form "sequence number seq + acknowledgment number ack + source IP + destination IP". Here seq is the sequence number, ack is the acknowledgment number, the source IP is the sender's IP in the TCP/IP protocol, and the destination IP is the receiver's IP in the TCP/IP protocol;
Step B: count the identical feature strings, and take the number of identical feature strings as the number of TCP retransmitted packets.
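Steps A and B can be sketched in a few lines of Python. This is one possible reading rather than the definitive method: the text does not say whether the first occurrence of a feature string counts as a retransmission, so this sketch counts only the repeats. The field names and the `+` separator inside the string are illustrative assumptions, and the sample packets are invented.

```python
from collections import Counter


def feature_string(pkt):
    """Step A: build "seq + ack + source IP + destination IP" as one string."""
    return f'{pkt["seq"]}+{pkt["ack"]}+{pkt["src_ip"]}+{pkt["dst_ip"]}'


def count_retransmissions(packets):
    """Step B: count repeated feature strings.

    Assumption: the first occurrence is the original transmission,
    so each feature string seen n > 1 times contributes n - 1.
    """
    counts = Counter(feature_string(p) for p in packets)
    return sum(n - 1 for n in counts.values() if n > 1)


# Invented sample: the packet with seq=2000 appears three times.
base = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "ack": 7000}
capture = [
    {**base, "seq": 1000},
    {**base, "seq": 2000},
    {**base, "seq": 2000},
    {**base, "seq": 2000},
    {**base, "seq": 3000},
]
retransmitted = count_retransmissions(capture)
```

To count every occurrence of a duplicated string instead, replace `n - 1` with `n`.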
Delay index analysis is computed mainly from the timestamps of the TCP three-way handshake. The TCP three-way handshake proceeds as follows:
First handshake: the client initiates a TCP connection request to the server by sending a handshake packet flagged SYN=1, ACK=0, with sequence number seq=x.
Here SYN is the synchronization flag used when a TCP/IP connection is established, and SYN=1 means the SYN flag is set; ACK is the acknowledgment flag, and ACK=0 means the acknowledgment flag is not set; x is a positive integer, and seq is the sequence number.
Second handshake: after receiving the handshake packet flagged SYN=1, ACK=0 from the client, the server replies with an acknowledgment packet flagged SYN=1, ACK=1, with sequence number seq=y and acknowledgment number ack=x+1.
Here ACK=1 means the acknowledgment flag is set; y is a positive integer, and ack is the acknowledgment number of the packet.
Third handshake: after receiving the server's acknowledgment packet with acknowledgment number ack=x+1, the client returns to the server a final acknowledgment packet flagged ACK=1, with acknowledgment number ack=y+1.
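The numbering rules of the three handshake packets can be checked mechanically. The sketch below is a hypothetical helper, not part of the described system: `is_three_way_handshake` and the dictionary field names are assumptions, and sequence number wraparound is ignored for simplicity.

```python
def is_three_way_handshake(first, second, third):
    """Check the flag and numbering rules of the three handshake packets."""
    x = first["seq"]   # client's initial sequence number
    y = second["seq"]  # server's initial sequence number
    return (
        # First handshake: SYN=1, ACK=0, seq=x.
        first["syn"] == 1 and first["ack_flag"] == 0
        # Second handshake: SYN=1, ACK=1, seq=y, ack=x+1.
        and second["syn"] == 1 and second["ack_flag"] == 1
        and second["ack"] == x + 1
        # Third handshake: ACK=1, ack=y+1.
        and third["ack_flag"] == 1 and third["ack"] == y + 1
    )


# Invented sample with x = 1000 and y = 5000.
syn = {"syn": 1, "ack_flag": 0, "seq": 1000, "ack": 0}
syn_ack = {"syn": 1, "ack_flag": 1, "seq": 5000, "ack": 1001}
final_ack = {"syn": 0, "ack_flag": 1, "seq": 1001, "ack": 5001}
bad_ack = {"syn": 0, "ack_flag": 1, "seq": 1001, "ack": 5000}

ok = is_three_way_handshake(syn, syn_ack, final_ack)
not_ok = is_three_way_handshake(syn, syn_ack, bad_ack)
```

Here `bad_ack` fails the check because its acknowledgment number is y rather than y+1.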
Based on the steps of the three-way handshake and on packet identifiers and rules such as SYN, ACK, seq, and ack, the delay metric analysis step (Spark algorithm) comprises the following steps:
Step 1: slice the data stream by time interval (for example, time T may be one minute); that is, specify the time-interval parameter (for example, 1 second) when initializing the StreamingContext. Here, StreamingContext denotes the streaming context type in Spark;
Step 2: using the sliding-window operation of the Spark Streaming real-time computing framework, convert the continuous data obtained by slicing (for example, one minute of continuous data) into a DStream data set of time T, which serves as the data for one computation. Taking T as one minute, the Window operation uses a sliding-window width of 60 seconds and a window step of 1 second. Here, DStream is a data type in Spark, and a DStream data set is a collection of DStream data;
Step 3: apply a map transformation to the one-minute DStream data set. For a packet flagged SYN=1, ACK=0 sent by the client, key = source IP + source port + destination IP + destination port + seq; for the acknowledgment packet flagged SYN=1, ACK=1 returned by the server, key = destination IP + destination port + source IP + source port + (ack-1); for all other packets, key = source IP + source port + destination IP + destination port + (seq-1). Then perform a groupByKey operation on the map to produce a packet-set map with unique keys. Here, map refers to the packet set, the map transformation is an operation in Spark, key denotes the key of the key/value data in the map, the source IP is the sender's IP in the TCP/IP protocol, the destination IP is the receiver's IP in the TCP/IP protocol, and groupByKey is an operation function in Spark;
Step 4: from the packet-set map obtained in Step 3, select the entries whose packet count is greater than 2, that is, the packets that make up a three-way handshake, to form a new map;
Step 5: traverse the new map obtained in Step 4 and calculate the time intervals between the values in the new map (i.e., the three packets), that is:
T_a = T_2 - T_1
T_b = T_3 - T_2
T_c = T_3 - T_1
where T_a denotes the application delay, T_b denotes the client delay, T_c denotes the network delay, T_1 denotes the timestamp of the first handshake packet, T_2 denotes the timestamp of the second handshake packet, and T_3 denotes the timestamp of the third handshake packet;
Step 6: calculate the mean and the maximum of all T_a, T_b, and T_c values (application delay, client delay, network delay) within one minute; these are the network delay metrics.
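Steps 3 to 6 can be sketched in plain Python without Spark. In the sketch, a dictionary of lists stands in for groupByKey, timestamps are integer milliseconds, and the field names and the helper `handshake` are hypothetical. Taking the three earliest packets of each group as the handshake is also an assumption, since the text does not say how groups with more than three packets are handled.

```python
from collections import defaultdict


def handshake_key(p):
    """Key rule of Step 3: all three handshake packets of a connection share one key."""
    fwd = (p["src_ip"], p["src_port"], p["dst_ip"], p["dst_port"])
    rev = (p["dst_ip"], p["dst_port"], p["src_ip"], p["src_port"])
    if p["SYN"] == 1 and p["ACK"] == 0:   # first handshake, client to server
        return fwd + (p["seq"],)
    if p["SYN"] == 1 and p["ACK"] == 1:   # second handshake, server to client
        return rev + (p["ack"] - 1,)
    return fwd + (p["seq"] - 1,)          # all other packets


def delay_metrics(packets):
    """Group by key, keep groups with more than 2 packets, return mean and max delays."""
    groups = defaultdict(list)            # plain-Python stand-in for groupByKey
    for p in packets:
        groups[handshake_key(p)].append(p)
    delays = {"application": [], "client": [], "network": []}
    for group in groups.values():
        if len(group) <= 2:               # Step 4: more than 2 packets required
            continue
        # Assumption: the three earliest packets in the group are the handshake.
        t1, t2, t3 = sorted(p["ts"] for p in group)[:3]
        delays["application"].append(t2 - t1)   # T_a = T_2 - T_1
        delays["client"].append(t3 - t2)        # T_b = T_3 - T_2
        delays["network"].append(t3 - t1)       # T_c = T_3 - T_1
    return {name: {"mean": sum(v) / len(v), "max": max(v)}
            for name, v in delays.items() if v}


def handshake(client, cport, x, y, t1, t2, t3, server="10.0.0.9", sport=80):
    """Hypothetical helper that builds the three packets of one handshake."""
    c2s = {"src_ip": client, "src_port": cport, "dst_ip": server, "dst_port": sport}
    s2c = {"src_ip": server, "src_port": sport, "dst_ip": client, "dst_port": cport}
    return [
        dict(c2s, SYN=1, ACK=0, seq=x, ack=0, ts=t1),
        dict(s2c, SYN=1, ACK=1, seq=y, ack=x + 1, ts=t2),
        dict(c2s, SYN=0, ACK=1, seq=x + 1, ack=y + 1, ts=t3),
    ]


packets = (handshake("10.0.0.1", 40000, 100, 500, 1000, 1020, 1050)
           + handshake("10.0.0.2", 40001, 7, 900, 2000, 2010, 2070))
metrics = delay_metrics(packets)
print(metrics)
```

The key rule works because the server's reply carries ack = x+1 and the client's final acknowledgment carries seq = x+1, so subtracting 1 recovers the client's initial sequence number x in both cases.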
The present invention is described in more detail below.
In one embodiment of the system for analyzing network data packets using big data streaming technology, the overall system architecture is shown in Fig. 1. SPAN is configured on a port of switch 100, and probe 20 is connected to that port. Probe 20 uses a high-speed NIC and is used only to capture network data packets. Here, SPAN denotes the Switched Port Analyzer feature of the switch;
Collector 30 filters the network data packets as required according to the configured parameters and feeds them into the central processing module (which comprises packet handling module 40 and SOCKET interface 50).
Packet handling module 40 processes packets in three threads: one thread stores packets in the database, one thread pushes the network data packets onto the send queue, and one thread performs simple flow processing.
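The three-thread arrangement of the packet handling module might look like the following sketch. Everything concrete in it is an assumption made for illustration: that each packet is handed to all three threads, that storage is a list, and that "simple flow processing" means summing bytes per flow. The text does not specify any of these details.

```python
import queue
import threading

STOP = object()  # sentinel that tells a worker thread to exit


def worker(inbox, handle):
    """Take packets from inbox and pass each to handle until STOP arrives."""
    while True:
        pkt = inbox.get()
        if pkt is STOP:
            break
        handle(pkt)


def run_module(packets):
    """Fan every packet out to three worker threads and return what each produced."""
    stored = []                  # stand-in for the database
    send_queue = queue.Queue()   # queue read by the sending side
    flow_bytes = {}              # assumed form of "simple flow processing"

    def count_flow(pkt):
        flow_bytes[pkt["flow"]] = flow_bytes.get(pkt["flow"], 0) + pkt["size"]

    handlers = [stored.append, send_queue.put, count_flow]
    inboxes = [queue.Queue() for _ in handlers]
    threads = [threading.Thread(target=worker, args=(q, h))
               for q, h in zip(inboxes, handlers)]
    for t in threads:
        t.start()
    for pkt in packets:
        for q in inboxes:        # each thread receives every packet
            q.put(pkt)
    for q in inboxes:
        q.put(STOP)
    for t in threads:
        t.join()
    return stored, send_queue.qsize(), flow_bytes


packets = [
    {"flow": "a", "size": 100},
    {"flow": "b", "size": 40},
    {"flow": "a", "size": 50},
]
stored, queued, flow_bytes = run_module(packets)
print(len(stored), queued, flow_bytes)
```

Giving each thread its own inbox keeps the three tasks independent, so a slow database write does not hold up the send queue.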
SOCKET interface 50 sends the packets to the Spark Streaming instance connected to this interface. The interface listens on a configured port; each probe has its own distinct port, so listening is done per probe port.
Storage module 70 uses the big data HBase cluster database to store the raw packets and the network metrics.
SOCKET interface 50 uses a SOCKET (the TCP/IP protocol call interface) to send the raw packet headers to Spark Streaming. Spark Streaming receives the packet information continuously as a stream, and the packets are fed into Spark computing module 60. This module processes the received packets in sliding-window mode with a window width of 1 minute: every minute it calculates the network metric values of all packets within that minute, and once calculated, the values are stored directly in HBase. Not every minute necessarily has a network metric value, so the metric values are discontinuous, discrete data.
More specifically, the delay algorithm executed by Spark computing module 60 is shown in Fig. 2:
Step S201: initialize the StreamingContext with the Streaming context time parameter set to 1 second, then obtain the continuous DStream data stream from the specified IP and port. Here, Streaming denotes stream data processing;
Step S202: the window slides with a time width of 60 seconds and a sliding step of 1 second, and the flatMap operation of the DStream converts the continuous data stream into Record records. Here, flatMap is an operation function in Spark, and Record denotes a record set in Spark;
Step S203: after the one-minute window operation, perform a K/V map transformation; when generating the key, the Record records must be processed according to the TCP three-way handshake rules. Here, K/V denotes key/value pair data;
Step S204: when a packet sent by the client is flagged SYN=1, ACK=0, it is the first SYN packet of the TCP three-way handshake, and key = source IP + source port + destination IP + destination port + seq. Here, the SYN packet is the handshake packet used when establishing a TCP/IP connection;
Step S205: after receiving the client's SYN packet, the server sends an acknowledgment packet flagged SYN=1, ACK=1, for which key = destination IP + destination port + source IP + source port + (ack-1); the keys of these two packets, the SYN packet and its acknowledgment, are therefore equal.
Step S206: for all other packets, key = source IP + source port + destination IP + destination port + (seq-1); in this way, a packet that belongs to the three-way handshake yields the same key.
Step S207: perform a groupByKey operation on the K/V map to produce a new map with unique keys, whose values are packet sets.
Step S208: select from the new map the records whose packet set contains more than 2 packets; these are the packet sets of three-way handshakes.
Step S209: traverse the map and calculate the time intervals between the three packets, where:
T_{application delay} = T_{second handshake packet} - T_{first handshake packet}
T_{client delay} = T_{third handshake packet} - T_{second handshake packet}
T_{network delay} = T_{third handshake packet} - T_{first handshake packet}
Step S210: calculate the mean and the maximum of all metric values, i.e., the delay times, within one minute, and finally store the network metric values in the database.
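The windowing in steps S201 and S202 (1-second batches, a 60-second window, a 1-second slide) can be emulated in plain Python to show which records each computation sees. This is only an emulation under the assumption that one list element represents one 1-second batch; it does not call Spark, and the function name `sliding_windows` is hypothetical.

```python
from collections import deque


def sliding_windows(batches, width=60, step=1):
    """Yield the records of the most recent `width` batches, once every `step` batches."""
    window = deque(maxlen=width)   # the oldest batch drops out automatically
    for i, batch in enumerate(batches, start=1):
        window.append(batch)
        if i % step == 0:
            # Flatten the batches in the window into one list of records.
            yield [record for b in window for record in b]


# Four 1-second batches, shown with a 3-batch window so the output stays short.
batches = [["p1"], ["p2"], ["p3"], ["p4"]]
for records in sliding_windows(batches, width=3, step=1):
    print(records)
```

With a slide of 1 second and a width of 60 seconds, consecutive windows overlap by 59 seconds, so the same handshake is visible to many consecutive computations.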
Therefore, the advantages of the present invention are as follows: 1) The current challenge in analyzing network data is that the data volume keeps growing as services expand and the number of users increases, so the load quickly reaches or exceeds the capacity the system was originally designed for. By combining with big data technology and exploiting the advantages of a big data cluster, processing nodes can be added seamlessly, which greatly improves system reliability and scalability. 2) Compared with the general-purpose network traffic analysis products currently on the market, the present invention, through the definition of specific clients and servers and the deployment of distributed probes, achieves finer monitoring granularity and is better suited to monitoring the network metrics of a specific service. It can monitor the server response delay or the client delay of a service, locate the problem point more quickly, and save troubleshooting time. 3) Through statistical analysis of historical data, alarm baseline thresholds for the same period in history are generated automatically, which gives the present invention a machine learning capability, avoids the complexity of manual configuration as far as possible, and makes the system more convenient to use.
Those skilled in the art know that, in addition to implementing the system provided by the invention and its devices purely as computer-readable program code, the method steps can be logically programmed so that the system and its devices implement the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. The system provided by the invention and its devices can therefore be regarded as a hardware component, and the devices included in it for implementing various functions can also be regarded as structures within the hardware component; the devices for implementing various functions can likewise be regarded both as software modules implementing the method and as structures within the hardware component.
Specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to the particular embodiments described; those skilled in the art may make various changes or modifications within the scope of the claims without affecting the substance of the invention. Where no conflict arises, the embodiments of the application and the features in the embodiments may be combined with one another arbitrarily.

Claims (6)

1. A method for analyzing network data packets using big data streaming technology, characterized in that it comprises a retransmission metric analysis step and/or a delay metric analysis step;
The retransmission metric analysis step comprises the following steps:
Step A: obtain the feature string of each TCP packet, specifically:
The sequence number seq, the acknowledgment number ack, the source IP, and the destination IP in the TCP header are concatenated as strings into a feature string, wherein the source IP is the sender's IP in the TCP/IP protocol and the destination IP is the receiver's IP in the TCP/IP protocol;
Step B: count the number of identical feature strings and take that count as the number of TCP retransmitted packets;
The delay metric analysis step comprises the following steps:
Step 1: slice the data stream at time intervals of time T;
Step 2: using the sliding-window operation of the Spark Streaming real-time computing framework, convert the continuous data obtained by slicing into a DStream data set of time T, which serves as the data for one computation;
Step 3: apply a map transformation to the DStream data set of time T to obtain a packet-set map, then perform a groupByKey operation on the packet-set map to produce a packet-set map with unique key values; wherein key denotes the key in the packet-set map;
Step 4: extract, from the packet-set map with unique key values, the packet sets whose packet count is greater than 2 to form a new packet-set map;
Step 5: traverse the new packet-set map and calculate the time intervals between the values in the new packet-set map, that is:
T_a = T_2 - T_1
T_b = T_3 - T_2
T_c = T_3 - T_1
wherein T_a denotes the application delay, T_b denotes the client delay, T_c denotes the network delay, T_1 denotes the timestamp of the first handshake packet, T_2 denotes the timestamp of the second handshake packet, and T_3 denotes the timestamp of the third handshake packet;
Step 6: calculate the mean and the maximum of all T_a, T_b, and T_c within time T, respectively, as the network delay metrics.
2. The method for analyzing network data packets using big data streaming technology according to claim 1, characterized in that, in the packet-set map:
The key of the handshake packet flagged SYN=1, ACK=0 sent by the client is calculated as:
key = source IP + source port + destination IP + destination port + sequence number seq;
The key of the acknowledgment packet flagged SYN=1, ACK=1 returned by the server is calculated as:
key = destination IP + destination port + source IP + source port + (acknowledgment number ack - 1);
The key of all other packets is calculated as:
key = source IP + source port + destination IP + destination port + (sequence number seq - 1).
3. A network metric calculation device for analyzing network data packets using big data streaming technology, characterized in that it comprises a retransmission metric analysis device and/or a delay metric analysis device;
The retransmission metric analysis device comprises the following devices:
An acquisition device, for obtaining the feature string of each TCP packet, specifically:
The sequence number seq, the acknowledgment number ack, the source IP, and the destination IP in the TCP header are concatenated as strings into a feature string, wherein the source IP is the sender's IP in the TCP/IP protocol and the destination IP is the receiver's IP in the TCP/IP protocol;
A counting device, for counting the number of identical feature strings and taking that count as the number of TCP retransmitted packets;
The delay metric analysis device comprises the following devices:
A slicing device, for slicing the data stream at time intervals of time T;
A first conversion device, for converting the continuous data obtained by slicing into a DStream data set of time T using the sliding-window operation of the Spark Streaming real-time computing framework;
A second conversion device, for applying a map transformation to the DStream data set of time T to obtain a packet-set map, then performing a groupByKey operation on the packet-set map to produce a packet-set map with unique key values; wherein key denotes the key in the packet-set map;
An extraction device, for extracting, from the packet-set map with unique key values, the packet sets whose packet count is greater than 2 to form a new packet-set map;
A calculation device, for traversing the new packet-set map and calculating the time intervals between the values in the new packet-set map, that is:
T_a = T_2 - T_1
T_b = T_3 - T_2
T_c = T_3 - T_1
wherein T_a denotes the application delay, T_b denotes the client delay, T_c denotes the network delay, T_1 denotes the timestamp of the first handshake packet, T_2 denotes the timestamp of the second handshake packet, and T_3 denotes the timestamp of the third handshake packet;
A processing device, for calculating the mean and the maximum of all T_a, T_b, and T_c within time T, respectively, as the network delay metrics.
4. The network metric calculation device for analyzing network data packets using big data streaming technology according to claim 3, characterized in that, in the packet-set map:
The key of the handshake packet flagged SYN=1, ACK=0 sent by the client is calculated as:
key = source IP + source port + destination IP + destination port + sequence number seq;
The key of the acknowledgment packet flagged SYN=1, ACK=1 returned by the server is calculated as:
key = destination IP + destination port + source IP + source port + (acknowledgment number ack - 1);
The key of all other packets is calculated as:
key = source IP + source port + destination IP + destination port + (sequence number seq - 1).
5. A system for analyzing network data packets using big data streaming technology, characterized in that it comprises the following devices:
A custom network analysis metric module, for custom configuration of the network metrics to be analyzed;
A network metric computing module, for sending the received network data packets into the real-time computing framework Spark Streaming so as to analyze and calculate the network metrics according to the feature identifiers of the network data packets;
The network metric computing module comprises the network metric calculation device for analyzing network data packets using big data streaming technology according to claim 3.
6. The system for analyzing network data packets using big data streaming technology according to claim 5, characterized in that it further comprises any one or more of the following devices:
An alarm module, for raising alarms on and storing the data in the network metrics that exceed the baseline thresholds, and generating alarm records;
A service path module, for automatically discovering the network topology from the IP addresses that have access relationships across the whole network;
A storage module, for storing the calculated network metrics in the database of a distributed storage system;
An aggregate query module, for querying the network metrics.
CN201510703275.6A 2015-10-26 2015-10-26 The analysis method and system of network packet are realized with big data streaming technology Active CN105376110B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510703275.6A CN105376110B (en) 2015-10-26 2015-10-26 The analysis method and system of network packet are realized with big data streaming technology

Publications (2)

Publication Number Publication Date
CN105376110A true CN105376110A (en) 2016-03-02
CN105376110B CN105376110B (en) 2018-10-30

Family

ID=55377937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510703275.6A Active CN105376110B (en) 2015-10-26 2015-10-26 The analysis method and system of network packet are realized with big data streaming technology

Country Status (1)

Country Link
CN (1) CN105376110B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1964288A (en) * 2005-11-11 2007-05-16 大唐移动通信设备有限公司 An analysis method and system for transmission rate of packet service data
US20090046717A1 (en) * 2007-08-15 2009-02-19 Qing Li Methods to improve transmission control protocol (tcp) performance over large bandwidth long delay links
US20120195287A1 (en) * 2011-01-28 2012-08-02 Industry-Academic Cooperation Foundation, Yonsei University Communication method using duplicated acknowledgement
CN103096356A (en) * 2013-01-21 2013-05-08 北京拓明科技有限公司 Wireless network performance analysis method
CN103957118A (en) * 2014-04-18 2014-07-30 国家电网公司 Real-time intelligent analysis method for network flow of electric power data communication network and system thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHENMENG11: "TCP重传" (TCP retransmission), 《BLOG.CHINAUNIX.NET/UID-15014334-ID-3451855.HTML》 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106648934A (en) * 2016-12-27 2017-05-10 中科天玑数据科技股份有限公司 Method and system for high-efficiency data transmission between Impala and HBase
CN106648934B (en) * 2016-12-27 2019-12-03 中国科学院计算技术研究所 A kind of efficient data transfer method and system between Impala and HBase
CN107766413A (en) * 2017-09-05 2018-03-06 珠海宇能云企科技有限公司 A kind of implementation method of real-time stream aggregate query
CN107766413B (en) * 2017-09-05 2023-07-07 珠海宇能云企科技有限公司 Method for realizing real-time data stream aggregation query
CN108289125A (en) * 2018-01-26 2018-07-17 华南理工大学 TCP sessions recombination based on Stream Processing and statistical data extracting method
CN108289125B (en) * 2018-01-26 2021-05-28 华南理工大学 TCP session recombination and statistical data extraction method based on stream processing
CN108400992A (en) * 2018-03-06 2018-08-14 电信科学技术第五研究所有限公司 A kind of streaming traffic data protocol analysis software frame realization system and method
CN108400992B (en) * 2018-03-06 2020-05-26 电信科学技术第五研究所有限公司 System and method for realizing streaming communication data protocol analysis software framework
CN108389134A (en) * 2018-03-20 2018-08-10 张家林 The monitoring system and method for Portfolio Selection
CN108989152A (en) * 2018-08-08 2018-12-11 成都俊云科技有限公司 Obtain the method and device and computer storage medium of network delay
CN109446200A (en) * 2018-10-30 2019-03-08 中国银联股份有限公司 A kind of method and device of data processing
CN109446200B (en) * 2018-10-30 2021-04-16 中国银联股份有限公司 Data processing method and device
CN109347701A (en) * 2018-11-09 2019-02-15 公安部第三研究所 Realize the system and method that Network Isolation properties of product are carried out with testing and control
CN109617734A (en) * 2018-12-25 2019-04-12 北京市天元网络技术股份有限公司 Network operation capability analysis method and device
CN112994965A (en) * 2019-12-13 2021-06-18 北京金山云网络技术有限公司 Network anomaly detection method and device and server
CN112994965B (en) * 2019-12-13 2022-09-02 北京金山云网络技术有限公司 Network anomaly detection method and device and server
CN112311815A (en) * 2020-12-31 2021-02-02 博智安全科技股份有限公司 Monitoring, auditing and anti-cheating method and system under training competition
CN112291280A (en) * 2020-12-31 2021-01-29 博智安全科技股份有限公司 Network flow monitoring and auditing method and system

Also Published As

Publication number Publication date
CN105376110B (en) 2018-10-30

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant