A network traffic and protocol message analysis platform based on massive data
Technical field
The present invention relates to the field of communications, and in particular to a network traffic and protocol message analysis platform based on massive data.
Background technology
In recent years, as the informatization of State Grid Corporation of China has continued to innovate and develop across the company's power generation and management, network applications of all kinds have emerged one after another, and informatization staff place ever greater demands on each application system. While these applications meet the business needs of power informatization, they also bring more and more problems.
Applications introduce more and more security vulnerabilities, which become a security risk for all personnel. Meanwhile, certain applications upload private personal information, causing personal information to leak.
Some applications consume network bandwidth without restraint and affect other office staff's use of the network: P2P traffic and video download and on-demand traffic hog bandwidth resources, so that key applications of other staff, such as MAIL and ERP, become congested or unusable. In particular, P2P protocols keep evolving and use dynamic negotiation, data packet encryption, and similar techniques at run time, which poses a sterner challenge for application identification.
As security protection pays increasing attention to the application layer, the adoption of deep packet inspection and deep flow inspection technology has become inevitable.
The company has so far built more than two hundred business systems, including collaborative office, mail, economic and legal affairs, a marketing system, and an online university, covering hundreds of thousands or even millions of users across the company. The operation and inspection center already controls some of the traffic on the live network, for example through port control and QoS control.
However, because P2P, Internet video, and individual application systems account for a large share of the traffic at State Grid Corporation of China's Internet egress, certain application systems have become slow to access or cannot be opened at all.
When the network is congested, web page access often suffers high latency (around 200 ms), and sending and receiving mail is slow, with large messages often taking three or more times as long as usual.
However, expanding the intranet and Internet egress bandwidth cannot fundamentally solve problems such as certain application systems being inaccessible; all traffic needs to be collected and analyzed. In addition, to ensure that subsequent network security issues can be traced, network traffic data must be stored periodically or in real time, which occupies a huge amount of storage space. Storage also requires a corresponding address information table, which likewise consumes system resources and makes addressing difficult.
Summary of the invention
In view of the deficiencies of the prior art, the present invention mainly addresses two problems: the collection and analysis of network traffic data, and the storage of that traffic data.
On this basis, the present invention proposes a network traffic and protocol message analysis platform based on massive data. The platform analyzes all network data traffic (for example, traffic between data centers, traffic between the provincial electric power companies and headquarters, and routine office traffic) and even the protocol messages of that traffic, which is significant for State Grid Corporation of China's future per-application-system statistics and traffic protocol classification.
Specifically, the present invention provides a network traffic and protocol message analysis platform based on massive data, characterized in that the platform includes a network traffic collector, a traffic data storage, and an ETL data extraction tool.
Preferably, the network traffic and protocol message analysis platform further includes a data processing device, and the data processing device includes a user profiling module, an application profiling module, a relationship analysis module, and a traffic forecasting module.
Preferably, the data processing device further includes a push module.
Preferably, the traffic data storage includes a virtualized network storage device.
Preferably, the virtualized network storage device includes a virtualization storage access device and multiple physical storage devices.
In another aspect, the present invention provides a method for processing and storing network traffic data based on massive data, characterized in that the method includes the following steps:
Step 1: collect network traffic data using the network traffic collector;
Step 2: cache the collected network traffic data;
Step 3: extract from the collected network traffic data using the ETL data extraction tool;
Step 4: classify the extracted network traffic data;
Step 5: compress the collected raw data and the network traffic data of each class;
Step 6: store the compressed network traffic data by class.
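The six steps above can be illustrated with a minimal sketch. It assumes a hypothetical record schema (`app`, `bytes`, `dst`), uses an in-memory dictionary in place of the real storage, and uses zlib as a stand-in compressor; none of these choices are specified by the invention.

```python
import json
import zlib
from collections import defaultdict


def extract(packet):
    # Step 3: keep only the fields needed for analysis (hypothetical schema).
    return {"app": packet["app"], "bytes": packet["bytes"]}


def pack(obj):
    # Step 5: serialize and compress (zlib is an assumed stand-in).
    return zlib.compress(json.dumps(obj).encode("utf-8"))


def process(packets):
    cache = list(packets)                      # Step 2: cache collected data
    extracted = [extract(p) for p in cache]    # Step 3: extraction
    classes = defaultdict(list)                # Step 4: classify by application
    for record in extracted:
        classes[record["app"]].append(record)
    store = {"raw": pack(cache)}               # Step 5: compress the raw data
    for name, records in classes.items():
        store["class/" + name] = pack(records) # Steps 5-6: compress, store by class
    return store


def load(store, key):
    # Inverse of pack(), used here only to inspect what was stored.
    return json.loads(zlib.decompress(store[key]).decode("utf-8"))


# Step 1 is represented by an already collected list of packets.
packets = [
    {"app": "mail", "bytes": 1200, "dst": "10.0.0.5"},
    {"app": "p2p", "bytes": 50000, "dst": "203.0.113.9"},
    {"app": "mail", "bytes": 800, "dst": "10.0.0.6"},
]
store = process(packets)
```

Keeping the raw data alongside the per-class data mirrors step 5, which compresses both; a real deployment would write each entry to the storage device rather than to a dictionary.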
The present invention enables the storage of massive intranet and Internet egress data and fast retrospective analysis of historical data. It frees network analysis from its previous limits and makes data mining, tracing and localization, and security forensics more accurate and efficient, thereby raising the level of network operation and maintenance (O&M).
(1) Comprehensive grasp of network state. Building the platform enables centralized monitoring and management of the network and a comprehensive view of all kinds of business-critical data, in particular the operating status of key services and key network links.
(2) Fast playback of network faults. With the platform, a fault can be quickly restored according to the time at which it occurred and the fault phenomenon reproduced. This turns passive response into active response, allows the cause of the fault to be analyzed, and avoids recurrence of the same fault.
Finally, by achieving O&M goals such as a comprehensive grasp of network state and fast playback of network faults, the present invention improves network O&M efficiency and the overall level of O&M.
Description of the drawings
Fig. 1 is an architecture diagram of the analysis platform of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawing and the embodiments, but the scope of protection of the present invention is not thereby limited to the scope described by the embodiments.
Embodiment 1
To address the problems of storing and analyzing the network traffic and protocol messages of State Grid's massive information intranet and Internet egress data, this embodiment provides a network traffic and protocol message analysis platform based on massive data, which processes and stores the data in the network.
The network traffic and protocol message analysis platform includes a network traffic collector, a traffic data storage, and an ETL data extraction tool, and the three are connected to one another. The user can choose among the following: (1) store the data collected by the network traffic collector directly and then extract from it with the ETL data extraction tool; (2) first perform ETL data extraction on the data collected by the network traffic collector and store the extracted data; or (3) store the data of both (1) and (2).
The network traffic collector collects the traffic at the Internet egress of the target network, determines the destination addresses and related applications involved in the egress traffic, and records each application's traffic volume, usage time, and detailed traffic information.
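As an illustration of the per-application statistics the collector gathers, the following sketch aggregates traffic volume, usage time, and destination addresses from flow records. The record fields (`app`, `dst`, `bytes`, `start`, `end`) are hypothetical; the invention does not define a record format.

```python
from collections import defaultdict


def summarize(flows):
    """Aggregate bytes, active seconds, and destinations per application."""
    stats = defaultdict(lambda: {"bytes": 0, "seconds": 0.0, "destinations": set()})
    for flow in flows:
        entry = stats[flow["app"]]
        entry["bytes"] += flow["bytes"]                  # traffic volume
        entry["seconds"] += flow["end"] - flow["start"]  # usage time
        entry["destinations"].add(flow["dst"])           # destination addresses
    return dict(stats)


# Hypothetical flow records observed at the Internet egress.
flows = [
    {"app": "mail", "dst": "10.0.0.5", "bytes": 1200, "start": 0.0, "end": 2.0},
    {"app": "mail", "dst": "10.0.0.6", "bytes": 800, "start": 5.0, "end": 6.0},
    {"app": "p2p", "dst": "203.0.113.9", "bytes": 50000, "start": 1.0, "end": 11.0},
]
summary = summarize(flows)
```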
The traffic data storage stores the data collected by the network traffic collector.
The ETL data extraction tool may be a data extraction tool such as DataStage or Informatica. Mainstream data extraction techniques and analysis methods are used to overcome the difficulties of the current system and, with a view to subsequent O&M optimization, to flexibly realize a large-volume storage architecture, real-time data handling, and multidimensional data analysis, while also performing protocol message analysis on the traffic in order to find high-value relationships.
The ETL data extraction tool can be used to extract data and to predict the traffic trend of a specified link, a specified application, or a specified class of applications for a set time window (such as the whole day or the daytime) within a specified period (such as the next five years).
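The text does not name a prediction method. As one simple possibility, assumed here purely for illustration, a least-squares line can be fitted to historical traffic for a link or application and extrapolated forward. The function names and sample values are hypothetical.

```python
def fit_line(ys):
    """Least-squares fit of y = intercept + slope * x over x = 0..n-1 (needs n >= 2)."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(ys, steps_ahead):
    """Extrapolate the fitted line steps_ahead periods past the last sample."""
    slope, intercept = fit_line(ys)
    return intercept + slope * (len(ys) - 1 + steps_ahead)


# Hypothetical yearly average daytime traffic (Gbit/s) on one link.
history = [10.0, 12.0, 14.0, 16.0]
five_years_out = forecast(history, 5)
```

A linear fit ignores seasonality and saturation, so a production forecasting module would likely use a richer model; the sketch only shows the shape of the input and output.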
In addition, the platform of the present invention may include a data display device. Using the system's visualization capability, the user can make selections on the interface (such as choosing the specified period and the set time window) and view the data extraction results as graphics.
Preferably, the platform may further include a data processing device, which uses the extracted data to build the business models required by the overall business scenario: application profiling and traffic forecasting.
In implementing the overall business scenario, the following analysis methods are used:
● Simple statistics: the traffic of a link in a set time window, the traffic of a specified application in a set time window, user visit counts, cross-domain visit counts, and so on;
● Data mining: link flow direction trends, the application distribution on a link, and the geographic distribution of an application's users;
● Intelligent analysis: application profiling, and traffic classification and forecasting.
Preferably, the data processing device may further include a message analysis device, and message analysis may be implemented with Wireshark. The message analysis device (module) can perform protocol message analysis on the massive data in intranet and Internet egress traffic, analyze TCP and UDP data streams in depth, analyze the transmission of data and the transaction processing of applications in depth, and present the data transmission process clearly. It can also focus on the interaction process of a protocol and, from that interaction, find the root cause of faults such as abnormal network access, so that network faults can be located and reproduced quickly.
Preferably, the platform may further include an intelligent traffic scheduling module, which is used for the following:
1) Network-wide traffic planning: using application profiles and traffic visualization, the current state of traffic across the whole network can be analyzed globally. By examining the flow direction of each link and the application classes on each link, an overall traffic plan for the whole network can be formulated. Traffic control rules are generated from real-time analysis data and prediction data. Among the analysis methods, the statistics class includes link traffic trends, link flow direction trends, and the application distribution on links; the intelligent analysis class includes network-wide traffic trend prediction and application classification.
2) Traffic policies are generated according to the plan. If the detected link traffic does not conform to the traffic plan, the users of that traffic are reminded by precise push. The traffic planning policy may include time-of-day bandwidth requirements for applications, per-region bandwidth requirements for applications, and link bandwidth threshold requirements.
3) Emergency traffic control: when congestion occurs in the network, or is predicted to occur, the user is notified or a traffic control policy is sent directly to the collector. Low-value applications can be controlled or diverted (CAR marking and forwarding through policy routing), while high-value applications receive a service policy guarantee (by configuring QoS policies).
4) The business models required by the overall service are: traffic trend analysis, application profiling, precise push, and intelligent traffic control.
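The compliance check in item 2) can be sketched as a comparison of observed link traffic against the planned link bandwidth thresholds. The link names, the Mbit/s unit, and the reminder text are assumptions made for illustration, not part of the invention.

```python
def find_violations(plan, observed):
    """Return the links whose observed traffic exceeds the planned threshold.

    Links that have no planned threshold are treated as unconstrained.
    """
    return sorted(
        link for link, mbps in observed.items()
        if mbps > plan.get(link, float("inf"))
    )


def build_reminders(plan, observed):
    """Compose one reminder per violating link, to be delivered by precise push."""
    return [
        f"{link}: {observed[link]} Mbit/s exceeds planned {plan[link]} Mbit/s"
        for link in find_violations(plan, observed)
    ]


# Hypothetical link bandwidth thresholds and measurements, in Mbit/s.
plan = {"hq-province-a": 800, "internet-egress": 1000}
observed = {"hq-province-a": 650, "internet-egress": 1200, "lab-link": 90}
reminders = build_reminders(plan, observed)
```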
Traffic control formulates different scheduling and management schemes according to the network requirements of different applications. For example, real-time applications that are sensitive to network delay are given higher bandwidth to guarantee transmission quality. For applications that are insensitive to network speed, the bandwidth they occupy is limited dynamically by time period or by bandwidth utilization, so that intranet bandwidth resources are used rationally while normal use of the applications is still ensured. Backup-type transfers and transfers outside working hours are scheduled into off-peak periods. Where local data resources are abundant, information about the relevant local resources is pushed to users to avoid remote queries or downloads that occupy WAN resources, and so on.
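A minimal sketch of how such a scheme might choose a policy per application follows. The application attributes, the 80% utilization threshold, and the 08:00 to 18:00 working hours are all illustrative assumptions; the text gives no concrete values.

```python
def choose_policy(app, link_utilization, hour):
    """Pick a control action for one application (all thresholds are assumed)."""
    if app["latency_sensitive"]:
        return "guarantee"            # real-time apps get guaranteed bandwidth
    if app["kind"] == "backup" and 8 <= hour < 18:
        return "defer_to_off_peak"    # backup traffic is moved out of working hours
    if link_utilization > 0.8:
        return "rate_limit"           # insensitive apps are limited when the link is busy
    return "allow"


voip = {"kind": "voip", "latency_sensitive": True}
backup = {"kind": "backup", "latency_sensitive": False}
video = {"kind": "video", "latency_sensitive": False}
```

For instance, `choose_policy(video, 0.9, 10)` returns `"rate_limit"`, while the same application on a lightly loaded link is allowed through unchanged.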
Preferably, the platform may further include an intelligent application assurance module, which is used for the following:
1) By analyzing an application system, the module analyzes the bandwidth used by the application system as a whole and the bandwidth used by the systems that support it, and issues QoE policies to the collector, so as to guarantee both the bandwidth with which the application system serves external users and the bandwidth used by its supporting systems.
2) The statistical methods used in the analysis cover the bandwidth usage of the application and the bandwidth of the supporting systems. The mining and intelligent analysis methods used are application relationship analysis and traffic trend prediction.
The overall architecture of the present invention is shown in Fig. 1. First, the network traffic collector collects real-time traffic data. After the real-time traffic data collection module stores the collected traffic data locally, the ETL tool of the big data platform completes data extraction and organization.
Embodiment 2
This embodiment focuses on the storage of network traffic data. The volume of network traffic data is huge, and providing dedicated storage devices for it alone would be costly. Therefore, in this embodiment, the network traffic data is packaged and stored, through a storage virtualization access device, in the physical storage of the analyzed target. The storage virtualization access device mentioned here may be a virtual storage controller or other equipment. When network traffic data needs to be stored, it is compressed into compressed packages of a predefined size and sent to the virtual storage controller, which controls the storage devices of the analyzed target. The virtual storage controller stores the compressed packages in the corresponding physical storage devices, and the multiple physical storage devices form distributed storage through the virtual storage controller.
When distributed storage is used to store data, the multiple storage devices must be organized into a storage pool through the virtual storage controller, and the storage resources in the pool are managed and allocated uniformly. Because the storage is distributed, managing the pool requires a huge address mapping table: an address mapping must be established for every segment of each user's data, so a large number of data addressing mapping tables are needed. This both occupies the resources of the virtual storage controller and lowers access efficiency.
To address the above problem in existing virtualized storage, this embodiment provides a new storage method that is particularly suitable for storing network traffic data.
The data storage method of this embodiment includes:
Step 1: store the network traffic data as single files in compressed-package form according to a predefined size;
Step 2: send the network traffic data compressed package to the virtualization storage access device;
Step 3: the virtualization storage access device receives the data, buffers it in a temporary cache, and determines the size of the compressed package. Based on the total data size obtained, it searches for blank sectors in the physical storage devices, allocates a corresponding target storage area, and obtains the address table of the target storage area.
The compressed package is then divided into several data units. When the target storage area is allocated for the compressed package, each storage block corresponds to one data unit, and a certain number of bytes is reserved in at least one block of each run of contiguous storage blocks (that is, the capacity of that storage block is slightly larger than the size of the corresponding data unit). A mapping table between the target stored data and the target storage blocks is then established.
Next, the virtualization storage access device packages each data unit of the received compressed package by storage block. During packaging, it determines whether the addresses of the storage blocks allocated to adjacent data units are contiguous. If the storage addresses allocated to adjacent data units are contiguous, the data unit is stored directly. If the address of the storage area allocated to a data unit is not contiguous with the address allocated to the preceding data unit, then, when the preceding data unit is packaged, the original data to be written is taken as the data body and association address information is appended to the tail (or head) of the data body. This association address information is stored in the reserved bytes.
For example, when data is stored, for the first data unit to be written, the device first determines whether the storage address allocated to the following second data unit is contiguous with the first unit's storage address. If it is, no processing is needed and the unit is stored directly. If it is not, the pre-allocated storage address information of the second data unit is appended to the end of the first data unit (alternatively, the address of the next non-contiguously stored data unit may be included). For the second data unit, the device determines whether the storage address allocated to the following third data unit is contiguous with the second unit's storage address. If it is, no processing is needed and the unit is stored directly; if it is not, the address information of the third data unit is appended to the end of the second data unit. The process continues in the same way, and the address information of the first data unit is appended to the end of the last data unit, forming a closed loop.
The packaged data is then stored in the previously allocated target storage area. Finally, the address mapping entries other than that of the first data unit are deleted in the virtualization storage access device: only the storage address of the first data unit and the information about the data as a whole are written to the address mapping table, and this information is updated into the address mapping table of the virtualization storage access device.
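The packaging rule in step 3 can be sketched as follows. This is an illustrative model, not the patented implementation: the physical storage is a dictionary keyed by integer block address, each stored block is a `(payload, link)` pair in which `link` plays the role of the association address held in the reserved bytes, and all names are hypothetical.

```python
def write_package(disk, free_addrs, data, unit_size):
    """Split data into units, store one unit per block, and link the blocks.

    A link is written only where the next block is not contiguous with the
    current one; the last unit links back to the first to close the loop.
    Returns the single mapping-table entry kept for the whole package.
    """
    units = [data[i:i + unit_size] for i in range(0, len(data), unit_size)]
    if not units or len(free_addrs) < len(units):
        raise ValueError("no data to store or not enough free blocks")
    addrs = free_addrs[:len(units)]            # pre-allocated target blocks
    for i, (addr, unit) in enumerate(zip(addrs, units)):
        if i == len(units) - 1:
            link = addrs[0]                    # last unit points to the first
        elif addrs[i + 1] != addr + 1:
            link = addrs[i + 1]                # next block is non-contiguous
        else:
            link = None                        # contiguous: no link needed
        disk[addr] = (unit, link)
    return {"first": addrs[0], "length": len(data)}


disk = {}
# Blocks 4-5 and 9-10 are free; there is a gap between 5 and 9.
entry = write_package(disk, [4, 5, 9, 10], b"abcdefgh", unit_size=2)
```

In this example only block 5 (whose successor is non-contiguous) and block 10 (the last unit) carry a link, and the mapping table holds one entry for the whole package instead of four.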
If the user needs to read network traffic data, the operation is the reverse of storing it.
The virtualization storage access device determines the first address of the data to be read and then determines whether the data to be read is a network traffic data compressed package. If it is not, the data is read normally. If it is, then to read the compressed package the device obtains from the address mapping table the address corresponding to the first data unit in the single large file, reads that unit into the cache, unpacks it, and returns the original first data unit to the user. It then determines whether the first data unit contains association address information. If it does, the next data unit is read according to the association address information added when the first data unit was packaged. If it does not, the second data unit is read from the next sequential address after the first data unit and returned to the user via the cloud server. The process continues in the same way until the last data unit has been read.
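The read procedure can be sketched in the same illustrative model, in which each stored block is a `(payload, link)` pair and `link` is the association address, or `None` when the next unit sits at the next sequential address. The stop condition used here, a link that points back to the first address, is an assumption derived from the closed loop described for storage; the block contents are hypothetical.

```python
def read_package(disk, first):
    """Follow the chain from the first address and reassemble the package."""
    parts = []
    addr = first
    while True:
        payload, link = disk[addr]     # unpack the unit
        parts.append(payload)
        if link == first:              # loop closed: this was the last unit
            break
        # Follow the association address if present, else read sequentially.
        addr = link if link is not None else addr + 1
    return b"".join(parts)


# Hypothetical stored blocks: 4 -> 5 (sequential), 5 -> 9 (linked),
# 9 -> 10 (sequential), 10 -> 4 (closing link).
disk = {
    4: (b"ab", None),
    5: (b"cd", 9),
    9: (b"ef", None),
    10: (b"gh", 4),
}
package = read_package(disk, first=4)
```

Only the first address comes from the address mapping table; every later address is recovered from the data itself.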
This embodiment greatly simplifies the address mapping table and makes the address mapping relationships clearer. As for address occupancy, a single 0/1 flag bit is enough to mark whether an address is occupied, so occupancy can be recorded entirely within the address mapping table or tracked in a separate address occupancy table.
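A one-bit-per-address occupancy record can be sketched as a bitmap. The class name and the block count below are hypothetical, and the sketch treats occupancy as a separate table rather than a field of the address mapping table.

```python
class OccupancyBitmap:
    """One 0/1 flag bit per block address, packed eight to a byte."""

    def __init__(self, n_blocks):
        self.bits = bytearray((n_blocks + 7) // 8)

    def mark(self, addr, used=True):
        byte, bit = divmod(addr, 8)
        if used:
            self.bits[byte] |= 1 << bit              # set the flag
        else:
            self.bits[byte] &= ~(1 << bit) & 0xFF    # clear the flag

    def is_used(self, addr):
        byte, bit = divmod(addr, 8)
        return bool((self.bits[byte] >> bit) & 1)

    def used_count(self):
        return sum(bin(b).count("1") for b in self.bits)


occupancy = OccupancyBitmap(16)
for addr in (4, 5, 9):
    occupancy.mark(addr)
```

At one bit per block, tracking a million blocks takes about 122 KiB, which is small next to a per-unit address mapping table.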
The foregoing describes only preferred embodiments of the present invention and does not limit the present invention in any form. Any simple modification, equivalent change, or variation made to the above embodiments in accordance with the technical essence of the present invention, within its spirit and principles, still falls within the scope of protection of the present invention.
Although the principles of the present invention have been described in detail above in conjunction with its preferred embodiments, those skilled in the art should understand that the above embodiments merely explain exemplary implementations of the present invention and do not limit its scope. The details in the embodiments do not limit the scope of the present invention. Without departing from the spirit and scope of the present invention, any obvious change based on the technical solution of the present invention, such as an equivalent transformation or simple replacement, falls within the scope of protection of the present invention.