CN103647790B

CN103647790B - Extra-large file protocol analytical and statistical method

Info

Publication number: CN103647790B
Application number: CN201310722859.9A
Authority: CN
Inventors: 李晓芳; 庄燕滨; 彭建华; 肖贤建
Original assignee: Changzhou Institute of Technology
Current assignee: Changzhou Dongji Road International Trade Co., Ltd.
Priority date: 2013-12-24
Filing date: 2013-12-24
Publication date: 2017-01-25
Anticipated expiration: 2033-12-24
Also published as: CN103647790A

Abstract

The invention discloses a protocol analysis and statistics method for extra-large files. The method comprises partitioning an extra-large file into numbered small files, analyzing the partitioned small files in parallel, writing each analysis result to an independent small file saved by serial number, and then merging these small result files into one complete analysis result file. Results are queried directly from the merged file. Through this parallel partition-and-analyze processing, a tool can analyze and compile statistics on extra-large protocol data files quickly and efficiently.

Description

A protocol analysis and statistics method for extra-large files

Technical field

The present invention relates to a protocol analysis and statistics method, and in particular to a protocol analysis and statistics method for extra-large files.

Background technology

In data analysis systems built for telecom operators, the data handled is essentially mass data from the communication network. In a random network this data is real-time and its content is not fixed, so once such a system goes wrong, locating, analyzing and solving the problem is difficult. In that case it is usually necessary to capture network data packets and to analyze the captured packet files in order to locate the problem and find its cause.

Current network analysis tools mainly include Sniffer, NetXRay and Wireshark. In real commercial systems, because Wireshark is an open-source project, the fault analysis staff of the system vendor generally capture packets with Wireshark when the system fails, save the data as files, and then analyze those files to locate and solve the problem. To obtain enough network data, such a file is often larger than 4 GB. Neither Sniffer nor Wireshark can open a file of this size, so these tools cannot analyze such large data files, which makes the problem hard to analyze and solve.

Summary of the invention

In view of the problems in the prior art, the object of the invention is to provide a protocol analysis and statistics tool for extra-large files that requires no manual operation.

To achieve this object, the invention adopts the following technical solution: a protocol analysis and statistics method for extra-large files, comprising the following steps:

1) Open multiple extra-large-file cutting modules. The number of cutting modules is configured according to the number of CPU cores of the computer; by default, 3 cutting modules are opened. Using the formula

m = \left\lfloor \frac{totalsize}{sepsize} \right\rfloor

n = \begin{cases} m + 1, & m \cdot sepsize < totalsize \\ m, & m \cdot sepsize = totalsize \end{cases}

obtain the size of the extra-large file and cut it in parallel: based on the computed protocol file size, the extra-large file is cut into small files whose default size is 50 MB. Then proceed to step 2).

2) Open a list and build the file cutting index: each list element records the start and end positions, within the large file, of the data of one small file. Then proceed to step 3).

3) The cutting modules take elements from the list in order. Before an element is taken, the list is locked; after the element has been taken, the list is unlocked, and the taken element is removed from the list at the same time. Each module opens the extra-large data file read-only, seeks to the file position specified by the element, reads data sequentially from that position and writes it to a file named according to the order of the list elements, until the end position recorded for that element in step 2) is reached. Then proceed to step 4).

4) Create a list whose elements record the sequence number of a split file and the incomplete data packet in that split file. The file analysis module analyzes the data. If a packet is found to be complete, go to step 6). Otherwise, if the incomplete packet is the first packet of the file, search the list; if it is not the first packet of the file, go to step 5). If the file preceding this file's sequence number is found in the list, update the element found by appending the incomplete packet to the end of that element's packet; otherwise create a new element that records the current incomplete packet and its file sequence number, and add this element to the list. Then proceed to step 6).

5) If the incomplete packet is the last packet of the file, search the list. If the file following this file's sequence number is found in the list, update the element found by adding the incomplete packet to the beginning of that element's packet; otherwise create a new element that records the current incomplete packet and its file sequence number, and add this element to the list. Then proceed to step 6).

6) When the packet obtained by the analysis module is complete, analyze the packet and write the analysis result to the analysis result file according to the file sequence number. Then proceed to step 7).

7) Establish the mapping between each small file and the temporary result file that stores its results, analyze the cut small files synchronously, and put each analysis result into the temporary result file of the corresponding small file. Then proceed to step 8).

8) Merge the temporary result files into one unified result file. Then proceed to step 9).

9) Merge-process the result file to obtain the final analysis result of the large file.
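Steps 1) to 3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`build_cut_index`, `cut_worker`, `cut_file`), the `.part` file naming and the use of threads as "cutting modules" are all hypothetical, and the element is removed from the list while the lock is still held so that two workers can never take the same element.

```python
import os
import threading

def build_cut_index(total_size, sep_size=50 * 1024 * 1024):
    """Step 2: one (sequence number, start, end) entry per small file; end is exclusive."""
    m = total_size // sep_size                        # "rounded" read as rounding down
    n = m + 1 if m * sep_size < total_size else m     # formula from step 1
    return [(i, i * sep_size, min((i + 1) * sep_size, total_size)) for i in range(n)]

def cut_worker(src_path, out_dir, index, lock, buf_size=1 << 20):
    """Step 3: repeatedly take one index element and copy its byte range to a numbered file."""
    while True:
        with lock:                                    # lock the list, take one element, unlock
            if not index:
                return
            seq, start, end = index.pop(0)
        out_path = os.path.join(out_dir, f"{seq:06d}.part")   # hypothetical naming scheme
        with open(src_path, "rb") as src, open(out_path, "wb") as dst:
            src.seek(start)                           # move to the position given by the element
            remaining = end - start
            while remaining > 0:
                data = src.read(min(buf_size, remaining))
                if not data:
                    break
                dst.write(data)
                remaining -= len(data)

def cut_file(src_path, out_dir, sep_size=50 * 1024 * 1024, workers=3):
    """Step 1: start the cutting workers (3 by default) and return the number of small files."""
    os.makedirs(out_dir, exist_ok=True)
    index = build_cut_index(os.path.getsize(src_path), sep_size)
    n = len(index)
    lock = threading.Lock()
    threads = [threading.Thread(target=cut_worker, args=(src_path, out_dir, index, lock))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return n
```

Because every worker opens the source file read-only and writes only its own numbered output file, the list is the only shared state that needs the lock.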

With the above technical solution, the invention has the following beneficial effect: large data files can be analyzed without manual operation.
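The handling of packets that straddle a cut boundary, described in steps 4) and 5), can be sketched as follows. This is a simplified illustration rather than the method exactly as claimed: instead of keying list elements by file sequence number, the hypothetical `BoundaryStitcher` class keys each pending fragment by the boundary it belongs to (the tail of file k and the head of file k+1 share boundary k), and it assumes that a packet spans at most two adjacent small files.

```python
import threading

class BoundaryStitcher:
    """Joins the incomplete last packet of file k with the incomplete first packet of file k+1."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}            # boundary number -> (kind, fragment bytes)

    def add_head(self, seq, fragment):
        """Incomplete FIRST packet of file `seq`; it completes the tail of file seq - 1."""
        return self._add(seq - 1, "head", fragment)

    def add_tail(self, seq, fragment):
        """Incomplete LAST packet of file `seq`; it is completed by the head of file seq + 1."""
        return self._add(seq, "tail", fragment)

    def pending_count(self):
        with self._lock:
            return len(self._pending)

    def _add(self, boundary, kind, fragment):
        with self._lock:
            other = self._pending.pop(boundary, None)
            if other is None:
                # The matching half has not been seen yet: record a new element.
                self._pending[boundary] = (kind, fragment)
                return None
            other_kind, other_fragment = other
            if other_kind == kind:
                raise ValueError(f"two {kind} fragments reported for boundary {boundary}")
            # The tail always comes first in the reassembled packet.
            if kind == "head":
                return other_fragment + fragment
            return fragment + other_fragment
```

A call returns `None` while one half is still missing and returns the reassembled packet bytes once both halves have arrived, in whichever order the parallel analysis modules report them; the complete packet can then be analyzed as in step 6).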

Brief description of the drawings

Fig. 1 is a flow chart of the present invention.

Specific embodiment

The present invention is further explained below with reference to the accompanying drawing and a specific embodiment.

As shown in Fig. 1, a protocol analysis and statistics method for extra-large files comprises the following steps:

Formula

m = \left\lfloor \frac{totalsize}{sepsize} \right\rfloor

n = \begin{cases} m + 1, & m \cdot sepsize < totalsize \\ m, & m \cdot sepsize = totalsize \end{cases}
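As a worked example of this formula, assume that totalsize is the size of the extra-large file, sepsize is the size of each small file, and n is the resulting number of small files (an interpretation, since the text does not define the symbols), and read "rounded" as rounding down, which is what the case split on m * sepsize < totalsize requires. A 4096 MB capture cut into the default 50 MB pieces then gives

m = \left\lfloor \frac{4096}{50} \right\rfloor = \lfloor 81.92 \rfloor = 81, \qquad 81 \cdot 50 = 4050 < 4096 \;\Rightarrow\; n = m + 1 = 82

so 81 full 50 MB files are followed by one final file of 46 MB. A file of exactly 4000 MB instead gives

m = \left\lfloor \frac{4000}{50} \right\rfloor = 80, \qquad 80 \cdot 50 = 4000 = totalsize \;\Rightarrow\; n = m = 80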

9) Merge-process the result file: the system notifies the merge function to merge the analysis result files, combining the analysis result files, which are split by sequence number, into one complete result file and outputting it. The final analysis result of the large file is thereby obtained and the analysis statistics are output. When a query condition is entered, the statistics that satisfy the query condition are output.

This completes the protocol analysis and statistics tool for extra-large files.
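The merge and query described in steps 8) and 9) can be sketched as follows. The file naming (`<sequence number>.result`), the tab-separated `protocol<TAB>packet count` line format and the function names are hypothetical choices made for illustration; the patent does not specify the layout of the result files.

```python
import glob
import os
from collections import Counter

def merge_results(tmp_dir, merged_path):
    """Step 8: concatenate the per-file temporary results in sequence-number order."""
    parts = sorted(
        glob.glob(os.path.join(tmp_dir, "*.result")),
        key=lambda p: int(os.path.splitext(os.path.basename(p))[0]),  # numeric, not lexical
    )
    with open(merged_path, "w", encoding="utf-8") as out:
        for part in parts:
            with open(part, encoding="utf-8") as f:
                for line in f:
                    if line.strip():
                        out.write(line.rstrip("\n") + "\n")
    return len(parts)

def aggregate(merged_path):
    """Step 9: fold the merged file into one total per protocol."""
    totals = Counter()
    with open(merged_path, encoding="utf-8") as f:
        for line in f:
            protocol, count = line.rstrip("\n").split("\t")
            totals[protocol] += int(count)
    return totals

def query(totals, predicate):
    """Return only the statistics that satisfy the query condition."""
    return {protocol: count for protocol, count in totals.items() if predicate(protocol, count)}
```

For instance, `query(aggregate("merged.txt"), lambda protocol, count: protocol == "TCP")` would return only the TCP total. Sorting by the numeric sequence number keeps file 10 after file 2, which a plain string sort would not.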

Claims

1. A protocol analysis and statistics method for extra-large files, characterized in that the steps comprise:

1) opening multiple extra-large-file cutting modules, the number of cutting modules being configured according to the number of CPU cores of the computer, with 3 cutting modules opened by default; using the formula

m = \left\lfloor \frac{totalsize}{sepsize} \right\rfloor

n = \begin{cases} m + 1, & m \cdot sepsize < totalsize \\ m, & m \cdot sepsize = totalsize \end{cases}

obtaining the size of the extra-large file and cutting it in parallel, the extra-large file being cut, based on the computed protocol file size, into small files whose default size is 50 MB; then proceeding to step 2);

2) opening a list and building the file cutting index, each list element recording the start and end positions, within the large file, of the data of one small file; then proceeding to step 3);

3) the cutting modules taking elements from the list in order, the list being locked before an element is taken and unlocked after the element has been taken, the taken element being removed from the list at the same time; each module opening the extra-large data file read-only, seeking to the file position specified by the element, reading data sequentially from that position and writing it to a file named according to the order of the list elements, until the end position recorded for that element in step 2) is reached; then proceeding to step 4);

4) creating a list whose elements record the sequence number of a split file and the incomplete data packet in that split file; the file analysis module analyzing the data; if a packet is found to be complete, going to step 6); otherwise, if the incomplete packet is the first packet of the file, searching the list, and if the incomplete packet is not the first packet of the file, going to step 5); if the file preceding this file's sequence number is found in the list, updating the element found by appending the incomplete packet to the end of that element's packet, and otherwise creating a new element that records the current incomplete packet and its file sequence number and adding this element to the list; then proceeding to step 6);

5) if the incomplete packet is the last packet of the file, searching the list; if the file following this file's sequence number is found in the list, updating the element found by adding the incomplete packet to the beginning of that element's packet, and otherwise creating a new element that records the current incomplete packet and its file sequence number and adding this element to the list; then proceeding to step 6);

6) when the packet obtained by the analysis module is complete, analyzing the packet and writing the analysis result to the analysis result file according to the file sequence number; then proceeding to step 7);

7) establishing the mapping between each small file and the temporary result file that stores its results, analyzing the cut small files synchronously, and putting each analysis result into the temporary result file of the corresponding small file; then proceeding to step 8);