CN103647790B - Extra-large file protocol analytical and statistical method - Google Patents


Info

Publication number
CN103647790B
CN103647790B CN201310722859.9A CN201310722859A CN103647790B CN 103647790 B CN103647790 B CN 103647790B CN 201310722859 A CN201310722859 A CN 201310722859A CN 103647790 B CN103647790 B CN 103647790B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310722859.9A
Other languages
Chinese (zh)
Other versions
CN103647790A (en)
Inventor
李晓芳 (Li Xiaofang)
庄燕滨 (Zhuang Yanbin)
彭建华 (Peng Jianhua)
肖贤建 (Xiao Xianjian)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou Dongji Road International Trade Co., Ltd.
Original Assignee
Changzhou Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-12-24
Filing date
2013-12-24
Publication date
2017-01-25
Application filed by Changzhou Institute of Technology
Priority to CN201310722859.9A
Publication of CN103647790A
Application granted
Publication of CN103647790B
Legal status: Active
Anticipated expiration


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an extra-large file protocol analysis and statistics method. The method comprises: partitioning an extra-large file into numbered small files; analyzing the partitioned small files in parallel; writing the analysis results to separate small files saved by sequence number; and then merging these result files into one complete analysis result file, from which results are queried directly. Through parallel partitioned analysis, the tool can perform fast and efficient analysis and statistics on extra-large protocol data files.

Description

Extra-large file protocol analysis and statistics method
Technical field
The present invention relates to a protocol analysis and statistics method, and in particular to a protocol analysis and statistics method for extra-large files.
Background
In operator-based data analysis systems, the data handled is essentially massive data carried over communication networks. In a live network this data is real-time and constantly changing, so when such a system fails, locating, analyzing, and solving the problem is difficult. Usually, network packets must be captured, and the captured packet files are then analyzed to locate the problem and find its cause.
The main network analysis tools today are Sniffer, NetXRay, and Wireshark. In real commercial systems, when a problem occurs, the system vendor's fault analysts generally capture packets with Wireshark, since it is open source, and save the data to files, which are then analyzed to locate and solve the problem. To collect enough network information, these data files are often larger than 4 GB, and neither Sniffer nor Wireshark can open files of this size. These tools therefore cannot analyze such large data files, which makes analyzing and solving the problem difficult.
Summary of the invention
To address these problems in the prior art, the object of the invention is to provide a protocol analysis and statistics tool for extra-large files that requires no manual operation.
To achieve the above object, the present invention adopts the following technical solution: an extra-large file protocol analysis and statistics method, comprising the following steps:
1) Start multiple cutting modules for the extra-large file. The number of cutting modules is configured according to the number of CPU cores of the computer; by default, 3 cutting modules are started. Obtain the size of the extra-large file and cut it in parallel: based on the protocol data file size, the file is cut into small files with a default size of 50 MB, where the number of small files n is given by
$$m = \left\lfloor \frac{\mathit{totalsize}}{\mathit{sepsize}} \right\rfloor, \qquad n = \begin{cases} m + 1, & m \cdot \mathit{sepsize} < \mathit{totalsize} \\ m, & m \cdot \mathit{sepsize} = \mathit{totalsize} \end{cases}$$
with totalsize the size of the extra-large file and sepsize the small-file size. Then go to step 2).
2) Create a list as the file cutting index. Each list element records the start and end positions, in the large file, of the data belonging to one small file. Then go to step 3).
3) The cutting modules take elements from the list in order. Before taking an element, a module locks the list; after taking it, the module removes that element from the list and unlocks the list. Each module opens the extra-large data file read-only, seeks to the file position specified by the element, reads data sequentially from that position, and writes it to a new file named according to the element's order in the list, until it reaches the end position recorded in step 2). Then go to step 4).
4) Create a list whose elements record the sequence number of a cut file and an incomplete packet from that file. A file analysis module parses the data. If a packet is complete, go to step 6). Otherwise, if the incomplete packet is the first packet of the file, search the list; if it is not the first packet, go to step 5). If the file preceding this file's sequence number is found in the list, update that element by appending this incomplete packet to the end of the element's packet data; otherwise, create a new element recording the current file sequence number and this incomplete packet, and add it to the list. Then go to step 6).
5) If the incomplete packet is the last packet of the file, search the list. If the file following this file's sequence number is found in the list, update that element by inserting this incomplete packet at the beginning of the element's packet data; otherwise, create a new element recording the current file sequence number and this incomplete packet, and add it to the list. Then go to step 6).
6) When the packet obtained by the analysis module is complete, analyze the packet and write the analysis result to the analysis result file according to the file sequence number. Then go to step 7).
7) Establish a mapping between each small file and the interim result file that stores its results, analyze the cut small files concurrently, and place each analysis result in the interim result file of the corresponding small file. Then go to step 8).
8) Merge the interim result files into a single result file. Then go to step 9).
9) Perform merge processing on the result file to obtain the final analysis result for the large file.
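Steps 1) to 3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented tool: threads stand in for the "cutting modules", and the chunk-file naming (`part_NNNNNN.bin`) and the 1 MiB copy buffer are hypothetical choices. `chunk_count` implements the formula for n (equivalently, the ceiling of totalsize / sepsize), `build_index` produces the index list of step 2), and each worker takes elements from a lock-protected list as step 3) describes.

```python
import os
import threading

DEFAULT_CHUNK_SIZE = 50 * 1024 * 1024  # 50 MB default small-file size (step 1)
DEFAULT_WORKERS = 3  # default number of cutting modules; could follow os.cpu_count()


def chunk_count(total_size: int, sep_size: int) -> int:
    """Step 1: m = floor(total / sep); n = m + 1 if a remainder is left, else m."""
    m = total_size // sep_size
    return m + 1 if m * sep_size < total_size else m


def build_index(total_size: int, sep_size: int) -> list[tuple[int, int, int]]:
    """Step 2: one (sequence number, start, end) element per small file; end is exclusive."""
    return [(seq, seq * sep_size, min((seq + 1) * sep_size, total_size))
            for seq in range(chunk_count(total_size, sep_size))]


def split_file(path: str, out_dir: str, sep_size: int = DEFAULT_CHUNK_SIZE,
               workers: int = DEFAULT_WORKERS) -> list[str]:
    """Step 3: workers take elements from a locked shared list and copy byte ranges."""
    index = build_index(os.path.getsize(path), sep_size)
    lock = threading.Lock()
    written: list[str] = []

    def worker() -> None:
        while True:
            with lock:  # lock the list, take and remove one element, then unlock
                if not index:
                    return
                seq, start, end = index.pop(0)
            # hypothetical naming scheme: zero-padded sequence number keeps order
            out_path = os.path.join(out_dir, f"part_{seq:06d}.bin")
            with open(path, "rb") as src, open(out_path, "wb") as dst:
                src.seek(start)  # move to the position recorded in the element
                remaining = end - start
                while remaining > 0:  # read sequentially up to the end position
                    buf = src.read(min(remaining, 1 << 20))
                    if not buf:
                        break
                    dst.write(buf)
                    remaining -= len(buf)
            with lock:
                written.append(out_path)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(written)
```

Each worker opens its own read-only handle, so concurrent seeks do not interfere, and sorting the zero-padded names restores sequence order. In CPython the copy loop is I/O-bound, so threads are a reasonable stand-in here.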
With the above technical solution, the invention has the following beneficial effect: large data files can be analyzed without manual operation.
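Steps 4) and 5) above handle the packets that a fixed-size cut splits across two small files, whose halves may reach different analysis modules in either order. The sketch below illustrates that bookkeeping in Python under loudly stated assumptions: packets are modeled as records ended by a hypothetical delimiter byte (`DELIM`) instead of real pcap records, no packet is longer than one small file, and `BoundaryStitcher` and `packets_in_chunk` are illustrative names. Pending fragments are keyed by (file sequence number, which end of the file they came from), so one file's leading and trailing fragments cannot collide, a detail the steps leave open.

```python
import threading
from typing import Dict, List, Optional, Tuple

DELIM = b"\n"  # hypothetical packet terminator standing in for a real protocol parser


class BoundaryStitcher:
    """Shared list of incomplete packets (steps 4 and 5), guarded by a lock.

    Keys are (file sequence number, "first" or "last"), i.e. which end of the
    small file the fragment came from. A matched element is completed and removed.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: Dict[Tuple[int, str], bytes] = {}

    def leading(self, seq: int, frag: bytes) -> Optional[bytes]:
        """Incomplete first packet of file `seq`: look for file seq - 1."""
        with self._lock:
            head = self._pending.pop((seq - 1, "last"), None)
            if head is None:
                self._pending[(seq, "first")] = frag  # new element
                return None
        return head + frag  # fragment goes at the end of the stored data

    def trailing(self, seq: int, frag: bytes) -> Optional[bytes]:
        """Incomplete last packet of file `seq`: look for file seq + 1."""
        with self._lock:
            tail = self._pending.pop((seq + 1, "first"), None)
            if tail is None:
                self._pending[(seq, "last")] = frag  # new element
                return None
        return frag + tail  # fragment goes at the beginning of the stored data


def packets_in_chunk(seq: int, data: bytes, last_seq: int,
                     st: BoundaryStitcher) -> List[bytes]:
    """Return the complete packets obtainable from small file `seq`."""
    pieces = data.split(DELIM)
    if len(pieces) == 1:
        if last_seq > 0:
            raise ValueError("no delimiter in chunk: packet longer than a chunk")
        return [data] if data else []
    out: List[bytes] = []
    first, middle, last = pieces[0], pieces[1:-1], pieces[-1]
    if seq == 0:
        out.append(first)  # nothing precedes the first file
    else:
        joined = st.leading(seq, first)
        if joined is not None:
            out.append(joined)
    out.extend(middle)  # packets wholly inside this file are complete
    if seq == last_seq:
        out.append(last)  # unterminated final packet, if any
    else:
        joined = st.trailing(seq, last)
        if joined is not None:
            out.append(joined)
    return [p for p in out if p]  # drop empty pieces produced by split()
```

Whichever half arrives first is parked; the second half completes the packet and removes the element, so the result does not depend on the order in which small files are analyzed.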
Brief description of the drawings
Fig. 1 is a flow chart of the present invention.
Detailed description of the embodiments
The present invention is further explained below with reference to the drawings and a specific embodiment.
As shown in Fig. 1, an extra-large file protocol analysis and statistics method comprises the following steps:
1) Start multiple cutting modules for the extra-large file. The number of cutting modules is configured according to the number of CPU cores of the computer; by default, 3 cutting modules are started. Obtain the size of the extra-large file and cut it in parallel: based on the protocol data file size, the file is cut into small files with a default size of 50 MB, where the number of small files n is given by
$$m = \left\lfloor \frac{\mathit{totalsize}}{\mathit{sepsize}} \right\rfloor, \qquad n = \begin{cases} m + 1, & m \cdot \mathit{sepsize} < \mathit{totalsize} \\ m, & m \cdot \mathit{sepsize} = \mathit{totalsize} \end{cases}$$
with totalsize the size of the extra-large file and sepsize the small-file size. Then go to step 2).
2) Create a list as the file cutting index. Each list element records the start and end positions, in the large file, of the data belonging to one small file. Then go to step 3).
3) The cutting modules take elements from the list in order. Before taking an element, a module locks the list; after taking it, the module removes that element from the list and unlocks the list. Each module opens the extra-large data file read-only, seeks to the file position specified by the element, reads data sequentially from that position, and writes it to a new file named according to the element's order in the list, until it reaches the end position recorded in step 2). Then go to step 4).
4) Create a list whose elements record the sequence number of a cut file and an incomplete packet from that file. A file analysis module parses the data. If a packet is complete, go to step 6). Otherwise, if the incomplete packet is the first packet of the file, search the list; if it is not the first packet, go to step 5). If the file preceding this file's sequence number is found in the list, update that element by appending this incomplete packet to the end of the element's packet data; otherwise, create a new element recording the current file sequence number and this incomplete packet, and add it to the list. Then go to step 6).
5) If the incomplete packet is the last packet of the file, search the list. If the file following this file's sequence number is found in the list, update that element by inserting this incomplete packet at the beginning of the element's packet data; otherwise, create a new element recording the current file sequence number and this incomplete packet, and add it to the list. Then go to step 6).
6) When the packet obtained by the analysis module is complete, analyze the packet and write the analysis result to the analysis result file according to the file sequence number. Then go to step 7).
7) Establish a mapping between each small file and the interim result file that stores its results, analyze the cut small files concurrently, and place each analysis result in the interim result file of the corresponding small file. Then go to step 8).
8) Merge the interim result files into a single result file. Then go to step 9).
9) Perform merge processing on the result file: the system invokes its merge function to combine the analysis result files, divided by sequence number, into one complete result file, which is output as the final analysis result for the large file. The analysis statistics are then output; when query conditions are entered, the statistics that satisfy those conditions are output.
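Steps 6) to 9), including the condition-based query of step 9), can be sketched as follows. The analysis itself is a stand-in: `analyze_packet` is a hypothetical function that takes a packet's first space-separated token as its protocol name, and the statistic is a per-protocol packet count. Interim results are JSON files named by sequence number (an assumption), merged in sequence order, and filtered by a caller-supplied condition.

```python
import json
import os
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def analyze_packet(packet: bytes) -> str:
    """Hypothetical analysis: the first space-separated token is the protocol name."""
    return packet.split(b" ", 1)[0].decode("ascii", "replace")


def analyze_chunk(seq: int, packets: Iterable[bytes], out_dir: str) -> str:
    """Steps 6-7: analyze one small file's packets into its own interim result file."""
    counts = Counter(analyze_packet(p) for p in packets)
    path = os.path.join(out_dir, f"result_{seq:06d}.json")  # name carries the sequence number
    with open(path, "w", encoding="utf-8") as f:
        json.dump(counts, f)
    return path


def merge_results(result_paths: Dict[int, str], merged_path: str) -> Counter:
    """Steps 8-9: merge interim files in sequence-number order into one result file."""
    total: Counter = Counter()
    for seq in sorted(result_paths):
        with open(result_paths[seq], encoding="utf-8") as f:
            total.update(json.load(f))  # Counter.update adds counts
    with open(merged_path, "w", encoding="utf-8") as f:
        json.dump(dict(sorted(total.items())), f, indent=2)
    return total


def query(merged_path: str, condition: Callable[[str, int], bool]) -> Dict[str, int]:
    """Output only the statistics that satisfy the entered query condition."""
    with open(merged_path, encoding="utf-8") as f:
        stats = json.load(f)
    return {k: v for k, v in stats.items() if condition(k, v)}


def run(chunks: Dict[int, List[bytes]], out_dir: str, workers: int = 3) -> str:
    """Analyze all small files concurrently, then merge; returns the merged file path."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {seq: pool.submit(analyze_chunk, seq, pkts, out_dir)
                   for seq, pkts in chunks.items()}
        paths = {seq: fut.result() for seq, fut in futures.items()}
    merged = os.path.join(out_dir, "merged.json")
    merge_results(paths, merged)
    return merged
```

For example, `query(run(chunks, out_dir), lambda proto, n: n >= 2)` returns only protocols seen at least twice. A sum of counts does not depend on merge order, but sequence-ordered merging matters once results are per-packet records. `ThreadPoolExecutor` keeps the sketch short; for CPU-bound parsing in CPython, a `ProcessPoolExecutor` would give real parallelism.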
This completes the protocol analysis and statistics tool for extra-large files.

Claims (1)

1. An extra-large file protocol analysis and statistics method, characterized in that it comprises the following steps:
1) Start multiple cutting modules for the extra-large file. The number of cutting modules is configured according to the number of CPU cores of the computer; by default, 3 cutting modules are started. Using the formula
$$m = \left\lfloor \frac{\mathit{totalsize}}{\mathit{sepsize}} \right\rfloor, \qquad n = \begin{cases} m + 1, & m \cdot \mathit{sepsize} < \mathit{totalsize} \\ m, & m \cdot \mathit{sepsize} = \mathit{totalsize} \end{cases}$$
obtain the size of the extra-large file and cut it in parallel: based on the protocol data file size, the extra-large file is cut into small files with a default size of 50 MB. Then go to step 2).
2) Create a list as the file cutting index. Each list element records the start and end positions, in the large file, of the data belonging to one small file. Then go to step 3).
3) The cutting modules take elements from the list in order. Before taking an element, a module locks the list; after taking it, the module removes that element from the list and unlocks the list. Each module opens the extra-large data file read-only, seeks to the file position specified by the element, reads data sequentially from that position, and writes it to a new file named according to the element's order in the list, until it reaches the end position recorded in step 2). Then go to step 4).
4) Create a list whose elements record the sequence number of a cut file and an incomplete packet from that file. A file analysis module parses the data. If a packet is complete, go to step 6). Otherwise, if the incomplete packet is the first packet of the file, search the list; if the incomplete packet is not the first packet of the file, go to step 5). If the file preceding this file's sequence number is found in the list, update that element by appending this incomplete packet to the end of the element's packet data; otherwise, create a new element recording the current file sequence number and this incomplete packet, and add it to the list. Then go to step 6).
5) If the incomplete packet is the last packet of the file, search the list. If the file following this file's sequence number is found in the list, update that element by inserting this incomplete packet at the beginning of the element's packet data; otherwise, create a new element recording the current file sequence number and this incomplete packet, and add it to the list. Then go to step 6).
6) When the packet obtained by the analysis module is complete, analyze the packet and write the analysis result to the analysis result file according to the file sequence number. Then go to step 7).
7) Establish a mapping between each small file and the interim result file that stores its results, analyze the cut small files concurrently, and place each analysis result in the interim result file of the corresponding small file. Then go to step 8).
8) Merge the interim result files into a single result file. Then go to step 9).
9) Perform merge processing on the result file to obtain the final analysis result for the large file.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310722859.9A CN103647790B (en) 2013-12-24 2013-12-24 Extra-large file protocol analytical and statistical method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310722859.9A CN103647790B (en) 2013-12-24 2013-12-24 Extra-large file protocol analytical and statistical method

Publications (2)

Publication Number Publication Date
CN103647790A (en) 2014-03-19
CN103647790B (en) 2017-01-25

Family

ID=50252946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310722859.9A Active CN103647790B (en) 2013-12-24 2013-12-24 Extra-large file protocol analytical and statistical method

Country Status (1)

Country Link
CN (1) CN103647790B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105389384B (en) * 2015-12-03 2019-03-26 Wonders Information Co., Ltd. (万达信息股份有限公司) A kind of medical treatment private data swap file generation method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101582064B (en) * 2008-05-15 2011-12-21 Alibaba Group Holding Ltd. (阿里巴巴集团控股有限公司) Method and system for processing enormous data
CN102193917B (en) * 2010-03-01 2014-03-26 China Mobile Communications Corporation (中国移动通信集团公司) Method and device for processing and querying data
CN102821164B (en) * 2012-08-31 2014-10-22 Hohai University (河海大学) Efficient parallel-distribution type data processing system
CN102833336A (en) * 2012-08-31 2012-12-19 Hohai University (河海大学) Data sub-packet processing method in separate distributed information acquisition and concurrent processing system

Also Published As

Publication number Publication date
CN103647790A (en) 2014-03-19

Similar Documents

Publication Publication Date Title
CN111770023B (en) Message duplicate removal method and device based on FPGA and FPGA chip
SG10201900339QA (en) Computing device and method for detecting malicious domain names in a network traffic
CN104135387B (en) A kind of network management data based on meta-model topology processes method for visually monitoring
CN103067218B (en) A kind of express network packet content analytical equipment
CN109446689A (en) DC converter station electrical secondary system drawing recognition methods and system
CN108509658A (en) A kind of analysis method and device of XML file
CN104506376A (en) Multichannel redundant CAN (Controller Area Network) bus test system with frame start sensitive synchronous trigger function
CN102915432A (en) Method and device for extracting vehicle-bone microcomputer image video data
CN107404486B (en) Method, device, terminal equipment and storage medium for analyzing Http data
CN106713351B (en) Secure communication method and device based on serial server
CN108132986B (en) Rapid processing method for test data of mass sensors of aircraft
CN113613287A (en) Automatic data acquisition system based on edge calculation
CN108664635A (en) Acquisition methods, device, equipment and the storage medium of statistics of database information
CN108255837A (en) A kind of SQL resolvers and method
CN111970151A (en) Flow fault positioning method and system for virtual and container network
CN116800586A (en) Method for diagnosing data communication faults of telecommunication network
CN103647790B (en) Extra-large file protocol analytical and statistical method
CN202815869U (en) Vehicle microcomputer image and video data extraction apparatus
CN107748755A (en) Synonym method for digging, device, equipment and computer-readable recording medium
CN110609982A (en) PDF file data analysis system and method
CN102323975A (en) Message correctness judging method of IEC61850-based model file
CN104572767B (en) A kind of method and system of website languages classification
CN103699482A (en) Method and device for testing reasonableness of controls
CN102147818A (en) Test file compression method
CN115765153A (en) Method and system for fusion monitoring of Internet of things and online monitoring data of primary electric power equipment

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20181129

Address after: 213017 Room 938, Tianning Science Promotion Center, 256 Zhulin North Road, Tianning District, Changzhou City, Jiangsu Province

Patentee after: Changzhou Dongji Road International Trade Co., Ltd.

Address before: 213022 No. 1 Wushan Road, Xinbei District, Changzhou, Jiangsu Province

Patentee before: Changzhou Institute of Technology

TR01 Transfer of patent right