CN103647790B - Extra-large file protocol analytical and statistical method - Google Patents
Extra-large file protocol analytical and statistical method
- Publication number
- CN103647790B
- Authority
- CN
- China
- Prior art keywords
- file
- list
- execution step
- deficiency
- afterwards
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an extra-large file protocol analytical and statistical method. The method comprises the steps of partitioning an extra-large file into numbered small files, analyzing the partitioned small files in parallel, writing the analysis results to independent small files saved according to serial number, and then merging those small result files into one whole analysis result file. Query results are read directly from the merged file. Through this parallel partition-and-analyze processing, a tool can analyze and compile statistics on extra-large protocol data files quickly and efficiently.
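The flow in the abstract (numbered partitions, parallel analysis, merge by serial number) can be sketched in a few lines. This is a minimal in-memory illustration, not the patented tool: the function names are hypothetical, and the "analysis" is a placeholder that counts byte values, chosen because it is unaffected by where the partition boundaries fall.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_numbered(data, chunk_size):
    """Partition `data` into (serial_number, chunk) pairs."""
    return [(serial, data[offset:offset + chunk_size])
            for serial, offset in enumerate(range(0, len(data), chunk_size))]


def analyze_chunk(item):
    """Placeholder analysis (an assumption): count each byte value."""
    serial, chunk = item
    return serial, Counter(chunk)


def analyze_in_parallel(data, chunk_size, workers=3):
    """Analyze all chunks concurrently, then merge in serial-number order."""
    chunks = split_numbered(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(analyze_chunk, chunks))
    merged = Counter()
    for _, counts in sorted(partial_results, key=lambda pair: pair[0]):
        merged.update(counts)
    return merged
```

Threads keep the sketch short; a CPU-bound analyzer would more likely run in separate processes.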
Description
Technical field
The present invention relates to a protocol analysis and statistics method, and in particular to a protocol analysis and statistics method for extra-large files.
Background art
In operator-oriented data analysis systems, the data handled is essentially mass data carried over communication networks. In such networks the data content is real-time and not fixed, so once one of these systems goes wrong, locating, analyzing, and solving the problem is fairly difficult. It is then usually necessary to capture network packets and to analyze the captured packet files in order to locate the problem and find its cause.
The main network analysis tools at present are sniffer, netxray, and wireshark. In real commercial systems, because wireshark is an open-source project, the fault analysts of the company that supplies the system generally capture packets with wireshark, save the data to files, and then analyze those files to locate and solve the problem. To obtain enough network information, such a data file is often larger than 4 GB. Neither sniffer nor wireshark can open a file of that size, so these tools cannot analyze such large data files, which makes analyzing and solving the problem difficult.
Summary of the invention
In view of the problems in the prior art, the object of the invention is to provide a protocol analysis and statistics tool for extra-large files that requires no manual operation.
To achieve the above object, the invention adopts the following technical solution: an extra-large file protocol analysis and statistics method, whose steps include:
1) Open multiple extra-large-file cutting modules. The number of cutting modules is configured according to the number of CPU cores of the computer; by default, 3 cutting modules are opened. By a formula (not reproduced in this text), the size of the extra-large file is obtained and the file is cut in parallel into small files of the agreed size, 50 MB by default; then go to step 2);
2) Create a list and build the file cutting index: each list element records the start and end positions, within the big file, of the data of one small file; then go to step 3);
3) The cutting modules take elements from the list in order. Before an element is taken, the list is locked; after the element is taken, the list is unlocked and the taken element is removed from the list. Each module opens the extra-large data file read-only, moves to the file position specified by the element, and reads data sequentially from that position, writing it to a file named after the element's order in the list, until the end position recorded for that element in step 2) is reached; then go to step 4);
4) Create a list whose elements record the sequence number of a split file and the incomplete packet in that file. The file analysis module parses the data. If a packet is found to be complete, go to step 6). Otherwise, if the incomplete packet is the first packet of the file, search the list; if it is not, go to step 5). If the file preceding this file's sequence number is found in the list, update that element by appending the incomplete packet to the end of the element's packet data; otherwise create a new element that records the current file sequence number and the incomplete packet, and add it to the list. Then go to step 6);
5) If the incomplete packet is the last packet of the file, search the list. If the file following this file's sequence number is found in the list, update that element by adding the incomplete packet at the beginning of the element's packet data; otherwise create a new element that records the current file sequence number and the incomplete packet, and add it to the list. Then go to step 6);
6) When the packet obtained by the analysis module is complete, analyze it and write the analysis result to the analysis result file according to the file sequence number; then go to step 7);
7) Establish the mapping between each small file and the interim result file that stores its results, analyze the cut small files in parallel, and put each analysis result into the interim result file of the corresponding small file; then go to step 8);
8) Merge the interim result files into one unified result file; then go to step 9);
9) Perform merge processing on the result file to obtain the final analysis result of the big file.
With the above technical solution, the invention has the following beneficial effect: large data files can be analyzed without manual operation.
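Steps 2) and 3) above describe a cutting index and several cutting modules that pull entries from a shared list under a lock. The sketch below shows one way this could look, assuming threads stand in for the cutting modules; the function names and the `.part` file naming are hypothetical, and the element is removed while the lock is still held, which is a slightly stricter ordering than the text states.

```python
import os
import threading


def build_cut_index(total_size, chunk_size=50 * 1024 * 1024):
    """Step 2: one (serial, start, end) entry per small file."""
    return [(serial, start, min(start + chunk_size, total_size))
            for serial, start in enumerate(range(0, total_size, chunk_size))]


def cut_worker(big_path, out_dir, index, lock):
    """Step 3: take entries from the shared list and copy each byte range."""
    while True:
        with lock:                                # lock the list before reading it
            if not index:
                return
            serial, start, end = index.pop(0)     # take the entry and remove it
        with open(big_path, "rb") as src:         # read-only access to the big file
            src.seek(start)
            data = src.read(end - start)
        # Hypothetical naming scheme: zero-padded serial number.
        with open(os.path.join(out_dir, f"{serial:06d}.part"), "wb") as dst:
            dst.write(data)


def cut_file(big_path, out_dir, chunk_size=50 * 1024 * 1024, workers=3):
    """Run `workers` cutting threads and return the number of small files."""
    index = build_cut_index(os.path.getsize(big_path), chunk_size)
    count = len(index)
    lock = threading.Lock()
    threads = [threading.Thread(target=cut_worker,
                                args=(big_path, out_dir, index, lock))
               for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return count
```

Zero-padding the serial number keeps lexicographic and numeric order identical, which makes a later merge by file name straightforward.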
Brief description of the drawings
Fig. 1 is a flow chart of the present invention.
Specific embodiment
The invention is further explained below with reference to the accompanying drawing and a specific embodiment.
As shown in Fig. 1, an extra-large file protocol analysis and statistics method includes the following steps:
1) Open multiple extra-large-file cutting modules. The number of cutting modules is configured according to the number of CPU cores of the computer; by default, 3 cutting modules are opened. By a formula (not reproduced in this text), the size of the extra-large file is obtained and the file is cut in parallel into small files of the agreed size, 50 MB by default; then go to step 2);
2) Create a list and build the file cutting index: each list element records the start and end positions, within the big file, of the data of one small file; then go to step 3);
3) The cutting modules take elements from the list in order. Before an element is taken, the list is locked; after the element is taken, the list is unlocked and the taken element is removed from the list. Each module opens the extra-large data file read-only, moves to the file position specified by the element, and reads data sequentially from that position, writing it to a file named after the element's order in the list, until the end position recorded for that element in step 2) is reached; then go to step 4);
4) Create a list whose elements record the sequence number of a split file and the incomplete packet in that file. The file analysis module parses the data. If a packet is found to be complete, go to step 6). Otherwise, if the incomplete packet is the first packet of the file, search the list; if it is not, go to step 5). If the file preceding this file's sequence number is found in the list, update that element by appending the incomplete packet to the end of the element's packet data; otherwise create a new element that records the current file sequence number and the incomplete packet, and add it to the list. Then go to step 6);
5) If the incomplete packet is the last packet of the file, search the list. If the file following this file's sequence number is found in the list, update that element by adding the incomplete packet at the beginning of the element's packet data; otherwise create a new element that records the current file sequence number and the incomplete packet, and add it to the list. Then go to step 6);
6) When the packet obtained by the analysis module is complete, analyze it and write the analysis result to the analysis result file according to the file sequence number; then go to step 7);
7) Establish the mapping between each small file and the interim result file that stores its results, analyze the cut small files in parallel, and put each analysis result into the interim result file of the corresponding small file; then go to step 8);
8) Merge the interim result files into one unified result file; then go to step 9);
9) Perform merge processing on the result file: the system's merge function is notified to merge the analysis result files, combining the partial result files, which are kept by sequence number, into one complete result file that is then output. This yields the final analysis result of the big file, and the analysis statistics are output. When query conditions are entered, the statistics that satisfy those conditions are output.
This completes the protocol analysis and statistics tool for extra-large files.
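Steps 8) and 9) and the final query can be illustrated as follows. The patent does not specify the result file format, so this sketch assumes each interim result file holds lines of the form `protocol count` and is named with a zero-padded sequence number and a `.result` suffix; these conventions and all function names are hypothetical.

```python
import os
from collections import Counter


def merge_interim_files(result_dir, merged_path):
    """Step 8: concatenate interim result files in sequence-number order."""
    names = sorted(n for n in os.listdir(result_dir) if n.endswith(".result"))
    with open(merged_path, "w", encoding="utf-8") as out:
        for name in names:
            with open(os.path.join(result_dir, name), encoding="utf-8") as part:
                for line in part:
                    out.write(line if line.endswith("\n") else line + "\n")
    return names


def reduce_results(merged_path):
    """Step 9: fold the merged lines into final per-protocol totals."""
    totals = Counter()
    with open(merged_path, encoding="utf-8") as merged:
        for line in merged:
            if line.strip():
                protocol, count = line.split()
                totals[protocol] += int(count)
    return totals


def query(totals, predicate):
    """Return only the statistics that satisfy the query condition."""
    return {name: count for name, count in totals.items()
            if predicate(name, count)}
```

For example, `query(totals, lambda name, count: count >= 1000)` would keep only protocols seen at least 1000 times. The merged file should live outside `result_dir`, or at least not end in `.result`, so it is not picked up as an input.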
Claims (1)
1. An extra-large file protocol analysis and statistics method, characterized in that its steps include:
1) Open multiple extra-large-file cutting modules. The number of cutting modules is configured according to the number of CPU cores of the computer; by default, 3 cutting modules are opened. By a formula (not reproduced in this text), the size of the extra-large file is obtained and the file is cut in parallel into small files of the agreed size, 50 MB by default; then go to step 2);
2) Create a list and build the file cutting index: each list element records the start and end positions, within the big file, of the data of one small file; then go to step 3);
3) The cutting modules take elements from the list in order. Before an element is taken, the list is locked; after the element is taken, the list is unlocked and the taken element is removed from the list. Each module opens the extra-large data file read-only, moves to the file position specified by the element, and reads data sequentially from that position, writing it to a file named after the element's order in the list, until the end position recorded for that element in step 2) is reached; then go to step 4);
4) Create a list whose elements record the sequence number of a split file and the incomplete packet in that file. The file analysis module parses the data. If a packet is found to be complete, go to step 6). Otherwise, if the incomplete packet is the first packet of the file, search the list; if the incomplete packet is not the first packet of the file, go to step 5). If the file preceding this file's sequence number is found in the list, update that element by appending the incomplete packet to the end of the element's packet data; otherwise create a new element that records the current file sequence number and the incomplete packet, and add it to the list. Then go to step 6);
5) If the incomplete packet is the last packet of the file, search the list. If the file following this file's sequence number is found in the list, update that element by adding the incomplete packet at the beginning of the element's packet data; otherwise create a new element that records the current file sequence number and the incomplete packet, and add it to the list. Then go to step 6);
6) When the packet obtained by the analysis module is complete, analyze it and write the analysis result to the analysis result file according to the file sequence number; then go to step 7);
7) Establish the mapping between each small file and the interim result file that stores its results, analyze the cut small files in parallel, and put each analysis result into the interim result file of the corresponding small file; then go to step 8);
8) Merge the interim result files into one unified result file; then go to step 9);
9) Perform merge processing on the result file to obtain the final analysis result of the big file.
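Steps 4) and 5) of the claim handle packets that were cut in two at a small-file boundary. The sketch below is one possible reading, under two stated assumptions: a packet spans at most two adjacent small files, and a stitched packet is handed back to the caller and removed from the list rather than updated in place. The class and method names are hypothetical.

```python
import threading


class FragmentList:
    """Shared list of packet fragments left at small-file boundaries."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}  # (file serial, "first" or "last") -> fragment bytes

    def add_leading(self, serial, fragment):
        """Step 4: the first packet of file `serial` is incomplete.

        Look for the trailing fragment recorded by the previous file.
        Return the stitched packet, or None if this fragment must wait.
        """
        with self._lock:
            head = self._pending.pop((serial - 1, "last"), None)
            if head is None:
                self._pending[(serial, "first")] = fragment
                return None
            return head + fragment      # append to the end of the stored data

    def add_trailing(self, serial, fragment):
        """Step 5: the last packet of file `serial` is incomplete.

        Look for the leading fragment recorded by the next file.
        """
        with self._lock:
            tail = self._pending.pop((serial + 1, "first"), None)
            if tail is None:
                self._pending[(serial, "last")] = fragment
                return None
            return fragment + tail      # add at the beginning of the stored data

    def pending_count(self):
        """Number of fragments still waiting for their other half."""
        with self._lock:
            return len(self._pending)
```

Because either half may arrive first when files are analyzed in parallel, both methods check for the counterpart before storing; whichever analyzer arrives second receives the complete packet and can pass it on to step 6).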
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310722859.9A CN103647790B (en) | 2013-12-24 | 2013-12-24 | Extra-large file protocol analytical and statistical method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310722859.9A CN103647790B (en) | 2013-12-24 | 2013-12-24 | Extra-large file protocol analytical and statistical method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103647790A CN103647790A (en) | 2014-03-19 |
CN103647790B CN103647790B (en) | 2017-01-25 |
Family
ID=50252946
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310722859.9A Active CN103647790B (en) | 2013-12-24 | 2013-12-24 | Extra-large file protocol analytical and statistical method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103647790B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105389384B (en) * | 2015-12-03 | 2019-03-26 | 万达信息股份有限公司 | A kind of medical treatment private data swap file generation method |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101582064B (en) * | 2008-05-15 | 2011-12-21 | 阿里巴巴集团控股有限公司 | Method and system for processing enormous data |
CN102193917B (en) * | 2010-03-01 | 2014-03-26 | 中国移动通信集团公司 | Method and device for processing and querying data |
CN102821164B (en) * | 2012-08-31 | 2014-10-22 | 河海大学 | Efficient parallel-distribution type data processing system |
CN102833336A (en) * | 2012-08-31 | 2012-12-19 | 河海大学 | Data sub-packet processing method in separate distributed information acquisition and concurrent processing system |
2013
- 2013-12-24: CN application CN201310722859.9A, patent CN103647790B (en), status Active
Also Published As
Publication number | Publication date |
---|---|
CN103647790A (en) | 2014-03-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111770023B (en) | Message duplicate removal method and device based on FPGA and FPGA chip | |
SG10201900339QA (en) | Computing device and method for detecting malicious domain names in a network traffic | |
CN104135387B (en) | A kind of network management data based on meta-model topology processes method for visually monitoring | |
CN103067218B (en) | A kind of express network packet content analytical equipment | |
CN109446689A (en) | DC converter station electrical secondary system drawing recognition methods and system | |
CN108509658A (en) | A kind of analysis method and device of XML file | |
CN104506376A (en) | Multichannel redundant CAN (Controller Area Network) bus test system with frame start sensitive synchronous trigger function | |
CN102915432A (en) | Method and device for extracting vehicle-bone microcomputer image video data | |
CN107404486B (en) | Method, device, terminal equipment and storage medium for analyzing Http data | |
CN106713351B (en) | Secure communication method and device based on serial server | |
CN108132986B (en) | Rapid processing method for test data of mass sensors of aircraft | |
CN113613287A (en) | Automatic data acquisition system based on edge calculation | |
CN108664635A (en) | Acquisition methods, device, equipment and the storage medium of statistics of database information | |
CN108255837A (en) | A kind of SQL resolvers and method | |
CN111970151A (en) | Flow fault positioning method and system for virtual and container network | |
CN116800586A (en) | Method for diagnosing data communication faults of telecommunication network | |
CN103647790B (en) | Extra-large file protocol analytical and statistical method | |
CN202815869U (en) | Vehicle microcomputer image and video data extraction apparatus | |
CN107748755A (en) | Synonym method for digging, device, equipment and computer-readable recording medium | |
CN110609982A (en) | PDF file data analysis system and method | |
CN102323975A (en) | Message correctness judging method of IEC61850-based model file | |
CN104572767B (en) | A kind of method and system of website languages classification | |
CN103699482A (en) | Method and device for testing reasonableness of controls | |
CN102147818A (en) | Test file compression method | |
CN115765153A (en) | Method and system for fusion monitoring of Internet of things and online monitoring data of primary electric power equipment |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 20181129. Address after: 213017 Room 938, Tianning Science Promotion Center, 256 Zhulin North Road, Tianning District, Changzhou City, Jiangsu Province. Patentee after: Changzhou Dongji Road International Trade Co., Ltd. Address before: 213022 Wushan Road, Xinbei District, Changzhou, Jiangsu Province, No. 1. Patentee before: Changzhou Polytechnic College |