CN106557535A - Method and system for processing big-data-scale Pcap files - Google Patents

Method and system for processing big-data-scale Pcap files

Info

Publication number: CN106557535A
Authority: CN (China)
Prior art keywords: data, pcap, file, memory, eigenvalue
Legal status: Granted
Application number: CN201610461833.7A
Other languages: Chinese (zh)
Other versions: CN106557535B (en)
Inventors: 桑彦东, 宋丹成, 韩文奇, 肖新光
Current assignee: Antiy Technology Group Co Ltd
Original assignee: Harbin Antiy Technology Co Ltd
Priority date: 2016-06-23
Filing date: 2016-06-23
Publication date: 2017-04-05
Application filed by Harbin Antiy Technology Co Ltd
Priority to CN201610461833.7A
Publication of CN106557535A
Application granted
Publication of CN106557535B
Current legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/958: Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/986: Document structures and storage, e.g. HTML extensions

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a method and system for processing big-data-scale Pcap files. The system memory is first determined. While the Pcap file is read sequentially, the system memory usage is calculated dynamically. When memory usage reaches a specified value, reading stops and a marker (flag bit) is set. After the data read in that batch has been analyzed, the memory is released and reading of the Pcap file resumes from the marker. This repeats until the Pcap file has been completely processed. The invention reads big-data-scale Pcap files in batches according to the state of system memory. This maintains the speed at which the system processes data and improves data-analysis efficiency. Further, the integrity of the processed Pcap file can be verified, which ensures both the completeness of the data and the accuracy of the results.

Description

Method and system for processing big-data-scale Pcap files
Technical field
The present invention relates to the technical field of computer networks, and in particular to a method and system for processing big-data-scale Pcap files.
Background art
A Pcap file is the data file in which captured network packets are stored. Typically, Wireshark is used to view the contents of a Pcap file, filter out the packets of interest, and perform network traffic analysis. In the prior art, however, the size of system memory generally limits processing to Pcap files on the order of gigabytes (GB). For big-data-scale Pcap files on the order of terabytes (TB), processing speed, efficiency and data-processing integrity are all poor.
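The method depends on reading a Pcap file sequentially. As background, the sketch below reads the classic libpcap layout with Python's standard `struct` module. That layout is a 24-byte global header, whose magic number also encodes the byte order, followed by records. Each record carries a 16-byte header (timestamp seconds, timestamp fraction, captured length, original length) and then the captured bytes. This is an illustration, not part of the patent. It does not handle the newer pcapng format, and the function names are hypothetical. The offset yielded after each record is the kind of position a resumable reader could save as a marker.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple

# Classic libpcap magic numbers as stored on disk -> (struct byte-order prefix, nanosecond timestamps?)
MAGICS = {
    b"\xd4\xc3\xb2\xa1": ("<", False),
    b"\xa1\xb2\xc3\xd4": (">", False),
    b"\x4d\x3c\xb2\xa1": ("<", True),
    b"\xa1\xb2\x3c\x4d": (">", True),
}


def read_global_header(f: BinaryIO) -> Tuple[str, int, int]:
    """Read the 24-byte global header; return (byte order, snaplen, link type)."""
    hdr = f.read(24)
    if len(hdr) < 24 or hdr[:4] not in MAGICS:
        raise ValueError("not a classic pcap file (pcapng is not handled here)")
    endian, _nano = MAGICS[hdr[:4]]
    _major, _minor, _zone, _sigfigs, snaplen, linktype = struct.unpack(endian + "HHiIII", hdr[4:])
    return endian, snaplen, linktype


def iter_records(f: BinaryIO, endian: str) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset just after the record, captured bytes) in file order."""
    rec = struct.Struct(endian + "IIII")  # ts_sec, ts_usec/nsec, incl_len, orig_len
    while True:
        hdr = f.read(rec.size)
        if not hdr:
            return  # clean end of file
        if len(hdr) < rec.size:
            raise ValueError("truncated record header")
        _sec, _frac, incl_len, _orig_len = rec.unpack(hdr)
        data = f.read(incl_len)
        if len(data) < incl_len:
            raise ValueError("truncated packet data")
        yield f.tell(), data


# Usage: a tiny in-memory capture (link type 1 = Ethernet) holding two records.
def _record(payload: bytes) -> bytes:
    return struct.pack("<IIII", 0, 0, len(payload), len(payload)) + payload


sample = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1) + _record(b"abc") + _record(b"de")
buf = io.BytesIO(sample)
endian, snaplen, linktype = read_global_header(buf)
records = list(iter_records(buf, endian))
print(endian, snaplen, linktype, records)  # < 65535 1 [(43, b'abc'), (61, b'de')]
```

The generator holds only one record at a time. Seeking the file to a saved offset and calling `iter_records` again resumes at the next record.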
Summary of the invention
To address the above defects of the prior art, the present invention provides a method and system for processing big-data-scale Pcap files. The system memory is first determined, and memory usage is calculated dynamically while the Pcap file is read sequentially. When memory usage reaches a specified value, reading stops and a marker is set. After the data read in that batch has been analyzed, the memory is released and reading resumes from the marker. This repeats until the Pcap file has been completely processed.
Specifically, the invention includes the following:
A method for processing big-data-scale Pcap files comprises the following steps:
Step 1: obtain system memory information and calculate the system capacity;
Step 2: read data sequentially, starting from the head of the Pcap file;
Step 3: dynamically calculate memory usage and, when memory usage reaches a specified value, pause reading;
Step 4: set a marker at the position where reading paused, calculate the feature value of the data read in this batch, and store the marker and the feature value in a log file;
Step 5: analyze the data read in this batch, extract feature information, and store the feature information in a prescribed manner;
Step 6: return to the marker position in the Pcap file, erase the marker, release system memory, continue reading data sequentially, and perform steps 3 to 5 again;
Step 7: repeat step 6 until the Pcap file has been completely processed.
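The seven steps can be sketched as a single loop. This is a minimal illustration under stated assumptions, not the patented implementation:

- The records are supplied as an in-memory sequence only to keep the sketch testable; a real reader would pull them lazily from the file.
- The marker is the record index at which reading paused.
- SHA-256 stands in for the unspecified feature value.
- `mem_usage_percent` is a hypothetical caller-supplied callback, so the memory check can be simulated.
- The sketch also logs a marker at end-of-file so that the final batch is covered. The steps do not state this explicitly.

```python
import hashlib
from typing import Callable, List, Sequence, Tuple


def process_in_batches(
    records: Sequence[bytes],
    mem_usage_percent: Callable[[int], float],
    threshold: float,
    analyze: Callable[[int, List[bytes]], None],
) -> List[Tuple[int, str]]:
    """Return the log of (marker, feature value) pairs written while processing."""
    log: List[Tuple[int, str]] = []  # stands in for the log file of step 4
    pos = 0  # step 2: start at the head of the file
    while pos < len(records):  # step 7: repeat until the file is completely processed
        batch: List[bytes] = []
        batch_bytes = 0
        # Step 3: keep reading while usage is below the threshold (always take at
        # least one record so that the loop cannot stall).
        while pos < len(records) and (not batch or mem_usage_percent(batch_bytes) < threshold):
            batch.append(records[pos])
            batch_bytes += len(records[pos])
            pos += 1
        # Step 4: the pause position is the marker; SHA-256 stands in for the feature value.
        log.append((pos, hashlib.sha256(b"".join(batch)).hexdigest()))
        # Step 5: analyze the batch (storing its feature information is up to the callback).
        analyze(pos, batch)
        # Step 6: drop this loop's reference so the batch can be reclaimed, then resume at pos.
        del batch
    return log


# Usage: simulated usage of 50% plus 1% per buffered byte, threshold 80%.
seen: List[Tuple[int, int]] = []
log = process_in_batches([b"x" * 10] * 7, lambda n: 50 + n, 80.0,
                         lambda marker, batch: seen.append((marker, len(batch))))
print(seen)  # [(3, 3), (6, 3), (7, 1)]
```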
Further, the method also includes an integrity check. Using the same calculation as for the feature value of each batch, calculate the feature value of the data between each pair of adjacent markers recorded in the log file. If the results fully match the feature values in the log file, the processed Pcap file is complete. If they do not fully match, the data before the marker corresponding to each mismatched feature value is incomplete. In that case, return to the position of the corresponding marker in the Pcap file and re-acquire the data.
Further, storing the feature information in a prescribed manner specifically means storing it in a file named after the marker.
Further, the feature information includes: source IP, destination IP, URL, protocol type, and port information.
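The description lists the feature fields but not how they are obtained. One possible approach is to decode each captured frame directly, as sketched below with the standard `struct` and `ipaddress` modules. The sketch makes these assumptions:

- Frames use Ethernet II framing (link type 1) and carry IPv4.
- Fragmentation, IPv6 and TCP stream reassembly are ignored.
- A URL is recovered only from a plain-HTTP request line plus `Host` header found in a single segment.

`extract_features` is a hypothetical name.

```python
import ipaddress
import struct
from typing import Dict, Optional


def extract_features(frame: bytes) -> Optional[Dict[str, object]]:
    """Extract source/destination IP, protocol, ports and, for a plain-HTTP
    request, the URL from an Ethernet II + IPv4 frame; return None otherwise."""
    if len(frame) < 34 or frame[12:14] != b"\x08\x00":  # 14-byte Ethernet + 20-byte IPv4 minimum
        return None
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
    proto = ip[9]
    feats: Dict[str, object] = {
        "src_ip": str(ipaddress.IPv4Address(ip[12:16])),
        "dst_ip": str(ipaddress.IPv4Address(ip[16:20])),
        "protocol": {6: "TCP", 17: "UDP"}.get(proto, str(proto)),
    }
    l4 = ip[ihl:]
    if proto in (6, 17) and len(l4) >= 4:
        feats["src_port"], feats["dst_port"] = struct.unpack("!HH", l4[:4])
    if proto == 6 and len(l4) >= 20:
        lines = l4[(l4[12] >> 4) * 4:].split(b"\r\n")  # TCP payload split into lines
        parts = lines[0].decode("latin-1").split(" ")
        if len(parts) == 3 and parts[2].startswith("HTTP/"):
            host = next((h[5:].strip().decode("latin-1") for h in lines[1:]
                         if h.lower().startswith(b"host:")), "")
            feats["url"] = "http://" + host + parts[1]
    return feats


# Usage: a hand-built HTTP GET from 192.168.1.10:51000 to 93.184.216.34:80.
eth = b"\x00" * 12 + b"\x08\x00"
ip_hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 0, 0, 0, 64, 6, 0,
                     bytes([192, 168, 1, 10]), bytes([93, 184, 216, 34]))
tcp_hdr = struct.pack("!HHIIBBHHH", 51000, 80, 0, 0, 5 << 4, 0x18, 65535, 0, 0)
http = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
print(extract_features(eth + ip_hdr + tcp_hdr + http))
```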
Further, the system memory information includes: total system memory, free system memory, block-device buffer size, and file cache size.
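By name, the four memory figures correspond to the `MemTotal`, `MemFree`, `Buffers` and `Cached` fields of the Linux `/proc/meminfo` file, and the embodiment's usage formula uses those same names. The sketch below assumes a Linux host; the patent does not name an operating system. The function names are hypothetical.

```python
from typing import Dict

FIELDS = ("MemTotal", "MemFree", "Buffers", "Cached")


def parse_meminfo(text: str) -> Dict[str, int]:
    """Extract the four memory figures (in kB) from the text of /proc/meminfo."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in FIELDS:  # exact match, so "SwapCached" is not mistaken for "Cached"
            values[name] = int(rest.split()[0])
    missing = [f for f in FIELDS if f not in values]
    if missing:
        raise ValueError(f"missing fields in meminfo: {missing}")
    return values


def read_system_memory(path: str = "/proc/meminfo") -> Dict[str, int]:
    """Linux-only: read the live figures."""
    with open(path, encoding="ascii") as fh:
        return parse_meminfo(fh.read())


# Usage with an excerpt in the /proc/meminfo layout (the values are made up).
sample = """MemTotal:       16000000 kB
MemFree:         2000000 kB
MemAvailable:    9000000 kB
Buffers:          500000 kB
Cached:          5500000 kB
SwapCached:            0 kB
"""
print(parse_meminfo(sample))
```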
A system for processing big-data-scale Pcap files comprises:
a system capacity calculation module, configured to obtain system memory information and calculate the system capacity;
a memory usage calculation module, configured to dynamically calculate memory usage and to pause reading when memory usage reaches a specified value;
a marker setting module, configured to set a marker at the position where reading paused, calculate the feature value of the data read in this batch, and store the marker and the feature value in a log file;
a data analysis module, configured to analyze the data read, extract feature information, and store the feature information in a prescribed manner;
a file reading module, configured to read the Pcap file data sequentially and to invoke the memory usage calculation module, the marker setting module and the data analysis module dynamically until the Pcap file has been completely processed.
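The marker setting module (flag bit setup module) must persist each marker together with its feature value before the batch's memory is freed. The patent specifies none of the details, so the sketch below makes these assumptions:

- The marker is a byte offset into the Pcap file.
- The feature value is a SHA-256 digest of the raw bytes read since the previous marker.
- The log is a JSON-lines file.

`record_marker` is a hypothetical name.

```python
import hashlib
import io
import json
from typing import TextIO


def record_marker(log: TextIO, marker_offset: int, batch: bytes) -> str:
    """Append {"marker": offset, "feature": digest} to the log and return the digest."""
    feature = hashlib.sha256(batch).hexdigest()  # assumed feature value
    log.write(json.dumps({"marker": marker_offset, "feature": feature}) + "\n")
    log.flush()  # persist the entry before the batch's memory is released
    return feature


# Usage with an in-memory log; a real system would open a file in append mode.
log_file = io.StringIO()
record_marker(log_file, 43, b"abc")
record_marker(log_file, 61, b"de")
entries = [json.loads(line) for line in log_file.getvalue().splitlines()]
print(entries)
```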
Further, the system also includes a data integrity verification module. This module uses the same calculation as for the feature value of each batch to calculate the feature value of the data between each pair of adjacent markers recorded in the log file. If the results fully match the feature values in the log file, the processed Pcap file is complete. If they do not fully match, the data before the marker corresponding to each mismatched feature value is incomplete, and the module returns to the position of the corresponding marker in the Pcap file to re-acquire the data.
Further, storing the feature information in a prescribed manner specifically means storing it in a file named after the marker.
Further, the feature information includes: source IP, destination IP, URL, protocol type, and port information.
Further, the system memory information includes: total system memory, free system memory, block-device buffer size, and file cache size.
The beneficial effects of the invention are as follows:
The invention reads big-data-scale Pcap files in batches according to the state of system memory, which maintains the speed at which the system processes data and improves data-analysis efficiency. Further, the integrity of the processed Pcap file can be verified, ensuring the completeness of the data and the accuracy of the results.
Brief description of the drawings
To explain the technical solutions of the invention or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention. A person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the method for processing big-data-scale Pcap files according to the invention;
Fig. 2 is a structural diagram of the system for processing big-data-scale Pcap files according to the invention.
Detailed description of the embodiments
The technical solutions of the invention are described in further detail below with reference to the drawings. The aim is to help those skilled in the art better understand the technical solutions in the embodiments, and to make the above objects, features and advantages of the invention clearer.
The invention provides an embodiment of a method for processing big-data-scale Pcap files which, as shown in Fig. 1, includes:
S101: obtain system memory information and calculate the system capacity;
S102: read data sequentially, starting from the head of the Pcap file;
S103: dynamically calculate memory usage and, when memory usage reaches a specified value, pause reading;
The memory usage (MEMUsedPerc) can be calculated as follows:
MEMUsedPerc = 100 * (MemTotal - MemFree - Buffers - Cached) / MemTotal
where:
MemTotal: total system memory
MemFree: free system memory
Buffers: block-device buffer size
Cached: file cache size
The specified value can be set according to the specific data-processing requirements, the system environment and so on. It generally does not exceed 90%.
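Applying the formula to concrete numbers makes the pause condition explicit. The sketch takes the four figures in kB, the unit `/proc/meminfo` uses. It sets 90% as an illustrative default threshold. The function names and the default are assumptions.

```python
from typing import Mapping


def mem_used_percent(mem: Mapping[str, int]) -> float:
    """MEMUsedPerc = 100 * (MemTotal - MemFree - Buffers - Cached) / MemTotal."""
    total = mem["MemTotal"]
    return 100.0 * (total - mem["MemFree"] - mem["Buffers"] - mem["Cached"]) / total


def should_pause(mem: Mapping[str, int], threshold: float = 90.0) -> bool:
    """Pause reading once usage reaches the specified value (90% is only a default here)."""
    return mem_used_percent(mem) >= threshold


# Worked example (kB): 16,000,000 - 2,000,000 - 500,000 - 5,500,000 = 8,000,000 in use -> 50%.
sample = {"MemTotal": 16_000_000, "MemFree": 2_000_000, "Buffers": 500_000, "Cached": 5_500_000}
print(mem_used_percent(sample), should_pause(sample))  # 50.0 False
```

Buffers and cache are subtracted because the kernel can reclaim them. A large page cache from reading the capture therefore does not by itself trigger a pause.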
S104: set a marker at the position where reading paused, calculate the feature value of the data read in this batch, and store the marker and the feature value in a log file;
S105: analyze the data read in this batch, extract feature information, and store the feature information in a prescribed manner. The data is analyzed according to the specific data-analysis requirements in order to extract the feature information;
S106: return to the marker position in the Pcap file, erase the marker, release system memory, and continue reading data sequentially;
S107: determine whether the Pcap file has been completely processed; if not, go to S103; if so, end.
Preferably, the method further includes an integrity check. Using the same calculation as for the feature value of each batch, calculate the feature value of the data between each pair of adjacent markers recorded in the log file. If the results fully match the feature values in the log file, the processed Pcap file is complete. If they do not fully match, the data before the marker corresponding to each mismatched feature value is incomplete. In that case, return to the position of the corresponding marker in the Pcap file and re-acquire the data.
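The verification can be sketched as follows. These are illustrative choices, not details from the patent:

- The log is a JSON-lines file of `{"marker": offset, "feature": digest}` entries in file order.
- Markers are byte offsets.
- The first span starts right after the 24-byte Pcap global header.
- SHA-256 serves as the feature value.

`verify` is a hypothetical name. Each returned marker identifies a span whose data must be re-acquired.

```python
import hashlib
import io
import json
from typing import BinaryIO, Iterable, List


def verify(pcap: BinaryIO, log_lines: Iterable[str], start: int = 24) -> List[int]:
    """Return markers whose span fails to match its logged feature value (empty = complete)."""
    failed: List[int] = []
    prev = start  # the first span begins right after the 24-byte global header
    for line in log_lines:
        entry = json.loads(line)
        marker = entry["marker"]
        pcap.seek(prev)
        span = pcap.read(marker - prev)  # a short read (truncated file) will also mismatch
        if hashlib.sha256(span).hexdigest() != entry["feature"]:
            failed.append(marker)  # data before this marker must be re-acquired
        prev = marker
    return failed


# Usage: two 4-byte spans after a placeholder header, then one corrupted byte.
capture = b"\x00" * 24 + b"AAAA" + b"BBBB"
log_lines = [json.dumps({"marker": m, "feature": hashlib.sha256(capture[p:m]).hexdigest()})
             for p, m in [(24, 28), (28, 32)]]
print(verify(io.BytesIO(capture), log_lines))  # []
tampered = capture[:29] + b"X" + capture[30:]
print(verify(io.BytesIO(tampered), log_lines))  # [32]
```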
Preferably, storing the feature information in a prescribed manner specifically means storing it in a file named after the marker.
Preferably, the feature information includes: source IP, destination IP, URL, protocol type, and port information.
Preferably, the system memory information includes: total system memory, free system memory, block-device buffer size, and file cache size.
A system for processing big-data-scale Pcap files includes:
a system capacity calculation module 201, configured to obtain system memory information and calculate the system capacity;
a memory usage calculation module 202, configured to dynamically calculate memory usage and to pause reading when memory usage reaches a specified value;
a marker setting module 203, configured to set a marker at the position where reading paused, calculate the feature value of the data read in this batch, and store the marker and the feature value in a log file;
a data analysis module 204, configured to analyze the data read, extract feature information, and store the feature information in a prescribed manner;
a file reading module 205, configured to read the Pcap file data sequentially and to invoke the memory usage calculation module, the marker setting module and the data analysis module dynamically until the Pcap file has been completely processed.
Preferably, the system further includes a data integrity verification module. This module uses the same calculation as for the feature value of each batch to calculate the feature value of the data between each pair of adjacent markers recorded in the log file. If the results fully match the feature values in the log file, the processed Pcap file is complete. If they do not fully match, the data before the marker corresponding to each mismatched feature value is incomplete, and the module returns to the position of the corresponding marker in the Pcap file to re-acquire the data.
Preferably, storing the feature information in a prescribed manner specifically means storing it in a file named after the marker.
Preferably, the feature information includes: source IP, destination IP, URL, protocol type, and port information.
Preferably, the system memory information includes: total system memory, free system memory, block-device buffer size, and file cache size.
In this specification, the method embodiment is described progressively. The system embodiment is essentially similar to the method embodiment, so it is described only briefly; for the relevant details, refer to the description of the method embodiment.
The prior art cannot guarantee effective processing of big-data-scale Pcap files. To overcome this defect, the present invention proposes a method and system for processing such files. The system memory is first determined, and memory usage is calculated dynamically while the Pcap file is read sequentially. When memory usage reaches a specified value, reading stops and a marker is set. After the data read in that batch has been analyzed, the memory is released and reading resumes from the marker. This repeats until the Pcap file has been completely processed. The invention reads big-data-scale Pcap files in batches according to the state of system memory, which maintains the speed at which the system processes data and improves data-analysis efficiency. Further, the integrity of the processed Pcap file can be verified, ensuring the completeness of the data and the accuracy of the results.
Although the invention has been described by way of embodiments, a person of ordinary skill in the art will appreciate that it admits many variations and modifications without departing from its spirit. The appended claims are intended to cover such variations and modifications.

Claims (10)

1. A method for processing big-data-scale Pcap files, characterized by comprising the following steps:
Step 1: obtaining system memory information and calculating the system capacity;
Step 2: reading data sequentially, starting from the head of the Pcap file;
Step 3: dynamically calculating memory usage and, when the memory usage reaches a specified value, pausing the reading of data;
Step 4: setting a marker at the position where reading was paused, calculating a feature value of the data read in this batch, and storing the marker and the feature value in a log file;
Step 5: analyzing the data read in this batch, extracting feature information, and storing the feature information in a prescribed manner;
Step 6: returning to the marker position in the Pcap file, erasing the marker, releasing system memory, reading data sequentially, and performing steps 3 to 5 again;
Step 7: repeating step 6 until the Pcap file has been completely processed.
2. the method for claim 1, it is characterised in that also include:With reference to described this read data features value of calculating Calculation, calculate the eigenvalue of data between the two neighboring marker bit that records in journal file of Pcap files, if knot Fruit is matched completely with the eigenvalue in journal file, then handled Pcap files are complete, if can not match completely, The data before marker bit corresponding to eigenvalue with failure are incomplete, need to return to the position of Pcap file respective markers position Put, data are reacquired.
3. method as claimed in claim 1 or 2, it is characterised in that described by regulation storage characteristic information, specially:By spy Reference breath is stored in the file named with marker bit.
4. method as claimed in claim 3, it is characterised in that the characteristic information includes:Source IP, purpose IP, URL, agreement Mode, port information.
5. the method as described in claim 1 or 2 or 4, it is characterised in that the Installed System Memory information includes:The total internal memory of system, System free memory, block device buffer size, file buffering size.
6. A system for processing big-data-scale Pcap files, characterized by comprising:
a system capacity calculation module, configured to obtain system memory information and calculate the system capacity;
a memory usage calculation module, configured to dynamically calculate memory usage and to pause the reading of data when the memory usage reaches a specified value;
a marker setting module, configured to set a marker at the position where reading was paused, calculate a feature value of the data read in this batch, and store the marker and the feature value in a log file;
a data analysis module, configured to analyze the data read, extract feature information, and store the feature information in a prescribed manner;
a file reading module, configured to read the Pcap file data sequentially and to invoke the memory usage calculation module, the marker setting module and the data analysis module dynamically until the Pcap file has been completely processed.
7. The system of claim 6, characterized by further comprising a data integrity verification module, configured to use the same calculation as for the feature value of the data read in each batch to calculate the feature value of the data between each pair of adjacent markers recorded in the log file; if the results fully match the feature values in the log file, the processed Pcap file is complete; if they do not fully match, the data before the marker corresponding to each mismatched feature value is incomplete, and the module returns to the position of the corresponding marker in the Pcap file to re-acquire the data.
8. The system of claim 6 or 7, characterized in that storing the feature information in a prescribed manner specifically comprises: storing the feature information in a file named after the marker.
9. The system of claim 8, characterized in that the feature information includes: source IP, destination IP, URL, protocol type, and port information.
10. The system of claim 6, 7 or 9, characterized in that the system memory information includes: total system memory, free system memory, block-device buffer size, and file cache size.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610461833.7A CN106557535B (en) 2016-06-23 2016-06-23 Method and system for processing big data level Pcap file

Publications (2)

Publication Number Publication Date
CN106557535A true CN106557535A (en) 2017-04-05
CN106557535B CN106557535B (en) 2020-10-30

Family

ID=58418246

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610461833.7A Active CN106557535B (en) 2016-06-23 2016-06-23 Method and system for processing big data level Pcap file

Country Status (1)

Country Link
CN (1) CN106557535B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102346783A (en) * 2011-11-09 2012-02-08 华为技术有限公司 Data retrieval method and device
US20120041979A1 (en) * 2010-08-12 2012-02-16 Industry-Academic Cooperation Foundation, Yonsei University Method for generating context hierarchy and system for generating context hierarchy
US20120330802A1 (en) * 2011-06-22 2012-12-27 International Business Machines Corporation Method and apparatus for supporting memory usage accounting
CN102968384A (en) * 2012-11-21 2013-03-13 浪潮电子信息产业股份有限公司 Non-blocking storage acceleration method
CN103377073A (en) * 2012-04-26 2013-10-30 中国银联股份有限公司 Data information processing device and method
CN104717269A (en) * 2013-12-17 2015-06-17 北京合众思壮科技股份有限公司 Method for monitoring and dispatching cloud public platform computer resources for location-based service

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN108881305A (en) * 2018-08-08 2018-11-23 西安交通大学 A kind of sample automatic calibration method towards encryption flow identification
CN108881305B (en) * 2018-08-08 2020-04-28 西安交通大学 Automatic sample calibration method for encrypted flow identification
CN112256634A (en) * 2020-10-14 2021-01-22 杭州当虹科技股份有限公司 Low-memory large file analysis method based on http
CN112256634B (en) * 2020-10-14 2024-03-26 杭州当虹科技股份有限公司 Http-based low-memory large file analysis method
CN113485653A (en) * 2021-09-08 2021-10-08 苏州浪潮智能科技有限公司 SSD, application scene identification method and device thereof, and storage medium
CN113485653B (en) * 2021-09-08 2021-11-19 苏州浪潮智能科技有限公司 SSD, application scene identification method and device thereof, and storage medium

Also Published As

Publication number Publication date
CN106557535B (en) 2020-10-30

Similar Documents

Publication Publication Date Title
CN106383852A (en) Docker container-based log acquisition method and apparatus
CN103902348B (en) The reading/writing method of user data, system and physical machine under a kind of virtualized environment
CN106557535A (en) A kind of processing method and system of big data level Pcap file
US8756269B2 (en) Monitoring a path of a transaction across a composite application
CN107111452A (en) Data migration method and device, computer system applied to computer system
CN104346194B (en) A kind of startup file loading method, device and electronic equipment
CN103927305B (en) It is a kind of that the method and apparatus being controlled is overflowed to internal memory
CN110493302A (en) A kind of document transmission method, equipment and computer readable storage medium
CN103731364B (en) X86 platform based method for achieving trillion traffic rapid packaging
CN106843908A (en) Data integrated collection method and system
CN110221914B (en) File processing method and device
CN103970570A (en) Testing method for compatibility adaptation between disk array and mainframe
CN109521969B (en) Solid state disk data recovery method and device and computer readable storage medium
CN105446895A (en) Method and system for carrying out IO deduplication on non-homologous data of storage system in operation process
CN107393594A (en) A kind of multinuclear solid state hard disc adjustment method and system
CN107368351B (en) Automatic upgrading and capacity expanding method and device for virtual machine configuration
CN106919854B (en) Detection method for clearing residual information of virtual machine
CN103164290B (en) application memory management method and device
CN107018096A (en) The method that data analysis and reduction are carried out based on application layer protocol
CN109726181B (en) Data processing method and data processing device
CN107977450A (en) The analysis integrated application platform of road traffic based on video big data
TWI571745B (en) Method for managing buffer memoryand electronice device employing the same
CN109614389A (en) A kind of data storage method, system, equipment and medium
CN104317703B (en) Method and device for monitoring thread stack
CN107506202A (en) A kind of method and system for judging operating system release version number

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 150028 Building 7, Innovation Plaza, Science and Technology Innovation City, Harbin High-tech Industrial Development Zone, Heilongjiang Province (838 Shikun Road)

Applicant after: Harbin antiy Technology Group Limited by Share Ltd

Address before: 506 room 162, Hongqi Avenue, Nangang District, Harbin Development Zone, Heilongjiang, 150090

Applicant before: Harbin Antiy Technology Co., Ltd.

GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 150028 Building 7, Innovation Plaza, Science and Technology Innovation City, Harbin High-tech Industrial Development Zone, Heilongjiang Province (838 Shikun Road)

Patentee after: Antan Technology Group Co.,Ltd.

Address before: 150028 Building 7, Innovation Plaza, Science and Technology Innovation City, Harbin High-tech Industrial Development Zone, Heilongjiang Province (838 Shikun Road)

Patentee before: Harbin Antian Science and Technology Group Co.,Ltd.