CN103701821B - File type identification method and device - Google Patents

File type identification method and device

Info

Publication number
CN103701821B
Authority
CN
China
Prior art keywords
file
character value
file type
offset
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310750085.0A
Other languages
Chinese (zh)
Other versions
CN103701821A (en)
Inventor
郭璞
曹政
刘岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING NETENTSEC Inc
Original Assignee
BEIJING NETENTSEC Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING NETENTSEC Inc
Priority to CN201310750085.0A
Publication of CN103701821A
Application granted
Publication of CN103701821B
Legal status: Active
Anticipated expiration

Abstract

The present invention relates to a file type identification method, comprising: precompiling the file features of files to obtain a bitmap feature, the bitmap feature comprising first offsets and the first character value corresponding to each first offset; obtaining, from transmitted data packets, the file stream of a first file whose file type is to be identified, the file stream comprising a second offset and the second character value corresponding to the second offset; looking up in the bitmap feature, according to the second offset, the first offset that matches the second offset; performing an operation on the second character value and each first character value in turn to obtain an operation result; and determining the file type of the first file according to the operation result. The invention can identify file types efficiently and accurately, raise real-time alerts for specific file types, track user operations within a local area network, and present users' file upload and download behavior at a fine granularity.

Description

File type identification method and device
Technical field
The present invention relates to the field of computer network security, and in particular to a file type identification method and device.
Background
With the continuous development of science and technology, people depend more and more on networks and transmit data over them. While data is being transmitted, however, information security faces great challenges. To prevent confidential information from leaking, network administrators or enterprises often need to identify and inspect the types of transmitted files.
Existing file type identification technology obtains the file name through application identification and deep protocol analysis, and determines the file type from the extension in the file name. Although this method does not need to locate file boundaries or analyze file content, it produces a wrong result whenever the file name has been changed in practice. The correct identification rate of this technique is therefore low, and its errors are unpredictable.
There is also a file type identification method based on magic numbers, which matches magic numbers against the file header data stream and determines the file type from the matching result. Although this method can identify file types effectively, it relies on string comparison, so its identification efficiency is low and it cannot meet the forwarding performance requirements of network devices.
Therefore, none of the existing file type identification techniques can identify file types both accurately and efficiently.
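The two prior-art approaches can be contrasted in a short sketch. The signature list and the function names `type_from_name` and `type_from_magic` are illustrative assumptions and do not come from the patent; the sketch only shows why an extension can be wrong for a renamed file and why magic-number matching compares one byte string after another.

```python
import os

# Hypothetical magic numbers for a few common formats (illustration only).
MAGIC = [(b"%PDF", "pdf"), (b"\x89PNG", "png"), (b"PK\x03\x04", "zip")]

def type_from_name(filename: str) -> str:
    """Extension-based identification: trusts whatever the file name says."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return ext or "unknown"

def type_from_magic(header: bytes) -> str:
    """Magic-number identification: one string comparison per known type."""
    for magic, name in MAGIC:
        if header.startswith(magic):
            return name
    return "unknown"

# A PDF that was renamed to "report.txt" before being transmitted.
renamed_name, header = "report.txt", b"%PDF-1.7\n"
print(type_from_name(renamed_name))   # the extension says "txt"
print(type_from_magic(header))        # the content says "pdf"
```

The cost of `type_from_magic` grows with the number of signatures, which is the efficiency concern raised in the background.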
Summary of the invention
The purpose of the present invention is to improve the accuracy of file type identification and to avoid the risk of errors introduced by extension-based identification. The identification process uses logical operations, which greatly improves the efficiency of file type identification.
To achieve the above object, the present invention provides a file type identification method, which comprises:
Precompiling the file features of files to obtain a bitmap feature, the bitmap feature comprising first offsets and the first character value corresponding to each first offset;
Obtaining, from transmitted data packets, the file stream of a first file whose file type is to be identified, the file stream comprising a second offset and the second character value corresponding to the second offset;
Looking up in the bitmap feature, according to the second offset, the first offset that matches the second offset;
Performing an operation on the second character value and each first character value in turn to obtain an operation result;
Determining the file type of the first file according to the operation result.
Further, after determining the file type of the first file according to the operation result, the method also comprises: processing the file type of the first file.
Further, the operation is an AND operation, which narrows the range of file types to be searched.
Further, after precompiling the file features of files to obtain the bitmap feature, the method also comprises: loading the bitmap feature into memory at process startup.
Further, the bitmap feature also comprises a file type ID, and performing an operation on the second character value and each first character value in turn to obtain an operation result comprises:
Judging whether the second character value matches the first character value;
If the second character value matches the first character value, determining the file type of the first file according to the file type ID corresponding to the current first character value;
If the second character value does not match the first character value, judging the file type of the first file to be an abnormal file type.
Further, the abnormal file type includes a content-tampered file type and an unknown file type.
Further, obtaining from the transmitted data packets the file stream of the first file whose file type is to be identified also comprises obtaining file boundaries, the file boundaries being used to determine the start time and end time of the transmitted data packets.
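Because the file stream arrives in packets, each byte needs an offset relative to the start of the file rather than the start of its packet. The following is a minimal sketch under stated assumptions: the class name `FileStream`, the `feed` method and the `last` flag are hypothetical, and the file boundary is modelled as "first packet" and "last packet" markers rather than as the start and end times the patent mentions.

```python
class FileStream:
    """Tracks the running offset of one file across several packets."""

    def __init__(self):
        self.next_offset = 0   # file offset of the first byte of the next packet
        self.finished = False  # set once the closing file boundary is seen

    def feed(self, payload: bytes, last: bool = False):
        """Return (second offset, second character value) pairs for one packet."""
        if self.finished:
            raise ValueError("file boundary already reached")
        pairs = [(self.next_offset + i, value) for i, value in enumerate(payload)]
        self.next_offset += len(payload)
        self.finished = last
        return pairs

stream = FileStream()
first = stream.feed(b"%P")               # signature split across two packets
second = stream.feed(b"DF", last=True)
print(first + second)
```

Keeping the offset in the stream object lets a signature that straddles a packet boundary still be matched byte by byte.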
In another aspect, the present invention provides a file type identification device, the device comprising a feature collector, a file boundary acquisition module, a type identification module, a result decision module, a policy module and a strategy matching module;
The feature collector is used to precompile the file features of files to obtain a bitmap feature, the bitmap feature comprising a file type ID, a first offset and the first character value corresponding to the first offset;
The policy module is used to indicate how the file type of the first file is to be processed;
The file boundary acquisition module is used to obtain, from transmitted data packets, the file stream and file boundaries of the first file whose file type is to be identified, the file stream comprising a second offset and the second character value corresponding to the second offset;
The type identification module is used to look up in the bitmap feature, according to the second offset, the first offset that matches the second offset, and to perform an operation on the second character value and each first character value in turn to obtain an operation result;
The result decision module is used to determine the file type of the first file according to the operation result;
The strategy matching module is used to process the file type of the first file according to the policy module.
The main advantages of the present invention are:
1. In the data leakage prevention function of a next-generation firewall, the sending or receiving of files of specific types can be alerted on in real time.
2. In the data leakage prevention function of a next-generation firewall, file types can be identified efficiently and accurately, providing a foundation for functions such as file parsing and content auditing.
3. In network monitoring devices such as Internet access management products, the operations of users within a local area network can be tracked, and users' download or upload behavior can be presented at a fine granularity.
Brief description of the drawings
Fig. 1 is a flow chart of the file type identification method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the file type identification device provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the bitmap feature provided by an embodiment of the present invention.
Detailed description
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
Fig. 1 is a flow chart of the file type identification method provided by an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:
Step 101: precompile the file features of files to obtain a bitmap feature, the bitmap feature comprising first offsets and the first character value corresponding to each first offset;
Further, after the bitmap feature is obtained, the method also comprises: loading the bitmap feature into memory at process startup.
Step 102: obtain, from transmitted data packets, the file stream of a first file whose file type is to be identified, the file stream comprising a second offset and the second character value corresponding to the second offset;
Further, obtaining the file stream of the first file from the transmitted data packets also comprises obtaining file boundaries, the file boundaries being used to determine the start time and end time of the transmitted data packets.
Step 103: look up in the bitmap feature, according to the second offset, the first offset that matches the second offset;
Step 104: perform an operation on the second character value and each first character value in turn to obtain an operation result;
Further, the bitmap feature also comprises a file type ID, and performing an operation on the second character value and each first character value in turn to obtain an operation result comprises:
Judging whether the second character value matches the first character value;
If the second character value matches the first character value, determining the file type of the first file according to the file type ID corresponding to the current first character value;
If the second character value does not match the first character value, judging the file type of the first file to be an abnormal file type.
Further, the abnormal file type includes a content-tampered file type and an unknown file type.
Further, the operation is an AND operation, which narrows the range of file types to be searched.
Step 105: determine the file type of the first file according to the operation result.
Further, the file type of the first file is processed.
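Steps 103 to 105 can be sketched as follows. This is one possible reading of the AND operation, not the patent's stated implementation: each entry of the precompiled table is assumed to be an integer bitmap in which bit i is set when file type ID i expects that byte value at that offset, so ANDing the entries for successive bytes narrows the set of candidate types. The table contents, the type IDs and the function name `identify` are hypothetical, and all signatures are assumed to have the same length.

```python
# Hypothetical precompiled bitmap feature: offset -> {byte value -> bitmap}.
# Bit 0 = "gif", bit 1 = "pdf", bit 2 = "png" (illustrative type IDs).
TYPE_IDS = {0: "gif", 1: "pdf", 2: "png"}
BITMAP_FEATURE = {
    0: {0x47: 0b001, 0x25: 0b010, 0x89: 0b100},
    1: {0x49: 0b001, 0x50: 0b110},   # 'P' is shared by pdf and png
    2: {0x46: 0b001, 0x44: 0b010, 0x4E: 0b100},
    3: {0x38: 0b001, 0x46: 0b010, 0x47: 0b100},
}

def identify(stream: bytes) -> str:
    """AND the bitmaps selected by each (offset, byte) pair of the stream."""
    candidates = (1 << len(TYPE_IDS)) - 1        # every type is possible at first
    for offset, by_value in BITMAP_FEATURE.items():
        if offset >= len(stream):
            return "abnormal"                     # stream too short to decide
        candidates &= by_value.get(stream[offset], 0)
        if candidates == 0:
            return "abnormal"                     # no type matches: tampered or unknown
    type_id = (candidates & -candidates).bit_length() - 1   # lowest surviving bit
    return TYPE_IDS[type_id]

print(identify(b"%PDF-1.4"))
print(identify(b"%PNG"))
```

Under this reading, the work per byte is one table lookup and one AND regardless of how many file types are registered, which matches the efficiency argument made for logical operations.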
Fig. 2 is a schematic structural diagram of a file type identification device provided by an embodiment of the present invention. As shown in Fig. 2, the file type identification device comprises a feature collector 20, a file boundary acquisition module 10, a type identification module 30, a result decision module 40, a policy module 60 and a strategy matching module 50.
The feature collector 20 is used to precompile the file features of files to obtain a bitmap feature, the bitmap feature comprising a file type ID, a first offset and the first character value corresponding to the first offset.
The policy module 60 is used to indicate how the file type of the first file is to be processed.
The file boundary acquisition module 10 is used to obtain, from transmitted data packets, the file stream and file boundaries of the first file whose file type is to be identified, the file stream comprising a second offset and the second character value corresponding to the second offset.
The type identification module 30 is used to look up in the bitmap feature, according to the second offset, the first offset that matches the second offset, and to perform an operation on the second character value and each first character value in turn to obtain an operation result.
The result decision module 40 is used to determine the file type of the first file according to the operation result.
The strategy matching module 50 is used to process the file type of the first file according to the policy module 60.
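The way the six modules hand data to one another can be sketched as a small pipeline. Everything below is an assumption made for illustration: the function names, the `Policy` class, the "alert" and "log" actions and the per-offset integer bitmaps are hypothetical, and only the module numbers in the comments come from Fig. 2.

```python
from dataclasses import dataclass, field

@dataclass
class Policy:                                   # policy module 60
    actions: dict = field(default_factory=lambda: {"pdf": "alert", "abnormal": "log"})
    default: str = "allow"

def feature_collector(signatures):              # feature collector 20
    """Precompile signatures into offset -> {byte value -> bitmap of type IDs}."""
    names = sorted(signatures)
    table = {}
    for type_id, name in enumerate(names):
        for offset, value in enumerate(signatures[name]):
            row = table.setdefault(offset, {})
            row[value] = row.get(value, 0) | (1 << type_id)
    return names, table

def acquire_file_stream(packets):               # file boundary acquisition module 10
    """Join the payloads between the first and last packet of one file."""
    return b"".join(packets)

def identify_type(table, n_types, stream):      # type identification module 30
    result = (1 << n_types) - 1
    for offset in sorted(table):
        value = stream[offset] if offset < len(stream) else None
        result &= table[offset].get(value, 0)
    return result

def decide(names, result):                      # result decision module 40
    if result == 0:
        return "abnormal"
    return names[(result & -result).bit_length() - 1]

def match_policy(policy, file_type):            # strategy matching module 50
    return policy.actions.get(file_type, policy.default)

def process(packets, signatures, policy):
    names, table = feature_collector(signatures)
    stream = acquire_file_stream(packets)
    file_type = decide(names, identify_type(table, len(names), stream))
    return file_type, match_policy(policy, file_type)

SIGNATURES = {"pdf": b"%PDF", "png": b"\x89PNG"}   # equal-length, illustrative
print(process([b"%P", b"DF-1.4"], SIGNATURES, Policy()))
```

In a real device the feature collector would run once at startup rather than per file; it is called inside `process` here only to keep the sketch self-contained.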
Fig. 3 is a schematic diagram of the bitmap feature provided by an embodiment of the present invention. As shown in Fig. 3, the figure includes the file type ID, the first offset, and the first character value corresponding to the first offset. Using the second offset in the file stream, the first offset that matches the second offset is looked up in the bitmap feature. The second character value is then ANDed with each first character value in turn. If the AND result is 1, the file type ID is determined from the current position of the first character value whose AND with the second character value yielded 1, and the file type is determined from that file type ID.
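The lookup described for Fig. 3 can be illustrated with a literal, simplified reading. The figure itself is not reproduced here, so the row layout is an assumption: each row holds a file type ID, a first offset, and the first character value stored as a one-hot bitmap, and any non-zero AND result is treated as the "result is 1" case. The names `one_hot` and `lookup` and the three rows are hypothetical.

```python
def one_hot(value: int) -> int:
    """Encode a byte value 0..255 as an integer with exactly one bit set."""
    return 1 << value

# Hypothetical rows: (file type ID, first offset, first character value bitmap).
BITMAP_FEATURE = [
    (1, 0, one_hot(0x25)),   # type 1 expects '%' at offset 0
    (2, 0, one_hot(0x89)),   # type 2 expects 0x89 at offset 0
    (3, 0, one_hot(0x50)),   # type 3 expects 'P' at offset 0
]

def lookup(second_offset: int, second_value: int):
    """Return the file type IDs whose row at this offset ANDs to non-zero."""
    probe = one_hot(second_value)
    return [
        type_id
        for type_id, first_offset, bitmap in BITMAP_FEATURE
        if first_offset == second_offset and (bitmap & probe) != 0
    ]

print(lookup(0, 0x25))   # rows at offset 0 are tested in turn
```

With one-hot bitmaps the equality test becomes a single AND, and a row could accept several byte values at one offset simply by setting more than one bit.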
The specific embodiments described above further explain the purpose, technical solution and beneficial effects of the present invention in detail. It should be understood that the foregoing is only a specific embodiment of the present invention and is not intended to limit its scope of protection; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (7)

1. A file type identification method, characterized by comprising:
Precompiling the file features of files to obtain a bitmap feature, the bitmap feature comprising a first offset, the first character value corresponding to the first offset, and a file type ID;
Obtaining, from transmitted data packets, the file stream of a first file whose file type is to be identified, the file stream comprising a second offset and the second character value corresponding to the second offset;
Looking up in the bitmap feature, according to the second offset, the first offset that matches the second offset;
Performing an operation on the second character value and each first character value in turn to obtain an operation result, and determining the file type of the first file according to the operation result, comprising:
Judging whether the second character value matches the first character value;
If the second character value matches the first character value, determining the file type of the first file according to the file type ID corresponding to the current first character value;
If the second character value does not match the first character value, judging the file type of the first file to be an abnormal file type.
2. The file type identification method according to claim 1, characterized in that, after determining the file type of the first file according to the operation result, the method also comprises: processing the file type of the first file.
3. The file type identification method according to claim 1, characterized in that the operation is an AND operation, which narrows the range of file types to be searched.
4. The file type identification method according to claim 1, characterized in that, after precompiling the file features of files to obtain the bitmap feature, the method also comprises: loading the bitmap feature into memory at process startup.
5. The file type identification method according to claim 1, characterized in that the abnormal file type includes a content-tampered file type and an unknown file type.
6. The file type identification method according to claim 1, characterized in that obtaining from the transmitted data packets the file stream of the first file whose file type is to be identified also comprises obtaining file boundaries, the file boundaries being used to determine the start time and end time of the transmitted data packets.
7. A file type identification device, characterized in that the device comprises a feature collector, a file boundary acquisition module, a type identification module, a result decision module, a policy module and a strategy matching module;
The feature collector is used to precompile the file features of files to obtain a bitmap feature, the bitmap feature comprising a file type ID, a first offset and the first character value corresponding to the first offset;
The policy module is used to indicate how the file type of the first file is to be processed;
The file boundary acquisition module is used to obtain, from transmitted data packets, the file stream and file boundaries of the first file whose file type is to be identified, the file stream comprising a second offset and the second character value corresponding to the second offset;
The type identification module is used to look up in the bitmap feature, according to the second offset, the first offset that matches the second offset, and to perform an operation on the second character value and each first character value in turn to obtain an operation result;
The result decision module is used to determine the file type of the first file according to the operation result, comprising:
Judging whether the second character value matches the first character value;
If the second character value matches the first character value, determining the file type of the first file according to the file type ID corresponding to the current first character value;
If the second character value does not match the first character value, judging the file type of the first file to be an abnormal file type;
The strategy matching module is used to process the file type of the first file according to the policy module.
CN201310750085.0A 2013-12-31 2013-12-31 File type identification method and device Active CN103701821B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310750085.0A CN103701821B (en) 2013-12-31 2013-12-31 File type identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310750085.0A CN103701821B (en) 2013-12-31 2013-12-31 File type identification method and device

Publications (2)

Publication Number Publication Date
CN103701821A CN103701821A (en) 2014-04-02
CN103701821B true CN103701821B (en) 2017-07-28

Family

ID=50363217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310750085.0A Active CN103701821B (en) 2013-12-31 2013-12-31 File type identification method and device

Country Status (1)

Country Link
CN (1) CN103701821B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460155A (en) * 2018-03-28 2018-08-28 深信服科技股份有限公司 A kind of file identification method, device, equipment and storage medium
CN110929110B (en) * 2019-11-13 2023-02-21 北京北信源软件股份有限公司 Electronic document detection method, device, equipment and storage medium
CN111563063B (en) * 2020-05-12 2022-09-13 福建天晴在线互动科技有限公司 Method for identifying file type based on HashMap

Citations (2)

Publication number Priority date Publication date Assignee Title
CN1790328A (en) * 2004-12-17 2006-06-21 微软公司 Extensible file system
CN103383681A (en) * 2011-12-31 2013-11-06 华为数字技术(成都)有限公司 File type identification method and system

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US7480766B2 (en) * 2005-08-03 2009-01-20 Sandisk Corporation Interfacing systems operating through a logical address space and on a direct data file basis

Non-Patent Citations (1)

Title
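基于特征标识的文件类型识别与匹配 (File type identification and matching based on feature identifiers); 张润峰 (Zhang Runfeng); 《计算机安全》 (Computer Security); 2011-06-30; pp. 40-42 *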

Also Published As

Publication number Publication date
CN103701821A (en) 2014-04-02

Similar Documents

Publication Publication Date Title
CN104320377B (en) The anti-stealing link method and equipment of a kind of files in stream media
US9794246B2 (en) Increased communication security
CN106534146B (en) A kind of safety monitoring system and method
CN101686239B (en) Trojan discovery system
CN104937886A (en) Log analysis device, information processing method and program
US10341294B2 (en) Unauthorized communication detection system and unauthorized communication detection method
CN107995179B (en) Unknown threat sensing method, device, equipment and system
CN105306463A (en) Modbus TCP intrusion detection method based on support vector machine
US10389714B2 (en) Increased communication security
CN109831448A (en) For the detection method of particular encryption web page access behavior
US10057155B2 (en) Method and apparatus for determining automatic scanning action
CN103701821B (en) File type identification method and device
Bai et al. Analysis and detection of bogus behavior in web crawler measurement
CN110020161B (en) Data processing method, log processing method and terminal
CN102130791A (en) Method, device and gateway server for detecting agent on gateway server
CN112671724B (en) Terminal security detection analysis method, device, equipment and readable storage medium
Yin et al. Anomaly traffic detection based on feature fluctuation for secure industrial internet of things
Zhou et al. A model-based method for enabling source mapping and intrusion detection on proprietary can bus
Oudah et al. Using burstiness for network applications classification
TWI750252B (en) Method and device for recording website access log
CN110401639B (en) Method and device for judging abnormality of network access, server and storage medium thereof
US9049170B2 (en) Building filter through utilization of automated generation of regular expression
Yang et al. RTP timestamp steganography detection method
Su et al. Mobile traffic identification based on application's network signature
JP6055726B2 (en) Web page monitoring device, web page monitoring system, web page monitoring method and computer program

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant