CN103701821B - File type identification method and device - Google Patents
File type identification method and device
- Publication number
- CN103701821B (application CN201310750085.0A)
- Authority
- CN
- China
- Prior art keywords
- file
- character value
- file type
- offset
- type
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The present invention relates to a file type identification method, comprising: precompiling the file features of files to obtain a bitmap feature, the bitmap feature including first offsets and the first character value corresponding to each first offset; obtaining, from transmitted data packets, the file stream of a first file whose file type is to be identified, the file stream including second offsets and the second character value corresponding to each second offset; looking up in the bitmap feature, according to a second offset, the first offset that matches it; performing an operation between the second character value and each first character value in turn to obtain an operation result; and determining the file type of the first file according to the operation result. The invention can identify file types efficiently and accurately, raise real-time alarms for specific file types, track user operations within a local area network, and present users' file upload and download behavior in fine-grained detail.
Description
Technical field
The present invention relates to the field of computer network security, and in particular to a file type identification method and device.
Background art
With the continuous development of science and technology, people depend more and more on networks and transmit data over them. Transmitting data this way, however, also poses great challenges to information security. To prevent confidential information from leaking, network administrators or enterprises often need to identify and inspect the types of files being transmitted.
Existing file type identification technology obtains the file name through application identification and deep protocol analysis, and determines the file type from the suffix in the file name. Although this method needs neither to locate file boundaries nor to analyze file content, in practice it produces a wrong result whenever the file name has been changed. Its recognition accuracy is therefore low, and its errors are unpredictable.
There are also file type identification methods based on magic numbers. Such a method matches signatures against the data stream of the file header and determines the file type from the matching result. Although it can identify file types effectively, it relies on string comparison, so identification is slow and cannot meet the forwarding performance required of network devices.
Therefore, none of the existing file type identification technologies can identify file types both accurately and efficiently.
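The two prior-art approaches described above can be contrasted in a short sketch. The signature table, the function names and the sample file name are illustrative assumptions and do not come from the patent; the magic numbers themselves are the well-known PNG, PDF and ZIP headers.

```python
import os

# Hypothetical signature table: file type -> magic bytes expected at offset 0.
MAGIC_NUMBERS = {
    "png": b"\x89PNG\r\n\x1a\n",
    "pdf": b"%PDF-",
    "zip": b"PK\x03\x04",
}

def type_by_suffix(filename: str) -> str:
    """Suffix-based approach: trust whatever the file name says."""
    return os.path.splitext(filename)[1].lstrip(".").lower() or "unknown"

def type_by_magic(header: bytes) -> str:
    """Magic-number approach: compare the header with each signature in turn."""
    for file_type, magic in MAGIC_NUMBERS.items():
        if header.startswith(magic):  # one string comparison per known type
            return file_type
    return "unknown"

# A PDF that was renamed to look like an image before being transmitted.
renamed_pdf_header = b"%PDF-1.7\n"
suffix_guess = type_by_suffix("holiday_photo.png")  # fooled by the rename
magic_guess = type_by_magic(renamed_pdf_header)     # looks at the content
```

The suffix check is cheap but reports the renamed file as an image, while the magic-number check gets the type right at the cost of one comparison per known signature.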
Summary of the invention
The purpose of the present invention is to improve the accuracy of file type identification and to avoid the errors that suffix-based identification can introduce. The identification process uses logical operations, which greatly improves the efficiency of file type identification.
To achieve the above purpose, the invention provides a file type identification method, comprising:
precompiling the file features of files to obtain a bitmap feature, the bitmap feature including first offsets and the first character value corresponding to each first offset;
obtaining, from transmitted data packets, the file stream of a first file whose file type is to be identified, the file stream including second offsets and the second character value corresponding to each second offset;
looking up in the bitmap feature, according to the second offset, the first offset that matches the second offset;
performing an operation between the second character value and each first character value in turn to obtain an operation result; and
determining the file type of the first file according to the operation result.
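The precompilation and lookup steps listed above could be sketched as follows. The patent does not spell out the layout of the bitmap feature, so this is only one plausible reading: for each offset and each byte value, keep an integer whose bit `n` is set when file type ID `n` expects that byte at that offset. The signature set and the names `SIGNATURES`, `precompile` and `lookup` are hypothetical.

```python
# Hypothetical signature set: (file type ID, name, starting offset, expected bytes).
SIGNATURES = [
    (0, "png", 0, b"\x89PNG"),
    (1, "pdf", 0, b"%PDF"),
    (2, "zip", 0, b"PK\x03\x04"),
    (3, "docx", 0, b"PK\x03\x04"),  # DOCX is a ZIP container, so it shares the header
]

def precompile(signatures):
    """Build bitmap[offset][byte value] -> bitmask of file type IDs (assumed layout)."""
    bitmap = {}
    for type_id, _name, start, expected in signatures:
        for i, value in enumerate(expected):
            row = bitmap.setdefault(start + i, {})
            # Set the bit of this file type for (offset, byte value).
            row[value] = row.get(value, 0) | (1 << type_id)
    return bitmap

def lookup(bitmap, second_offset):
    """Find the first offset that matches an offset seen in the file stream."""
    return bitmap.get(second_offset)  # None if no signature constrains this offset

# Done once, for example at process start, and then kept in memory.
BITMAP = precompile(SIGNATURES)
row_at_offset_0 = lookup(BITMAP, 0)
```

With this layout a single dictionary lookup yields every file type that is still compatible with a given byte, so matching no longer needs one comparison per signature.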
Further, after the file type of the first file is determined according to the operation result, the method also includes handling the file type of the first file.
Further, the operation is an AND operation, which narrows the range of candidate file types.
Further, after the file features are precompiled to obtain the bitmap feature, the method also includes loading the bitmap feature into memory when the process starts.
Further, the bitmap feature also includes a file type ID, and performing the operation between the second character value and each first character value in turn to obtain the operation result includes:
judging whether the second character value matches the first character value;
if the second character value matches the first character value, determining the file type of the first file according to the file type ID corresponding to the current first character value;
if the second character value does not match the first character value, judging the file type of the first file to be an abnormal file type.
Further, the abnormal file type includes a content-tampered file type and an unknown file type.
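The decision described above might look like the sketch below. The text does not say how a content-tampered file is told apart from an unknown one, so the rule used here (a mismatch on a file whose declared suffix is a known type counts as tampering) is purely an assumption, as are the names `Verdict`, `TYPE_BY_ID`, `KNOWN_SUFFIXES` and `decide`.

```python
from enum import Enum
from typing import Optional, Tuple

class Verdict(Enum):
    IDENTIFIED = "identified"
    CONTENT_TAMPERED = "content tampering"
    UNKNOWN = "unknown"

# Hypothetical mapping from file type ID to file type.
TYPE_BY_ID = {0: "png", 1: "pdf", 2: "zip"}
KNOWN_SUFFIXES = set(TYPE_BY_ID.values())

def decide(matched_type_id: Optional[int],
           declared_suffix: str) -> Tuple[Verdict, Optional[str]]:
    """Map a match result to a file type or to one of the abnormal types."""
    if matched_type_id is not None:
        # Match: the file type ID tied to the matching character value wins.
        return Verdict.IDENTIFIED, TYPE_BY_ID[matched_type_id]
    # Mismatch: abnormal file type. ASSUMPTION: a known suffix whose content
    # matches no signature is treated as tampered, anything else as unknown.
    if declared_suffix.lower() in KNOWN_SUFFIXES:
        return Verdict.CONTENT_TAMPERED, None
    return Verdict.UNKNOWN, None
```

Note that a match overrides the declared suffix, so `decide(1, "png")` still reports a PDF.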
Further, obtaining from the transmitted data packets the file stream of the first file whose file type is to be identified also includes obtaining file boundaries, the file boundaries being used to determine the start time and end time of the transmitted data packets.
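As a rough illustration of how file boundaries turn packets into a file stream of offset and character value pairs, the sketch below keeps a running offset between a start boundary and an end boundary. The `Packet` type and its `starts_file` and `ends_file` flags are hypothetical; a real device would derive the boundaries from protocol analysis, which is not shown here.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple

class Packet(NamedTuple):
    payload: bytes
    starts_file: bool = False  # hypothetical flag: file transfer begins here
    ends_file: bool = False    # hypothetical flag: file transfer ends here

def file_stream(packets: Iterable[Packet]) -> Iterator[Tuple[int, int]]:
    """Yield (second offset, second character value) pairs for bytes inside a file."""
    offset = None  # None means we are currently outside any file
    for packet in packets:
        if packet.starts_file:
            offset = 0
        if offset is not None:
            for value in packet.payload:
                yield offset, value
                offset += 1  # the offset keeps counting across packets
        if packet.ends_file:
            offset = None

packets = [
    Packet(b"junk"),
    Packet(b"%P", starts_file=True),
    Packet(b"DF", ends_file=True),
    Packet(b"more"),
]
pairs = list(file_stream(packets))
```

Only the bytes between the two boundaries are emitted, and the second packet of the file continues at offset 2 rather than restarting at 0.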
In another aspect, the invention provides a file type identification device, the device including a feature collector, a file boundary acquisition module, a type identification module, a result decision module, a policy module and a strategy matching module;
the feature collector is configured to precompile the file features of files to obtain a bitmap feature, the bitmap feature including a file type ID, first offsets and the first character value corresponding to each first offset;
the policy module is configured to indicate how the file type of the first file is to be handled;
the file boundary acquisition module is configured to obtain, from transmitted data packets, the file stream and the file boundaries of a first file whose file type is to be identified, the file stream including second offsets and the second character value corresponding to each second offset;
the type identification module is configured to look up in the bitmap feature, according to the second offset, the first offset that matches the second offset, and to perform an operation between the second character value and each first character value in turn to obtain an operation result;
the result decision module is configured to determine the file type of the first file according to the operation result;
the strategy matching module is configured to handle the file type of the first file according to the policy module.
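The division of labor between the policy module and the strategy matching module could be pictured roughly as below. The action names (`alert`, `block`, `log`), the default policy and the class layout are assumptions made for illustration; the patent only states that the policy module indicates how a file type is handled and that the strategy matching module applies it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class PolicyModule:
    """Says what to do for each file type (all values here are hypothetical)."""
    actions: Dict[str, str] = field(
        default_factory=lambda: {"pdf": "alert", "zip": "block"})
    default_action: str = "log"
    abnormal_action: str = "alert"

@dataclass
class StrategyMatchingModule:
    """Applies the policy to the file type produced by the result decision module."""
    policy: PolicyModule
    events: List[str] = field(default_factory=list)

    def handle(self, user: str, file_type: Optional[str]) -> str:
        if file_type is None:  # None stands for an abnormal file type
            action = self.policy.abnormal_action
        else:
            action = self.policy.actions.get(file_type, self.policy.default_action)
        self.events.append(f"{action}: {user} transferred {file_type or 'abnormal'} file")
        return action

matcher = StrategyMatchingModule(PolicyModule())
first = matcher.handle("alice", "zip")
second = matcher.handle("bob", "png")
third = matcher.handle("carol", None)
```

Keeping the policy as data separate from the code that applies it means an administrator could change which types raise alarms without touching the identification path.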
The main advantages of the present invention are:
1. In the data leakage prevention function of a next-generation firewall, the sending or receiving of files of specific types can trigger real-time alarms.
2. In the data leakage prevention function of a next-generation firewall, file types can be identified efficiently and accurately, which supports functions such as file parsing and content auditing.
3. In network monitoring devices such as network access management appliances, the operations of users in the local area network can be tracked, and users' download or upload behavior can be presented in fine-grained detail.
Brief description of the drawings
Fig. 1 is a flow chart of the file type identification method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the file type identification device provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the bitmap feature provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
Fig. 1 is a flow chart of the file type identification method provided by an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step 101: precompile the file features of files to obtain a bitmap feature, the bitmap feature including first offsets and the first character value corresponding to each first offset.
Further, after the bitmap feature is obtained, the bitmap feature is loaded into memory when the process starts.
Step 102: obtain, from transmitted data packets, the file stream of a first file whose file type is to be identified, the file stream including second offsets and the second character value corresponding to each second offset.
Further, obtaining the file stream of the first file from the transmitted data packets also includes obtaining file boundaries, which are used to determine the start time and end time of the transmitted data packets.
Step 103: look up in the bitmap feature, according to the second offset, the first offset that matches the second offset.
Step 104: perform an operation between the second character value and each first character value in turn to obtain an operation result.
Further, the bitmap feature also includes a file type ID, and performing the operation between the second character value and each first character value in turn to obtain the operation result includes:
judging whether the second character value matches the first character value;
if the second character value matches the first character value, determining the file type of the first file according to the file type ID corresponding to the current first character value;
if the second character value does not match the first character value, judging the file type of the first file to be an abnormal file type.
Further, the abnormal file type includes a content-tampered file type and an unknown file type.
Further, the operation is an AND operation, which narrows the range of candidate file types.
Step 105: determine the file type of the first file according to the operation result.
Further, the file type of the first file is then handled.
Fig. 2 is a schematic structural diagram of a file type identification device provided by an embodiment of the present invention. As shown in Fig. 2, the file type identification device includes a feature collector 20, a file boundary acquisition module 10, a type identification module 30, a result decision module 40, a policy module 60 and a strategy matching module 50.
The feature collector 20 is configured to precompile the file features of files to obtain a bitmap feature, the bitmap feature including a file type ID, first offsets and the first character value corresponding to each first offset.
The policy module 60 is configured to indicate how the file type of the first file is to be handled.
The file boundary acquisition module 10 is configured to obtain, from transmitted data packets, the file stream and the file boundaries of a first file whose file type is to be identified, the file stream including second offsets and the second character value corresponding to each second offset.
The type identification module 30 is configured to look up in the bitmap feature, according to the second offset, the first offset that matches the second offset, and to perform an operation between the second character value and each first character value in turn to obtain an operation result.
The result decision module 40 is configured to determine the file type of the first file according to the operation result.
The strategy matching module 50 is configured to handle the file type of the first file according to the policy module 60.
Fig. 3 is a schematic diagram of the bitmap feature provided by an embodiment of the present invention. As shown in Fig. 3, the bitmap feature includes file type IDs, first offsets and the first character value corresponding to each first offset. The second offset from the file stream is used to look up the matching first offset in the bitmap feature. The second character value is then ANDed with each first character value in turn; if the result of the AND operation is 1, the file type ID is determined from the current position of the first character value whose AND with the second character value gave 1, and the file type is determined from that file type ID.
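The narrowing effect of the AND operation can be traced with a small hand-written table. This is a sketch under an assumed encoding, not the layout of Fig. 3 itself: each entry is an integer whose bit `n` stands for file type ID `n`, and an AND result with a bit set plays the role of the "result is 1" case in the text. The table contents and the names `BITMAP`, `narrow` and `type_ids` are hypothetical.

```python
TYPE_NAMES = {0: "png", 1: "pdf", 2: "zip"}

# Hypothetical precompiled table: offset -> byte value -> bitmask of file type IDs.
# Built from the headers 89 50 4E 47 (PNG), 25 50 44 46 (PDF), 50 4B 03 04 (ZIP).
BITMAP = {
    0: {0x89: 0b001, 0x25: 0b010, 0x50: 0b100},
    1: {0x50: 0b011, 0x4B: 0b100},
    2: {0x4E: 0b001, 0x44: 0b010, 0x03: 0b100},
    3: {0x47: 0b001, 0x46: 0b010, 0x04: 0b100},
}

def narrow(stream: bytes, bitmap=BITMAP, n_types=3):
    """AND the mask of every (offset, byte) pair into the candidate set."""
    candidates = (1 << n_types) - 1  # start with every file type possible
    trace = []
    for offset, value in enumerate(stream):
        row = bitmap.get(offset)
        if row is None:
            break  # past the last offset that any signature constrains
        candidates &= row.get(value, 0)  # a byte no type expects clears all bits
        trace.append(candidates)
        if candidates == 0:
            break  # nothing left: abnormal file type
    return candidates, trace

def type_ids(mask: int):
    """Positions of the set bits, i.e. the surviving file type IDs."""
    return [i for i in range(mask.bit_length()) if mask >> i & 1]

pdf_mask, pdf_trace = narrow(b"%PDF-1.7")
bad_mask, bad_trace = narrow(b"\x89PDF")  # PNG first byte followed by PDF bytes
```

For the PDF header the candidate set collapses to type ID 1 after the first byte and stays there, while the mixed header keeps PNG alive for two bytes and then drops to zero, which corresponds to the abnormal case.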
The specific embodiments above describe the purpose, technical solution and beneficial effects of the present invention in further detail. It should be understood that the foregoing is only a specific embodiment of the present invention and is not intended to limit its scope of protection; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (7)
1. A file type identification method, characterized by comprising:
precompiling the file features of files to obtain a bitmap feature, the bitmap feature including a first offset, a first character value corresponding to the first offset, and a file type ID;
obtaining, from transmitted data packets, the file stream of a first file whose file type is to be identified, the file stream including a second offset and a second character value corresponding to the second offset;
looking up in the bitmap feature, according to the second offset, the first offset that matches the second offset;
performing an operation between the second character value and each first character value in turn to obtain an operation result, and determining the file type of the first file according to the operation result, including:
judging whether the second character value matches the first character value;
if the second character value matches the first character value, determining the file type of the first file according to the file type ID corresponding to the current first character value;
if the second character value does not match the first character value, judging the file type of the first file to be an abnormal file type.
2. The file type identification method according to claim 1, characterized in that, after the file type of the first file is determined according to the operation result, the method further comprises: handling the file type of the first file.
3. The file type identification method according to claim 1, characterized in that the operation is an AND operation, which narrows the range of candidate file types.
4. The file type identification method according to claim 1, characterized in that, after the file features of files are precompiled to obtain the bitmap feature, the method further comprises: loading the bitmap feature into memory when the process starts.
5. The file type identification method according to claim 1, characterized in that the abnormal file type includes a content-tampered file type and an unknown file type.
6. The file type identification method according to claim 1, characterized in that obtaining from the transmitted data packets the file stream of the first file whose file type is to be identified further comprises obtaining file boundaries, the file boundaries being used to determine the start time and end time of the transmitted data packets.
7. A file type identification device, characterized in that the device includes a feature collector, a file boundary acquisition module, a type identification module, a result decision module, a policy module and a strategy matching module;
the feature collector is configured to precompile the file features of files to obtain a bitmap feature, the bitmap feature including a file type ID, a first offset and a first character value corresponding to the first offset;
the policy module is configured to indicate how the file type of the first file is to be handled;
the file boundary acquisition module is configured to obtain, from transmitted data packets, the file stream and the file boundaries of a first file whose file type is to be identified, the file stream including a second offset and a second character value corresponding to the second offset;
the type identification module is configured to look up in the bitmap feature, according to the second offset, the first offset that matches the second offset, and to perform an operation between the second character value and each first character value in turn to obtain an operation result;
the result decision module is configured to determine the file type of the first file according to the operation result, including:
judging whether the second character value matches the first character value;
if the second character value matches the first character value, determining the file type of the first file according to the file type ID corresponding to the current first character value;
if the second character value does not match the first character value, judging the file type of the first file to be an abnormal file type;
the strategy matching module is configured to handle the file type of the first file according to the policy module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310750085.0A CN103701821B (en) | 2013-12-31 | 2013-12-31 | File type identification method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310750085.0A CN103701821B (en) | 2013-12-31 | 2013-12-31 | File type identification method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103701821A CN103701821A (en) | 2014-04-02 |
CN103701821B true CN103701821B (en) | 2017-07-28 |
Family
ID=50363217
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310750085.0A Active CN103701821B (en) | 2013-12-31 | 2013-12-31 | File type identification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103701821B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108460155A (en) * | 2018-03-28 | 2018-08-28 | 深信服科技股份有限公司 | File identification method, device, equipment and storage medium |
CN110929110B (en) * | 2019-11-13 | 2023-02-21 | 北京北信源软件股份有限公司 | Electronic document detection method, device, equipment and storage medium |
CN111563063B (en) * | 2020-05-12 | 2022-09-13 | 福建天晴在线互动科技有限公司 | Method for identifying file type based on HashMap |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1790328A (en) * | 2004-12-17 | 2006-06-21 | 微软公司 | Extensible file system |
CN103383681A (en) * | 2011-12-31 | 2013-11-06 | 华为数字技术(成都)有限公司 | File type identification method and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7480766B2 (en) * | 2005-08-03 | 2009-01-20 | Sandisk Corporation | Interfacing systems operating through a logical address space and on a direct data file basis |
Non-Patent Citations (1)
Title |
---|
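File type identification and matching based on feature identifiers (基于特征标识的文件类型识别与匹配); Zhang Runfeng (张润峰); Computer Security (《计算机安全》); 2011-06-30; pp. 40-42 * |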
Also Published As
Publication number | Publication date |
---|---|
CN103701821A (en) | 2014-04-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104320377B (en) | The anti-stealing link method and equipment of a kind of files in stream media | |
US9794246B2 (en) | Increased communication security | |
CN106534146B (en) | A kind of safety monitoring system and method | |
CN101686239B (en) | Trojan discovery system | |
CN104937886A (en) | Log analysis device, information processing method and program | |
US10341294B2 (en) | Unauthorized communication detection system and unauthorized communication detection method | |
CN107995179B (en) | Unknown threat sensing method, device, equipment and system | |
CN105306463A (en) | Modbus TCP intrusion detection method based on support vector machine | |
US10389714B2 (en) | Increased communication security | |
CN109831448A (en) | For the detection method of particular encryption web page access behavior | |
US10057155B2 (en) | Method and apparatus for determining automatic scanning action | |
CN103701821B (en) | File type identification method and device | |
Bai et al. | Analysis and detection of bogus behavior in web crawler measurement | |
CN110020161B (en) | Data processing method, log processing method and terminal | |
CN102130791A (en) | Method, device and gateway server for detecting agent on gateway server | |
CN112671724B (en) | Terminal security detection analysis method, device, equipment and readable storage medium | |
Yin et al. | Anomaly traffic detection based on feature fluctuation for secure industrial internet of things | |
Zhou et al. | A model-based method for enabling source mapping and intrusion detection on proprietary can bus | |
Oudah et al. | Using burstiness for network applications classification | |
TWI750252B (en) | Method and device for recording website access log | |
CN110401639B (en) | Method and device for judging abnormality of network access, server and storage medium thereof | |
US9049170B2 (en) | Building filter through utilization of automated generation of regular expression | |
Yang et al. | RTP timestamp steganography detection method | |
Su et al. | Mobile traffic identification based on application's network signature | |
JP6055726B2 (en) | Web page monitoring device, web page monitoring system, web page monitoring method and computer program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||