CN103701821A - File type recognition method and device - Google Patents
- Publication number: CN103701821A
- Application number: CN201310750085.0A
- Authority: CN (China)
- Prior art keywords: file, file type, character value, offset, type
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention relates to a file type recognition method comprising: precompiling file signatures to obtain a bitmap feature, the bitmap feature comprising first offsets and first character values corresponding to the first offsets; obtaining, from transmitted data packets, a file stream of a first file whose file type is to be recognized, the file stream comprising second offsets and second character values corresponding to the second offsets; searching the bitmap feature, according to the second offsets, for the first offsets that match the second offsets; performing an operation between the second character values and each first character value in turn to obtain an operation result; and determining the file type of the first file according to the operation result. The method recognizes file types effectively and accurately, raises real-time alarms on specific file types, tracks user operations in a local area network, and presents users' file upload and download behavior at fine granularity.
Description
Technical field
The present invention relates to the field of computer and network security, and in particular to a file type recognition method and device.
Background
With the development of science and technology, people depend more and more on networks and transmit data over them, but data transmission also poses great challenges to information security. To prevent leakage of confidential information, network administrators and enterprises usually need to identify and inspect the types of files being transferred.
Existing file type recognition techniques use application identification and deep protocol analysis to obtain the file name, then determine the file type from the file name's suffix (extension). Although this approach needs neither to locate file boundaries nor to analyze file content, it gives wrong results whenever the file name has been modified, so its correct recognition rate is low and its misrecognition rate is high.
Another approach, file type recognition based on magic numbers, matches the file header data stream against known signatures and determines the file type from the match result. Although this method identifies file types effectively, it relies on string comparison, so its recognition efficiency is low and it cannot meet the forwarding-performance requirements of network devices.
Therefore, none of the existing file type recognition techniques can identify file types both accurately and efficiently.
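For contrast, the two prior-art approaches can be sketched as follows. This is a minimal illustration, not code from the patent: the function names and the small signature table are assumptions, although the byte sequences are the well-known PDF, PNG, ZIP and GIF magic numbers. A renamed file fools the suffix check, while the magic-number check costs one byte-string comparison per known type.

```python
import os

# Well-known magic numbers (file header prefixes); the table is illustrative.
MAGIC_NUMBERS = {
    "pdf": b"%PDF",
    "png": b"\x89PNG\r\n\x1a\n",
    "zip": b"PK\x03\x04",
    "gif": b"GIF8",  # common prefix of GIF87a and GIF89a
}


def type_from_suffix(filename: str) -> str:
    """Suffix-based recognition: trusts whatever extension the name carries."""
    suffix = os.path.splitext(filename)[1].lstrip(".").lower()
    return suffix or "unknown"


def type_from_magic(header: bytes) -> str:
    """Magic-number recognition: one byte-string comparison per known type."""
    for file_type, magic in MAGIC_NUMBERS.items():
        if header.startswith(magic):
            return file_type
    return "unknown"


# A PDF renamed to report.txt: the suffix lies, the header does not.
header = b"%PDF-1.7\n"
print(type_from_suffix("report.txt"), type_from_magic(header))  # txt pdf
```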
Summary of the invention
The object of the invention is to improve file type recognition accuracy and to avoid the risks caused by misidentification based on file name suffixes; in addition, the recognition process uses logical operations, which greatly improves the efficiency of file type recognition.
To achieve the above object, the invention provides a file type recognition method, comprising:
Precompiling file signatures to obtain a bitmap feature, the bitmap feature comprising a first offset and a first character value corresponding to the first offset;
Obtaining, from the transmitted data packets, the file stream of a first file whose file type is to be identified, the file stream comprising a second offset and a second character value corresponding to the second offset;
According to the second offset, searching the bitmap feature for the first offset that matches the second offset;
Performing an operation between the second character value and each first character value in turn to obtain an operation result;
Determining the file type of the first file according to the operation result.
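The five steps above can be sketched in Python. This is one plausible reading rather than the patent's implementation: it assumes the bitmap feature is a table indexed by (first offset, byte value) whose entry is an integer bit mask with bit *i* set when file type ID *i* accepts that byte at that offset, and it assumes simple prefix signatures. The signature table and all function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical signatures: file type ID -> (name, header prefix).
SIGNATURES: Dict[int, tuple] = {
    0: ("pdf", b"%PDF"),
    1: ("zip", b"PK\x03\x04"),
    2: ("png", b"\x89PNG"),
}


def precompile(signatures) -> Dict[int, List[int]]:
    """Step 1: build bitmap[first_offset][byte] -> mask of accepting type IDs."""
    depth = max(len(prefix) for _, prefix in signatures.values())
    bitmap: Dict[int, List[int]] = {}
    for offset in range(depth):
        row = [0] * 256
        for type_id, (_, prefix) in signatures.items():
            bit = 1 << type_id
            if offset < len(prefix):
                row[prefix[offset]] |= bit  # only this first character value
            else:
                for value in range(256):  # shorter signature: any byte is fine
                    row[value] |= bit
        bitmap[offset] = row
    return bitmap


def identify(bitmap, signatures, stream: bytes) -> str:
    """Steps 2-5: walk the (second offset, second character value) pairs."""
    if len(stream) < len(bitmap):
        return "unknown"  # too short to confirm any signature
    candidates = 0
    for type_id in signatures:
        candidates |= 1 << type_id  # start with every type as a candidate
    for offset, value in enumerate(stream):
        row = bitmap.get(offset)  # step 3: first offset matching this offset
        if row is None:
            break  # beyond the deepest signature byte
        candidates &= row[value]  # step 4: AND narrows the candidate set
    if candidates == 0:
        return "abnormal"  # step 5: no signature matched
    lowest = (candidates & -candidates).bit_length() - 1
    return signatures[lowest][0]


BITMAP = precompile(SIGNATURES)
print(identify(BITMAP, SIGNATURES, b"%PDF-1.4 ..."))  # pdf
```

The per-byte cost is one table lookup and one AND, independent of how many signatures are loaded; the lowest-ID tie-break is an assumption of this sketch.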
Further, after determining the file type of the first file according to the operation result, the method further comprises: processing the first file according to its file type.
Further, the operation is an AND operation, which narrows the range of candidate file types.
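To see why an AND operation narrows the candidate set, consider a toy bitmap for three hypothetical file type IDs over two offsets (bit *i* of each mask stands for type ID *i*; the byte values are arbitrary). Each byte of the stream selects a mask, and ANDing masks can only clear bits, never set them.

```python
# Hypothetical toy bitmap: type 0 expects "PK", type 1 "PA", type 2 "%P".
# bitmap[offset][byte] is the set of type IDs (as bits) accepting that byte.
bitmap = {
    0: {0x50: 0b011, 0x25: 0b100},               # 'P' -> types 0,1; '%' -> 2
    1: {0x4B: 0b001, 0x41: 0b010, 0x50: 0b100},  # 'K' -> 0; 'A' -> 1; 'P' -> 2
}


def narrow(stream: bytes) -> list:
    """Return the candidate mask before and after each looked-up offset."""
    candidates = 0b111  # all three types are candidates initially
    trace = [candidates]
    for offset, value in enumerate(stream):
        row = bitmap.get(offset)
        if row is None:
            continue  # no first offset matches this second offset
        candidates &= row.get(value, 0)  # unknown byte -> no type survives
        trace.append(candidates)
    return trace


print([bin(c) for c in narrow(b"PK")])  # ['0b111', '0b11', '0b1']
```

After the first byte only types 0 and 1 remain; after the second, only type 0.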
Further, after precompiling the file signatures to obtain the bitmap feature, the method further comprises: loading the bitmap feature into memory when the process starts.
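Precompiling once and loading the result at process start keeps the per-packet path free of signature parsing. A minimal sketch, assuming the bitmap is stored as a flat file of unsigned int masks (array typecode `'I'`, typically 32-bit, so at most 32 type IDs), 256 masks per offset; the file name and layout are hypothetical, not specified by the patent.

```python
import array
import os
import tempfile
from typing import List


def save_bitmap(path: str, rows: List[List[int]]) -> None:
    """Offline precompile step: write 256 masks per offset as unsigned ints."""
    flat = array.array("I", (mask for row in rows for mask in row))
    with open(path, "wb") as f:
        flat.tofile(f)


def load_bitmap(path: str) -> List[array.array]:
    """Process start-up: read the whole bitmap feature into memory once."""
    flat = array.array("I")
    with open(path, "rb") as f:
        flat.frombytes(f.read())
    return [flat[i:i + 256] for i in range(0, len(flat), 256)]


# Usage: precompile once, then load at start-up and keep it resident.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bitmap_feature.bin")  # hypothetical file name
    rows = [[0] * 256 for _ in range(4)]
    rows[0][0x25] = 0b1  # hypothetical: type ID 0 expects '%' at offset 0
    save_bitmap(path, rows)
    BITMAP = load_bitmap(path)

print(len(BITMAP), BITMAP[0][0x25])  # 4 1
```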
Further, the bitmap feature also comprises a file type ID; performing the operation between the second character value and each first character value in turn to obtain the operation result comprises:
Judging whether the second character value matches the first character value;
If the second character value matches the first character value, determining the file type of the first file according to the file type ID corresponding to the current first character value;
If the second character value does not match the first character value, judging the file type of the first file to be an abnormal file type.
Further, the abnormal file type comprises a content-tampered file type and an unknown file type.
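The match/no-match decision can be sketched as below. The patent does not say how a content-tampered file is told apart from an unknown one; this sketch assumes (hypothetically) that a file whose name claims a known type but whose content matches no signature counts as tampered, and anything else that matches nothing counts as unknown. The type IDs, names and the lowest-ID tie-break are also assumptions.

```python
from typing import Optional

# Hypothetical file type IDs, as stored in the bitmap feature.
TYPE_NAMES = {0: "pdf", 1: "zip", 2: "png"}


def decide(candidates: int, claimed_suffix: Optional[str]) -> str:
    """Map the AND result (bit i set = type ID i still matches) to a verdict."""
    if candidates:
        # Match: take the file type ID of the lowest surviving bit.
        type_id = (candidates & -candidates).bit_length() - 1
        return TYPE_NAMES[type_id]
    # No match: abnormal file type, split into the two sub-types.
    if claimed_suffix in TYPE_NAMES.values():
        return "abnormal: content tampered"  # name claims a type the bytes deny
    return "abnormal: unknown type"


print(decide(0b010, "zip"))  # zip
print(decide(0, "pdf"))  # abnormal: content tampered
```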
Further, obtaining from the transmitted data packets the file stream of the first file whose file type is to be identified further comprises obtaining a file boundary, the file boundary being used to determine the start time and end time of the transmitted data packets.
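A file boundary tells the identifier which packets belong to one file and when the transfer started and ended. A minimal sketch, assuming the payloads arrive in order and the file size is already known from the protocol (for example an HTTP `Content-Length` header); the `Packet` type and `extract_file` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Packet:
    timestamp: float  # capture time in seconds
    payload: bytes  # application-layer bytes carrying file data


def extract_file(packets: Iterable[Packet], file_size: int) -> Tuple[bytes, float, float]:
    """Collect payload until file_size bytes are seen; return stream, start, end."""
    data = bytearray()
    start = end = None
    for pkt in packets:
        if start is None:
            start = pkt.timestamp  # first packet of the file: start boundary
        data += pkt.payload
        end = pkt.timestamp  # latest packet seen so far: end boundary
        if len(data) >= file_size:
            break  # file complete; later bytes belong to something else
    if start is None:
        raise ValueError("no packets")
    return bytes(data[:file_size]), start, end


pkts = [Packet(1.0, b"%PD"), Packet(1.5, b"F-1.4"), Packet(2.0, b"next")]
stream, start, end = extract_file(pkts, 8)
print(stream, start, end)  # b'%PDF-1.4' 1.0 1.5
```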
In another aspect, the invention provides a file type recognition device comprising a feature collector, a file boundary acquisition module, a type identification module, a result decision module, a policy module and a policy matching module;
The feature collector is configured to precompile file signatures to obtain a bitmap feature, the bitmap feature comprising a file type ID, a first offset and a first character value corresponding to the first offset;
The policy module is configured to indicate how the first file is to be handled according to its file type;
The file boundary acquisition module is configured to obtain, from the transmitted data packets, the file stream and file boundary of the first file whose file type is to be identified, the file stream comprising a second offset and a second character value corresponding to the second offset;
The type identification module is configured to search the bitmap feature, according to the second offset, for the first offset matching the second offset, and to perform an operation between the second character value and each first character value in turn to obtain an operation result;
The result decision module is configured to determine the file type of the first file according to the operation result;
The policy matching module is configured to process the first file according to its file type, as directed by the policy module.
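The modules can be wired together as in the sketch below. It is an illustration under stated assumptions: the class names, the prefix signatures, the policy actions (`alert`, `block`, `allow`) and the lowest-ID tie-break are all hypothetical, and the file boundary acquisition module is reduced to a caller that already holds the reassembled file stream.

```python
from typing import Dict, List


class FeatureCollector:
    """Precompiles prefix signatures into bitmap[offset][byte] -> type-ID mask."""

    def __init__(self, signatures: Dict[int, bytes]):
        depth = max(len(sig) for sig in signatures.values())
        self.bitmap: List[List[int]] = [[0] * 256 for _ in range(depth)]
        for type_id, sig in signatures.items():
            bit = 1 << type_id
            for offset in range(depth):
                if offset < len(sig):
                    self.bitmap[offset][sig[offset]] |= bit
                else:  # shorter signature: any byte is acceptable here
                    for value in range(256):
                        self.bitmap[offset][value] |= bit


class TypeIdentifier:
    """ANDs the masks selected by each (offset, byte) pair of the stream."""

    def __init__(self, bitmap: List[List[int]]):
        self.bitmap = bitmap

    def run(self, stream: bytes) -> int:
        if len(stream) < len(self.bitmap):
            return 0  # too short to confirm any signature
        candidates = -1  # all bits set
        for offset, row in enumerate(self.bitmap):
            candidates &= row[stream[offset]]
        return candidates


class ResultDecider:
    """Turns the surviving mask into a file type name or "abnormal"."""

    def __init__(self, names: Dict[int, str]):
        self.names = names

    def decide(self, candidates: int) -> str:
        if candidates == 0:
            return "abnormal"
        return self.names[(candidates & -candidates).bit_length() - 1]


class PolicyMatcher:
    """Applies the policy module: file type -> action, with a default."""

    def __init__(self, policy: Dict[str, str], default: str = "allow"):
        self.policy = policy
        self.default = default

    def apply(self, file_type: str) -> str:
        return self.policy.get(file_type, self.default)


names = {0: "pdf", 1: "zip"}
collector = FeatureCollector({0: b"%PDF", 1: b"PK\x03\x04"})
identifier = TypeIdentifier(collector.bitmap)
decider = ResultDecider(names)
matcher = PolicyMatcher({"zip": "alert", "abnormal": "block"})
file_type = decider.decide(identifier.run(b"PK\x03\x04..."))
print(file_type, matcher.apply(file_type))  # zip alert
```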
The main advantages of the present invention are:
1. In the data leakage prevention function of a next-generation firewall, it can raise real-time alarms when files of specific types are sent or received.
2. In the data leakage prevention function of a next-generation firewall, it identifies file types efficiently and accurately, providing a foundation for functions such as document analysis and content auditing.
3. In network monitoring devices such as Internet behavior management systems, it can track user operations in the local area network and present users' download and upload behavior at fine granularity.
Brief description of the drawings
Fig. 1 is a flowchart of the file type recognition method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the file type recognition device provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the bitmap feature provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
Fig. 1 is a flowchart of the file type recognition method provided by an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:
Step 101: Precompile the file signatures to obtain a bitmap feature, the bitmap feature comprising a first offset and a first character value corresponding to the first offset;
Further, after the bitmap feature is obtained, the method also comprises loading the bitmap feature into memory when the process starts.
Step 102: Obtain, from the transmitted data packets, the file stream of the first file whose file type is to be identified, the file stream comprising a second offset and a second character value corresponding to the second offset;
Further, obtaining the file stream of the first file from the transmitted data packets also comprises obtaining a file boundary, which is used to determine the start time and end time of the transmitted data packets.
Step 103: According to the second offset, search the bitmap feature for the first offset matching the second offset;
Step 104: Perform an operation between the second character value and each first character value in turn to obtain an operation result;
Further, the bitmap feature also comprises a file type ID; performing the operation between the second character value and each first character value in turn to obtain the operation result comprises:
Judging whether the second character value matches the first character value;
If the second character value matches the first character value, determining the file type of the first file according to the file type ID corresponding to the current first character value;
If the second character value does not match the first character value, judging the file type of the first file to be an abnormal file type.
Further, the abnormal file type comprises a content-tampered file type and an unknown file type.
Further, the operation is an AND operation, which narrows the range of candidate file types.
Further, the first file is processed according to its file type.
Fig. 2 is a schematic structural diagram of a file type recognition device provided by an embodiment of the present invention. As shown in Fig. 2, the file type recognition device comprises a feature collector 20, a file boundary acquisition module 10, a type identification module 30, a result decision module 40, a policy module 60 and a policy matching module 50.
The file boundary acquisition module 10 obtains, from the transmitted data packets, the file stream and file boundary of the first file whose file type is to be identified; the file stream comprises a second offset and a second character value corresponding to the second offset.
The result decision module 40 determines the file type of the first file according to the operation result.
The policy matching module 50 processes the first file according to its file type, as directed by the policy module 60.
Fig. 3 is a schematic diagram of the bitmap feature provided by an embodiment of the present invention. As shown in Fig. 3, the bitmap feature comprises file type IDs, first offsets and the first character values corresponding to the first offsets. The bitmap feature is searched, using the second offset in the file stream, for the first offset that matches the second offset; the second character value is then ANDed with each first character value in turn. If the AND result is 1, the file type ID is determined from the position of the first character value that produced that result, and the file type is determined from the file type ID.
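Because Fig. 3 is not reproduced here, the sketch below shows one plausible layout consistent with its description; it is an assumption, not the figure's exact format. For each first offset, every file type ID owns a 256-bit row whose bit *v* is 1 when byte value *v* is an allowed first character value. The second character value becomes a one-hot probe that is ANDed with each row in turn, and a non-zero result (the probed bit reads 1) keeps that row's file type ID. The signatures and function names are hypothetical.

```python
from typing import Dict, Iterable, List


def build_rows(signatures: Dict[int, bytes]) -> Dict[int, Dict[int, int]]:
    """rows[first_offset][type_id] = 256-bit row of allowed byte values."""
    rows: Dict[int, Dict[int, int]] = {}
    for type_id, prefix in signatures.items():
        for offset, value in enumerate(prefix):
            per_type = rows.setdefault(offset, {})
            per_type[type_id] = per_type.get(type_id, 0) | (1 << value)
    return rows


def match_types(rows, type_ids: Iterable[int], stream: bytes) -> List[int]:
    """Return the file type IDs whose rows AND to 1 at every first offset."""
    surviving = set(type_ids)
    for offset, per_type in rows.items():
        if offset >= len(stream):
            # Stream ended: types still expecting a byte here cannot match.
            surviving -= set(per_type)
            continue
        probe = 1 << stream[offset]  # second character value as a one-hot bit
        surviving = {
            t for t in surviving
            if t not in per_type or per_type[t] & probe  # unconstrained or bit is 1
        }
    return sorted(surviving)


SIGS = {0: b"%PDF", 1: b"PK\x03\x04", 3: b"PK"}  # hypothetical type IDs
ROWS = build_rows(SIGS)
print(match_types(ROWS, SIGS, b"PK\x03\x04\x14\x00"))  # [1, 3]
```

When signatures share a prefix (types 1 and 3 here), more than one ID can survive; the description does not state how such ties are resolved.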
The specific embodiments described above further explain the object, technical solution and beneficial effects of the present invention. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (8)
1. A file type recognition method, characterized by comprising:
Precompiling file signatures to obtain a bitmap feature, the bitmap feature comprising a first offset and a first character value corresponding to the first offset;
Obtaining, from the transmitted data packets, the file stream of a first file whose file type is to be identified, the file stream comprising a second offset and a second character value corresponding to the second offset;
According to the second offset, searching the bitmap feature for the first offset that matches the second offset;
Performing an operation between the second character value and each first character value in turn to obtain an operation result;
Determining the file type of the first file according to the operation result.
2. The file type recognition method according to claim 1, characterized in that, after determining the file type of the first file according to the operation result, the method further comprises: processing the first file according to its file type.
3. The file type recognition method according to claim 1, characterized in that the operation is an AND operation, which narrows the range of candidate file types.
4. The file type recognition method according to claim 1, characterized in that, after precompiling the file signatures to obtain the bitmap feature, the method further comprises: loading the bitmap feature into memory when the process starts.
5. The file type recognition method according to claim 1, characterized in that the bitmap feature further comprises a file type ID, and performing the operation between the second character value and each first character value in turn to obtain the operation result comprises:
Judging whether the second character value matches the first character value;
If the second character value matches the first character value, determining the file type of the first file according to the file type ID corresponding to the current first character value;
If the second character value does not match the first character value, judging the file type of the first file to be an abnormal file type.
6. The file type recognition method according to claim 1, characterized in that the abnormal file type comprises a content-tampered file type and an unknown file type.
7. The file type recognition method according to claim 1, characterized in that obtaining from the transmitted data packets the file stream of the first file whose file type is to be identified further comprises obtaining a file boundary, the file boundary being used to determine the start time and end time of the transmitted data packets.
8. A file type recognition device, characterized in that the device comprises a feature collector, a file boundary acquisition module, a type identification module, a result decision module, a policy module and a policy matching module;
The feature collector is configured to precompile file signatures to obtain a bitmap feature, the bitmap feature comprising a file type ID, a first offset and a first character value corresponding to the first offset;
The policy module is configured to indicate how the first file is to be handled according to its file type;
The file boundary acquisition module is configured to obtain, from the transmitted data packets, the file stream and file boundary of the first file whose file type is to be identified, the file stream comprising a second offset and a second character value corresponding to the second offset;
The type identification module is configured to search the bitmap feature, according to the second offset, for the first offset matching the second offset, and to perform an operation between the second character value and each first character value in turn to obtain an operation result;
The result decision module is configured to determine the file type of the first file according to the operation result;
The policy matching module is configured to process the first file according to its file type, as directed by the policy module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310750085.0A CN103701821B (en) | 2013-12-31 | 2013-12-31 | File type identification method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310750085.0A CN103701821B (en) | 2013-12-31 | 2013-12-31 | File type identification method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103701821A | 2014-04-02 |
CN103701821B | 2017-07-28 |
Family ID: 50363217
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310750085.0A Active CN103701821B (en) | 2013-12-31 | 2013-12-31 | File type identification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103701821B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108460155A (en) * | 2018-03-28 | 2018-08-28 | 深信服科技股份有限公司 | A kind of file identification method, device, equipment and storage medium |
CN110929110A (en) * | 2019-11-13 | 2020-03-27 | 北京北信源软件股份有限公司 | Electronic document detection method, device, equipment and storage medium |
CN111563063A (en) * | 2020-05-12 | 2020-08-21 | 福建天晴在线互动科技有限公司 | Method for identifying file type based on HashMap |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1790328A (en) * | 2004-12-17 | 2006-06-21 | 微软公司 | Extensible file system |
US20070033323A1 (en) * | 2005-08-03 | 2007-02-08 | Gorobets Sergey A | Interfacing systems operating through a logical address space and on a direct data file basis |
CN103383681A (en) * | 2011-12-31 | 2013-11-06 | 华为数字技术(成都)有限公司 | File type identification method and system |
Non-Patent Citations (1)
Title |
---|
张润峰 (Zhang Runfeng): "基于特征标识的文件类型识别与匹配" [File type identification and matching based on feature identifiers], 《计算机安全》 [Computer Security], 30 June 2011, pages 40-42 * |
Also Published As
Publication number | Publication date |
---|---|
CN103701821B (en) | 2017-07-28 |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |