CN103701821A - File type recognition method and device - Google Patents

File type recognition method and device

Publication number
CN103701821A
Authority
CN
China
Prior art keywords
file
file type
character value
offset
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310750085.0A
Other languages
Chinese (zh)
Other versions
CN103701821B (en)
Inventor
郭璞
曹政
刘岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING NETENTSEC Inc
Original Assignee
BEIJING NETENTSEC Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING NETENTSEC Inc
Priority to CN201310750085.0A
Publication of CN103701821A
Application granted
Publication of CN103701821B
Legal status: Active
Anticipated expiration

Abstract

The invention relates to a file type recognition method comprising the steps of: precompiling the file features of files to obtain a bitmap feature, wherein the bitmap feature comprises first offsets and first character values corresponding to the first offsets; obtaining, from transmitted data packets, the file stream of a first file whose file type is to be recognized, wherein the file stream comprises second offsets and second character values corresponding to the second offsets; searching the bitmap feature, according to the second offset, for the first offset that matches it; performing an operation on the second character value and each first character value in turn to obtain an operation result; and determining the file type of the first file according to the operation result. The method recognizes file types effectively and accurately, raises real-time alarms for specific file types, tracks user operations in a local area network, and presents users' file upload and download behavior at fine granularity.

Description

File type recognition method and device
Technical field
The present invention relates to the field of computer and network security, and in particular to a file type recognition method and device.
Background
With the development of science and technology, people depend more and more on networks and transmit data over them; during transmission, however, information security faces great challenges. To prevent confidential information from leaking, network administrators or enterprises usually need to identify and inspect the types of the files being transferred.
Existing file type recognition technology obtains the file name through application identification and deep protocol analysis, and determines the file type from the suffix (extension) in the file name. Although this method needs neither to locate file boundaries nor to analyze file content, it produces a wrong result whenever the file name has been modified. Its correct recognition rate is therefore low, and errors are to be expected.
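The weakness of suffix-based recognition can be illustrated with a minimal sketch. The extension table and the function name `type_from_name` are assumptions made for this illustration and do not come from the patent.

```python
import os

# Hypothetical extension table; a real product would carry a much larger one.
EXTENSION_TYPES = {".pdf": "pdf", ".png": "png", ".zip": "zip"}

def type_from_name(filename: str) -> str:
    """Guess the file type from the filename suffix only; content is never read."""
    _, ext = os.path.splitext(filename.lower())
    return EXTENSION_TYPES.get(ext, "unknown")

# The same confidential document, once under its real name and once renamed.
original_guess = type_from_name("contract.pdf")
renamed_guess = type_from_name("contract.png")
```

Because only the name is inspected, renaming the file is enough to change the reported type, which is the failure mode described above.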
There is also a file type recognition method based on magic numbers, which matches the magic numbers against the file header data stream and determines the file type from the match result. Although this method identifies file types effectively, it relies on string comparison, so its recognition efficiency is low and it cannot meet the forwarding performance requirements of network devices.
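For comparison, the string-comparison baseline criticized here might look like the following sketch. The four signatures are well-known magic numbers found at the start of each format; the table layout and the name `type_from_magic` are illustrative assumptions.

```python
# Well-known magic numbers located at offset 0 of each format.
MAGIC_NUMBERS = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"PK\x03\x04", "zip"),
    (b"GIF8", "gif"),
]

def type_from_magic(header: bytes) -> str:
    """Compare the file header against every signature in turn."""
    for magic, file_type in MAGIC_NUMBERS:
        # One byte-string comparison per known signature.
        if header.startswith(magic):
            return file_type
    return "unknown"
```

The work per file grows with the number of signatures, since each one is compared separately. That cost is what the invention aims to replace with logical operations.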
Therefore, none of the existing file type recognition technologies can identify file types both accurately and efficiently.
Summary of the invention
The object of the present invention is to improve the accuracy of file type recognition and avoid the risks caused by suffix-based misidentification; the recognition process uses logical operations, which greatly improves the efficiency of file type recognition.
To achieve the above object, the invention provides a file type recognition method, comprising:
precompiling the file features of files to obtain a bitmap feature, the bitmap feature comprising first offsets and first character values corresponding to the first offsets;
obtaining, from transmitted data packets, the file stream of a first file whose file type is to be recognized, the file stream comprising second offsets and second character values corresponding to the second offsets;
searching the bitmap feature, according to the second offset, for the first offset that matches the second offset;
performing an operation on the second character value and each first character value in turn to obtain an operation result;
determining the file type of the first file according to the operation result.
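The five steps above can be sketched end to end as follows. This is one plausible reading, not the patented implementation: the patent does not specify the memory layout of the bitmap feature, so the `table[offset][byte]` layout (a bitmask with one bit per file type ID), the signature set, and the function names are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical signature set: file type ID -> (name, magic bytes at offset 0).
SIGNATURES: Dict[int, Tuple[str, bytes]] = {
    0: ("pdf", b"%PDF-"),
    1: ("png", b"\x89PNG\r\n\x1a\n"),
    2: ("zip", b"PK\x03\x04"),
}

def precompile(signatures: Dict[int, Tuple[str, bytes]]) -> List[List[int]]:
    """Step 1: table[offset][byte] = bitmask of type IDs compatible with that byte."""
    depth = max(len(magic) for _, magic in signatures.values())
    table = [[0] * 256 for _ in range(depth)]
    for type_id, (_, magic) in signatures.items():
        bit = 1 << type_id
        for offset in range(depth):
            if offset < len(magic):
                table[offset][magic[offset]] |= bit
            else:
                # Past the end of this signature any byte value is acceptable.
                for value in range(256):
                    table[offset][value] |= bit
    return table

def recognize(table: List[List[int]],
              signatures: Dict[int, Tuple[str, bytes]],
              stream: bytes) -> Optional[str]:
    """Steps 2 to 5: AND the per-offset masks and map the surviving bit to a type."""
    # Only types whose whole signature fits in the available bytes can be confirmed.
    candidates = 0
    for type_id, (_, magic) in signatures.items():
        if len(magic) <= len(stream):
            candidates |= 1 << type_id
    for offset, value in enumerate(stream[:len(table)]):
        candidates &= table[offset][value]      # one AND per inspected byte
        if candidates == 0:
            return None                          # no known type matches
    if candidates == 0:
        return None
    type_id = (candidates & -candidates).bit_length() - 1   # lowest surviving ID
    return signatures[type_id][0]
```

With this layout the cost per inspected byte is one table lookup and one AND, independent of how many file types are registered (up to the width of the mask).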
Further, after determining the file type of the first file according to the operation result, the method also comprises: processing the file type of the first file.
Further, the operation is an AND operation, which narrows the range of file types to be searched.
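How repeated AND operations narrow the candidate set can be seen in a small worked example. The mask values below are made up for illustration, and the convention that bit i stands for file type ID i is an assumption.

```python
# Bit i set means file type ID i is still a possible match (assumed convention).
TYPE_NAMES = ["pdf", "png", "zip", "gif"]

# Made-up masks looked up for three consecutive offsets of a file stream.
masks_per_offset = [0b1011, 0b0011, 0b0010]

candidates = (1 << len(TYPE_NAMES)) - 1    # start with every type possible: 0b1111
history = []
for mask in masks_per_offset:
    candidates &= mask                     # each AND can only remove candidates
    history.append(candidates)

# Names of the types whose bit survived every AND.
remaining = [name for i, name in enumerate(TYPE_NAMES) if (candidates >> i) & 1]
```

The candidate set shrinks from four types to three, then two, then one, so the search stops as soon as a single bit (or none) is left.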
Further, after precompiling the file features to obtain the bitmap feature, the method also comprises: loading the bitmap feature into memory at process startup.
Further, the bitmap feature also comprises a file type ID; performing the operation on the second character value and each first character value in turn to obtain the operation result comprises:
determining whether the second character value matches the first character value;
if the second character value matches the first character value, determining the file type of the first file according to the file type ID corresponding to the current first character value;
if the second character value does not match the first character value, determining that the file type of the first file is an abnormal file type.
Further, the abnormal file type comprises a content-tampered file type and an unknown file type.
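The text does not say how a content-tampered file is told apart from an unknown one. The sketch below uses a purely hypothetical rule, stated here as an assumption: a header that agrees with some known signature for at least one leading byte and then diverges is treated as tampered, while a header that agrees with none is treated as unknown. The names `Verdict` and `classify` are also illustrative.

```python
from enum import Enum
from typing import Optional, Tuple

class Verdict(Enum):
    MATCHED = "matched"
    CONTENT_TAMPERED = "content tampered"
    UNKNOWN = "unknown"

# Hypothetical signature set used only for this illustration.
SIGNATURES = {"pdf": b"%PDF-", "zip": b"PK\x03\x04"}

def classify(header: bytes) -> Tuple[Verdict, Optional[str]]:
    best_prefix = 0
    for name, magic in SIGNATURES.items():
        if header.startswith(magic):
            return Verdict.MATCHED, name
        # Count how many leading bytes agree with this signature.
        agree = 0
        for got, expected in zip(header, magic):
            if got != expected:
                break
            agree += 1
        best_prefix = max(best_prefix, agree)
    # ASSUMPTION (not from the patent): partial agreement means tampering.
    if best_prefix > 0:
        return Verdict.CONTENT_TAMPERED, None
    return Verdict.UNKNOWN, None
```

A production rule would likely need more evidence than a shared first byte; the point here is only that the abnormal outcome splits into two distinct verdicts that later policy handling can treat differently.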
Further, obtaining from the transmitted data packets the file stream of the first file whose file type is to be recognized also comprises obtaining a file boundary, the file boundary being used to determine the start time and end time of the transmitted data packets.
In another aspect, the invention provides a file type recognition device, comprising a feature collector, a file boundary acquisition module, a type identification module, a result decision module, a policy module and a policy matching module;
the feature collector is configured to precompile the file features of files to obtain a bitmap feature, the bitmap feature comprising file type IDs, first offsets and first character values corresponding to the first offsets;
the policy module is configured to indicate how the file type of the first file is to be processed;
the file boundary acquisition module is configured to obtain, from transmitted data packets, the file boundary and the file stream of the first file whose file type is to be recognized, the file stream comprising second offsets and second character values corresponding to the second offsets;
the type identification module is configured to search the bitmap feature, according to the second offset, for the first offset that matches the second offset, and to perform an operation on the second character value and each first character value in turn to obtain an operation result;
the result decision module is configured to determine the file type of the first file according to the operation result;
the policy matching module is configured to process the file type of the first file according to the policy module.
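The data flow between the six modules can be sketched as a small pipeline. Every stub below is a placeholder chosen for brevity (in particular, the identification stub uses a naive prefix comparison rather than the bitmap AND); the class name `Recognizer`, its field names and the policy values are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class Recognizer:
    compile_features: Callable[[], Dict[bytes, int]]               # feature collector
    acquire_stream: Callable[[List[bytes]], bytes]                 # file boundary acquisition module
    identify: Callable[[Dict[bytes, int], bytes], Optional[int]]   # type identification module
    decide: Callable[[Optional[int]], str]                         # result decision module
    policy: Dict[str, str]                                         # policy module

    def run(self, packets: List[bytes]) -> Tuple[str, str]:
        features = self.compile_features()
        stream = self.acquire_stream(packets)
        type_id = self.identify(features, stream)
        file_type = self.decide(type_id)
        # Policy matching module: look up the action configured for this type.
        return file_type, self.policy.get(file_type, "log")

TYPE_NAMES = {1: "pdf", 2: "zip"}

recognizer = Recognizer(
    compile_features=lambda: {b"%PDF-": 1, b"PK\x03\x04": 2},
    acquire_stream=lambda packets: b"".join(packets),
    identify=lambda feats, stream: next(
        (tid for magic, tid in feats.items() if stream.startswith(magic)), None),
    decide=lambda tid: TYPE_NAMES.get(tid, "abnormal"),
    policy={"pdf": "alert", "abnormal": "block"},
)
```

Keeping each module behind a plain callable makes it straightforward to swap the naive identification stub for a bitmap-based one without touching the rest of the pipeline.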
The main advantages of the present invention are:
1. In the data leak prevention function of a next-generation firewall, it can raise real-time alarms on the transmission or reception of files of specific types.
2. In the data leak prevention function of a next-generation firewall, it can identify file types efficiently and accurately, providing a foundation for functions such as file parsing and content auditing.
3. In network monitoring devices such as internet behavior management systems, it can track users' operations in a local area network and present their download and upload behavior at fine granularity.
Brief description of the drawings
Fig. 1 is a flow chart of the file type recognition method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the file type recognition device provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the bitmap feature provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
Fig. 1 is a flow chart of the file type recognition method provided by an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:
Step 101: precompile the file features of files to obtain a bitmap feature, the bitmap feature comprising first offsets and first character values corresponding to the first offsets;
Further, after the bitmap feature is obtained, the bitmap feature is loaded into memory at process startup.
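Step 101 might be realized along the following lines. The textual feature format (`type_id name start_offset hex_bytes`) is invented for this sketch, as are the names `precompile`, `BITMAP_FEATURES` and `TYPE_NAMES`. The three signatures themselves are well known, including the `ustar` marker that tar archives carry at offset 257.

```python
from collections import defaultdict

# Hypothetical text format for file features: "type_id name start_offset hex_bytes".
FEATURE_SOURCE = """
1 pdf 0 255044462d
2 zip 0 504b0304
3 tar 257 7573746172
"""

def precompile(source: str):
    """Return (features, names) with features[offset][byte_value] = bitmask of type IDs."""
    features = defaultdict(lambda: defaultdict(int))
    names = {}
    for line in source.strip().splitlines():
        type_id, name, start, hex_bytes = line.split()
        names[int(type_id)] = name
        for i, value in enumerate(bytes.fromhex(hex_bytes)):
            # First offset = start + i, first character value = value.
            features[int(start) + i][value] |= 1 << int(type_id)
    return features, names

# Built once when the process starts and kept in memory afterwards.
BITMAP_FEATURES, TYPE_NAMES = precompile(FEATURE_SOURCE)
```

Doing the parsing once at startup means the per-packet path only performs lookups, which matters for the forwarding performance concern raised in the background section.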
Step 102: obtain, from the transmitted data packets, the file stream of the first file whose file type is to be recognized, the file stream comprising second offsets and second character values corresponding to the second offsets;
Further, obtaining the file stream of the first file from the transmitted data packets also comprises obtaining a file boundary, the file boundary being used to determine the start time and end time of the transmitted data packets.
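A simplified view of step 102 is sketched below. It assumes the packets have already been reordered and stripped of protocol headers, and it models the file boundary only as the position of the first payload byte; how the real device detects boundaries or their timestamps is not described in enough detail to reproduce. The function name `file_stream` and the `header_depth` parameter are assumptions.

```python
from typing import Iterator, List, Tuple

def file_stream(packets: List[bytes], header_depth: int) -> Iterator[Tuple[int, int]]:
    """Yield (second offset, second character value) for the first header_depth bytes."""
    offset = 0                       # start boundary: first payload byte of the file
    for payload in packets:          # a file header may be split across packets
        for value in payload:
            if offset >= header_depth:
                return               # enough bytes inspected for recognition
            yield offset, value
            offset += 1
```

Carrying the running offset across packets is what lets a signature that straddles a packet boundary still be matched byte by byte.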
Step 103: search the bitmap feature, according to the second offset, for the first offset that matches the second offset;
Step 104: perform an operation on the second character value and each first character value in turn to obtain an operation result;
Further, the bitmap feature also comprises a file type ID; performing the operation on the second character value and each first character value in turn to obtain the operation result comprises:
determining whether the second character value matches the first character value;
if the second character value matches the first character value, determining the file type of the first file according to the file type ID corresponding to the current first character value;
if the second character value does not match the first character value, determining that the file type of the first file is an abnormal file type.
Further, the abnormal file type comprises a content-tampered file type and an unknown file type.
Further, the operation is an AND operation, which narrows the range of file types to be searched.
Step 105: determine the file type of the first file according to the operation result.
Further, the file type of the first file is processed.
Fig. 2 is a schematic structural diagram of a file type recognition device provided by an embodiment of the present invention. As shown in Fig. 2, the file type recognition device comprises a feature collector 20, a file boundary acquisition module 10, a type identification module 30, a result decision module 40, a policy module 60 and a policy matching module 50.
The feature collector 20 is configured to precompile the file features of files to obtain a bitmap feature, the bitmap feature comprising file type IDs, first offsets and first character values corresponding to the first offsets.
The policy module 60 is configured to indicate how the file type of the first file is to be processed.
The file boundary acquisition module 10 is configured to obtain, from the transmitted data packets, the file boundary and the file stream of the first file whose file type is to be recognized, the file stream comprising second offsets and second character values corresponding to the second offsets.
The type identification module 30 is configured to search the bitmap feature, according to the second offset, for the first offset that matches the second offset, and to perform an operation on the second character value and each first character value in turn to obtain an operation result.
The result decision module 40 is configured to determine the file type of the first file according to the operation result.
The policy matching module 50 is configured to process the file type of the first file according to the policy module 60.
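The interplay between the policy module 60 and the policy matching module 50 could be as simple as a table lookup that produces an audit record. The policy entries, the action names and the record fields below are hypothetical; the patent only states that the policy indicates how a recognized file type is to be processed.

```python
# Hypothetical policy table (policy module): file type -> action.
POLICY = {"pdf": "alert", "zip": "block", "abnormal": "block"}
DEFAULT_ACTION = "log"

def apply_policy(file_type: str, user: str, direction: str) -> dict:
    """Policy matching: pick the configured action and build an audit record."""
    action = POLICY.get(file_type, DEFAULT_ACTION)
    return {
        "user": user,
        "direction": direction,      # "upload" or "download"
        "file_type": file_type,
        "action": action,
    }
```

Recording the user and transfer direction alongside the action is one way to support the real-time alarms and fine-grained behavior tracking listed among the advantages.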
Fig. 3 is a schematic diagram of the bitmap feature provided by an embodiment of the present invention. As shown in Fig. 3, the bitmap feature comprises file type IDs, first offsets, and the first character values corresponding to the first offsets. The second offset from the file stream is used to look up the matching first offset in the bitmap feature; the second character value is then ANDed with each first character value in turn. If the result of an AND operation is 1, the file type ID is determined from the position of the first character value whose AND with the second character value gave 1, and the file type is determined from that file type ID.
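Reading the file type ID off the position of a 1 in the AND result can be sketched as follows. Fig. 3 itself is not reproduced in the text, so the row contents and the convention that bit position equals file type ID are assumptions made for this illustration.

```python
def type_ids_from_mask(mask: int) -> list:
    """Return the positions of the bits that are 1 in an AND result."""
    ids = []
    while mask:
        lowest = mask & -mask                # isolate the lowest set bit
        ids.append(lowest.bit_length() - 1)  # its position is the file type ID
        mask ^= lowest                       # clear it and continue
    return ids

# Made-up row of the bitmap feature for one offset: byte value -> bitmask of type IDs.
row = {0x25: 0b0001, 0x50: 0b0100, 0x89: 0b0010}

candidates = 0b0111                          # types still possible before this offset
result = candidates & row.get(0x50, 0)       # byte 0x50 observed at this offset
```

Here only bit 2 survives, so the observed byte pins the file down to type ID 2; a byte absent from the row yields 0, meaning no known type matches.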
The embodiments above further describe the object, technical solution and beneficial effects of the present invention. It should be understood that the foregoing is only a specific embodiment of the invention and is not intended to limit its scope of protection; any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (8)

1. A file type recognition method, characterized by comprising:
precompiling the file features of files to obtain a bitmap feature, the bitmap feature comprising first offsets and first character values corresponding to the first offsets;
obtaining, from transmitted data packets, the file stream of a first file whose file type is to be recognized, the file stream comprising second offsets and second character values corresponding to the second offsets;
searching the bitmap feature, according to the second offset, for the first offset that matches the second offset;
performing an operation on the second character value and each first character value in turn to obtain an operation result;
determining the file type of the first file according to the operation result.
2. The file type recognition method according to claim 1, characterized in that, after determining the file type of the first file according to the operation result, the method also comprises: processing the file type of the first file.
3. The file type recognition method according to claim 1, characterized in that the operation is an AND operation, which narrows the range of file types to be searched.
4. The file type recognition method according to claim 1, characterized in that, after precompiling the file features to obtain the bitmap feature, the method also comprises: loading the bitmap feature into memory at process startup.
5. The file type recognition method according to claim 1, characterized in that the bitmap feature also comprises a file type ID; performing the operation on the second character value and each first character value in turn to obtain the operation result comprises:
determining whether the second character value matches the first character value;
if the second character value matches the first character value, determining the file type of the first file according to the file type ID corresponding to the current first character value;
if the second character value does not match the first character value, determining that the file type of the first file is an abnormal file type.
6. The file type recognition method according to claim 1, characterized in that the abnormal file type comprises a content-tampered file type and an unknown file type.
7. The file type recognition method according to claim 1, characterized in that obtaining from the transmitted data packets the file stream of the first file whose file type is to be recognized also comprises obtaining a file boundary, the file boundary being used to determine the start time and end time of the transmitted data packets.
8. A file type recognition device, characterized in that the device comprises a feature collector, a file boundary acquisition module, a type identification module, a result decision module, a policy module and a policy matching module;
the feature collector is configured to precompile the file features of files to obtain a bitmap feature, the bitmap feature comprising file type IDs, first offsets and first character values corresponding to the first offsets;
the policy module is configured to indicate how the file type of the first file is to be processed;
the file boundary acquisition module is configured to obtain, from transmitted data packets, the file boundary and the file stream of the first file whose file type is to be recognized, the file stream comprising second offsets and second character values corresponding to the second offsets;
the type identification module is configured to search the bitmap feature, according to the second offset, for the first offset that matches the second offset, and to perform an operation on the second character value and each first character value in turn to obtain an operation result;
the result decision module is configured to determine the file type of the first file according to the operation result;
the policy matching module is configured to process the file type of the first file according to the policy module.
CN201310750085.0A 2013-12-31 2013-12-31 File type identification method and device Active CN103701821B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310750085.0A CN103701821B (en) 2013-12-31 2013-12-31 File type identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310750085.0A CN103701821B (en) 2013-12-31 2013-12-31 File type identification method and device

Publications (2)

Publication Number Publication Date
CN103701821A true CN103701821A (en) 2014-04-02
CN103701821B CN103701821B (en) 2017-07-28

Family

ID=50363217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310750085.0A Active CN103701821B (en) 2013-12-31 2013-12-31 File type identification method and device

Country Status (1)

Country Link
CN (1) CN103701821B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460155A (en) * 2018-03-28 2018-08-28 深信服科技股份有限公司 A kind of file identification method, device, equipment and storage medium
CN110929110A (en) * 2019-11-13 2020-03-27 北京北信源软件股份有限公司 Electronic document detection method, device, equipment and storage medium
CN111563063A (en) * 2020-05-12 2020-08-21 福建天晴在线互动科技有限公司 Method for identifying file type based on HashMap

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1790328A (en) * 2004-12-17 2006-06-21 微软公司 Extensible file system
US20070033323A1 (en) * 2005-08-03 2007-02-08 Gorobets Sergey A Interfacing systems operating through a logical address space and on a direct data file basis
CN103383681A (en) * 2011-12-31 2013-11-06 华为数字技术(成都)有限公司 File type identification method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
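Zhang Runfeng (张润峰): "File type identification and matching based on feature signatures" (基于特征标识的文件类型识别与匹配), Computer Security (《计算机安全》), 30 June 2011 (2011-06-30), pages 40 - 42 *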

Also Published As

Publication number Publication date
CN103701821B (en) 2017-07-28

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant