CN108460155A - File identification method, apparatus, device and storage medium - Google Patents
- Publication number: CN108460155A
- Application number: CN201810265755.2A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
Abstract
This application discloses a file identification method, apparatus, device and storage medium. The method includes: determining the file category of a target file; if the file category of the target file is binary file, looking up a feature string corresponding to the binary file and determining the file identification result of the binary file according to the lookup result; if the file category of the target file is text file, searching for keywords and/or regular expressions corresponding to the text file and determining the file identification result of the text file according to the search result. The application thus discloses a technical solution that identifies files without relying on the file suffix, so that files without a suffix, or files whose suffix has been modified, can still be identified, which improves the file recognition rate.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a file identification method, apparatus, device and storage medium.
Background art
At present, the file format or file type of a file is usually identified from its suffix. This approach is convenient in ordinary cases and is widely applicable. However, because a file's suffix can be changed, this approach can no longer identify the file correctly once the suffix has been modified, whether intentionally or unintentionally. In addition, some files have no suffix at all, and the approach cannot be applied to them.
In summary, how to improve the recognition rate of file formats is a problem that remains to be solved.
Summary of the invention
In view of this, the purpose of the present invention is to provide a file identification method, apparatus, device and storage medium that can effectively improve the file recognition rate. The specific solution is as follows:
In a first aspect, the invention discloses a file identification method, including:
determining the file category of a target file;
if the file category of the target file is binary file, looking up a feature string corresponding to the binary file, and determining the file identification result of the binary file according to the lookup result;
if the file category of the target file is text file, searching for keywords and/or regular expressions corresponding to the text file, and determining the file identification result of the text file according to the search result.
Optionally, the binary file includes a PE file, a compound document or a compressed file.
Optionally, the step of looking up a feature string corresponding to the binary file and determining the file identification result of the binary file according to the lookup result includes:
looking up the file header features of the binary file, and judging from the current lookup result whether the binary file is a PE file;
if not, using a preset mapping table between the feature strings of compound documents and their offsets to look up a feature string corresponding to the binary file, and judging from the current lookup result whether the binary file is a compound document;
if not, using a preset mapping table between the feature strings of compressed files and their offsets to look up a feature string corresponding to the binary file, and judging from the current lookup result whether the binary file is a compressed file.
Optionally, the file header features include a DOS feature and an NT feature.
Optionally, the text file includes a programming file or a script file.
Optionally, the step of searching for keywords and/or regular expressions corresponding to the text file and determining the file identification result of the text file according to the search result includes:
using the keywords and/or regular expressions of a preset programming file to search for keywords and regular expressions corresponding to the text file;
determining a corresponding first-file actual recognition rate according to the current search result;
judging whether the first-file actual recognition rate is greater than a first preset threshold, and if so, determining that the text file is a programming file.
Optionally, the step of searching for keywords and/or regular expressions corresponding to the text file and determining the file identification result of the text file according to the search result includes:
using the keywords and/or regular expressions of a preset script file to search for keywords and regular expressions corresponding to the text file;
determining a corresponding second-file actual recognition rate according to the current search result;
judging whether the second-file actual recognition rate is greater than a second preset threshold, and if so, determining that the text file is a script file.
Optionally, before the step of determining the file category of the target file, the method further includes:
determining whether the target file has a file suffix;
if so, determining the file identification result of the target file directly from the file suffix of the target file.
In a second aspect, the invention discloses a file identification apparatus, including:
a file category determining module, configured to determine the file category of a target file;
a binary file identification module, configured to, when the file category of the target file is binary file, look up a feature string corresponding to the binary file and determine the file identification result of the binary file according to the lookup result;
a text file identification module, configured to, when the file category of the target file is text file, search for keywords and/or regular expressions corresponding to the text file and determine the file identification result of the text file according to the search result.
Optionally, the binary file includes a PE file, a compound document or a compressed file.
Optionally, the binary file identification module includes:
a first judging unit, configured to look up the file header features of the binary file and judge from the current lookup result whether the binary file is a PE file;
a second judging unit, configured to, when the judgment result of the first judging unit is no, use a preset mapping table between the feature strings of compound documents and their offsets to look up a feature string corresponding to the binary file, and judge from the current lookup result whether the binary file is a compound document;
a third judging unit, configured to, when the judgment result of the second judging unit is no, use a preset mapping table between the feature strings of compressed files and their offsets to look up a feature string corresponding to the binary file, and judge from the current lookup result whether the binary file is a compressed file.
Optionally, the file header features include a DOS feature and an NT feature.
Optionally, the text file includes a programming file or a script file.
Optionally, the text file identification module includes:
a first search unit, configured to use the keywords and/or regular expressions of a preset programming file to search for keywords and regular expressions corresponding to the text file;
a first determination unit, configured to determine a corresponding first-file actual recognition rate according to the current search result;
a fourth judging unit, configured to judge whether the first-file actual recognition rate is greater than a first preset threshold, and if so, determine that the text file is a programming file.
Optionally, the text file identification module includes:
a first search unit, configured to use the keywords and/or regular expressions of a preset script file to search for keywords and regular expressions corresponding to the text file;
a first determination unit, configured to determine a corresponding second-file actual recognition rate according to the current search result;
a fifth judging unit, configured to judge whether the second-file actual recognition rate is greater than a second preset threshold, and if so, determine that the text file is a script file.
Optionally, the apparatus further includes:
a direct file identification module, configured to, before the file category determining module determines the file category of the target file, determine whether the target file has a file suffix, and if so, determine the file identification result of the target file directly from the file suffix of the target file.
In a third aspect, the invention discloses a file identification device, including a processor and a memory, wherein the processor implements the file identification method disclosed above when executing a computer program stored in the memory.
In a fourth aspect, the invention discloses a computer-readable storage medium for storing a computer program, wherein the computer program implements the file identification method disclosed above when executed by a processor.
It can be seen that the present invention first determines the file category of a file. When the file category is binary file, the file identification result is determined from the feature string of the binary file; when the file category is text file, the file identification result is determined from the keywords and/or regular expressions corresponding to the text file. The invention thus discloses a technical solution that identifies files without relying on the file suffix, so that files without a suffix, or files whose suffix has been modified, can still be identified, which improves the file recognition rate.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a file identification method disclosed in an embodiment of the present invention;
Fig. 2 is a flowchart of a specific file identification method disclosed in an embodiment of the present invention;
Fig. 3 is a sub-flowchart of a file identification method disclosed in an embodiment of the present invention;
Fig. 4 is a sub-flowchart of a file identification method disclosed in an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a file identification apparatus disclosed in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, an embodiment of the present invention discloses a file identification method, which includes:
Step S11: determine the file category of the target file.
It should be noted that in this embodiment the target file may be a file with a suffix or a file without one. For files that have no suffix or whose suffix has been tampered with, the file identification method of this embodiment is used preferentially.
In addition, determining the file category of the target file in this embodiment may specifically include: reading file content from the target file, and then determining the file category of the target file from that content. The file content may be only part of the content of the target file, for example content of a preset length selected at random from the target file.
Further, in this embodiment there are two file categories: binary file and text file. From the file content of the target file, this embodiment determines whether the file category of the target file is binary file or text file.
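The text above does not state how the content sample is used to tell binary files from text files. The following is a minimal sketch of one common heuristic, offered only as an assumption: a sample containing a NUL byte, or a high proportion of control bytes, is treated as binary. The function names, the 4096-byte sample length and the 30% ratio are all hypothetical, and the sample is read from the start of the file rather than from a random position.

```python
def classify_file_category(sample: bytes) -> str:
    """Return 'binary' or 'text' for a sample of file content (heuristic sketch)."""
    if not sample:
        return "text"  # assumption: an empty sample is treated as text
    if b"\x00" in sample:
        return "binary"  # NUL bytes are rare in ordinary 8-bit text
    allowed = {9, 10, 13}  # tab, line feed, carriage return
    control = sum(1 for b in sample if b < 32 and b not in allowed)
    # Hypothetical cut-off: more than 30% control bytes means binary.
    return "binary" if control / len(sample) > 0.30 else "text"


def read_sample(path: str, length: int = 4096) -> bytes:
    """Read up to `length` bytes from the start of the file."""
    with open(path, "rb") as handle:
        return handle.read(length)
```

Note that under this heuristic UTF-16 encoded text would be classified as binary, because it contains NUL bytes; a real implementation would likely need an explicit encoding check.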
In this embodiment, the binary file may specifically be a PE (Portable Executable) file, a compound document or a compressed file; that is, these are the file types of binary files in this embodiment. Likewise, the text file may specifically be a programming file or a script file; that is, these are the file types of text files in this embodiment.
To increase recognition speed when the file's suffix has not been tampered with, this embodiment may further include, before the step of determining the file category of the target file: determining whether the target file has a file suffix; if so, determining the file identification result of the target file directly from the file suffix of the target file.
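The suffix shortcut described above can be sketched as a simple table lookup. The table contents and the function name below are hypothetical and serve only as an illustration; returning `None` signals that content-based identification should follow.

```python
import os
from typing import Optional

# Hypothetical suffix table; a real deployment would define its own entries.
SUFFIX_TYPES = {
    ".zip": "zip file",
    ".pdf": "PDF document",
    ".java": "Java programming file",
}


def identify_by_suffix(path: str) -> Optional[str]:
    """Return a file type from the suffix, or None if there is no usable suffix."""
    _, suffix = os.path.splitext(path)
    if not suffix:
        return None  # no suffix: fall back to content-based identification
    return SUFFIX_TYPES.get(suffix.lower())
```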
Step S12: if the file category of the target file is binary file, look up a feature string corresponding to the binary file, and determine the file identification result of the binary file according to the lookup result.
That is, when the target file is a binary file, this embodiment looks up the feature string corresponding to the binary file and then determines the file identification result of the binary file from the feature string found. It can be understood that the file identification result of the binary file may specifically include a file format identification result and/or a file type identification result.
Step S13: if the file category of the target file is text file, search for keywords and/or regular expressions corresponding to the text file, and determine the file identification result of the text file according to the search result.
That is, when the target file is a text file, this embodiment searches for the keywords and/or regular expressions corresponding to the text file and then determines the identification result of the text file from the keywords and/or regular expressions found. It can be understood that the file identification result of the text file may specifically include a file format identification result and/or a file type identification result.
It can be seen that this embodiment of the present invention first determines the file category of a file. When the file category is binary file, the file identification result is determined from the feature string of the binary file; when the file category is text file, the file identification result is determined from the keywords and/or regular expressions corresponding to the text file. This embodiment thus discloses a technical solution that identifies files without relying on the file suffix, so that files without a suffix, or files whose suffix has been modified, can still be identified, which improves the file recognition rate.
On the basis of the foregoing embodiment, an embodiment of the present invention discloses a specific file identification method. Referring to Fig. 2, the method includes:
Step S21: determine the file category of the target file.
For the specific process of step S21, refer to the corresponding content disclosed in the foregoing embodiment, which is not repeated here.
Step S22: if the file category of the target file is binary file, look up the file header features of the binary file, and judge from the current lookup result whether the binary file is a PE file.
The file header features may specifically include a DOS feature and an NT feature. Based on the DOS feature and the NT feature, it can be identified whether the binary file is a PE file and, if so, which type of PE file it is. The file types of PE files may specifically include, but are not limited to, command files, dll files, sys files, EXE files, LE files and NE files.
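As an illustration of the header check in step S22, the sketch below assumes that the DOS feature is the "MZ" magic at the start of the file and that the NT feature is the "PE\0\0" signature located through the 32-bit little-endian offset stored at 0x3C, as in the published PE layout. The function name is hypothetical, and the sketch covers only the "PE\0\0" case; it does not distinguish the PE subtypes listed above.

```python
import struct


def is_pe_file(data: bytes) -> bool:
    """Check the DOS header magic and the NT signature of a candidate PE file."""
    # DOS feature (assumed): the file starts with 'MZ' and holds a full DOS header.
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # The 4-byte little-endian field at 0x3C gives the offset of the NT headers.
    (nt_offset,) = struct.unpack_from("<I", data, 0x3C)
    # NT feature (assumed): the NT headers begin with 'PE\0\0'.
    return data[nt_offset:nt_offset + 4] == b"PE\x00\x00"
```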
Step S23: if not, use a preset mapping table between the feature strings of compound documents and their offsets to look up a feature string corresponding to the binary file, and judge from the current lookup result whether the binary file is a compound document.
That is, when the binary file is not a PE file, this embodiment may further use the preset mapping table between the feature strings of compound documents and their offsets to look up a feature string corresponding to the binary file. The file types of compound documents may specifically include, but are not limited to, WPS documents, Visio documents, Chm documents, Caj documents and PDF documents.
For example, for WPS documents, the corresponding feature strings include "WordDocument" and "WPS Office". The feature value corresponding to the feature string "WordDocument" is: 57 00 6F 00 72 00 64 00 44 00 6F 00 63 00 75 00 6D 00 65 00 6E 00 74; the feature value corresponding to the feature string "WPS Office" is: 57 00 50 00 53 00 20 00 4F 00 66 00 66 00 69 00 63 00 65 00. Using the mapping between the feature strings of WPS documents and their offsets, it can be determined whether the target file contains these feature strings; if so, the target file can be determined to be a WPS document.
For Visio documents, the corresponding feature string is "VisioDocument", and the feature value corresponding to this feature string is: 56 00 69 00 73 00 69 00 6F 00 44 00 6F 00 63 00 75 00 6D 00 65 00 6E 00 74 00. Using the mapping between the feature string of Visio documents and its offset, it can be determined whether the target file contains this feature string; if so, the target file can be determined to be a Visio document.
For Chm documents, the corresponding feature strings include "ITSF", "ITSP" and "PMGL". The feature value corresponding to "ITSF" is: 49 54 53 46; the feature value corresponding to "ITSP" is: 49 54 53 50; the feature value corresponding to "PMGL" is: 50 4D 47 4C. Using the mapping between the feature strings of Chm documents and their offsets, it can be determined whether the target file contains these feature strings; if so, the target file can be determined to be a Chm document.
For Caj documents, the corresponding feature string is "CAJ", and the feature value corresponding to this feature string is: 43 41 4A. Using the mapping between the feature string of Caj documents and its offset, it can be determined whether the target file contains this feature string; if so, the target file can be determined to be a Caj document.
For PDF documents, the corresponding feature string is "%PDF-1.", and the feature value corresponding to this feature string is: 25 50 44 46 2D 31 2E. Using the mapping between the feature string of PDF documents and its offset, it can be determined whether the target file contains this feature string; if so, the target file can be determined to be a PDF document.
As can be seen from the above, based on the feature strings found in step S23, this embodiment can identify whether the binary file is a compound document and, if so, which type of compound document it is.
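The table lookup of step S23 can be sketched as follows. The feature strings are the ones named above, but the text does not give the offsets, so the offsets here are assumptions: `0` means "at the start of the file" and `None` means "anywhere in the data". Requiring every listed feature string of a type to be present is likewise an assumption, and all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Signature = Tuple[bytes, Optional[int]]  # (feature value, assumed offset or None)

COMPOUND_SIGNATURES: Dict[str, List[Signature]] = {
    "PDF document": [(b"%PDF-1.", 0)],
    "CHM document": [(b"ITSF", 0), (b"ITSP", None), (b"PMGL", None)],
    "CAJ document": [(b"CAJ", 0)],
    # Stored as UTF-16LE, matching the 56 00 69 00 ... byte pattern.
    "Visio document": [("VisioDocument".encode("utf-16-le"), None)],
}


def has_signature(data: bytes, value: bytes, offset: Optional[int]) -> bool:
    """Check for a feature value at a fixed offset, or anywhere if offset is None."""
    if offset is None:
        return value in data
    return data[offset:offset + len(value)] == value


def identify_compound(data: bytes) -> Optional[str]:
    """Return the first compound document type whose feature values all match."""
    for file_type, signatures in COMPOUND_SIGNATURES.items():
        if all(has_signature(data, value, offset) for value, offset in signatures):
            return file_type
    return None
```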
Step S24: if not, use a preset mapping table between the feature strings of compressed files and their offsets to look up a feature string corresponding to the binary file, and judge from the current lookup result whether the binary file is a compressed file.
That is, when the binary file is neither a PE file nor a compound document, this embodiment may further use the preset mapping table between the feature strings of compressed files and their offsets to look up a feature string corresponding to the binary file. The file types of compressed files may specifically include, but are not limited to, zip files, wim files, 7z files, tar files and Rar files.
For example, for zip files, the corresponding feature string is "PK", and the feature value corresponding to this feature string is 50 4B. Using the mapping between the feature string of zip files and its offset, it can be determined whether the target file contains this feature string; if so, the target file can be determined to be a zip file.
For wim files, the corresponding feature string is "MSWIM", and the feature value corresponding to this feature string is 4D 53 57 49 4D. Using the mapping between the feature string of wim files and its offset, it can be determined whether the target file contains this feature string; if so, the target file can be determined to be a wim file.
For 7z files, the corresponding feature string is "7z..'", and the feature value corresponding to this feature string is 37 7A BC AF 27. Using the mapping between the feature string of 7z files and its offset, it can be determined whether the target file contains this feature string; if so, the target file can be determined to be a 7z file.
For tar files, the corresponding feature string is ".ustar.00", and the feature value corresponding to this feature string is 00 75 73 74 61 72 00 30 30. Using the mapping between the feature string of tar files and its offset, it can be determined whether the target file contains this feature string; if so, the target file can be determined to be a tar file.
For Rar files, the corresponding feature string is "Rar!... ..s.....", and the feature value corresponding to this feature string is 52 61 72 21 1A 07 00 CF 90 73 00 0D. Using the mapping between the feature string of Rar files and its offset, it can be determined whether the target file contains this feature string; if so, the target file can be determined to be a Rar file.
As can be seen from the above, based on the feature strings found in step S24, this embodiment can identify whether the binary file is a compressed file and, if so, which type of compressed file it is.
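A feature-value-and-offset table for step S24 might look like the sketch below. The feature values come from the text above (for Rar, only the leading seven bytes are used); the offsets are assumptions, since the text does not list them: 0 for the formats whose signature opens the file, and 256 for tar, where the listed value (a NUL byte followed by "ustar", NUL and "00") would sit in a ustar header. All names are hypothetical.

```python
from typing import List, Optional, Tuple

# (file type, feature value, assumed offset of the feature value)
COMPRESSED_SIGNATURES: List[Tuple[str, bytes, int]] = [
    ("zip file", bytes.fromhex("50 4B"), 0),
    ("wim file", bytes.fromhex("4D 53 57 49 4D"), 0),
    ("7z file", bytes.fromhex("37 7A BC AF 27"), 0),
    ("Rar file", bytes.fromhex("52 61 72 21 1A 07 00"), 0),
    ("tar file", bytes.fromhex("00 75 73 74 61 72 00 30 30"), 256),
]


def identify_compressed(data: bytes) -> Optional[str]:
    """Return the compressed file type whose feature value sits at its offset."""
    for file_type, value, offset in COMPRESSED_SIGNATURES:
        if data[offset:offset + len(value)] == value:
            return file_type
    return None
```

Checking at a fixed offset rather than scanning the whole file keeps a short value such as "PK" from matching by accident in unrelated data.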
As can be seen from steps S22 to S24, when identifying a binary file this embodiment first performs PE file identification, then compound document identification, and only then compressed file identification. It should be noted that this is just one specific implementation; the order in which the binary file types are checked can be adjusted flexibly according to the needs of the actual application. For example, this embodiment may first perform compressed file identification, then PE file identification, and finally compound document identification; alternatively, it may first perform compound document identification, then compressed file identification, and finally PE file identification.
Step S25: if the file category of the target file is text file, use the keywords and/or regular expressions of a preset programming file, and/or the keywords and/or regular expressions of a preset script file, to search for keywords and/or regular expressions corresponding to the text file, and determine the file identification result of the text file according to the search result.
In this embodiment, the file types of text files may specifically include programming files and script files. Programming files may include, but are not limited to, Java programming files and C/C++ programming files. Script files may include, but are not limited to, PHP files, JSP files, ASPX files and ASP files.
For Java programming files, the keywords include "package", "import", "public class", "extends" and "implements", and the regular expression is: import[\s]*\bjava[(\w+)*.(\w+)+]*. By searching the target file for these keywords and this regular expression, it can be determined whether the target file is a Java programming file.
For C/C++ programming files, the keywords include "#include", "#define", "public", "private", "struct" and "class". By searching the target file for these keywords, it can be determined whether the target file is a C/C++ programming file.
For PHP files, the keywords include "<", "<php", "$", "function", "array", "isset", "eval" and ">", and the regular expression is: var\s+\$(\w+)+(\s)=(s)(\w)+. By searching the target file for these keywords and this regular expression, it can be determined whether the target file is a PHP file.
For JSP files, the keywords include "<script", "javascript", "function", "var", "document.", "</script>" and "jsp", and the regular expressions include: <%@page [(w+) (s)] += and bString(\s)+(\w+)\b. By searching the target file for these keywords and regular expressions, it can be determined whether the target file is a JSP file.
For ASPX files, the keywords include "<%@", "namespace", "system.", "<asp:", "Response", "@", "@renderPage" and "</asp:". By searching the target file for these keywords, it can be determined whether the target file is an ASPX file.
For ASP files, the keywords include "<%", "vbscript", "option", "explicit", "dim", "sub", "end" and "response". By searching the target file for these keywords, it can be determined whether the target file is an ASP file.
Referring to Fig. 3, the step of using the keywords and/or regular expressions of a preset programming file to search for keywords and/or regular expressions corresponding to the text file, and determining the file identification result of the text file according to the search result, may specifically include:
Step S31: use the keywords and/or regular expressions of a preset programming file to search for keywords and regular expressions corresponding to the text file.
It can be understood that the preset programming file in step S31 may specifically be a Java programming file or a C/C++ programming file.
Step S32: determine the corresponding first-file actual recognition rate according to the current search result.
The step of determining the corresponding first-file actual recognition rate according to the current search result may specifically include: counting the actual number of keywords and regular expressions of the preset programming file that are hit in the current search result, and dividing that number by the total number of keywords and regular expressions of the preset programming file. The quotient is the hit rate of the keywords and regular expressions of the preset programming file in the current search result, and this embodiment takes the hit rate as the first-file actual recognition rate.
Step S33: judge whether the first-file actual recognition rate is greater than the first preset threshold; if so, determine that the text file is a programming file.
That is, when the first-file actual recognition rate obtained in step S32 is greater than the first preset threshold, the text file can be determined to be the preset programming file of step S31. For example, suppose the keywords of the preset C/C++ programming file are "#include", "#define", "public", "private", "struct" and "class", and the corresponding first preset threshold is 80%. If "#include", "public", "private", "struct" and "class" are found in a file, five of the six keywords are hit, so the hit rate of the file against the C/C++ keywords is 5/6 ≈ 83.33%. Because this exceeds the first preset threshold of 80%, the file can be determined to be a C/C++ programming file.
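The hit-rate calculation of steps S31 to S33 can be sketched as below, using the C/C++ keyword list and the 80% threshold from the example above. The function names are hypothetical, and treating a keyword as hit when it occurs anywhere in the text as a plain substring is an assumption.

```python
import re
from typing import Sequence

C_CPP_KEYWORDS = ["#include", "#define", "public", "private", "struct", "class"]


def hit_rate(text: str, keywords: Sequence[str], patterns: Sequence[str] = ()) -> float:
    """Number of keywords and regular expressions hit, divided by their total."""
    total = len(keywords) + len(patterns)
    if total == 0:
        return 0.0
    hits = sum(1 for keyword in keywords if keyword in text)
    hits += sum(1 for pattern in patterns if re.search(pattern, text))
    return hits / total


def is_programming_file(text: str, keywords: Sequence[str],
                        patterns: Sequence[str] = (), threshold: float = 0.80) -> bool:
    """Compare the actual recognition rate with the preset threshold."""
    return hit_rate(text, keywords, patterns) > threshold
```

For a source text that contains every keyword except "#define", `hit_rate` returns 5/6, which exceeds 0.80, so `is_programming_file` returns `True`.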
It is shown in Figure 4, using the keyword and/or canonical sentence of preset script file, search and text text
The corresponding keyword of part and/or canonical sentence, and determine according to search result the step of the file identification result of the text file
Suddenly, it can specifically include:
Step S41: Using the keywords and/or regular expressions of a preset script file, search for the keywords and regular expressions corresponding to the text file.
It can be understood that the preset script file in step S41 may specifically be a PHP file, a JSP file, an ASPX file or an ASP file.
Step S42: Determine the corresponding second actual file recognition rate according to the current search result.
Specifically, the step of determining the corresponding second actual file recognition rate according to the current search result may include: counting the actual number of keywords and regular expressions of the preset script file that are hit in the current search result, and then dividing that actual number by the total number of keywords and regular expressions of the preset script file. This yields the hit rate of the preset script file's keywords and regular expressions in the current search result, and the present embodiment takes this hit rate as the second actual file recognition rate.
Step S43: Judge whether the second actual file recognition rate is greater than the second preset threshold; if so, determine that the text file is a script file.
That is, when the second actual file recognition rate obtained in step S42 is greater than the second preset threshold, the text file can be determined to be the preset script file of step S41.
It can be understood that the first preset threshold and the second preset threshold can be set according to the needs of the actual application and are not specifically limited here.
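Steps S41 to S43 can be sketched in the same way, this time counting regular-expression matches alongside keyword hits. The patent does not list the signatures for any script type, so the PHP keywords, the two patterns, the sample text, and the 0.8 threshold below are all hypothetical values chosen only to illustrate the calculation.

```python
import re

# Hypothetical signature set for PHP scripts; the patent does not enumerate one.
PHP_KEYWORDS = ["<?php", "echo", "$_GET", "$_POST"]
PHP_PATTERNS = [r"\$\w+\s*=", r"function\s+\w+\s*\("]
SECOND_THRESHOLD = 0.80  # assumed value; the text leaves the threshold open

def recognition_rate(text, keywords, patterns):
    """Hit rate = (keywords found + regexes matched) / total number of signatures."""
    hits = sum(1 for keyword in keywords if keyword in text)
    hits += sum(1 for pattern in patterns if re.search(pattern, text))
    return hits / (len(keywords) + len(patterns))

def is_script_file(text, keywords, patterns, threshold=SECOND_THRESHOLD):
    """Step S43: the file is a script file if the rate exceeds the threshold."""
    return recognition_rate(text, keywords, patterns) > threshold

# Hypothetical sample: hits every signature except "$_POST".
sample = "<?php\n$name = $_GET['name'];\nfunction greet($n) { echo 'Hi ' . $n; }\n"
rate = recognition_rate(sample, PHP_KEYWORDS, PHP_PATTERNS)
```

Keeping one signature set and one threshold per script type (PHP, JSP, ASPX, ASP) would let the same two functions serve every preset script file.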
In the present embodiment, if the file format or file type of a file still cannot be identified after the above file identification process, the file can be classified as an unknown file. The unknown file can subsequently be sent to a preset unknown-file collection unit, through which a file manager user can view all unknown files and perform file operations on them, such as manual identification.
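The fallback to an unknown-file collection unit might look like the following sketch. The function name, the `(label, recognizer)` pair structure, and the use of a plain list as the collection unit are assumptions for illustration and are not taken from the patent.

```python
def identify_or_collect(path, content, recognizers, unknown_files):
    """Try each recognizer in order; collect the path if none of them matches.

    recognizers: list of (label, predicate) pairs, where predicate(content) -> bool.
    unknown_files: a list standing in for the unknown-file collection unit.
    """
    for label, recognizer in recognizers:
        if recognizer(content):
            return label
    unknown_files.append(path)  # kept for later manual identification
    return "unknown"

# Hypothetical usage with a single toy recognizer.
recognizers = [("C/C++ programming file", lambda text: "#include" in text)]
unknown_files = []
first = identify_or_collect("a.txt", "#include <stdio.h>", recognizers, unknown_files)
second = identify_or_collect("b.txt", "nothing recognizable", recognizers, unknown_files)
```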
Correspondingly, an embodiment of the present invention also discloses a file identification device. Referring to Fig. 5, the device includes:
A file class determining module 11, configured to determine the file class of a target file;
A binary file identification module 12, configured to, when the file class of the target file is binary file, look up a feature string corresponding to the binary file and determine the file identification result of the binary file according to the lookup result;
A text file identification module 13, configured to, when the file class of the target file is text file, search for keywords and/or regular expressions corresponding to the text file and determine the file identification result of the text file according to the search result.
Specifically, the binary file includes, but is not limited to, a PE file, a compound document or a compressed file.
In the present embodiment, the binary file identification module may include:
A first judging unit, configured to look up the file header features of the binary file and judge, according to the current lookup result, whether the binary file is a PE file;
A second judging unit, configured to, when the judgment result of the first judging unit is no, look up a feature string corresponding to the binary file by using a preset mapping table between the feature strings and offsets of compound documents, and judge, according to the current lookup result, whether the binary file is a compound document;
A third judging unit, configured to, when the judgment result of the second judging unit is no, look up a feature string corresponding to the binary file by using a preset mapping table between the feature strings and offsets of compressed files, and judge, according to the current lookup result, whether the binary file is a compressed file.
Further, the file header features include, but are not limited to, DOS features and NT features.
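The three judging units can be sketched as a chain of signature checks. The byte values below are well-known format signatures ("MZ" and "PE\0\0" for PE files, the OLE2 compound file header, and the ZIP, gzip, RAR and 7z magic numbers); reading the "DOS features" and "NT features" as the DOS header magic and the NT header signature, and representing each mapping table as a dictionary from offset to signatures, are assumptions of this sketch rather than details given in the patent.

```python
import struct

# offset -> list of signatures expected at that offset (assumed table layout).
COMPOUND_SIGNATURES = {0: [b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"]}  # OLE2 compound file
COMPRESSED_SIGNATURES = {
    0: [b"PK\x03\x04", b"\x1F\x8B", b"Rar!\x1A\x07", b"7z\xBC\xAF\x27\x1C"],
}

def is_pe(data):
    """DOS feature: 'MZ' at offset 0. NT feature: 'PE\\0\\0' at the offset stored at 0x3C."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"

def matches_table(data, table):
    """True if any signature in the table is found at its mapped offset."""
    return any(
        data[offset:offset + len(signature)] == signature
        for offset, signatures in table.items()
        for signature in signatures
    )

def identify_binary(data):
    """First, second and third judging units applied in order."""
    if is_pe(data):
        return "PE file"
    if matches_table(data, COMPOUND_SIGNATURES):
        return "compound document"
    if matches_table(data, COMPRESSED_SIGNATURES):
        return "compressed file"
    return "unknown"

# Minimal synthetic PE header: 'MZ', padding, e_lfanew = 0x40, then 'PE\0\0'.
fake_pe = b"MZ" + b"\x00" * 0x3A + struct.pack("<I", 0x40) + b"PE\x00\x00"
```

Storing signatures by offset keeps the lookup to a few slice comparisons per format, so the file body never has to be scanned.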
In the present embodiment, the text file includes, but is not limited to, a programming file or a script file.
In one specific embodiment, the text file identification module may include:
A first search unit, configured to search for the keywords and regular expressions corresponding to the text file by using the keywords and/or regular expressions of a preset programming file;
A first determination unit, configured to determine the corresponding first actual file recognition rate according to the current search result;
A fourth judging unit, configured to judge whether the first actual file recognition rate is greater than the first preset threshold, and if so, determine that the text file is a programming file.
In another specific embodiment, the text file identification module may include:
A first search unit, configured to search for the keywords and regular expressions corresponding to the text file by using the keywords and/or regular expressions of a preset script file;
A first determination unit, configured to determine the corresponding second actual file recognition rate according to the current search result;
A fifth judging unit, configured to judge whether the second actual file recognition rate is greater than the second preset threshold, and if so, determine that the text file is a script file.
Further, the file identification device also includes:
A file direct identification module, configured to determine, before the file class determining module determines the file class of the target file, whether the target file includes a file extension, and if so, determine the file identification result of the target file directly according to the file extension of the target file.
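A minimal sketch of the direct identification step follows. The extension table and the function name are hypothetical; the patent does not enumerate which extensions are recognized. Returning `None` for a missing or unlisted extension, so that the caller can fall through to content-based identification, is a design choice of this sketch.

```python
import os

# Hypothetical extension table; the patent does not list specific extensions.
EXTENSION_TYPES = {
    ".php": "PHP script file",
    ".jsp": "JSP script file",
    ".exe": "PE file",
    ".zip": "compressed file",
}

def identify_by_extension(path):
    """Return a file type for a known extension, or None if content analysis is needed."""
    _, extension = os.path.splitext(path)
    if not extension:
        return None  # no extension: continue with file class determination
    return EXTENSION_TYPES.get(extension.lower())
```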
Correspondingly, the present invention also discloses file identification equipment including a processor and a memory, wherein the processor implements the file identification method disclosed in the foregoing embodiments when executing the computer program stored in the memory. For the specific steps of the file identification method, refer to the corresponding content disclosed in the foregoing embodiments; they are not repeated here.
Further, the present invention also discloses a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the file identification method disclosed in the foregoing embodiments. For the specific steps of the file identification method, refer to the corresponding content disclosed in the foregoing embodiments; they are not repeated here.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, refer to the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein can be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module can reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
Finally, it should also be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
The file identification method, device, equipment and storage medium provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. Meanwhile, those of ordinary skill in the art may, following the idea of the present invention, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (18)
1. A file identification method, characterized by comprising:
determining the file class of a target file;
if the file class of the target file is binary file, looking up a feature string corresponding to the binary file, and determining the file identification result of the binary file according to the lookup result;
if the file class of the target file is text file, searching for keywords and/or regular expressions corresponding to the text file, and determining the file identification result of the text file according to the search result.
2. The file identification method according to claim 1, characterized in that the binary file includes a PE file, a compound document or a compressed file.
3. The file identification method according to claim 2, characterized in that the step of looking up a feature string corresponding to the binary file and determining the file identification result of the binary file according to the lookup result comprises:
looking up the file header features of the binary file, and judging, according to the current lookup result, whether the binary file is a PE file;
if not, looking up a feature string corresponding to the binary file by using a preset mapping table between the feature strings and offsets of compound documents, and judging, according to the current lookup result, whether the binary file is a compound document;
if not, looking up a feature string corresponding to the binary file by using a preset mapping table between the feature strings and offsets of compressed files, and judging, according to the current lookup result, whether the binary file is a compressed file.
4. The file identification method according to claim 3, characterized in that the file header features include DOS features and NT features.
5. The file identification method according to claim 1, characterized in that the text file includes a programming file or a script file.
6. The file identification method according to claim 5, characterized in that the step of searching for keywords and/or regular expressions corresponding to the text file and determining the file identification result of the text file according to the search result comprises:
searching for the keywords and regular expressions corresponding to the text file by using the keywords and/or regular expressions of a preset programming file;
determining the corresponding first actual file recognition rate according to the current search result;
judging whether the first actual file recognition rate is greater than a first preset threshold, and if so, determining that the text file is a programming file.
7. The file identification method according to claim 5, characterized in that the step of searching for keywords and/or regular expressions corresponding to the text file and determining the file identification result of the text file according to the search result comprises:
searching for the keywords and regular expressions corresponding to the text file by using the keywords and/or regular expressions of a preset script file;
determining the corresponding second actual file recognition rate according to the current search result;
judging whether the second actual file recognition rate is greater than a second preset threshold, and if so, determining that the text file is a script file.
8. The file identification method according to any one of claims 1 to 7, characterized in that, before the step of determining the file class of the target file, the method further comprises:
determining whether the target file includes a file extension;
if so, determining the file identification result of the target file directly according to the file extension of the target file.
9. A file identification device, characterized by comprising:
a file class determining module, configured to determine the file class of a target file;
a binary file identification module, configured to, when the file class of the target file is binary file, look up a feature string corresponding to the binary file and determine the file identification result of the binary file according to the lookup result;
a text file identification module, configured to, when the file class of the target file is text file, search for keywords and/or regular expressions corresponding to the text file and determine the file identification result of the text file according to the search result.
10. The file identification device according to claim 9, characterized in that the binary file includes a PE file, a compound document or a compressed file.
11. The file identification device according to claim 10, characterized in that the binary file identification module comprises:
a first judging unit, configured to look up the file header features of the binary file and judge, according to the current lookup result, whether the binary file is a PE file;
a second judging unit, configured to, when the judgment result of the first judging unit is no, look up a feature string corresponding to the binary file by using a preset mapping table between the feature strings and offsets of compound documents, and judge, according to the current lookup result, whether the binary file is a compound document;
a third judging unit, configured to, when the judgment result of the second judging unit is no, look up a feature string corresponding to the binary file by using a preset mapping table between the feature strings and offsets of compressed files, and judge, according to the current lookup result, whether the binary file is a compressed file.
12. The file identification device according to claim 11, characterized in that the file header features include DOS features and NT features.
13. The file identification device according to claim 9, characterized in that the text file includes a programming file or a script file.
14. The file identification device according to claim 13, characterized in that the text file identification module comprises:
a first search unit, configured to search for the keywords and regular expressions corresponding to the text file by using the keywords and/or regular expressions of a preset programming file;
a first determination unit, configured to determine the corresponding first actual file recognition rate according to the current search result;
a fourth judging unit, configured to judge whether the first actual file recognition rate is greater than a first preset threshold, and if so, determine that the text file is a programming file.
15. The file identification device according to claim 13, characterized in that the text file identification module comprises:
a first search unit, configured to search for the keywords and regular expressions corresponding to the text file by using the keywords and/or regular expressions of a preset script file;
a first determination unit, configured to determine the corresponding second actual file recognition rate according to the current search result;
a fifth judging unit, configured to judge whether the second actual file recognition rate is greater than a second preset threshold, and if so, determine that the text file is a script file.
16. The file identification device according to any one of claims 9 to 14, characterized by further comprising:
a file direct identification module, configured to determine, before the file class determining module determines the file class of the target file, whether the target file includes a file extension, and if so, determine the file identification result of the target file directly according to the file extension of the target file.
17. File identification equipment, characterized by comprising a processor and a memory, wherein the processor implements the file identification method according to any one of claims 1 to 8 when executing the computer program stored in the memory.
18. A computer-readable storage medium, characterized by being used for storing a computer program, wherein the computer program, when executed by a processor, implements the file identification method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810265755.2A CN108460155A (en) | 2018-03-28 | 2018-03-28 | A kind of file identification method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108460155A true CN108460155A (en) | 2018-08-28 |
Family
ID=63238082
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810265755.2A Pending CN108460155A (en) | 2018-03-28 | 2018-03-28 | A kind of file identification method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108460155A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180828 |