CN105138918A - Recognition method and device for secure file - Google Patents

Recognition method and device for secure file

Info

Publication number
CN105138918A
Authority
CN
China
Prior art keywords
file
identified
code block
safe
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510553020.6A
Other languages
Chinese (zh)
Other versions
CN105138918B (en)
Inventor
党伟
郭根
邹荣新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201510553020.6A
Publication of CN105138918A
Application granted
Publication of CN105138918B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis

Abstract

The embodiment of the invention provides a recognition method and device for a secure file. The recognition method comprises the steps that firstly, a known secure file matched with a file to be recognized is obtained according to the file to be recognized; secondly, the similarity of the file to be recognized and the known secure file is obtained; thirdly, whether the file to be recognized is the secure file is recognized according to the similarity of the file to be recognized and the known secure file. Accordingly, the recognition method and device for the secure file are used for solving the problem that in the prior art, the recognition rate of the secure file is low.

Description

Recognition method and device for secure file
[Technical field]
The present invention relates to the technical field of network security, and in particular to a recognition method and device for a secure file.
[Background]
With the rapid development of the Internet and communication technology in recent years, new applications keep appearing and are updated in rapid iterations, so a large number of application-related files are produced every day. As a result, a security system captures many unknown files. These include both secure files and malicious files, so the secure files need to be identified among the numerous unknown files.
In the prior art, the data signature of an unknown file is looked up in a preset white list. If the unknown file belongs to the white list, it is identified as a secure file; otherwise, it is identified as a non-secure file. However, applications are added and updated very quickly, so the related files also appear quickly and in large numbers. The prior-art approach based on the data signature of the unknown file and a white list therefore cannot meet the current demand for identifying secure files, which results in a low recognition rate for secure files.
[Summary of the invention]
In view of this, embodiments of the present invention provide a recognition method and device for a secure file, in order to solve the prior-art problem that the recognition rate of secure files is low.
One aspect of the embodiments of the present invention provides a recognition method for a secure file, comprising:
obtaining, according to a file to be identified, a known safe file that matches the file to be identified;
obtaining the similarity between the file to be identified and the known safe file;
identifying, according to the similarity between the file to be identified and the known safe file, whether the file to be identified is a secure file.
In the aspect above and any possible implementation, an implementation is further provided in which obtaining, according to the file to be identified, the known safe file that matches the file to be identified comprises:
matching in a database according to the details of the file to be identified, to obtain the known safe file that matches the file to be identified, wherein the database contains the details of known safe files.
In the aspect above and any possible implementation, an implementation is further provided in which obtaining the similarity between the file to be identified and the known safe file comprises:
obtaining the code block fingerprint of the file to be identified;
obtaining the similarity between the file to be identified and the known safe file according to the code block fingerprint of the file to be identified and the code block fingerprint of the known safe file.
In the aspect above and any possible implementation, an implementation is further provided in which obtaining the code block fingerprint of the file to be identified comprises:
decompiling the file to be identified, to obtain the decompiled code of the file to be identified;
obtaining at least one code block contained in the decompiled code;
obtaining the fingerprint information of each code block;
obtaining the code block fingerprint of the file to be identified according to the fingerprint information of each code block.
In the aspect above and any possible implementation, an implementation is further provided in which the fingerprint information of each code block comprises:
the instruction structure features of the code block; and
a hash value obtained from some of the instructions in the code block.
In the aspect above and any possible implementation, an implementation is further provided in which obtaining the similarity between the file to be identified and the known safe file comprises:
obtaining the similarity between the file to be identified and the known safe file using the following formula:
C = F(A ∩ B) / F(A ∪ B) × 100
In this formula, C is the similarity between the file to be identified and the known safe file; A is the code block fingerprint of the file to be identified; B is the code block fingerprint of the known safe file; F(A ∩ B) is the cumulative sum of the step lengths of the code block fingerprints in the intersection of the code block fingerprint of the file to be identified and the code block fingerprint of the known safe file; F(A ∪ B) is the cumulative sum of the step lengths of the code block fingerprints in the union of the code block fingerprint of the file to be identified and the code block fingerprint of the known safe file.
In the aspect above and any possible implementation, an implementation is further provided in which identifying, according to the similarity between the file to be identified and the known safe file, whether the file to be identified is a secure file comprises:
if the similarity between the file to be identified and the known safe file is greater than a preset similarity threshold, identifying the file to be identified as a secure file.
In the aspect above and any possible implementation, an implementation is further provided in which the method further comprises:
if the file to be identified is identified as a secure file, adding the file to be identified to the database.
One aspect of the embodiments of the present invention provides a recognition device for a secure file, comprising:
a file search unit, configured to obtain, according to a file to be identified, a known safe file that matches the file to be identified;
a similarity statistics unit, configured to obtain the similarity between the file to be identified and the known safe file;
a file identification unit, configured to identify, according to the similarity between the file to be identified and the known safe file, whether the file to be identified is a secure file.
In the aspect above and any possible implementation, an implementation is further provided in which the file search unit is specifically configured to:
match in a database according to the details of the file to be identified, to obtain the known safe file that matches the file to be identified, wherein the database contains the details of known safe files.
In the aspect above and any possible implementation, an implementation is further provided in which the similarity statistics unit is specifically configured to:
obtain the code block fingerprint of the file to be identified;
obtain the similarity between the file to be identified and the known safe file according to the code block fingerprint of the file to be identified and the code block fingerprint of the known safe file.
In the aspect above and any possible implementation, an implementation is further provided in which the similarity statistics unit, when obtaining the code block fingerprint of the file to be identified, is specifically configured to:
decompile the file to be identified, to obtain the decompiled code of the file to be identified;
obtain at least one code block contained in the decompiled code;
obtain the fingerprint information of each code block;
obtain the code block fingerprint of the file to be identified according to the fingerprint information of each code block.
In the aspect above and any possible implementation, an implementation is further provided in which the fingerprint information of each code block comprises:
the instruction structure features of the code block; and
a hash value obtained from some of the instructions in the code block.
In the aspect above and any possible implementation, an implementation is further provided in which the similarity statistics unit, when obtaining the similarity between the file to be identified and the known safe file, is specifically configured to:
obtain the similarity between the file to be identified and the known safe file using the following formula:
C = F(A ∩ B) / F(A ∪ B) × 100
In this formula, C is the similarity between the file to be identified and the known safe file; A is the code block fingerprint of the file to be identified; B is the code block fingerprint of the known safe file; F(A ∩ B) is the cumulative sum of the step lengths of the code block fingerprints in the intersection of the code block fingerprint of the file to be identified and the code block fingerprint of the known safe file; F(A ∪ B) is the cumulative sum of the step lengths of the code block fingerprints in the union of the code block fingerprint of the file to be identified and the code block fingerprint of the known safe file.
In the aspect above and any possible implementation, an implementation is further provided in which the file identification unit is specifically configured to:
if the similarity between the file to be identified and the known safe file is greater than a preset similarity threshold, identify the file to be identified as a secure file.
In the aspect above and any possible implementation, an implementation is further provided in which the device further comprises:
a file adding unit, configured to add the file to be identified to the database if the file to be identified is identified as a secure file.
As can be seen from the above technical solutions, the embodiments of the present invention have the following beneficial effects:
In the technical solutions provided by the embodiments of the present invention, the similarity between a known safe file and the file to be identified is used to identify whether the file to be identified is a secure file. This is equivalent to deciding whether the file to be identified is a secure file by determining whether it and the known safe file come from the same source, so secure files can be identified quickly. Compared with the prior-art approach of judging whether an unknown file belongs to a white list, the embodiments of the present invention can identify more secure files, meet the current demand for identifying secure files, and raise the recognition rate of secure files. The prior-art problem of a low recognition rate for secure files is thereby solved, which reduces the number of unnecessary pop-up windows on the user side and improves the user experience.
[Description of the drawings]
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the recognition method for a secure file provided by an embodiment of the present invention;
Fig. 2 is an example diagram of using known safe files to identify a file to be identified, provided by an embodiment of the present invention;
Fig. 3 is a functional block diagram of embodiment one of the recognition device for a secure file provided by an embodiment of the present invention;
Fig. 4 is a functional block diagram of embodiment two of the recognition device for a secure file provided by an embodiment of the present invention.
[Detailed description]
For a better understanding of the technical solutions of the present invention, the embodiments of the present invention are described in detail below with reference to the drawings.
It should be clear that the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention, without creative effort, fall within the scope of protection of the present invention.
The terms used in the embodiments of the present invention are only for the purpose of describing specific embodiments and are not intended to limit the present invention. The singular forms "a", "said" and "the" used in the embodiments of the present invention and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" used herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects before and after it.
Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", "in response to determining" or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (a stated condition or event) is detected" or "in response to detecting (a stated condition or event)".
An embodiment of the present invention provides a recognition method for a secure file. Refer to Fig. 1, which is a schematic flowchart of the recognition method for a secure file provided by the embodiment. As shown in the figure, the method comprises the following steps:
S101, according to a file to be identified, obtain a known safe file that matches the file to be identified.
S102, obtain the similarity between the file to be identified and the known safe file.
S103, according to the similarity between the file to be identified and the known safe file, identify whether the file to be identified is a secure file.
It should be noted that S101 to S103 may be executed by a recognition device for a secure file. The device may be located in an application on a local terminal, may be a functional unit such as a plug-in or a software development kit (SDK) in an application on the local terminal, or may be located on a server side; the embodiments of the present invention do not specifically limit this.
It should be noted that the terminals involved in the embodiments of the present invention may include but are not limited to a personal computer (PC), a personal digital assistant (PDA), a wireless handheld device, a tablet computer, a mobile phone, an MP3 player, an MP4 player, and the like.
It can be understood that the application may be an application program installed on the terminal (native app), or may be a web page program of a browser on the terminal (web app); the embodiments of the present invention do not limit this.
Embodiment two
Based on the recognition method for a secure file provided in embodiment one above, this embodiment describes in detail the method in S101 of obtaining, according to the file to be identified, the known safe file that matches the file to be identified. This step may specifically include the following.
For example, in the embodiments of the present invention, the method of obtaining, according to the file to be identified, the known safe file that matches the file to be identified may include but is not limited to:
Refer to Fig. 2, which is an example diagram of using known safe files to identify a file to be identified, provided by an embodiment of the present invention. As shown in the figure, matching is performed in a database according to the details of the file to be identified, to obtain the known safe file that matches the file to be identified; the database contains the details of known safe files.
Optionally, in one possible implementation of the embodiments of the present invention, a database may be set up in advance, and the database may contain the details of known safe files.
In a specific implementation, as shown in Fig. 2, the details of a known safe file contained in the database may include but are not limited to: the hash value of the known safe file, the vendor name, the file name, the file version, the file description information, and the cloud scan count.
Similarly, the details of the file to be identified may include but are not limited to: the hash value of the file to be identified, the vendor name, the file name, the file version, the file description information, and the cloud scan count.
In a specific implementation, a hash algorithm may be used to compute the hash value of the known safe file. For example, the hash algorithm may include but is not limited to: Message Digest Algorithm 5 (MD5), Message Digest Algorithm 2 (MD2), Message Digest Algorithm 4 (MD4), or the Secure Hash Algorithm (SHA); the embodiments of the present invention do not specifically limit this.
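As an illustration of the hashing step, the following is a minimal sketch using Python's standard `hashlib` module. The function name `file_hash`, the default algorithm (SHA-256, one member of the SHA family) and the chunk size are assumptions made for this example; the patent does not prescribe any of them.

```python
import hashlib


def file_hash(path: str, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Return the hex digest of the file at `path`.

    `algorithm` is any name accepted by hashlib.new(), e.g. "md5" or "sha1".
    The file is read in chunks so that large files do not need to fit in memory.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A whole-file digest of this kind is only useful for exact lookups, since any one-byte change produces a different value.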
In a specific implementation, the known safe files obtained from the preset database as matching the file to be identified may include but are not limited to: known safe files that have the same vendor name as the file to be identified and that also share at least one of the file name and the file description information with the file to be identified.
In a specific implementation, one or more known safe files matching the file to be identified may be obtained. These known safe files can then be sorted according to the file version of each one, where a known safe file whose file version is closer to that of the file to be identified is ranked earlier.
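The matching and ranking rules above can be sketched as follows. This is only an illustration: the `FileDetails` record, the in-memory list standing in for the database, and the `version_distance` measure of how close two versions are (component-wise absolute difference, compared left to right) are all assumptions, since the patent does not define how closeness of versions is computed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FileDetails:
    """Hypothetical subset of the details stored per file."""
    file_hash: str
    vendor: str
    name: str
    version: Tuple[int, ...]  # e.g. (2, 1, 0)
    description: str


def version_distance(a: Tuple[int, ...], b: Tuple[int, ...]) -> Tuple[int, ...]:
    """Assumed closeness measure: per-component absolute difference.

    Tuples compare lexicographically, so a difference in the major
    component outweighs any difference in later components.
    """
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return tuple(abs(x - y) for x, y in zip(a, b))


def match_candidates(target: FileDetails, database: List[FileDetails]) -> List[FileDetails]:
    """Same vendor, plus same file name or same description; closest version first."""
    matches = [
        known for known in database
        if known.vendor == target.vendor
        and (known.name == target.name or known.description == target.description)
    ]
    return sorted(matches, key=lambda known: version_distance(known.version, target.version))
```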
Embodiment three
Based on the recognition method for a secure file provided in embodiments one and two above, this embodiment describes in detail the method in S102 of obtaining the similarity between the file to be identified and the known safe file. This step may specifically include the following.
In a specific implementation, the method of obtaining the similarity between the file to be identified and the known safe file may include but is not limited to: first, obtaining the code block fingerprint of the file to be identified; then, obtaining the similarity between the file to be identified and the known safe file according to the code block fingerprint of the file to be identified and the code block fingerprint of the known safe file.
For example, in the embodiments of the present invention, the method of obtaining the code block fingerprint of the file to be identified may include but is not limited to:
decompiling the file to be identified, to obtain the decompiled code of the file to be identified;
obtaining at least one code block contained in the decompiled code;
obtaining the fingerprint information of each code block;
obtaining the code block fingerprint of the file to be identified according to the fingerprint information of each code block.
It can be understood that a source program in a high-level language is compiled into an executable file, and decompilation is the inverse of this compilation process. The file to be identified in the embodiments of the present invention is an executable file, so its decompiled code can be obtained through decompilation. A decompiler may be used to decompile the file to be identified.
In a specific implementation, the decompiled code of the file to be identified contains a large number of instructions. In the embodiments of the present invention, the decompiled code can therefore be divided into code blocks according to the discontinuous instructions in it, which yields at least one code block. In other words, continuous instructions are placed in the same code block, and discontinuous instructions serve as division boundaries.
For example, the continuous instructions in the decompiled code of the file to be identified may include but are not limited to continuous branch instructions or continuous loop instructions. Correspondingly, the discontinuous instructions that serve as division boundaries may include but are not limited to discontinuous branch instructions or discontinuous loop instructions.
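The text does not spell out the exact boundary rule, so the following is only one possible reading, sketched under explicit assumptions: the decompiled code is a list of x86-style instruction strings, and a block ends at each control-transfer instruction (jump, loop or return), similar to basic-block splitting. The helper names `is_boundary` and `split_blocks` are hypothetical.

```python
from typing import List


def is_boundary(instruction: str) -> bool:
    """Assumed boundary rule: jumps (mnemonics starting with 'j'),
    loop instructions and returns end the current code block."""
    parts = instruction.split()
    mnemonic = parts[0].lower() if parts else ""
    return mnemonic.startswith(("j", "loop")) or mnemonic in ("ret", "retn")


def split_blocks(instructions: List[str]) -> List[List[str]]:
    """Group instructions into blocks; each boundary instruction closes a block."""
    blocks: List[List[str]] = []
    current: List[str] = []
    for instruction in instructions:
        current.append(instruction)
        if is_boundary(instruction):
            blocks.append(current)
            current = []
    if current:  # trailing instructions that were not closed by a boundary
        blocks.append(current)
    return blocks
```

A production disassembler would also start a new block at every jump target; that refinement is omitted here to keep the sketch short.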
Preferably, in the embodiments of the present invention, the fingerprint information of each code block comprises: the instruction structure features of the code block; and a hash value obtained from some of the instructions in the code block.
In a specific implementation, the instruction structure features of each code block may include but are not limited to: the number of nodes in the code block, the number of boundary conditions, the number of forward jumps, the number of backward jumps, and the number of sub-calls.
For example, the method of obtaining the hash value from some of the instructions in each code block may include but is not limited to the following. First, the instructions in each code block are normalized. Instructions that differ only in their immediate operands have those immediates replaced by the same placeholder, and different jump instructions are all replaced by the same jump instruction. Then, from the instructions of the modified code block, the instructions of the first N steps are extracted, and a hash algorithm is used to compute the hash value of these first N steps; this value is the hash value obtained from some of the instructions in the code block. Here N is a positive integer less than or equal to the total number of steps of the instructions in the code block.
For example, the instructions "cmp eax, 41h" and "cmp eax, 37h" have different immediates. The immediates of both instructions can be replaced by "xx", that is, "cmp eax, 41h" becomes "cmp eax, xx" and "cmp eax, 37h" also becomes "cmp eax, xx", which eliminates the difference in immediates.
As another example, "jnz" and "jz" are different jump instructions. Both can be replaced by "jmp", that is, "jnz" becomes "jmp" and "jz" also becomes "jmp", which eliminates the difference between jump instructions.
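The normalization and partial hashing described above can be sketched as follows. Several points are assumptions made for this example and are not stated in the patent: instructions are x86-style text, one instruction counts as one step, the set of conditional jumps is a small non-exhaustive sample, SHA-256 stands in for the unspecified hash algorithm, and the names `normalize` and `partial_hash` are hypothetical.

```python
import hashlib
import re
from typing import List

# Immediates written as 0x41, 41h or plain decimals, as standalone tokens.
_IMMEDIATE = re.compile(r"\b(?:0x[0-9a-fA-F]+|[0-9][0-9a-fA-F]*h|\d+)\b")

# Non-exhaustive sample of conditional jumps that are folded into "jmp".
_CONDITIONAL_JUMPS = {"jz", "jnz", "je", "jne", "ja", "jb", "jg", "jl",
                      "jge", "jle", "jae", "jbe"}


def normalize(instruction: str) -> str:
    """Replace immediates with 'xx' and conditional jumps with 'jmp'."""
    parts = instruction.strip().lower().split(None, 1)
    if not parts:
        return ""
    mnemonic = "jmp" if parts[0] in _CONDITIONAL_JUMPS else parts[0]
    operands = _IMMEDIATE.sub("xx", parts[1]) if len(parts) > 1 else ""
    return f"{mnemonic} {operands}".strip()


def partial_hash(instructions: List[str], n: int) -> str:
    """Hash the first n normalized instructions of a code block."""
    normalized = [normalize(instruction) for instruction in instructions[:n]]
    return hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()
```

With this normalization, two blocks that differ only in a compared constant or in the sense of a conditional jump produce the same partial hash, while blocks with different operations do not.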
For example, the fingerprint information of each code block has the following format:
Number of nodes in the code block; number of boundary conditions; number of forward jumps; number of backward jumps; number of sub-calls; hash value obtained from some of the instructions in the code block; total number of steps of the code block
For the at least one code block obtained from each file to be identified, once the fingerprint information of every code block has been obtained, the fingerprint information of all the code blocks is collected to obtain the code block fingerprint of the file to be identified. That is, the code block fingerprint of a file to be identified is the set of the fingerprint information of its code blocks.
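The record format above maps naturally onto a small immutable data type. The sketch below is illustrative only: the class name `BlockFingerprint`, its field names, the use of ";" as the stored separator, and the helper `file_fingerprint` are assumptions chosen to mirror the seven listed fields.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class BlockFingerprint:
    """Fingerprint information of one code block (seven fields, in order)."""
    nodes: int
    boundary_conditions: int
    forward_jumps: int
    backward_jumps: int
    sub_calls: int
    partial_hash: str
    total_steps: int

    def to_record(self) -> str:
        """Serialize to a semicolon-separated record."""
        values = (self.nodes, self.boundary_conditions, self.forward_jumps,
                  self.backward_jumps, self.sub_calls, self.partial_hash,
                  self.total_steps)
        return ";".join(str(value) for value in values)

    @classmethod
    def from_record(cls, record: str) -> "BlockFingerprint":
        """Parse a semicolon-separated record; whitespace around fields is ignored."""
        fields = [part.strip() for part in record.split(";")]
        if len(fields) != 7:
            raise ValueError(f"expected 7 fields, got {len(fields)}")
        return cls(int(fields[0]), int(fields[1]), int(fields[2]),
                   int(fields[3]), int(fields[4]), fields[5], int(fields[6]))


def file_fingerprint(blocks: Iterable[BlockFingerprint]) -> FrozenSet[BlockFingerprint]:
    """The code block fingerprint of a file: the set of its block fingerprints."""
    return frozenset(blocks)
```

Because the dataclass is frozen, instances are hashable and can be placed directly in sets, which is what the set operations in the similarity computation need.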
For example, in the embodiments of the present invention, the method of obtaining the similarity between the file to be identified and the known safe file may include but is not limited to:
obtaining the similarity between the file to be identified and the known safe file using the following formula:
C = F(A ∩ B) / F(A ∪ B) × 100
In this formula, C is the similarity between the file to be identified and the known safe file; A is the code block fingerprint of the file to be identified; B is the code block fingerprint of the known safe file; F(A ∩ B) is the cumulative sum of the step lengths of the code block fingerprints in the intersection of the code block fingerprint of the file to be identified and the code block fingerprint of the known safe file; F(A ∪ B) is the cumulative sum of the step lengths of the code block fingerprints in the union of the code block fingerprint of the file to be identified and the code block fingerprint of the known safe file.
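The formula is a step-weighted form of the Jaccard index: each code block fingerprint contributes its step length rather than a count of one. A minimal sketch follows, assuming that each fingerprint carries its total step count and that two fingerprints are equal only when all of their fields are equal; the type and function names are hypothetical.

```python
from typing import FrozenSet, NamedTuple, Tuple


class BlockFingerprint(NamedTuple):
    """Reduced fingerprint for this example: structure features, hash, steps."""
    structure: Tuple[int, ...]
    partial_hash: str
    total_steps: int


def step_sum(fingerprints: FrozenSet[BlockFingerprint]) -> int:
    """F(S): cumulative sum of the step lengths of the fingerprints in S."""
    return sum(fp.total_steps for fp in fingerprints)


def similarity(a: FrozenSet[BlockFingerprint], b: FrozenSet[BlockFingerprint]) -> float:
    """C = F(A ∩ B) / F(A ∪ B) × 100, defined as 0 when the union is empty."""
    union_steps = step_sum(a | b)
    if union_steps == 0:
        return 0.0
    return 100.0 * step_sum(a & b) / union_steps
```

For instance, if the two files share one block of 30 steps and each has one further block of 10 and 60 steps respectively, the intersection sums to 30 and the union to 100, so C is 30.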
In a specific implementation, the code block fingerprint of the known safe file can be obtained from the database described above.
It should be noted that a difference of even one byte between two files gives the two files different hash values. Therefore, if only a hash value obtained from the code were used to compute the similarity between the file to be identified and the known safe file, the file to be identified would be judged not to come from the same source as the known safe file, and thus not to be a secure file. This would weaken the ability to recognize the file to be identified and lower the recognition rate of secure files.
In addition, code with identical instruction structure features may differ greatly in its actual logic. Therefore, if only the instruction structure features of the code were used to identify whether the file to be identified is a secure file, the file to be identified and the known safe file would be judged to come from the same source merely because their instruction structure features are the same, and the file to be identified would be judged to be a secure file, which is a misidentification.
To solve the above problems, in the embodiments of the present invention the comparison between the file to be identified and the known safe file is based on code block fingerprints that contain both the instruction structure features of each code block and the hash value obtained from some of the instructions in each code block. Compared with a solution that uses only a hash value obtained from the code, the embodiments of the present invention improve the ability to recognize the file to be identified and raise the recognition rate of secure files. Compared with a solution that uses only the instruction structure features of the code, the embodiments of the present invention avoid misidentifying the file to be identified.
Embodiment four
Based on the recognition method for a secure file provided in embodiments one, two and three above, this embodiment describes in detail the method in S103 of identifying, according to the similarity between the file to be identified and the known safe file, whether the file to be identified is a secure file. This step may specifically include the following.
For example, in the embodiments of the present invention, the method of identifying, according to the similarity between the file to be identified and the known safe file, whether the file to be identified is a secure file may include:
As shown in Fig. 2, the similarity between the file to be identified and the known safe file is compared with a preset similarity threshold. If the similarity is greater than the similarity threshold, the file to be identified and the known safe file are different versions of the same source file, and the file to be identified is identified as a secure file.
In a specific implementation, the number of known safe files matching the file to be identified may be at least one, and the at least one known safe file is sorted by how close the file version of each known safe file is to the file version of the file to be identified. The above method can therefore be used to obtain the similarity between the file to be identified and the first known safe file in the sorted result, and whether the file to be identified is a secure file is then judged according to this similarity. If the file to be identified is judged to be a secure file, the recognition result has been obtained, and the similarity between the file to be identified and the second known safe file is not calculated.
Otherwise, if the similarity between the file to be identified and the first known safe file is less than or equal to the similarity threshold, the file to be identified and the first known safe file are not different versions of the same source file, and the similarity between the file to be identified and the second known safe file in the sorted result is calculated next. Whether the file to be identified is a secure file is then judged according to this similarity, and so on, until the file to be identified is judged to be a secure file.
If the similarity between the file to be identified and each of the at least one known safe file is less than or equal to the similarity threshold, the file to be identified and each known safe file are not different versions of the same source file, and the file to be identified is identified as not being a secure file.
It should be noted that, if the similarity between the file to be identified and the known safe file is less than or equal to the similarity threshold, the only conclusion is that the file to be identified has not been identified as a secure file; the file to be identified may be a malicious file or may be an unknown file.
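The comparison procedure described above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the function name `identify`, the toy similarity function and the threshold value of 50 are assumptions introduced only for the example.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def identify(
    candidate: T,
    ranked_known_safe: Iterable[T],
    similarity: Callable[[T, T], float],
    threshold: float,
) -> Optional[T]:
    """Return the first known safe file whose similarity to the candidate
    is strictly greater than the threshold, or None if there is none."""
    for known in ranked_known_safe:
        if similarity(candidate, known) > threshold:
            return known  # recognition result obtained, later files are skipped
    return None  # not identified as secure: may be malicious or unknown


def toy_similarity(a: frozenset, b: frozenset) -> float:
    """Stand-in similarity: percentage of shared fingerprints."""
    return 100.0 * len(a & b) / len(a | b)


unknown = frozenset({"f1", "f2", "f3", "f4"})
ranked = [frozenset({"f9"}), frozenset({"f1", "f2", "f3", "f5"})]
match = identify(unknown, ranked, toy_similarity, threshold=50.0)
print(match is ranked[1])
```

A return value of `None` only means that no known safe file exceeded the threshold; as noted above, it does not by itself classify the file as malicious.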
Embodiment five
Based on the recognition method for a secure file provided in Embodiment one to Embodiment four above, if the file to be identified is identified as a secure file in S103, the embodiment of the present invention may further comprise: adding the file to be identified to the database.
In the embodiment of the present invention, adding the identified secure files to the database allows the known safe files in the database to be continuously improved and updated.
It should be noted that, because the database stores the details of known safe files, the known safe files stored in the database can be regarded as a file whitelist. Therefore, adding the identified secure files to the database continuously improves and updates the file whitelist. The technical scheme provided by the embodiment of the present invention can serve as a means of collecting a whitelist and can be applied in various network security systems.
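The whitelist update can be illustrated with the following minimal sketch. The in-memory class `WhitelistDatabase`, the function `record_result` and the identifier and detail fields are hypothetical stand-ins; the embodiment does not specify how the database is stored or keyed.

```python
from dataclasses import dataclass, field


@dataclass
class WhitelistDatabase:
    """In-memory stand-in for the database of known safe file details."""
    records: dict[str, dict] = field(default_factory=dict)

    def add(self, file_id: str, details: dict) -> bool:
        """Store the details of a file; return False if it is already stored."""
        if file_id in self.records:
            return False
        self.records[file_id] = dict(details)
        return True


def record_result(db: WhitelistDatabase, file_id: str, details: dict, is_safe: bool) -> bool:
    """Add the file to the whitelist only if it was identified as a secure file."""
    return is_safe and db.add(file_id, details)


db = WhitelistDatabase()
added = record_result(db, "file-001", {"name": "editor.exe", "version": "2.1.0"}, True)
skipped = record_result(db, "file-002", {"name": "unknown.exe", "version": "1.0"}, False)
print(added, skipped, sorted(db.records))
```

Files that are not identified as secure are left out of the database, so the whitelist grows only with files that passed the similarity check.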
The embodiment of the present invention further provides device embodiments that implement the steps and methods of the above method embodiments.
Please refer to Fig. 3, which is a functional block diagram of Embodiment one of the recognition device for a secure file provided by the embodiment of the present invention. As shown in the figure, the device comprises:
File search unit 31, configured to obtain, according to the file to be identified, a known safe file matching the file to be identified;
Similarity statistics unit 32, configured to obtain the similarity between the file to be identified and the known safe file;
File identification unit 33, configured to identify, according to the similarity between the file to be identified and the known safe file, whether the file to be identified is a secure file.
In a specific implementation, the file search unit 31 is specifically configured to:
Perform matching in a database according to the details of the file to be identified, so as to obtain the known safe file matching the file to be identified; wherein the database comprises the details of known safe files.
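A database match of this kind might look like the following sketch. The record type `FileDetails`, the choice of file name as the matching key and the version-distance ordering are all assumptions made for illustration; this passage does not state which fields the details contain.

```python
from typing import NamedTuple


class FileDetails(NamedTuple):
    name: str                 # assumed matching key
    version: tuple[int, ...]  # file version, e.g. (2, 1, 0)


def version_distance(a: tuple[int, ...], b: tuple[int, ...]) -> tuple[int, ...]:
    """Component-wise distance; earlier (more significant) components dominate."""
    width = max(len(a), len(b))
    pa = a + (0,) * (width - len(a))
    pb = b + (0,) * (width - len(b))
    return tuple(abs(x - y) for x, y in zip(pa, pb))


def find_known_safe(target: FileDetails, database: list[FileDetails]) -> list[FileDetails]:
    """Return matching known safe files, closest file version first."""
    matches = [rec for rec in database if rec.name == target.name]
    return sorted(matches, key=lambda rec: version_distance(rec.version, target.version))


database = [
    FileDetails("editor.exe", (1, 0, 0)),
    FileDetails("editor.exe", (2, 0, 5)),
    FileDetails("other.dll", (2, 1, 0)),
    FileDetails("editor.exe", (2, 1, 1)),
]
ranked = find_known_safe(FileDetails("editor.exe", (2, 1, 0)), database)
print([rec.version for rec in ranked])
```

Returning the matches ordered by version closeness lets a later comparison step start with the most likely sibling version and stop early.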
In a specific implementation, the similarity statistics unit 32 is specifically configured to:
Obtain the code block fingerprint of the file to be identified;
Obtain the similarity between the file to be identified and the known safe file according to the code block fingerprint of the file to be identified and the code block fingerprint of the known safe file.
In a specific implementation, when obtaining the code block fingerprint of the file to be identified, the similarity statistics unit 32 is specifically configured to:
Decompile the file to be identified, so as to obtain the decompiled code of the file to be identified;
Obtain at least one code block comprised in the decompiled code;
Obtain the fingerprint information of each code block;
Obtain the code block fingerprint of the file to be identified according to the fingerprint information of each code block.
In a specific implementation, the fingerprint information of each code block comprises:
The instruction structure feature of each code block; and,
The hash value obtained from part of the instructions in each code block.
In a specific implementation, when obtaining the similarity between the file to be identified and the known safe file, the similarity statistics unit 32 is specifically configured to:
Obtain the similarity between the file to be identified and the known safe file using the following formula:
C=(F(A∩B))/(F(A∪B))×100
In this formula, C represents the similarity between the file to be identified and the known safe file; A represents the code block fingerprint of the file to be identified; B represents the code block fingerprint of the known safe file; F(A∩B) represents the cumulative sum of the step lengths of the code block fingerprints in the intersection of the code block fingerprint of the file to be identified and the code block fingerprint of the known safe file; F(A∪B) represents the cumulative sum of the step lengths of the code block fingerprints in the union of the code block fingerprint of the file to be identified and the code block fingerprint of the known safe file.
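As a non-limiting illustration, the formula above can be sketched in Python. The sketch assumes that each file's code block fingerprint is represented as a mapping from a fingerprint value to its step length, and that a fingerprint shared by both files has the same step length in each; the function name `similarity` and the sample values are hypothetical.

```python
def similarity(a: dict[str, int], b: dict[str, int]) -> float:
    """C = F(A ∩ B) / F(A ∪ B) × 100, where F sums the step lengths."""
    common = a.keys() & b.keys()
    union = a.keys() | b.keys()

    def f(fingerprints) -> int:
        # Step length of each fingerprint, taken from whichever file has it.
        return sum(a[k] if k in a else b[k] for k in fingerprints)

    total = f(union)
    if total == 0:
        return 0.0  # no fingerprints at all: treat as not similar
    return f(common) / total * 100


# Fingerprint value -> step length for the two files being compared.
file_a = {"fp1": 10, "fp2": 20, "fp3": 30}
file_b = {"fp2": 20, "fp3": 30, "fp4": 40}
print(similarity(file_a, file_b))  # shared step lengths 50 out of 100 in the union
```

Weighting by step length means that a large shared code block contributes more to C than a small one, so the result differs from a plain count of shared fingerprints.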
In a specific implementation, the file identification unit 33 is specifically configured to:
Identify the file to be identified as a secure file if the similarity between the file to be identified and the known safe file is greater than a preset similarity threshold.
Optionally, in one possible implementation of the embodiment of the present invention, please refer to Fig. 4, which is a functional block diagram of Embodiment two of the recognition device for a secure file provided by the embodiment of the present invention. As shown in the figure, the device may further comprise:
File adding unit 34, configured to add the file to be identified to the database if the file to be identified is identified as a secure file.
Because each unit in this embodiment can perform the method shown in Fig. 1, for the parts not described in detail in this embodiment, reference may be made to the related description of Fig. 1.
The technical scheme of the embodiment of the present invention has the following beneficial effects:
In the embodiment of the present invention, a known safe file matching the file to be identified is obtained according to the file to be identified; the similarity between the file to be identified and the known safe file is then obtained; and whether the file to be identified is a secure file is identified according to the similarity between the file to be identified and the known safe file.
In the technical scheme provided by the embodiment of the present invention, the similarity between the known safe file and the file to be identified is used to identify whether the file to be identified is a secure file. This is equivalent to identifying whether the file to be identified is a secure file by determining whether the file to be identified and the known safe file derive from the same source file, so secure files can be identified quickly. Compared with the prior-art approach of judging whether an unknown file belongs to a whitelist, the embodiment of the present invention can identify more secure files, meets the current demand for identifying secure files and improves the recognition rate of secure files. It therefore solves the problem of the low recognition rate of secure files in the prior art, which reduces the number of unnecessary pop-up windows on the user side and improves the user experience.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the system, device and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed system, device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the scheme of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing unit, each unit may exist physically on its own, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and comprises several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The storage medium comprises various media capable of storing program code, such as a USB flash drive, a portable hard drive, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disc.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (16)

1. A recognition method for a secure file, characterized in that the method comprises:
Obtaining, according to a file to be identified, a known safe file matching the file to be identified;
Obtaining the similarity between the file to be identified and the known safe file;
Identifying, according to the similarity between the file to be identified and the known safe file, whether the file to be identified is a secure file.
2. The method according to claim 1, characterized in that obtaining, according to the file to be identified, the known safe file matching the file to be identified comprises:
Performing matching in a database according to the details of the file to be identified, so as to obtain the known safe file matching the file to be identified; wherein the database comprises the details of known safe files.
3. The method according to claim 1 or 2, characterized in that obtaining the similarity between the file to be identified and the known safe file comprises:
Obtaining the code block fingerprint of the file to be identified;
Obtaining the similarity between the file to be identified and the known safe file according to the code block fingerprint of the file to be identified and the code block fingerprint of the known safe file.
4. The method according to claim 3, characterized in that obtaining the code block fingerprint of the file to be identified comprises:
Decompiling the file to be identified, so as to obtain the decompiled code of the file to be identified;
Obtaining at least one code block comprised in the decompiled code;
Obtaining the fingerprint information of each code block;
Obtaining the code block fingerprint of the file to be identified according to the fingerprint information of each code block.
5. The method according to claim 4, characterized in that the fingerprint information of each code block comprises:
The instruction structure feature of each code block; and,
The hash value obtained from part of the instructions in each code block.
6. The method according to claim 4, characterized in that obtaining the similarity between the file to be identified and the known safe file comprises:
Obtaining the similarity between the file to be identified and the known safe file using the following formula:
C=(F(A∩B))/(F(A∪B))×100
In this formula, C represents the similarity between the file to be identified and the known safe file; A represents the code block fingerprint of the file to be identified; B represents the code block fingerprint of the known safe file; F(A∩B) represents the cumulative sum of the step lengths of the code block fingerprints in the intersection of the code block fingerprint of the file to be identified and the code block fingerprint of the known safe file; F(A∪B) represents the cumulative sum of the step lengths of the code block fingerprints in the union of the code block fingerprint of the file to be identified and the code block fingerprint of the known safe file.
7. The method according to claim 1, characterized in that identifying, according to the similarity between the file to be identified and the known safe file, whether the file to be identified is a secure file comprises:
Identifying the file to be identified as a secure file if the similarity between the file to be identified and the known safe file is greater than a preset similarity threshold.
8. The method according to claim 2, characterized in that the method further comprises:
Adding the file to be identified to the database if the file to be identified is identified as a secure file.
9. A recognition device for a secure file, characterized in that the device comprises:
A file search unit, configured to obtain, according to a file to be identified, a known safe file matching the file to be identified;
A similarity statistics unit, configured to obtain the similarity between the file to be identified and the known safe file;
A file identification unit, configured to identify, according to the similarity between the file to be identified and the known safe file, whether the file to be identified is a secure file.
10. The device according to claim 9, characterized in that the file search unit is specifically configured to:
Perform matching in a database according to the details of the file to be identified, so as to obtain the known safe file matching the file to be identified; wherein the database comprises the details of known safe files.
11. The device according to claim 9 or 10, characterized in that the similarity statistics unit is specifically configured to:
Obtain the code block fingerprint of the file to be identified;
Obtain the similarity between the file to be identified and the known safe file according to the code block fingerprint of the file to be identified and the code block fingerprint of the known safe file.
12. The device according to claim 11, characterized in that, when obtaining the code block fingerprint of the file to be identified, the similarity statistics unit is specifically configured to:
Decompile the file to be identified, so as to obtain the decompiled code of the file to be identified;
Obtain at least one code block comprised in the decompiled code;
Obtain the fingerprint information of each code block;
Obtain the code block fingerprint of the file to be identified according to the fingerprint information of each code block.
13. The device according to claim 12, characterized in that the fingerprint information of each code block comprises:
The instruction structure feature of each code block; and,
The hash value obtained from part of the instructions in each code block.
14. The device according to claim 12, characterized in that, when obtaining the similarity between the file to be identified and the known safe file, the similarity statistics unit is specifically configured to:
Obtain the similarity between the file to be identified and the known safe file using the following formula:
C=(F(A∩B))/(F(A∪B))×100
In this formula, C represents the similarity between the file to be identified and the known safe file; A represents the code block fingerprint of the file to be identified; B represents the code block fingerprint of the known safe file; F(A∩B) represents the cumulative sum of the step lengths of the code block fingerprints in the intersection of the code block fingerprint of the file to be identified and the code block fingerprint of the known safe file; F(A∪B) represents the cumulative sum of the step lengths of the code block fingerprints in the union of the code block fingerprint of the file to be identified and the code block fingerprint of the known safe file.
15. The device according to claim 9, characterized in that the file identification unit is specifically configured to:
Identify the file to be identified as a secure file if the similarity between the file to be identified and the known safe file is greater than a preset similarity threshold.
16. The device according to claim 10, characterized in that the device further comprises:
A file adding unit, configured to add the file to be identified to the database if the file to be identified is identified as a secure file.
CN201510553020.6A 2015-09-01 2015-09-01 Recognition method and device for secure file Active CN105138918B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510553020.6A CN105138918B (en) 2015-09-01 2015-09-01 Recognition method and device for secure file

Publications (2)

Publication Number Publication Date
CN105138918A true CN105138918A (en) 2015-12-09
CN105138918B CN105138918B (en) 2019-03-29

Family

ID=54724263

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510553020.6A Active CN105138918B (en) 2015-09-01 2015-09-01 Recognition method and device for secure file

Country Status (1)

Country Link
CN (1) CN105138918B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105590068A (en) * 2015-12-25 2016-05-18 北京奇虎科技有限公司 File fingerprint check method and device
CN108491458A (en) * 2018-03-02 2018-09-04 深圳市联软科技股份有限公司 A kind of sensitive document detection method, medium and equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102082792A (en) * 2010-12-31 2011-06-01 成都市华为赛门铁克科技有限公司 Phishing webpage detection method and device
CN103824030A (en) * 2014-02-27 2014-05-28 宇龙计算机通信科技(深圳)有限公司 Data protection device and data protection method
CN104123493A (en) * 2014-07-31 2014-10-29 百度在线网络技术(北京)有限公司 Method and device for detecting safety performance of application program

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102082792A (en) * 2010-12-31 2011-06-01 成都市华为赛门铁克科技有限公司 Phishing webpage detection method and device
CN103824030A (en) * 2014-02-27 2014-05-28 宇龙计算机通信科技(深圳)有限公司 Data protection device and data protection method
CN104123493A (en) * 2014-07-31 2014-10-29 百度在线网络技术(北京)有限公司 Method and device for detecting safety performance of application program

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105590068A (en) * 2015-12-25 2016-05-18 北京奇虎科技有限公司 File fingerprint check method and device
CN108491458A (en) * 2018-03-02 2018-09-04 深圳市联软科技股份有限公司 A kind of sensitive document detection method, medium and equipment

Also Published As

Publication number Publication date
CN105138918B (en) 2019-03-29

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant