WO2018121464A1 - Method and device for detecting virus, and storage medium - Google Patents

Method and device for detecting virus, and storage medium

Info

Publication number
WO2018121464A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
feature
detected
control flow
flow graph
Prior art date
Application number
PCT/CN2017/118195
Other languages
French (fr)
Chinese (zh)
Inventor
罗元海
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 filed Critical 腾讯科技(深圳)有限公司
Publication of WO2018121464A1 publication Critical patent/WO2018121464A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562Static detection
    • G06F21/563Static detection by source code analysis

Definitions

  • the present invention relates to the field of computers, and in particular, to a virus detecting method, a virus detecting device, and a storage medium.
  • the signature (feature code) used in the prior art is based on the binary content of the virus sample; it involves no semantic analysis, lacks any understanding of the program logic, and is very sensitive to changes in the sample. Even a slight change to the virus sample may cause the signature to fail, so the signature is far from broad-spectrum: on the one hand, a virus author can change the virus hash and binary data, and thereby bypass anti-virus software, by slightly modifying the source code; on the other hand, most compilers have optimization mechanisms such as instruction reordering and register reassignment, so that object files compiled from the same source code may have inconsistent binary content.
  • the technical problem to be solved by the embodiments of the present invention is to provide a virus detection method, a virus detection device, and a storage medium that solve the technical problem that prior-art signatures based on the binary content of the virus sample are not broad-spectrum and are easy for a virus to evade.
  • the first aspect of the embodiments of the present invention discloses a virus detection method, including:
  • the sample to be detected is determined to be a malicious sample.
  • the method before the matching the feature of the sample to be detected with the feature in the virus signature database, the method further includes:
  • the identifiers of the control flow graphs corresponding to the known malicious samples are collected, the features of the known malicious samples are generated, and the features of the known malicious samples are stored in the virus signature database.
  • control flow graph corresponding to the known malicious sample is encoded, and the identifier corresponding to the control flow graph is generated, including:
  • the control flow graph corresponding to each function is encoded into a triplet, and the triplet is used as the identifier corresponding to the control flow graph, wherein the triplet includes the number of basic blocks of the control flow graph, the number of edges of the control flow graph, and the number of calls in the control flow graph.
  • the method further includes:
  • if the feature of the sample to be detected fails to match all the features in the virus signature database, it is determined that the sample to be detected is a normal sample.
  • the matching the feature of the sample to be detected with the feature in the virus signature database includes:
  • the calculating, according to the feature of the to-be-detected sample and the common coding set, the similarity between the feature of the to-be-detected sample and the feature in the virus signature database including:
  • the number of target identifiers is the sum of the number of identifiers in the set corresponding to the feature of the sample to be detected and the number of identifiers in the set corresponding to the feature in the virus signature database.
  • a second aspect of the embodiments of the present invention discloses a virus detecting apparatus, including:
  • a first disassembly module configured to disassemble the sample to be detected, and obtain a control flow graph corresponding to the disassembled function
  • the first encoding module is configured to encode the obtained control flow graph to generate an identifier corresponding to each control flow graph; the identifiers corresponding to the different control flow graphs are different;
  • the ensemble module is configured to: collect an identifier corresponding to the control flow graph, and generate a feature of the sample to be detected;
  • a feature matching module configured to match features of the sample to be detected with features in a virus signature database
  • the first determining module is configured to determine that the to-be-detected sample is a malicious sample if the feature of the to-be-detected sample matches the feature in the virus signature database.
  • a second disassembly module configured to disassemble the known malicious samples before the feature matching module matches the features of the sample to be detected with the features in the virus signature database, and obtain a control flow graph corresponding to the disassembled function;
  • the second encoding module is configured to encode the control flow graph corresponding to the known malicious sample, and generate an identifier corresponding to the control flow graph;
  • a storage module configured to collect an identifier of the control flow graph corresponding to the known malicious sample, generate a feature of the known malicious sample, and store the feature of the known malicious sample in the virus signature database.
  • the first encoding module is further configured to encode the control flow graph corresponding to each function into a triplet, and use the triplet as the identifier corresponding to the control flow graph; wherein the triplet includes the number of basic blocks of the control flow graph, the number of edges of the control flow graph, and the number of calls in the control flow graph.
  • the second determining module is configured to: after the feature matching module matches the feature of the sample to be detected with the features in the virus signature database, determine that the sample to be detected is a normal sample if the feature of the sample to be detected fails to match all the features in the virus signature database.
  • the feature matching module includes:
  • a common coding calculation unit configured to calculate a common coding set of the feature of the sample to be detected and a feature in the virus signature database
  • a similarity calculation unit configured to calculate, according to the feature of the to-be-detected sample and the common coding set, a similarity between a feature of the to-be-detected sample and a feature in a virus signature database;
  • a determining unit configured to determine whether the similarity is greater than a preset threshold
  • the similarity calculation unit includes:
  • a first calculating unit configured to divide the number of identifiers in the common code set by the number of identifiers in the set corresponding to the feature of the sample to be detected, to obtain a similarity
  • a second calculating unit configured to divide the number of identifiers in the common code set by the number of target identifiers to obtain a similarity; the number of target identifiers is the sum of the number of identifiers in the set corresponding to the feature of the sample to be detected and the number of identifiers in the set corresponding to the feature in the virus signature database.
  • a third aspect of the embodiments of the present invention discloses a virus detecting apparatus, including: a processor and a memory for storing a computer program capable of running on the processor, wherein the processor is configured to perform the method described above when running the computer program.
  • a fourth aspect of the embodiments of the present invention discloses a computer readable storage medium having stored thereon a computer program, wherein the computer program is executed by a processor to implement the above method.
  • the control flow graph corresponding to each function is obtained by disassembling the sample to be detected; the control flow graph is encoded to generate the identifier corresponding to the control flow graph; the identifiers corresponding to the control flow graphs are then collected to generate the feature of the sample to be detected; finally, the feature of the sample to be detected is matched with the features in the virus signature database to perform virus detection. This solves the technical problem that prior-art signatures based on the binary content of the virus sample are not broad-spectrum and are easy for a virus to evade.
  • FIG. 1 is a schematic diagram of a scenario structure of a virus detection according to an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of a virus detecting method according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a function control flow diagram provided by an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a principle for encoding a control flow graph according to an embodiment of the present invention.
  • FIG. 5 is a schematic flowchart diagram of another embodiment of a virus detecting method provided by the present invention.
  • FIG. 6 is a schematic diagram of a feature calculation principle provided by an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a virus detecting apparatus according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of another embodiment of a virus detecting apparatus provided by the present invention.
  • FIG. 9 is a schematic structural diagram of a feature matching module according to an embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of another embodiment of a virus detecting apparatus provided by the present invention.
  • FIG. 1 is a schematic diagram of a scenario structure of a virus detection according to an embodiment of the present invention.
  • the virus signature database needs to be generated first: features are extracted from the known malicious samples (i.e., virus samples) and stored in the virus signature database. Then, after a sample to be detected is acquired, its feature is extracted by the same feature extraction, and a feature scan checks whether the feature of the sample to be detected matches a feature in the virus signature database. If it matches, the sample to be detected is a malicious sample; if it does not match, the sample to be detected is a normal sample.
  • the sample in the embodiment of the present invention may include an executable file.
  • the executable file format is different according to different operating systems.
  • the executable file under Windows is exe format
  • the executable file under Linux is elf format.
  • the executable files under Android are dex format, elf format, and so on.
  • the virus detecting method of the embodiment of the invention can be applied to an electronic device with an operating system, such as a personal computer, a smart mobile terminal (such as a mobile phone, a mobile computer, or a tablet computer), a personal digital assistant (PDA), a smart TV, a smart watch, smart glasses, or a smart wristband.
  • FIG. 2 is a schematic flowchart of a virus detection method according to an embodiment of the present invention. It details how virus detection is performed in the embodiment of the present invention, which may include the following steps:
  • Step S200 disassembling the sample to be detected, and obtaining a control flow graph corresponding to the function after disassembling;
  • FIG. 3 is a schematic diagram of a function control flow graph provided by an embodiment of the present invention.
  • a sequence of all assembly statements for a function can be divided into several basic blocks.
  • a basic block is a contiguous sequence of assembly statements: control flow enters at its beginning and exits at its end, with no interruptions or branches (except at the end).
  • the nodes of the control flow graph are composed of basic blocks, which represent calculations; the edges between the nodes represent control flow directions.
  • Each function control flow graph has only one entry (a node with an in-degree of 0), but can have multiple exits (nodes with an out-degree of 0). FIG. 3 shows an example of a function control flow graph; this function control flow graph contains 5 basic blocks and 6 edges.
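The entry and exit nodes described above can be located from node degrees alone. The following is a minimal sketch, assuming the control flow graph is available as a block count plus a list of `(source, target)` edges; the function name `entries_and_exits` and the example graph are hypothetical and are not taken from FIG. 3.

```python
def entries_and_exits(num_blocks, edges):
    """Return (entries, exits) for a CFG given as (source, target) edge pairs.

    An entry is a block with in-degree 0; an exit is a block with out-degree 0.
    """
    in_degree = [0] * num_blocks
    out_degree = [0] * num_blocks
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1
    entries = [b for b in range(num_blocks) if in_degree[b] == 0]
    exits = [b for b in range(num_blocks) if out_degree[b] == 0]
    return entries, exits


# Hypothetical graph: block 0 branches to blocks 1 and 2,
# which fall through to the two exit blocks 3 and 4.
example_edges = [(0, 1), (0, 2), (1, 3), (2, 4)]
entries, exits = entries_and_exits(5, example_edges)
```

In this hypothetical graph, block 0 is the single entry and blocks 3 and 4 are both exits, which illustrates the "one entry, possibly multiple exits" property.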
  • Step S202 Encoding the obtained control flow graph to generate an identifier corresponding to the control flow graph; the identifiers corresponding to different control flow graphs are different. For example, in a specific example, each control flow graph is encoded separately to generate the identifier corresponding to each control flow graph;
  • the identifiers corresponding to different control flow graphs in the embodiment of the present invention are different. That is, the embodiment of the present invention does not limit the manner in which the control flow graph is encoded; it is only necessary to ensure that the same control flow graph is always encoded consistently and that different control flow graphs are encoded differently.
  • one embodiment of the present invention may encode the control flow graph corresponding to each function into a triplet, and use the triplet as the identifier corresponding to the control flow graph; wherein the triplet includes the number of basic blocks of the control flow graph (Node), the number of edges of the control flow graph (Edge), and the number of calls in the control flow graph (Call). Then, in the function control flow graph example shown in FIG. 3, the basic block number Node of the control flow graph is 5, the edge number Edge of the control flow graph is 7, and the number of calls in the control flow graph Call is 4 (that is, the number of BL statements in FIG. 3), so the corresponding identifier of the encoded control flow graph is (5, 7, 4).
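The triplet encoding can be sketched as follows. This is only an illustration under stated assumptions: the `BasicBlock` class, the `encode_cfg` function, and the example instructions are hypothetical, each basic block is assumed to carry its instruction text plus the indices of its successor blocks, and a call is assumed to be any instruction whose mnemonic is `BL`. The example graph is not a reproduction of FIG. 3; it is merely built to have the same counts.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BasicBlock:
    """Hypothetical basic block: instruction text plus successor block indices."""
    instructions: List[str]
    successors: List[int] = field(default_factory=list)


def encode_cfg(blocks: List[BasicBlock],
               call_mnemonics=("BL",)) -> Tuple[int, int, int]:
    """Encode one function's control flow graph as (Node, Edge, Call)."""
    node = len(blocks)
    edge = sum(len(block.successors) for block in blocks)
    call = 0
    for block in blocks:
        for instruction in block.instructions:
            parts = instruction.split()
            # Count only instructions whose mnemonic is a call (assumed: "BL").
            if parts and parts[0].upper() in call_mnemonics:
                call += 1
    return (node, edge, call)


# Hypothetical function with 5 basic blocks, 7 edges, and 4 BL statements.
example_cfg = [
    BasicBlock(["PUSH {LR}", "BL init", "CMP R0, #0", "BNE block2"], [1, 2]),
    BasicBlock(["BL read", "B block3"], [3]),
    BasicBlock(["BL log", "CMP R1, #0", "BEQ block4"], [3, 4]),
    BasicBlock(["BL write", "CMP R0, #0", "BNE block1"], [1, 4]),
    BasicBlock(["POP {PC}"], []),
]
identifier = encode_cfg(example_cfg)  # (5, 7, 4)
```

Because only counts are kept, renaming registers or reordering independent instructions inside a block leaves the identifier unchanged.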
  • Step S204 Collecting the identifiers corresponding to the control flow graphs, and generating the feature of the sample to be detected. For example, in a specific example, the identifiers corresponding to all the control flow graphs are collected, and the feature of the sample to be detected is generated.
  • each function is encoded, and an identifier corresponding to its control flow graph is obtained. The identifiers corresponding to all the control flow graphs are then collected to generate the feature of the sample to be detected; that is to say, each sample to be detected yields a set whose elements are codes, and this set is the feature of the sample to be detected.
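As a small illustration of this step, the per-function identifiers can be gathered into a set. The names `Identifier` and `build_feature` are hypothetical, and treating the feature as a mathematical set (so that functions sharing an identifier contribute a single element) is an assumption drawn from a literal reading of the word "set" in the text.

```python
from typing import FrozenSet, Iterable, Tuple

Identifier = Tuple[int, int, int]  # (Node, Edge, Call)


def build_feature(identifiers: Iterable[Identifier]) -> FrozenSet[Identifier]:
    """Collect per-function identifiers into one feature (a set of codes)."""
    return frozenset(identifiers)


# Hypothetical sample with four functions, two of which share an identifier.
per_function_identifiers = [(5, 7, 4), (1, 0, 0), (3, 3, 1), (1, 0, 0)]
feature = build_feature(per_function_identifiers)
```

A `frozenset` is used here so that a feature is immutable and can itself be stored in other collections.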
  • Step S206 matching the feature of the sample to be detected with the feature in the virus signature database
  • the features corresponding to at least one known virus sample are pre-stored in the virus signature database. The feature of the sample to be detected (a set whose elements are codes) is then matched against each feature in the virus signature database (each also a set whose elements are codes); that is, the features in the virus signature database are traversed one by one. If a match succeeds during the traversal, step S208 may be performed; if all the features in the virus signature database have been traversed without a successful match, step S210 may be performed. It is thereby determined whether the sample to be detected is a malicious sample.
  • Step S208 If the feature of the sample to be detected matches the feature in the virus signature database, the sample to be detected is determined to be a malicious sample.
  • Step S210 If the feature of the to-be-detected sample fails to match all the features in the virus signature database, determine that the to-be-detected sample is a normal sample.
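Steps S206 to S210 amount to a traversal of the signature database. The sketch below assumes the database is an iterable of features and leaves the matching rule pluggable; the names `scan` and `exact_match` are hypothetical, and exact set equality is used only as a placeholder rule, since this part of the text does not yet say when two features match.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

Identifier = Tuple[int, int, int]
Feature = FrozenSet[Identifier]


def exact_match(sample: Feature, known: Feature) -> bool:
    """Placeholder rule (an assumption): features match only if identical."""
    return sample == known


def scan(sample: Feature,
         signature_db: Iterable[Feature],
         matches: Callable[[Feature, Feature], bool] = exact_match) -> str:
    """Traverse the signature database (step S206) and classify the sample."""
    for known in signature_db:
        if matches(sample, known):
            return "malicious"  # step S208: one feature matched
    return "normal"  # step S210: every feature failed to match


# Hypothetical database holding the features of two known malicious samples.
signature_db = [
    frozenset({(5, 7, 4), (3, 3, 1)}),
    frozenset({(12, 3, 4), (22, 11, 2)}),
]
verdict_known = scan(frozenset({(12, 3, 4), (22, 11, 2)}), signature_db)
verdict_unknown = scan(frozenset({(9, 9, 9)}), signature_db)
```

The traversal stops at the first successful match, so the cost of a scan is at most one comparison per stored feature.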
  • a detailed description of how the virus detection is performed in the embodiment of the present invention may include the following steps:
  • Step S500 disassembling the known malicious samples, and obtaining a control flow graph corresponding to the disassembled function
  • Step S502 Encoding the control flow graph corresponding to the known malicious sample, and generating an identifier corresponding to the control flow graph. For example, in a specific example, each control flow graph is encoded separately to generate the identifier corresponding to each control flow graph;
  • Step S504 Collecting the identifiers of the control flow graphs corresponding to the known malicious sample, generating the feature of the known malicious sample, and storing the feature of the known malicious sample in the virus signature database. For example, in a specific example, the identifiers corresponding to all control flow graphs are collected, the feature of the known malicious sample is generated, and the feature of the known malicious sample is stored in the virus signature database;
  • for step S500 to step S504, reference may be made to step S200 to step S204 in the foregoing embodiment of FIG. 2, and details are not described herein again. Only the objects differ: steps S500 to S504 operate on known malicious samples, whereas steps S200 to S204 operate on the sample to be detected.
  • Step S506 disassembling the sample to be detected, and obtaining a control flow graph corresponding to the function after disassembling;
  • Step S508 encoding each control flow graph separately, and generating an identifier corresponding to each control flow graph;
  • Step S510 Collecting identifiers corresponding to all control flow graphs, and generating features of the to-be-detected samples;
  • step S506 to step S510 can be referred to step S200 to step S204 in the foregoing embodiment of FIG. 2, and details are not described herein again.
  • Step S512 Calculating a common coding set of the feature of the sample to be detected and the feature in the virus signature database;
  • the embodiment of the present invention performs matching by calculating the common coding set of two features. As shown in FIG. 6, the schematic diagram of the feature calculation principle provided by the embodiment of the present invention, the common coding set of feature A and feature B is calculated. In the embodiment of the present invention, feature A may refer to the feature of the sample to be detected, and feature B may refer to a feature in the virus signature database.
  • Step S514 Calculate the similarity between the feature of the sample to be detected and the feature in the virus feature database according to the feature of the sample to be detected and the common code set;
  • the number of identifiers in the common code set may be divided by the number of identifiers in the set corresponding to the feature of the sample to be detected to obtain the similarity; or the number of identifiers in the common code set may be divided by the number of target identifiers to obtain the similarity, where the number of target identifiers is the sum of the number of identifiers in the set corresponding to the feature of the sample to be detected and the number of identifiers in the set corresponding to the feature in the virus signature database. For example:
  • similarity(A, B) = (number of codes in the common coding set of A and B) / (number of codes in A).
  • Feature A = {(12, 3, 4), (5, 7, 4), (22, 11, 2), (56, 90, 5), (32, 54, 1), (123, 34, 15)};
  • Feature B = {(54, 32, 1), (56, 90, 5), (22, 11, 2), (5, 7, 4), (12, 3, 4)};
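The two similarity calculations can be worked through on the example features above. This is a sketch only: the function names are hypothetical, the common coding set is taken to be the set intersection, and returning 0.0 for an empty denominator is an assumption the text does not address.

```python
from typing import Set, Tuple

Identifier = Tuple[int, int, int]

feature_a: Set[Identifier] = {
    (12, 3, 4), (5, 7, 4), (22, 11, 2), (56, 90, 5), (32, 54, 1), (123, 34, 15),
}
feature_b: Set[Identifier] = {
    (54, 32, 1), (56, 90, 5), (22, 11, 2), (5, 7, 4), (12, 3, 4),
}


def common_coding_set(a: Set[Identifier], b: Set[Identifier]) -> Set[Identifier]:
    """Identifiers present in both features (step S512)."""
    return a & b


def similarity_by_sample(a: Set[Identifier], b: Set[Identifier]) -> float:
    """Common identifiers divided by the number of identifiers in A."""
    if not a:
        return 0.0  # assumption: an empty feature is similar to nothing
    return len(common_coding_set(a, b)) / len(a)


def similarity_by_sum(a: Set[Identifier], b: Set[Identifier]) -> float:
    """Common identifiers divided by the sum of the sizes of A and B."""
    total = len(a) + len(b)
    if total == 0:
        return 0.0  # assumption, as above
    return len(common_coding_set(a, b)) / total


common = common_coding_set(feature_a, feature_b)
sim_sample = similarity_by_sample(feature_a, feature_b)
sim_sum = similarity_by_sum(feature_a, feature_b)
```

With the sets as written, the common coding set has 4 identifiers, because (32, 54, 1) in A and (54, 32, 1) in B are different triplets. The first calculation therefore gives 4/6 ≈ 0.67 and the second gives 4/(6 + 5) ≈ 0.36. Note that the sum-based similarity can never exceed 0.5, which is reached only when the two features are identical.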
  • Step S516 determining whether the similarity is greater than a preset threshold
  • the embodiment of the present invention presets a threshold (that is, the preset threshold). Different thresholds may be set for the different calculation algorithms in step S514, and the preset threshold may be selected according to experience or experiment. For example, when the number of identifiers in the common code set is divided by the number of identifiers in the set corresponding to the feature of the sample to be detected, the threshold may be 0.8 or the like; when the number of identifiers in the common code set is divided by the number of target identifiers, the threshold may be 0.4 or the like, which is not limited in the present invention.
  • When the similarity is greater than the preset threshold, the feature corresponding to the sample to be detected is successfully matched with the feature in the virus signature database, and step S518 is performed; otherwise, step S520 is performed.
  • Step S518 determining that the sample to be detected is a malicious sample
  • Step S520 determining that the sample to be detected is a normal sample.
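Steps S512 to S520 can be combined into one decision routine. The sketch below is illustrative only: the names `similarity`, `detect`, and `THRESHOLDS` are hypothetical, the thresholds 0.8 and 0.4 are the example values mentioned in the text rather than recommended settings, and the sample data is made up.

```python
from typing import FrozenSet, Iterable, Tuple

Feature = FrozenSet[Tuple[int, int, int]]

# Example thresholds from the text; real values would be tuned by experiment.
THRESHOLDS = {"sample": 0.8, "sum": 0.4}


def similarity(sample: Feature, known: Feature, mode: str = "sample") -> float:
    """Similarity of two features under one of the two calculations of S514."""
    common = len(sample & known)  # size of the common coding set (S512)
    if mode == "sample":
        denominator = len(sample)
    else:
        denominator = len(sample) + len(known)
    return common / denominator if denominator else 0.0


def detect(sample: Feature, signature_db: Iterable[Feature],
           mode: str = "sample") -> str:
    """Return "malicious" if any stored feature exceeds the threshold (S516)."""
    threshold = THRESHOLDS[mode]
    for known in signature_db:
        if similarity(sample, known, mode) > threshold:
            return "malicious"  # step S518
    return "normal"  # step S520


# Hypothetical data: one known feature with ten identifiers.
known = frozenset((i, i + 1, 0) for i in range(10))
# A variant that keeps nine of those identifiers and adds one new function.
variant = frozenset([(i, i + 1, 0) for i in range(9)] + [(99, 99, 9)])
# A sample that shares only half of its identifiers with the known feature.
unrelated = frozenset([(i, i + 1, 0) for i in range(5)]
                      + [(100 + i, 0, 0) for i in range(5)])

verdict_variant = detect(variant, [known])
verdict_unrelated = detect(unrelated, [known])
```

Here the variant scores 9/10 = 0.9 against the known feature and is flagged, while the half-overlapping sample scores 0.5 and is not. The comparison is strict, following the "greater than a preset threshold" wording of step S516.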
  • the control flow graph corresponding to each function is obtained by disassembling the sample to be detected; the control flow graph is encoded to generate the identifier corresponding to the control flow graph; the identifiers corresponding to the control flow graphs are then collected to generate the feature of the sample to be detected; finally, the feature of the sample to be detected is matched with the features in the virus signature database to perform virus detection. This solves the technical problem that prior-art signatures based on the binary content of the virus sample are not broad-spectrum and are easy for a virus to evade;
  • the embodiment of the present invention performs virus detection based on the structural features of function control flow graphs: each function of the sample is treated as a directed graph and encoded as a feature, which takes the logic flow of the program into account. This resists well both the interference introduced by compiler optimization strategies and the interference introduced by a virus author modifying the source code, which makes the signature much more broad-spectrum and makes it much harder for a virus to evade detection.
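The robustness argument can be illustrated with a toy comparison. Everything in this sketch is hypothetical: the two "builds" are invented instruction listings for one function, and hashing the instruction text with SHA-256 merely stands in for a signature computed over real binary content.

```python
import hashlib


def binary_signature(instructions):
    """Stand-in for a content-based signature: SHA-256 over instruction text."""
    text = "\n".join(instructions)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def structural_identifier(blocks, edges, call_mnemonic="BL"):
    """(Node, Edge, Call) triplet for blocks given as lists of instructions."""
    calls = sum(
        1
        for block in blocks
        for instruction in block
        if instruction.split()[:1] == [call_mnemonic]
    )
    return (len(blocks), len(edges), calls)


# Two hypothetical builds of one function: registers are reassigned and the
# first two instructions are reordered, but the control flow is unchanged.
build_1 = [
    ["MOV R0, #1", "MOV R1, #2", "BL helper", "CMP R0, #0", "BNE exit"],
    ["BL cleanup"],
    ["BX LR"],
]
build_2 = [
    ["MOV R5, #2", "MOV R4, #1", "BL helper", "CMP R4, #0", "BNE exit"],
    ["BL cleanup"],
    ["BX LR"],
]
edges = [(0, 1), (0, 2), (1, 2)]

hash_1 = binary_signature([ins for block in build_1 for ins in block])
hash_2 = binary_signature([ins for block in build_2 for ins in block])
id_1 = structural_identifier(build_1, edges)
id_2 = structural_identifier(build_2, edges)
```

The two content hashes differ, while both builds encode to the same triplet (3, 3, 2), which is the behavior the paragraph above attributes to structure-based features.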
  • the present invention further provides a virus detecting device, which will be described in detail below with reference to the accompanying drawings:
  • FIG. 7 is a schematic structural diagram of a virus detecting apparatus according to an embodiment of the present invention.
  • the virus detecting apparatus 70 may include: a first disassembly module 700, a first encoding module 702, a set module 704, a feature matching module 706, and a first determining module 708, wherein
  • the first disassembly module 700 is configured to disassemble the sample to be detected, and obtain a control flow graph corresponding to the disassembled function;
  • the first encoding module 702 is configured to encode the obtained control flow graph to generate an identifier corresponding to the control flow graph; the identifiers corresponding to the different control flow graphs are different;
  • the set module 704 is configured to collect the identifiers corresponding to the control flow graphs, and generate the feature of the sample to be detected.
  • Feature matching module 706 is configured to match features of the sample to be detected with features in a virus signature database
  • the first determining module 708 is configured to determine that the to-be-detected sample is a malicious sample if the feature of the to-be-detected sample matches the feature in the virus signature database.
  • in addition to the first disassembly module 700, the first encoding module 702, the set module 704, the feature matching module 706, and the first determining module 708, the virus detecting apparatus 70 may further include: a second disassembly module 7010, a second encoding module 7012, a storage module 7014, and a second determining module 7016, where
  • the second disassembly module 7010 is configured to disassemble the known malicious samples before the feature matching module 706 matches the features of the sample to be detected with the features in the virus signature database, and obtain the control flow graph corresponding to the disassembled function;
  • the second encoding module 7012 is configured to encode the control flow graph corresponding to the known malicious sample, and generate an identifier corresponding to the control flow graph;
  • the storage module 7014 is configured to collect an identifier of the control flow graph corresponding to the known malicious sample, generate a feature of the known malicious sample, and store the feature of the known malicious sample in the virus signature database.
  • the second determining module 7016 is configured to: after the feature matching module 706 matches the feature of the to-be-detected sample with the feature in the virus signature database, if the feature of the sample to be detected and all features in the virus signature database are If the matching fails, it is determined that the sample to be detected is a normal sample.
  • the first disassembly module 700 and the first encoding module 702 in the virus detecting device 70 may also perform the operations of the second disassembly module 7010 and the second encoding module 7012. That is, the virus detecting device 70 need not include the second disassembly module 7010 and the second encoding module 7012: before the feature matching module 706 matches the features of the to-be-detected sample with the features in the virus signature database, the first disassembly module 700 may directly disassemble the known malicious samples and obtain the control flow graph corresponding to the disassembled function, and the first encoding module 702 may encode each control flow graph to generate the identifier corresponding to each control flow graph.
  • the feature matching module 706 may include a common coding calculation unit 7060, a similarity calculation unit 7062, and a determination unit 7063, where
  • the common coding calculation unit 7060 is configured to calculate a common coding set of features of the sample to be detected and features in the virus signature database;
  • the similarity calculation unit 7062 is configured to calculate, according to the feature of the to-be-detected sample and the common coding set, a similarity between a feature of the to-be-detected sample and a feature in a virus signature database;
  • the determining unit 7063 is configured to determine whether the similarity is greater than a preset threshold
  • the similarity calculation unit 7062 may include a first calculation unit or a second calculation unit, where
  • the first calculating unit is configured to divide the number of the identifiers in the common code set by the number of the identifiers in the set corresponding to the feature of the sample to be detected to obtain a similarity; or
  • the second calculating unit is configured to divide the number of identifiers in the common code set by the number of target identifiers to obtain a similarity; the number of target identifiers is the sum of the number of identifiers in the set corresponding to the feature of the sample to be detected and the number of identifiers in the set corresponding to the feature in the virus signature database.
  • the first encoding module 702 or the second encoding module 7012 in the embodiment of the present invention may be further configured to encode the control flow graph corresponding to each function into a triplet and use the triplet as the identifier corresponding to the control flow graph; the triplet includes the number of basic blocks of the control flow graph, the number of edges of the control flow graph, and the number of calls in the control flow graph.
  • FIG. 10 is a schematic structural diagram of another embodiment of a virus detecting apparatus provided by the present invention.
  • the virus detecting apparatus 100 may include: at least one processor 1001, such as a CPU, at least one network interface 1004, a user interface 1003, a memory 1005, at least one communication bus 1002, a display screen 1006, and a camera module 1007.
  • the communication bus 1002 is used to implement connection communication between these components.
  • the user interface 1003 may include a touch screen or the like.
  • the network interface 1004 can optionally include a standard wired interface, a wireless interface (such as a WI-FI interface).
  • the memory 1005 may be a high-speed RAM memory or a non-volatile memory such as at least one disk memory; in the embodiment of the present invention, the memory 1005 includes flash memory.
  • the memory 1005 can also optionally be at least one storage system located remotely from the aforementioned processor 1001. As shown in FIG. 10, an operating system, a network communication module, a user interface module, and a virus detection program may be included in the memory 1005 as a computer storage medium.
  • the processor 1001 can be used to call the virus detecting program stored in the memory 1005 and perform the following operations:
  • the virus signature database may be stored in the memory 1005;
  • the sample to be detected is determined to be a malicious sample.
  • the processor 1001 may further perform:
  • the identifier corresponding to the control flow graph corresponding to the known malicious sample is collected, the feature of the known malicious sample is generated, and the feature of the known malicious sample is stored in the virus signature database.
  • the processor 1001 matches the feature of the sample to be detected with the feature in the virus signature database, including:
  • the processor 1001 encodes the control flow graph corresponding to the known malicious sample, and generates an identifier corresponding to the control flow graph, including:
  • the control flow graph corresponding to each function is encoded into a triplet, and the triplet is used as the identifier corresponding to the control flow graph, wherein the triplet includes the number of basic blocks of the control flow graph, the number of edges of the control flow graph, and the number of calls in the control flow graph.
  • the processor 1001 may further perform:
  • if the feature of the sample to be detected fails to match all the features in the virus signature database, it is determined that the sample to be detected is a normal sample.
  • the processor 1001 calculates, according to the feature of the to-be-detected sample and the common coding set, the similarity between the feature of the to-be-detected sample and the feature in the virus signature database, which may include:
  • the number of target identifiers is the sum of the number of identifiers in the set corresponding to the feature of the sample to be detected and the number of identifiers in the set corresponding to the feature in the virus signature database.
  • the functions of the modules in the virus detecting device 70 or the virus detecting device 100 in the embodiments of the present invention may correspond to the specific implementations of any of the foregoing method embodiments; for example,
  • the first disassembly module 700, the first encoding module 702, the aggregation module 704, the feature matching module 706, the first determining module 708, the second disassembling module 7010, the second encoding module 7012, the storage module 7014, and the second determining module 7016 can be implemented by the processor 1001 described above.
  • Embodiments of the present invention disassemble a sample to be detected to obtain the control flow graphs corresponding to its functions, encode each control flow graph separately to generate an identifier corresponding to each control flow graph, then collect the identifiers corresponding to all the control flow graphs to generate a feature of the sample to be detected, and finally match the feature of the sample to be detected against the features in the virus signature database to perform virus detection.
  • this solves the technical problem in the prior art that signatures based on the binary content of virus samples have poor broad-spectrum coverage and make it easy for viruses to evade detection.
  • the embodiments of the present invention perform virus detection based on structural features of function control flow graphs: each function of a sample is treated as a directed graph and encoded as a feature, so the logic flow of the program is taken into account. This resists well both the interference introduced by compiler optimization strategies and the interference introduced by a virus author's modifications to the source code, greatly improving the broad-spectrum coverage of signatures and making it much harder for viruses to evade detection.
  • the embodiment further provides a computer readable storage medium having stored thereon a computer program, wherein the computer program is executed by the processor to implement the steps of the method described above.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
  • the embodiments of the invention perform virus detection based on structural features of function control flow graphs, with each function of a sample regarded as a directed graph and encoded as a feature. Because the logic flow of the program is taken into account, the approach resists well the interference introduced by compiler optimization strategies and by a virus author's modifications to the source code, greatly improving the broad-spectrum coverage of signatures and making it much harder for viruses to evade detection.


Abstract

A method for detecting a virus, comprising: disassembling a sample to be tested to obtain a control flow graph corresponding to a disassembled function (S200); encoding each control flow graph separately to generate an identifier corresponding to each control flow graph (S202), different control flow graphs corresponding to different identifiers; collecting the identifiers corresponding to all control flow graphs to generate a feature of the sample to be tested (S204); matching the feature of the sample to be tested with features in a virus signature database (S206); if the feature of the sample to be tested matches a feature in the virus signature database successfully, determining that the sample to be tested is a malicious sample (S208). Also involved is a device for detecting a virus, and a storage medium.

Description

Virus detection method and device, and storage medium
Cross-reference to related applications
This application is based on, and claims priority to, Chinese patent application No. 201611257005.8, filed on December 30, 2016, the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the field of computers, and in particular to a virus detection method, a virus detection device, and a storage medium.
Background
With the development of electronic technology and mobile Internet technology, electronic devices, especially smart mobile devices, are becoming increasingly powerful. By installing applications on an electronic device according to their needs, users can handle all kinds of tasks over the mobile Internet, such as mobile payment and mobile office work. However, as Internet access from electronic devices has become commonplace, the probability of such devices being infected by viruses and Trojans has also risen, which can easily cause users heavy losses.
Most current signature-based antivirus engines rely on file hash matching and binary byte matching of file content. Examples are the signatures of the well-known open-source antivirus software Clam AntiVirus (ClamAV):
1. Format: HashString:FileSize:MalwareName
Example:
507d8f868c27feb88b18e6f8426adf1c:12391:Win.Exploit.CVE_2013_3163
2. Format: MalwareName=HexSignature
Example:
Trojan.URLspoof.gen(Clam)=2e687265663d756e6573636170652827*3a2f2f*
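To make the first format concrete, the following minimal Python sketch parses a `HashString:FileSize:MalwareName` line and checks a byte string against it. The names `HashSignature`, `parse_hash_signature`, and `hash_matches` are hypothetical, and treating the 32-hex-digit hash as an MD5 digest is an assumption based on its length; this is an illustration, not ClamAV's own code.

```python
import hashlib
from typing import NamedTuple


class HashSignature(NamedTuple):
    """One parsed line of the form HashString:FileSize:MalwareName."""
    hash_hex: str
    file_size: int
    malware_name: str


def parse_hash_signature(line: str) -> HashSignature:
    # Split on the first two colons only, so the name field stays intact.
    hash_hex, file_size, malware_name = line.strip().split(":", 2)
    return HashSignature(hash_hex.lower(), int(file_size), malware_name)


def hash_matches(sig: HashSignature, data: bytes) -> bool:
    # Assumption: the hash is MD5 (32 hex digits). Size is checked first
    # because it is cheaper than hashing.
    return (len(data) == sig.file_size
            and hashlib.md5(data).hexdigest() == sig.hash_hex)


sig = parse_hash_signature(
    "507d8f868c27feb88b18e6f8426adf1c:12391:Win.Exploit.CVE_2013_3163"
)
print(sig.malware_name, sig.file_size)
```

Because the comparison is an exact digest match, only a file with the same size and the same bytes matches such a signature.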
Signatures based on the binary content of virus samples, as used in the prior art, involve no semantic analysis and no understanding of program logic, so they are very sensitive to changes in a sample: even a slight change to a virus sample can invalidate the signature. As a result, such signatures have very poor broad-spectrum coverage. On the one hand, a virus author can change the hash value and binary data of a virus with minor modifications to the source code and thereby bypass antivirus software. On the other hand, most compilers apply optimizations such as instruction reordering and register reallocation, so that even object files compiled from the same source code may differ in binary content.
Therefore, how to improve the broad-spectrum coverage of signatures and make it harder for viruses to evade detection is a problem of current concern to developers.
Summary of the invention
The technical problem to be solved by the embodiments of the present invention is to provide a virus detection method, a virus detection device, and a storage medium that solve the technical problem in the prior art that signatures based on the binary content of virus samples have poor broad-spectrum coverage and make it easy for viruses to evade detection.
To solve the above technical problem, a first aspect of the embodiments of the present invention discloses a virus detection method, including:
disassembling a sample to be detected to obtain control flow graphs corresponding to the disassembled functions;
encoding the obtained control flow graphs to generate an identifier corresponding to each control flow graph, where different control flow graphs correspond to different identifiers;
collecting the identifiers corresponding to the control flow graphs to generate a feature of the sample to be detected;
matching the feature of the sample to be detected against the features in a virus signature database;
if the feature of the sample to be detected successfully matches a feature in the virus signature database, determining that the sample to be detected is a malicious sample.
In the above solution, before the matching of the feature of the sample to be detected against the features in the virus signature database, the method further includes:
disassembling a known malicious sample to obtain control flow graphs corresponding to the disassembled functions;
encoding the control flow graphs corresponding to the known malicious sample to generate identifiers corresponding to the control flow graphs;
collecting the identifiers of the control flow graphs corresponding to the known malicious sample to generate a feature of the known malicious sample, and storing the feature of the known malicious sample in the virus signature database.
In the above solution, the encoding of the control flow graphs corresponding to the known malicious sample to generate the identifiers corresponding to the control flow graphs includes:
encoding the control flow graph corresponding to each function as a triplet and using the triplet as the identifier corresponding to the control flow graph, where the triplet includes the number of basic blocks of the control flow graph, the number of edges of the control flow graph, and the number of calls in the control flow graph.
In the above solution, after the matching of the feature of the sample to be detected against the features in the virus signature database, the method further includes:
if the feature of the sample to be detected fails to match every feature in the virus signature database, determining that the sample to be detected is a normal sample.
In the above solution, the matching of the feature of the sample to be detected against the features in the virus signature database includes:
computing a common coding set of the feature of the sample to be detected and one of the features in the virus signature database;
computing, from the feature of the sample to be detected and the common coding set, the similarity between the feature of the sample to be detected and that feature in the virus signature database;
determining whether the similarity is greater than a preset threshold;
when the similarity is greater than the preset threshold, determining that the feature of the sample to be detected successfully matches that feature in the virus signature database.
In the above solution, the computing, from the feature of the sample to be detected and the common coding set, of the similarity between the feature of the sample to be detected and that feature in the virus signature database includes:
dividing the number of identifiers in the common coding set by the number of identifiers in the set corresponding to the feature of the sample to be detected to obtain the similarity; or
dividing the number of identifiers in the common coding set by a target identifier count to obtain the similarity, where the target identifier count includes the sum of the number of identifiers in the set corresponding to the feature of the sample to be detected and the number of identifiers in the set corresponding to that feature in the virus signature database.
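The two alternative similarity calculations can be sketched in Python as follows. This is a minimal sketch under stated assumptions: features are modelled as sets of integer triplets, the common coding set is taken to be the set intersection, the target identifier count is taken to be exactly the sum of the two set sizes, and the function names and the handling of empty features are hypothetical.

```python
from typing import FrozenSet, Tuple

Identifier = Tuple[int, int, int]   # assumed form of one control flow graph identifier
Feature = FrozenSet[Identifier]     # a feature is a set of identifiers


def similarity_by_sample(sample: Feature, known: Feature) -> float:
    """First variant: |common| / |sample feature|."""
    if not sample:
        return 0.0  # assumption: an empty feature matches nothing
    return len(sample & known) / len(sample)


def similarity_by_sum(sample: Feature, known: Feature) -> float:
    """Second variant: |common| / (|sample feature| + |database feature|)."""
    total = len(sample) + len(known)
    if total == 0:
        return 0.0
    return len(sample & known) / total


sample = frozenset({(5, 7, 4), (3, 2, 1), (8, 10, 2), (2, 1, 0)})
known = frozenset({(5, 7, 4), (3, 2, 1), (9, 9, 9)})
print(similarity_by_sample(sample, known))  # 2 common identifiers out of 4
print(similarity_by_sum(sample, known))     # 2 common identifiers out of 4 + 3
```

Under the second variant, two identical features score 0.5, because the denominator counts every shared identifier twice, so the preset threshold has to be chosen with that range in mind.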
A second aspect of the embodiments of the present invention discloses a virus detection device, including:
a first disassembly module, configured to disassemble a sample to be detected to obtain control flow graphs corresponding to the disassembled functions;
a first encoding module, configured to encode the obtained control flow graphs to generate an identifier corresponding to each control flow graph, where different control flow graphs correspond to different identifiers;
an aggregation module, configured to collect the identifiers corresponding to the control flow graphs to generate a feature of the sample to be detected;
a feature matching module, configured to match the feature of the sample to be detected against the features in a virus signature database;
a first determining module, configured to determine that the sample to be detected is a malicious sample if the feature of the sample to be detected successfully matches a feature in the virus signature database.
In the above solution, the device further includes:
a second disassembly module, configured to disassemble a known malicious sample to obtain control flow graphs corresponding to the disassembled functions before the feature matching module matches the feature of the sample to be detected against the features in the virus signature database;
a second encoding module, configured to encode the control flow graphs corresponding to the known malicious sample to generate identifiers corresponding to the control flow graphs;
a storage module, configured to collect the identifiers of the control flow graphs corresponding to the known malicious sample to generate a feature of the known malicious sample, and to store the feature of the known malicious sample in the virus signature database.
In the above solution, the first encoding module is further configured to encode the control flow graph corresponding to each function as a triplet and to use the triplet as the identifier corresponding to the control flow graph, where the triplet includes the number of basic blocks of the control flow graph, the number of edges of the control flow graph, and the number of calls in the control flow graph.
In the above solution, a second determining module is configured to determine, after the feature matching module matches the feature of the sample to be detected against the features in the virus signature database, that the sample to be detected is a normal sample if the feature of the sample to be detected fails to match every feature in the virus signature database.
In the above solution, the feature matching module includes:
a common coding calculation unit, configured to compute a common coding set of the feature of the sample to be detected and a feature in the virus signature database;
a similarity calculation unit, configured to compute, from the feature of the sample to be detected and the common coding set, the similarity between the feature of the sample to be detected and the feature in the virus signature database;
a judging unit, configured to determine whether the similarity is greater than a preset threshold;
when the similarity is greater than the preset threshold, it is determined that the feature of the sample to be detected successfully matches the feature in the virus signature database.
In the above solution, the similarity calculation unit includes:
a first calculation unit, configured to divide the number of identifiers in the common coding set by the number of identifiers in the set corresponding to the feature of the sample to be detected to obtain the similarity; or
a second calculation unit, configured to divide the number of identifiers in the common coding set by a target identifier count to obtain the similarity, where the target identifier count includes the sum of the number of identifiers in the set corresponding to the feature of the sample to be detected and the number of identifiers in the set corresponding to the feature in the virus signature database.
A third aspect of the embodiments of the present invention discloses a virus detection device, including a processor and a memory for storing a computer program capable of running on the processor, where the processor is configured to perform the method described above when running the computer program.
A fourth aspect of the embodiments of the present invention discloses a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method described above.
By implementing the embodiments of the present invention, a sample to be detected is disassembled to obtain the control flow graphs corresponding to its functions, the control flow graphs are encoded to generate identifiers corresponding to the control flow graphs, the identifiers are collected to generate a feature of the sample to be detected, and the feature of the sample to be detected is finally matched against the features in the virus signature database to perform virus detection. This solves the technical problem in the prior art that signatures based on the binary content of virus samples have poor broad-spectrum coverage and make it easy for viruses to evade detection.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a scenario architecture for virus detection disclosed in an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a virus detection method provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a function control flow graph provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of the principle of encoding a control flow graph provided by an embodiment of the present invention;
FIG. 5 is a schematic flowchart of another embodiment of the virus detection method provided by the present invention;
FIG. 6 is a schematic diagram of the feature calculation principle provided by an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a virus detection device provided by an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of another embodiment of the virus detection device provided by the present invention;
FIG. 9 is a schematic structural diagram of a feature matching module provided by an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of another embodiment of the virus detection device provided by the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described below with reference to the drawings in the embodiments of the present invention.
For a better understanding of the virus detection method and device disclosed in the embodiments of the present invention, the scenario architecture of the embodiments is described first. Refer to FIG. 1, which is a schematic diagram of a scenario architecture for virus detection disclosed in an embodiment of the present invention. As shown in FIG. 1, a virus signature database is first generated: features are extracted from known malicious samples (that is, virus samples) and stored in the virus signature database. When a sample to be detected is obtained, its feature is extracted in the same way and a feature scan is performed to check whether the feature of the sample matches a feature in the virus signature database. If it matches, the sample to be detected is a malicious sample; if not, it is a normal sample.
The samples in the embodiments of the present invention may include executable files. The executable file format differs by operating system: for example, executables are in exe format on Windows, in elf format on Linux, and in dex format, elf format, and so on, on Android.
The virus detection method of the embodiments of the present invention is applicable to electronic devices with an operating system, such as personal computers, smart mobile terminals (for example mobile phones, mobile computers, and tablet computers), personal digital assistants (PDAs), smart TVs, smart watches, smart glasses, and smart wristbands.
Based on the scenario architecture shown in FIG. 1 and with reference to FIG. 2, which is a schematic flowchart of the virus detection method provided by an embodiment of the present invention, the following describes in detail how virus detection is performed in the embodiments of the present invention. The method may include the following steps:
Step S200: Disassemble the sample to be detected to obtain the control flow graphs corresponding to the disassembled functions.
Specifically, after the sample to be detected is obtained, it is disassembled, converting object code into assembly code (in other words, converting machine language into assembly language code), so as to obtain the control flow graphs corresponding to the disassembled functions. For example, in one embodiment of the present invention, the sample to be detected may be disassembled with IDA to obtain the control flow graphs corresponding to the disassembled functions. FIG. 3 is a schematic diagram of a function control flow graph provided by an embodiment of the present invention.
It should be noted that a disassembled function can be described by a directed graph; in compiler theory this graph is the control flow graph referred to in the embodiments of the present invention. The full sequence of assembly statements of a function can be divided into a number of basic blocks. A basic block is a contiguous sequence of assembly statements that control flow enters at its beginning and leaves at its end, with no interruption or branch possible except at the end. The nodes of a control flow graph are basic blocks and represent computation; the edges between nodes represent the direction of control flow. Each function control flow graph has exactly one entry (a node with in-degree 0) but may have multiple exits (nodes with out-degree 0). In the example function control flow graph shown in FIG. 3, the graph contains 5 basic blocks and 6 edges.
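The entry and exit definitions above can be illustrated with a short Python sketch that finds the nodes with in-degree 0 and out-degree 0 in a graph given as block names and directed edges. The graph below is a made-up example, not the graph of FIG. 3, and the function name `entry_and_exits` is hypothetical.

```python
from typing import List, Sequence, Tuple


def entry_and_exits(
    blocks: Sequence[str], edges: Sequence[Tuple[str, str]]
) -> Tuple[List[str], List[str]]:
    """Return (entries, exits): blocks with in-degree 0 and with out-degree 0."""
    in_degree = {block: 0 for block in blocks}
    out_degree = {block: 0 for block in blocks}
    for src, dst in edges:
        out_degree[src] += 1
        in_degree[dst] += 1
    entries = [b for b in blocks if in_degree[b] == 0]
    exits = [b for b in blocks if out_degree[b] == 0]
    return entries, exits


# Hypothetical control flow graph: B0 branches to B1 or B2, both reach B3,
# and B1 can also jump straight to the final block B4.
blocks = ["B0", "B1", "B2", "B3", "B4"]
edges = [("B0", "B1"), ("B0", "B2"), ("B1", "B3"),
         ("B2", "B3"), ("B3", "B4"), ("B1", "B4")]
print(entry_and_exits(blocks, edges))
```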
Step S202: Encode the obtained control flow graphs to generate an identifier corresponding to each control flow graph, where different control flow graphs correspond to different identifiers. For instance, in one specific example, each control flow graph is encoded separately to generate the identifier corresponding to that control flow graph.
Specifically, in the embodiments of the present invention different control flow graphs correspond to different identifiers. That is, the embodiments do not restrict how a control flow graph is encoded; it is only necessary to ensure that identical control flow graphs receive the same encoding and different control flow graphs receive different encodings. For example, in one embodiment, the control flow graph corresponding to each function may be encoded as a triplet that serves as the identifier of the control flow graph, where the triplet includes the number of basic blocks (Node), the number of edges (Edge), and the number of calls (Call) in the control flow graph. Taking the example function control flow graph of FIG. 3, and as shown in FIG. 4, a schematic diagram of the principle of encoding a control flow graph provided by an embodiment of the present invention, encoding yields a basic block count Node of 5, an edge count Edge of 7, and a call count Call of 4 (that is, the number of BL statements in FIG. 3), so the identifier of the encoded control flow graph is (5, 7, 4).
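A minimal Python sketch of the triplet encoding follows. The data classes `BasicBlock` and `ControlFlowGraph`, the function `encode_cfg`, and the instruction strings are hypothetical; counting only the `BL` mnemonic as a call follows the FIG. 3 example and is an assumption for other instruction sets. The sample graph is invented for illustration and is not the graph of FIG. 3.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass
class BasicBlock:
    name: str
    instructions: List[str]  # assembly statements as text, e.g. "BL work"


@dataclass
class ControlFlowGraph:
    blocks: List[BasicBlock]
    edges: List[Tuple[str, str]]  # (source block name, destination block name)


def encode_cfg(
    cfg: ControlFlowGraph,
    call_mnemonics: FrozenSet[str] = frozenset({"BL"}),
) -> Tuple[int, int, int]:
    """Encode a control flow graph as the triplet (Node, Edge, Call)."""
    calls = 0
    for block in cfg.blocks:
        for instruction in block.instructions:
            parts = instruction.split()
            # The first token is the mnemonic; compare case-insensitively.
            if parts and parts[0].upper() in call_mnemonics:
                calls += 1
    return (len(cfg.blocks), len(cfg.edges), calls)


cfg = ControlFlowGraph(
    blocks=[
        BasicBlock("entry", ["PUSH {R4,LR}", "BL init", "CMP R0, #0", "BEQ done"]),
        BasicBlock("body", ["BL work", "B done"]),
        BasicBlock("done", ["POP {R4,PC}"]),
    ],
    edges=[("entry", "body"), ("entry", "done"), ("body", "done")],
)
print(encode_cfg(cfg))  # (3, 3, 2): 3 blocks, 3 edges, 2 BL statements
```

The triplet depends only on counts, so renaming registers or reordering instructions inside a block leaves it unchanged.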
Step S204: Collect the identifiers corresponding to the control flow graphs to generate the feature of the sample to be detected. For instance, in one specific example, the identifiers corresponding to all the control flow graphs are collected to generate the feature of the sample to be detected.
Specifically, the sample to be detected contains multiple disassembled functions, and encoding each function yields the identifier of one control flow graph. Collecting the identifiers corresponding to all the control flow graphs therefore generates the feature of the sample to be detected. In other words, each sample to be detected yields a set whose elements are encodings, and this set is the feature of the sample.
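As a minimal sketch, the feature can be modelled in Python as a `frozenset` of triplets. Treating the feature as a mathematical set (so repeated identifiers collapse into one element) is an assumption, and the name `sample_feature` is hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple

Identifier = Tuple[int, int, int]  # (Node, Edge, Call) of one function's graph


def sample_feature(identifiers: Iterable[Identifier]) -> FrozenSet[Identifier]:
    """Collect the per-function identifiers of one sample into its feature."""
    return frozenset(identifiers)


# Two functions share the same identifier, so the feature has two elements.
feature = sample_feature([(5, 7, 4), (3, 2, 1), (5, 7, 4)])
print(len(feature))

# The order in which functions appear in the file does not affect the feature.
print(feature == sample_feature([(3, 2, 1), (5, 7, 4)]))
```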
Step S206: Match the feature of the sample to be detected against the features in the virus signature database.
Specifically, the virus signature database stores in advance the features corresponding to at least one known virus sample. The feature of the sample to be detected (a set whose elements are encodings) is matched against each feature in the database (also a set whose elements are encodings); that is, every feature in the virus signature database is traversed. If a successful match is found during the traversal, step S208 may be performed; if no successful match is found after all features in the virus signature database have been traversed, step S210 may be performed. This completes the determination of whether the sample to be detected is a malicious sample.
Step S208: If the feature of the sample to be detected successfully matches a feature in the virus signature database, determine that the sample to be detected is a malicious sample.
Step S210: If the feature of the sample to be detected fails to match every feature in the virus signature database, determine that the sample to be detected is a normal sample.
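Steps S206 to S210 amount to a traversal of the database that stops at the first successful match. The Python sketch below shows that control flow under stated assumptions: the database is a dictionary from a sample name to its feature, the matcher is pluggable, and the default matcher `overlap_match` with its threshold of 0.8 is a hypothetical placeholder rather than a value taken from the source.

```python
from typing import Callable, Dict, FrozenSet, Optional, Tuple

Identifier = Tuple[int, int, int]
Feature = FrozenSet[Identifier]


def overlap_match(sample: Feature, known: Feature, threshold: float = 0.8) -> bool:
    """Hypothetical matcher: share of the sample's identifiers found in `known`."""
    if not sample:
        return False
    return len(sample & known) / len(sample) > threshold


def detect(
    sample: Feature,
    signature_db: Dict[str, Feature],
    is_match: Callable[[Feature, Feature], bool] = overlap_match,
) -> Optional[str]:
    """Return the name of the first matching database entry, or None."""
    for name, known in signature_db.items():  # S206: traverse every feature
        if is_match(sample, known):
            return name                        # S208: malicious sample
    return None                                # S210: normal sample


signature_db = {
    "Demo.A": frozenset({(5, 7, 4), (3, 2, 1)}),
    "Demo.B": frozenset({(9, 9, 9)}),
}
print(detect(frozenset({(5, 7, 4), (3, 2, 1)}), signature_db))
print(detect(frozenset({(1, 0, 0)}), signature_db))
```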
Further, FIG. 5 is a schematic flowchart of another embodiment of the virus detection method provided by the present invention. It describes in more detail how virus detection is performed in the embodiments of the present invention, and may include the following steps:
Step S500: Disassemble a known malicious sample to obtain the control flow graphs corresponding to the disassembled functions;
Step S502: Encode the control flow graphs corresponding to the known malicious sample to generate identifiers corresponding to the control flow graphs; for example, in a specific example, each control flow graph is encoded separately to generate an identifier corresponding to each control flow graph;
Step S504: Collect the identifiers of the control flow graphs corresponding to the known malicious sample to generate the feature of the known malicious sample, and store that feature in the virus signature database; for example, in a specific example, the identifiers corresponding to all the control flow graphs are collected to generate the feature of the known malicious sample, which is then stored in the virus signature database;
Specifically, for the principle of steps S500 to S504, refer to steps S200 to S204 in the embodiment of FIG. 2 above; details are not repeated here. The only difference is the object processed: steps S500 to S504 operate on known malicious samples, whereas steps S200 to S204 operate on the sample to be detected.
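As an illustration of step S504 only, the following is a minimal sketch of how a signature database could be populated once the identifiers of steps S500 and S502 are available. The names `build_feature`, `build_signature_db`, and `sample_x`, and the choice of a dictionary keyed by sample name, are assumptions made for this example and are not prescribed by the embodiment.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# One identifier per function: a code derived from its control flow graph.
CfgCode = Tuple[int, int, int]
Feature = FrozenSet[CfgCode]


def build_feature(cfg_codes: Iterable[CfgCode]) -> Feature:
    """Collect the identifiers of all control flow graphs into one set."""
    return frozenset(cfg_codes)


def build_signature_db(known_samples: Dict[str, Iterable[CfgCode]]) -> Dict[str, Feature]:
    """Store one feature per known malicious sample (hypothetical layout)."""
    return {name: build_feature(codes) for name, codes in known_samples.items()}


# Hypothetical input: identifiers already produced by steps S500 and S502.
known = {
    "sample_x": [(12, 3, 4), (5, 7, 4), (5, 7, 4), (22, 11, 2)],
}
db = build_signature_db(known)
print(sorted(db["sample_x"]))
```

Because the feature is a set, two functions that produce the same identifier contribute a single element.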
Step S506: Disassemble the sample to be detected to obtain the control flow graphs corresponding to the disassembled functions;
Step S508: Encode each control flow graph separately to generate an identifier corresponding to each control flow graph;
Step S510: Collect the identifiers corresponding to all the control flow graphs to generate the feature of the sample to be detected;
Specifically, for the principle of steps S506 to S510, refer to steps S200 to S204 in the embodiment of FIG. 2 above; details are not repeated here.
Step S512: Compute the common code set of the feature of the sample to be detected and a feature in the virus signature database;
Specifically, the embodiment of the present invention performs matching by computing the common code set of two features. FIG. 6 is a schematic diagram of the feature calculation principle provided by an embodiment of the present invention, in which the common code set between feature A and feature B is computed. In the embodiment of the present invention, feature A may denote the feature of the sample to be detected, and feature B a feature in the virus signature database.
Step S514: Compute the similarity between the feature of the sample to be detected and the feature in the virus signature database according to the feature of the sample to be detected and the common code set;
Specifically, the similarity may be obtained by dividing the number of identifiers in the common code set by the number of identifiers in the set corresponding to the feature of the sample to be detected; or by dividing the number of identifiers in the common code set by a target identifier count, where the target identifier count includes the sum of the number of identifiers in the set corresponding to the feature of the sample to be detected and the number of identifiers in the set corresponding to the feature in the virus signature database. For example:
Compute the similarity between feature A and feature B: similarity(A,B) = (number of codes common to A and B) / (number of codes in A).
For example:
Feature A = {(12,3,4), (5,7,4), (22,11,2), (56,90,5), (32,54,1), (123,34,15)};
Feature B = {(54,32,1), (56,90,5), (22,11,2), (5,7,4), (12,3,4)};
Common code set of A and B = {(12,3,4), (5,7,4), (22,11,2), (56,90,5)}
similarity(A,B) = count({(12,3,4), (5,7,4), (22,11,2), (56,90,5)}) / count(A) = 4/6 ≈ 0.67;
or
similarity(A,B) = count({(12,3,4), (5,7,4), (22,11,2), (56,90,5)}) / (count(A) + count(B)) = 4/11 ≈ 0.36;
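The worked example above can be reproduced with ordinary set operations. This is a sketch under the assumption that a feature is held as a Python set of tuples; the function names and the handling of empty sets (returning 0.0) are illustrative choices, not part of the described method.

```python
# Features A and B from the worked example; each element is one identifier.
A = {(12, 3, 4), (5, 7, 4), (22, 11, 2), (56, 90, 5), (32, 54, 1), (123, 34, 15)}
B = {(54, 32, 1), (56, 90, 5), (22, 11, 2), (5, 7, 4), (12, 3, 4)}


def common_codes(a, b):
    """Step S512: the common code set is the set intersection."""
    return a & b


def similarity_to_sample(a, b):
    """First option: |A ∩ B| / |A|. Returns 0.0 for an empty A (assumption)."""
    return len(a & b) / len(a) if a else 0.0


def similarity_to_sum(a, b):
    """Second option: |A ∩ B| / (|A| + |B|). Returns 0.0 if both are empty (assumption)."""
    total = len(a) + len(b)
    return len(a & b) / total if total else 0.0


common = common_codes(A, B)
print(len(common))                           # 4
print(round(similarity_to_sample(A, B), 2))  # 0.67
print(round(similarity_to_sum(A, B), 2))     # 0.36
```

Note that (32,54,1) and (54,32,1) are different identifiers, which is why the common code set has four elements rather than five.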
Step S516: Determine whether the similarity is greater than a preset threshold;
Specifically, the embodiment of the present invention sets a threshold in advance (the preset threshold). Different thresholds may be set for the different calculation methods of step S514, and the preset threshold may be chosen from experience or by experiment. For example, when the similarity is obtained by dividing the number of identifiers in the common code set by the number of identifiers in the set corresponding to the feature of the sample to be detected, the threshold may be 0.8 or the like; when the similarity is obtained by dividing the number of identifiers in the common code set by the target identifier count, the threshold may be 0.4 or the like. The present invention imposes no limitation on this.
When the similarity is greater than the preset threshold, the feature of the sample to be detected is considered to have matched the feature in the virus signature database, and step S518 is performed; otherwise, step S520 is performed.
Step S518: Determine that the sample to be detected is a malicious sample;
Step S520: Determine that the sample to be detected is a normal sample.
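Steps S512 to S520 could be combined roughly as follows. This is a hedged sketch: the function names, the `THRESHOLDS` table, and the use of a plain list as the signature database are assumptions for illustration, and the values 0.8 and 0.4 are only the example thresholds mentioned above.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

CfgCode = Tuple[int, int, int]
Feature = FrozenSet[CfgCode]


def ratio_to_sample(sample: Feature, known: Feature) -> float:
    """|common| / |sample|; 0.0 for an empty sample (assumption)."""
    return len(sample & known) / len(sample) if sample else 0.0


def ratio_to_sum(sample: Feature, known: Feature) -> float:
    """|common| / (|sample| + |known|); 0.0 if both are empty (assumption)."""
    total = len(sample) + len(known)
    return len(sample & known) / total if total else 0.0


# Example thresholds only; in practice chosen by experience or experiment.
THRESHOLDS: Dict[Callable[[Feature, Feature], float], float] = {
    ratio_to_sample: 0.8,
    ratio_to_sum: 0.4,
}


def is_malicious(sample: Feature, signature_db: Iterable[Feature],
                 similarity: Callable[[Feature, Feature], float] = ratio_to_sample) -> bool:
    """Traverse the database; any similarity above the threshold is a match."""
    threshold = THRESHOLDS[similarity]
    for known in signature_db:
        if similarity(sample, known) > threshold:
            return True   # step S518: malicious sample
    return False          # step S520: normal sample


B = frozenset({(54, 32, 1), (56, 90, 5), (22, 11, 2), (5, 7, 4), (12, 3, 4)})
A = frozenset({(12, 3, 4), (5, 7, 4), (22, 11, 2), (56, 90, 5), (32, 54, 1), (123, 34, 15)})
print(is_malicious(B, [B]))                 # identical feature: True
print(is_malicious(A, [B]))                 # 4/6 is not above 0.8: False
print(is_malicious(A, [B], ratio_to_sum))   # 4/11 is not above 0.4: False
```

One property worth noting: the second ratio can never exceed 0.5, a value reached only when the two sets are identical, which is consistent with its smaller example threshold.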
By implementing the embodiment of the present invention, the sample to be detected is disassembled to obtain the control flow graphs corresponding to its functions; the control flow graphs are encoded to generate corresponding identifiers; the identifiers are then collected to generate the feature of the sample to be detected; finally, that feature is matched against the features in the virus signature database to perform virus detection. This solves the technical problem in the prior art that signatures based on the binary content of virus samples have poor generality (broad-spectrum coverage) and make it easy for a virus to evade detection. The embodiment of the present invention performs virus detection using structural features based on function control flow graphs: each function of the sample is treated as a directed graph and encoded as a feature, so the logical flow of the program is taken into account. This resists well the interference introduced by compiler optimization strategies and by the virus author's modifications to the source code, greatly improving the generality of the signature and making evasion considerably harder.
To better implement the above solutions of the embodiments of the present invention, the present invention also provides a corresponding virus detection device, described in detail below with reference to the accompanying drawings:
FIG. 7 is a schematic structural diagram of a virus detection device provided by an embodiment of the present invention. The virus detection device 70 may include a first disassembly module 700, a first encoding module 702, a collection module 704, a feature matching module 706, and a first determining module 708, where:
The first disassembly module 700 is configured to disassemble the sample to be detected to obtain the control flow graphs corresponding to the disassembled functions;
The first encoding module 702 is configured to encode the obtained control flow graphs to generate identifiers corresponding to the control flow graphs, where different control flow graphs correspond to different identifiers;
The collection module 704 is configured to collect the identifiers corresponding to the control flow graphs to generate the feature of the sample to be detected;
The feature matching module 706 is configured to match the feature of the sample to be detected against the features in the virus signature database;
The first determining module 708 is configured to determine that the sample to be detected is a malicious sample if the feature of the sample to be detected matches a feature in the virus signature database.
Specifically, FIG. 8 is a schematic structural diagram of another embodiment of the virus detection device provided by the present invention. In addition to the first disassembly module 700, the first encoding module 702, the collection module 704, the feature matching module 706, and the first determining module 708, the virus detection device 70 may further include a second disassembly module 7010, a second encoding module 7012, a storage module 7014, and a second determining module 7016, where:
The second disassembly module 7010 is configured to disassemble a known malicious sample, before the feature matching module 706 matches the feature of the sample to be detected against the features in the virus signature database, to obtain the control flow graphs corresponding to the disassembled functions;
The second encoding module 7012 is configured to encode the control flow graphs corresponding to the known malicious sample to generate identifiers corresponding to the control flow graphs;
The storage module 7014 is configured to collect the identifiers of the control flow graphs corresponding to the known malicious sample to generate the feature of the known malicious sample, and to store that feature in the virus signature database.
The second determining module 7016 is configured to determine, after the feature matching module 706 has matched the feature of the sample to be detected against the features in the virus signature database, that the sample to be detected is a normal sample if its feature fails to match every feature in the virus signature database.
It should be noted that the first disassembly module 700 and the first encoding module 702 in the virus detection device 70 may also perform the operations of the second disassembly module 7010 and the second encoding module 7012, respectively. That is, the virus detection device 70 may omit the second disassembly module 7010 and the second encoding module 7012: before the feature matching module 706 matches the feature of the sample to be detected against the features in the virus signature database, the first disassembly module 700 directly disassembles the known malicious sample to obtain the control flow graphs corresponding to the disassembled functions, and the first encoding module 702 encodes each control flow graph separately to generate an identifier corresponding to each control flow graph.
Further, FIG. 9 is a schematic structural diagram of the feature matching module provided by an embodiment of the present invention. The feature matching module 706 may include a common code calculation unit 7060, a similarity calculation unit 7062, and a judging unit 7063, where:
The common code calculation unit 7060 is configured to compute the common code set of the feature of the sample to be detected and a feature in the virus signature database;
The similarity calculation unit 7062 is configured to compute the similarity between the feature of the sample to be detected and the feature in the virus signature database according to the feature of the sample to be detected and the common code set;
The judging unit 7063 is configured to determine whether the similarity is greater than a preset threshold;
When the similarity is greater than the preset threshold, it is determined that the feature of the sample to be detected matches the feature in the virus signature database.
The similarity calculation unit 7062 may include a first calculation unit or a second calculation unit, where:
The first calculation unit is configured to divide the number of identifiers in the common code set by the number of identifiers in the set corresponding to the feature of the sample to be detected, to obtain the similarity; or,
The second calculation unit is configured to divide the number of identifiers in the common code set by a target identifier count, to obtain the similarity; the target identifier count includes the sum of the number of identifiers in the set corresponding to the feature of the sample to be detected and the number of identifiers in the set corresponding to the feature in the virus signature database.
Still further, the first encoding module 702 or the second encoding module 7012 in the embodiment of the present invention may be further configured to encode the control flow graph corresponding to each function as a triple and use the triple as the identifier corresponding to the control flow graph, where the triple includes the number of basic blocks of the control flow graph, the number of edges of the control flow graph, and the number of calls in the control flow graph.
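The triple encoding can be sketched as follows. The in-memory representation of a control flow graph used here (a dictionary of `BasicBlock` objects with successor labels and a per-block call count) is purely an assumption for illustration; a real disassembler would supply its own graph structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class BasicBlock:
    # Labels of the blocks reachable from this block by one edge.
    successors: List[str] = field(default_factory=list)
    # Number of call instructions inside this block.
    call_count: int = 0


def encode_cfg(cfg: Dict[str, BasicBlock]) -> Tuple[int, int, int]:
    """Encode one function's control flow graph as (basic blocks, edges, calls)."""
    num_blocks = len(cfg)
    num_edges = sum(len(block.successors) for block in cfg.values())
    num_calls = sum(block.call_count for block in cfg.values())
    return (num_blocks, num_edges, num_calls)


# Hypothetical function shaped like an if/else: entry branches to two blocks
# that both fall through to a common exit block.
cfg = {
    "entry": BasicBlock(["then", "else"], call_count=1),
    "then": BasicBlock(["exit"], call_count=2),
    "else": BasicBlock(["exit"]),
    "exit": BasicBlock(),
}
print(encode_cfg(cfg))  # (4, 4, 3)
```

The triple depends only on the shape of the graph and the number of calls, not on the concrete instruction bytes, so a recompiled function whose structure is unchanged yields the same identifier.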
This embodiment further provides a virus detection device, including a processor and a memory for storing a computer program executable on the processor, where the processor is configured to perform the steps of the method described above when running the computer program. Specifically, refer to FIG. 10, which is a schematic structural diagram of another embodiment of the virus detection device provided by the present invention. As shown in FIG. 10, the virus detection device 100 may include at least one processor 1001 (for example, a CPU), at least one network interface 1004, a user interface 1003, a memory 1005, at least one communication bus 1002, a display screen 1006, and a camera module 1007. The communication bus 1002 implements connection and communication among these components. The user interface 1003 may include a touch screen or the like. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory, for example at least one disk memory; in the embodiment of the present invention, the memory 1005 includes flash memory. Optionally, the memory 1005 may also be at least one storage device located remotely from the processor 1001. As shown in FIG. 10, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a virus detection program.
In the virus detection device 100 shown in FIG. 10, the processor 1001 may be used to call the virus detection program stored in the memory 1005 and perform the following operations:
Disassemble the sample to be detected to obtain the control flow graphs corresponding to the disassembled functions;
Encode the obtained control flow graphs to generate identifiers corresponding to the control flow graphs, where different control flow graphs correspond to different identifiers;
Collect the identifiers corresponding to the control flow graphs to generate the feature of the sample to be detected;
Match the feature of the sample to be detected against the features in the virus signature database; specifically, the virus signature database may be stored in the memory 1005;
If the feature of the sample to be detected matches a feature in the virus signature database, determine that the sample to be detected is a malicious sample.
Specifically, before matching the feature of the sample to be detected against the features in the virus signature database, the processor 1001 may further perform:
Disassemble a known malicious sample to obtain the control flow graphs corresponding to the disassembled functions;
Encode the control flow graphs corresponding to the known malicious sample to generate identifiers corresponding to the control flow graphs;
Collect the identifiers of the control flow graphs corresponding to the known malicious sample to generate the feature of the known malicious sample, and store that feature in the virus signature database.
Specifically, the matching, by the processor 1001, of the feature of the sample to be detected against the features in the virus signature database includes:
Computing the common code set of the feature of the sample to be detected and a feature in the virus signature database;
Computing the similarity between the feature of the sample to be detected and the feature in the virus signature database according to the feature of the sample to be detected and the common code set;
Determining whether the similarity is greater than a preset threshold;
When the similarity is greater than the preset threshold, determining that the feature of the sample to be detected matches the feature in the virus signature database.
Specifically, the encoding, by the processor 1001, of the control flow graphs corresponding to the known malicious sample to generate the identifiers corresponding to the control flow graphs includes:
Encoding the control flow graph corresponding to each function as a triple and using the triple as the identifier corresponding to the control flow graph, where the triple includes the number of basic blocks of the control flow graph, the number of edges of the control flow graph, and the number of calls in the control flow graph.
Specifically, after matching the feature of the sample to be detected against the features in the virus signature database, the processor 1001 may further perform:
If the feature of the sample to be detected fails to match every feature in the virus signature database, determine that the sample to be detected is a normal sample.
Specifically, the computing, by the processor 1001, of the similarity between the feature of the sample to be detected and the feature in the virus signature database according to the feature of the sample to be detected and the common code set may include:
Dividing the number of identifiers in the common code set by the number of identifiers in the set corresponding to the feature of the sample to be detected, to obtain the similarity; or,
Dividing the number of identifiers in the common code set by a target identifier count, to obtain the similarity; the target identifier count includes the sum of the number of identifiers in the set corresponding to the feature of the sample to be detected and the number of identifiers in the set corresponding to the feature in the virus signature database.
It should be noted that, for the functions of the modules in the virus detection device 70 or the virus detection device 100 in the embodiments of the present invention, reference may be made to the specific implementations of any of the embodiments of FIG. 1 to FIG. 6 in the above method embodiments. For example, the first disassembly module 700, the first encoding module 702, the collection module 704, the feature matching module 706, the first determining module 708, the second disassembly module 7010, the second encoding module 7012, the storage module 7014, and the second determining module 7016 may all be implemented by the processor 1001 described above.
By implementing the embodiment of the present invention, the sample to be detected is disassembled to obtain the control flow graphs corresponding to its functions; each control flow graph is encoded separately to generate an identifier corresponding to each control flow graph; the identifiers corresponding to all the control flow graphs are then collected to generate the feature of the sample to be detected; finally, that feature is matched against the features in the virus signature database to perform virus detection. This solves the technical problem in the prior art that signatures based on the binary content of virus samples have poor generality (broad-spectrum coverage) and make it easy for a virus to evade detection. The embodiment of the present invention performs virus detection using structural features based on function control flow graphs: each function of the sample is treated as a directed graph and encoded as a feature, so the logical flow of the program is taken into account. This resists well the interference introduced by compiler optimization strategies and by the virus author's modifications to the source code, greatly improving the generality of the signature and making evasion considerably harder.
This embodiment further provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the steps of the method described above.
A person of ordinary skill in the art will understand that all or part of the processes of the methods in the above embodiments may be completed by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
What is disclosed above is merely a preferred embodiment of the present invention and certainly cannot be used to limit the scope of the rights of the present invention; equivalent changes made in accordance with the claims of the present invention therefore still fall within the scope covered by the present invention.
Industrial Applicability
The embodiment of the present invention performs virus detection using structural features based on function control flow graphs: each function of the sample is treated as a directed graph and encoded as a feature, so the logical flow of the program is taken into account. This resists well the interference introduced by compiler optimization strategies and by the virus author's modifications to the source code, greatly improving the generality of the signature and making evasion considerably harder.

Claims (15)

  1. A virus detection method, comprising:
    disassembling a sample to be detected to obtain control flow graphs corresponding to the disassembled functions;
    encoding the obtained control flow graphs to generate identifiers corresponding to the control flow graphs, wherein different control flow graphs correspond to different identifiers;
    collecting the identifiers corresponding to the control flow graphs to generate a feature of the sample to be detected;
    matching the feature of the sample to be detected against features in a virus signature database; and
    if the feature of the sample to be detected matches a feature in the virus signature database, determining that the sample to be detected is a malicious sample.
  2. The method according to claim 1, wherein before the matching of the feature of the sample to be detected against the features in the virus signature database, the method further comprises:
    disassembling a known malicious sample to obtain control flow graphs corresponding to the disassembled functions;
    encoding the control flow graphs corresponding to the known malicious sample to generate identifiers corresponding to the control flow graphs; and
    collecting the identifiers of the control flow graphs corresponding to the known malicious sample to generate a feature of the known malicious sample, and storing the feature of the known malicious sample in the virus signature database.
  3. The method according to claim 2, wherein the encoding of the control flow graphs corresponding to the known malicious sample to generate the identifiers corresponding to the control flow graphs comprises:
    encoding the control flow graph corresponding to each function as a triple and using the triple as the identifier corresponding to the control flow graph, wherein the triple comprises the number of basic blocks of the control flow graph, the number of edges of the control flow graph, and the number of calls in the control flow graph.
  4. The method according to claim 1, wherein after the matching of the feature of the sample to be detected against the features in the virus signature database, the method further comprises:
    if the feature of the sample to be detected fails to match every feature in the virus signature database, determining that the sample to be detected is a normal sample.
  5. The method according to any one of claims 1 to 4, wherein the matching of the feature of the sample to be detected against the features in the virus signature database comprises:
    computing a common code set of the feature of the sample to be detected and one of the features in the virus signature database;
    computing a similarity between the feature of the sample to be detected and said feature in the virus signature database according to the feature of the sample to be detected and the common code set;
    determining whether the similarity is greater than a preset threshold; and
    when the similarity is greater than the preset threshold, determining that the feature of the sample to be detected matches said feature in the virus signature database.
  6. The method according to claim 5, wherein the computing of the similarity between the feature of the sample to be detected and said feature in the virus signature database according to the feature of the sample to be detected and the common code set comprises:
    dividing the number of identifiers in the common code set by the number of identifiers in the set corresponding to the feature of the sample to be detected, to obtain the similarity; or
    dividing the number of identifiers in the common code set by a target identifier count, to obtain the similarity, wherein the target identifier count comprises the sum of the number of identifiers in the set corresponding to the feature of the sample to be detected and the number of identifiers in the set corresponding to said feature in the virus signature database.
  7. A virus detection device, comprising:
    a first disassembly module, configured to disassemble a sample to be detected to obtain control flow graphs corresponding to the disassembled functions;
    a first encoding module, configured to encode the obtained control flow graphs to generate identifiers corresponding to the control flow graphs, wherein different control flow graphs correspond to different identifiers;
    a collection module, configured to collect the identifiers corresponding to the control flow graphs to generate a feature of the sample to be detected;
    a feature matching module, configured to match the feature of the sample to be detected against features in a virus signature database; and
    a first determining module, configured to determine that the sample to be detected is a malicious sample if the feature of the sample to be detected matches a feature in the virus signature database.
  8. The device according to claim 7, further comprising:
    a second disassembly module, configured to disassemble a known malicious sample, before the feature matching module matches the feature of the sample to be detected against the features in the virus signature database, to obtain control flow graphs corresponding to the disassembled functions;
    a second encoding module, configured to encode the control flow graphs corresponding to the known malicious sample to generate an identifier corresponding to each control flow graph; and
    a storage module, configured to collect the identifiers of the control flow graphs corresponding to the known malicious sample to generate a feature of the known malicious sample, and to store the feature of the known malicious sample in the virus signature database.
  9. The device according to claim 8, wherein the first encoding module is further configured to encode the control flow graph corresponding to each function as a triplet and to use the triplet as the identifier corresponding to the control flow graph, wherein the triplet comprises the number of basic blocks of the control flow graph, the number of edges of the control flow graph, and the number of calls in the control flow graph.
  10. The device according to claim 7, further comprising:
    a second determining module, configured to determine, after the feature matching module matches the feature of the sample to be detected against the features in the virus signature database, that the sample to be detected is a normal sample if the feature of the sample to be detected fails to match every feature in the virus signature database.
  11. The device according to any one of claims 7 to 10, wherein the feature matching module comprises:
    a common code calculation unit, configured to calculate a common code set of the feature of the sample to be detected and a feature in the virus signature database;
    a similarity calculation unit, configured to calculate, according to the feature of the sample to be detected and the common code set, the similarity between the feature of the sample to be detected and the feature in the virus signature database; and
    a judging unit, configured to judge whether the similarity is greater than a preset threshold;
    wherein, when the similarity is greater than the preset threshold, it is determined that the feature of the sample to be detected successfully matches the feature in the virus signature database.
  12. The device according to claim 11, wherein the similarity calculation unit comprises:
    a first calculation unit, configured to divide the number of identifiers in the common code set by the number of identifiers in the set corresponding to the feature of the sample to be detected, to obtain the similarity; or
    a second calculation unit, configured to divide the number of identifiers in the common code set by a target identifier number, to obtain the similarity, wherein the target identifier number comprises the sum of the number of identifiers in the set corresponding to the feature of the sample to be detected and the number of identifiers in the set corresponding to the feature in the virus signature database.
  13. A virus detection method, comprising:
    disassembling, by a virus detection device, a sample to be detected to obtain control flow graphs corresponding to the disassembled functions;
    encoding, by the virus detection device, the obtained control flow graphs to generate an identifier corresponding to each control flow graph, wherein different control flow graphs correspond to different identifiers;
    collecting, by the virus detection device, the identifiers corresponding to the control flow graphs to generate a feature of the sample to be detected;
    matching, by the virus detection device, the feature of the sample to be detected against features in a virus signature database; and
    determining, by the virus detection device, that the sample to be detected is a malicious sample if the feature of the sample to be detected successfully matches a feature in the virus signature database.
  14. A virus detection device, comprising: a processor and a memory for storing a computer program executable on the processor, wherein the processor is configured to perform the method of claims 1 to 6 when running the computer program.
  15. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method of claims 1 to 6.
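
The matching steps recited in the claims above can be illustrated with a minimal Python sketch. It is not the applicant's implementation. It assumes the per-function counts (basic blocks, edges, calls) have already been extracted by a disassembler, which is not shown. The function names, the `database` dictionary, the sample data, and the threshold value are all hypothetical, and returning a similarity of 0 for an empty set is an added assumption that the claims do not address.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

# Identifier of one control flow graph: (basic blocks, edges, calls).
CfgId = Tuple[int, int, int]


def encode_cfg(basic_blocks: int, edges: int, calls: int) -> CfgId:
    """Encode one function's control flow graph as a triplet identifier."""
    return (basic_blocks, edges, calls)


def sample_feature(cfg_ids: Iterable[CfgId]) -> FrozenSet[CfgId]:
    """Collect the identifiers of all functions into the sample's feature set."""
    return frozenset(cfg_ids)


def similarity_by_sample(sample: FrozenSet[CfgId], signature: FrozenSet[CfgId]) -> float:
    """First option: |common code set| / |sample set|."""
    if not sample:  # assumption: an empty sample matches nothing
        return 0.0
    return len(sample & signature) / len(sample)


def similarity_by_sum(sample: FrozenSet[CfgId], signature: FrozenSet[CfgId]) -> float:
    """Second option: |common code set| / (|sample set| + |signature set|)."""
    total = len(sample) + len(signature)
    if total == 0:  # assumption: two empty sets do not match
        return 0.0
    return len(sample & signature) / total


def detect(sample: FrozenSet[CfgId],
           database: Dict[str, FrozenSet[CfgId]],
           threshold: float) -> Optional[str]:
    """Return the name of the first signature whose similarity exceeds the
    threshold (malicious sample), or None if every signature fails to match
    (normal sample)."""
    for name, signature in database.items():
        if similarity_by_sample(sample, signature) > threshold:
            return name
    return None


# Hypothetical data: a known malicious sample and a slightly modified variant.
known = sample_feature([encode_cfg(5, 6, 2), encode_cfg(3, 2, 1), encode_cfg(9, 12, 4)])
variant = sample_feature(list(known) + [encode_cfg(1, 0, 0)])
database = {"known_malware": known}

print(similarity_by_sample(variant, known))        # 3 common identifiers out of 4
print(detect(variant, database, threshold=0.7))    # hypothetical threshold
```

Because the feature is a set of per-function identifiers rather than a hash of the binary, adding one function to the variant lowers the similarity only slightly instead of invalidating the signature. Note that the second option can never exceed 0.5, so a threshold chosen for it would need to be scaled accordingly.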
PCT/CN2017/118195 2016-12-30 2017-12-25 Method and device for detecting virus, and storage medium WO2018121464A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611257005.8 2016-12-30
CN201611257005.8A CN106709350B (en) 2016-12-30 2016-12-30 Virus detection method and device

Publications (1)

Publication Number Publication Date
WO2018121464A1 (en) 2018-07-05

Family

ID=58906816

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/118195 WO2018121464A1 (en) 2016-12-30 2017-12-25 Method and device for detecting virus, and storage medium

Country Status (2)

Country Link
CN (1) CN106709350B (en)
WO (1) WO2018121464A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709350B (en) * 2016-12-30 2020-01-14 腾讯科技(深圳)有限公司 Virus detection method and device
CN109117635B (en) * 2018-09-06 2023-07-04 腾讯科技(深圳)有限公司 Virus detection method and device for application program, computer equipment and storage medium
CN113901457A (en) * 2020-06-22 2022-01-07 深信服科技股份有限公司 Method, system, equipment and readable storage medium for identifying malicious software
CN113360910A (en) * 2021-06-30 2021-09-07 中国农业银行股份有限公司 Malicious application detection method and device, server and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104021346A (en) * 2014-06-06 2014-09-03 东南大学 Method for detecting Android malicious software based on program flow chart
CN104318161A (en) * 2014-11-18 2015-01-28 北京奇虎科技有限公司 Virus detection method and device for Android samples
CN106162648A (en) * 2015-04-17 2016-11-23 上海墨贝网络科技有限公司 A kind of behavioral value method, server and system applying installation kit
CN106709350A (en) * 2016-12-30 2017-05-24 腾讯科技(深圳)有限公司 Virus detection method and device

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100504903C (en) * 2007-09-18 2009-06-24 北京大学 Malevolence code automatic recognition method
CN101359352B (en) * 2008-09-25 2010-08-25 中国人民解放军信息工程大学 API use action discovering and malice deciding method after confusion of multi-tier synergism
KR101251001B1 (en) * 2010-12-20 2013-04-04 한국인터넷진흥원 System for analysing automatic of malicious code using cfg analysis algorithm and method therefor
CN103793650A (en) * 2013-12-02 2014-05-14 北京邮电大学 Static analysis method and static analysis device for Android application program
CN104091121B (en) * 2014-06-12 2017-07-18 上海交通大学 The detection, excision and the method recovered of the malicious code of bag Malware are beaten again Android
CN104077527B (en) * 2014-06-20 2017-12-19 珠海市君天电子科技有限公司 The generation method and device and method for detecting virus and device of Viral diagnosis machine
CN104700033B (en) * 2015-03-30 2019-01-29 北京瑞星网安技术股份有限公司 The method and device of viral diagnosis
CN105046152B (en) * 2015-07-24 2018-01-26 四川大学 Malware detection method based on function call graph fingerprint

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIANMIN ET AL., TECHNOLOGY OF COMPILING AND DECOMPILING, 30 April 2016 (2016-04-30), pages 362 *

Also Published As

Publication number Publication date
CN106709350A (en) 2017-05-24
CN106709350B (en) 2020-01-14

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 17887319

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 17887319

Country of ref document: EP

Kind code of ref document: A1