CN111310179A - Method and device for analyzing computer virus variants and computer equipment - Google Patents

Publication number
CN111310179A
Authority
CN
China
Prior art keywords
sample
behavior
variant
newly added
added sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010074842.7A
Other languages
Chinese (zh)
Other versions
CN111310179B (en)
Inventor
谭昱
彭宁
沈江波
曹有理
齐文杰
刘敏
程虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010074842.7A priority Critical patent/CN111310179B/en
Publication of CN111310179A publication Critical patent/CN111310179A/en
Application granted granted Critical
Publication of CN111310179B publication Critical patent/CN111310179B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/561 Virus type analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Virology (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present application relates to a method, an apparatus, a computer-readable storage medium and a computer device for analyzing computer virus variants. The method comprises: acquiring a newly added sample, and identifying a known sample having an association relationship with the newly added sample; filtering the behavior log of the newly added sample to generate an exclusive feature set of the newly added sample, the exclusive feature set of the newly added sample comprising behavior categories; acquiring a variant sample exclusive feature set corresponding to the known sample; matching the exclusive feature set of the newly added sample with the variant sample exclusive feature set according to the behavior categories; and when the similarity obtained by matching exceeds a threshold value, marking the newly added sample as a variant sample of the known sample. The scheme provided by the application can improve the accuracy and efficiency of virus variant identification.

Description

Method and device for analyzing computer virus variants and computer equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for analyzing a computer virus variant, a computer-readable storage medium, and a computer device.
Background
With the development of computer technology, computer viruses have emerged. A computer virus is code inserted into a computer program that destroys computer functions or data and thereby affects the operation of the computer. Computer viruses can produce variants to evade detection and removal, and so continue to cause damage.
In the conventional method, when analyzing whether a newly added virus sample is a variant of a known virus sample, usually only a suspected conclusion is given, and manual analysis based on experience is then needed to reach a final result. This makes the identification of computer virus variants inefficient.
Disclosure of Invention
In view of the above, there is a need to provide a method, an apparatus, a computer-readable storage medium, and a computer device for analyzing a computer virus variant, aiming at the technical problem of low efficiency of identifying the computer virus variant.
A method of analyzing variants of a computer virus, comprising:
acquiring a newly added sample, and identifying a known sample having an association relationship with the newly added sample;
filtering the behavior log of the newly added sample to generate an exclusive feature set of the newly added sample; the exclusive feature set of the newly added sample comprises behavior categories;
acquiring a variant sample exclusive feature set corresponding to the known sample;
matching the exclusive feature set of the newly added sample with the variant sample exclusive feature set according to the behavior categories;
and when the similarity obtained by matching exceeds a threshold value, marking the newly added sample as a variant sample of the known sample.
An apparatus for analyzing variants of a computer virus, said apparatus comprising:
the association identification module is used for acquiring a newly added sample and identifying a known sample having an association relationship with the newly added sample;
the filtering module is used for filtering the behavior log of the newly added sample to generate an exclusive feature set of the newly added sample; the exclusive feature set of the newly added sample comprises behavior categories;
the matching module is used for acquiring a variant sample exclusive feature set corresponding to the known sample; matching the exclusive feature set of the newly added sample with the variant sample exclusive feature set according to the behavior categories; and when the similarity obtained by matching exceeds a threshold value, marking the newly added sample as a variant sample of the known sample.
A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
acquiring a newly added sample, and identifying a known sample having an association relationship with the newly added sample;
filtering the behavior log of the newly added sample to generate an exclusive feature set of the newly added sample; the exclusive feature set of the newly added sample comprises behavior categories;
acquiring a variant sample exclusive feature set corresponding to the known sample;
matching the exclusive feature set of the newly added sample with the variant sample exclusive feature set according to the behavior categories;
and when the similarity obtained by matching exceeds a threshold value, marking the newly added sample as a variant sample of the known sample.
A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of:
acquiring a newly added sample, and identifying a known sample having an association relationship with the newly added sample;
filtering the behavior log of the newly added sample to generate an exclusive feature set of the newly added sample; the exclusive feature set of the newly added sample comprises behavior categories;
acquiring a variant sample exclusive feature set corresponding to the known sample;
matching the exclusive feature set of the newly added sample with the variant sample exclusive feature set according to the behavior categories;
and when the similarity obtained by matching exceeds a threshold value, marking the newly added sample as a variant sample of the known sample.
According to the above method, apparatus, computer-readable storage medium and computer device for analyzing computer virus variants, the newly added sample can be marked as a suspected variant by identifying a known sample that has an association relationship with it. The behavior log of the newly added sample is filtered to generate an exclusive feature set of the newly added sample. After the exclusive feature set of the newly added sample is matched, according to the behavior categories, with the variant sample exclusive feature set corresponding to the known sample, the newly added sample is determined to be a variant sample of the known sample when the similarity obtained by matching exceeds a threshold value. Therefore, when a suspected variant is found, matching analysis of the virus variant can be carried out quickly and accurately, which effectively improves the accuracy and efficiency of virus variant identification.
Drawings
FIG. 1 is a diagram of an exemplary environment for a method for analyzing variants of a computer virus;
FIG. 2 is a schematic flow chart diagram illustrating a method for analyzing variants of a computer virus, according to one embodiment;
FIG. 3 is a diagram of a new sample behavior log in one embodiment;
FIG. 4 is a schematic flow chart of a method for analyzing variants of a computer virus according to another embodiment;
FIG. 5 is a schematic flow chart showing a method for analyzing variants of a computer virus according to still another embodiment;
FIG. 6 is a schematic flow chart diagram of a method for analyzing variants of a computer virus according to yet another embodiment;
FIG. 7 is a block diagram of an apparatus for analyzing a variant of a computer virus according to an embodiment;
FIG. 8 is a block diagram showing the structure of an apparatus for analyzing a variant of a computer virus according to another embodiment;
FIG. 9 is a block diagram showing the construction of an apparatus for analyzing a variant of a computer virus according to still another embodiment;
FIG. 10 is a block diagram showing a configuration of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Fig. 1 is a diagram of an application environment of a method for analyzing computer virus variants according to an embodiment. The application environment includes a terminal 110 and a server 120, which are connected through a network. The terminal 110 may specifically be a desktop terminal or a mobile terminal, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, a notebook computer, and the like. The server 120 may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers. When the terminal 110 detects a suspected virus sample, it uploads the suspected virus sample to the server 120. The server 120 takes the suspected virus sample as a newly added sample and identifies a known virus sample (known sample for short) having an association relationship with the newly added sample. The server 120 filters the behavior log of the newly added sample to generate an exclusive feature set of the newly added sample. The server 120 obtains a variant sample exclusive feature set corresponding to the known sample. The server 120 matches the behavior categories of the newly added sample with the variant sample exclusive feature set. When the similarity obtained by matching exceeds a threshold value, the newly added sample is marked as a variant sample of the known sample. Therefore, matching analysis of the virus variant can be carried out quickly and accurately, which effectively improves the accuracy and efficiency of virus variant identification.
In one embodiment, as shown in FIG. 2, a method for analyzing a variant of a computer virus is provided. The embodiment is mainly illustrated by applying the method to the server 120 in fig. 1. Referring to fig. 2, the method for analyzing variants of a computer virus specifically includes the following steps:
S202, acquiring a newly added sample, and identifying a known sample having an association relationship with the newly added sample.
The terminal runs an application program through which a suspected virus sample can be detected. The suspected virus sample may also be referred to as a newly added virus sample (newly added sample for short). The terminal uploads the newly added sample to the server. The server stores in advance known samples corresponding to a plurality of virus families, where a plurality means two or more. Virus families may be divided according to the algorithms specific to each virus. For example, the virus families include macro viruses, CIH viruses, worm viruses, Trojan horse viruses, and the like. The known samples may also be referred to as members of a virus family. The server can identify in various ways whether a known sample having an association relationship with the newly added sample exists in a virus family. For example, the server can identify such a known sample by using the behavior difference between the newly added sample and the known samples. If a known sample associated with the newly added sample exists, the newly added sample is marked as a suspected variant of that known sample.
S204, filtering the behavior log of the newly added sample to generate an exclusive feature set of the newly added sample; the exclusive feature set of the newly added sample comprises behavior categories.
The server acquires a behavior log of the newly added sample; the behavior log may be a dynamic behavior log. The server filters the behavior log of the newly added sample to filter out public behavior information, general behavior information, and behavior information that also exists in white samples. Public behavior information is behavior information that cannot characterize the sample, for example the behavior of shell (packer) code added to a computer program. General behavior information refers to behavior information that is triggered in large quantities, such as the general behavior information of a compiler. A white sample refers to an application program allowed to run on the terminal, including social communication applications, browser applications, online shopping applications, and the like. Filtering the behavior log of the newly added sample generates the exclusive feature set of the newly added sample.
The exclusive feature set of the newly added sample comprises exclusive behaviors and corresponding behavior parameters. An exclusive behavior may also be referred to as a behavior category. For example, the behavior categories include "CreateFile", "createpprocesses", "CreateCmdline", "DnsQuery", and the like. Each behavior category has one or more corresponding behavior parameters. For example, the behavior category "DnsQuery" corresponds to one behavior parameter, i.e., ii. The behavior category "CreateFile" corresponds to two behavior parameters, Filename and FileMd5.
S206, acquiring a variant sample exclusive feature set corresponding to the known sample.
S208, matching the behavior categories of the newly added sample with the variant sample exclusive feature set.
S210, when the similarity obtained by matching exceeds a threshold value, marking the newly added sample as a variant sample of the known sample.
A behavior feature library of viruses is established in the server in advance. The behavior feature library may include the variant sample exclusive feature sets of all virus families. A variant sample exclusive feature set includes variant exclusive behaviors and corresponding behavior parameters. A variant exclusive behavior may also be referred to as a behavior category. For each behavior category, the server performs similarity matching between the behavior parameters of the newly added sample and the behavior parameters of the variant sample, and calculates the similarity between the newly added sample and the variant sample from the similarities of all the behavior categories. When the similarity between the newly added sample and the variant sample exceeds a threshold value (that is, is greater than or equal to the threshold value), the newly added sample is marked as a variant sample of the known sample.
In this embodiment, the newly added sample can be marked as a suspected variant by identifying a known sample that has an association relationship with it. The behavior log of the newly added sample is filtered to generate an exclusive feature set of the newly added sample. After the exclusive feature set of the newly added sample is matched, according to the behavior categories, with the variant sample exclusive feature set corresponding to the known sample, the newly added sample is determined to be a variant sample of the known sample when the similarity obtained by matching exceeds a threshold value. Therefore, when a suspected variant is found, matching analysis of the virus variant can be carried out quickly and accurately, which effectively improves the accuracy and efficiency of virus variant identification.
In one embodiment, identifying the known samples associated with the newly added sample comprises: acquiring a behavior sequence of a known sample in a virus family; generating a corresponding behavior sequence by using the behavior log of the newly added sample; and identifying the known sample which has an association relation with the newly added sample by comparing the behavior sequence of the newly added sample with the behavior sequence of the known sample in the virus family.
Because there are many members in the virus family, in order to quickly and accurately identify whether the newly added sample is a variant of a member of a certain virus family, the server may first identify whether the newly added sample is a suspected variant of a known sample. That is, the server needs to identify whether there is a known sample in the virus family that has an association with the newly added sample. If so, the new sample is marked as a suspected variant of the known sample.
The server can extract the dynamic behaviors from the behavior log of the newly added sample to generate a corresponding behavior sequence. The server may generate corresponding behavior sequences for the known samples in each virus family in advance. The server compares the behavior sequence of the newly added sample with the behavior sequences of the known samples in the virus family; if the dynamic behavior sequences of the newly added sample and a known sample are similar, the server determines that the two have an association relationship.
In the conventional approach, the access relationship between the newly added sample and the known sample is generally used: if the communication addresses are the same or the parent files are the same, an association relationship is determined to exist between the newly added sample and the known sample. Alternatively, the static homologous code of the newly added sample and of the known sample is compared, and an association relationship is determined to exist if their binary code is similar. However, the first approach is subject to heavy interference from network data, and a sample that accesses the same communication address is not necessarily a computer virus. The second approach has low coverage and cannot cope with the binary changes of all computer virus variant samples. The accuracy of identifying suspected variants in the conventional approach is therefore limited.
In this embodiment, by comparing the behavior sequence of the newly added sample with the behavior sequence of the known sample, whether an association relationship exists can be identified according to the behavior difference between the newly added sample and the known sample, so that the accuracy of identifying the association relationship can be effectively improved, and the accuracy of identifying the suspected variants of the known sample can be further improved.
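The behavior-sequence comparison described in this embodiment can be sketched as follows. This is only an illustration under stated assumptions: the text does not name a sequence-similarity measure, so Python's standard `difflib.SequenceMatcher` is swapped in, and the function names, the `min_ratio` cut-off, and the sample sequences are all hypothetical.

```python
from difflib import SequenceMatcher


def sequence_similarity(seq_a, seq_b):
    """Similarity in [0, 1] between two behavior sequences (lists of behavior names)."""
    return SequenceMatcher(None, seq_a, seq_b).ratio()


def find_associated_samples(new_sequence, known_sequences, min_ratio=0.8):
    """Return the names of known samples whose behavior sequence resembles the new one.

    min_ratio is an assumed cut-off; the text does not give a value.
    """
    return [
        name
        for name, sequence in known_sequences.items()
        if sequence_similarity(new_sequence, sequence) >= min_ratio
    ]


# Hypothetical behavior sequences; the behavior names follow the categories in the text.
known_sequences = {
    "S1": ["CreateFile", "DnsQuery", "CreateCmdline", "CreateFile"],
    "S9": ["DnsQuery", "DnsQuery"],
}
new_sequence = ["CreateFile", "DnsQuery", "CreateCmdline", "CreateFile"]

associated = find_associated_samples(new_sequence, known_sequences)
print(associated)
```

A newly added sample for which the returned list is non-empty would then be marked as a suspected variant of the listed known samples.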
In one embodiment, filtering the behavior log of the newly added sample includes: cleaning public behavior information and general behavior information in the behavior log of the newly added sample; classifying the cleaned behavior logs to generate a newly added behavior feature set corresponding to the behavior categories; and screening the newly added behavior feature set by using the white sample set to obtain the exclusive feature set of the newly added sample.
The behavior log of the newly added sample comprises a plurality of behaviors and corresponding detailed information. The behaviors include public behaviors and general behaviors. Public behaviors are behaviors that cannot characterize the sample, and include shell code behavior. If a computer program (or application) is packed with a UPX shell, its initial behavior is that of the UPX shell code, which cannot be used to characterize the sample. General behaviors are behaviors triggered in large quantities, and may be repetitive. As shown in fig. 3, the behaviors in the box are all "modifying the process memory", while the process ID, the address, the process file name, and the like in the detailed information are partly the same and partly different. Such general behavior is ubiquitous in computer programs and contributes nothing to virus variant analysis. The server cleans the public behavior information and the general behavior information from the behavior log of the newly added sample to remove behavior information that is meaningless for analyzing virus variants.
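The cleaning step might look like the minimal sketch below. It assumes a log entry is a dictionary with a `behavior` name and a `detail` string, that shell-code behavior is recognizable by a label (`ShellCodeUnpack` is a made-up name), and that a behavior counts as "general" once it is triggered more than `max_repeats` times; none of these specifics come from the text.

```python
from collections import Counter

# Hypothetical label for shell-code (packer) behavior; real logs will use other names.
PUBLIC_BEHAVIORS = {"ShellCodeUnpack"}


def clean_behavior_log(entries, max_repeats=3):
    """Drop public behaviors and behaviors triggered more than max_repeats times.

    max_repeats is an assumed cut-off for "triggered in large quantities".
    """
    counts = Counter(entry["behavior"] for entry in entries)
    return [
        entry
        for entry in entries
        if entry["behavior"] not in PUBLIC_BEHAVIORS
        and counts[entry["behavior"]] <= max_repeats
    ]


# Hypothetical raw log: one shell-code entry, five repeated memory writes, two other behaviors.
raw_log = (
    [{"behavior": "ShellCodeUnpack", "detail": "upx stub"}]
    + [{"behavior": "ModifyProcessMemory", "detail": f"pid={1000 + i}"} for i in range(5)]
    + [
        {"behavior": "DnsQuery", "detail": "p.abbny.com"},
        {"behavior": "CreateFile", "detail": "a.tmp"},
    ]
)

cleaned_log = clean_behavior_log(raw_log)
print([entry["behavior"] for entry in cleaned_log])
```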
The server classifies the cleaned behavior information and generates a newly added behavior feature set. The server acquires the behavior parameter information corresponding to each cleaned behavior; different behaviors carry different behavior parameter information, and one behavior can correspond to one behavior parameter or to a plurality of behavior parameters. The server can generate the newly added behavior feature set from the correspondence between the cleaned behaviors and the behavior parameter information. In the newly added behavior feature set, the behavior parameter information may adopt a JSON format, and when one cleaned behavior corresponds to a plurality of behavior parameters, the behavior parameter information may be stored as a JSON dictionary. As shown in Table 1, the cleaned behavior "DnsQuery" corresponds to one behavior parameter, and its behavior parameter information is "p.abbny.com". The cleaned behavior "CreateFile" corresponds to the two behavior parameters "Filename" and "FileMd5", and its behavior parameter information can be stored as a JSON dictionary.
[Image: table of the newly added behavior feature set]
TABLE 1
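As a rough illustration of how a newly added behavior feature set could be stored, the sketch below maps each behavior category to its parameter information: a bare value when there is one parameter, a JSON-style dictionary when there are several. The entry layout, the parameter name `Domain`, and the `Filename` and `FileMd5` values are assumptions made for the example; only the category names and "p.abbny.com" appear in the text.

```python
import json


def build_feature_set(cleaned_entries):
    """Map each behavior category to its behavior parameter information."""
    feature_set = {}
    for entry in cleaned_entries:
        params = entry["params"]
        if len(params) == 1:
            # A single parameter is stored as a bare value.
            value = next(iter(params.values()))
        else:
            # Several parameters are kept together as a dictionary.
            value = dict(params)
        # Assumption: the first occurrence of a category is the one kept.
        feature_set.setdefault(entry["behavior"], value)
    return feature_set


# Hypothetical cleaned entries; "Domain" and the file values are made up.
cleaned_entries = [
    {"behavior": "DnsQuery", "params": {"Domain": "p.abbny.com"}},
    {"behavior": "CreateFile", "params": {"Filename": "a.tmp", "FileMd5": "0" * 32}},
]

feature_set = build_feature_set(cleaned_entries)
print(json.dumps(feature_set, indent=2))
```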
A behavior feature white library is established in the server in advance. The behavior feature white library comprises the behavior features corresponding to white samples. Since a white sample is an application program allowed on the terminal, the behavior features corresponding to white samples cannot reflect the characteristics of virus variants and need to be eliminated. By comparing the newly added behavior feature set against the behavior feature white library, the behavior features corresponding to white samples are screened out, and the exclusive feature set of the newly added sample is thereby generated. After the newly added behavior feature set in Table 1 is screened with the white samples, the resulting exclusive feature set of the newly added sample may be as shown in Table 2 below.
[Image: table of the exclusive feature set of the newly added sample]
TABLE 2
In this embodiment, the public behavior information and the general behavior information in the behavior log of the newly added sample are cleaned, so that behavior information that is meaningless for analyzing virus variants is removed. After a newly added behavior feature set is generated from the cleaned behavior log, the newly added behavior feature set is screened with the white sample set, so that behavior features that cannot reflect virus variants are further eliminated and the exclusive features of the newly added sample used for virus variant analysis are obtained.
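Screening against the behavior feature white library can be sketched as a set lookup. The sketch assumes the white library is a set of (behavior category, serialized parameters) pairs and that a feature is dropped only on an exact match; the text does not specify the matching rule, and every name and value below is hypothetical.

```python
import json


def feature_key(category, params):
    """Build a hashable key from a behavior category and its parameters."""
    return category, json.dumps(params, sort_keys=True)


def screen_with_white_library(feature_set, white_library):
    """Keep only the features that do not appear in the white library."""
    return {
        category: params
        for category, params in feature_set.items()
        if feature_key(category, params) not in white_library
    }


# Hypothetical white library with one feature observed in allowed applications.
white_library = {feature_key("DnsQuery", "update.example.test")}

# Hypothetical newly added behavior feature set.
new_behavior_features = {
    "DnsQuery": "update.example.test",
    "CreateFile": {"Filename": "a.tmp", "FileMd5": "0" * 32},
}

exclusive_features = screen_with_white_library(new_behavior_features, white_library)
print(sorted(exclusive_features))
```

Serializing with `sort_keys=True` makes the lookup independent of the order in which a category's parameters were recorded.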
In an embodiment, as shown in fig. 4, the method further includes a step of creating a variant feature library, which specifically includes:
Step S212, classifying the known samples of the virus family to obtain a corresponding variant sample set.
Step S214, obtaining a behavior log of the variant samples in the variant sample set.
Step S216, generating a variant sample exclusive feature set according to the behavior log of the variant samples.
Step S218, creating a variant feature library by using the variant sample exclusive feature set.
Known samples of multiple virus families are stored in the server, where multiple means two or more. Known samples of a virus family may also be referred to as virus family members. A member that has variant samples may also be referred to as a base sample. For example, a virus family includes 20 members, of which the members S2 and S3 are variants of S1. S1 may be referred to as a base sample, and S2 and S3 are the variant samples corresponding to that base sample. The server classifies the known samples of each virus family by the variant dimension: a base sample that has variant samples, together with the variant samples corresponding to it, is taken as a variant sample set, and known samples having no variants are taken as a sample set. For example, the server treats (S2, S3, S1) as one sample set. The sample set is represented in a preset format, with the base sample marked at the end.
The server obtains a behavior log of all the variant samples in each variant sample set. The behavior log may be an original dynamic behavior log. In the manner provided in the above embodiments, the server may clean the behavior log of each variant sample in the variant sample set to remove the public behavior information and the general behavior information. The cleaned behavior logs are classified according to the behavior categories to generate a variant behavior feature set. The variant behavior feature set is screened with the white sample set to obtain the variant sample exclusive feature set. The server can aggregate the variant sample exclusive feature sets by virus family, and combine the aggregates of a plurality of virus families to generate a variant feature library.
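One possible layout for the variant feature library is sketched below: family, then base sample, then the exclusive feature set of each member of the variant sample set. Keying by base sample and including the base sample's own features are assumptions made for the illustration, as are the family name and feature values; the only convention taken from the text is that the base sample is the last element of a sample set.

```python
def build_variant_library(variant_sets_by_family, feature_sets):
    """Aggregate exclusive feature sets by virus family and base sample."""
    library = {}
    for family, variant_sets in variant_sets_by_family.items():
        for sample_set in variant_sets:
            base_sample = sample_set[-1]  # the base sample is marked at the end
            library.setdefault(family, {})[base_sample] = {
                sample: feature_sets[sample] for sample in sample_set
            }
    return library


# Hypothetical input: one family with the variant sample set (S2, S3, S1).
variant_sets_by_family = {"FamilyA": [("S2", "S3", "S1")]}

# Hypothetical exclusive feature sets, assumed to be already cleaned and screened.
feature_sets = {
    "S1": {"DnsQuery": "p.abbny.com"},
    "S2": {"DnsQuery": "p.abbny.com", "CreateCmdline": "cmd /c start"},
    "S3": {"DnsQuery": "p.abbny.com"},
}

variant_library = build_variant_library(variant_sets_by_family, feature_sets)
print(sorted(variant_library["FamilyA"]["S1"]))
```

With this layout, the feature sets to match against can be looked up directly from the known sample that a newly added sample is associated with.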
In this embodiment, the corresponding variant feature library is created by using the variant sample of the virus family, so that when the newly added sample is found to be a suspected variant of a known sample, the newly added sample can be quickly and accurately analyzed for the virus variant status by matching with the variant feature library, thereby effectively improving the accuracy and efficiency of virus variant analysis.
In one embodiment, as shown in FIG. 5, a method for analyzing variants of a computer virus is provided, comprising:
Step S502, acquiring a newly added sample, and identifying a known sample having an association relationship with the newly added sample.
Step S504, filtering the behavior log of the newly added sample to generate an exclusive feature set of the newly added sample; the exclusive feature set of the newly added sample comprises behavior categories.
Step S506, acquiring a variant sample exclusive feature set corresponding to the known sample.
Step S508, performing similarity matching between the behavior parameters of the newly added sample and the behavior parameters of the variant sample according to the behavior categories.
Step S510, calculating the similarity between the newly added sample and the variant sample according to the similarity and the weight corresponding to each behavior category.
Step S512, when the similarity obtained by matching exceeds a threshold value, marking the newly added sample as a variant sample of the known sample.
When the newly added sample has an association relationship with a known sample, the newly added sample is marked as a suspected variant. The server may obtain the variant sample exclusive feature set corresponding to the known sample from the variant feature library according to the association relationship between the newly added sample and the known sample. The variant sample exclusive feature set includes variant exclusive behaviors and corresponding behavior parameters. The server obtains the exclusive behaviors of the newly added sample and the corresponding behavior parameters. The exclusive behaviors of the newly added sample and of the variant sample may also be referred to as behavior categories. Each behavior category may be preset with a similarity weight (weight for short). For each behavior category, the server performs similarity matching between the behavior parameters of the newly added sample and the behavior parameters of the variant sample to obtain the similarity corresponding to that behavior category. Different behavior categories may yield the same similarity or different similarities.
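The per-category matching step can be sketched as below. The similarity measure is an assumption: the sketch scores a category 1.0 when both samples have it with identical parameters and 0.0 otherwise, which stands in for whatever matching mode an implementation actually uses. The feature values are hypothetical.

```python
def match_by_category(new_features, variant_features):
    """Return a similarity per behavior category (assumed 1.0 on exact match, else 0.0)."""
    categories = sorted(set(new_features) | set(variant_features))
    similarities = {}
    for category in categories:
        in_both = category in new_features and category in variant_features
        same = in_both and new_features[category] == variant_features[category]
        similarities[category] = 1.0 if same else 0.0
    return similarities


# Hypothetical exclusive feature sets of a newly added sample and a variant sample.
new_features = {
    "DnsQuery": "p.abbny.com",
    "CreateFile": {"Filename": "a.tmp", "FileMd5": "0" * 32},
}
variant_features = {
    "DnsQuery": "p.abbny.com",
    "CreateFile": {"Filename": "b.tmp", "FileMd5": "1" * 32},
    "CreateCmdline": "cmd /c start",
}

category_similarity = match_by_category(new_features, variant_features)
print(category_similarity)
```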
The example continues with the behavior categories of the newly added sample in Table 2. When the server matches the behavior categories of the newly added sample with the variant behavior categories, the behavior categories may include "CreateFile", "createpprocesses", "CreateCmdline", and "DnsQuery". A similarity weight is set for each behavior category. The matching results may be as shown in Table 3 below.
[Image: table of behavior category matching results]
TABLE 3
The server accumulates the similarities and weights corresponding to the behavior categories to obtain the similarity between the newly added sample and the variant sample. The server compares this similarity with a threshold value, and if the similarity exceeds the threshold value, the newly added sample is marked as a variant sample of the known sample. For example, if the similarity weight of each of the four behavior categories in Table 3 is set to 1, the similarity between the newly added sample and the known sample can be calculated from the matching results of the behavior categories as P = (1 × 1 + 1 × 1 + 1 × 1 + 0 × 1)/4 = 0.75. Assuming that the threshold is 0.6, the newly added sample can be determined to be a variant of the known sample.
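The accumulation can be written out as a short sketch that reproduces the 0.75 result with four behavior categories of weight 1, three of which match. Two points are assumptions: which category fails to match (the matching table is not reproduced here), and the normalization by the sum of the weights, which coincides with dividing by the number of categories when every weight is 1.

```python
def overall_similarity(category_similarity, weights):
    """Weighted average of the per-category similarities."""
    weighted_sum = sum(
        category_similarity[category] * weights[category]
        for category in category_similarity
    )
    # Assumed normalization: the sum of the weights (equals 4 when all weights are 1).
    total_weight = sum(weights[category] for category in category_similarity)
    return weighted_sum / total_weight if total_weight else 0.0


# Category names are spelled as they appear in the text; the 0/1 assignment is hypothetical.
category_similarity = {
    "CreateFile": 1.0,
    "createpprocesses": 1.0,
    "CreateCmdline": 1.0,
    "DnsQuery": 0.0,
}
weights = {category: 1.0 for category in category_similarity}

THRESHOLD = 0.6  # threshold used in the example in the text
similarity = overall_similarity(category_similarity, weights)
is_variant = similarity >= THRESHOLD  # "exceeds" is read as greater than or equal to
print(similarity, is_variant)
```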
In this embodiment, a variant feature library is established in advance from known samples of virus families. When a newly added sample is found to be a suspected variant of a known sample, the exclusive feature set of the newly added sample is compared with the associated variant sample exclusive feature set in the variant feature library, so that the virus variant status of the newly added sample can be matched and analyzed quickly and accurately, which effectively improves the accuracy and efficiency of virus variant matching.
In one embodiment, as shown in FIG. 6, another method for analyzing variants of a computer virus is provided, comprising:
Step S602, acquiring a newly added sample, and identifying a known sample having an association relationship with the newly added sample.
Step S604, filtering the behavior log of the newly added sample to generate an exclusive feature set of the newly added sample; the exclusive feature set of the newly added sample comprises behavior categories.
Step S606, acquiring a variant sample exclusive feature set corresponding to the known sample.
Step S608, acquiring the number of behavior parameters corresponding to the behavior category.
Step S610, selecting a similarity matching mode corresponding to the behavior category according to the number of behavior parameters.
Step S612, performing similarity matching between the behavior parameters of the newly added sample and the behavior parameters of the known sample using the selected similarity matching mode.
Step S614, calculating the similarity between the newly added sample and the variant sample according to the similarity and the weight corresponding to each behavior category.
Step S616, marking the newly added sample as a variant sample of the known sample when the similarity obtained by matching exceeds a threshold.
Because the behavior parameters of different behavior categories differ in complexity, the server may use different similarity matching modes for different behavior categories in order to improve matching accuracy. The server may select the matching mode based on the number of behavior parameters corresponding to the behavior category. When a behavior category has one behavior parameter, the server may calculate the binary similarity between the behavior parameter of the newly added sample and the behavior parameter of the variant sample in that behavior category, thereby obtaining the similarity corresponding to the behavior category.
When a behavior category has two or more behavior parameters, the server may parse the plurality of behavior parameters into individual behavior parameters and construct a directed behavior graph corresponding to the behavior category. For the same behavior category, the server may construct one directed behavior graph for the newly added sample and another for the variant sample. The server then compares the two directed behavior graphs to calculate the similarity of that behavior category.
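One way to picture the selection of a matching mode by parameter count is the sketch below. The text does not define "binary similarity" or how directed behavior graphs are built and compared, so every choice here is an assumption made for illustration: binary similarity is read as a 0/1 exact match, the graph links consecutive parameters with a directed edge, and graphs are compared by the Jaccard overlap of their edge sets. All names are hypothetical.

```python
from typing import Sequence, Set, Tuple

Edge = Tuple[str, str]


def single_parameter_similarity(new_param: str, variant_param: str) -> float:
    # Assumption: "binary similarity" is interpreted as a 0/1 exact match.
    return 1.0 if new_param == variant_param else 0.0


def directed_behavior_graph(params: Sequence[str]) -> Set[Edge]:
    # Assumption: each parameter is a node, and consecutive parameters are
    # connected by a directed edge in the order they were recorded.
    return {(params[i], params[i + 1]) for i in range(len(params) - 1)}


def graph_similarity(a: Set[Edge], b: Set[Edge]) -> float:
    # Illustrative comparison: Jaccard overlap of the two edge sets.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def category_similarity(new_params: Sequence[str], variant_params: Sequence[str]) -> float:
    """Pick the matching mode from the number of behavior parameters."""
    if not new_params or not variant_params:
        return 0.0
    if len(new_params) == 1 and len(variant_params) == 1:
        return single_parameter_similarity(new_params[0], variant_params[0])
    return graph_similarity(directed_behavior_graph(new_params),
                            directed_behavior_graph(variant_params))


# Hypothetical parameters: one shared edge out of three distinct edges.
print(category_similarity(["cmd", "/c", "del"], ["cmd", "/c", "copy"]))
```

A production system would likely need a richer graph model (for example, typed nodes and edge labels), but the dispatch on parameter count stays the same.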
In one embodiment, after marking the newly added sample as a variant sample of the known sample, the method further comprises: identifying the changed content between the newly added sample and the existing variant sample according to the matching result; and generating early warning information for the virus variant corresponding to the newly added sample based on the changed content.
After similarity matching is performed as in the above embodiments, the server may identify the changed content between the newly added sample and the existing variant sample from the matching result of each behavior category. As shown in Table 3, the differing part is "DnsQuery", so the changed content of the newly added sample can be identified as follows: the domain name has been replaced with a new one, domain name ii.
The server can generate early warning information for the virus variant corresponding to the newly added sample according to the changed content. Furthermore, the changed content can be graded by behavior intensity; different behavior categories may correspond to different intensities, so that early warning information of different levels is generated. Staff can then take appropriate measures against the newly added sample in a timely manner based on the early warning information.
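The identification of changed content and the graded warnings can be sketched as below. This is only an illustration under stated assumptions: the feature sets map each behavior category to a list of behavior parameters, the intensity table and its level names are hypothetical, and the domain names are placeholders rather than values from Table 3.

```python
from typing import Dict, List

FeatureSet = Dict[str, List[str]]

# Hypothetical intensity per behavior category; not taken from the source.
CATEGORY_INTENSITY = {"DnsQuery": "high", "CreateFile": "low"}


def changed_content(new_features: FeatureSet, variant_features: FeatureSet) -> Dict[str, dict]:
    """Return, per behavior category, the parameters added or removed."""
    changes: Dict[str, dict] = {}
    for category in sorted(set(new_features) | set(variant_features)):
        new_params = new_features.get(category, [])
        old_params = variant_features.get(category, [])
        added = [p for p in new_params if p not in old_params]
        removed = [p for p in old_params if p not in new_params]
        if added or removed:
            changes[category] = {"added": added, "removed": removed}
    return changes


def early_warnings(changes: Dict[str, dict]) -> List[dict]:
    """Attach a warning level to each changed category (default: medium)."""
    return [
        {"category": category,
         "level": CATEGORY_INTENSITY.get(category, "medium"),
         **diff}
        for category, diff in changes.items()
    ]


# Placeholder feature sets: only the queried domain name differs.
new_sample = {"CreateFile": ["a.tmp"], "DnsQuery": ["new.example"]}
variant_sample = {"CreateFile": ["a.tmp"], "DnsQuery": ["old.example"]}

changes = changed_content(new_sample, variant_sample)
warnings = early_warnings(changes)
print(warnings)
```

Keeping the diff and the grading as separate steps lets the intensity table be tuned without touching the comparison logic.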
It should be understood that although the steps in the flowcharts of fig. 2, 4, 5, and 6 are shown sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited to the order shown, and they may be performed in other orders. Moreover, at least some of the steps in fig. 2, 4, 5, and 6 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially, and may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
As shown in FIG. 7, in one embodiment, there is provided an apparatus for analyzing a variant of a computer virus, comprising:
An association identification module 702, configured to acquire a newly added sample and identify a known sample having an association relationship with the newly added sample.
A filtering module 704, configured to filter the behavior log of the newly added sample to generate an exclusive feature set of the newly added sample; the exclusive feature set of the newly added sample comprises behavior categories.
A matching module 706, configured to acquire a variant sample exclusive feature set corresponding to the known sample; match the exclusive feature set of the newly added sample with the variant sample exclusive feature set according to the behavior category; and mark the newly added sample as a variant sample of the known sample when the similarity obtained by matching exceeds a threshold.
In this embodiment, the newly added sample may be marked as a suspected variant by identifying a known sample associated with it. The behavior log of the newly added sample is filtered to generate the exclusive feature set of the newly added sample. After this feature set is matched, by behavior category, against the variant sample exclusive feature set corresponding to the known sample, the newly added sample is determined to be a variant sample of the known sample when the similarity obtained by matching exceeds a threshold. Therefore, when a suspected variant is found, its virus variant status can be matched and analyzed quickly and accurately, which effectively improves the accuracy and efficiency of virus variant identification.
In one embodiment, the association identification module 702 is further configured to obtain a behavior sequence of a known sample in the virus family; generating a corresponding behavior sequence by using the behavior log of the newly added sample; and identifying the known sample which has an association relation with the newly added sample by comparing the behavior sequence of the newly added sample with the behavior sequence of the known sample in the virus family.
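The comparison of behavior sequences can be illustrated with a short sketch. The text does not say how the sequences are compared, so the use of Python's `difflib.SequenceMatcher`, the 0.6 cutoff, the function name, and the behavior names in the sample data are all assumptions for illustration only.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def find_associated_sample(new_sequence: List[str],
                           known_sequences: Dict[str, List[str]],
                           min_ratio: float = 0.6) -> Optional[str]:
    """Return the known sample whose behavior sequence is closest to the new one.

    Assumption: association is decided by a sequence-similarity ratio that
    must reach min_ratio; otherwise no association is reported.
    """
    best_name: Optional[str] = None
    best_ratio = 0.0
    for name, known_sequence in known_sequences.items():
        ratio = SequenceMatcher(None, new_sequence, known_sequence, autojunk=False).ratio()
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    return best_name if best_ratio >= min_ratio else None


# Hypothetical behavior sequences of known samples in two virus families.
known = {
    "family_a_v1": ["CreateFile", "CreateProcess", "DnsQuery"],
    "family_b_v1": ["RegSetValue", "Sleep"],
}
new_sample_sequence = ["CreateFile", "CreateProcess", "DnsQuery", "Sleep"]

print(find_associated_sample(new_sample_sequence, known))  # family_a_v1
```

A sample for which this lookup returns a known sample would then be treated as a suspected variant and passed on to the feature-set matching step.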
In one embodiment, the filtering module 704 is further configured to clean out the public behavior information and general behavior information in the behavior log of the newly added sample; classify the cleaned behavior log according to behavior category to generate a newly added behavior feature set; and screen the newly added behavior feature set using a white sample set to obtain the exclusive feature set of the newly added sample.
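The three filtering stages (cleaning, classification by behavior category, and white-sample screening) can be sketched as follows. The log format, a list of (behavior category, behavior parameter) pairs, is an assumption, as are the function name and the sample entries; the source does not specify how public or general behavior information is recognized, so it is passed in here as a precomputed set.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

LogEntry = Tuple[str, str]  # (behavior category, behavior parameter)


def exclusive_feature_set(behavior_log: Iterable[LogEntry],
                          common_entries: Set[LogEntry],
                          white_entries: Set[LogEntry]) -> Dict[str, List[str]]:
    # Stage 1: clean out public and general behavior information.
    cleaned = [entry for entry in behavior_log if entry not in common_entries]

    # Stage 2: classify the cleaned log by behavior category (deduplicated).
    by_category: Dict[str, List[str]] = defaultdict(list)
    for category, parameter in cleaned:
        if parameter not in by_category[category]:
            by_category[category].append(parameter)

    # Stage 3: screen out behaviors that also appear in the white sample set.
    exclusive: Dict[str, List[str]] = {}
    for category, parameters in by_category.items():
        kept = [p for p in parameters if (category, p) not in white_entries]
        if kept:
            exclusive[category] = kept
    return exclusive


# Hypothetical log entries for illustration.
log = [
    ("CreateFile", "temp.log"),
    ("CreateFile", "payload.dll"),
    ("DnsQuery", "update.example"),
    ("DnsQuery", "c2.example"),
    ("CreateFile", "payload.dll"),
]
common = {("CreateFile", "temp.log")}
white = {("DnsQuery", "update.example")}

features = exclusive_feature_set(log, common, white)
print(features)
```

The resulting dictionary has the same shape as the variant sample exclusive feature sets it is later matched against, one list of behavior parameters per behavior category.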
In one embodiment, as shown in fig. 8, the apparatus further comprises: a variant feature library creating module 708, configured to classify known samples of a virus family to obtain a corresponding variant sample set; acquiring a behavior log of variant samples in a variant sample set; generating a variant sample exclusive feature set according to the behavior log of the variant sample; and creating a variant feature library by using the variant sample specific feature set.
In one embodiment, the matching module 706 is further configured to perform similarity matching between the behavior parameters of the newly added sample and the behavior parameters of the variant sample according to the behavior category; and calculate the similarity between the newly added sample and the variant sample according to the similarity and the weight corresponding to each behavior category.
In one embodiment, the matching module 706 is further configured to acquire the number of behavior parameters corresponding to the behavior category; select a similarity matching mode corresponding to the behavior category according to the number of behavior parameters; and perform similarity matching between the behavior parameters of the newly added sample and the behavior parameters of the known sample using the selected similarity matching mode.
In one embodiment, as shown in fig. 9, the apparatus further comprises: an early warning module 710, configured to identify the changed content between the newly added sample and the existing variant sample according to the matching result; and generate early warning information for the virus variant corresponding to the newly added sample based on the changed content.
FIG. 10 is a diagram illustrating an internal structure of a computer device in one embodiment. The computer device may specifically be the server 120 in fig. 1. As shown in fig. 10, the computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement a method of analyzing a variant of a computer virus. The internal memory may also have stored therein a computer program that, when executed by the processor, causes the processor to perform a method for analyzing a variant of a computer virus.
Those skilled in the art will appreciate that the architecture shown in fig. 10 is merely a block diagram of part of the structure related to the disclosed solution and does not limit the computer devices to which the disclosed solution applies; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, the apparatus for analyzing variants of computer viruses provided herein may be implemented in the form of a computer program that is executable on a computer device such as that shown in fig. 10. The memory of the computer device may store various program modules that make up the analysis means for the computer virus variant, such as the association identification module, the filtering module, and the matching module shown in FIG. 7. The program modules constitute computer programs that cause the processor to perform the steps of the method for analyzing a computer virus variant of the embodiments of the present application described in the present specification. For example, the computer device shown in FIG. 10 may perform the identification of the known sample associated with the newly added sample by the association identification module in the computer virus variant analysis apparatus shown in FIG. 7.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method of analyzing a computer virus variant described above. The steps of the method for analyzing a variant of a computer virus herein may be steps of the method for analyzing a variant of a computer virus of the various embodiments described above.
In one embodiment, a computer readable storage medium is provided, storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method for analyzing a variant of a computer virus described above. The steps of the method for analyzing a variant of a computer virus herein may be steps of the method for analyzing a variant of a computer virus of the various embodiments described above.
Those skilled in the art will understand that all or part of the processes of the methods in the above embodiments can be implemented by a computer program. The program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (12)

1. A method of analyzing variants of a computer virus, comprising:
acquiring a newly added sample, and identifying a known sample having an association relation with the newly added sample;
filtering the behavior log of the newly added sample to generate an exclusive characteristic set of the newly added sample; the exclusive characteristic set of the newly added sample comprises behavior categories;
acquiring a variant sample specific feature set corresponding to the known sample;
matching the newly added sample-specific feature set with the variant sample-specific feature set according to the behavior category;
and when the similarity obtained by matching exceeds a threshold value, marking the newly added sample as a variant sample of the known sample.
2. The method of claim 1, wherein the identifying the known samples associated with the added sample comprises:
acquiring a behavior sequence of a known sample in a virus family;
generating a corresponding behavior sequence by using the behavior log of the newly added sample;
and identifying the known sample which has an association relation with the newly added sample by comparing the behavior sequence of the newly added sample with the behavior sequence of the known sample in the virus family.
3. The method of claim 1, wherein the filtering the behavior log of the added sample comprises:
cleaning public behavior information and general behavior information in the behavior log of the newly added sample;
classifying the cleaned behavior logs according to the behavior categories to generate a newly added behavior feature set;
and screening the newly added behavior feature set by using a white sample set to obtain an exclusive feature set of the newly added sample.
4. The method of claim 1, further comprising:
classifying known samples of the virus family to obtain a corresponding variant sample set;
obtaining a behavior log of variant samples in the set of variant samples;
generating a variant sample exclusive feature set according to the behavior log of the variant sample;
and creating a variant feature library by using the variant sample specific feature set.
5. The method of claim 1, wherein said matching the new sample-specific feature set with the variant sample-specific feature set according to the behavior class comprises:
performing similarity matching on the behavior parameters of the newly-added sample and the behavior parameters of the variant sample by using the behavior category;
and calculating the similarity between the newly-added sample and the variant sample according to the similarity corresponding to the behavior category and the weight.
6. The method of claim 5, wherein the performing similarity matching between the behavior parameters of the newly added sample and the behavior parameters of the known samples comprises:
acquiring the number of behavior parameters corresponding to the behavior categories;
selecting a similarity matching mode corresponding to the behavior type according to the number of the behavior parameters;
and performing similarity matching on the behavior parameters of the newly added sample and the behavior parameters of the known sample in a similarity matching mode.
7. The method of any one of claims 1 to 6, further comprising, after said marking said new sample as a variant sample of said known sample:
identifying the change content between the newly added sample and the existing variant sample according to the matching result;
and generating early warning information of the virus variant corresponding to the newly-added sample based on the changed content.
8. An apparatus for analyzing a variant of a computer virus, said apparatus comprising:
the association identification module is used for acquiring a newly added sample and identifying a known sample which has an association relation with the newly added sample;
the filtering module is used for filtering the behavior log of the newly added sample to generate an exclusive characteristic set of the newly added sample; the exclusive characteristic set of the newly added sample comprises behavior categories;
the matching module is used for acquiring a variant sample exclusive feature set corresponding to the known sample; matching the newly added sample-specific feature set with the variant sample-specific feature set according to the behavior category; and when the similarity obtained by matching exceeds a threshold value, marking the newly added sample as a variant sample of the known sample.
9. The apparatus of claim 8, further comprising: a variant feature library creating module, used for classifying known samples of virus families to obtain a corresponding variant sample set; obtaining a behavior log of variant samples in the set of variant samples; generating a variant sample exclusive feature set according to the behavior log of the variant sample; and creating a variant feature library by using the variant sample specific feature set.
10. The apparatus of claim 8, wherein the matching module is further configured to perform similarity matching between the behavior parameters of the new sample and the behavior parameters of the variant samples by using the behavior classes; and calculating the similarity between the newly-added sample and the variant sample according to the similarity corresponding to the behavior category and the weight.
11. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 7.
12. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 7.
CN202010074842.7A 2020-01-22 2020-01-22 Analysis method and device for computer virus variants and computer equipment Active CN111310179B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010074842.7A CN111310179B (en) 2020-01-22 2020-01-22 Analysis method and device for computer virus variants and computer equipment

Publications (2)

Publication Number Publication Date
CN111310179A true CN111310179A (en) 2020-06-19
CN111310179B CN111310179B (en) 2024-07-09

Family

ID=71159793

Country Status (1)

Country Link
CN (1) CN111310179B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101350052A (en) * 2007-10-15 2009-01-21 北京瑞星国际软件有限公司 Method and apparatus for discovering malignancy of computer program
US20130046763A1 (en) * 2011-08-18 2013-02-21 Verisign, Inc. Systems and methods for identifying associations between malware samples
CN103139169A (en) * 2011-11-30 2013-06-05 西门子公司 Virus detection system and method based on network behavior
CN103902897A (en) * 2012-12-26 2014-07-02 腾讯科技(深圳)有限公司 Differentiating method and system for computer virus
CN104036187A (en) * 2013-03-04 2014-09-10 阿里巴巴集团控股有限公司 Method and system for determining computer virus types
CN104424435A (en) * 2013-08-22 2015-03-18 腾讯科技(深圳)有限公司 Method and device for acquiring virus characteristic code
CN104598820A (en) * 2015-01-14 2015-05-06 国家电网公司 Trojan virus detection method based on feature behavior activity
CN104640105A (en) * 2013-11-12 2015-05-20 严威 Method and system for mobile phone virus analyzing and threat associating
WO2016127037A1 (en) * 2015-02-06 2016-08-11 Alibaba Group Holding Limited Method and device for identifying computer virus variants
US20160306972A1 (en) * 2014-02-18 2016-10-20 Tencent Technology (Shenzhen) Company Limited Virus signature matching method and apparatus
US20180293380A1 (en) * 2016-05-06 2018-10-11 Tencent Technology (Shenzhen) Company Limited Virus program detection method, terminal, and computer readable storage medium
CN110210218A (en) * 2018-04-28 2019-09-06 腾讯科技(深圳)有限公司 A kind of method and relevant apparatus of viral diagnosis
CN110457903A (en) * 2019-07-24 2019-11-15 腾讯科技(深圳)有限公司 A kind of virus analysis method, apparatus, equipment and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Chen Qi: "Homology Analysis of Malicious Code Based on Static Structure", Computer Engineering and Applications (《计算机工程与应用》), vol. 53, no. 14, 1 April 2017 (2017-04-01), pages 93 - 98 *

Also Published As

Publication number Publication date
CN111310179B (en) 2024-07-09

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40024318)
SE01 Entry into force of request for substantive examination
GR01 Patent grant