CN116861430B - Malicious file detection method, device, equipment and medium - Google Patents

Publication number
CN116861430B
Authority
CN
China
Prior art keywords
file
behavior
malicious
detected
behavior information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311131112.6A
Other languages
Chinese (zh)
Other versions
CN116861430A (en)
Inventor
田国新
奚广生
白富宽
孙晋超
肖新光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Antiy Network Technology Co Ltd
Original Assignee
Beijing Antiy Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Antiy Network Technology Co Ltd
Priority to CN202311131112.6A
Publication of CN116861430A
Application granted
Publication of CN116861430B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/565 Static detection by checking file integrity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033 Test or assess software

Abstract

The invention provides a malicious file detection method, device, equipment and medium, and relates to the field of security detection. The method comprises the following steps: acquiring file characteristics of a file to be detected; detecting the file characteristics to obtain a detection result; if the detection result indicates that the file to be detected is not a malicious file, determining a first behavior vector of the file to be detected; obtaining a first matching degree between the first behavior vector and each first sample vector; if any first matching degree is greater than a preset matching degree threshold, determining a second behavior vector of the file to be detected; inputting the second behavior vector into a target model to obtain a corresponding target file identifier; and if the target file identifier is a malicious file identifier, determining that the file to be detected is a malicious file. By detecting the file to be detected in stages and comparing its file behaviors with combinations of multiple malicious behaviors of malicious sample files, the method determines whether the file to be detected is a malicious file, which improves detection precision, applicability and detection efficiency.

Description

Malicious file detection method, device, equipment and medium
Technical Field
The present invention relates to the field of security detection, and in particular, to a method, apparatus, device, and medium for detecting malicious files.
Background
Existing malicious file detection methods are mostly general-purpose: they judge whether a file is malicious by checking file attributes, digital signatures, system detection and the like. Detection based on file attributes has poor accuracy and is prone to false detections. Accurate detection based on system detection, on the other hand, requires a separate detection strategy for each type of malicious file; because there are many types of malicious files, detection efficiency is low.
Disclosure of Invention
In view of this, the application provides a malicious file detection method, device, equipment and medium, which at least partially solve the technical problem of low detection efficiency in the prior art, and adopts the following technical scheme:
according to one aspect of the present application, there is provided a malicious file detection method applied to a file detection system, the malicious file detection method including the steps of:
in response to receiving a file to be detected, acquiring file characteristics of the file to be detected;
detecting file characteristics to obtain a detection result corresponding to the file to be detected;
if the detection result corresponding to the file to be detected indicates that the file to be detected is not a malicious file, acquiring a plurality of pieces of general file behavior information of behaviors performed by the file to be detected in a first preset time period;
determining a first behavior vector of the file to be detected according to the plurality of pieces of general file behavior information;
obtaining a first matching degree between the first behavior vector and each first sample vector according to the first behavior vector and a plurality of first sample vectors; the first sample vectors are obtained according to the general file behavior information of malicious sample files; each first sample vector corresponds to a preset malicious attack type;
if any first matching degree is greater than a preset matching degree threshold, acquiring a plurality of pieces of full-quantity file behavior information of behaviors performed by the file to be detected in a second preset time period; the number of file behavior types corresponding to the full-quantity file behavior information is greater than the number of file behavior types corresponding to the general file behavior information; the second preset time period is after the first preset time period;
determining a second behavior vector of the file to be detected according to the plurality of pieces of full-quantity file behavior information;
inputting the second behavior vector into a target model to obtain a corresponding target file identifier; the target model is obtained by training according to the file behaviors of malicious sample files; the target file identifier is used for identifying whether the file to be detected is a malicious file;
if the target file identifier is a malicious file identifier, determining that the file to be detected is a malicious file.
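The staged flow of the steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `match_degree` and `detect` are hypothetical, the matching degree is assumed here to be the fraction of positions at which two binary vectors agree (this passage does not define how it is computed), and a file whose matching degrees all stay at or below the threshold is assumed to be treated as not malicious, which this passage does not state.

```python
from typing import Callable, Sequence


def match_degree(u: Sequence[int], v: Sequence[int]) -> float:
    """ASSUMED matching degree: share of positions where two binary vectors agree."""
    if len(u) != len(v) or not u:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum(1 for a, b in zip(u, v) if a == b) / len(u)


def detect(
    feature_hit: bool,
    first_vector: Sequence[int],
    sample_vectors: Sequence[Sequence[int]],
    threshold: float,
    second_stage: Callable[[], bool],
) -> bool:
    """Return True if the staged flow judges the file to be malicious.

    second_stage is a hypothetical callable that collects the full-quantity
    behaviors, builds the second behavior vector, runs the target model and
    returns True when the target file identifier is a malicious identifier.
    """
    # Stage 1: a file characteristic equals a preset abnormal characteristic.
    if feature_hit:
        return True
    # Stage 2: compare the first behavior vector with every first sample vector.
    degrees = [match_degree(first_vector, s) for s in sample_vectors]
    if not any(d > threshold for d in degrees):
        return False  # ASSUMPTION: this branch is not specified in the text
    # Stage 3: run the expensive full-quantity analysis only when needed.
    return second_stage()
```

Passing the last stage as a callable keeps the costly full-quantity monitoring from running unless an earlier stage calls for it, which mirrors the efficiency argument made in the text.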
In an exemplary embodiment of the application, the file characteristics include at least one of:
hash value, file structure information, MD5 value and file code feature of the file to be detected;
detecting the file characteristics to obtain a detection result corresponding to the file to be detected, including:
comparing the file characteristics of the file to be detected with the preset abnormal file characteristics;
if any file characteristic of the file to be detected is the same as the corresponding preset abnormal file characteristic, the detection result indicates that the file to be detected is a malicious file; otherwise, the detection result indicates that the file to be detected is not a malicious file.
In an exemplary embodiment of the application, the generic file behavior information is determined by:
obtaining a plurality of pieces of file behavior information of the behaviors performed by m malicious sample files within a third preset time period $T_3 = [t_{31}, t_{32}]$, to obtain a sample file behavior information set $F = (F_1, F_2, \ldots, F_j, \ldots, F_m)$, with $F_j = (F_{j1}, F_{j2}, \ldots, F_{jd}, \ldots, F_{jf(j)})$; where $j = 1, 2, \ldots, m$; $d = 1, 2, \ldots, f(j)$; $f(j)$ is the number of pieces of file behavior information of the behaviors performed by the $j$-th malicious sample file within $T_3$; $F_j$ is the file behavior information list corresponding to the $j$-th malicious sample file; $F_{jd}$ is the $d$-th piece of file behavior information of the $j$-th malicious sample file within $T_3$; $t_{31} < t_{32}$; $t_{31}$ is the start time of $T_3$; $t_{32}$ is the end time of $T_3$;
performing de-duplication on $F$ to obtain b pieces of target malicious behavior information; each piece of target malicious behavior information corresponds to a plurality of malicious behavior type identifiers;
grouping the b pieces of target malicious behavior information according to the malicious behavior type identifiers, determining z malicious behavior type identification groups, and obtaining a malicious behavior type identification set $L = (L_1, L_2, \ldots, L_x, \ldots, L_z)$, with $L_x = (L_{x1}, L_{x2}, \ldots, L_{xv}, \ldots, L_{xf(x)})$; where $x = 1, 2, \ldots, z$; $v = 1, 2, \ldots, f(x)$; $f(x)$ is the number of pieces of target malicious behavior information in the $x$-th malicious behavior type identification group; $L_x$ is the target malicious behavior information list corresponding to the $x$-th malicious behavior type identification group; $L_{xv}$ is the $v$-th piece of target malicious behavior information included in the $x$-th malicious behavior type identification group;
taking the intersection of $L_1, L_2, \ldots, L_x, \ldots, L_z$ and determining the w pieces of target malicious behavior information obtained as the general file behavior information.
In an exemplary embodiment of the present application, obtaining a plurality of pieces of general file behavior information of a file to be detected performed in a first preset time period includes:
after a first preset time period $T_1$ ends, acquiring a plurality of pieces of general file behavior information of the file to be detected, to obtain a general file behavior information set $J = (J_1, J_2, \ldots, J_s, \ldots, J_o)$; where $s = 1, 2, \ldots, o$; $o \le w$; $o$ is the number of pieces of general file behavior information of the behaviors performed by the file to be detected within $T_1$; $J_s$ is the $s$-th piece of general file behavior information of the file to be detected within $T_1$; $T_1 = [t_{11}, t_{12}]$; $t_{12} > t_{11} > t_{32}$; $(t_{12} - t_{11}) = (t_{32} - t_{31})$; $t_{11}$ is the start time of $T_1$; $t_{12}$ is the end time of $T_1$.
In an exemplary embodiment of the application, the first behavior vector is determined by:
obtaining a general preset behavior feature vector $A = (A_1, A_2, \ldots, A_c, \ldots, A_w)$ according to the w pieces of general file behavior information; where $c = 1, 2, \ldots, w$; $A_c$ is the behavior feature in $A$ corresponding to the $c$-th piece of general file behavior information;
traversing $A$: if the general file behavior information corresponding to $A_c$ exists in $J$, setting $A_c$ to 1; otherwise, setting $A_c$ to 0;
determining $A$ as the first behavior vector of the file to be detected.
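The traversal above amounts to a binary presence encoding over a fixed, ordered list of behaviors. A minimal sketch follows; the helper name `behavior_vector` and the behavior labels are hypothetical and chosen only for illustration. The same encoding would apply to the second behavior vector, with the b target malicious behaviors as the reference list and Q as the observed set.

```python
from typing import Hashable, Iterable, List, Sequence


def behavior_vector(
    reference: Sequence[Hashable], observed: Iterable[Hashable]
) -> List[int]:
    """Return a 0/1 vector aligned with `reference`.

    Position c is 1 when the c-th reference behavior appears among the
    observed behaviors, and 0 otherwise.
    """
    seen = set(observed)
    return [1 if behavior in seen else 0 for behavior in reference]


# Hypothetical labels standing in for the w general file behaviors.
general = ["create_process", "write_registry", "open_socket", "delete_file"]
# Hypothetical set J observed for the file to be detected within T1.
J = ["open_socket", "create_process"]

A = behavior_vector(general, J)
print(A)  # [1, 0, 1, 0]
```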
In an exemplary embodiment of the present application, obtaining a plurality of pieces of full-quantity file behavior information of a file to be detected performed in a second preset time period includes:
after a second preset time period $T_2$ ends, acquiring a plurality of pieces of full-quantity file behavior information of the behaviors performed by the file to be detected, to obtain a full-quantity file behavior information set $Q = (Q_1, Q_2, \ldots, Q_i, \ldots, Q_n)$; where $i = 1, 2, \ldots, n$; $n$ is the number of pieces of full-quantity file behavior information of the behaviors performed by the file to be detected within $T_2$; $Q_i$ is the $i$-th piece of full-quantity file behavior information of the file to be detected within $T_2$; $T_2 = [t_{21}, t_{22}]$; $t_{22} > t_{21} > t_{11}$; $t_{21}$ is the start time of $T_2$; $t_{22}$ is the end time of $T_2$.
In an exemplary embodiment of the application, the second behavior vector is determined by:
obtaining a first preset behavior feature vector $E = (E_1, E_2, \ldots, E_a, \ldots, E_b)$ according to the b pieces of target malicious behavior information; where $a = 1, 2, \ldots, b$; $E_a$ is the behavior feature in $E$ corresponding to the $a$-th piece of target malicious behavior information;
traversing $E$: if the target malicious behavior information corresponding to $E_a$ exists in $Q$, setting $E_a$ to 1; otherwise, setting $E_a$ to 0;
determining $E$ as the second behavior vector of the file to be detected.
According to an aspect of the present application, there is provided a malicious file detection apparatus including:
the file characteristic acquisition module is used for acquiring file characteristics of the file to be detected when the file to be detected is received;
the file feature detection module is used for detecting file features to obtain detection results corresponding to the files to be detected;
the universal behavior acquisition module is used for acquiring a plurality of pieces of universal file behavior information of the file to be detected in a first preset time period when the detection result corresponding to the file to be detected indicates that the file to be detected is not a malicious file;
the first vector determining module is used for determining a first behavior vector of the file to be detected according to the plurality of pieces of general file behavior information;
the matching degree determining module is used for obtaining a first matching degree of the first behavior vector and each first sample vector according to the first behavior vector and the plurality of first sample vectors; the first sample vector is obtained according to the general file behavior information of the malicious sample file; each first sample vector corresponds to a preset malicious attack type;
the full-quantity behavior acquisition module is used for acquiring a plurality of pieces of full-quantity file behavior information of the file to be detected in a second preset time period when any first matching degree is larger than a preset matching degree threshold value; the number of the file behavior types corresponding to the full-quantity file behavior information is more than that of the file behavior types corresponding to the general file behavior information; the second preset time period is after the first preset time period;
the second vector determining module is used for determining a second behavior vector of the file to be detected according to the plurality of pieces of full-quantity file behavior information;
the file identification determining module is used for inputting the second behavior vector into the target model to obtain a corresponding target file identification; the target model is obtained by training according to file behaviors of malicious sample files; the target file identifier is used for identifying whether the file to be detected is a malicious file or not;
and the malicious file determining module is used for determining that the file to be detected is a malicious file when the target file identifier is a malicious file identifier.
According to one aspect of the present application, there is provided a non-transitory computer readable storage medium having stored therein at least one instruction or at least one program loaded and executed by a processor to implement the foregoing malicious file detection method.
According to one aspect of the present application, there is provided an electronic device comprising a processor and the aforementioned non-transitory computer-readable storage medium.
The application has at least the following beneficial effects:
According to the method, the file characteristics of the file to be detected are acquired and the file is preliminarily detected. If a file characteristic is the same as an abnormal file characteristic, the file to be detected is determined to be a malicious file. If the file characteristics differ from the abnormal file characteristics, the general file behavior information of the file to be detected is acquired, the first behavior vector is determined from it and compared with each first sample vector, and the corresponding first matching degrees are obtained. If a first matching degree is greater than the preset matching degree threshold, the file to be detected is more likely to be a malicious file; in that case, the second behavior vector is determined from the full-quantity file behaviors of the file to be detected and input into the target model, which is obtained by training on the file behaviors of malicious sample files, to obtain the corresponding target file identifier. If the target file identifier is a malicious file identifier, the file to be detected is determined to be a malicious file.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a malicious file detection method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for determining a target model according to an embodiment of the present invention;
FIG. 3 is a block diagram of a malicious file detection apparatus according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
A malicious file detection method is applied to a file detection system, and the file detection system is used for carrying out malicious detection on a file to be detected and detecting whether the file to be detected is a malicious file or not.
As shown in fig. 1, the malicious file detection method includes the following steps:
Step S100, in response to receiving a file to be detected, acquiring the file characteristics of the file to be detected;
The file to be detected is a file that the file detection system has received and that has not yet undergone malicious detection. After receiving the file to be detected, the file detection system acquires its file characteristics, which include one or more of the hash value, file structure information, MD5 value, file code features and the like of the file to be detected. Whether the file to be detected is a malicious file is then judged by detecting these file characteristics.
Step S200, detecting the file characteristics to obtain a detection result corresponding to the file to be detected;
Detection of the file characteristics is a preliminary detection of the file to be detected; it is convenient and takes little time. To improve detection efficiency, the method therefore first detects the file characteristics of the file to be detected. If this detection shows that the file is a malicious file, the file can be determined to be malicious directly, without the subsequent detection steps, which simplifies the malicious detection flow. If this detection shows that the file is not a malicious file, the subsequent detection steps must still be performed to determine accurately whether the file is malicious, because file characteristic detection has poor accuracy even when it indicates that the file characteristics are normal.
Further, in step S200, detecting the file feature to obtain a detection result corresponding to the file to be detected, including:
step S210, comparing the hash value, the file structure information, the MD5 value and the file code characteristic of the file to be detected with a preset abnormal hash value, a preset abnormal file structure information, a preset abnormal MD5 value and a preset abnormal file code characteristic to obtain a detection result corresponding to the file to be detected;
step S220, if the hash value is the same as the preset abnormal hash value, or the file structure information is the same as the preset abnormal file structure information, or the MD5 value is the same as the preset abnormal MD5 value, or the file code feature is the same as the preset abnormal file code feature, the detection result indicates that the file to be detected is a malicious file; otherwise, the detection result indicates that the file to be detected is not a malicious file.
Because the number of the abnormal file features is smaller than that of the normal file features and the abnormal file features are easy to acquire, the file features of the file to be detected are compared with the abnormal file features, and a corresponding detection result is obtained. The abnormal file characteristics can be called from a data storage library of the server system, and can also be obtained by analyzing malicious sample files.
If any one of the file characteristics of the file to be detected is the same as the corresponding abnormal file characteristic, the file to be detected is a malicious file. If all the file characteristics of the file to be detected differ from the corresponding abnormal file characteristics, the detection result indicates that the file is not a malicious file, and the subsequent steps are needed to further judge whether it is a malicious file.
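The comparison in steps S210 and S220 can be illustrated with a short sketch. It covers only the two digest-style characteristics; file structure information and file code features are omitted. The function names are hypothetical, and using SHA-256 for the "hash value" is an assumption, since the text names only MD5 explicitly.

```python
import hashlib
from typing import Dict, Set


def file_features(data: bytes) -> Dict[str, str]:
    """Compute digest-style characteristics of a file's contents.

    ASSUMPTION: SHA-256 stands in for the unspecified "hash value".
    """
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def is_flagged(features: Dict[str, str], abnormal: Dict[str, Set[str]]) -> bool:
    """Return True if ANY characteristic equals a preset abnormal value.

    `abnormal` maps a characteristic name to the set of known-bad values.
    """
    return any(
        value in abnormal.get(name, set()) for name, value in features.items()
    )
```

A result of `False` only means that no characteristic matched; as the text notes, the behavior-based steps still have to run in that case.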
Step S300, if a detection result corresponding to the file to be detected indicates that the file to be detected is not a malicious file, acquiring a plurality of pieces of general file behavior information of the file to be detected in a first preset time period;
Each piece of general file behavior information corresponds to one general file behavior. A general file behavior is a file behavior that is included in every one of the malicious behavior types corresponding to the malicious sample files; that is, the same general file behavior corresponds to all malicious behavior types, so a malicious sample file of any malicious behavior type performs the general file behaviors when it carries out its malicious operation. Detecting the general file behaviors of the file to be detected is therefore the second step of malicious detection: the more general file behaviors the file to be detected performs, the more likely it is to be a malicious file.
Further, in step S300, obtaining a plurality of pieces of general file behavior information of the file to be detected in the first preset time period includes:
Step S301, after a first preset time period $T_1$ ends, acquiring a plurality of pieces of general file behavior information of the file to be detected, to obtain a general file behavior information set $J = (J_1, J_2, \ldots, J_s, \ldots, J_o)$; where $s = 1, 2, \ldots, o$; $o \le w$; $o$ is the number of pieces of general file behavior information of the behaviors performed by the file to be detected within $T_1$; $J_s$ is the $s$-th piece of general file behavior information of the file to be detected within $T_1$; $T_1 = [t_{11}, t_{12}]$; $t_{12} > t_{11} > t_{32}$; $(t_{12} - t_{11}) = (t_{32} - t_{31})$; $t_{11}$ is the start time of $T_1$; $t_{12}$ is the end time of $T_1$.
The first preset time period is a time period after the file detection system receives the file to be detected. The behavior of the file to be detected may be monitored either directly in the server system or in a sandbox. If the size of the file to be detected is smaller than a preset file size value, the file can perform fewer types of behavior, and its behavior may be monitored directly in the server system. If the size is greater than or equal to the preset file size value, the file can perform more types of behavior, so for safety it is placed in a sandbox and monitored there; even if the file performs malicious behavior, it does not harm the server system. If the file is found not to be malicious, it is moved from the sandbox to the server system, which ensures the information security of the server system.
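The size-based choice of monitoring environment described above reduces to a single comparison. The sketch below is illustrative only; the function name, the return labels and the use of bytes as the unit are assumptions.

```python
def choose_environment(file_size: int, size_threshold: int) -> str:
    """Pick where to monitor the file's behavior.

    Files smaller than the preset size value are monitored directly in the
    server system; files at or above it are monitored in a sandbox.
    """
    if file_size < 0 or size_threshold < 0:
        raise ValueError("sizes must be non-negative")
    return "server" if file_size < size_threshold else "sandbox"
```

Note that the boundary case (size equal to the preset value) goes to the sandbox, matching the "greater than or equal to" wording in the text.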
Specifically, the general file behavior information is determined by the following steps:
Step S310, obtaining a plurality of pieces of file behavior information of the behaviors performed by m malicious sample files within a third preset time period $T_3 = [t_{31}, t_{32}]$, to obtain a sample file behavior information set $F = (F_1, F_2, \ldots, F_j, \ldots, F_m)$, with $F_j = (F_{j1}, F_{j2}, \ldots, F_{jd}, \ldots, F_{jf(j)})$; where $j = 1, 2, \ldots, m$; $d = 1, 2, \ldots, f(j)$; $f(j)$ is the number of pieces of file behavior information of the behaviors performed by the $j$-th malicious sample file within $T_3$; $F_j$ is the file behavior information list corresponding to the $j$-th malicious sample file; $F_{jd}$ is the $d$-th piece of file behavior information of the $j$-th malicious sample file within $T_3$; $t_{31} < t_{32}$; $t_{31}$ is the start time of $T_3$; $t_{32}$ is the end time of $T_3$;
Each piece of target malicious behavior information corresponds to one target malicious behavior, and the target malicious behaviors are determined from the malicious sample files. The malicious sample files are files already known to be malicious, for example malicious files collected during a statistical period or historical malicious files stored in a server database. The file behaviors performed by the m malicious sample files within $T_3$ are obtained, where $T_3$ is a historical time period. Because different malicious sample files may perform the same file behavior, all of the obtained file behaviors are de-duplicated.
Step S320, performing deduplication processing on the F to obtain b pieces of target malicious behavior information; each piece of target malicious behavior information corresponds to a plurality of malicious behavior type identifiers;
And obtaining b file behaviors after the file behaviors of all the malicious sample files are de-duplicated, and determining the file behaviors as target malicious behaviors, wherein the corresponding information is target malicious behavior information.
Each malicious sample file corresponds to a malicious behavior type identifier, the target malicious behavior performed by the malicious sample file also corresponds to the malicious behavior type identifier, the malicious behavior type identifier represents the identifier of the malicious behavior type performed by the corresponding malicious sample file, and the malicious behavior type is a malicious attack type and represents the attack means of the corresponding malicious sample file.
Step S330, grouping the b pieces of target malicious behavior information according to the plurality of malicious behavior type identifiers, determining z malicious behavior type identification groups, and obtaining a malicious behavior type identification set $L=(L_1,L_2,\ldots,L_x,\ldots,L_z)$; $L_x=(L_{x1},L_{x2},\ldots,L_{xv},\ldots,L_{xf(x)})$; where $x=1,2,\ldots,z$; $v=1,2,\ldots,f(x)$; $f(x)$ is the number of pieces of target malicious behavior information in the x-th malicious behavior type identification group; $L_x$ is the target malicious behavior information list corresponding to the x-th malicious behavior type identification group; $L_{xv}$ is the v-th piece of target malicious behavior information included in the x-th malicious behavior type identification group;
each malicious behavior type identification group comprises all target malicious behavior information corresponding to the malicious behavior type identification.
Step S340, performing intersection processing on $L_1,L_2,\ldots,L_x,\ldots,L_z$, and determining the w pieces of target malicious behavior information obtained after the intersection processing as general file behavior information.
The target malicious behavior information common to all malicious behavior type identification groups is the general file behavior information.
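Steps S310 to S340 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `general_file_behaviors`, the representation of a sample as a `(type_id, behaviors)` pair, and the behavior names in the usage note are all assumptions introduced here.

```python
from collections import defaultdict


def general_file_behaviors(samples):
    """Return the behaviors shared by every malicious behavior type group.

    samples: iterable of (type_id, behaviors) pairs, one per malicious
    sample file (hypothetical representation of F and the type identifiers).
    """
    # S320: deduplicate behaviors across all samples, remembering which
    # malicious behavior type identifiers each behavior is associated with.
    types_by_behavior = defaultdict(set)
    for type_id, behaviors in samples:
        for behavior in behaviors:
            types_by_behavior[behavior].add(type_id)

    # S330: group the deduplicated behaviors by type identifier (L_1..L_z).
    groups = defaultdict(set)
    for behavior, type_ids in types_by_behavior.items():
        for type_id in type_ids:
            groups[type_id].add(behavior)

    # S340: the intersection of all groups is the general behavior set.
    if not groups:
        return []
    return sorted(set.intersection(*groups.values()))
```

For example, with two "ransom" samples performing `self_start`, `encrypt`, `scan` and `registry_write` between them, and one "stealer" sample performing `self_start`, `steal_info` and `scan`, the function returns `["scan", "self_start"]`, since only those two behaviors appear in both groups.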
Step S400, determining a first behavior vector of a file to be detected according to behavior information of a plurality of general files;
The general file behavior information of the file to be detected is obtained to derive the corresponding first behavior vector, which records which general file behaviors, and therefore how many, the file to be detected has performed; the likelihood that the file to be detected is a malicious file is then judged by further examining these general file behaviors.
Specifically, the first behavior vector is determined by:
Step S410, obtaining a general preset behavior feature vector $A=(A_1,A_2,\ldots,A_c,\ldots,A_w)$ according to the w pieces of general file behavior information; where $c=1,2,\ldots,w$; $A_c$ is the behavior feature in A corresponding to the c-th piece of general file behavior information;
Step S420, traversing A; if the general file behavior information corresponding to $A_c$ exists in J, $A_c$ is determined to be 1; otherwise, $A_c$ is determined to be 0;
step S430, determining A as a first behavior vector of the file to be detected.
Comparing each file behavior of the file to be detected with each general file behavior, if the file behavior of the file to be detected comprises the general file behavior, determining the corresponding behavior characteristic in the general preset behavior characteristic vector corresponding to the general file behavior as 1, otherwise, determining the corresponding behavior characteristic in the general preset behavior characteristic vector corresponding to the general file behavior as 0, and obtaining the first behavior vector of the file to be detected.
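The traversal in steps S410 to S430 amounts to building a binary indicator vector. The sketch below assumes behaviors are represented as strings and that `feature_order` fixes the position of each general file behavior in A; both names are hypothetical.

```python
def behavior_vector(feature_order, observed):
    """Build a 0/1 vector: position c is 1 if the c-th behavior in
    feature_order was observed for the file, otherwise 0."""
    observed_set = set(observed)  # O(1) membership tests during traversal
    return [1 if behavior in observed_set else 0 for behavior in feature_order]
```

The same helper applies unchanged to the second behavior vector and to the sample vectors, because only the feature list and the observed behavior set differ.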
Step S500, obtaining a first matching degree of the first behavior vector and each first sample vector according to the first behavior vector and a plurality of first sample vectors; the first sample vector is obtained according to the general file behavior information of the malicious sample file; each first sample vector corresponds to a preset malicious attack type;
the first sample vector is a vector obtained according to the general file behavior information of the malicious sample files, represents an average vector of general file behaviors of the malicious sample files corresponding to preset malicious attack types, and after the first sample vector is obtained, performs matching degree calculation on each first sample vector and the first behavior vector to obtain a corresponding plurality of first matching degrees.
Specifically, the first sample vector is determined by:
Step S510, obtaining a sample general behavior vector corresponding to each malicious sample file according to the general file behavior information performed by each malicious sample file within $T_3$;
The sample general behavior vector is determined in the same way as the first behavior vector: the general file behavior information performed by each malicious sample file within $T_3$ is obtained to form the corresponding sample general behavior vector.
Step S520, dividing m malicious sample files into z sample file groups according to z malicious behavior type identifications;
grouping the plurality of malicious sample files according to the corresponding malicious behavior type identifiers to obtain a plurality of sample file groups.
Step S530, performing mean processing on the plurality of sample general behavior vectors corresponding to the plurality of malicious sample files in each sample file group to obtain a first sample vector corresponding to each sample file group.
Each first sample vector is the average of the sample general behavior vectors belonging to the corresponding malicious behavior type.
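A minimal sketch of steps S510 to S530 and of the matching in step S500 follows. The text does not state how the first matching degree is computed, so cosine similarity is used here purely as an assumed stand-in; the function names `first_sample_vectors` and `cosine_match` are likewise hypothetical.

```python
import math
from collections import defaultdict


def first_sample_vectors(sample_vectors, type_ids):
    """S520/S530: group sample general behavior vectors by malicious
    behavior type identifier and average each group element-wise."""
    grouped = defaultdict(list)
    for vector, type_id in zip(sample_vectors, type_ids):
        grouped[type_id].append(vector)
    return {
        type_id: [sum(column) / len(vectors) for column in zip(*vectors)]
        for type_id, vectors in grouped.items()
    }


def cosine_match(u, v):
    """Assumed matching degree: cosine similarity, 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Any other similarity measure with a comparable threshold could be substituted for `cosine_match` without changing the surrounding steps.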
Step S600, if any first matching degree is larger than a preset matching degree threshold value, acquiring a plurality of pieces of full-quantity file behavior information of the file to be detected in a second preset time period; the number of the file behavior types corresponding to the full-quantity file behavior information is more than that of the file behavior types corresponding to the general file behavior information; the second preset time period is after the first preset time period;
If a first matching degree is greater than the preset matching degree threshold, the malicious behavior of the file to be detected tends toward the corresponding malicious behavior type, and the file to be detected is considered highly likely to be a malicious file; further detection then continues, that is, the full-quantity file behaviors of the file to be detected are checked to determine whether it is a malicious file. Otherwise, if all first matching degrees are less than or equal to the preset matching degree threshold, the file to be detected is currently judged unlikely to be a malicious file, and its general behaviors continue to be monitored; if, after monitoring has continued for a preset general behavior monitoring duration, the obtained first matching degrees are still less than or equal to the preset matching degree threshold, the file to be detected is determined to be a normal file.
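The branching described for step S600 can be sketched as a small decision function. The return labels and the `monitoring_elapsed` flag are hypothetical names introduced for illustration only.

```python
def stage_one_decision(matching_degrees, threshold, monitoring_elapsed):
    """Decide the next action after the first matching degrees are computed.

    monitoring_elapsed: True once the preset general behavior monitoring
    duration has passed (assumed to be tracked by the caller).
    """
    # Any single matching degree above the threshold triggers full detection.
    if any(degree > threshold for degree in matching_degrees):
        return "collect_full_behaviors"
    # Otherwise keep watching general behaviors until the duration expires.
    return "normal" if monitoring_elapsed else "keep_monitoring"
```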
Further, in step S600, obtaining a plurality of pieces of full-quantity file behavior information of the file to be detected in the second preset time period includes:
Step S610, after the second preset time period $T_2$ ends, acquiring a plurality of pieces of full-quantity file behavior information of the file to be detected to obtain a full-quantity file behavior information set $Q=(Q_1,Q_2,\ldots,Q_i,\ldots,Q_n)$; where $i=1,2,\ldots,n$; n is the number of pieces of full-quantity file behavior information performed by the file to be detected within $T_2$; $Q_i$ is the i-th piece of full-quantity file behavior information performed by the file to be detected within $T_2$; $T_2=[t_{21},t_{22}]$; $t_{22}>t_{21}>t_{11}$; $t_{21}$ is the start time of $T_2$; $t_{22}$ is the end time of $T_2$.
Each piece of full-quantity file behavior information corresponds to a full-quantity file behavior, where the full-quantity file behaviors are all file behaviors performed by the file to be detected within $T_2$.
Step S700, determining a second behavior vector of the file to be detected according to the behavior information of the plurality of full files;
Each piece of full-quantity file behavior information corresponds to a file behavior; file behaviors include self-starting, registry generation, scanning, encryption, information stealing, and other behaviors. The file behaviors of the file to be detected include normal file behaviors and abnormal file behaviors, where an abnormal file behavior is a behavior that steals user information or system information. By detecting all file behaviors of the file to be detected within the second preset time period, that is, the full-quantity file behaviors, it is comprehensively judged whether the file to be detected executes malicious behaviors, and thus whether it is a malicious file.
Specifically, the second behavior vector is determined by:
Step S710, obtaining a first preset behavior feature vector $E=(E_1,E_2,\ldots,E_a,\ldots,E_b)$ according to the b pieces of target malicious behavior information; where $a=1,2,\ldots,b$; $E_a$ is the behavior feature in E corresponding to the a-th piece of target malicious behavior information;
each behavior feature in the first preset behavior feature vector corresponds to a file behavior.
Step S720, traversing E; if the target malicious behavior information corresponding to $E_a$ exists in Q, $E_a$ is determined to be 1; otherwise, $E_a$ is determined to be 0;
Whether the file behaviors of the file to be detected contain each target malicious behavior is checked: if they contain the corresponding target malicious behavior, the corresponding behavior feature in the first preset behavior feature vector is determined to be 1; if they do not, the corresponding behavior feature is determined to be 0.
Step S730, determining E as a second behavior vector of the file to be detected.
Step S800, inputting a second behavior vector into the target model to obtain a corresponding target file identifier; the target model is obtained by training according to file behaviors of malicious sample files; the target file identifier is used for identifying whether the file to be detected is a malicious file or not;
after the second behavior vector is obtained, the second behavior vector is input into a target model, a corresponding target file identifier is output by the target model, and whether the file to be detected is a malicious file is judged through the target file identifier.
Specifically, as shown in fig. 2, the target model is determined by:
Step S810, obtaining the malicious behavior type identifiers corresponding to the m malicious sample files to obtain a malicious behavior type identifier set $H=(H_1,H_2,\ldots,H_j,\ldots,H_m)$; where $H_j$ is the malicious behavior type identifier corresponding to the j-th malicious sample file;
step S820, determining e malicious behavior type identifiers obtained after the duplication removal processing of the H as malicious file identifiers;
Because different malicious sample files may have the same malicious behavior type identifier, deduplication is required, and the malicious behavior type identifiers remaining after deduplication are determined to be malicious file identifiers.
Step S830, obtaining m second preset behavior feature vectors $G_1,G_2,\ldots,G_j,\ldots,G_m$ according to F; $G_j=(G_{j1},G_{j2},\ldots,G_{ja},\ldots,G_{jb})$; where $G_j$ is the second preset behavior feature vector corresponding to the j-th malicious sample file; $G_{ja}$ is the behavior feature corresponding to the a-th piece of target malicious behavior information of the j-th malicious sample file; the target malicious behavior corresponding to $G_{ja}$ is the same as the target malicious behavior corresponding to $E_a$;
presetting a second preset behavior feature vector corresponding to each malicious sample file according to file behaviors of the malicious sample files.
Step S840, traversing $G_j$; if the target malicious behavior information corresponding to $G_{ja}$ exists in $F_j$, $G_{ja}$ is determined to be 1; otherwise, $G_{ja}$ is determined to be 0;
if the file behaviors of the malicious sample file comprise target malicious behaviors, determining the behavior characteristics in the corresponding second preset behavior characteristic vector to be 1, otherwise, determining the behavior characteristics to be 0.
Step S850, determining $G_j$ as the malicious behavior vector of the j-th malicious sample file;
Step S860, inputting $G_j$ and the malicious file identifier corresponding to the j-th malicious sample file into a preset model for training to obtain the target model.
Inputting each malicious behavior vector and a corresponding malicious file identifier into a preset model to train to obtain a target model, and enabling the target model to output the corresponding file identifier according to the input behavior vector.
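The preset model is not specified in the text. Purely to illustrate the training interface of steps S810 to S860, the sketch below substitutes a 1-nearest-neighbour classifier over Hamming distance; the class name `NearestVectorModel`, the `max_distance` cutoff, and the convention of returning `None` when no sample is close enough are all assumptions introduced here, not part of the described method.

```python
class NearestVectorModel:
    """Stand-in for the unspecified preset model (assumption):
    1-nearest-neighbour on Hamming distance with a rejection cutoff."""

    def __init__(self, max_distance):
        self.max_distance = max_distance
        self.examples = []

    def fit(self, vectors, identifiers):
        # S860: pair each malicious behavior vector G_j with its identifier.
        self.examples = list(zip(vectors, identifiers))
        return self

    def predict(self, vector):
        # Find the training vector that differs in the fewest positions.
        best_id, best_dist = None, None
        for example, identifier in self.examples:
            dist = sum(a != b for a, b in zip(example, vector))
            if best_dist is None or dist < best_dist:
                best_id, best_dist = identifier, dist
        # Reject inputs that are too far from every malicious sample.
        if best_dist is None or best_dist > self.max_distance:
            return None
        return best_id
```

In this sketch a returned identifier plays the role of a malicious file identifier, and `None` means the input did not resemble any trained malicious behavior vector.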
Step S900, if the target file identifier is a malicious file identifier, determining the file to be detected as a malicious file.
And inputting the target behavior vector into a target model to obtain a target file identifier, and if the target file identifier is a malicious file identifier, indicating that the corresponding file to be detected is a malicious file.
In this method, the file features of the file to be detected are obtained and the file is initially detected. If the file features are the same as abnormal file features, the file to be detected is determined to be a malicious file. If they differ, the general file behavior information of the file to be detected is obtained, and the first behavior vector is compared with each first sample vector to obtain the corresponding first matching degrees. If a first matching degree is greater than the preset matching degree threshold, the file to be detected is more likely to be a malicious file; the second behavior vector is then determined from the full-quantity file behaviors of the file to be detected and input into the target model, which is trained on the file behaviors of malicious sample files, to obtain the corresponding target file identifier. If the target file identifier is a malicious file identifier, the file to be detected is determined to be a malicious file.
Further, in another embodiment, if a plurality of target associated files having an association relationship with the file to be detected are detected within $T_1$, then after the first preset time period $T_1$ ends, the file behavior information of the plurality of target associated files is acquired to obtain a second file behavior information set $R=(R_1,R_2,\ldots,R_g,\ldots,R_h)$; $R_g=(R_{g1},R_{g2},\ldots,R_{gk},\ldots,R_{gf(g)})$; where $g=1,2,\ldots,h$; $k=1,2,\ldots,f(g)$; h is the number of target associated files; $f(g)$ is the number of pieces of file behavior information performed by the g-th target associated file within $T_1$; $R_g$ is the file behavior information list corresponding to the g-th target associated file; $R_{gk}$ is the k-th piece of file behavior information performed by the g-th target associated file within $T_1$; the b pieces of target malicious behavior information correspond to u behavior monitoring strategies; wherein the behavior monitoring list of the p-th behavior monitoring strategy is $N_p=(N_{p1},N_{p2},\ldots,N_{py},\ldots,N_{pf(p)})$; $p=1,2,\ldots,u$; $y=1,2,\ldots,f(p)$; $f(p)$ is the number of pieces of target malicious behavior information corresponding to the p-th behavior monitoring strategy; $\sum_{p=1}^{u} f(p)=b$; $N_{py}$ is the y-th piece of target malicious behavior information corresponding to the p-th behavior monitoring strategy.
This embodiment is the detection method used when the file to be detected has target associated files. After the file features are compared, the file behaviors of the file to be detected are obtained, and if target associated files exist within $T_1$, the following steps are executed.
The target associated file is a file having an association relationship with the file to be detected, where the association relationship is a downloading, releasing, triggering, or similar relationship. If the file to be detected executes downloading, releasing, triggering, or similar behaviors within $T_1$ and generates corresponding downloaded, released, or triggered files, the generated files are determined to be target associated files. Current malicious files may steal information through associated files. For example, file A itself does not execute malicious behaviors such as information stealing, but after entering the server system it executes a download behavior that generates file B, and file B executes the malicious information-stealing behavior. Because file A only executes the download behavior, which is not itself malicious, current security detection methods cannot intercept or detect file A; they release file A after finding no malicious information in it, and the subsequently generated files go undetected. Therefore, corresponding malicious detection is also performed on the target associated files of the file to be detected.
The target malicious behavior information is information corresponding to known malicious behaviors or acquired through malicious sample files at present, the malicious behaviors are abnormal file behaviors, each target malicious behavior information corresponds to a behavior monitoring strategy, and the behavior monitoring strategy is a method for monitoring behaviors of files to be detected or target related files by a file detection system.
Step S001, monitoring target malicious behavior information corresponding to the file to be detected and the target associated file through each behavior monitoring strategy;
each behavior monitoring strategy corresponds to a plurality of pieces of target malicious behavior information, namely, each behavior monitoring strategy monitors each corresponding target malicious behavior.
Step S002, if, at time $t_{12}$ of the current $T_1$, the behavior features corresponding to $N_{p1},N_{p2},\ldots,N_{py},\ldots,N_{pf(p)}$ in $E,M_1,M_2,\ldots,M_g,\ldots,M_h$ are all 1, then at time $t_{11}$ of the next $T_1$, the p-th behavior monitoring strategy stops monitoring the behaviors of the file to be detected and the target associated files.
At time $t_{12}$, if all target malicious behaviors corresponding to one of the behavior monitoring strategies are detected to have been executed, that is, the file to be detected and the target associated files have executed all target malicious behaviors corresponding to that behavior monitoring strategy within $T_1$, the behavior monitoring strategy has already observed all of its corresponding target malicious behaviors; to reduce computing load and save system resources, that behavior monitoring strategy is stopped.
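The stop condition of step S002 can be sketched as below. The sketch takes one possible reading, namely that a strategy is finished once each of its target malicious behaviors has been observed in at least one of the vectors $E, M_1, \ldots, M_h$ taken together; a stricter reading, requiring every vector to show the behavior, would replace `any` with `all`. The names `policies_to_stop`, `policies`, and `feature_order` are hypothetical.

```python
def policies_to_stop(policies, feature_order, vectors):
    """Return the names of behavior monitoring strategies whose monitored
    behaviors have all been observed.

    policies: dict mapping strategy name -> list of monitored behaviors (N_p)
    feature_order: behavior at each vector position (same order as E)
    vectors: the 0/1 vectors E, M_1, ..., M_h
    """
    index = {behavior: i for i, behavior in enumerate(feature_order)}
    stopped = []
    for name, monitored in policies.items():
        # Assumed reading: a behavior counts as observed if any vector has a 1.
        if all(any(vec[index[b]] == 1 for vec in vectors) for b in monitored):
            stopped.append(name)
    return stopped
```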
Step S003, determining the associated behavior vector $M_g$ of the g-th target associated file according to $R_g$;
The corresponding associated behavior vector is determined according to the file behavior information of each target associated file, and the file behaviors executed by the corresponding target associated file within $T_1$ can be known from the associated behavior vector.
Wherein the associated behavior vector is determined by:
Step S0031, obtaining h third preset behavior feature vectors $M_1,M_2,\ldots,M_g,\ldots,M_h$ according to the b pieces of target malicious behavior information; $M_g=(M_{g1},M_{g2},\ldots,M_{ga},\ldots,M_{gb})$; where $M_g$ is the third preset behavior feature vector corresponding to the g-th target associated file; $M_{ga}$ is the behavior feature corresponding to the a-th piece of target malicious behavior information of the g-th target associated file; the target malicious behavior information corresponding to $M_{ga}$ is the same as the target malicious behavior information corresponding to $E_a$;
Each target associated file corresponds to a third preset behavior feature vector, which is a preset feature vector. All third preset behavior feature vectors contain the same number of features, each feature corresponds to one piece of target malicious behavior information, and the same feature position in different third preset behavior feature vectors corresponds to the same target malicious behavior information; for example, the first features of all third preset behavior feature vectors represent the same target malicious behavior information.
Step S0032, traversing $M_g$; if the target malicious behavior information corresponding to $M_{ga}$ exists in $R_g$, $M_{ga}$ is determined to be 1; otherwise, $M_{ga}$ is determined to be 0;
after h third preset behavior feature vectors are preset, comparing each piece of target malicious behavior information with the file behaviors carried out by each piece of target associated file, if the file behaviors carried out by the target associated file comprise target malicious behaviors, such as information stealing behaviors, determining the behavior feature corresponding to the information stealing behaviors in the third preset behavior feature vectors corresponding to the target associated file as 1, otherwise, determining the corresponding behavior feature as 0 if the file behaviors carried out by the target associated file do not comprise the corresponding target malicious behaviors.
Step S0033, determining $M_g$ as the associated behavior vector of the g-th target associated file.
In each associated behavior vector, a behavior feature of 1 indicates that the corresponding target associated file executed the corresponding target malicious behavior within $T_1$, and a behavior feature of 0 indicates that it did not; whether each target associated file executed a target malicious behavior can therefore be determined by inspecting its associated behavior vector.
Step S004, determining a fusion behavior vector according to $E,M_1,M_2,\ldots,M_g,\ldots,M_h$;
the fused behavior vector is a vector obtained according to the target behavior vector and all the associated behavior vectors, and represents the behavior executed by the file to be detected and the target associated file together.
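The text does not state how the vectors are fused. One plausible reading, consistent with the fused vector representing the behaviors executed jointly by the file to be detected and its target associated files, is an element-wise logical OR; the sketch below assumes that reading, and the function name `fuse` is hypothetical.

```python
def fuse(target_vector, associated_vectors):
    """Assumed fusion rule: position a is 1 if the file to be detected or
    any target associated file executed the a-th target malicious behavior."""
    fused = list(target_vector)
    for vector in associated_vectors:
        # Element-wise OR of 0/1 features, position by position.
        fused = [a | b for a, b in zip(fused, vector)]
    return fused
```

Under this rule, a downloader whose own vector shows only the download behavior and whose downloaded file shows information stealing yields a fused vector with both features set, which is what lets the target model see the combined behavior.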
S005, inputting the fusion behavior vector into a target model to obtain a corresponding fusion file identifier;
the target model is a model obtained by training according to malicious behaviors of malicious sample files, the fusion behavior vector is input into the target model, the target model outputs a fusion file identifier corresponding to the fusion behavior vector, and whether the file to be detected and the corresponding target associated file are malicious files or not is determined by verifying the fusion file identifier.
And step S006, if the fusion file identifier is a malicious file identifier, determining the file to be detected and each target associated file as a malicious file.
A malicious file detection apparatus 100, as shown in fig. 3, includes:
A file feature acquiring module 110, configured to acquire a file feature of a file to be detected when the file to be detected is received;
the file feature detection module 120 is configured to detect a feature of a file to obtain a detection result corresponding to the file to be detected;
the general behavior acquisition module 130 is configured to acquire a plurality of general file behavior information of the file to be detected in a first preset time period when a detection result corresponding to the file to be detected indicates that the file to be detected is not a malicious file;
the first vector determining module 140 is configured to determine a first behavior vector of a file to be detected according to the plurality of general file behavior information;
the matching degree determining module 150 is configured to obtain a first matching degree of the first behavior vector and each first sample vector according to the first behavior vector and the plurality of first sample vectors; the first sample vector is obtained according to the general file behavior information of the malicious sample file; each first sample vector corresponds to a preset malicious attack type;
the full-volume behavior acquisition module 160 is configured to acquire a plurality of pieces of full-volume file behavior information of the file to be detected in a second preset time period when any one of the first matching degrees is greater than a preset matching degree threshold; the number of the file behavior types corresponding to the full-quantity file behavior information is more than that of the file behavior types corresponding to the general file behavior information; the second preset time period is after the first preset time period;
The second vector determining module 170 is configured to determine a second behavior vector of the file to be detected according to the plurality of pieces of full-quantity file behavior information;
the file identification determining module 180 is configured to input the second behavior vector into the target model to obtain a corresponding target file identification; the target model is obtained by training according to file behaviors of malicious sample files; the target file identifier is used for identifying whether the file to be detected is a malicious file or not;
the malicious file determining module 190 is configured to determine the file to be detected as a malicious file when the target file identifier is a malicious file identifier.
Embodiments of the present invention also provide a computer program product comprising program code for causing an electronic device to carry out the steps of the method according to the various exemplary embodiments of the invention as described in the specification, when said program product is run on the electronic device.
Furthermore, although the steps of the methods in the present disclosure are depicted in a particular order in the drawings, this does not require or imply that the steps must be performed in that particular order or that all illustrated steps be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a mobile terminal, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
Those skilled in the art will appreciate that the various aspects of the invention may be implemented as a system, method, or program product. Accordingly, aspects of the invention may be embodied in the following forms, namely: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein as a "circuit," "module," or "system."
An electronic device according to this embodiment of the invention. The electronic device is merely an example, and should not impose any limitations on the functionality and scope of use of embodiments of the present invention.
The electronic device is in the form of a general purpose computing device. Components of an electronic device may include, but are not limited to: the at least one processor, the at least one memory, and a bus connecting the various system components, including the memory and the processor.
Wherein the memory stores program code that is executable by the processor to cause the processor to perform steps according to various exemplary embodiments of the invention described in the "exemplary methods" section of this specification.
The storage may include readable media in the form of volatile storage, such as Random Access Memory (RAM) and/or cache memory, and may further include Read Only Memory (ROM).
The storage may also include a program/utility having a set (at least one) of program modules including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The bus may be one or more of several types of bus structures including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures.
The electronic device may also communicate with one or more external devices (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device, and/or with any device (e.g., router, modem, etc.) that enables the electronic device to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface. And, the electronic device may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through a network adapter. As shown, the network adapter communicates with other modules of the electronic device over a bus. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with an electronic device, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, a computer-readable storage medium having stored thereon a program product capable of implementing the method described above in the present specification is also provided. In some possible embodiments, the various aspects of the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the invention as described in the "exemplary methods" section of this specification, when said program product is run on the terminal device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
Furthermore, the above-described drawings are only schematic illustrations of processes included in the method according to the exemplary embodiment of the present invention, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (10)

1. A malicious file detection method, applied to a file detection system, the method comprising the steps of:
responding to receiving a file to be detected, and acquiring file characteristics of the file to be detected;
detecting the file characteristics to obtain a detection result corresponding to the file to be detected;
if the detection result corresponding to the file to be detected indicates that the file to be detected is not a malicious file, acquiring a plurality of pieces of general file behavior information of the file to be detected in a first preset time period; each piece of general file behavior information corresponds to a general file behavior, and the general file behavior represents file behaviors included in a plurality of malicious behavior types corresponding to a plurality of malicious sample files;
determining a first behavior vector of the file to be detected according to the plurality of pieces of general file behavior information;
obtaining a first matching degree of the first behavior vector and each first sample vector according to the first behavior vector and a plurality of first sample vectors; the first sample vector is obtained according to general file behavior information of a malicious sample file; each first sample vector corresponds to a preset malicious attack type;
if any one of the first matching degree is larger than a preset matching degree threshold value, acquiring a plurality of pieces of full-quantity file behavior information of the file to be detected in a second preset time period; the number of the file behavior types corresponding to the full-quantity file behavior information is more than the number of the file behavior types corresponding to the general file behavior information; the second preset time period is after the first preset time period; each piece of full-quantity file behavior information corresponds to a full-quantity file behavior, wherein the full-quantity file behavior is all file behaviors of the file to be detected in a second preset time period;
determining a second behavior vector of the file to be detected according to the full-quantity file behavior information;
inputting the second behavior vector into a target model to obtain a corresponding target file identifier; the target model is obtained by training according to file behaviors of malicious sample files; the target file identifier is used for identifying whether the file to be detected is a malicious file or not;
and if the target file identifier is a malicious file identifier, determining the file to be detected as a malicious file.
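As an illustration only, the staged flow of claim 1 can be sketched as follows. The claim does not state how the first matching degree is computed or what happens when no matching degree exceeds the threshold; this sketch assumes cosine similarity and treats such a file as not flagged. The names `detect`, `cosine`, `second_vector_fn`, and `model` are hypothetical and do not come from the patent.

```python
from math import sqrt
from typing import Callable, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, used here as an ASSUMED 'matching degree'."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def detect(
    static_hit: bool,
    first_vector: Sequence[int],
    sample_vectors: Sequence[Sequence[int]],
    threshold: float,
    second_vector_fn: Callable[[], Sequence[int]],
    model: Callable[[Sequence[int]], bool],
) -> bool:
    """Return True if the file is judged malicious by the staged flow."""
    # Stage 1: feature-based detection already flagged the file.
    if static_hit:
        return True
    # Stage 2: compare the first behavior vector with every first sample vector.
    if not any(cosine(first_vector, s) > threshold for s in sample_vectors):
        # Assumption: the claim is silent on this branch; treat as not flagged.
        return False
    # Stage 3: only now collect full-quantity behavior and run the target model.
    return model(second_vector_fn())
```

Passing the second vector as a callable reflects the ordering in the claim: full-quantity behavior is collected only after a first matching degree exceeds the threshold.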
2. The method of claim 1, wherein the file characteristics include at least one of:
the hash value, the file structure information, the MD5 value and the file code characteristic of the file to be detected;
the step of detecting the file characteristics to obtain a detection result corresponding to the file to be detected comprises the following steps:
comparing the file characteristics of the file to be detected with preset abnormal file characteristics;
if any file characteristic of the file to be detected is the same as the corresponding preset abnormal file characteristic, the detection result indicates that the file to be detected is a malicious file; otherwise, the detection result indicates that the file to be detected is not a malicious file.
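The comparison in claim 2 might look like the following minimal sketch. It covers only hash-type features (MD5 and, as an assumed example of "hash value", SHA-256); file structure information and code characteristics are omitted. The function names and the layout of the `abnormal` dictionary are hypothetical.

```python
import hashlib
from typing import Dict, Set


def extract_features(data: bytes) -> Dict[str, str]:
    """Compute a few file features; only hash-type features are shown."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def is_malicious_by_features(
    features: Dict[str, str], abnormal: Dict[str, Set[str]]
) -> bool:
    """True if ANY feature equals a preset abnormal value of the same kind."""
    return any(
        value in abnormal.get(name, set()) for name, value in features.items()
    )
```

For example, `is_malicious_by_features(extract_features(data), abnormal)` returns `True` as soon as one feature of `data` appears in the preset abnormal set for that feature kind.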
3. The method of claim 1, wherein the generic file behavior information is determined by:
obtaining a plurality of pieces of file behavior information of a plurality of behaviors performed by m malicious sample files within a third preset time period $T_3 = [t_{31}, t_{32}]$, to obtain a sample file behavior information set $F = (F_1, F_2, \ldots, F_j, \ldots, F_m)$; $F_j = (F_{j1}, F_{j2}, \ldots, F_{jd}, \ldots, F_{jf(j)})$; where $j = 1, 2, \ldots, m$; $d = 1, 2, \ldots, f(j)$; $f(j)$ is the number of pieces of file behavior information of the behaviors performed by the j-th malicious sample file within $T_3$; $F_j$ is the file behavior information list corresponding to the j-th malicious sample file; $F_{jd}$ is the d-th piece of file behavior information of the j-th malicious sample file within $T_3$; $t_{31} < t_{32}$; $t_{31}$ is the start time of $T_3$; $t_{32}$ is the end time of $T_3$;
performing de-duplication processing on F to obtain b pieces of target malicious behavior information; each piece of target malicious behavior information corresponds to a plurality of malicious behavior type identifiers;
grouping the b pieces of target malicious behavior information according to the plurality of malicious behavior type identifiers, determining z malicious behavior type identifier groups, and obtaining a malicious behavior type identifier set $L = (L_1, L_2, \ldots, L_x, \ldots, L_z)$; $L_x = (L_{x1}, L_{x2}, \ldots, L_{xv}, \ldots, L_{xf(x)})$; where $x = 1, 2, \ldots, z$; $v = 1, 2, \ldots, f(x)$; $f(x)$ is the number of pieces of target malicious behavior information in the x-th malicious behavior type identifier group; $L_x$ is the target malicious behavior information list corresponding to the x-th malicious behavior type identifier group; $L_{xv}$ is the v-th piece of target malicious behavior information included in the x-th malicious behavior type identifier group;
performing intersection processing on $L_1, L_2, \ldots, L_x, \ldots, L_z$, and determining the w pieces of target malicious behavior information obtained after the intersection processing as the general file behavior information.
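The de-duplication, grouping, and intersection steps of claim 3 can be sketched as below. This is a simplified illustration: behavior information is represented as plain strings, and the mapping from a behavior to its malicious behavior type identifiers (`type_ids`) is a hypothetical input that the claim does not specify.

```python
from collections import defaultdict
from typing import Dict, List, Set


def general_behaviors(
    sample_behaviors: List[List[str]], type_ids: Dict[str, Set[str]]
) -> Set[str]:
    """Return behaviors shared by every malicious behavior type group."""
    # De-duplicate F: the b distinct pieces of target malicious behavior info.
    targets = {b for sample in sample_behaviors for b in sample}
    # Group by type identifier: one set L_x per malicious behavior type.
    groups: Dict[str, Set[str]] = defaultdict(set)
    for behavior in targets:
        for type_id in type_ids.get(behavior, ()):
            groups[type_id].add(behavior)
    if not groups:
        return set()
    # Intersect L_1 ... L_z: the w pieces of general file behavior information.
    return set.intersection(*groups.values())
```

With two hypothetical types where only `create_process` appears under both, the function returns `{"create_process"}`, which is the sense in which a behavior is "general" across malicious behavior types.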
4. The method of claim 3, wherein the obtaining the plurality of general file behavior information of the file to be detected performed in the first preset time period includes:
after a first preset time period $T_1$ ends, acquiring a plurality of pieces of general file behavior information of the file to be detected, to obtain a general file behavior information set $J = (J_1, J_2, \ldots, J_s, \ldots, J_o)$; where $s = 1, 2, \ldots, o$; $o \le w$; $o$ is the number of pieces of general file behavior information of the behaviors performed by the file to be detected within $T_1$; $J_s$ is the s-th piece of general file behavior information of the file to be detected within $T_1$; $T_1 = [t_{11}, t_{12}]$; $t_{12} > t_{11} > t_{32}$; $(t_{12} - t_{11}) = (t_{32} - t_{31})$; $t_{11}$ is the start time of $T_1$; $t_{12}$ is the end time of $T_1$.
5. The method of claim 4, wherein the first behavior vector is determined by:
obtaining a general preset behavior feature vector $A = (A_1, A_2, \ldots, A_c, \ldots, A_w)$ according to the w pieces of general file behavior information; where $c = 1, 2, \ldots, w$; $A_c$ is the behavior feature corresponding to the c-th piece of general file behavior information in A;
traversing A: if the general file behavior information corresponding to $A_c$ exists in J, setting $A_c$ to 1; otherwise, setting $A_c$ to 0;
and determining A as the first behavior vector of the file to be detected.
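The traversal described in claim 5 amounts to building a 0/1 indicator vector over a fixed, ordered list of reference behaviors. A minimal sketch, with behavior information represented as strings and the function name `behavior_vector` chosen here for illustration:

```python
from typing import Iterable, List, Sequence


def behavior_vector(reference: Sequence[str], observed: Iterable[str]) -> List[int]:
    """Set position c to 1 if the c-th reference behavior was observed, else 0."""
    observed_set = set(observed)
    return [1 if behavior in observed_set else 0 for behavior in reference]
```

Here `reference` plays the role of the w pieces of general file behavior information and `observed` the set J. The order of `reference` must be the same one used to build the first sample vectors, otherwise the positions being compared would not correspond.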
6. The method of claim 5, wherein the obtaining the plurality of pieces of full-quantity file behavior information of the file to be detected performed in the second preset time period includes:
after a second preset time period $T_2$ ends, acquiring a plurality of pieces of full-quantity file behavior information of a plurality of behaviors of the file to be detected, to obtain a full-quantity file behavior information set $Q = (Q_1, Q_2, \ldots, Q_i, \ldots, Q_n)$; where $i = 1, 2, \ldots, n$; $n$ is the number of pieces of full-quantity file behavior information of the behaviors performed by the file to be detected within $T_2$; $Q_i$ is the i-th piece of full-quantity file behavior information of the file to be detected within $T_2$; $T_2 = [t_{21}, t_{22}]$; $t_{22} > t_{21} > t_{11}$; $t_{21}$ is the start time of $T_2$; $t_{22}$ is the end time of $T_2$.
7. The method of claim 6, wherein the second behavior vector is determined by:
obtaining a first preset behavior feature vector $E = (E_1, E_2, \ldots, E_a, \ldots, E_b)$ according to the b pieces of target malicious behavior information; where $a = 1, 2, \ldots, b$; $E_a$ is the behavior feature corresponding to the a-th piece of target malicious behavior information in E;
traversing E: if the target malicious behavior information corresponding to $E_a$ exists in Q, setting $E_a$ to 1; otherwise, setting $E_a$ to 0;
and determining E as the second behavior vector of the file to be detected.
8. A malicious file detection apparatus, comprising:
the file characteristic acquisition module is used for acquiring file characteristics of the file to be detected when the file to be detected is received;
the file feature detection module is used for detecting file features to obtain detection results corresponding to the files to be detected;
the universal behavior acquisition module is used for acquiring a plurality of pieces of universal file behavior information of the file to be detected in a first preset time period when the detection result corresponding to the file to be detected indicates that the file to be detected is not a malicious file; each piece of general file behavior information corresponds to a general file behavior, and the general file behavior represents file behaviors included in a plurality of malicious behavior types corresponding to a plurality of malicious sample files;
the first vector determining module is used for determining a first behavior vector of the file to be detected according to the plurality of pieces of general file behavior information;
the matching degree determining module is used for obtaining a first matching degree of the first behavior vector and each first sample vector according to the first behavior vector and the plurality of first sample vectors; the first sample vector is obtained according to the general file behavior information of the malicious sample file; each first sample vector corresponds to a preset malicious attack type;
the full-quantity behavior acquisition module is used for acquiring a plurality of pieces of full-quantity file behavior information of the file to be detected in a second preset time period when any first matching degree is larger than a preset matching degree threshold value; the number of the file behavior types corresponding to the full-quantity file behavior information is more than that of the file behavior types corresponding to the general file behavior information; the second preset time period is after the first preset time period; each piece of full-quantity file behavior information corresponds to a full-quantity file behavior, wherein the full-quantity file behavior is all file behaviors of the file to be detected in a second preset time period;
the second vector determining module is used for determining a second behavior vector of the file to be detected according to the plurality of pieces of full-quantity file behavior information;
the file identification determining module is used for inputting the second behavior vector into the target model to obtain a corresponding target file identification; the target model is obtained by training according to file behaviors of malicious sample files; the target file identifier is used for identifying whether the file to be detected is a malicious file or not;
and the malicious file determining module is used for determining the file to be detected as a malicious file when the target file identifier is a malicious file identifier.
9. A non-transitory computer readable storage medium having stored therein at least one instruction or at least one program, wherein the at least one instruction or the at least one program is loaded and executed by a processor to implement the malicious file detection method of any one of claims 1-7.
10. An electronic device comprising a processor and the non-transitory computer readable storage medium of claim 9.
CN202311131112.6A 2023-09-04 2023-09-04 Malicious file detection method, device, equipment and medium Active CN116861430B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311131112.6A CN116861430B (en) 2023-09-04 2023-09-04 Malicious file detection method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN116861430A CN116861430A (en) 2023-10-10
CN116861430B true CN116861430B (en) 2023-11-17

Family

ID=88229042

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311131112.6A Active CN116861430B (en) 2023-09-04 2023-09-04 Malicious file detection method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116861430B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117056915B (en) * 2023-10-11 2024-02-02 深圳安天网络安全技术有限公司 File detection method and device, medium and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105740707A (en) * 2016-01-20 2016-07-06 北京京东尚科信息技术有限公司 Malicious file identification method and device
CN110399720A (en) * 2018-12-14 2019-11-01 腾讯科技(深圳)有限公司 A kind of method and relevant apparatus of file detection
CN110889113A (en) * 2019-10-30 2020-03-17 泰康保险集团股份有限公司 Log analysis method, server, electronic device and storage medium
CN113536300A (en) * 2021-07-12 2021-10-22 杭州安恒信息技术股份有限公司 PDF file trust filtering and analyzing method, device, equipment and medium
CN116578537A (en) * 2023-07-12 2023-08-11 北京安天网络安全技术有限公司 File detection method, readable storage medium and electronic device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10878090B2 (en) * 2017-10-18 2020-12-29 AO Kaspersky Lab System and method of detecting malicious files using a trained machine learning model
RU2739830C1 (en) * 2019-09-30 2020-12-28 Акционерное общество "Лаборатория Касперского" System and method of selecting means of detecting malicious files

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant