CN110868405B - Malicious code detection method and device, computer equipment and storage medium - Google Patents

Malicious code detection method and device, computer equipment and storage medium

Publication number
CN110868405B
Authority
CN
China
Prior art keywords
sequence
calling
malicious
risk value
api
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911071345.5A
Other languages
Chinese (zh)
Other versions
CN110868405A (en
Inventor
梁志宏
胡朝辉
陈佳捷
罗强
高健
伍思廉
郑伟文
吴佩泽
彭伯庄
王金贺
陈鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Southern Power Grid Digital Platform Technology Guangdong Co ltd
Original Assignee
Southern Power Grid Digital Grid Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southern Power Grid Digital Grid Research Institute Co Ltd filed Critical Southern Power Grid Digital Grid Research Institute Co Ltd
Priority to CN201911071345.5A priority Critical patent/CN110868405B/en
Publication of CN110868405A publication Critical patent/CN110868405A/en
Application granted granted Critical
Publication of CN110868405B publication Critical patent/CN110868405B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements

Abstract

The application discloses a malicious code detection method and device, computer equipment and a storage medium, and relates to the technical field of information security. In the method, a server of a network device selects a first calling sequence and a second calling sequence from an Application Program Interface (API) sequence set; calculates a normal risk value and a malicious risk value for each of the two calling sequences; and labels a target sequence according to these four risk values to obtain a labeling result. The server then removes the target sequence from the API sequence set, takes the API sequence set without the target sequence as the API sequence set for the next cycle, and repeats the detection and labeling until all the calling sequences in the API sequence set are labeled. This technical scheme can improve the efficiency of processing malicious code.

Description

Malicious code detection method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of information security technologies, and in particular, to a malicious code detection method and apparatus, a computer device, and a storage medium.
Background
The network attack is a behavior of attacking an information system and data resources of the network equipment by using vulnerabilities and security defects existing in the network information system, and specifically, the network attack can tamper the authority of the attacked network equipment so as to steal files; and the attacked network equipment can also refuse service, so that the user can not normally use the network equipment, thereby bringing huge loss to the user.
In the prior art, a method for detecting whether a malicious code exists in a target file received by a network device is provided, and the method includes: the method comprises the steps of obtaining dynamic action information of a target file to be detected, wherein the dynamic action information comprises action information and access information generated after the target file runs, and judging that malicious codes exist in the target file when the action information cannot pass a security baseline verification or the access information points to a core unit of network equipment.
However, when malicious code is found in the target file, the source code of the target file must be examined again to locate the malicious code, which requires additional time and labor and makes malicious code processing inefficient.
Disclosure of Invention
Based on this, it is necessary to provide a malicious code detection method, apparatus, computer device and storage medium to address the above-mentioned problem that the specific location of malicious code in the target file cannot be determined directly.
In a first aspect, an embodiment of the present application provides a malicious code detection method, where the method includes:
selecting a first calling sequence and a second calling sequence from an Application Program Interface (API) sequence set, wherein the API sequence set comprises a plurality of calling sequences to be detected;
respectively calculating a normal risk value and a malicious risk value of the first calling sequence and a normal risk value and a malicious risk value of the second calling sequence, wherein the normal risk value represents the probability of malicious codes existing in the calling sequences; the malicious risk value represents the probability of no malicious code existing in the calling sequence;
labeling a target sequence according to a normal risk value of the first calling sequence, a malicious risk value of the first calling sequence, a normal risk value of the second calling sequence and a malicious risk value of the second calling sequence to obtain a labeling result, wherein the target sequence is the first calling sequence or the second calling sequence, and the labeling result comprises the existence of malicious codes or the absence of malicious codes;
and removing the target sequence from the API sequence set, taking the API sequence set without the target sequence as the API sequence set for the next cycle, and repeating the detection and labeling of the calling sequences to be detected until all the calling sequences in the API sequence set are labeled.
In one embodiment, calculating the normal risk value and the malicious risk value for the first call sequence comprises:
acquiring a malicious sample set and a normal sample set, wherein the malicious sample set comprises a plurality of known malicious calling sequences containing malicious codes; the normal sample set comprises a plurality of normal calling sequences which are known not to contain malicious code;
respectively calculating the malicious similarity between the first calling sequence and the malicious calling sequence aiming at each malicious calling sequence in the malicious sample set; respectively calculating the normal similarity of the first calling sequence and the normal calling sequence aiming at each normal calling sequence in the normal sample set;
acquiring a malicious predicted value corresponding to each malicious calling sequence, a normal predicted value corresponding to each normal calling sequence and a predicted value of the first calling sequence;
calculating a malicious risk value of the first calling sequence according to the malicious predicted value, the malicious similarity and the predicted value of the first calling sequence; and calculating the normal risk value of the first calling sequence according to the normal predicted value, the normal similarity and the predicted value of the first calling sequence.
In one embodiment, the labeling the target sequence according to the normal risk value and the malicious risk value of the first call sequence and the normal risk value and the malicious risk value of the second call sequence to obtain a labeling result includes:
selecting a minimum risk value from the normal risk value of the first calling sequence, the malicious risk value of the first calling sequence, the normal risk value of the second calling sequence and the malicious risk value of the second calling sequence;
determining the calling sequence corresponding to the minimum risk value as a target sequence;
and labeling the target sequence according to the minimum risk value to obtain a labeling result.
In one embodiment, labeling the target sequence according to the minimum risk value to obtain a labeling result includes:
when the minimum risk value is the normal risk value of the first calling sequence or the normal risk value of the second calling sequence, marking that the target sequence has no malicious code;
and when the minimum risk value is the malicious risk value of the first calling sequence or the malicious risk value of the second calling sequence, marking that the target sequence has malicious codes.
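The minimum-risk labeling rule above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `label_target`, the label strings, and the example risk values are all hypothetical, and ties are broken arbitrarily by tuple ordering.

```python
from typing import Dict, Tuple

# Hypothetical label strings; the source only distinguishes
# "malicious code exists" from "no malicious code exists".
MALICIOUS = "malicious code present"
NORMAL = "no malicious code"


def label_target(risks: Dict[str, Dict[str, float]]) -> Tuple[str, str]:
    """Return (target sequence name, label) for the smallest risk value.

    `risks` maps a sequence name to
    {"normal": normal_risk_value, "malicious": malicious_risk_value}.
    """
    candidates = [
        (value, name, kind)
        for name, pair in risks.items()
        for kind, value in pair.items()
    ]
    # The sequence that owns the minimum risk value becomes the target.
    _, target, kind = min(candidates)
    # Minimum is a normal risk value -> no malicious code; otherwise malicious.
    return target, NORMAL if kind == "normal" else MALICIOUS


# Made-up risk values, for illustration only.
risks = {
    "first": {"normal": 0.40, "malicious": 0.75},
    "second": {"normal": 0.90, "malicious": 0.15},
}
print(label_target(risks))  # ('second', 'malicious code present')
```

Here the smallest of the four values is the malicious risk value of the second sequence, so the second sequence becomes the target and is labeled as containing malicious code.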
In one embodiment, selecting the first call sequence and the second call sequence from the API sequence set includes:
respectively calculating the Hamming distance of each two call sequences to be detected in the API sequence set;
and selecting two calling sequences with the maximum Hamming distance as a first calling sequence and a second calling sequence.
In one embodiment, before selecting the first call sequence and the second call sequence from the API sequence set, the method further comprises:
running the received target file to be detected in the virtual sandbox, and acquiring a calling sequence corresponding to an API (application program interface) function of the target file;
and for each calling sequence, obtaining a feature vector of the calling sequence to form an API sequence set.
In one embodiment, the step of taking the API sequence set excluding the target sequence as the API sequence set of the next cycle includes:
stopping detection when the API sequence set for the next cycle comprises only one calling sequence to be detected;
and sending a detection instruction to the manual detection terminal, wherein the detection instruction is used for indicating manual detection of the call sequence to be detected in the API sequence set of the next cycle.
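The cyclic procedure, including the stop condition for a single remaining sequence, might be sketched as below. The names `detect_all`, `risk_of` and `select_pair` are hypothetical, and the risk function is supplied by the caller because this sketch makes no assumption about how the risk values are computed.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: returns (normal_risk_value, malicious_risk_value).
RiskFn = Callable[[str], Tuple[float, float]]
PairFn = Callable[[List[str]], Tuple[str, str]]


def detect_all(sequences: List[str], risk_of: RiskFn,
               select_pair: PairFn) -> Tuple[Dict[str, str], List[str]]:
    """Label one target sequence per cycle until fewer than two remain."""
    remaining = list(sequences)
    labels: Dict[str, str] = {}
    while len(remaining) > 1:
        first, second = select_pair(remaining)
        candidates = []
        for seq in (first, second):
            normal_risk, malicious_risk = risk_of(seq)
            candidates.append((normal_risk, seq, "no malicious code"))
            candidates.append((malicious_risk, seq, "malicious code present"))
        # The sequence owning the minimum risk value is the target.
        _, target, label = min(candidates)
        labels[target] = label
        # The set without the target is the set for the next cycle.
        remaining.remove(target)
    # At most one sequence is left; it would go to manual detection.
    return labels, remaining


# Toy inputs, for illustration only.
table = {"a": (0.1, 0.9), "b": (0.8, 0.2), "c": (0.5, 0.5)}
labels, leftover = detect_all(
    list(table),
    risk_of=lambda s: table[s],
    select_pair=lambda seqs: (seqs[0], seqs[-1]),
)
print(labels)    # {'a': 'no malicious code', 'b': 'malicious code present'}
print(leftover)  # ['c']
```

Returning the leftover sequence instead of labeling it mirrors the embodiment in which the last remaining sequence is handed to a manual detection terminal.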
In a second aspect, an embodiment of the present application provides a malicious code detection apparatus, where the apparatus includes:
the calling sequence selection module is used for selecting a first calling sequence and a second calling sequence from an Application Program Interface (API) sequence set, wherein the API sequence set comprises a plurality of calling sequences to be detected;
the risk calculation module is used for calculating a normal risk value and a malicious risk value of the first calling sequence and a normal risk value and a malicious risk value of the second calling sequence respectively, wherein the normal risk value represents the probability of malicious codes existing in the calling sequences; the malicious risk value represents the probability of no malicious code existing in the calling sequence;
the marking module is used for marking the target sequence according to the normal risk value of the first calling sequence, the malicious risk value of the first calling sequence, the normal risk value of the second calling sequence and the malicious risk value of the second calling sequence to obtain a marking result, the target sequence is the first calling sequence or the second calling sequence, and the marking result comprises the existence of malicious codes or the absence of malicious codes;
and the cyclic processing module is used for removing the target sequence from the API sequence set, taking the API sequence set without the target sequence as the next cyclic API sequence set, and carrying out cyclic detection and labeling on the plurality of calling sequences to be detected until all calling sequences in the API sequence set are labeled.
In a third aspect, there is provided a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, performs the steps of the method of the first aspect described above.
In a fourth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the method of the first aspect described above.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
a server (hereinafter, referred to as a server) of the network device may select a first call sequence and a second call sequence from an Application Programming Interface (API) sequence set, where the API sequence set includes a plurality of call sequences to be detected. The server can respectively calculate a normal risk value and a malicious risk value of the first calling sequence and a normal risk value and a malicious risk value of the second calling sequence, wherein the normal risk value represents the probability of malicious codes existing in the calling sequences; the malicious risk value represents the probability that malicious code is not present in the invocation sequence. The server can label the target sequence according to the normal risk value of the first calling sequence, the malicious risk value of the first calling sequence, the normal risk value of the second calling sequence and the malicious risk value of the second calling sequence to obtain a labeling result, wherein the target sequence is the first calling sequence or the second calling sequence, and the labeling result comprises the existence of malicious codes or the absence of malicious codes. The server can remove the target sequence from the API sequence set, the API sequence set without the target sequence is used as the next circulating API sequence set, and the cyclic detection and labeling are carried out on the plurality of calling sequences to be detected until all the calling sequences in the API sequence set are labeled. 
Therefore, in the embodiment of the application, the server of the network device marks all the calling sequences, so that whether malicious codes exist in each calling sequence of the multiple calling sequences to be detected or not can be determined, and in the process of determining whether the malicious codes exist in the target file, the calling sequence in which the malicious codes exist can be directly determined, and a user can directly process the calling sequences in which the malicious codes exist.
Drawings
Fig. 1 is a schematic diagram of an implementation environment of a malicious code detection method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of another implementation environment of a malicious code detection method according to an embodiment of the present disclosure;
Fig. 3 is a flowchart of a malicious code detection method according to an embodiment of the present disclosure;
Fig. 4 is a flowchart of another malicious code detection method according to an embodiment of the present disclosure;
Fig. 5 is a flowchart of another malicious code detection method according to an embodiment of the present disclosure;
Fig. 6 is a flowchart of another malicious code detection method according to an embodiment of the present disclosure;
Fig. 7 is a block diagram of a malicious code detection apparatus according to an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
With the development of computer technology, network devices are more and more widely used, and the number of files transmitted between the network devices is increased dramatically. Wherein, part of the file may be embedded with malicious code, which refers to computer code that is intentionally programmed or set up to cause a threat or potential threat to a network or system, such as: computer viruses, trojan horses, and the like. The malicious codes can perform actions such as anonymous advertisement pushing, silent software downloading, even fee stealing and the like, and when the network equipment opens a file carrying the malicious codes, the network equipment can be attacked by the network. The network attack is a behavior of attacking an information system and data resources of the network equipment by using vulnerabilities and security defects existing in the network information system, and specifically, the network attack can tamper the authority of the attacked network equipment so as to steal files; and the attacked network equipment can also refuse service, so that the user can not normally use the network equipment, thereby bringing huge loss to the user.
In the prior art, a method for detecting malicious codes is provided, which detects a target file received by a network device, obtains dynamic action information of the target file to be detected, where the dynamic action information includes action information and access information generated after the target file runs, and determines that the malicious codes exist in the target file when the action information cannot be checked through a security baseline or the access information points to a core unit of the network device. However, the method cannot directly determine the position of the malicious code when the target file is determined to have the malicious code, and therefore, when the target file has the malicious code, the source code of the target file needs to be detected again to determine the position of the malicious code, and the malicious code needs to be processed.
With the prior-art method, once malicious code is found in the target file, the source code of the target file must be examined again to locate the malicious code; this requires extra time and labor, so malicious code is processed inefficiently.
The malicious code detection method, the malicious code detection device, the computer equipment and the storage medium can improve the processing efficiency of the malicious code. In the method, a server (hereinafter referred to as a server) of the network device may select a first call sequence and a second call sequence from an Application Program Interface (API) sequence set, where the API sequence set includes a plurality of call sequences to be detected. The server can respectively calculate a normal risk value and a malicious risk value of the first calling sequence and a normal risk value and a malicious risk value of the second calling sequence, wherein the normal risk value represents the probability of malicious codes existing in the calling sequences; the malicious risk value represents the probability that malicious code is not present in the invocation sequence. The server can label the target sequence according to the normal risk value of the first calling sequence, the malicious risk value of the first calling sequence, the normal risk value of the second calling sequence and the malicious risk value of the second calling sequence to obtain a labeling result, wherein the target sequence is the first calling sequence or the second calling sequence, and the labeling result comprises the existence of malicious codes or the absence of malicious codes. The server can remove the target sequence from the API sequence set, the API sequence set without the target sequence is used as the next circulating API sequence set, and the cyclic detection and labeling are carried out on the plurality of calling sequences to be detected until all the calling sequences in the API sequence set are labeled. 
Therefore, in the embodiment of the application, the server of the network device marks all the calling sequences, so that whether malicious codes exist in each calling sequence of the multiple calling sequences to be detected or not can be determined, and in the process of determining whether the malicious codes exist in the target file, the calling sequence in which the malicious codes exist can be directly determined, and a user can directly process the calling sequences in which the malicious codes exist.
In the following, a brief description will be given of an implementation environment related to the malicious code detection method provided in the embodiment of the present application.
Referring to fig. 1, fig. 1 is a schematic diagram of an implementation environment related to the malicious code detection method provided in an embodiment of the present application. As shown in fig. 1, the implementation environment includes a network device (shown as a computer in fig. 1). A malicious code detection program is installed on a server of the network device, and the server can call this program to implement the malicious code detection method provided in the embodiment of the present application.
Optionally, in this embodiment of the present application, the network device may be a router, a computer, a switch, and the like.
Referring to fig. 2, a server of a network device (hereinafter, referred to as a server) is provided, an internal structure of the server may be as shown in fig. 2, and the server includes a processor, a memory, a network interface, and a database connected through a system bus. Wherein the processor of the server is configured to provide computing and control capabilities. The memory of the server comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the server is used for storing a malicious sample set and a normal sample set, wherein the malicious sample set comprises a plurality of known malicious calling sequences containing malicious codes; the normal sample set includes a plurality of normal call sequences that are known to contain no malicious code. The network interface of the server is used for communicating with an external terminal through network connection. The computer program is executed by a processor to implement a malicious code detection method.
The structure shown in fig. 2 is a block diagram of only the portion of the structure relevant to the present application and does not limit the network devices to which the present application is applied; a particular network device may include more or fewer components than those shown in fig. 2, combine certain components, or arrange the components differently.
Referring to fig. 3, a flowchart of a malicious code detection method provided by an embodiment of the present application is shown, where the malicious code detection method may be applied to the server shown in fig. 2. As shown in fig. 3, the malicious code detection method may include the steps of:
step 301, the server selects a first calling sequence and a second calling sequence from the API sequence set.
In this embodiment of the application, the API sequence set includes a plurality of call sequences to be detected, and the first call sequence and the second call sequence may be two call sequences of the plurality of call sequences to be detected.
In an alternative implementation, as shown in fig. 4, before the server selects the first call sequence and the second call sequence from the API sequence set, the following steps 401 to 402 are further included:
step 401, after the network device receives the target file, the server may operate the received target file to be detected in the virtual sandbox to obtain a call sequence corresponding to the API function of the target file.
The virtual sandbox is a virtual system program: it runs the target file in a virtual environment and can delete the changes produced by running it. Using redirection, the virtual sandbox directs files generated or modified by the running target file to a folder of the virtual sandbox, which prevents the target file from modifying local system files. This protects the local system from attack by any malicious code in the target file.
The API calling sequence obtained by the server refers to the combination of API calls, and the API calling sequence is formed by a plurality of API calls based on the front and back dependency relations.
In the embodiment of the application, when an external party sends the target file to the network device through a network protocol, the server runs the received target file in the virtual sandbox. During the run, the server can obtain the static action information of the target file, obtain the source code of the target file according to the static action information, and extract the API calling sequence from the source code of the target file.
In this embodiment, the static action information includes the MD5 (MD5 Message-Digest Algorithm) value of the target file. The server may obtain the source code of the target file according to the static action information as follows: the server decides whether to call an unpacking ("shelling") tool according to the MD5 value of the target file. When the MD5 value of the target file is greater than a threshold, the unpacking tool is used to obtain the source code of the target file; when the MD5 value is less than or equal to the threshold, no unpacking tool is needed. Note that unpacking is the inverse of packing (adding a shell to) software, where packing means adding to finished software a program dedicated to protecting the software from illegal modification or decompilation.
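As a small illustration of the MD5 step, the sketch below computes the MD5 value of a file's bytes with Python's standard `hashlib` module. The source does not say how the MD5 value is compared with the threshold; treating the digest as a 128-bit unsigned integer is purely an assumption made here, and the names `md5_hex` and `needs_unpacking` are hypothetical.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as 32 hexadecimal characters."""
    return hashlib.md5(data).hexdigest()


def needs_unpacking(data: bytes, threshold: int) -> bool:
    """Decide whether to call an unpacking ("shelling") tool.

    Assumption (not stated in the source): the MD5 value is read as a
    128-bit unsigned integer before being compared with the threshold.
    """
    return int(md5_hex(data), 16) > threshold


print(md5_hex(b""))  # d41d8cd98f00b204e9800998ecf8427e
```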
Step 402, for each calling sequence, the server obtains the feature vector of the calling sequence to form an API sequence set.
In the embodiment of the application, the calling sequence corresponding to the API function can be processed by the locality-sensitive hash algorithm sim-hash to obtain the binary feature vector $H_i$ of the API calling sequence. The feature vectors of the multiple call sequences constitute the API sequence set.
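To make the sim-hash step concrete, here is a minimal sketch of a generic sim-hash fingerprint over the API names of one calling sequence. The source does not specify tokenization, token weights, the per-token hash, or the fingerprint width, so the choices below (one token per API name, weight equal to occurrence count, MD5 as the per-token hash, 64 bits) are assumptions, as are the example API names.

```python
import hashlib
from collections import Counter
from typing import Iterable


def simhash(api_calls: Iterable[str], bits: int = 64) -> str:
    """Return a `bits`-long binary fingerprint string for a call sequence."""
    if not 1 <= bits <= 128:
        raise ValueError("bits must be between 1 and 128 (MD5 yields 128 bits)")
    weights = Counter(api_calls)  # token weight = number of occurrences
    votes = [0] * bits
    for token, weight in weights.items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest, "big") & ((1 << bits) - 1)
        for i in range(bits):
            # Each token votes +weight for a 1 bit and -weight for a 0 bit.
            votes[i] += weight if (h >> i) & 1 else -weight
    # Fingerprint bit i is 1 when the positive votes dominate.
    return "".join("1" if votes[i] > 0 else "0" for i in range(bits - 1, -1, -1))


# Illustrative API names only.
calls = ["CreateFileW", "WriteFile", "WriteFile", "CloseHandle"]
fingerprint = simhash(calls)
print(len(fingerprint))  # 64
```

Because this variant counts tokens, it ignores the order of the calls; hashing short runs of consecutive calls instead of single names would be one way to retain ordering information.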
In an alternative implementation manner, in the embodiment of the present application, two call sequences may be arbitrarily selected from the API sequence set as the first call sequence and the second call sequence.
In an alternative implementation, in order to increase the difference between the first call sequence and the second call sequence, so as to distinguish the first call sequence from the second call sequence, the process of the server selecting the first call sequence and the second call sequence from the API sequence set may include the following steps B1-B2:
and step B1, the server calculates the Hamming distance of each two call sequences to be detected in the API sequence set respectively.
The Hamming distance is the number of positions at which the corresponding bits of two words of the same length differ. For example, codeword A is 10001001 and codeword B is 10110001; the two codewords differ in 3 positions, so the Hamming distance between codeword A and codeword B is 3.
The server can calculate the hamming distance of any two call sequences in the API sequence set.
Optionally, formula (1) may be used to calculate the hamming distance between every two call sequences in the API sequence set:
[Formula (1): the Hamming distance $D_{ham}(y, z)$; the equation is an image in the original and is not reproduced here.]
In formula (1), $y_r$ is the bit value corresponding to one calling sequence in the API sequence set, $z_r$ is the bit value corresponding to another calling sequence in the API sequence set, $D_{ham}(y, z)$ is the Hamming distance, r is the number of groups in the API sequence set, grouped two by two, and m is the sample capacity.
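The Hamming distance described in step B1 can be computed with a few lines of code. The sketch below uses the standard definition (count the positions where two equal-length bit strings differ), which may differ in form from formula (1); the function name `hamming_distance` is hypothetical.

```python
def hamming_distance(y: str, z: str) -> int:
    """Count the positions at which two equal-length bit strings differ."""
    if len(y) != len(z):
        raise ValueError("codewords must have the same length")
    return sum(a != b for a, b in zip(y, z))


# Codewords A and B from the example in the text.
print(hamming_distance("10001001", "10110001"))  # 3
```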
And step B2, the server selects two calling sequences with the maximum Hamming distance as a first calling sequence and a second calling sequence.
The larger the hamming distance is, the lower the similarity between two code words is, and the smaller the hamming distance is, the higher the similarity between two code words is.
In the embodiment of the application, two calling sequences with the largest Hamming distance are selected, namely two calling sequences with the lowest similarity are selected as a first calling sequence and a second calling sequence respectively.
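Steps B1 and B2 together amount to an exhaustive search over all pairs. A minimal sketch, assuming the feature vectors are equal-length bit strings (the function names and example vectors are hypothetical):

```python
from itertools import combinations
from typing import List, Tuple


def hamming_distance(y: str, z: str) -> int:
    """Count the positions at which two equal-length bit strings differ."""
    if len(y) != len(z):
        raise ValueError("feature vectors must have the same length")
    return sum(a != b for a, b in zip(y, z))


def most_dissimilar_pair(vectors: List[str]) -> Tuple[str, str]:
    """Return the pair of feature vectors with the largest Hamming distance."""
    if len(vectors) < 2:
        raise ValueError("at least two call sequences are required")
    # Step B1: distance of every pair; step B2: keep the maximum.
    return max(combinations(vectors, 2), key=lambda pair: hamming_distance(*pair))


# Toy 4-bit feature vectors, for illustration only.
print(most_dissimilar_pair(["0000", "0011", "1111", "0001"]))  # ('0000', '1111')
```

The search examines n(n-1)/2 pairs for n sequences, and when several pairs share the maximum distance, `max` returns the first one encountered.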
Step 302, the server calculates a normal risk value and a malicious risk value of the first calling sequence and a normal risk value and a malicious risk value of the second calling sequence respectively.
The normal risk value represents the probability that malicious code exists in the call sequence, and the malicious risk value represents the probability that no malicious code exists in the call sequence. The larger the normal risk value of a call sequence, the less likely the sequence is a normal sequence; the smaller the normal risk value, the more likely the sequence is a normal sequence. Likewise, the larger the malicious risk value of a call sequence, the less likely the sequence is a malicious sequence; the smaller the malicious risk value, the more likely the sequence is a malicious sequence.
In the embodiment of the application, the server may calculate the respective normal risk value and the malicious risk value for the first call sequence and the second call sequence respectively. In an alternative implementation manner, taking the first call sequence as an example, in this embodiment of the application, as shown in fig. 5, a process of the server calculating a normal risk value and a malicious risk value of the first call sequence may include the following steps:
step 501, the server obtains a malicious sample set and a normal sample set.
In the embodiment of the application, Advanced Persistent Threat (APT) groups can be tracked, and the main types of malicious code, such as Backdoor, Trojan, Virus and Worm, can be collected from malicious code sharing websites such as VXHeavens and Malshare. The malicious call sequences of the API functions corresponding to the malicious code are obtained, and these known malicious call sequences are processed by the sim-hash algorithm to obtain the binary feature vectors $H_{i-}'$ of the malicious call sequences. The feature vectors $H_{i-}'$ of multiple malicious call sequences are combined to form the malicious sample set.
Meanwhile, in the embodiment of the application, the server can obtain the normal call sequences of the API functions corresponding to known normal code that contains no malicious code, and process these known normal call sequences by the sim-hash algorithm to obtain the binary feature vectors $H_{i+}'$ of the normal call sequences. The feature vectors $H_{i+}'$ of multiple normal call sequences are combined to form the normal sample set.
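The text names the sim-hash algorithm but does not give its parameters. The following is a minimal sketch of how a call sequence could be turned into a binary feature vector, assuming a 64-bit fingerprint, MD5 as the per-token hash, and a weight of 1 per API name; the function name `simhash` and all three choices are illustrative assumptions, not details taken from the patent.

```python
import hashlib
from typing import Iterable, List


def simhash(tokens: Iterable[str], bits: int = 64) -> List[int]:
    """Return a binary feature vector of length `bits` for a sequence of API names.

    Assumption: each token is hashed with MD5 and contributes weight 1.
    """
    if not 0 < bits <= 128:
        raise ValueError("bits must be between 1 and 128 for an MD5-based hash")
    weights = [0] * bits
    for token in tokens:
        value = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest(), "big")
        for k in range(bits):
            # A set bit votes +1 for position k, a cleared bit votes -1.
            weights[k] += 1 if (value >> k) & 1 else -1
    # Positions with a positive vote total become 1, all others 0.
    return [1 if w > 0 else 0 for w in weights]


# Hypothetical call sequence used only for illustration.
vector = simhash(["CreateFileW", "WriteFile", "CloseHandle"])
```

Collecting such vectors for known malicious and known normal call sequences would give the two sample sets described in step 501.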
Step 502, for each malicious calling sequence in the malicious sample set, the server calculates the malicious similarity between the first calling sequence and that malicious calling sequence; and for each normal calling sequence in the normal sample set, the server calculates the normal similarity between the first calling sequence and that normal calling sequence.
In the embodiment of the present application, formula (2) may be adopted to calculate the similarity between the first call sequence and each malicious call sequence and each normal call sequence. In the embodiment of the present application, for convenience of distinguishing, a similarity between the first call sequence and the malicious call sequence is referred to as a malicious similarity, and a similarity between the first call sequence and the normal sequence is referred to as a normal similarity.
Figure BDA0002261044510000121
Here, $\mathrm{sim}(H_i, H_i')$ is the similarity measure, $y_r$ is the bit value corresponding to a malicious (or normal) calling sequence in the malicious sample set (or normal sample set), $z_r$ is the bit value corresponding to the first calling sequence, r is the number of groups obtained by grouping the sequences of the API sequence set in pairs, and m is the sample capacity.
Optionally, in order to distinguish the malicious sample set from the normal sample set, in this embodiment of the present application $H_{i+}'$ may denote a normal call sequence in the normal sample set and $H_{i-}'$ may denote a malicious call sequence in the malicious sample set. The malicious similarity between the first call sequence and a malicious call sequence can then be expressed as $\mathrm{sim}(H_i, H_{i-}')$, and the normal similarity between the first call sequence and a normal call sequence can be expressed as $\mathrm{sim}(H_i, H_{i+}')$.
For example, in the embodiment of the present application, assume that the normal sample set includes 5 feature vectors $H_{i+}'$, denoted L1, L2, L3, L4 and L5, and that the malicious sample set includes 5 feature vectors $H_{i-}'$, denoted L6, L7, L8, L9 and L10. If the first call sequence is denoted A1, the server can calculate the normal similarity between A1 and each of L1 to L5, hereinafter referred to as A1L1, A1L2, A1L3, A1L4 and A1L5. Accordingly, the malicious similarity between the first call sequence and each of the malicious call sequences may be expressed as A1L6, A1L7, A1L8, A1L9 and A1L10.
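Formula (2) is available only as an image, so its exact form cannot be restated here. As a stand-in, the sketch below compares the bits $z_r$ of the first call sequence with the bits $y_r$ of a sample and returns the fraction that match. This matching-bit measure, the function names, and the toy vectors are all assumptions made for illustration and should not be read as the patent's formula.

```python
from typing import Dict, Sequence


def bit_similarity(z: Sequence[int], y: Sequence[int]) -> float:
    """Fraction of positions where the two binary vectors agree (stand-in for formula (2))."""
    if len(z) != len(y) or len(z) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum(1 for zr, yr in zip(z, y) if zr == yr) / len(z)


def similarities(first: Sequence[int], samples: Dict[str, Sequence[int]]) -> Dict[str, float]:
    """Similarity of the first call sequence to every sample, keyed by sample name."""
    return {name: bit_similarity(first, vec) for name, vec in samples.items()}


# Toy 4-bit vectors; real feature vectors would come from the sim-hash step.
a1 = [1, 0, 1, 1]
normal_samples = {"L1": [1, 0, 1, 1], "L2": [1, 1, 1, 0]}
normal_sims = similarities(a1, normal_samples)  # plays the role of A1L1, A1L2
```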
Step 503, the server obtains a malicious predicted value corresponding to each malicious calling sequence, a normal predicted value corresponding to each normal calling sequence, and a predicted value of the first calling sequence.
In the embodiment of the application, a classifier C comprising a random forest algorithm is established, the malicious sample set and the normal sample set are respectively input into the classifier C for training, and the prediction result $C_i$ of the classifier C is obtained. In the embodiment of the application, the result of classifying a malicious call sequence in the malicious sample set can be denoted $C_{i-}$, and the result of classifying a normal call sequence in the normal sample set can be denoted $C_{i+}$. The prediction result represents the probability that no malicious code exists in the call sequence or the probability that malicious code exists in the call sequence.
Continuing the above example, classifying L1, L2, L3, L4 and L5 in the normal sample set yields five prediction results, denoted $L1C_{i+}$, $L2C_{i+}$, $L3C_{i+}$, $L4C_{i+}$ and $L5C_{i+}$ respectively.
Classifying L6, L7, L8, L9 and L10 in the malicious sample set yields five prediction results, denoted $L6C_{i-}$, $L7C_{i-}$, $L8C_{i-}$, $L9C_{i-}$ and $L10C_{i-}$ respectively.
Meanwhile, the server can also input the first calling sequence A1 into the classifier to obtain the prediction result of A1, denoted $A1C_i'$.
Step 504, the server calculates a malicious risk value of the first calling sequence according to the malicious predicted value, the malicious similarity and the predicted value of the first calling sequence; and calculating the normal risk value of the first calling sequence according to the normal predicted value, the normal similarity and the predicted value of the first calling sequence.
In the embodiment of the application, the server may calculate the normal risk value of the first call sequence according to formula (3), and calculate the malicious risk value of the first call sequence according to formula (4).
$R_{S+} = \sum (C_{i+} - C_i)^2 / \mathrm{sim}(H_i, H_i')$    formula (3).
$R_{S-} = \sum (C_{i-} - C_i)^2 / \mathrm{sim}(H_i, H_i')$    formula (4).
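Formulas (3) and (4) share one shape: a sum, over the samples of one set, of the squared difference between the sample's predicted value and the predicted value of the sequence under test, divided by the similarity between the two. A minimal sketch of that calculation follows; the function name `risk_value`, the parameter names, and the guard against a zero similarity are assumptions added for illustration, and the numbers are invented.

```python
from typing import Sequence


def risk_value(sample_predictions: Sequence[float],
               sample_similarities: Sequence[float],
               prediction: float) -> float:
    """Sum of (C_sample - C_sequence)^2 / sim over one sample set, as in formulas (3) and (4)."""
    if len(sample_predictions) != len(sample_similarities):
        raise ValueError("one similarity is needed per sample prediction")
    total = 0.0
    for c_sample, sim in zip(sample_predictions, sample_similarities):
        if sim <= 0:
            # Assumption: the text does not say how a zero similarity is handled.
            raise ValueError("similarity must be positive")
        total += (c_sample - prediction) ** 2 / sim
    return total


# Invented values: two normal samples with predictions 0.9 and 0.8,
# similarities 0.5 and 1.0, and a prediction of 0.7 for the first call sequence.
normal_risk = risk_value([0.9, 0.8], [0.5, 1.0], 0.7)
```

Passing the normal sample set's values gives the normal risk value, and passing the malicious sample set's values gives the malicious risk value.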
Continuing the above example, the normal risk value $A1R_{S+}$ of the first call sequence can be expressed as:
Figure BDA0002261044510000141
The malicious risk value $A1R_{S-}$ of the first call sequence can be expressed as:
Figure BDA0002261044510000142
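The two expressions above are given only as equation images. Substituting the symbols of the running example into formulas (3) and (4) would give the expansion below; this is a reading derived from the surrounding text, not a transcription of the original figures.

```latex
\mathrm{A1}R_{S+} =
  \frac{(\mathrm{L1}C_{i+} - \mathrm{A1}C_i')^2}{\mathrm{A1L1}}
  + \frac{(\mathrm{L2}C_{i+} - \mathrm{A1}C_i')^2}{\mathrm{A1L2}}
  + \cdots
  + \frac{(\mathrm{L5}C_{i+} - \mathrm{A1}C_i')^2}{\mathrm{A1L5}}

\mathrm{A1}R_{S-} =
  \frac{(\mathrm{L6}C_{i-} - \mathrm{A1}C_i')^2}{\mathrm{A1L6}}
  + \frac{(\mathrm{L7}C_{i-} - \mathrm{A1}C_i')^2}{\mathrm{A1L7}}
  + \cdots
  + \frac{(\mathrm{L10}C_{i-} - \mathrm{A1}C_i')^2}{\mathrm{A1L10}}
```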
Based on the same principle as steps 501 to 504, in the embodiment of the present application, the server may calculate the normal risk value and the malicious risk value of the second call sequence, denoted $A2R_{S+}$ and $A2R_{S-}$ respectively.
Step 303, the server labels the target sequence according to the normal risk value of the first calling sequence, the malicious risk value of the first calling sequence, the normal risk value of the second calling sequence and the malicious risk value of the second calling sequence, and obtains a labeling result.
The target sequence is a first calling sequence or a second calling sequence, and the labeling result comprises the existence of malicious codes or the absence of the malicious codes.
In an alternative implementation manner, as shown in fig. 6, the process of labeling the target sequence by the server to obtain the labeling result may include the following steps:
Step 601, the server may select a minimum risk value from the normal risk value of the first call sequence, the malicious risk value of the first call sequence, the normal risk value of the second call sequence, and the malicious risk value of the second call sequence.
Continuing the above example, the normal risk value of the first call sequence is $A1R_{S+}$, the malicious risk value of the first call sequence is $A1R_{S-}$, the normal risk value of the second call sequence is $A2R_{S+}$, and the malicious risk value of the second call sequence is $A2R_{S-}$. The minimum risk value is selected from $A1R_{S+}$, $A1R_{S-}$, $A2R_{S+}$ and $A2R_{S-}$.
For example, $A2R_{S-}$ is the minimum risk value.
In step 602, the server may determine the call sequence corresponding to the minimum risk value as the target sequence.
The calling sequence corresponding to $A2R_{S-}$ is the second calling sequence, that is, the second calling sequence is the target sequence.
Step 603, the server labels the target sequence according to the minimum risk value to obtain a labeling result.
In the embodiment of the application, when the minimum risk value is the normal risk value of the first call sequence or the normal risk value of the second call sequence, the labeling result indicates that no malicious code exists in the target sequence.
When the minimum risk value is the malicious risk value of the first calling sequence or the malicious risk value of the second calling sequence, the labeling result indicates that malicious code exists in the target sequence.
For example, in the present embodiment, the minimum risk value $A2R_{S-}$ is the malicious risk value of the second calling sequence, so the labeling result is that malicious code exists in the second calling sequence.
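Steps 601 to 603 amount to taking the smallest of the four risk values and reading both the target sequence and its label from it. The sketch below illustrates this under assumed names (`label_target`, the `"normal"`/`"malicious"` keys, and the label strings are hypothetical), with invented risk values chosen so that $A2R_{S-}$ is the minimum, as in the example.

```python
from typing import Dict, Tuple


def label_target(risks: Dict[Tuple[str, str], float]) -> Tuple[str, str]:
    """Pick the minimum risk value and return (target sequence name, label).

    `risks` maps (sequence name, kind) to a risk value, where kind is
    "normal" or "malicious".
    """
    (name, kind), _ = min(risks.items(), key=lambda item: item[1])
    # A minimal malicious risk value means the sequence is labeled as containing malicious code.
    label = "malicious code present" if kind == "malicious" else "no malicious code"
    return name, label


# Invented risk values for the two selected sequences.
risks = {
    ("A1", "normal"): 0.4,
    ("A1", "malicious"): 0.6,
    ("A2", "normal"): 0.9,
    ("A2", "malicious"): 0.1,
}
target, label = label_target(risks)
```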
Step 304, the server removes the target sequence from the API sequence set, takes the API sequence set with the target sequence removed as the API sequence set of the next cycle, and cyclically detects and labels the multiple calling sequences to be detected until all calling sequences in the API sequence set are labeled.
In the embodiment of the application, the server marks the second calling sequence A2 as the existence of malicious code and removes the second calling sequence A2 from the API sequence set.
For example: the API sequence set comprises A1-A10 calling sequences to be detected, and after A2 is removed, the API sequence set with target sequences removed comprises A1 and A3-A10. And taking the API sequence set with the target sequence removed as the API sequence set of the next loop, then selecting a new first calling sequence and a new second calling sequence from A1 and A3 to A10 by the server, and performing the steps circularly to realize the labeling of each calling sequence in the API sequence set.
Taking the API sequence set with the target sequence removed as the API sequence set of the next cycle further includes:
stopping the detection when the API sequence set of the next cycle includes only one call sequence to be detected.
When only one call sequence to be detected is left in the API sequence set after the target sequence is removed, the API sequence set of the next cycle can no longer provide both a first calling sequence and a second calling sequence, so the cyclic detection process cannot continue. Therefore, when the server detects that the API sequence set with the target sequence removed includes only one call sequence to be detected, it stops the detection.
The server can send a detection instruction to the manual detection terminal.
And the detection instruction is used for indicating manual detection of the call sequence to be detected included in the API sequence set of the next cycle.
Namely, the server can send a detection instruction and a code related to the last call sequence to be detected to the manual detection terminal, and the staff can manually detect and label the last call sequence to be detected to obtain a labeling result, wherein the labeling result comprises the existence of a malicious code or the absence of a malicious code. The manual detection terminal can feed back the labeling result to the server.
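The cycle of steps 301 to 304, including the stop condition when a single sequence remains, can be sketched as follows. The pair is chosen by maximum Hamming distance, as the module description elsewhere in this document states; the function names, the `risk_fn` callback that returns a (normal risk, malicious risk) pair, the label strings, and the toy data are assumptions for illustration only.

```python
from itertools import combinations
from typing import Callable, Dict, Optional, Tuple

Vector = Tuple[int, ...]


def hamming(a: Vector, b: Vector) -> int:
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def label_all(sequences: Dict[str, Vector],
              risk_fn: Callable[[Vector], Tuple[float, float]]
              ) -> Tuple[Dict[str, str], Optional[str]]:
    """Label one sequence per cycle; return the labels and the name left for manual detection."""
    remaining = dict(sequences)
    labels: Dict[str, str] = {}
    while len(remaining) >= 2:
        # Select the two sequences with the largest Hamming distance.
        first, second = max(combinations(remaining, 2),
                            key=lambda p: hamming(remaining[p[0]], remaining[p[1]]))
        candidates = []
        for name in (first, second):
            normal_risk, malicious_risk = risk_fn(remaining[name])
            candidates.append((normal_risk, name, "no malicious code"))
            candidates.append((malicious_risk, name, "malicious code present"))
        # The minimum risk value decides both the target sequence and its label.
        _, target, label = min(candidates, key=lambda c: c[0])
        labels[target] = label
        del remaining[target]  # the reduced set is used in the next cycle
    leftover = next(iter(remaining), None)  # would be sent to the manual detection terminal
    return labels, leftover


# Toy data and a toy risk function, invented for this example.
toy = {"A1": (0, 0, 0, 0), "A2": (1, 1, 1, 1), "A3": (0, 0, 1, 1)}
toy_risk = lambda v: (0.1 + sum(v) / len(v), 1 - sum(v) / len(v))
labels, leftover = label_all(toy, toy_risk)
```

With n sequences in the set, this labels n - 1 of them automatically and leaves exactly one for manual detection.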
In the malicious code detection method provided by the embodiment of the application, the server of the network device (hereinafter referred to as the server) may select a first calling sequence and a second calling sequence from an Application Programming Interface (API) sequence set, where the API sequence set includes a plurality of calling sequences to be detected. The server may calculate a normal risk value and a malicious risk value of the first calling sequence, and a normal risk value and a malicious risk value of the second calling sequence, where the normal risk value represents the probability that malicious code exists in the calling sequence and the malicious risk value represents the probability that no malicious code exists in the calling sequence. The server may label a target sequence according to these four risk values to obtain a labeling result, where the target sequence is the first calling sequence or the second calling sequence, and the labeling result is either that malicious code exists or that no malicious code exists. The server may then remove the target sequence from the API sequence set, take the API sequence set with the target sequence removed as the API sequence set of the next cycle, and cyclically detect and label the calling sequences to be detected until all calling sequences in the API sequence set are labeled.
Therefore, in the embodiment of the application, the server of the network device marks all the calling sequences, so that whether malicious codes exist in each calling sequence of the multiple calling sequences to be detected or not can be determined, and in the process of determining whether the malicious codes exist in the target file, the calling sequence in which the malicious codes exist can be directly determined, and a user can directly process the calling sequences in which the malicious codes exist.
Furthermore, in the embodiment of the application, whether the target file has the malicious codes or not can be accurately and efficiently detected, and the calling sequence with the malicious codes is determined, so that the analysis, tracking and positioning capabilities of a user on the target file with the malicious codes are greatly improved. The method is greatly helpful for the user to track the identity of the APT attacker.
Referring to fig. 7, a block diagram of a malicious code detection apparatus provided in an embodiment of the present application is shown, where the malicious code detection apparatus may be configured in a server in the implementation environment shown in fig. 2. As shown in fig. 7, the malicious code detection apparatus may include a call sequence selection module 701, a risk calculation module 702, a labeling module 703, and a loop processing module 704, where:
a calling sequence selection module 701, configured to select a first calling sequence and a second calling sequence from an application program interface API sequence set, where the API sequence set includes multiple calling sequences to be detected;
a risk calculation module 702, configured to calculate a normal risk value and a malicious risk value of the first call sequence, and a normal risk value and a malicious risk value of the second call sequence, respectively, where the normal risk value indicates a probability that a malicious code exists in the call sequence; the malicious risk value represents the probability of no malicious code existing in the calling sequence;
the labeling module 703 is configured to label a target sequence according to a normal risk value of the first call sequence, a malicious risk value of the first call sequence, a normal risk value of the second call sequence, and a malicious risk value of the second call sequence, to obtain a labeling result, where the target sequence is the first call sequence or the second call sequence, and the labeling result includes existence of a malicious code or absence of a malicious code;
and the cyclic processing module 704 is configured to remove the target sequence from the API sequence set, use the API sequence set from which the target sequence is removed as an API sequence set of the next cycle, and cyclically detect and label the multiple call sequences to be detected until all call sequences in the API sequence set are labeled.
In an embodiment of the present application, the risk calculation module 702 is further configured to obtain a malicious sample set and a normal sample set, where the malicious sample set includes a plurality of malicious call sequences known to contain malicious code; the normal sample set comprises a plurality of normal calling sequences which are known not to contain malicious code; respectively calculating the malicious similarity between the first calling sequence and the malicious calling sequence aiming at each malicious calling sequence in the malicious sample set; respectively calculating the normal similarity of the first calling sequence and the normal calling sequence aiming at each normal calling sequence in the normal sample set; acquiring a malicious predicted value corresponding to each malicious calling sequence, a normal predicted value corresponding to each normal calling sequence and a predicted value of the first calling sequence; calculating a malicious risk value of the first calling sequence according to the malicious predicted value, the malicious similarity and the predicted value of the first calling sequence; and calculating the normal risk value of the first calling sequence according to the normal predicted value, the normal similarity and the predicted value of the first calling sequence.
In an embodiment of the present application, the labeling module 703 is further configured to select a minimum risk value from the normal risk value of the first call sequence, the malicious risk value of the first call sequence, the normal risk value of the second call sequence, and the malicious risk value of the second call sequence; determining the calling sequence corresponding to the minimum risk value as a target sequence; and labeling the target sequence according to the minimum risk value to obtain a labeling result.
In an embodiment of the present application, the labeling module 703 is further configured to label that the target sequence does not have a malicious code when the minimum risk value is a normal risk value of the first call sequence or a normal risk value of the second call sequence; and when the minimum risk value is the malicious risk value of the first calling sequence or the malicious risk value of the second calling sequence, marking that the target sequence has malicious codes.
In an embodiment of the present application, the calling sequence selecting module 701 is further configured to calculate hamming distances of every two calling sequences to be detected in the API sequence set respectively; and selecting two calling sequences with the maximum Hamming distance as a first calling sequence and a second calling sequence.
In an embodiment of the present application, the calling sequence selecting module 701 is further configured to run the received target file to be detected in the virtual sandbox, and obtain a calling sequence corresponding to an API function of the target file; and for each calling sequence, obtaining a feature vector of the calling sequence to form an API sequence set.
In an embodiment of the present application, the loop processing module 704 is further configured to stop detecting when the API sequence set of the next loop includes only one call sequence to be detected; and sending a detection instruction to the manual detection terminal, wherein the detection instruction is used for indicating manual detection of the call sequence to be detected in the API sequence set of the next cycle.
In one embodiment of the present application, there is provided a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
selecting a first calling sequence and a second calling sequence from an Application Program Interface (API) sequence set, wherein the API sequence set comprises a plurality of calling sequences to be detected; respectively calculating a normal risk value and a malicious risk value of the first calling sequence and a normal risk value and a malicious risk value of the second calling sequence, wherein the normal risk value represents the probability of malicious codes existing in the calling sequences; the malicious risk value represents the probability of no malicious code existing in the calling sequence; labeling a target sequence according to a normal risk value of the first calling sequence, a malicious risk value of the first calling sequence, a normal risk value of the second calling sequence and a malicious risk value of the second calling sequence to obtain a labeling result, wherein the target sequence is the first calling sequence or the second calling sequence, and the labeling result comprises the existence of malicious codes or the absence of malicious codes; and removing the target sequence from the API sequence set, taking the API sequence set without the target sequence as the next circulating API sequence set, and circularly detecting and labeling the calling sequences to be detected until all the calling sequences in the API sequence set are labeled.
In one embodiment of the application, the processor when executing the computer program may further implement the steps of: acquiring a malicious sample set and a normal sample set, wherein the malicious sample set comprises a plurality of known malicious calling sequences containing malicious codes; the normal sample set comprises a plurality of normal calling sequences which are known not to contain malicious code; respectively calculating the malicious similarity between the first calling sequence and the malicious calling sequence aiming at each malicious calling sequence in the malicious sample set; respectively calculating the normal similarity of the first calling sequence and the normal calling sequence aiming at each normal calling sequence in the normal sample set; acquiring a malicious predicted value corresponding to each malicious calling sequence, a normal predicted value corresponding to each normal calling sequence and a predicted value of the first calling sequence; calculating a malicious risk value of the first calling sequence according to the malicious predicted value, the malicious similarity and the predicted value of the first calling sequence; and calculating the normal risk value of the first calling sequence according to the normal predicted value, the normal similarity and the predicted value of the first calling sequence.
In one embodiment of the application, the processor when executing the computer program may further implement the steps of: selecting a minimum risk value from the normal risk value of the first calling sequence, the malicious risk value of the first calling sequence, the normal risk value of the second calling sequence and the malicious risk value of the second calling sequence; determining the calling sequence corresponding to the minimum risk value as a target sequence; and labeling the target sequence according to the minimum risk value to obtain a labeling result.
In one embodiment of the application, the processor when executing the computer program may further implement the steps of: when the minimum risk value is the normal risk value of the first calling sequence or the normal risk value of the second calling sequence, marking that the target sequence has no malicious code; and when the minimum risk value is the malicious risk value of the first calling sequence or the malicious risk value of the second calling sequence, marking that the target sequence has malicious codes.
In one embodiment of the application, the processor when executing the computer program may further implement the steps of: respectively calculating the Hamming distance of each two call sequences to be detected in the API sequence set; and selecting two calling sequences with the maximum Hamming distance as a first calling sequence and a second calling sequence.
In one embodiment of the application, the processor when executing the computer program may further implement the steps of: running the received target file to be detected in the virtual sandbox, and acquiring a calling sequence corresponding to an API (application program interface) function of the target file; and for each calling sequence, obtaining a feature vector of the calling sequence to form an API sequence set.
In one embodiment of the application, the processor when executing the computer program may further implement the steps of: stopping detection when the API sequence set of the next cycle includes only one calling sequence to be detected; and sending a detection instruction to the manual detection terminal, wherein the detection instruction is used for indicating manual detection of the call sequence to be detected in the API sequence set of the next cycle.
The implementation principle and technical effect of the computer device provided by the embodiment of the present application are similar to those of the method embodiment described above, and are not described herein again.
In an embodiment of the application, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of:
selecting a first calling sequence and a second calling sequence from an Application Program Interface (API) sequence set, wherein the API sequence set comprises a plurality of calling sequences to be detected; respectively calculating a normal risk value and a malicious risk value of the first calling sequence and a normal risk value and a malicious risk value of the second calling sequence, wherein the normal risk value represents the probability of malicious codes existing in the calling sequences; the malicious risk value represents the probability of no malicious code existing in the calling sequence; labeling a target sequence according to a normal risk value of the first calling sequence, a malicious risk value of the first calling sequence, a normal risk value of the second calling sequence and a malicious risk value of the second calling sequence to obtain a labeling result, wherein the target sequence is the first calling sequence or the second calling sequence, and the labeling result comprises the existence of malicious codes or the absence of malicious codes; and removing the target sequence from the API sequence set, taking the API sequence set without the target sequence as the next circulating API sequence set, and circularly detecting and labeling the calling sequences to be detected until all the calling sequences in the API sequence set are labeled.
In one embodiment of the application, the computer program, when executed by the processor, may further implement the steps of: acquiring a malicious sample set and a normal sample set, wherein the malicious sample set comprises a plurality of known malicious calling sequences containing malicious codes; the normal sample set comprises a plurality of normal calling sequences which are known not to contain malicious code; respectively calculating the malicious similarity between the first calling sequence and the malicious calling sequence aiming at each malicious calling sequence in the malicious sample set; respectively calculating the normal similarity of the first calling sequence and the normal calling sequence aiming at each normal calling sequence in the normal sample set; acquiring a malicious predicted value corresponding to each malicious calling sequence, a normal predicted value corresponding to each normal calling sequence and a predicted value of the first calling sequence; calculating a malicious risk value of the first calling sequence according to the malicious predicted value, the malicious similarity and the predicted value of the first calling sequence; and calculating the normal risk value of the first calling sequence according to the normal predicted value, the normal similarity and the predicted value of the first calling sequence.
In one embodiment of the application, the computer program, when executed by the processor, may further implement the steps of: selecting a minimum risk value from the normal risk value of the first calling sequence, the malicious risk value of the first calling sequence, the normal risk value of the second calling sequence and the malicious risk value of the second calling sequence; determining the calling sequence corresponding to the minimum risk value as a target sequence; and labeling the target sequence according to the minimum risk value to obtain a labeling result.
In one embodiment of the application, the computer program, when executed by the processor, may further implement the steps of: when the minimum risk value is the normal risk value of the first calling sequence or the normal risk value of the second calling sequence, marking that the target sequence has no malicious code; and when the minimum risk value is the malicious risk value of the first calling sequence or the malicious risk value of the second calling sequence, marking that the target sequence has malicious codes.
In one embodiment of the application, the computer program, when executed by the processor, may further implement the steps of: respectively calculating the Hamming distance of each two call sequences to be detected in the API sequence set; and selecting two calling sequences with the maximum Hamming distance as a first calling sequence and a second calling sequence.
In one embodiment of the application, the computer program, when executed by the processor, may further implement the steps of: running the received target file to be detected in the virtual sandbox, and acquiring a calling sequence corresponding to an API (application program interface) function of the target file; and for each calling sequence, obtaining a feature vector of the calling sequence to form an API sequence set.
In one embodiment of the application, the computer program, when executed by the processor, may further implement the steps of: stopping detection when the API sequence set of the next cycle includes only one calling sequence to be detected; and sending a detection instruction to the manual detection terminal, wherein the detection instruction is used for indicating manual detection of the call sequence to be detected in the API sequence set of the next cycle.
The implementation principle and technical effect of the computer-readable storage medium provided in the embodiment of the present application are similar to those of the method embodiment described above, and are not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the method embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The embodiments described above express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the claims. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method for malicious code detection, the method comprising:
acquiring static action information of a target file, wherein the static action information comprises an MD5 value of the target file, judging whether the MD5 value of the target file is greater than a threshold value, and if the MD5 value of the target file is greater than the threshold value, acquiring a source code of the target file by adopting a shelling tool; if the MD5 value of the target file is less than or equal to the threshold value, directly acquiring a source code of the target file;
extracting API calling sequences from the source code of the target file to obtain a plurality of API calling sequences;
acquiring a characteristic vector of each API calling sequence, and acquiring an API sequence set according to the characteristic vector of each API calling sequence; the API sequence set comprises a plurality of call sequences to be detected;
selecting a first calling sequence and a second calling sequence from the API sequence set;
respectively calculating a normal risk value and a malicious risk value of the first calling sequence, and a normal risk value and a malicious risk value of the second calling sequence, wherein the normal risk value represents the probability of malicious codes existing in the calling sequences; the malicious risk value represents the probability of no malicious code existing in the calling sequence;
labeling a target sequence according to the normal risk value of the first calling sequence, the malicious risk value of the first calling sequence, the normal risk value of the second calling sequence and the malicious risk value of the second calling sequence to obtain a labeling result, wherein the target sequence is the first calling sequence or the second calling sequence, and the labeling result comprises the existence of malicious codes or the absence of malicious codes;
and removing the target sequence from the API sequence set, taking the API sequence set without the target sequence as an API sequence set of the next cycle, and performing cyclic detection and labeling on the plurality of calling sequences to be detected until all calling sequences in the API sequence set are labeled.
2. The method of claim 1, wherein the calculating the normal risk value and the malicious risk value for the first sequence of calls comprises:
acquiring a malicious sample set and a normal sample set, wherein the malicious sample set comprises a plurality of known malicious calling sequences containing malicious codes; the normal sample set comprises a plurality of normal call sequences known to be free of malicious code;
respectively calculating the malicious similarity of the first calling sequence and the malicious calling sequence aiming at each malicious calling sequence in the malicious sample set; respectively calculating the normal similarity of the first calling sequence and the normal calling sequence aiming at each normal calling sequence in the normal sample set;
acquiring a malicious predicted value corresponding to each malicious calling sequence, a normal predicted value corresponding to each normal calling sequence and a predicted value of the first calling sequence;
calculating the malicious risk value of the first calling sequence according to the malicious predicted value, the malicious similarity and the predicted value of the first calling sequence; and calculating the normal risk value of the first calling sequence according to the normal predicted value, the normal similarity and the predicted value of the first calling sequence.
3. The method according to claim 1, wherein the labeling a target sequence according to the normal risk value and the malicious risk value of the first call sequence and the normal risk value and the malicious risk value of the second call sequence to obtain a labeling result comprises:
selecting a minimum risk value from the normal risk value of the first call sequence, the malicious risk value of the first call sequence, the normal risk value of the second call sequence and the malicious risk value of the second call sequence;
determining the calling sequence corresponding to the minimum risk value as the target sequence;
and labeling the target sequence according to the minimum risk value to obtain a labeling result.
4. The method according to claim 3, wherein the labeling the target sequence according to the minimum risk value to obtain a labeling result comprises:
when the minimum risk value is the normal risk value of the first calling sequence or the normal risk value of the second calling sequence, the marking result indicates that the target sequence does not have malicious codes;
and when the minimum risk value is the malicious risk value of the first calling sequence or the malicious risk value of the second calling sequence, the marking result is that the target sequence has malicious codes.
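By way of illustration and not limitation, the minimum-risk rule of claims 3 and 4 could be sketched in Python as follows. The function name `label_target`, the `(normal_risk, malicious_risk)` tuple layout and the label strings are hypothetical choices made for this sketch; the claims do not prescribe any particular representation, and ties are resolved here simply by taking the first minimum encountered.

```python
from typing import Tuple

# (normal_risk, malicious_risk) for one calling sequence. Following claim 1,
# the normal risk value is the probability that malicious code exists, read
# here as the risk incurred by labeling the sequence as normal.
Risks = Tuple[float, float]


def label_target(first: Risks, second: Risks) -> Tuple[str, str]:
    """Return (target, label) for the candidate with the minimum risk value."""
    candidates = [
        (first[0], "first", "no malicious code"),
        (first[1], "first", "malicious code"),
        (second[0], "second", "no malicious code"),
        (second[1], "second", "malicious code"),
    ]
    # The sequence owning the smallest of the four risk values becomes the
    # target sequence; the kind of risk value decides the label (claim 4).
    _, target, label = min(candidates, key=lambda c: c[0])
    return target, label
```

For example, with hypothetical risk values `(0.4, 0.7)` for the first calling sequence and `(0.9, 0.1)` for the second, the minimum is the malicious risk value of the second calling sequence, so the second sequence is the target and is labeled as containing malicious code.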
5. The method of claim 1, wherein said selecting a first call sequence and a second call sequence from said API sequence set comprises:
respectively calculating the Hamming distance of each two call sequences to be detected in the API sequence set;
and selecting two calling sequences with the maximum Hamming distance as the first calling sequence and the second calling sequence.
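By way of illustration and not limitation, the pair selection of claim 5 could be sketched in Python as follows. The sketch assumes that each call sequence to be detected is represented by an equal-length feature vector of discrete values; this representation and the names `hamming_distance` and `select_pair` are assumptions of the sketch, not details given in the claims.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def select_pair(vectors: List[Sequence[int]]) -> Tuple[int, int]:
    """Return the indices of the two vectors with the maximum Hamming distance."""
    if len(vectors) < 2:
        raise ValueError("at least two call sequences are required")
    # Compare every pair of call sequences and keep the most distant one.
    return max(
        combinations(range(len(vectors)), 2),
        key=lambda pair: hamming_distance(vectors[pair[0]], vectors[pair[1]]),
    )
```

The exhaustive comparison evaluates every pair, so its cost grows quadratically with the number of call sequences in the API sequence set.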
6. The method of claim 1, wherein prior to said selecting the first call sequence and the second call sequence from the API sequence set, the method further comprises:
running a received target file to be detected in a virtual sandbox, and acquiring a calling sequence corresponding to an API (application program interface) function of the target file;
and for each calling sequence, obtaining a feature vector of the calling sequence to form the API sequence set.
7. The method of claim 1, wherein the step of using the set of API sequences excluding the target sequence as the set of API sequences for the next cycle comprises:
stopping detection when the API sequence set of the next cycle comprises only one calling sequence to be detected;
and sending a detection instruction to a manual detection terminal, wherein the detection instruction is used for indicating manual detection of the calling sequence to be detected in the API sequence set of the next cycle.
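By way of illustration and not limitation, the cyclic detection of claim 1 together with the stopping condition of claim 7 could be sketched in Python as follows. The callbacks `select_pair` and `risk_of` are hypothetical placeholders for the pair selection and the risk calculation, whose formulas are not fixed by this sketch, and returning the last remaining sequence stands in for sending a detection instruction to a manual detection terminal.

```python
from typing import Callable, List, Optional, Tuple

Vector = Tuple[int, ...]
Risks = Tuple[float, float]  # (normal_risk, malicious_risk), as in claim 1


def cyclic_label(
    sequences: List[Vector],
    select_pair: Callable[[List[Vector]], Tuple[int, int]],
    risk_of: Callable[[Vector], Risks],
) -> Tuple[List[Tuple[Vector, str]], Optional[Vector]]:
    """Label one target sequence per cycle until at most one sequence remains."""
    remaining = list(sequences)
    labels: List[Tuple[Vector, str]] = []
    while len(remaining) > 1:
        i, j = select_pair(remaining)
        options = []
        for idx in (i, j):
            normal_risk, malicious_risk = risk_of(remaining[idx])
            options.append((normal_risk, idx, "no malicious code"))
            options.append((malicious_risk, idx, "malicious code"))
        # The sequence owning the minimum risk value is the target sequence.
        _, target_idx, label = min(options, key=lambda o: o[0])
        labels.append((remaining[target_idx], label))
        # Remove the target; the rest forms the set of the next cycle.
        del remaining[target_idx]
    # A single leftover sequence is handed over for manual detection.
    leftover = remaining[0] if remaining else None
    return labels, leftover
```

Each cycle labels and removes exactly one sequence, so a set of n call sequences is reduced to a single leftover after n - 1 cycles.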
8. An apparatus for malicious code detection, the apparatus comprising:
the calling sequence selection module is used for acquiring static action information of a target file, wherein the static action information comprises an MD5 value of the target file; judging whether the MD5 value of the target file is greater than a threshold value, and if the MD5 value of the target file is greater than the threshold value, acquiring a source code of the target file by adopting a shelling tool; if the MD5 value of the target file is less than or equal to the threshold value, directly acquiring a source code of the target file; extracting API calling sequences from the source code of the target file to obtain a plurality of API calling sequences; acquiring a characteristic vector of each API calling sequence, and acquiring an API sequence set according to the characteristic vector of each API calling sequence; the API sequence set comprises a plurality of call sequences to be detected; and selecting a first calling sequence and a second calling sequence from the API sequence set;
the risk calculation module is used for calculating a normal risk value and a malicious risk value of the first calling sequence, a normal risk value and a malicious risk value of the second calling sequence respectively, wherein the normal risk value represents the probability of malicious codes existing in the calling sequences; the malicious risk value represents the probability of no malicious code existing in the calling sequence;
the labeling module is used for labeling a target sequence according to the normal risk value of the first calling sequence, the malicious risk value of the first calling sequence, the normal risk value of the second calling sequence and the malicious risk value of the second calling sequence to obtain a labeling result, wherein the target sequence is the first calling sequence or the second calling sequence, and the labeling result comprises the existence of malicious codes or the absence of malicious codes;
and the cyclic processing module is used for removing the target sequence from the API sequence set, taking the API sequence set without the target sequence as the API sequence set of the next cycle, and carrying out cyclic detection and labeling on the plurality of calling sequences to be detected until all calling sequences in the API sequence set are labeled.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN201911071345.5A 2019-11-05 2019-11-05 Malicious code detection method and device, computer equipment and storage medium Active CN110868405B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911071345.5A CN110868405B (en) 2019-11-05 2019-11-05 Malicious code detection method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911071345.5A CN110868405B (en) 2019-11-05 2019-11-05 Malicious code detection method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110868405A CN110868405A (en) 2020-03-06
CN110868405B true CN110868405B (en) 2022-03-04

Family

ID=69654735

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911071345.5A Active CN110868405B (en) 2019-11-05 2019-11-05 Malicious code detection method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110868405B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112052454B (en) * 2020-10-12 2022-04-15 腾讯科技(深圳)有限公司 Method, device and equipment for searching and killing applied viruses and computer storage medium
CN113591073B (en) * 2021-06-11 2023-10-13 中国科学院信息工程研究所 Web API security threat detection method and device
CN115511015B (en) * 2022-11-23 2023-04-07 中国人民解放军国防科技大学 Sample screening method, device, equipment and computer readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105550581A (en) * 2015-12-10 2016-05-04 北京奇虎科技有限公司 Malicious code detection method and device
CN107590388A (en) * 2017-09-12 2018-01-16 南方电网科学研究院有限责任公司 Malicious code detecting method and device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102034043B (en) * 2010-12-13 2012-12-05 四川大学 Malicious software detection method based on file static structure attributes
CN103177222B (en) * 2011-12-23 2015-08-12 腾讯科技(深圳)有限公司 File packing and unpacking processing method and device thereof
US10693896B2 (en) * 2015-01-14 2020-06-23 Virta Laboratories, Inc. Anomaly and malware detection using side channel analysis
CN106599623B (en) * 2016-12-09 2019-10-18 江苏通付盾科技有限公司 A kind of application similarity calculating method and device
CN108073814B (en) * 2017-12-29 2021-10-15 安天科技集团股份有限公司 Shelling method and system based on static structured shelling parameters and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A malicious code detection algorithm based on improved active learning; Li Yihong et al.; Computer Science (《计算机科学》); 2019-05-15; Vol. 46, No. 5; pp. 92-99 *

Also Published As

Publication number Publication date
CN110868405A (en) 2020-03-06

Similar Documents

Publication Publication Date Title
Kolosnjaji et al. Adversarial malware binaries: Evading deep learning for malware detection in executables
US10956477B1 (en) System and method for detecting malicious scripts through natural language processing modeling
US10114946B2 (en) Method and device for detecting malicious code in an intelligent terminal
US9846779B2 (en) Detecting a return-oriented programming exploit
CN110868405B (en) Malicious code detection method and device, computer equipment and storage medium
US10430586B1 (en) Methods of identifying heap spray attacks using memory anomaly detection
JP6670907B2 (en) System and method for blocking script execution
US10007784B2 (en) Technologies for control flow exploit mitigation using processor trace
EP2588983B1 (en) Systems and methods for alternating malware classifiers in an attempt to frustrate brute-force malware testing
US9553889B1 (en) System and method of detecting malicious files on mobile devices
RU2680736C1 (en) Malware files in network traffic detection server and method
US20180083770A1 (en) Detecting encoding attack
WO2017012241A1 (en) File inspection method, device, apparatus and non-volatile computer storage medium
US10601847B2 (en) Detecting user behavior activities of interest in a network
Martinelli et al. I find your behavior disturbing: Static and dynamic app behavioral analysis for detection of android malware
US11847223B2 (en) Method and system for generating a list of indicators of compromise
US10860719B1 (en) Detecting and protecting against security vulnerabilities in dynamic linkers and scripts
CN113472803A (en) Vulnerability attack state detection method and device, computer equipment and storage medium
KR20160099160A (en) Method of modelling behavior pattern of instruction set in n-gram manner, computing device operating with the method, and program stored in storage medium configured to execute the method in computing device
EP3113065B1 (en) System and method of detecting malicious files on mobile devices
CN110135154B (en) Injection attack detection system and method for application program
CN112395603B (en) Vulnerability attack identification method and device based on instruction execution sequence characteristics and computer equipment
CN113849859A (en) Linux kernel modification method, terminal device and storage medium
CN108256327B (en) File detection method and device
CN115545091A (en) Integrated learner-based malicious program API (application program interface) calling sequence detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230901

Address after: 518000 building 501, 502, 601, 602, building D, wisdom Plaza, Qiaoxiang Road, Gaofa community, Shahe street, Nanshan District, Shenzhen City, Guangdong Province

Patentee after: China Southern Power Grid Digital Platform Technology (Guangdong) Co.,Ltd.

Address before: Room 1301, Chengtou building, No. 106, Fengze East Road, Nansha District, Guangzhou City, Guangdong Province

Patentee before: Southern Power Grid Digital Grid Research Institute Co.,Ltd.