CN110245494B - Method for detecting malicious software, electronic device and computer readable storage medium - Google Patents

Info

Publication number
CN110245494B
Authority
CN
China
Prior art keywords
assembly
codes
code
neural network
symbols
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910528130.5A
Other languages
Chinese (zh)
Other versions
CN110245494A (en)
Inventor
李坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910528130.5A
Publication of CN110245494A
Application granted
Publication of CN110245494B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562Static detection
    • G06F21/563Static detection by source code analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033Test or assess software

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Virology (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a method for detecting malicious software, an electronic device and a computer readable storage medium. The method acquires the assembly code of the software to be detected as the first assembly code and applies preset processing to it so that every line of the first assembly code contains the same number of assembly symbols. It then acquires a replacement code for each assembly symbol in the first assembly code and replaces each assembly symbol with its replacement code to obtain an encoded file. Encoding the assembly code in this way yields the encoding features of the software to be detected and makes it easier for a deep neural network to find what malicious software has in common, so the scheme can be applied to the current rapidly changing malware ecosystem.

Description

Method for detecting malicious software, electronic device and computer readable storage medium
Technical Field
The present application relates to the field of network security technologies, and in particular, to a method for detecting malicious software, an electronic device, and a computer readable storage medium.
Background
With the development of information technology, the Internet has changed how people work and live. People have become increasingly inseparable from the network, which has given rise to a variety of security problems.
Software security is an important component of network security. Malware refers to programs such as viruses, worms and Trojan horses that perform malicious tasks on a computer system; it exerts control by disrupting the processes of normal software and thereby undermines software security.
How to detect malware effectively has become a common network security issue. A malware infection can cost users millions of dollars in economic losses, yet current anti-virus products and malware detection tools typically rely on signature-based detection, which requires a series of manually defined rules to identify the known types of malware. This approach is highly targeted, but it cannot identify novel malware.
Because different elements are employed in different environments, millions of new types of malware appear each day. The malware recognition methods in the related art suffer from a low recognition success rate, so a detection technology suited to the current rapidly changing malware ecosystem is urgently needed.
Disclosure of Invention
The embodiment of the application provides a method for detecting malicious software, an electronic device and a computer readable storage medium, which can be applied to a current rapidly-changed malicious software ecosystem.
A first aspect of an embodiment of the present application provides a method for detecting malware, the method including:
Step 1, acquiring the assembly code of the software to be detected as a first assembly code;
Step 2, performing preset processing on the first assembly code, wherein every line of the processed first assembly code contains the same number of assembly symbols;
Step 3, acquiring the replacement code corresponding to each assembly symbol in the first assembly code, and replacing each assembly symbol with its replacement code to obtain an encoded file corresponding to the software to be detected, wherein all replacement codes have the same character length;
Step 4, acquiring a trained deep neural network, inputting the encoded file into the deep neural network, acquiring the output result of the deep neural network, and judging from the output result whether the software to be detected is malicious software, wherein the deep neural network is trained on known malware training samples, and each malware training sample is the encoded file obtained by treating known malicious software as the software to be detected and processing it through steps 1 to 3.
Optionally, performing the preset processing on the first assembly code includes:
filling a preset field into the first assembly code so that every line of the first assembly code contains the same number of assembly symbols, wherein the preset field is not one of the assembly symbols that appear in the first assembly code before filling, and each filled preset field is counted as an assembly symbol.
Optionally, filling the preset field into the first assembly code so that every line contains the same number of assembly symbols includes:
determining the number of assembly symbols in each line of the first assembly code, and taking the maximum of these numbers as a target number;
for each line of the first assembly code whose number of assembly symbols is lower than the target number, filling the preset field from the end of the line until its number of assembly symbols equals the target number.
Optionally, before the preset field is filled into the first assembly code, the method further includes:
splitting the first assembly code according to a preset splitting rule.
Optionally, acquiring the replacement code corresponding to each assembly symbol in the first assembly code includes:
acquiring a preset mapping relationship between assembly symbols and replacement codes;
looking up, in the preset mapping relationship, the replacement code corresponding to each assembly symbol in the first assembly code.
Optionally, before the assembly code of the software to be detected is acquired, the method further includes:
acquiring the assembly codes of a plurality of malicious software as second assembly codes;
performing the preset processing on the second assembly codes, wherein every line of the processed second assembly codes contains the same number of assembly symbols;
acquiring the replacement code corresponding to each assembly symbol in the second assembly codes, and replacing each assembly symbol with its replacement code to obtain the encoded file corresponding to each malicious software, wherein all replacement codes have the same character length;
taking the encoded files corresponding to the malicious software as the malware training samples of a deep neural network, and training the deep neural network;
stopping training of the deep neural network when the deep neural network meets the training completion condition.
Optionally, stopping training of the deep neural network when the deep neural network meets the training completion condition includes:
when the number of times the deep neural network has been trained reaches a preset count threshold, judging that the deep neural network meets the training completion condition, and stopping training of the deep neural network;
or, when the accuracy with which the deep neural network recognizes malicious software reaches a preset accuracy threshold, judging that the deep neural network meets the training completion condition, and stopping training of the deep neural network.
A second aspect of an embodiment of the present application provides an electronic device, including:
an acquisition module, used for acquiring the assembly code of the software to be detected as a first assembly code;
a preprocessing module, used for performing preset processing on the first assembly code, wherein every line of the processed first assembly code contains the same number of assembly symbols;
a replacement module, used for acquiring the replacement code corresponding to each assembly symbol in the first assembly code and replacing each assembly symbol with its replacement code to obtain the encoded file corresponding to the software to be detected, wherein all replacement codes have the same character length;
a judgment module, used for acquiring a trained deep neural network, inputting the encoded file into the deep neural network, acquiring the output result of the deep neural network, and judging from the output result whether the software to be detected is malicious software, wherein the deep neural network is trained on known malware training samples, and each malware training sample is the encoded file obtained by treating known malicious software as the software to be detected and processing it through the acquisition module, the preprocessing module and the replacement module.
A third aspect of an embodiment of the present application provides an electronic device, including: the system comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the steps in the method according to the first aspect of the embodiment of the application when executing the computer program.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method according to the first aspect of the embodiments of the present application.
The embodiments of the application disclose a method for detecting malicious software, an electronic device and a computer readable storage medium. The method acquires the assembly code of the software to be detected as the first assembly code, and applies preset processing to the first assembly code so that every line contains the same number of assembly symbols. It then acquires the replacement code corresponding to each assembly symbol in the first assembly code and replaces each assembly symbol with its replacement code to obtain the encoded file corresponding to the software to be detected. Encoding the assembly code in this way yields the encoding features of the software to be detected. The encoding scheme can effectively reduce the differences between software from different environments, which makes it easier for the deep neural network to find what malicious software has in common and makes the embodiment more general across types of malicious software, so the detection method can be applied to the current rapidly changing malware ecosystem. After the encoded file is obtained, the embodiment acquires the trained deep neural network, inputs the encoded file into it, and judges from its output whether the software to be detected is malicious software; the efficiency of machine learning further helps ensure that malicious programs are identified effectively.
Drawings
Fig. 1 is a schematic diagram of a hardware structure of an electronic device according to the present application;
Fig. 2 is a flowchart illustrating a method for detecting malware according to a first embodiment of the present application;
FIG. 3 is a schematic diagram of the assembly code corresponding to Malware1 and Malware2 according to a first embodiment of the present application;
FIG. 4 is a schematic diagram of the code sequences obtained after splitting and filling the assembly code corresponding to Malware1 and Malware2 in FIG. 3;
FIG. 5 is a schematic diagram of the network architecture of a ResNet neural network according to a first embodiment of the present application;
Fig. 6 is a schematic structural diagram of an electronic device according to a second embodiment of the present application;
Fig. 7 is a schematic structural diagram of an electronic device according to a third embodiment of the present application.
Detailed Description
In order to make the objects, features and advantages of the present application more comprehensible, the technical solutions in the embodiments of the present application will be clearly described in conjunction with the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Referring to fig. 1, fig. 1 shows a block diagram of an electronic device. The method for detecting the malicious software provided by the embodiment of the invention can be applied to the electronic device 10 shown in fig. 1, and the electronic device 10 includes but is not limited to: mobile terminals such as smartphones, notebooks, wearable smart devices, etc., and stationary terminals such as desktop computers, smart televisions, etc.
As shown in fig. 1, the electronic device 10 includes a memory 101, a memory controller 102, one or more (only one is shown in the figure) processors 103, a peripheral interface 104, and a touch screen 105. These components communicate with each other via one or more communication buses/signal lines 106.
It will be appreciated that the configuration shown in fig. 1 is merely illustrative and is not intended to limit the configuration of the electronic device. The electronic device 10 may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.
The memory 101 may be used to store software programs and modules, such as the program instructions/modules corresponding to the malware detection method and the electronic device in the embodiments of the present invention. The processor 103 executes the software programs and modules stored in the memory 101, thereby performing various functional applications and data processing, for example implementing the malware detection method described above.
Memory 101 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 101 may further include memory remotely located relative to processor 103, which may be connected to electronic device 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. Access to the memory 101 by the processor 103, as well as other possible components, may be under the control of the memory controller 102.
The peripheral interface 104 couples various input/output devices to the processor 103 and the memory 101. The processor 103 runs the software and instructions in the memory 101 to perform the various functions of the electronic device 10 and to process data.
In some embodiments, the peripheral interface 104, the processor 103, and the memory controller 102 may be implemented in a single chip. In other examples, they may be implemented by separate chips.
The touch screen 105 provides both an output interface and an input interface between the electronic device and the user. Specifically, the touch screen 105 displays video output to the user, the content of which may include text, graphics, video, and any combination thereof. Some of the output of the touch screen 105 corresponds to user interface objects. The touch screen 105 also receives user inputs, such as clicks and swipes, so that the user interface objects can respond to them. User input may be detected with resistive, capacitive, or any other possible touch detection technique. Specific examples of the display unit of the touch screen 105 include, but are not limited to, a liquid crystal display or a light emitting polymer display.
The method for detecting the malicious software in the embodiment of the invention is described based on the electronic device.
First embodiment:
This embodiment of the application provides a method for detecting malicious software that makes it easier for a deep neural network to find what malicious software has in common, and therefore makes the network better suited to detection in the current rapidly changing malware ecosystem.
Referring to fig. 2, the method for detecting malware of the present embodiment includes the following steps:
Step 201, acquiring assembly codes of software to be detected, wherein the assembly codes are first assembly codes;
In step 201, the software to be detected is typically written in a high-level language such as C, C++ or Delphi. Software written in a high-level language must be compiled by a compiler into a file (machine language) that the computer system can execute directly, and the assembly code of the software to be detected is obtained by disassembling that machine-language file.
Optionally, in this embodiment, acquiring the assembly code of the software to be detected includes: acquiring the software to be detected written in a high-level language, compiling it into a machine-language file with a compiler, and disassembling that file to obtain the assembly code of the software to be detected. It can be understood that software written in different high-level languages yields assembly code of different styles after disassembly. For example, disassembling malware 1 (Malware1) and malware 2 (Malware2) yields the assembly code shown in FIG. 3.
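The text does not name a disassembler. As a minimal sketch, assuming GNU `objdump` is installed and the file is in a format it understands, the instruction text could be pulled out of an `objdump -d` listing as shown here. The function names `parse_objdump_listing` and `disassemble` are hypothetical, and the Intel-syntax output will not match the style of FIG. 3 exactly.

```python
import subprocess


def parse_objdump_listing(listing: str) -> list[str]:
    """Extract instruction text from an `objdump -d` listing.

    Instruction lines carry three tab-separated fields:
    the address, the raw bytes, and the instruction itself.
    """
    instructions = []
    for line in listing.splitlines():
        fields = line.split("\t")
        if len(fields) == 3 and fields[0].strip().endswith(":"):
            # Collapse the column padding objdump inserts after the mnemonic.
            instructions.append(" ".join(fields[2].split()))
    return instructions


def disassemble(path: str) -> list[str]:
    """Disassemble a binary with objdump (assumed to be on PATH)."""
    result = subprocess.run(
        ["objdump", "-d", "-M", "intel", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_objdump_listing(result.stdout)
```

Keeping the parser separate from the subprocess call means it can be exercised on a saved listing without running the disassembler.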
Step 202, carrying out preset processing on the first assembly codes, wherein the number of assembly symbols forming each row of codes in the preset processed first assembly codes is equal;
In this embodiment, the assembly symbols include, but are not limited to, all variable names, label names, record names, instruction mnemonics, register names, and the like in the macro assembly language. These symbols are identifiers that a programmer defines in a program to represent a memory location, data, an expression, a name, and so on, and they fall into five categories: registers, labels, variables, numbers, and names. For example, in the first line of Malware1 in FIG. 3, "mov", "ecx", "dword", "esp" and "+" are all assembly symbols.
The purpose of step 202 is to ensure that every line of the first assembly code contains the same number of assembly symbols, so that every line of code has the same number of characters after step 203.
Optionally, the preset processing of the first assembly code includes, but is not limited to, filling a preset field into the first assembly code.
Further, performing the preset processing on the first assembly code includes: filling a preset field into the first assembly code so that every line of the first assembly code contains the same number of assembly symbols, wherein the preset field is not one of the assembly symbols that appear in the first assembly code before filling, and each filled preset field is counted as an assembly symbol.
In one example, the same field is used for every fill, which reduces the interference of the filled fields with the encoding. It will be appreciated that the preset field should be a field that is not used as an assembly symbol in the first assembly code and that carries no particular meaning in any type of assembly code, so that it does not affect the semantics of the first assembly code. For example, the preset field may be a "pad" field or any other field that meets these requirements, which is not limited in this embodiment.
Further, in this embodiment, the preset field may be filled from the beginning of each line of the first assembly code or from the end of each line. Analysis and experimental verification show, however, that filling from the end of each line is more favorable for identifying malicious software.
Optionally, in one example, filling the preset field into the first assembly code so that every line contains the same number of assembly symbols includes:
determining the number of assembly symbols in each line of the first assembly code, and taking the maximum of these numbers as a target number;
for each line of the first assembly code whose number of assembly symbols is lower than the target number, filling the preset field from the end of the line until its number of assembly symbols equals the target number.
For example, for Malware1 and Malware2 in FIG. 3, the maximum number of assembly symbols in a line is 7, so 7 is taken as the target number. For each line of code with fewer than 7 assembly symbols, the "pad" field is filled from the end of the line until the line contains 7 assembly symbols, with each "pad" field counted as one assembly symbol.
Filling from the end makes it easier for the deep neural network to identify the filled preset fields.
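The end-filling procedure can be sketched in a few lines. This is an illustration under stated assumptions, not the patented implementation: the function name `pad_rows` is hypothetical, and the two sample rows give a target number of 6 rather than the 7 quoted for FIG. 3, because only two lines are used here.

```python
PAD = "pad"  # filler field named in the text; it must not collide with a real symbol


def pad_rows(rows: list[list[str]], pad: str = PAD) -> list[list[str]]:
    """Pad every row at its end so all rows match the longest row's symbol count."""
    if any(pad in row for row in rows):
        raise ValueError("the pad field already appears as an assembly symbol")
    # The target number is the maximum symbol count over all rows.
    target = max((len(row) for row in rows), default=0)
    return [row + [pad] * (target - len(row)) for row in rows]


rows = [
    ["mov", "ecx", "dword", "esp", "+", "int"],
    ["mov", "eax", "dword", "mem"],
]
padded = pad_rows(rows)
```

The guard at the top enforces the requirement that the preset field is not one of the symbols already present before filling.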
Optionally, before the filling processing of the preset field is performed on the first assembly code, the method further includes: splitting the first assembly code according to a preset splitting rule.
In this embodiment, splitting the first assembly code according to a preset splitting rule includes, but is not limited to, the following three splitting methods:
First: split each line of code in the first assembly code into a sequence of individual assembly symbols. For example, the first segment "mov ecx, dword [esp+4]" of Malware1 in FIG. 3 is split into "mov", "ecx", "dword", "esp", "+" and "int" (see FIG. 4); the second segment "mov eax, dword [0x42eb88]" in FIG. 3 is split into "mov", "eax", "dword" and "mem" (see the second segment in FIG. 4).
Second: after the first split, count the number of assembly symbols in each line of the first assembly code. If no line exceeds a preset number threshold, proceed to the step of filling the preset field into the first assembly code; otherwise, split each line that exceeds the threshold so that no line after splitting contains more assembly symbols than the threshold.
Third: after the first split, search each line of the first assembly code for a preset split symbol and, if one is found, move the code content that follows it to the next line. The preset split symbol includes, but is not limited to, preset keywords such as int.
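The three splitting methods can be sketched as follows. The normalization rule is an assumption inferred from the two example instructions in the text: a bracketed bare address is replaced by "mem", and any other numeric literal by "int". The function names, the regular expressions and the default marker set are all hypothetical.

```python
import re

# Assumed rule: "[0x42eb88]" becomes "mem"; other numeric literals become "int".
_MEM = re.compile(r"\[\s*(?:0x[0-9a-fA-F]+|\d+)\s*\]")
_NUM = re.compile(r"0x[0-9a-fA-F]+|\d+")
_TOKEN = re.compile(r"[A-Za-z_.$@?][\w.$@?]*|0x[0-9a-fA-F]+|\d+|[+\-*]")


def split_symbols(line: str) -> list[str]:
    """Method 1: split one line of assembly into individual symbols."""
    line = _MEM.sub(" mem ", line)
    tokens = _TOKEN.findall(line)
    return ["int" if _NUM.fullmatch(tok) else tok for tok in tokens]


def split_long_rows(rows: list[list[str]], limit: int) -> list[list[str]]:
    """Method 2: break any row longer than `limit` into rows of at most `limit`."""
    out: list[list[str]] = []
    for row in rows:
        if len(row) <= limit:
            out.append(row)
        else:
            out.extend(row[i:i + limit] for i in range(0, len(row), limit))
    return out


def split_after_marker(rows: list[list[str]],
                       markers: frozenset = frozenset({"int"})) -> list[list[str]]:
    """Method 3: move whatever follows a preset split symbol to the next row."""
    out: list[list[str]] = []
    for row in rows:
        current: list[str] = []
        for sym in row:
            current.append(sym)
            if sym in markers:
                out.append(current)
                current = []
        if current:
            out.append(current)
    return out
```

Methods 2 and 3 operate on rows that have already been tokenized by method 1, matching the "after the first split" wording of the text.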
The assembly code obtained by splitting the assembly code in FIG. 3 and filling it with the preset field is shown in FIG. 4, from which it can be seen that every line contains the same number of assembly symbols.
Step 203, obtaining replacement codes corresponding to all assembly symbols in the first assembly code, and replacing all assembly symbols by the replacement codes corresponding to all assembly symbols to obtain a code file corresponding to the software to be detected, wherein the character lengths of all the replacement codes are the same;
In this embodiment, the replacement codes corresponding to the assembly symbols in the first assembly code can be obtained in various ways. For example, a mapping relationship between replacement codes and assembly symbols may be preset, and the replacement codes are then obtained from that mapping. As another example, the assembly symbols that appear in the first assembly code may be collected and a one-to-one replacement code generated for each of them in real time.
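The second option, generating one-to-one replacement codes on the fly, might look like the sketch below. It assumes zero-padded decimal strings as the code format, which the text does not prescribe; `build_replacement_codes` and `encode_rows` are hypothetical names.

```python
def build_replacement_codes(rows: list[list[str]]) -> dict[str, str]:
    """Assign each distinct symbol a unique numeric code of one fixed length."""
    symbols = sorted({sym for row in rows for sym in row})
    # Width of the largest index, so every code has the same character length.
    width = len(str(max(len(symbols) - 1, 0)))
    return {sym: str(i).zfill(width) for i, sym in enumerate(symbols)}


def encode_rows(rows: list[list[str]], codes: dict[str, str]) -> list[str]:
    """Replace every symbol with its code and join each row into one string."""
    return ["".join(codes[sym] for sym in row) for row in rows]


rows = [["mov", "eax", "pad"], ["push", "eax", "pad"]]
codes = build_replacement_codes(rows)
encoded = encode_rows(rows, codes)
```

Because every row holds the same number of symbols and every code has the same length, every encoded row comes out with the same number of characters.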
Optionally, in this embodiment, acquiring the replacement code corresponding to each assembly symbol in the first assembly code includes:
acquiring a preset mapping relationship between assembly symbols and replacement codes;
looking up, in the preset mapping relationship, the replacement code corresponding to each assembly symbol in the first assembly code.
Optionally, when the preset mapping relationship between assembly symbols and replacement codes is set up, all assembly symbols that may appear in the various kinds of assembly code (including the preset field used for filling) are enumerated first, and a replacement code is then set for each assembly symbol. Optionally, the replacement codes of different assembly symbols share the same encoding format but differ in content.
In this embodiment, the replacement code for an assembly symbol that may appear in the assembly code may be built from any one or more of digits, letters, symbols, and the like, which is not limited in this embodiment.
One way of encoding the assembly symbols that may appear in assembly code is described below in conjunction with Table 1. It is only an example and does not limit how the assembly symbols are encoded in this embodiment.
As shown in Table 1, each assembly symbol in the assembly code is encoded as a three-dimensional numeric replacement code. For example, the assembly symbol [push] is encoded as the following three-dimensional replacement code:
Emb Dim1: 5
Emb Dim2: 3
Emb Dim3: 9
Therefore, the replacement code corresponding to the assembly symbol [push] is [5, 3, 9].
TABLE 1
For another example, for the preset field [pad], the corresponding replacement code is [1, 3].
In this embodiment, after all the assembly symbols in the assembly code have been encoded, the correspondence between each assembly symbol and its replacement code can be established to generate the mapping relationship between replacement codes and assembly symbols. The mapping relationship may be stored in the form of Table 1 or as key-value pairs.
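A key-value form of the mapping and the lookup step might be sketched as follows. Only the [push] entry is taken from the text; every other vector, including the one for "pad", is a made-up placeholder, and the names `REPLACEMENT_CODES` and `encode` are hypothetical.

```python
# Preset mapping stored as key-value pairs.
REPLACEMENT_CODES: dict[str, tuple[int, int, int]] = {
    "push": (5, 3, 9),  # from the text
    "mov": (2, 7, 4),   # hypothetical placeholder
    "eax": (6, 1, 8),   # hypothetical placeholder
    "pad": (0, 0, 0),   # hypothetical placeholder
}


def encode(rows: list[list[str]]) -> list[list[list[int]]]:
    """Replace each symbol with its 3-dimensional code (rows x symbols x 3)."""
    encoded = []
    for row in rows:
        encoded_row = []
        for sym in row:
            if sym not in REPLACEMENT_CODES:
                raise KeyError(f"no replacement code for assembly symbol {sym!r}")
            encoded_row.append(list(REPLACEMENT_CODES[sym]))
        encoded.append(encoded_row)
    return encoded


encoded_file = encode([["push", "eax", "pad"], ["mov", "eax", "pad"]])
```

Raising on an unknown symbol reflects the expectation that the preset mapping enumerates every symbol that can appear, including the filler field.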
Step 204, obtaining a trained deep neural network, inputting the encoded file into the deep neural network, obtaining an output result of the deep neural network, and judging whether the software to be detected is malicious software according to the output result, wherein the deep neural network is obtained by training based on a known malicious software training sample, and the malicious software training sample is the encoded file obtained by taking the known malicious software as the software to be detected and processing in steps 201-203.
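The text does not specify the form of the network's output. As a minimal sketch, assuming the trained network is exposed as a callable that returns a score between 0 and 1 (with 1 meaning malware), the judgment in step 204 reduces to a threshold comparison. The name `is_malware`, the callable interface and the 0.5 default are assumptions.

```python
from typing import Callable, Sequence


def is_malware(encoded_file: Sequence,
               model: Callable[[Sequence], float],
               threshold: float = 0.5) -> bool:
    """Feed the encoded file to the trained model and threshold its score."""
    score = model(encoded_file)
    return score >= threshold


def fake_model(encoded_file: Sequence) -> float:
    """Stand-in for a trained network; returns a fixed score for illustration."""
    return 0.9


verdict = is_malware([[[5, 3, 9], [0, 0, 0]]], fake_model)
```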
Further, in this embodiment, before the assembly code of the software to be detected is acquired, the method further includes:
acquiring the assembly codes of a plurality of malicious software as second assembly codes;
performing the preset processing on the second assembly codes, wherein every line of the processed second assembly codes contains the same number of assembly symbols;
acquiring the replacement code corresponding to each assembly symbol in the second assembly codes, and replacing each assembly symbol with its replacement code to obtain the encoded file corresponding to each malicious software, wherein all replacement codes have the same character length;
taking the encoded files corresponding to the malicious software as the malware training samples of a deep neural network, and training the deep neural network;
stopping training of the deep neural network when the deep neural network meets the training completion condition.
Through the above steps, the deep neural network can be trained on known malicious software. The number of malicious software training samples may be chosen according to the actual situation and is not limited in this embodiment. It can be understood that the malicious software training samples are labeled; optionally, the label is 1, indicating that the software corresponding to the training sample is malicious software.
Optionally, in one example, whether training of the deep neural network is complete is determined from the number of training iterations. In this case, stopping training of the deep neural network when it meets the training completion condition includes: accumulating the number of times the deep neural network has been trained on the malicious software training samples and, when that number reaches a preset count threshold, determining that the deep neural network meets the training completion condition and stopping training. The preset count threshold may be set according to actual needs, for example 10000.
Alternatively, in another example, whether training is complete is determined from the accuracy with which the deep neural network recognizes known malicious software. In this case, stopping training of the deep neural network when it meets the training completion condition optionally includes: when the recognition accuracy of the deep neural network for malicious software reaches a preset accuracy threshold, determining that the deep neural network meets the training completion condition and stopping training. The preset accuracy threshold may be 98%, 97.5%, or another value; its specific value is not limited in this embodiment.
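The two completion conditions above can be sketched as a single training loop. This is an illustration under stated assumptions, not the patented implementation: `train_step` and `evaluate_accuracy` are hypothetical callables standing in for one training iteration and one accuracy measurement, and `make_fake_model` is a toy stand-in whose accuracy simply rises by a fixed amount per step.

```python
from typing import Callable, Optional, Tuple


def train_until_done(train_step: Callable[[], None],
                     evaluate_accuracy: Callable[[], float],
                     max_steps: Optional[int] = 10000,
                     accuracy_threshold: Optional[float] = None) -> int:
    """Train until either completion condition holds; return the step count."""
    if max_steps is None and accuracy_threshold is None:
        raise ValueError("at least one completion condition is required")
    steps = 0
    while True:
        train_step()
        steps += 1
        # Condition 1: the training count reaches the preset count threshold.
        if max_steps is not None and steps >= max_steps:
            return steps
        # Condition 2: recognition accuracy reaches the preset accuracy threshold.
        if accuracy_threshold is not None and evaluate_accuracy() >= accuracy_threshold:
            return steps


def make_fake_model(gain: float) -> Tuple[Callable[[], None], Callable[[], float]]:
    """Toy stand-in for a network: accuracy grows by `gain` per training step."""
    state = {"accuracy": 0.0}

    def train_step() -> None:
        state["accuracy"] = min(1.0, state["accuracy"] + gain)

    def evaluate_accuracy() -> float:
        return state["accuracy"]

    return train_step, evaluate_accuracy


step_a, acc_a = make_fake_model(0.25)
STEPS_BY_ACCURACY = train_until_done(step_a, acc_a, max_steps=None, accuracy_threshold=0.98)
step_b, acc_b = make_fake_model(0.25)
STEPS_BY_COUNT = train_until_done(step_b, acc_b, max_steps=3)
```

In practice the accuracy would be measured on samples of known malicious software, as the text describes; the toy model only demonstrates when each condition fires.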
Optionally, the type of the deep neural network is not limited in this embodiment; it may be a ResNet neural network (whose architecture is shown in fig. 5), an LSTM (Long Short-Term Memory) network, a CNN (Convolutional Neural Network), and so on.
This embodiment discloses a method for detecting malicious software. The method acquires the assembly code of the software to be detected as the first assembly code and performs preset processing on it so that every row of code in the first assembly code contains the same number of assembly symbols. It then acquires the replacement code corresponding to each assembly symbol in the first assembly code and replaces each assembly symbol with its replacement code to obtain an encoded file. Encoding the assembly code in this way yields the encoding features of the software to be detected, which makes it easier for the deep neural network to find commonalities among malicious software and improves the generality of the method across various types of malicious software. After the encoded file is obtained, the trained deep neural network is acquired, the encoded file is input into it, and whether the software to be detected is malicious software is judged according to the output result of the deep neural network, so that the efficiency of machine learning further ensures effective identification of malicious software.
Second embodiment:
A second embodiment of the present invention provides an electronic device which, referring to fig. 6, comprises:
an acquiring module 601, configured to acquire the assembly code of the software to be detected as a first assembly code;
a preprocessing module 602, configured to perform preset processing on the first assembly code, wherein every row of code in the first assembly code consists of the same number of assembly symbols after the preset processing;
a replacing module 603, configured to acquire a replacement code corresponding to each assembly symbol in the first assembly code and replace each assembly symbol with its corresponding replacement code to obtain an encoded file corresponding to the software to be detected, wherein all replacement codes have the same character length; and
a judging module 604, configured to acquire a trained deep neural network, input the encoded file into the deep neural network, obtain an output result of the deep neural network, and judge whether the software to be detected is malicious software according to the output result, wherein the deep neural network is obtained by training based on known malicious software training samples, and each malicious software training sample is an encoded file obtained by taking known malicious software as the software to be detected and processing it with the acquiring module, the preprocessing module, and the replacing module.
Optionally, the preprocessing module 602 is configured to fill the first assembly code with a preset field so that every row of code in the first assembly code contains the same number of assembly symbols, where the preset field is not one of the assembly symbols appearing in the first assembly code before filling, and the preset field is itself regarded as an assembly symbol.
Further, the preprocessing module 602 is configured to determine the number of assembly symbols in each row of code in the first assembly code and take the maximum of these numbers as a target number; and, for each row of code in the first assembly code whose number of assembly symbols is lower than the target number, append the preset field at the end of the row until the number of assembly symbols in that row equals the target number.
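The filling procedure can be sketched in a few lines of Python. The function name `pad_rows`, the sample rows, and the choice of the string "pad" as the preset field are assumptions for illustration; the rows are assumed to have already been split into assembly symbols.

```python
from typing import List


def pad_rows(rows: List[List[str]], pad_symbol: str = "pad") -> List[List[str]]:
    """Append pad_symbol to the end of short rows until all rows are equally long."""
    # The preset field must not already occur as an assembly symbol.
    if any(pad_symbol in row for row in rows):
        raise ValueError("the preset field already appears in the assembly code")
    # Target number: the maximum symbol count over all rows.
    target = max((len(row) for row in rows), default=0)
    return [row + [pad_symbol] * (target - len(row)) for row in rows]


ROWS = [["push", "ebp"], ["mov", "ebp", "esp"], ["ret"]]
PADDED = pad_rows(ROWS)
```

Here the target number is 3, so `PADDED` holds `["push", "ebp", "pad"]`, `["mov", "ebp", "esp"]`, and `["ret", "pad", "pad"]`. Filling from the end of each row leaves the original symbols in their original positions.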
Optionally, the preprocessing module 602 is further configured to split the first assembly code according to a preset splitting rule before performing filling processing of a preset field on the first assembly code.
Optionally, the replacing module 603 is configured to acquire a preset mapping relationship between assembly symbols and replacement codes, and to look up in the preset mapping relationship the replacement code corresponding to each assembly symbol in the first assembly code.
Optionally, the acquiring module 601 of this embodiment is further configured to acquire assembly codes of a plurality of malicious software as second assembly codes;
the preprocessing module 602 is further configured to perform preset processing on the second assembly codes, wherein every row of code in the second assembly codes consists of the same number of assembly symbols after the preset processing; and
the replacing module 603 is further configured to acquire a replacement code corresponding to each assembly symbol in the second assembly codes and replace each assembly symbol with its corresponding replacement code to obtain the encoded file corresponding to the malicious software, wherein all replacement codes have the same character length.
Optionally, the electronic device of this embodiment further includes a training module, configured to train the deep neural network by taking the encoded file corresponding to the malicious software as a malicious software training sample of the deep neural network, and to stop training the deep neural network when the deep neural network meets the training completion condition.
Further, the training module is configured to determine that the deep neural network meets the training completion condition and stop training the deep neural network when the number of training iterations of the deep neural network reaches a preset count threshold; or to determine that the deep neural network meets the training completion condition and stop training the deep neural network when the recognition accuracy of the deep neural network for malicious software reaches a preset accuracy threshold.
With the electronic device of this embodiment, the assembly code of the software to be detected can be acquired as the first assembly code, and preset processing can be performed on the first assembly code so that every row of code in it contains the same number of assembly symbols. The replacement code corresponding to each assembly symbol in the first assembly code is then acquired, and each assembly symbol is replaced with its replacement code to obtain an encoded file. By encoding the assembly code of the software to be detected, the electronic device obtains the encoding features of that software, which makes it easier for the deep neural network to find commonalities among malicious software and improves the generality of the identification across various types of malicious software. After obtaining the encoded file, the electronic device acquires the trained deep neural network, inputs the encoded file into it, and judges whether the software to be detected is malicious software according to the output result of the deep neural network, so that the efficiency of machine learning further ensures effective identification of malicious software.
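One way to picture how the preprocessing, replacing, and judging modules fit together is the sketch below. Everything in it is hypothetical: the class name `Detector`, the decision threshold of 0.5, the mapping values other than push -> [5,3,9], and above all `stub_model`, which is a trivial placeholder for the trained deep neural network and performs no learning. Acquisition (disassembly) is omitted, so the input is assumed to be rows of assembly symbols.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Detector:
    mapping: Dict[str, Tuple[int, ...]]          # assembly symbol -> replacement code
    model: Callable[[List[List[int]]], float]    # stand-in for the trained network
    pad_symbol: str = "pad"
    threshold: float = 0.5                       # assumed decision threshold

    def preprocess(self, rows: List[List[str]]) -> List[List[str]]:
        """Preprocessing module: pad every row to the longest row's length."""
        target = max((len(row) for row in rows), default=0)
        return [row + [self.pad_symbol] * (target - len(row)) for row in rows]

    def replace(self, rows: List[List[str]]) -> List[List[int]]:
        """Replacing module: swap each symbol for its fixed-length code."""
        return [[v for symbol in row for v in self.mapping[symbol]] for row in rows]

    def judge(self, rows: List[List[str]]) -> bool:
        """Judging module: feed the encoded file to the model and threshold it."""
        encoded = self.replace(self.preprocess(rows))
        return self.model(encoded) >= self.threshold


def stub_model(encoded: List[List[int]]) -> float:
    # Placeholder only: scores 1.0 when the first row starts with code [5, 3, 9].
    return 1.0 if encoded and encoded[0][:3] == [5, 3, 9] else 0.0


DETECTOR = Detector(
    mapping={"push": (5, 3, 9), "ebp": (0, 0, 1), "ret": (0, 0, 2), "pad": (0, 0, 0)},
    model=stub_model,
)
FLAGGED = DETECTOR.judge([["push", "ebp"], ["ret"]])
CLEAN = DETECTOR.judge([["ret"]])
```

Keeping the model behind a plain callable mirrors the text's point that the network type is not limited: a ResNet, LSTM, or CNN could be substituted without touching the other modules.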
Third embodiment:
Referring to fig. 7, fig. 7 is a schematic diagram of an electronic device according to a third embodiment of the invention. The electronic device may be used to implement the method of malware detection in the embodiment shown in fig. 2. As shown in fig. 7, the electronic device mainly includes:
a memory 701, a processor 702, a bus 703, and a computer program stored on the memory 701 and executable on the processor 702, the memory 701 and the processor 702 being connected by the bus 703. The processor 702, when executing the computer program, implements the method of malware detection in the embodiment shown in fig. 2. The number of processors may be one or more, which is not limited in this embodiment.
The memory 701 may be a high-speed random access memory (RAM) or a non-volatile memory, such as a disk memory. The memory 701 is used for storing executable program code, and the processor 702 is coupled with the memory 701.
Further, an embodiment of the present application further provides a computer readable storage medium, which may be provided in the electronic device in each of the foregoing embodiments, and the computer readable storage medium may be a memory in the foregoing embodiment shown in fig. 7.
The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the method of malware detection in the embodiment shown in fig. 2. Further, the computer readable storage medium may be any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a RAM, a magnetic disk, or an optical disk.
By adopting the electronic device of the embodiment, based on the coding processing of the assembly code of the software to be detected, the coding characteristics of the software to be detected can be obtained, so that the commonality among the malicious software can be found more easily by the deep neural network, the universality of the method for identifying various types of malicious software is improved, and the method for identifying the malicious software by adopting the deep neural network can further ensure the effective identification of the malicious software.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices, or modules, and may be in electrical, mechanical, or other forms.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules.
The integrated modules, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in whole or in part, may be embodied in the form of a software product stored in a readable storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned readable storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
It should be noted that, for the sake of simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily all required for the present application.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The foregoing describes the method for detecting malware, the electronic device, and the computer readable storage medium provided by the present application. In light of the foregoing, those skilled in the art should not take this description as limiting the present application in any way, and the detailed description and examples should not be construed as limiting the scope of the present application.

Claims (7)

1. A method for detecting malware, comprising:
Step 1, acquiring assembly codes of software to be detected, wherein the assembly codes are first assembly codes;
Step 2, carrying out preset processing on the first assembly codes, wherein the number of assembly symbols forming each row of codes in the preset processed first assembly codes is equal;
Step 3, obtaining replacement codes corresponding to all assembly symbols in the first assembly code, and replacing all assembly symbols by the replacement codes corresponding to all assembly symbols to obtain a code file corresponding to the software to be detected, wherein the character length of all the replacement codes is the same;
Step 4, obtaining a trained deep neural network, inputting the encoded file into the deep neural network, obtaining an output result of the deep neural network, and judging whether the software to be detected is malicious software according to the output result, wherein the deep neural network is obtained by training based on a known malicious software training sample, and the malicious software training sample is the encoded file obtained by taking the known malicious software as the software to be detected and processing in the steps 1-3;
The presetting processing of the first assembly code comprises the following steps: filling a preset field in the first assembly code to make the number of assembly symbols in each row of codes of the first assembly code equal, wherein the preset field does not belong to the assembly symbols appearing in the first assembly code before filling, and the preset field is regarded as the assembly symbols;
Filling the preset field of the first assembly code so that the number of assembly symbols in each row of codes of the first assembly code is equal comprises the following steps: determining the number of assembly symbols of each row of codes in the first assembly code, and taking the maximum value in the number of assembly symbols as a target number; for each row of codes with the number of assembly symbols lower than the target number in the first assembly code, filling the preset field from the tail end of each row of codes until the number of assembly symbols of each row of codes is the target number;
before the filling processing of the preset field is performed on the first assembly code, the method further comprises the following steps: splitting the first assembly code according to a preset splitting rule.
2. The method for detecting malicious software according to claim 1, wherein the obtaining the replacement code corresponding to each assembly symbol in the first assembly code comprises:
Acquiring a preset mapping relation between the assembly symbols and the substitution codes;
and searching for the replacement codes corresponding to the assembly symbols in the first assembly code in the preset mapping relation.
3. The method for detecting malware according to claim 1, further comprising, before the acquiring the assembly code of the software to be detected:
acquiring assembly codes of a plurality of malicious software, wherein the assembly codes are used as second assembly codes;
The second assembly codes are subjected to preset processing, wherein the number of assembly symbols forming each row of codes in the second assembly codes after the preset processing is equal;
Acquiring a replacement code corresponding to each assembly symbol in the second assembly code, and replacing each assembly symbol by the replacement code corresponding to each assembly symbol to obtain a code file corresponding to the malicious software, wherein the character length of each replacement code is the same;
Taking the code file corresponding to the malicious software as a malicious software training sample of a deep neural network, and training the deep neural network;
And stopping training the deep neural network when the deep neural network meets the training completion condition.
4. The method for detecting malware according to claim 3, wherein stopping training of the deep neural network when the deep neural network satisfies a training completion condition comprises:
when the training times of the deep neural network reach a preset time threshold, judging that the deep neural network meets the training completion condition, and stopping training the deep neural network;
or when the recognition accuracy of the deep neural network to the malicious software reaches a preset accuracy threshold, judging that the deep neural network meets the training completion condition, and stopping training the deep neural network.
5. An electronic device, comprising:
The acquisition module is used for acquiring assembly codes of the software to be detected, wherein the assembly codes are first assembly codes;
The preprocessing module is used for splitting the first assembly code according to a preset splitting rule; determining the number of assembly symbols of each row of codes in the first assembly code, and taking the maximum value in the number of assembly symbols as a target number; for each row of codes with the number of assembly symbols lower than the target number in the first assembly code, filling a preset field from the tail end of each row of codes until the number of assembly symbols of each row of codes is the target number; wherein the preset field does not belong to an assembly symbol occurring in the first assembly code before filling, and the preset field is regarded as an assembly symbol; the number of assembly symbols forming each row of codes in the preset processed first assembly code is equal;
The replacing module is used for acquiring the replacing codes corresponding to all the assembly symbols in the first assembly code, replacing all the assembly symbols with the replacing codes corresponding to all the assembly symbols to obtain the code file corresponding to the software to be detected, wherein the character length of each replacing code is the same;
The judgment module is used for acquiring the trained deep neural network, inputting the coding file into the deep neural network, acquiring an output result of the deep neural network, and judging whether the software to be detected is malicious software according to the output result, wherein the deep neural network is obtained by training based on a known malicious software training sample, and the malicious software training sample is the coding file obtained by taking the known malicious software as the software to be detected and processing the software to be detected through the acquisition module, the preprocessing module and the replacement module.
6. An electronic device, comprising: memory, a processor and a computer program stored on said memory and executable on said processor, characterized in that the processor implements the steps of the method according to any of claims 1-4 when executing said computer program.
7. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1-4.
CN201910528130.5A 2019-06-18 2019-06-18 Method for detecting malicious software, electronic device and computer readable storage medium Active CN110245494B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910528130.5A CN110245494B (en) 2019-06-18 2019-06-18 Method for detecting malicious software, electronic device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110245494A CN110245494A (en) 2019-09-17
CN110245494B CN110245494B (en) 2024-05-24

Family

ID=67887899

Country Status (1)

Country Link
CN (1) CN110245494B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105989288A (en) * 2015-12-31 2016-10-05 武汉安天信息技术有限责任公司 Deep learning-based malicious code sample classification method and system
CN108804919A (en) * 2018-05-03 2018-11-13 上海交通大学 The homologous determination method of malicious code based on deep learning
CN109101815A (en) * 2018-07-27 2018-12-28 平安科技(深圳)有限公司 A kind of malware detection method and relevant device
KR20190049286A (en) * 2017-11-01 2019-05-09 국민대학교산학협력단 Cnn learning based malware analysis apparatus, cnn learning based malware analysis method of performing the same and storage media storing the same
CN109829306A (en) * 2019-02-20 2019-05-31 哈尔滨工程大学 A kind of Malware classification method optimizing feature extraction

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6952679B2 (en) * 2015-07-15 2021-10-20 サイランス・インコーポレイテッドCylance Inc. Malware detection
US20170068816A1 (en) * 2015-09-04 2017-03-09 University Of Delaware Malware analysis and detection using graph-based characterization and machine learning
US20190042743A1 (en) * 2017-12-15 2019-02-07 Intel Corporation Malware detection and classification using artificial neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
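Malware classification based on gray-scale image texture fingerprints; Zhang Chenbin (张晨斌) et al.; Computer Science (计算机科学); 2018-06-30; Vol. 45 (No. 6A); pp. 383-386 *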

Also Published As

Publication number Publication date
CN110245494A (en) 2019-09-17

Similar Documents

Publication Publication Date Title
US10284577B2 (en) Method and apparatus for file identification
WO2019227710A1 (en) Network public opinion analysis method and apparatus, and computer-readable storage medium
Sureka et al. Detecting duplicate bug report using character n-gram-based features
CN109582833B (en) Abnormal text detection method and device
CN109063055B (en) Method and device for searching homologous binary files
CN111241389B (en) Sensitive word filtering method and device based on matrix, electronic equipment and storage medium
KR101858620B1 (en) Device and method for analyzing javascript using machine learning
CN110991171B (en) Sensitive word detection method and device
US20110218947A1 (en) Ontological categorization of question concepts from document summaries
CN103605691A (en) Device and method used for processing issued contents in social network
CN109933502B (en) Electronic device, user operation record processing method and storage medium
CN113051356A (en) Open relationship extraction method and device, electronic equipment and storage medium
CN111930610B (en) Software homology detection method, device, equipment and storage medium
Tian et al. Fine-grained compiler identification with sequence-oriented neural modeling
CN112860855A (en) Information extraction method and device and electronic equipment
CN110245357B (en) Main entity identification method and device
CN104685493A (en) Dictionary creation device for monitoring text information, dictionary creation method for monitoring text information, and dictionary creation program for monitoring text information
CN112579937A (en) Character highlight display method and device
CN110245494B (en) Method for detecting malicious software, electronic device and computer readable storage medium
CN114004277A (en) Small sample threat risk early warning method and device based on deep learning
CN113688239A (en) Text classification method and device under few samples, electronic equipment and storage medium
CN108287831B (en) URL classification method and system and data processing method and system
CN111125704B (en) Webpage Trojan horse recognition method and system
Chowdhary et al. PIMiner: a web tool for extraction of protein interactions from biomedical literature
US11574053B1 (en) System and method for detecting malicious scripts

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant