CN114491528A - Malicious software detection method, device and equipment - Google Patents

Info

Publication number
CN114491528A
Authority
CN
China
Prior art keywords
information
software
characteristic information
vector
target software
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111562465.2A
Other languages
Chinese (zh)
Inventor
张栋栋
齐向东
吴云坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qianxin Technology Group Co Ltd
Secworld Information Technology Beijing Co Ltd
Original Assignee
Qianxin Technology Group Co Ltd
Secworld Information Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qianxin Technology Group Co Ltd and Secworld Information Technology Beijing Co Ltd
Priority to CN202111562465.2A
Publication of CN114491528A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]

Abstract

Embodiments of the invention provide a method, an apparatus, and a device for detecting malicious software. The method comprises the following steps: extracting first characteristic information and second characteristic information of target software from the installation package of the target software, wherein the first characteristic information represents static attribute information of the software and the second characteristic information represents instruction information of the software; performing vector conversion on the first characteristic information and the second characteristic information separately, and splicing the converted vectors to obtain a feature vector of the target software; and obtaining a detection result for the target software from the feature vector by using a trained detection model, wherein the detection result indicates whether the target software is malicious software. The method of the embodiments thereby detects malicious software effectively.

Description

Malicious software detection method, device and equipment
Technical Field
The invention relates to the technical field of information security, in particular to a method, a device and equipment for detecting malicious software.
Background
At present, a large amount of malicious software is installed and run on user devices without the user's permission. This infringes the legitimate rights and interests of users and poses a persistent threat to the security of end users' information, property, and devices.
In the prior art, detection without running the software code uses a decompilation tool to convert the software into readable text files and then determines whether the software is malicious. However, decompilation generates a large number of unnecessary intermediate files and adds a large number of unnecessary disk read and write operations. When a massive amount of software needs to be checked, decompilation consumes a great deal of time, so detection efficiency is low and the requirement for rapid malware detection at large scale cannot be met.
Disclosure of Invention
Aiming at the problems in the prior art, the embodiments of the present invention provide a method, an apparatus, and a device for detecting malicious software.
Specifically, the embodiment of the invention provides the following technical scheme:
in a first aspect, an embodiment of the present invention provides a method for detecting malware, including:
extracting first characteristic information and second characteristic information of target software based on an installation package of the target software; the first characteristic information is used for representing static attribute information of the software; the second characteristic information is used for representing instruction information of software;
respectively carrying out vector conversion on the first characteristic information and the second characteristic information, and splicing the converted vectors to obtain a characteristic vector of the target software;
and acquiring a detection result of the target software by using the trained detection model according to the feature vector, wherein the detection result is used for indicating whether the target software is malicious software or not.
Further, respectively performing vector conversion on the first feature information and the second feature information, and splicing the converted vectors to obtain a feature vector of the target software, including:
performing hash processing on the first characteristic information to obtain a numerical value vector;
performing vector conversion on the second characteristic information to obtain a histogram vector;
and splicing the numerical value vector and the histogram vector to obtain the feature vector.
Further, the first feature information includes: first attribute information and URL information; the first attribute information includes at least one of: component information, permission information, and component invocation information.
Further, the second feature information includes: a byte code sequence, wherein the vector conversion is performed on the second feature information to obtain a histogram vector, and the method includes:
acquiring an operation code in the byte code sequence and establishing a histogram; the histogram comprises at least one type of operation codes and the occurrence times of the various types of operation codes;
and carrying out vector representation on the histogram to obtain a histogram vector.
Further, the method comprises:
and extracting URL information by using a regular expression, and determining the URL information as the first characteristic information.
Further, the extracting the first feature information and the second feature information of the target software based on the installation package of the target software includes:
extracting the first characteristic information from a manifest file and an executable file of the installation package; extracting the second feature information from the executable file.
Further, the method further comprises:
building a boosted tree model;
training the boosted tree model by using training data to obtain the detection model; the training data includes: feature vectors of a plurality of pieces of software and a label indicating whether each of the pieces of software is malware.
Further, the target software comprises android software.
In a second aspect, an embodiment of the present invention further provides a device for detecting malware, including:
the extraction module is used for extracting first characteristic information and second characteristic information of the target software based on the installation package of the target software; the first characteristic information is used for representing static attribute information of the software; the second characteristic information is used for representing instruction information of software;
the conversion module is used for respectively carrying out vector conversion on the first characteristic information and the second characteristic information and splicing the converted vectors to obtain a characteristic vector of the target software;
and the detection module is used for acquiring a detection result of the target software by utilizing the trained detection model according to the feature vector, wherein the detection result is used for indicating whether the target software is malicious software or not.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the method for detecting malware according to the first aspect when executing the program.
In a fourth aspect, the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the malware detection method according to the first aspect.
In a fifth aspect, the present invention further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the steps of the method for detecting malware according to the first aspect are implemented.
According to the malware detection method, apparatus, and device provided by the embodiments of the invention, the first characteristic information, which represents static attribute information of the software, and the second characteristic information, which represents instruction information of the software, are extracted directly from the installation package of the software. This avoids the excessive time consumption of the traditional decompilation method, which needs to generate a large number of intermediate files, and greatly improves the speed of extracting the characteristic information. The first characteristic information and the second characteristic information are then each converted into a vector, and the converted vectors are spliced and input into a detection model, so that malware can be detected efficiently.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a malware detection method according to an embodiment of the present invention;
Fig. 2 is another schematic flowchart of a malware detection method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a malware detection apparatus according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The method provided by the embodiment of the invention can be applied to an information security scene, and the detection of the malicious software is efficiently realized.
The method provided by the embodiments of the invention can be applied to the Android system. Android currently holds 73% of the global mobile terminal market, with more than 3 billion active Android devices. Android application stores for mobile terminals offer millions of pieces of software, about 1% of which may be malware.
In the related art, detection without running the software code uses a decompilation tool to convert the software into readable text files and then determines whether the software is malicious. However, decompilation generates a large number of unnecessary intermediate files and adds a large number of unnecessary disk read and write operations. When a massive amount of software needs to be checked, decompilation consumes a great deal of time, so detection efficiency is low and the requirement for rapid malware detection at large scale cannot be met.
In the malware detection method of the embodiments, the first characteristic information, which represents static attribute information of the software, and the second characteristic information, which represents instruction information of the software, are extracted directly from the installation package of the software. This avoids the excessive time consumption of the traditional decompilation method, which needs to generate a large number of intermediate files, and greatly improves the speed of extracting the characteristic information. The first characteristic information and the second characteristic information are then each converted into a vector, and the converted vectors are spliced and input into a detection model, so that malware can be detected efficiently.
The technical solution of the invention is described in detail below through specific embodiments, with reference to Figs. 1 to 4 and taking an Android malware detection method as an example. The following specific embodiments may be combined with one another, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 1 is a flowchart illustrating a malware detection method according to an embodiment of the present invention. As shown in fig. 1, the method provided by this embodiment includes:
Step 101, extracting first characteristic information and second characteristic information of target software based on an installation package of the target software; the first characteristic information is used for representing static attribute information of the software; the second characteristic information is used for representing instruction information of the software; wherein the first characteristic information includes: first attribute information and URL information; the first attribute information includes at least one of: component information, permission information, and component invocation information.
Specifically, the target software may be Android software, for example system software or other software such as application software; the embodiments of the invention are not limited in this respect. The installation package of Android software uses the extension APK (Android Package), and the binary files contained in an APK each follow a specific format, so the characteristic information of the software can be obtained directly from the binary files decompressed from the APK archive. In the traditional approach, a decompilation tool converts the executable file into readable text files to obtain the characteristic information of the software; this generates a large number of unnecessary intermediate files, adds a large amount of unnecessary disk reading and writing and text processing, and seriously slows down the extraction of characteristic information. Compared with traditional decompilation, extracting the characteristic information directly from the installation package is much faster and avoids the excessive time consumed by generating a large number of intermediate files.
The installation package of Android software contains an application manifest file (Manifest), one or more virtual machine executable files (DEX) containing program code, and some resource files. The first characteristic information and the second characteristic information of the software are obtained by parsing these files; the first characteristic information represents static attribute information of the software, and the second characteristic information represents instruction information of the software. The first characteristic information is extracted from the Manifest file by using the AXMLPrinter2 tool and from the DEX file by using the LIEF tool. The first characteristic information includes first attribute information and URL information; the first attribute information includes at least one of: component information, permission information, and component invocation information. The extracted component information includes, for example, Activity, Receiver, and Service component information. Obtaining the first characteristic information thus yields the component information, permission information, component invocation information, and URL information of the software, which provide the basic data and the basis for judging whether the software is malicious, so that the detection result is more accurate.
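As a rough illustration of why no decompilation step is needed, the sketch below treats an APK as an ordinary ZIP archive and sorts its entries into the manifest, the DEX files, and everything else, using only Python's standard `zipfile` module. The helper names `split_apk_entries` and `make_demo_apk` and the contents of the demo archive are assumptions made for this example; parsing the extracted files (done in the text with AXMLPrinter2 and LIEF) is not shown.

```python
import io
import zipfile


def split_apk_entries(apk_bytes: bytes) -> dict:
    """Group the entries of an APK (a ZIP archive) by their role."""
    groups = {"manifest": [], "dex": [], "other": []}
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        for name in apk.namelist():
            if name == "AndroidManifest.xml":
                groups["manifest"].append(name)   # binary XML manifest
            elif name.endswith(".dex"):
                groups["dex"].append(name)        # one or more DEX files
            else:
                groups["other"].append(name)      # resources and the rest
    return groups


def make_demo_apk() -> bytes:
    """Build a tiny in-memory archive with APK-like entry names (demo data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("AndroidManifest.xml", b"\x03\x00\x08\x00")
        z.writestr("classes.dex", b"dex\n035\x00")
        z.writestr("classes2.dex", b"dex\n035\x00")
        z.writestr("res/layout/main.xml", b"")
    return buf.getvalue()


demo_groups = split_apk_entries(make_demo_apk())
print(demo_groups)
```

Everything happens in memory here, which mirrors the point made above: the entries can be read straight from the archive without writing intermediate text files to disk.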
The second characteristic information includes a byte code sequence; the byte code sequence comprises an operation code and an operand, wherein the operation code indicates an operation executed by an instruction and reflects the intention of software to a certain extent. Therefore, the content of the feature information can be enriched by extracting the second feature information of the target software, the extraction speed is high, and the detection result is more comprehensive, accurate and rapid.
Step 102, respectively carrying out vector conversion on the first characteristic information and the second characteristic information, and splicing the converted vectors to obtain a feature vector of the target software;
Specifically, the static attribute information extracted by parsing the software installation package is converted into a vector A, the instruction information extracted by parsing the software installation package is converted into a vector B, and the vector A and the vector B are then spliced to obtain the feature vector of the software.
Step 103, acquiring a detection result of the target software by using the trained detection model according to the feature vector, wherein the detection result is used for indicating whether the target software is malicious software or not.
Specifically, the feature vector of the software obtained by splicing is input into the trained detection model, which outputs a prediction for the feature vector, thereby accomplishing the detection task.
According to the method, the first characteristic information used for representing the static attribute information of the software and the second characteristic information used for representing the instruction information of the software are extracted from the installation package of the software, so that the problem that time consumption is too long due to the fact that a large number of intermediate files need to be generated in a traditional decompilation method is solved, and the extraction speed of the characteristic information is greatly improved; and then, vector conversion is carried out on the first characteristic information and the second characteristic information respectively, and the converted vectors are spliced and input into a detection model, so that the detection of the malicious software can be realized efficiently.
In one embodiment, respectively performing vector conversion on the first feature information and the second feature information and splicing the converted vectors to obtain the feature vector of the target software includes:
performing hash processing on the first characteristic information to obtain a numerical value vector;
performing vector conversion on the second characteristic information to obtain a histogram vector;
and splicing the numerical value vector and the histogram vector to obtain a feature vector.
Specifically, the first characteristic information, which represents static attribute information of the software, is subjected to hash processing to obtain a numerical value vector. Vectorization based on feature hashing uses a hash function to convert the original features into a low-dimensional numerical vector while losing as little feature information as possible; it supports online learning, keeps the running cost of the model low, makes model deployment and iterative updating convenient, and is fast and memory-efficient. The second characteristic information, which represents instruction information of the software and includes a byte code sequence, is subjected to vector conversion to obtain a histogram vector. Extracting the second characteristic information and constructing a DEX operation code histogram vector can improve the classification performance of the model. Finally, the numerical value vector and the histogram vector are spliced to obtain a feature vector, which serves as the feature representation of a single APK and is input into the machine learning model.
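The hashing and splicing steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the source does not name a hash function or a vector size, so SHA-256, the 64-bucket size, the token format (`permission:...`, `activity:...`, `url:...`), and the function names `hash_features` and `splice` are all hypothetical choices for the example.

```python
import hashlib


def hash_features(tokens, dim=64):
    """Feature hashing: map each string feature to one of `dim` buckets and count."""
    vec = [0] * dim
    for tok in tokens:
        digest = hashlib.sha256(tok.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % dim  # bucket chosen by the hash
        vec[index] += 1
    return vec


def splice(numeric_vec, histogram_vec):
    """Concatenate the hashed static vector and the opcode histogram vector."""
    return list(numeric_vec) + list(histogram_vec)


# Hypothetical static features taken from a manifest and DEX strings.
static_features = [
    "permission:android.permission.SEND_SMS",
    "activity:com.example.MainActivity",
    "url:http://example.com/api",
]
numeric = hash_features(static_features, dim=64)
histogram = [0] * 256            # placeholder for a 256-dimensional opcode histogram
feature_vector = splice(numeric, histogram)
print(len(feature_vector))
```

Because the bucket index depends only on the hash of the token, the vector length stays fixed no matter how many distinct permissions, components, or URLs appear across samples, which is what allows new samples to be vectorized without maintaining a vocabulary.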
In the method of the above embodiment, hashing the first characteristic information yields a numerical value vector at low running cost, which is convenient for model deployment and iterative updating, fast, and memory-efficient; converting the second characteristic information into a histogram vector can improve the classification performance of the model; and splicing the numerical value vector and the histogram vector yields the feature vector that serves as the input of the detection model.
In one embodiment, the second characteristic information includes a byte code sequence, and performing vector conversion on the second characteristic information to obtain a histogram vector includes:
acquiring the operation codes in the byte code sequence and establishing a histogram, wherein the histogram comprises at least one type of operation code and the number of occurrences of each type of operation code; and carrying out vector representation on the histogram to obtain a histogram vector.
Specifically, the installation package of Android software includes an application manifest file (Manifest), one or more DEX executable files containing program code, and some resource files. The second characteristic information is extracted from the DEX file by using the LIEF (Library to Instrument Executable Formats) tool. When there are multiple DEX files, because the operation codes in every DEX file fall in the same range 0x00-0xFF, the results can be merged as necessary to ensure that the feature dimension is the same for every APK. The extracted second characteristic information includes a byte code sequence, which comprises the operation codes and operands of DEX instructions. Because each DEX instruction has a specific length, the operation codes are extracted with a linear scanning algorithm to obtain an operation code histogram vector. The Dalvik instruction set comprises at least one operation code (in the range 0x00-0xFF), so a 256-dimensional histogram vector can be constructed, in which the value at each position records the number of occurrences of the corresponding operation code.
In the method of the above embodiment, the second characteristic information including the byte code sequence is extracted, and a histogram vector is generated from the operation codes in the byte code sequence. Adding the operation code features and their feature vector improves the classification performance of the model and makes the detection result more accurate.
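The linear scan and the merging of several DEX files can be sketched as below. This is a simplified illustration: the input is assumed to be a list of 16-bit Dalvik code units whose low byte carries the operation code, and `TOY_LENGTHS` lists instruction lengths for only a handful of opcodes (a real scanner needs the length of every instruction format and must also handle payload pseudo-instructions). The function names and sample code units are hypothetical.

```python
def opcode_histogram(code_units, unit_lengths):
    """Linear scan: count the opcode of each instruction, then skip its operands."""
    hist = [0] * 256
    i = 0
    while i < len(code_units):
        opcode = code_units[i] & 0xFF          # low byte of the first code unit
        hist[opcode] += 1
        i += unit_lengths.get(opcode, 1)       # assumption: unknown opcodes are 1 unit
    return hist


def merge_histograms(histograms):
    """Sum per-DEX histograms so every APK yields one 256-dimensional vector."""
    merged = [0] * 256
    for hist in histograms:
        for opcode, count in enumerate(hist):
            merged[opcode] += count
    return merged


# Instruction lengths in 16-bit code units for a few Dalvik opcodes only.
TOY_LENGTHS = {0x00: 1, 0x0E: 1, 0x12: 1, 0x1A: 2, 0x6E: 3}

# const/4 v0, #1 ; const-string v1, string@0 ; invoke-virtual {v0, v1}, meth@3 ; return-void
dex1_units = [0x1012, 0x011A, 0x0000, 0x206E, 0x0003, 0x0010, 0x000E]
# nop ; return-void
dex2_units = [0x0000, 0x000E]

merged = merge_histograms([
    opcode_histogram(dex1_units, TOY_LENGTHS),
    opcode_histogram(dex2_units, TOY_LENGTHS),
])
print(merged[0x0E], merged[0x00], sum(merged))
```

Note that the operand unit `0x0000` inside the first sequence is skipped rather than counted as a `nop`; that is the reason the scan must advance by the instruction length instead of counting every code unit.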
In one embodiment, the method for detecting malware includes:
extracting URL information by using a regular expression, and determining the URL information as the first characteristic information.
Specifically, URL information is a special type of content in the files of the software installation package: it indicates network addresses that the software may request at run time, and is therefore important for detecting whether the software is malicious. Regular expressions are accordingly used to screen URL information out of the files of the software installation package, that is, to match the character strings that represent URLs; this provides basic data for malware detection and improves detection accuracy.
In the method of the above embodiment, extracting URL information with a regular expression yields the network addresses that the software requests at run time, and whether the software is malicious can be further analyzed on the basis of these addresses. Extracting URL information with a regular expression therefore provides necessary basic data for malware detection and improves detection accuracy.
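A minimal sketch of the regular-expression step is given below. The source does not disclose the pattern it uses, so `URL_PATTERN` is an assumed, deliberately simple pattern for `http` and `https` URLs, and `extract_urls` and the sample bytes are hypothetical. The pattern is applied to raw bytes so that it can run directly over file contents taken from the installation package.

```python
import re

# Assumed pattern: scheme, "://", then a run of characters that are legal in URLs.
URL_PATTERN = re.compile(rb"https?://[A-Za-z0-9._~:/?#\[\]@!$&'()*+,;=%-]+")


def extract_urls(blob: bytes):
    """Return the distinct URLs found in a byte string, in order of first appearance."""
    found = []
    for match in URL_PATTERN.finditer(blob):
        url = match.group().decode("ascii")
        if url not in found:
            found.append(url)
    return found


# Demo data imitating strings embedded between non-text bytes.
sample = (
    b"\x00\x12http://example.com/a?b=1\x00junk\x00"
    b"https://example.org/x\x00http://example.com/a?b=1\x00"
)
urls = extract_urls(sample)
print(urls)
```

The resulting strings could then be turned into tokens and fed to the hashing step as part of the first characteristic information.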
In one embodiment, extracting the first feature information and the second feature information of the target software based on the installation package of the target software includes:
extracting first characteristic information from a manifest file and an executable file of the installation package;
second feature information is extracted from the executable file.
An installation package with the APK extension is essentially a compressed archive in Zip format, and the binary files contained in the APK each follow a specific format, so the binary files decompressed from the APK archive can be parsed directly to obtain the characteristic information of the software. The installation package of Android software comprises an application manifest file (Manifest), one or more DEX executable files containing program code, and a plurality of resource files. The first characteristic information is extracted from the Manifest file and the DEX executable file of the installation package: from the Manifest file by using the AXMLPrinter2 tool, and from the DEX file by using the LIEF tool. The first characteristic information represents static attribute information of the software and includes first attribute information and URL information; the first attribute information includes at least one of: component information, permission information, and component invocation information. The second characteristic information is extracted from the DEX file by using the LIEF tool. When there are multiple DEX files, because the operation codes in every DEX file fall in the same range 0x00-0xFF, the results can be merged as necessary to ensure that the feature dimension is the same for every APK. The second characteristic information represents instruction information of the software and includes a byte code sequence.
The method of the above embodiment extracts the first characteristic information from the manifest file and the executable file of the installation package; extracting second characteristic information from the executable file; abundant basic data are provided for detecting whether the software is malicious software or not, and the detection result is more accurate.
In one embodiment, the detection method further comprises:
building a boosted tree model;
training the boosted tree model by using the training data to obtain a detection model; the training data includes: feature vectors of a plurality of pieces of software and a label indicating whether each piece of software is malware.
Specifically, a machine learning algorithm is used to build a classification model, for example the boosted tree model in the XGBoost tool, and the model is trained on a labeled training set whose training data includes the feature vectors of a plurality of pieces of software and a label indicating whether each piece of software is malware. The trained detection model is then used to predict on the software to be detected, which completes intelligent, automatic detection of Android malware: the detection result is a benign/malicious classification that indicates whether the target software is a malicious program.
It should be noted that the detection model may be a model established by other machine learning algorithms, for example, a convolutional neural network model, a decision tree model, and the like, which is not limited in the embodiment of the present invention.
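To make the training step concrete, the following is a small pure-Python stand-in for gradient-boosted trees. It is not XGBoost and does not use its API: each round fits a one-split regression tree (a stump) to the residuals of the logistic loss and adds it to the ensemble. The function names, the toy two-dimensional feature vectors, and the hyperparameters `rounds` and `learning_rate` are assumptions for illustration; real feature vectors would be the spliced hash and histogram vectors.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Find the single feature/threshold split with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    return best[1:] if best else None


def stump_output(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def train_boosted_stumps(X, y, rounds=20, learning_rate=0.5):
    """Gradient boosting on the logistic loss with stumps as weak learners."""
    model, scores = [], [0.0] * len(X)
    for _ in range(rounds):
        # Negative gradient of the logistic loss: label minus predicted probability.
        residuals = [label - sigmoid(s) for label, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        j, t, lm, rm = stump
        scaled = (j, t, learning_rate * lm, learning_rate * rm)
        model.append(scaled)
        scores = [s + stump_output(scaled, row) for s, row in zip(scores, X)]
    return model


def predict_malicious(model, row):
    return sigmoid(sum(stump_output(s, row) for s in model)) >= 0.5


# Toy labeled training set: 1 = malware, 0 = benign (made-up numbers).
X_train = [[0, 5], [1, 4], [0, 6], [7, 0], [8, 1], [9, 0]]
y_train = [0, 0, 0, 1, 1, 1]
model = train_boosted_stumps(X_train, y_train)
print(predict_malicious(model, [10, 0]), predict_malicious(model, [0, 9]))
```

A production system would more likely call an existing library such as XGBoost, as the text suggests; the sketch only shows the shape of the data (feature vectors plus benign/malicious labels) and the additive structure of a boosted model.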
In the method of this embodiment, a boosting tree model is built before software detection and is trained with training data comprising the feature vectors of a plurality of pieces of software and, for each piece of software, a label indicating whether it is malware, which yields the detection model. Detecting malware with the trained detection model makes the detection result more accurate.
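To make the train-then-predict flow concrete, the sketch below boosts depth-1 trees (stumps) on the logistic loss using only the standard library. It is a toy stand-in for the boosting tree model of a tool such as XGBoost, not the model of the embodiment; the function names, the two-dimensional toy features, and the hyperparameters (`rounds`, `lr`) are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Pick the (feature, threshold) split with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    return None if best is None else best[1:]


def train(X, y, rounds=20, lr=0.5):
    """Gradient boosting on the logistic loss with depth-1 trees (stumps)."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(rounds):
        # Negative gradient of the log loss with respect to the raw score.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        j, t, lm, rm = stump
        stumps.append(stump)
        scores = [s + lr * (lm if row[j] <= t else rm) for row, s in zip(X, scores)]
    return {"lr": lr, "stumps": stumps}


def predict_proba(model, row):
    """Probability that the sample is malware (label 1)."""
    score = sum(model["lr"] * (lm if row[j] <= t else rm)
                for j, t, lm, rm in model["stumps"])
    return sigmoid(score)


# Toy labeled training set: 0 = benign, 1 = malware.
X_train = [[0, 5], [1, 4], [0, 6], [9, 5], [8, 4], [7, 6]]
y_train = [0, 0, 0, 1, 1, 1]
model = train(X_train, y_train)
is_malware = predict_proba(model, [8, 5]) > 0.5
```

In practice the feature vectors would be the concatenated hash and histogram vectors of each APK, and a library implementation would replace the hand-written stumps.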
For example, the specific flow of the malware detection method is shown in Fig. 2:
Step 1, extracting characteristic information;
An Android installation package uses the extension .apk and is essentially a compressed archive in Zip format. The binary files it contains all follow specific format constraints, so the binary files decompressed from the APK can be parsed directly to obtain characteristic information representing the software. The installation package of Android software comprises an application manifest file (Manifest), one or more DEX executable files containing program code, and a number of resource files.
The first characteristic information is extracted from the manifest file and the DEX executable files of the installation package: it is extracted from the manifest file using the AXMLPrinter2 tool and from the DEX files using the LIEF tool. The first characteristic information represents static attribute information of the software and includes first attribute information and URL information; the first attribute information includes at least one of: component information, permission information, and component invocation information.
The second characteristic information is extracted from the DEX executable files using the LIEF tool. When the package contains multiple DEX files, the per-file results can be merged as necessary, since the operation codes always lie in the range 0x00-0xFF, so that the feature dimension is the same for every APK. The second characteristic information represents instruction information of the software and includes a bytecode sequence. The bytecode sequence comprises the operation codes and operands of the DEX instructions, and the operation codes range from 0x00 to 0xFF. Because each DEX instruction has a specific length, the operation codes are extracted with a linear scanning algorithm to obtain an operation code histogram vector. The Dalvik instruction set comprises at least one type of operation code (in the range 0x00-0xFF), so a 256-dimensional histogram vector can be constructed, in which the value at each position records the number of occurrences of the corresponding operation code.
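The linear scan described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: the input is assumed to be the raw instruction bytes of one method body (obtaining them from a DEX file, for example with LIEF, is not shown), the length table `INSN_UNITS` deliberately lists only a handful of Dalvik opcodes, and unknown opcodes are assumed to occupy one 16-bit code unit. A real scanner needs the complete table and must also skip switch and array-data payloads.

```python
# Partial table: opcode -> instruction length in 16-bit code units.
# Only a few Dalvik opcodes are listed; a real scanner needs all of them.
INSN_UNITS = {
    0x00: 1,  # nop
    0x01: 1,  # move
    0x02: 2,  # move/from16
    0x0E: 1,  # return-void
    0x12: 1,  # const/4
    0x13: 2,  # const/16
    0x14: 3,  # const
    0x6E: 3,  # invoke-virtual
}


def opcode_histogram(code: bytes) -> list:
    """Linear scan: count opcodes, skipping each instruction's operand bytes."""
    hist = [0] * 256
    pos = 0
    while pos < len(code):
        opcode = code[pos]  # low byte of the little-endian code unit
        hist[opcode] += 1
        # Assumption: opcodes missing from the table are one code unit long.
        pos += 2 * INSN_UNITS.get(opcode, 1)
    return hist


def merge_histograms(histograms) -> list:
    """Sum per-DEX histograms so every APK yields one 256-dimensional vector."""
    merged = [0] * 256
    for hist in histograms:
        for i, count in enumerate(hist):
            merged[i] += count
    return merged


# const/4 v0, #1 ; invoke-virtual (operands zeroed) ; return-void
demo_code = bytes([0x12, 0x10,
                   0x6E, 0x10, 0x00, 0x00, 0x00, 0x00,
                   0x0E, 0x00])
demo_hist = opcode_histogram(demo_code)
merged_hist = merge_histograms([demo_hist, demo_hist])
```

Because the scan advances by whole instructions, the zero operand bytes of `invoke-virtual` are not miscounted as `nop` opcodes, which is the reason the instruction lengths matter.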
Step 2, performing vector conversion;
Specifically, the first characteristic information is hashed to obtain a numerical value vector; before hashing, the URL information is extracted with a regular expression. The first characteristic information represents static attribute information of the software. The feature-hashing vectorization method uses a hash function to convert the original features into a low-dimensional numerical vector while losing as little feature information as possible. It supports online learning, keeps the running cost of the model low, makes model deployment and iterative updating convenient, and is fast and memory-efficient.
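A minimal sketch of this step, assuming string tokens as the first characteristic information: URLs are pulled out with a regular expression and every token is mapped to a slot of a fixed-length vector by a hash function. The regular expression, the token prefixes, the choice of SHA-256, and the dimension of 64 are illustrative assumptions; the source does not specify them.

```python
import hashlib
import re

# Hypothetical pattern: http(s) URLs up to whitespace, a quote, or an angle bracket.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_urls(text: str) -> list:
    """Return every URL-like substring found in the given text."""
    return URL_RE.findall(text)


def hash_features(tokens, dim=64) -> list:
    """Feature hashing: each token increments the slot chosen by its hash."""
    vec = [0.0] * dim
    for token in tokens:
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % dim
        vec[index] += 1.0  # colliding tokens share a slot
    return vec


strings = 'connect("http://example.com/a") and https://x.test/b?c=1 end'
urls = extract_urls(strings)
tokens = ["permission:android.permission.SEND_SMS",
          "component:activity:MainActivity"] + ["url:" + u for u in urls]
value_vector = hash_features(tokens)
```

The vector length is fixed in advance and does not grow with the vocabulary, so no feature dictionary has to be stored or rebuilt when new permissions or URLs appear; the cost is that unrelated tokens can collide in one slot.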
Vector conversion is performed on the second characteristic information to obtain a histogram vector; the second characteristic information represents instruction information of the software and includes a bytecode sequence. Extracting the second characteristic information and constructing a DEX operation code histogram vector can improve the classification performance of the model.
Finally, the numerical value vector and the histogram vector are concatenated to obtain the feature vector, which is input to the machine learning model as the feature representation of a single APK.
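The concatenation itself is simple; what matters is that both parts have a fixed length so that every APK produces a feature vector of the same dimension. A minimal sketch, in which the function name and the toy 4-dimensional hashed vector are assumptions:

```python
OPCODE_DIM = 256  # one histogram slot per opcode value 0x00-0xFF


def build_feature_vector(value_vector, histogram_vector, hash_dim):
    """Concatenate the hashed vector and the opcode histogram into one vector."""
    if len(value_vector) != hash_dim:
        raise ValueError("hashed vector has unexpected length")
    if len(histogram_vector) != OPCODE_DIM:
        raise ValueError("opcode histogram must have 256 entries")
    return [float(v) for v in value_vector] + [float(c) for c in histogram_vector]


hashed = [0.0, 2.0, 1.0, 0.0]   # toy 4-dimensional hashed vector
histogram = [0] * OPCODE_DIM
histogram[0x6E] = 3             # e.g. three invoke-virtual instructions
feature_vector = build_feature_vector(hashed, histogram, hash_dim=4)
```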
Step 3, obtaining a detection result;
Specifically, a classification model is built with a machine learning algorithm, for example a boosting tree (gradient boosted tree) model from the XGBoost tool, and is trained with training data comprising the feature vectors of a plurality of pieces of software and, for each piece of software, a label indicating whether it is malware; the model thus learns from a labeled training set. The trained detection model is then used to predict on the software to be detected, which completes intelligent, automatic detection of Android malware and yields a detection result for the target software indicating whether the target software is malware. In other words, a benign/malicious classification result is obtained for the sample, which completes the malware detection.
The following describes the malware detection apparatus provided by the present invention; the apparatus described below and the malware detection method described above may be referred to in correspondence with each other.
Fig. 3 is a schematic structural diagram of a malware detection apparatus provided in the present invention. The detection apparatus for malicious software provided by this embodiment includes:
an extraction module 710, configured to extract first feature information and second feature information of target software based on an installation package of the target software; the first characteristic information is used for representing static attribute information of the software; the second characteristic information is used for representing instruction information of the software;
the conversion module 720 is configured to perform vector conversion on the first feature information and the second feature information respectively, and splice the converted vectors to obtain a feature vector of the target software;
the detection module 730 is configured to obtain a detection result of the target software according to the feature vector by using the trained detection model, where the detection result is used to indicate whether the target software is malware.
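The three modules form a straight pipeline, which can be sketched as a small class that chains three callables. The class name, the type signatures, and the stub functions below are hypothetical placeholders used only to show the data flow between modules 710, 720 and 730; they do not perform real extraction, conversion, or classification.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class DetectionApparatus:
    extract: Callable[[bytes], Tuple[List[str], bytes]]  # extraction module 710
    convert: Callable[[List[str], bytes], List[float]]   # conversion module 720
    detect: Callable[[List[float]], bool]                 # detection module 730

    def run(self, installation_package: bytes) -> bool:
        first_info, second_info = self.extract(installation_package)
        feature_vector = self.convert(first_info, second_info)
        return self.detect(feature_vector)


# Placeholder stubs standing in for the real modules.
def stub_extract(package: bytes) -> Tuple[List[str], bytes]:
    return ["permission:SEND_SMS"], package[:2]


def stub_convert(first_info: List[str], second_info: bytes) -> List[float]:
    return [float(len(first_info)), float(len(second_info))]


def stub_detect(feature_vector: List[float]) -> bool:
    return feature_vector[0] > 0  # placeholder rule, not a trained model


apparatus = DetectionApparatus(stub_extract, stub_convert, stub_detect)
result = apparatus.run(b"\x12\x10\x0e\x00")
```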
Optionally, the conversion module 720 is specifically configured to:
performing hash processing on the first characteristic information to obtain a numerical value vector;
performing vector conversion on the second characteristic information to obtain a histogram vector;
and splicing the numerical value vector and the histogram vector to obtain a feature vector.
Optionally, the extracting module 710 is specifically configured to:
extract the first feature information, which includes first attribute information and URL information; the first attribute information includes at least one of: component information, permission information, and component invocation information.
Optionally, the conversion module 720 is specifically configured to:
when the second characteristic information includes a bytecode sequence, perform vector conversion on the second feature information to obtain a histogram vector by:
acquiring the operation codes in the bytecode sequence and establishing a histogram, the histogram comprising at least one type of operation code and the number of occurrences of each type of operation code; and representing the histogram as a vector to obtain the histogram vector.
Optionally, the extracting module 710 is specifically configured to:
extract the URL information using a regular expression.
Optionally, the extracting module 710 is specifically configured to:
extracting first characteristic information from a manifest file and an executable file of the installation package;
second feature information is extracted from the executable file.
Optionally, the detecting module 730 is specifically configured to:
building a boosting tree model;
training the boosting tree model using training data to obtain the detection model; the training data includes: feature vectors of a plurality of pieces of software and, for each piece of software, a label indicating whether it is malware.
It should be noted that the method in the embodiment of the present invention may also be used in other systems, such as iOS, and the embodiment of the present invention is not limited thereto.
The apparatus of the embodiment of the present invention is configured to perform the method of any of the foregoing method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
Fig. 4 illustrates a physical structure diagram of an electronic device, which may include: a processor 810, a communication interface 820, a memory 830 and a communication bus 840, wherein the processor 810, the communication interface 820 and the memory 830 communicate with each other via the communication bus 840. The processor 810 may call logic instructions in the memory 830 to perform a malware detection method comprising: extracting first characteristic information and second characteristic information of the target software based on the installation package of the target software, the first characteristic information representing static attribute information of the software and the second characteristic information representing instruction information of the software; performing vector conversion on the first characteristic information and the second characteristic information respectively, and concatenating the converted vectors to obtain a feature vector of the target software; and obtaining a detection result of the target software from the feature vector using the trained detection model, the detection result indicating whether the target software is malware.
In addition, the logic instructions in the memory 830 may be implemented in the form of software functional units and, when sold or used as independent products, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the malware detection method provided by the above methods, the method comprising: extracting first characteristic information and second characteristic information of the target software based on the installation package of the target software; the first characteristic information is used for representing static attribute information of the software; the second characteristic information is used for representing instruction information of the software; respectively carrying out vector conversion on the first characteristic information and the second characteristic information, and splicing the converted vectors to obtain a characteristic vector of the target software; and acquiring a detection result of the target software by using the trained detection model according to the feature vector, wherein the detection result is used for indicating whether the target software is malicious software or not.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the malware detection method provided above, the method including: extracting first characteristic information and second characteristic information of the target software based on the installation package of the target software, the first characteristic information representing static attribute information of the software and the second characteristic information representing instruction information of the software; performing vector conversion on the first characteristic information and the second characteristic information respectively, and concatenating the converted vectors to obtain a feature vector of the target software; and obtaining a detection result of the target software from the feature vector using the trained detection model, the detection result indicating whether the target software is malware.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied, in whole or in part, in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the methods described in the various embodiments or in portions of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (12)

1. A method for malware detection, comprising:
extracting first characteristic information and second characteristic information of target software based on an installation package of the target software; the first characteristic information is used for representing static attribute information of the software; the second characteristic information is used for representing instruction information of software;
respectively carrying out vector conversion on the first characteristic information and the second characteristic information, and splicing the converted vectors to obtain a characteristic vector of the target software;
and acquiring a detection result of the target software by using the trained detection model according to the feature vector, wherein the detection result is used for indicating whether the target software is malicious software or not.
2. The method according to claim 1, wherein performing vector transformation on the first feature information and the second feature information, and splicing the transformed vectors to obtain a feature vector of the target software comprises:
performing hash processing on the first characteristic information to obtain a numerical value vector;
performing vector conversion on the second characteristic information to obtain a histogram vector;
and splicing the numerical value vector and the histogram vector to obtain the feature vector.
3. The malware detection method according to claim 1 or 2, wherein the first feature information includes: first attribute information and URL information; the first attribute information includes at least one of: component information, permission information, and component invocation information.
4. The malware detection method according to claim 2, wherein the second feature information includes a bytecode sequence, and performing vector conversion on the second feature information to obtain a histogram vector includes:
acquiring an operation code in the byte code sequence and establishing a histogram; the histogram comprises at least one type of operation codes and the occurrence times of the operation codes;
and carrying out vector representation on the histogram to obtain the histogram vector.
5. The method for detecting malware according to claim 1, wherein the extracting first feature information of the target software comprises:
and extracting URL information by using a regular expression, and determining the URL information as the first characteristic information.
6. The malware detection method according to claim 4, wherein extracting the first feature information and the second feature information of the target software based on the installation package of the target software comprises:
extracting the first characteristic information from a manifest file and an executable file of the installation package;
extracting the second feature information from the executable file.
7. The malware detection method of claim 1 or 2, further comprising:
building a boosting tree model;
training the boosting tree model by using training data to obtain the detection model; the training data includes: feature vectors of a plurality of pieces of software and a label of whether each of the pieces of software is malware.
8. The method of detecting malware according to claim 1 or 2, wherein the target software comprises android software.
9. An apparatus for detecting malware, comprising:
the extraction module is used for extracting first characteristic information and second characteristic information of the target software based on the installation package of the target software; the first characteristic information is used for representing static attribute information of the software; the second characteristic information is used for representing instruction information of software;
the conversion module is used for respectively carrying out vector conversion on the first characteristic information and the second characteristic information and splicing the converted vectors to obtain a characteristic vector of the target software;
and the detection module is used for acquiring a detection result of the target software by utilizing the trained detection model according to the feature vector, wherein the detection result is used for indicating whether the target software is malicious software or not.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method for malware detection as claimed in any one of claims 1 to 8 are implemented when the program is executed by the processor.
11. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the malware detection method of any one of claims 1 to 8.
12. A computer program product comprising a computer program, wherein the computer program when executed by a processor implements the steps of the malware detection method of any one of claims 1 to 8.
CN202111562465.2A 2021-12-20 2021-12-20 Malicious software detection method, device and equipment Pending CN114491528A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111562465.2A CN114491528A (en) 2021-12-20 2021-12-20 Malicious software detection method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111562465.2A CN114491528A (en) 2021-12-20 2021-12-20 Malicious software detection method, device and equipment

Publications (1)

Publication Number Publication Date
CN114491528A true CN114491528A (en) 2022-05-13

Family

ID=81494612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111562465.2A Pending CN114491528A (en) 2021-12-20 2021-12-20 Malicious software detection method, device and equipment

Country Status (1)

Country Link
CN (1) CN114491528A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115221522A (en) * 2022-09-20 2022-10-21 北京微步在线科技有限公司 Rapid static detection method and device for ELF malicious file and electronic equipment
CN115221522B (en) * 2022-09-20 2022-12-16 北京微步在线科技有限公司 Rapid static detection method and device for ELF malicious file and electronic equipment

Similar Documents

Publication Publication Date Title
CN109753800B (en) Android malicious application detection method and system fusing frequent item set and random forest algorithm
CN111639337B (en) Unknown malicious code detection method and system for massive Windows software
CN109905385B (en) Webshell detection method, device and system
CN108734012B (en) Malicious software identification method and device and electronic equipment
CN111460446B (en) Malicious file detection method and device based on model
CN110795732A (en) SVM-based dynamic and static combination detection method for malicious codes of Android mobile network terminal
KR101858620B1 (en) Device and method for analyzing javascript using machine learning
CN111737692B (en) Application program risk detection method and device, equipment and storage medium
CN103778373A (en) Virus detection method and device
CN104680065A (en) Virus detection method, virus detection device and virus detection equipment
CN105653949A (en) Malicious program detection method and device
CN112148305A (en) Application detection method and device, computer equipment and readable storage medium
CN115730313A (en) Malicious document detection method and device, storage medium and equipment
CN107577943B (en) Sample prediction method and device based on machine learning and server
CN113468524B (en) RASP-based machine learning model security detection method
CN108229168B (en) Heuristic detection method, system and storage medium for nested files
CN114491528A (en) Malicious software detection method, device and equipment
CN113869789A (en) Risk monitoring method and device, computer equipment and storage medium
KR101557455B1 (en) Application Code Analysis Apparatus and Method For Code Analysis Using The Same
CN113971284B (en) JavaScript-based malicious webpage detection method, equipment and computer readable storage medium
CN113971283A (en) Malicious application program detection method and device based on features
CN116932381A (en) Automatic evaluation method for security risk of applet and related equipment
CN112163217B (en) Malware variant identification method, device, equipment and computer storage medium
CN115134153A (en) Safety evaluation method and device and model training method and device
CN114266906A (en) Method, device, medium, and program product for identifying violation data at user side

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 332, 3 / F, Building 102, 28 xinjiekouwei street, Xicheng District, Beijing 100088

Applicant after: Qianxin Technology Group Co.,Ltd.

Applicant after: Qianxin Wangshen information technology (Beijing) Co., Ltd

Address before: Room 332, 3 / F, Building 102, 28 xinjiekouwei street, Xicheng District, Beijing 100088

Applicant before: Qianxin Technology Group Co.,Ltd.

Applicant before: LEGENDSEC INFORMATION TECHNOLOGY (BEIJING) Inc.
